Title: | Working with United States ZIP Code and ZIP Code Tabulation Area Data |
---|---|
Description: | Provides a set of functions for working with American postal codes, which are known as ZIP Codes. These include accessing ZIP Code to ZIP Code Tabulation Area (ZCTA) crosswalks, retrieving demographic data for ZCTAs, and tabulating demographic data for three-digit ZCTAs. |
Authors: | Christopher Prener [aut, cre] |
Maintainer: | Christopher Prener <[email protected]> |
License: | Apache License (>= 2) |
Version: | 0.1.1 |
Built: | 2025-02-26 03:03:50 UTC |
Source: | https://github.com/pfizer-opensource/zipper |
This function takes input ZCTA data and aggregates it to three-digit areas, which are considerably larger. These regions are sometimes used in American health care contexts for publishing geographic identifiers.
zi_aggregate(.data, year, extensive = NULL, intensive = NULL, intensive_method = "mean", survey, output = "tidy", zcta = NULL, key = NULL)
.data |
A tidy set of demographic data containing one or more variables that should be aggregated to three-digit ZCTAs. This data frame or tibble should contain all five-digit ZCTAs within the three digit ZCTAs that you plan to use for aggregating data. See Details below for formatting requirements. |
year |
A four-digit numeric scalar for year. |
extensive |
A character scalar or vector listing all extensive (i.e. count data) variables you wish to aggregate. These will be summed. For American Community Survey data, the margin of error will be calculated by taking the square root of the summed, squared margins of error for each five-digit ZCTA within a given three-digit ZCTA. |
intensive |
A character scalar or vector listing all intensive (i.e.
ratio, percent, or median data) variables you wish to aggregate. These
will be combined using the approach listed for |
intensive_method |
A character scalar; either |
survey |
A character scalar representing the Census product. It can
be either a Decennial Census product (either |
output |
A character scalar; one of |
zcta |
An optional vector of ZCTAs that demographic data are requested
for. If this is |
key |
A Census API key, which can be obtained at
https://api.census.gov/data/key_signup.html. This can be omitted if
|
A tibble containing all aggregated data requested in either "tidy" or "wide" format.
# load sample demographic data
mo22_demos <- zi_mo_pop

# the above data can be replicated with the following code:
# zi_get_demographics(year = 2022, variables = c("B01003_001", "B19013_001"),
#   survey = "acs5")

# load sample geometric data
mo22_zcta3 <- zi_mo_zcta3

# the above data can be replicated with the following code:
# zi_get_geometry(year = 2022, style = "zcta3", state = "MO",
#   method = "intersect")

# aggregate a single variable
zi_aggregate(mo22_demos, year = 2020, extensive = "B01003_001", survey = "acs5",
  zcta = mo22_zcta3$ZCTA3)

# aggregate multiple variables, outputting wide data
zi_aggregate(mo22_demos, year = 2020, extensive = "B01003_001",
  intensive = "B19013_001", survey = "acs5", zcta = mo22_zcta3$ZCTA3,
  output = "wide")
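The aggregation rule described for extensive variables (sum the estimates, then take the square root of the summed, squared margins of error) can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the data frame, its column names, and all values are made up, and the three-digit area is assumed to be the first three characters of the five-digit GEOID.

```r
# Hypothetical five-digit ZCTA estimates with margins of error (made-up values)
zcta5 <- data.frame(
  GEOID = c("63005", "63011", "63101", "63102"),
  estimate = c(100, 200, 50, 25),
  moe = c(3, 4, 6, 8),
  stringsAsFactors = FALSE
)

# assumption: the three-digit area is the first three characters of the GEOID
zcta5$ZCTA3 <- substr(zcta5$GEOID, 1, 3)

# extensive (count) variables are summed within each three-digit area
est <- tapply(zcta5$estimate, zcta5$ZCTA3, sum)

# margins of error: square root of the summed, squared margins of error
moe <- tapply(zcta5$moe, zcta5$ZCTA3, function(m) sqrt(sum(m^2)))

zcta3 <- data.frame(
  ZCTA3 = names(est),
  estimate = as.numeric(est),
  moe = as.numeric(moe),
  stringsAsFactors = FALSE
)
zcta3
```

With these made-up inputs, area 630 combines margins of error of 3 and 4 into 5, and area 631 combines 6 and 8 into 10.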
This function converts five-digit ZIP Codes to three-digit ZIP Codes. The first three digits of a ZIP Code are known as the ZIP3 Code, and correspond to the sectional center facility (SCF) that processes mail for a region.
zi_convert(.data, input_var, output_var)
.data |
A data frame containing a column of five-digit ZIP Codes. |
input_var |
A character scalar specifying the column name with the five-digit ZIP Codes in the data frame. |
output_var |
Optional; A character scalar specifying the column name to store the three-digit ZIP Codes in the data frame. |
A tibble containing the original data frame with a new column of three-digit ZIP Codes.
# add new column
## create sample data
df <- data.frame(id = c(1:3), zip5 = c("63005", "63139", "63636"))

## convert ZIP Codes to ZIP3, creating a new column
zi_convert(.data = df, input_var = zip5, output_var = zip3)

# overwrite existing column
## create sample data
df <- data.frame(id = c(1:3), zip = c("63005", "63139", "63636"))

## convert ZIP Codes to ZIP3, overwriting the existing column
zi_convert(.data = df, input_var = zip)
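Conceptually, the conversion keeps the first three characters of each five-digit ZIP Code. A minimal base R sketch of that idea follows; it is not the package's implementation, and it assumes the ZIP Codes are already stored as five-character strings.

```r
# sample data, matching the example above
df <- data.frame(
  id = c(1:3),
  zip5 = c("63005", "63139", "63636"),
  stringsAsFactors = FALSE
)

# keep the first three characters of each five-digit ZIP Code
df$zip3 <- substr(df$zip5, 1, 3)
df
```

Storing the values as character data matters here: a numeric ZIP Code with a leading zero would lose that zero before the substring is taken.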
This function compares input data containing ZIP Codes with a crosswalk file that will append ZCTAs. This is an important step because not all ZIP Codes have the same five digits as their enclosing ZCTA.
zi_crosswalk(.data, input_var, zip_source = "UDS", source_var, source_result, year = NULL, qtr = NULL, target = NULL, query = NULL, by = NULL, return_max = NULL, key = NULL, return = "id")
.data |
An input object (a data.frame or tibble) that contains ZIP Codes to be crosswalked. |
input_var |
The column in the input data that contains five-digit ZIP Codes. If the input is numeric, it will be transformed to character data and leading zeros will be added. |
zip_source |
Required character scalar or data frame; specifies the
source of ZIP Code crosswalk data. This can be one of either |
source_var |
Character scalar, required when |
source_result |
Character scalar, required when |
year |
Optional four-digit numeric scalar for year; varies based on source.
For |
qtr |
Numeric scalar, required when |
target |
Character scalar, required when |
query |
Scalar or vector, required when |
by |
Character scalar, required when |
return_max |
Logical scalar, required when |
key |
Optional when |
return |
Character scalar, specifies the type of output to return. Can be
one of |
A tibble with crosswalk values (or optionally, the full crosswalk file) appended based on the return argument.
# create sample data
df <- data.frame(id = c(1:3), zip5 = c("63005", "63139", "63636"))

# UDS crosswalk
zi_crosswalk(df, input_var = zip5, zip_source = "UDS", year = 2022)

# HUD crosswalk
# you will need to replace INSERT_HUD_KEY with your own key
## Not run:
zi_crosswalk(df, input_var = zip5, zip_source = "HUD", year = 2023, qtr = 1,
  target = "COUNTY", query = "MO", by = "residential", return_max = TRUE,
  key = INSERT_HUD_KEY)
## End(Not run)

# custom dictionary
## load sample crosswalk data to simulate custom dictionary
mo_xwalk <- zi_mo_hud

# prep crosswalk
# when a ZIP Code crosses county boundaries, the portion with the largest
# number of residential addresses will be returned
mo_xwalk <- zi_prep_hud(mo_xwalk, by = "residential", return_max = TRUE)

## crosswalk
zi_crosswalk(df, input_var = zip5, zip_source = mo_xwalk, source_var = zip5,
  source_result = geoid)
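The two ideas behind a crosswalk, padding numeric ZIP Codes back to five characters and then looking each one up in a dictionary, can be sketched in base R. This is an illustration only and not how zi_crosswalk() is implemented. The crosswalk table, its column names, and its ZIP-to-ZCTA pairs are invented for the example and are not real crosswalk values.

```r
# numeric ZIP Codes lose their leading zeros
zips_numeric <- c(2138, 63005, 63199)

# convert to character data and restore leading zeros
zips <- sprintf("%05d", as.integer(zips_numeric))

# hypothetical custom dictionary (illustrative pairs, not real crosswalk data)
xwalk <- data.frame(
  zip5 = c("02138", "63005", "63199"),
  zcta = c("02138", "63005", "63101"),
  stringsAsFactors = FALSE
)

# append the ZCTA for each ZIP Code; unmatched ZIP Codes would become NA
df <- data.frame(id = 1:3, zip5 = zips, stringsAsFactors = FALSE)
df$zcta <- xwalk$zcta[match(df$zip5, xwalk$zip5)]
df
```

The third row shows why the lookup is needed: in this made-up dictionary the ZIP Code and its ZCTA do not share the same five digits.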
This function returns demographic data for five-digit ZIP Code Tabulation Areas (ZCTAs), which are rough approximations of many (but not all) USPS ZIP codes.
zi_get_demographics(year, variables = NULL, table = NULL, survey, output = "tidy", zcta = NULL, key = NULL)
year |
A four-digit numeric scalar for year. |
variables |
A character scalar or vector of variable IDs. |
table |
A character scalar of a table ID (only one table may be requested per call). |
survey |
A character scalar representing the Census product. It can
be either a Decennial Census product (either |
output |
A character scalar; one of |
zcta |
An optional vector of ZCTAs that demographic data are requested
for. If this is |
key |
A Census API key, which can be obtained at
https://api.census.gov/data/key_signup.html. This can be omitted if
|
A tibble containing all demographic data requested in either "tidy" or "wide" format.
# download all ZCTAs
zi_get_demographics(year = 2012, variables = "B01003_001", survey = "acs5")

# limit output to subset of ZCTAs
## download all ZCTAs in Missouri, intersects method
mo20 <- zi_get_geometry(year = 2020, state = "MO", method = "intersect")

## download demographic data
zi_get_demographics(year = 2012, variables = "B01003_001", survey = "acs5",
  zcta = mo20$GEOID)
This function returns geometric data for ZIP Code Tabulation Areas (ZCTAs), which are rough approximations of many (but not all) USPS ZIP codes. Downloading and processing these data will be heavily affected by your internet connection, your choice for the cb argument, and the processing power of your computer (if you select specific counties).
zi_get_geometry(year, style = "zcta5", return = "id", class = "sf", state = NULL, county = NULL, territory = NULL, cb = FALSE, starts_with = NULL, includes = NULL, excludes = NULL, method, shift_geo = FALSE)
year |
A four-digit numeric scalar for year. |
style |
A character scalar - either |
return |
A character scalar; if |
class |
A character scalar; if |
state |
A character scalar or vector with character state abbreviations (e.g. |
county |
A character scalar or vector with character GEOIDs (e.g. |
territory |
A character scalar or vector with character territory abbreviations (e.g. |
cb |
A logical scalar; if This argument does not apply to |
starts_with |
A character scalar or vector containing the first two
digits of a GEOID or ZCTA3 value to return. It defaults to |
includes |
A character scalar or vector containing GEOIDs or ZCTA3 values to include when finalizing output. This may be necessary depending on what is identified with the |
excludes |
A character scalar or vector containing GEOIDs or ZCTA3 values to exclude when finalizing output. This may be necessary depending on what is identified with the |
method |
A character scalar - either |
shift_geo |
A logical scalar; if |
This function contains options for both the type of ZCTA and, optionally, for how state and county data are identified. For type, either five-digit or three-digit ZCTA geometries are available. The three-digit ZCTAs were created by geoprocessing the five-digit boundaries for each year, and then applying a modest amount of simplification (with sf::st_simplify()) to reduce file size. The source files are available on GitHub at https://github.com/chris-prener/zcta3.
Since ZCTAs cross state lines, two methods are used to create these geometry data for years 2012 and beyond for states and all years for counties. The "intersect" method will return ZCTAs that border the states or counties selected. In most cases, this will result in more ZCTAs being returned than are actually within the states or counties selected. Conversely, the "centroid" method will return only ZCTAs whose centroids (geographical centers) lie within the states or counties named. In most cases, this will return fewer ZCTAs than actually lie within the states or counties selected. Users will need to review their data carefully and will likely need to use the includes and excludes arguments to finalize the geographies returned.
For state-level data in 2010 and 2011, the Census Bureau published individual state files that will be utilized automatically by zippeR. If county-level data are requested for these years, the state-specific file will be used as a base before identifying ZCTAs within counties using either the "intersect" or "centroid" method described above.
An sf object with ZCTAs matching the parameters specified above: either a nationwide file, a specific state or states, or a specific county or counties.
# five-digit ZCTAs
## download all ZCTAs for 2020 including territories
zi_get_geometry(year = 2020, territory = c("AS", "GU", "MP", "PR", "VI"),
  shift_geo = TRUE)

## download all ZCTAs for 2020 excluding territories
zi_get_geometry(year = 2020, shift_geo = TRUE)

## download all ZCTAs in a selection of states, intersects method
zi_get_geometry(year = 2020, state = c("IA", "IL", "MO"), method = "intersect")

## download all ZCTAs in a single county - St. Louis City, MO
zi_get_geometry(year = 2020, state = "MO", county = "29510",
  method = "intersect")

# three-digit ZCTAs
## download all ZCTAs for 2018 including territories
zi_get_geometry(year = 2018, style = "zcta3",
  territory = c("AS", "GU", "MP", "PR", "VI"), shift_geo = TRUE)
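To make the difference between the "intersect" and "centroid" methods concrete, here is a toy base R sketch that uses axis-aligned rectangles in place of real state and ZCTA polygons. All coordinates and object names are made up for illustration. The package returns sf geometries, and this sketch is not its implementation.

```r
# hypothetical "state" bounding rectangle
state <- c(xmin = 0, xmax = 10, ymin = 0, ymax = 10)

# hypothetical "ZCTA" rectangles: A inside, B and C straddling the edge, D outside
zctas <- data.frame(
  id = c("A", "B", "C", "D"),
  xmin = c(1, 8, 9.5, 12), xmax = c(3, 11, 14, 15),
  ymin = c(1, 4, 2, 0), ymax = c(3, 6, 5, 2),
  stringsAsFactors = FALSE
)

# "intersect" analogue: any overlap with the state rectangle
intersects <- zctas$xmin <= state[["xmax"]] & zctas$xmax >= state[["xmin"]] &
  zctas$ymin <= state[["ymax"]] & zctas$ymax >= state[["ymin"]]

# "centroid" analogue: the rectangle's center falls inside the state rectangle
cx <- (zctas$xmin + zctas$xmax) / 2
cy <- (zctas$ymin + zctas$ymax) / 2
centroid_in <- cx >= state[["xmin"]] & cx <= state[["xmax"]] &
  cy >= state[["ymin"]] & cy <= state[["ymax"]]

zctas$id[intersects]   # A, B, and C overlap the state
zctas$id[centroid_in]  # only A and B have centers inside the state
```

Rectangle C is the kind of case that needs manual review: it overlaps the state, but its center lies outside, so the two methods disagree about it.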
This function appends information about the city (for five-digit ZIP Codes) or area (for three-digit ZIP Codes) to a data frame containing these values. State is returned for both types of ZIP Codes. The function also optionally returns data on Sectional Center Facilities (SCFs) for three-digit ZIP Codes.
zi_label(.data, input_var, label_source = "UDS", source_var, type = "zip5", include_scf = FALSE, vintage = 2022)
.data |
An input object (a data.frame or tibble) that contains ZIP Codes to be labeled. |
input_var |
The column in the input data that contains five-digit ZIP Codes. If the input is numeric, it will be transformed to character data and leading zeros will be added. |
label_source |
Required character scalar or data frame; specifies the
source of the label data. This could be either |
source_var |
Character scalar, required when |
type |
Character scalar, required when |
include_scf |
A logical scalar required when |
vintage |
Character or numeric scalar, required when |
Labels are approximations of the actual location of a ZIP Code. For five-digit ZIP Codes, the city and state may or may not correspond to an individual's mailing address city (since multiple cities may be accepted as valid by USPS for a particular ZIP Code) or state (since ZIP Codes may cross state lines).
For three-digit ZIP Codes, the area and state may or may not correspond to an individual's mailing address state (since SCFs cover multiple states). For example, the three-digit ZIP Code 010 covers Western Massachusetts in practice, but is assigned to the state of Connecticut.
A tibble containing the original data with additional columns from the selected label data set appended.
# create sample data
df <- data.frame(
  id = c(1:3),
  zip5 = c("63005", "63139", "63636"),
  zip3 = c("630", "631", "636")
)

# UDS crosswalk
zi_label(df, input_var = zip5, label_source = "UDS", vintage = 2022)

# USPS crosswalk
zi_label(df, input_var = zip3, label_source = "USPS", type = "zip3",
  vintage = 202408)

# custom dictionary
## load sample ZIP3 label data to simulate custom dictionary
mo_label <- zi_mo_usps

## label
zi_label(df, input_var = zip3, label_source = mo_label, source_var = zip3,
  type = "zip3")
This function returns a vector of GEOIDs that represent ZCTAs in and around states, depending on the method selected. The two methods are described in Details below.
zi_list_zctas(year, state, method)
year |
A four-digit numeric scalar for year. |
state |
A scalar or vector with state abbreviations (e.g. |
method |
A character scalar - either |
Since ZCTAs cross state lines, two methods are used to create these vectors. The "intersect" method will return ZCTAs that border the state selected. In most cases, this will result in more ZCTAs being returned than are actually within the state(s) named in the state argument. Conversely, the "centroid" method will return only ZCTAs whose centroids (geographical centers) lie within the states named. In most cases, this will return fewer ZCTAs than actually lie within the state selected. Users will need to review their data carefully and, when using other zippeR functions, will likely need to use the includes and excludes arguments to finalize the geographies returned.
A vector of GEOIDs representing ZCTAs in and around the state selected.
# Missouri ZCTAs, intersect method
## return list
mo_zctas <- zi_list_zctas(year = 2021, state = "MO", method = "intersect")

## preview ZCTAs
mo_zctas[1:10]

# Missouri ZCTAs, centroid method
## return list
mo_zctas <- zi_list_zctas(year = 2021, state = "MO", method = "centroid")

## preview ZCTAs
mo_zctas[1:10]
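One way to review the results, sketched here in base R, is to compare the vectors returned by the two methods and inspect the ZCTAs that only the "intersect" method returns. The two vectors below are hypothetical stand-ins with made-up values, not actual output from zi_list_zctas().

```r
# hypothetical GEOID vectors standing in for the output of the two methods
intersect_ids <- c("63005", "63011", "62201", "63101")
centroid_ids <- c("63005", "63011", "63101")

# ZCTAs returned by "intersect" but not by "centroid" are candidates for review
border_only <- setdiff(intersect_ids, centroid_ids)
border_only
```

The ZCTAs in border_only are the ones to check by hand before deciding whether to keep or drop them.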
Spatial data on USPS ZIP Codes are not published by the U.S. Postal Service or the U.S. Census Bureau. Instead, ZIP Codes can be converted to a variety of Census Bureau geographies using crosswalk files. This function reads in ZIP Code to ZIP Code Tabulation Area (ZCTA) crosswalk files from the former UDS Mapper project, which was sunset by the American Academy of Family Physicians in early 2024. It also provides access to the U.S. Department of Housing and Urban Development's ZIP Code crosswalk files, which provide similar functionality for converting ZIP Codes to a variety of geographies including counties.
zi_load_crosswalk(zip_source = "UDS", year, qtr = NULL, target = NULL, query = NULL, key = NULL)
zip_source |
Required character scalar; specifies the source of ZIP Code
crosswalk data. This can be one of either |
year |
Required four-digit numeric scalar for year; varies based on source.
For |
qtr |
Numeric scalar, required when |
target |
Character scalar, required when |
query |
Scalar or vector, required when |
key |
Optional when |
A tibble containing the crosswalk file.
# former UDS mapper crosswalks
zi_load_crosswalk(zip_source = "UDS", year = 2020)

## Not run:
# HUD crosswalks
# you will need to replace INSERT_HUD_KEY with your own key
## ZIP Code to CBSA crosswalk for all ZIP Codes
zi_load_crosswalk(zip_source = "HUD", year = 2023, qtr = 1, target = "CBSA",
  query = "all", key = INSERT_HUD_KEY)

## ZIP Code to County crosswalk for all ZIP Codes in Missouri
zi_load_crosswalk(zip_source = "HUD", year = 2023, qtr = 1, target = "COUNTY",
  query = "MO", key = INSERT_HUD_KEY)

## ZIP Code to Tract crosswalk for ZIP Code 63139 in St. Louis City
zi_load_crosswalk(zip_source = "HUD", year = 2023, qtr = 1, target = "TRACT",
  query = 63139, key = INSERT_HUD_KEY)
## End(Not run)
This function loads a specific label data set that can be used to label five or three-digit ZIP codes in a data frame.
zi_load_labels(source = "UDS", type = "zip5", include_scf = FALSE, vintage = 2022)
source |
A required character scalar; specifies the source of the label
data. The only supported sources are |
type |
A required character scalar; one of either |
include_scf |
A logical scalar required when |
vintage |
A required character or numeric scalar; specifying the date
for |
Labels are approximations of the actual location of a ZIP Code. For five-digit ZIP Codes, the city and state may or may not correspond to an individual's mailing address city (since multiple cities may be accepted as valid by USPS for a particular ZIP Code) or state (since ZIP Codes may cross state lines).
For three-digit ZIP Codes, the area and state may or may not correspond to an individual's mailing address state (since SCFs cover multiple states). For example, the three-digit ZIP Code 010 covers Western Massachusetts in practice, but is assigned to the state of Connecticut.
A tibble with the specified label data for either five or three-digit ZIP Codes.
# zip5 labels via UDS
zi_load_labels(source = "UDS", type = "zip5", vintage = 2022)

# zip3 labels via USPS
zi_load_labels(source = "USPS", type = "zip3", vintage = 202408)
This function loads a list of available label data sets that can be used to label ZIP Codes. Currently, only three-digit ZIP Codes are supported.
zi_load_labels_list(type = "zip3")
type |
A character scalar specifying the type of label data to load. The
only supported type is |
A tibble containing date values that can be used with zi_load_labels.
zi_load_labels_list(type = "zip3")
A tibble containing the HUD ZIP Code to County Crosswalk file for Missouri's ZIP Codes in 2023's first quarter.
data(zi_mo_hud)
A data frame with 1749 rows and 8 variables:
five-digit United States Postal Service ZIP Code
five-digit county FIPS code
for ZIP Codes that cross county boundaries, the proportion of the ZIP Code's residential customers in the given county
for ZIP Codes that cross county boundaries, the proportion of the ZIP Code's commercial customers in the given county
for ZIP Codes that cross county boundaries, the proportion of the ZIP Code's other customers in the given county
for ZIP Codes that cross county boundaries, the proportion of the ZIP Code's total customers in the given county
United States Postal Service city name
United States Postal Service state abbreviation
The data included in zi_mo_hud can be replicated with the following code: zi_load_crosswalk(zip_source = "HUD", year = 2023, qtr = 1, target = "COUNTY", query = "MO"). This assumes your HUD API key is stored in your .Rprofile file as hud_key.
U.S. Department of Housing and Urban Development's ZIP Code crosswalk files
utils::str(zi_mo_hud)
utils::head(zi_mo_hud)
A tibble containing the total population and median household income estimates from the 2018-2022 5-year U.S. Census Bureau American Community Survey estimates for Missouri five-digit ZIP Code Tabulation Areas (ZCTAs).
data(zi_mo_pop)
A data frame with 2664 rows and 4 variables:
full GEOID string
variable, either B01003_001 (total population) or B19013_001 (median household income)
value for associated variable
margin of error for associated variable
The data included in zi_mo_pop can be replicated with the following code: zi_get_demographics(year = 2022, variables = c("B01003_001", "B19013_001"), survey = "acs5").
U.S. Census Bureau American Community Survey
utils::str(zi_mo_pop)
utils::head(zi_mo_pop)
A tibble containing the USPS Three-digit ZIP Code labels for August 2024.
data(zi_mo_usps)
A data frame with 37 rows and 3 variables:
three-digit United States Postal Service ZIP Code
area associated with the three-digit ZIP Code
state associated with the three-digit ZIP Code
The data included in zi_mo_usps can be replicated with the following code: zi_load_labels(type = "zip3", source = "USPS", vintage = 202408). After downloading the data, subset to label_state == "MO".
U.S. Postal Service Facility Access and Shipment Tracking (FAST) Database
utils::str(zi_mo_usps)
utils::head(zi_mo_usps)
A simple features data set containing the geometric data for Missouri's three-digit ZIP Code Tabulation Areas (ZCTAs) for 2022, derived from the U.S. Census Bureau's 2022 TIGER/Line shapefiles.
data(zi_mo_zcta3)
A data frame with 31 rows and 2 variables:
three-digit ZCTA value
simple features geometry
The data included in zi_mo_zcta3 can be replicated with the following code: zi_get_geometry(year = 2022, style = "zcta3", state = "MO", method = "intersect").
U.S. Census Bureau's TIGER/Line database
utils::str(zi_mo_zcta3)
utils::head(zi_mo_zcta3)
The output from zi_load_crosswalk() for HUD data requires additional processing to be used in the zi_crosswalk() function. This function prepares the HUD data for use in joins.
zi_prep_hud(.data, by, return_max = TRUE)
.data |
The output from |
by |
Character scalar; the column name to use for identifying the best
match for a given ZIP Code. This could be either |
return_max |
Logical scalar; if |
A tibble that has been further prepared for use as a crosswalk.
# load sample crosswalk data
mo_xwalk <- zi_mo_hud

# the above data can be replicated with the following code:
# zi_load_crosswalk(zip_source = "HUD", year = 2023, qtr = 1,
#   target = "COUNTY", query = "MO")

# prep crosswalk
# when a ZIP Code crosses county boundaries, the portion with the largest
# number of residential addresses will be returned
zi_prep_hud(mo_xwalk, by = "residential", return_max = TRUE)
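The idea of keeping, for each ZIP Code that crosses county boundaries, the county with the largest share of residential addresses can be sketched in base R. This is an assumption-laden illustration rather than the package's implementation: the table below, its column names, and its proportions are made up.

```r
# hypothetical HUD-style crosswalk; ZIP Code 63005 is split across two counties
xwalk <- data.frame(
  zip5 = c("63005", "63005", "63139"),
  geoid = c("29189", "29071", "29510"),
  residential = c(0.9, 0.1, 1.0),
  stringsAsFactors = FALSE
)

# sort so the largest residential share comes first within each ZIP Code
sorted <- xwalk[order(xwalk$zip5, -xwalk$residential), ]

# keep only the first (largest-share) row for each ZIP Code
best <- sorted[!duplicated(sorted$zip5), ]
best
```

After this step each ZIP Code appears once, which is what makes the table safe to use in a one-to-one join.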
This function repairs two of the four conditions identified in the validation checks with zi_validate(). For the other two conditions, values are converted to NA. See Details below for the specific changes made.
zi_repair(x, style = "zcta5")
x |
A vector containing ZIP or ZCTA values to be repaired. |
style |
A character scalar - either |
The zi_repair() function addresses four conditions:
If the input vector is numeric, it will be converted to character data.
If there are values less than five characters (if style = "zcta5", the default), or three characters (if style = "zcta3"), they will be padded with leading zeros.
If there are input values over five characters (if style = "zcta5", the default), or three characters (if style = "zcta3"), they will be converted to NA.
If there are input values that have non-numeric characters, they will be converted to NA.
Since two of the four steps will result in NA values, it is strongly recommended to attempt to manually fix these issues first.
A repaired vector of ZIP or ZCTA values.
# sample five-digit ZIPs with character
zips <- c("63088", "63108", "zip")

# failed validation
zi_validate(zips)

# repair
zips <- zi_repair(zips)

# successful validation
zi_validate(zips)
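The four repair steps listed in Details can be approximated in base R as follows. The function name repair_sketch and its width argument are hypothetical, and this is a sketch of the described behavior rather than the source of zi_repair().

```r
# hypothetical helper mirroring the four documented repair steps
repair_sketch <- function(x, width = 5) {
  # 1. numeric input is converted to character data
  x <- as.character(x)

  # 3. and 4. values that are too long or contain non-digits become NA
  bad <- is.na(x) | nchar(x) > width | grepl("[^0-9]", x)
  x[bad] <- NA_character_

  # 2. values that are too short are padded with leading zeros
  short <- !is.na(x) & nchar(x) < width
  x[short] <- paste0(strrep("0", width - nchar(x[short])), x[short])

  x
}

repair_sketch(c("63088", "501", "631081", "zip"))
repair_sketch(c(2138, 63005))
```

In the first call, "501" is padded to "00501" while "631081" and "zip" become NA, which is why fixing such values by hand first is preferable.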
This function validates vectors of ZIP Code or ZCTA values. It is used internally throughout zippeR for data validation, but is exported to facilitate troubleshooting.
zi_validate(x, style = "zcta5", verbose = FALSE)
x |
A vector containing ZIP or ZCTA values to be validated. |
style |
A character scalar - either |
verbose |
A logical scalar; if |
The zi_validate() function checks for four conditions:
Is the input vector character data? This is important because of USPS's use of leading zeros in ZIP codes and ZCTAs.
Are all values five characters (if style = "zcta5", the default), or three characters (if style = "zcta3")?
Are any input values over five characters (if style = "zcta5", the default), or three characters (if style = "zcta3")?
Do any input values have non-numeric characters?
The questions provide a basis for repairing issues identified with zi_repair().
Either a logical value (if verbose = FALSE) or a tibble containing validation criteria and results.
# sample five-digit ZIPs
zips <- c("63088", "63108", "63139")

# successful validation
zi_validate(zips)

# sample five-digit ZIPs in data frame
zips <- data.frame(id = c(1:3), ZIP = c("63139", "63108", "00501"),
  stringsAsFactors = FALSE)

# successful validation
zi_validate(zips$ZIP)

# sample five-digit ZIPs with character
zips <- c("63088", "63108", "zip")

# failed validation
zi_validate(zips)
zi_validate(zips, verbose = TRUE)
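For readers who want to see what the four checks amount to, here is a base R approximation. The function validate_sketch, its width argument, and the names of the returned elements are hypothetical; the real zi_validate() returns either a single logical value or a tibble, and its internals may differ.

```r
# hypothetical helper mirroring the four documented validation questions
validate_sketch <- function(x, width = 5) {
  c(
    is_character = is.character(x),
    none_too_short = !any(nchar(x) < width, na.rm = TRUE),
    none_too_long = !any(nchar(x) > width, na.rm = TRUE),
    all_digits = !any(grepl("[^0-9]", x))
  )
}

# every check passes for clean five-digit character values
all(validate_sketch(c("63088", "63108", "63139")))

# "zip" is too short and contains non-numeric characters
validate_sketch(c("63088", "63108", "zip"))
```

Collapsing the named vector with all() gives a single pass or fail answer, while the individual elements show which condition needs repair.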