The documentation and examples within this tutorial were gleaned from the following resources:
tidycensus
tigris
tidyverse
Census Developers
Census Geography Program
Leaflet for R
tidycensus is an R package that allows users to interface with the U.S. Census Bureau’s decennial census and American Community Survey (ACS) APIs in order to retrieve demographic and economic data for specified geographies. As noted by its author, Kyle Walker, tidycensus “returns tidyverse-ready data frames of selected variables, and the option to include simple feature (sf) geometries”.
The main function of the decennial census is to provide counts of people for the purpose of congressional apportionment and federal funding allocation, while the primary purpose of the ACS is to measure the changing social and economic characteristics of the U.S. population, including education, housing, jobs, and more. Due to their differing purposes, variables that exist in the decennial census may not be included in the ACS and vice versa. The same is true between ACS years, due to the evolving nature of the surveys (i.e., questions may be added, removed, or revised).
The decennial census is an enumeration, meaning it aims to count the entire population of the country (at the location where each person usually lives). The decennial census asks a relatively small set of questions of people in homes and group living situations, including how many people live or stay in each home, as well as the sex, age, and race of each person.
ACS data differ from decennial census data in that ACS data are based on an annual sample of households, rather than a complete enumeration. In turn, ACS data points are estimates characterized by a margin of error (MOE). tidycensus will always return the estimate and MOE for any requested variables. When requesting ACS data with tidycensus, it is not necessary to specify the “E” or “M” suffix for a variable name. Available survey types include ACS 5-year estimates (acs5) or ACS 1-year estimates (acs1). Note that the latter is only available for geographies with populations of 65,000 and greater.
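Because every ACS estimate carries an MOE, a value derived from estimates (such as a share) needs its own MOE as well. The following is a minimal base R sketch of the Census Bureau's approximation for the MOE of a proportion, applied to the Multnomah County food stamp figures from the 2014-2018 ACS 5-year output shown later in this tutorial. The function name moe_of_share is hypothetical and used only for illustration; tidycensus also provides helpers such as moe_prop() for this kind of calculation.

```r
# Hypothetical helper: MOE of a proportion p = num / denom, following the
# Census Bureau's approximation formula for derived proportions.
moe_of_share <- function(num, denom, moe_num, moe_denom) {
  p <- num / denom
  radicand <- moe_num^2 - (p^2 * moe_denom^2)
  # If the radicand is negative, Census guidance is to fall back to the
  # ratio formula, which adds rather than subtracts the second term.
  radicand <- ifelse(radicand < 0, moe_num^2 + (p^2 * moe_denom^2), radicand)
  sqrt(radicand) / denom
}

# Multnomah County, 2014-2018 ACS 5-year (values from this tutorial's output)
hh_foodst_rec       <- 55018   # households receiving food stamps (estimate)
hh_foodst_rec_moe   <- 1390    # ... and its MOE
hh_foodst_total     <- 321968  # all households (estimate)
hh_foodst_total_moe <- 1216    # ... and its MOE

share     <- hh_foodst_rec / hh_foodst_total
share_moe <- moe_of_share(hh_foodst_rec, hh_foodst_total,
                          hh_foodst_rec_moe, hh_foodst_total_moe)

# Express both as percentages
round(100 * c(share = share, moe = share_moe), 2)
```

The derived MOE is expressed in the same units as the share, so it can be added to and subtracted from the share to form an interval, just like the MOEs returned by the API.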
To get started, load the tidycensus and tidyverse packages. Additional packages used within this RMarkdown file include tigris, readxl, writexl, arcgisbinding, leaflet, kableExtra, janitor, and extrafont.
A Census API key is also required, which can be obtained from http://api.census.gov/data/key_signup.html. When the key is entered using the census_api_key function with install = TRUE, it is stored in your .Renviron file, so entry only needs to occur once (i.e., the key persists across R sessions, rather than being tied to a single R script or Markdown file).
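As a minimal sketch of the key setup, the snippet below checks whether a key is already available in the CENSUS_API_KEY environment variable (the variable tidycensus reads) before prompting for one. The string "YOUR_KEY_HERE" is a placeholder, and the call itself is left commented out so the chunk runs without a real key.

```r
# Check whether a Census API key is already available to this R session
key_is_set <- nzchar(Sys.getenv("CENSUS_API_KEY"))

if (!key_is_set && requireNamespace("tidycensus", quietly = TRUE)) {
  # "YOUR_KEY_HERE" is a placeholder, not a working key.
  # install = TRUE writes the key to .Renviron so later sessions can reuse it.
  # tidycensus::census_api_key("YOUR_KEY_HERE", install = TRUE)
  message("No Census API key found; uncomment the line above and supply your key.")
}
```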
The following section contains examples of using the tidycensus package and Census API to retrieve, prepare, reshape, and blend demographic data for analysis and visualization. To save on processing time and avoid local memory limitations, data-intensive code chunks have been commented out (e.g., retrieving large amounts of census blocks using the tidycensus or tigris packages). Readers can view the underlying code while the input data is read in as part of the R Project’s local data.
# Load the 2010 decennial census variables
DC2010_sf1 <- load_variables(year = 2010, dataset = "sf1", cache = TRUE)
# Load the 2014-2018 5-year ACS variables
ACS2018_acs5 <- load_variables(year = 2018, dataset = "acs5", cache = TRUE)
# View a portion of the 2010 decennial census table
DC2010_sf1 %>% head() %>% kable() %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
name | label | concept |
---|---|---|
H001001 | Total | HOUSING UNITS |
H002001 | Total | URBAN AND RURAL |
H002002 | Total!!Urban | URBAN AND RURAL |
H002003 | Total!!Urban!!Inside urbanized areas | URBAN AND RURAL |
H002004 | Total!!Urban!!Inside urban clusters | URBAN AND RURAL |
H002005 | Total!!Rural | URBAN AND RURAL |
# View a portion of the 2018 ACS table
ACS2018_acs5 %>% head() %>% kable() %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
name | label | concept |
---|---|---|
B00001_001 | Estimate!!Total | UNWEIGHTED SAMPLE COUNT OF THE POPULATION |
B00002_001 | Estimate!!Total | UNWEIGHTED SAMPLE HOUSING UNITS |
B01001_001 | Estimate!!Total | SEX BY AGE |
B01001_002 | Estimate!!Total!!Male | SEX BY AGE |
B01001_003 | Estimate!!Total!!Male!!Under 5 years | SEX BY AGE |
B01001_004 | Estimate!!Total!!Male!!5 to 9 years | SEX BY AGE |
Retrieve the 2010 decennial census total population for all U.S. states.
# Retrieve the 2010 decennial census total population for all U.S. states.
st_2010_dc <- tidycensus::get_decennial( # API call
geography = 'state', # Specify geography
variables = 'P001001', # Specify variable
year = 2010, # Specify year
geometry = FALSE, # Include or omit spatial geometry
output = 'wide', # Set to wide or tidy/long format
cache_table = TRUE # Cache the table so it can be called quicker in the future, or not to save memory
)
# View data
st_2010_dc %>% head() %>% kable() %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
GEOID | NAME | P001001 |
---|---|---|
01 | Alabama | 4779736 |
02 | Alaska | 710231 |
04 | Arizona | 6392017 |
05 | Arkansas | 2915918 |
06 | California | 37253956 |
22 | Louisiana | 4533372 |
Retrieve 2010 decennial census variables and geometries for a select set of a specified geography.
# Retrieve 2010 decennial census variables and geometries for a select set of a specified geography.
co_2010_dc <- tidycensus::get_decennial( # API call
geography = 'county', # Specify geography
state = 'OR', # Specify state abbreviation
county = c('Multnomah', 'Washington', 'Clackamas'), # Specify a list of counties
variables = c(pop_tot = 'P001001', # Specify and name multiple variables
pop_sex_m_tot = 'P012002',
pop_sex_f_tot = 'P012026'),
year = 2010, # Specify year
geometry = TRUE, # Include or omit spatial geometry
output = 'wide', # Set to wide or tidy/long format
cache_table = TRUE # Cache the table so it can be called quicker in the future, or not to save memory
)
# View data
co_2010_dc %>% st_set_geometry(NULL) %>% head() %>% kable() %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
GEOID | NAME | pop_tot | pop_sex_m_tot | pop_sex_f_tot |
---|---|---|---|---|
41067 | Washington County, Oregon | 529710 | 260524 | 269186 |
41005 | Clackamas County, Oregon | 375992 | 184925 | 191067 |
41051 | Multnomah County, Oregon | 735334 | 363645 | 371689 |
Retrieve 2018 ACS 5-year variables and geometries for a select set of a specified geography.
Concerning geometries, tidycensus uses the geographic coordinate system NAD 1983 (EPSG: 4269), which is the default for Census spatial data files. By default, tidycensus uses the Census cartographic boundary shapefiles for faster processing; if you prefer the TIGER/Line shapefiles (i.e., Topologically Integrated Geographic Encoding and Referencing), set cb = FALSE in the function call. Per Census documentation, the cartographic boundary files are simplified representations of selected geographic areas from the Census Bureau’s Master Address File (MAF)/TIGER geographic database. These boundary files are specifically designed for small scale thematic mapping. When possible, generalization is performed with intent to maintain the hierarchical relationships among geographies and to maintain the alignment of geographies within a file set for a given year. To improve the appearance of shapes, areas are represented with fewer vertices than detailed TIGER/Line equivalents. Some small holes or discontiguous parts of areas are not included in generalized files. Generalized boundary files are clipped to a simplified version of the U.S. outline. As a result, some off-shore areas may be excluded from the generalized files. Consult the TIGER Data Products Guide to determine which file type is best for your purposes.
# Retrieve 2018 ACS 5-year variables and geometries for a select set of a specified geography.
co_2018_acs5 <- tidycensus::get_acs( # API call
geography = 'county', # Specify geography
state = 'OR', # Specify state abbreviation
county = c('Multnomah', 'Washington', 'Clackamas'), # Specify a list of counties
variables = c(hh_medinc_total = 'B19013_001', # Specify and name multiple variables
hh_foodst_total = 'B22003_001',
hh_foodst_rec = 'B22003_002'),
year = 2018, # Specify year
survey = 'acs5', # Specify survey type
geometry = TRUE, # Include or omit spatial geometry
cb = FALSE, # Specify cartographic boundary files or TIGER/Line shapefiles
output = 'wide', # Set to wide or tidy/long format
cache_table = TRUE # Cache the table so it can be called quicker in the future, or not to save memory
)
# View data
co_2018_acs5 %>% st_set_geometry(NULL) %>% head() %>% kable() %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
GEOID | NAME | hh_medinc_totalE | hh_medinc_totalM | hh_foodst_totalE | hh_foodst_totalM | hh_foodst_recE | hh_foodst_recM |
---|---|---|---|---|---|---|---|
41005 | Clackamas County, Oregon | 76597 | 1050 | 155456 | 776 | 16694 | 859 |
41051 | Multnomah County, Oregon | 64337 | 779 | 321968 | 1216 | 55018 | 1390 |
41067 | Washington County, Oregon | 78010 | 1086 | 216507 | 1013 | 22213 | 1020 |
Concerning the structure of the data frame, a wide format contains a single row for each observation with many columns representing all variables (human-readable), while a long/tidy format contains many rows per observation (assuming more than one variable) with name-value pairs for each variable and associated value (machine-readable). For more details, see Hadley Wickham’s seminal paper Tidy Data.
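To make the two layouts concrete, the sketch below rebuilds part of the wide table above by hand and reshapes it to long/tidy form and back. It uses tidyr's pivot_longer and pivot_wider, the newer counterparts to the gather and spread functions used elsewhere in this tutorial; the object names are illustrative only.

```r
library(dplyr)
library(tidyr)

# Wide-format values copied from the 2018 ACS 5-year table above
acs_wide <- tibble(
  GEOID = c("41005", "41051", "41067"),
  NAME  = c("Clackamas County, Oregon",
            "Multnomah County, Oregon",
            "Washington County, Oregon"),
  hh_medinc_totalE = c(76597, 64337, 78010),
  hh_medinc_totalM = c(1050, 779, 1086),
  hh_foodst_recE   = c(16694, 55018, 22213),
  hh_foodst_recM   = c(859, 1390, 1020)
)

# Wide to long: split each column name into the variable name and its
# trailing E (estimate) or M (margin of error) suffix
acs_long <- acs_wide %>%
  pivot_longer(
    cols = -c(GEOID, NAME),
    names_to = c("variable", ".value"),
    names_pattern = "(.*)(E|M)$"
  ) %>%
  rename(estimate = E, moe = M)

# Long back to wide: one row per county, one column per variable/statistic
acs_back_to_wide <- acs_long %>%
  pivot_wider(names_from = variable, values_from = c(estimate, moe))
```

The long table has one row per county and variable, with estimate and moe columns, which mirrors what output = 'tidy' returns from get_acs.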
Retrieve 2015 ACS 1-year variables and geometries for a select set of a specified geography.
# Retrieve 2015 ACS 1-year variables and geometries for a select set of a specified geography.
co_2015_acs1 <- tidycensus::get_acs( # API call
geography = 'county', # Specify geography
state = 'OR', # Specify state abbreviation
county = c('Multnomah', 'Washington', 'Clackamas'), # Specify a list of counties
variables = c(hh_medinc_total = 'B19013_001', # Specify and name multiple variables
hh_foodst_total = 'B22003_001',
hh_foodst_rec = 'B22003_002'),
year = 2015, # Specify year
survey = 'acs1', # Specify survey type
geometry = T, # Include or omit spatial geometry
# By default, cartographic boundary files are used
output = 'tidy', # Set to wide or tidy/long format
cache_table = T # Cache the table so it can be called quicker in the future, or not to save memory
)
Compare the structure of the data sets.
# View the 2018 ACS 5-year table in wide format
co_2018_acs5 %>% st_set_geometry(NULL) %>% kable() %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
GEOID | NAME | hh_medinc_totalE | hh_medinc_totalM | hh_foodst_totalE | hh_foodst_totalM | hh_foodst_recE | hh_foodst_recM |
---|---|---|---|---|---|---|---|
41005 | Clackamas County, Oregon | 76597 | 1050 | 155456 | 776 | 16694 | 859 |
41051 | Multnomah County, Oregon | 64337 | 779 | 321968 | 1216 | 55018 | 1390 |
41067 | Washington County, Oregon | 78010 | 1086 | 216507 | 1013 | 22213 | 1020 |
# View the 2015 ACS 1-year table in long/tidy format
co_2015_acs1 %>% st_set_geometry(NULL) %>% kable() %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
GEOID | NAME | variable | estimate | moe |
---|---|---|---|---|
41005 | Clackamas County, Oregon | hh_medinc_total | 69629 | 2898 |
41005 | Clackamas County, Oregon | hh_foodst_total | 152414 | 1886 |
41005 | Clackamas County, Oregon | hh_foodst_rec | 17527 | 1737 |
41051 | Multnomah County, Oregon | hh_medinc_total | 59231 | 2100 |
41051 | Multnomah County, Oregon | hh_foodst_total | 311797 | 3172 |
41051 | Multnomah County, Oregon | hh_foodst_rec | 61844 | 2753 |
41067 | Washington County, Oregon | hh_medinc_total | 70447 | 1703 |
41067 | Washington County, Oregon | hh_foodst_total | 211139 | 2446 |
41067 | Washington County, Oregon | hh_foodst_rec | 24253 | 2324 |
Retrieve 2018 ACS 5-year variables and geometries for a geography within a larger geography (e.g., all counties within a state).
# Retrieve 2018 ACS 5-year variables and geometries for all members of a specified geography within a larger geography (e.g., all counties within a state).
county_vector <- tidycensus::fips_codes %>% # Retrieve table of all state/county names and associated FIPS codes
filter(state_name == "Oregon") %>% # Specify state
select(county_code, county) # Maintain only the county code and name variables
# Retrieve variables and geometries for all counties within the vector
co_OR_2018_acs5 <- tidycensus::get_acs( # API call
geography = 'county', # Specify geography
state = 'OR', # Specify state abbreviation
county = county_vector$county_code, # Vector of all counties
variables = c(hh_medinc_total = 'B19013_001', # Specify and name multiple variables
hh_foodst_total = 'B22003_001',
hh_foodst_rec = 'B22003_002'),
year = 2018, # Specify year
survey = 'acs5', # Specify survey type
geometry = TRUE, # Include or omit spatial geometry
cb = TRUE, # Specify cartographic boundary files or TIGER/Line shapefiles
output = 'tidy', # Set to wide or tidy/long format
cache_table = TRUE # Cache the table so it can be called quicker in the future, or not to save memory
)
# View data
co_OR_2018_acs5 %>% st_set_geometry(NULL) %>% head() %>% kable() %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
GEOID | NAME | variable | estimate | moe |
---|---|---|---|---|
41005 | Clackamas County, Oregon | hh_medinc_total | 76597 | 1050 |
41005 | Clackamas County, Oregon | hh_foodst_total | 155456 | 776 |
41005 | Clackamas County, Oregon | hh_foodst_rec | 16694 | 859 |
41021 | Gilliam County, Oregon | hh_medinc_total | 42976 | 4505 |
41021 | Gilliam County, Oregon | hh_foodst_total | 848 | 60 |
41021 | Gilliam County, Oregon | hh_foodst_rec | 141 | 38 |
Retrieve a complete table of 2018 ACS 5-year variables and geometries for a select set of a specified geography.
# Retrieve a complete table of 2018 ACS 5-year variables and geometries for a select set of a specified geography.
co_2018_acs5_income <- tidycensus::get_acs( # API call
geography = 'county', # Specify geography
state = 'OR', # Specify state abbreviation
county = c('Multnomah', 'Washington', 'Clackamas'), # Specify a list of counties
table = "B19001",
year = 2018, # Specify year
survey = 'acs5', # Specify survey type
geometry = TRUE, # Include or omit spatial geometry
cb = FALSE, # Specify cartographic boundary files or TIGER/Line shapefiles
output = 'wide', # Set to wide or tidy/long format
cache_table = TRUE # Cache the table so it can be called quicker in the future, or not to save memory
)
In 2019, retrieving all decennial census block group data for a specified state or county generated an error. This has since been resolved, but it provides a good example of how smaller nested geographies can be aggregated to larger geographies (e.g., blocks to block groups).
This works now…
Retrieve block group data for an entire county.
bg_2010_dc <- tidycensus::get_decennial( # API call
geography = 'block group', # Specify geography for which to retrieve data
state = 'WA', # Specify state abbreviation
county = 'King', # Specify county
variables = 'P001001', # Specify variable
year = 2010, # Specify year
geometry = FALSE, # Include or omit spatial geometry
output = 'wide', # Set to wide or tidy/long format
cache_table = TRUE # Cache the table so it can be called quicker in the future, or not to save memory
)
But if it didn’t…
Retrieve block data for an entire county.
# Retrieve decennial census variables for all blocks
bl_2010_dc <- tidycensus::get_decennial(
geography = 'block', # Specify statistical area for which to retrieve data
state = 'WA', # Specify state abbreviation
county = 'King', # Specify county
variables = 'P001001',
year = 2010, # Specify year
geometry = FALSE, # Include or omit spatial geometry
output = 'wide', # Set to wide or tidy/long format
cache_table = TRUE # Cache the table so it can be called quicker in the future, or not to save memory
)
Aggregate data to block groups via creation of the block group GEOID by removing the last 3 characters in the block GEOID and grouping by/summarizing to block groups.
bl_2010_dc_to_bg <- bl_2010_dc %>%
mutate(GEOID_BG = stringr::str_sub(GEOID, 1, -4)) %>% # Create BG GEOID
select(GEOID_BG, 1:length(.), -GEOID, -NAME) %>% # Reorder/remove columns
rename(GEOID = GEOID_BG) %>% # Rename GEOID
gather(variable, est, 2:length(.)) %>% # Reshape from wide to long form by creating name-value pairs for all variables
group_by(GEOID, variable) %>% # Group by GEOID and variable names
summarize(est = sum(est, na.rm = T)) %>% # Sum values per group
ungroup() %>% # Ungroup
spread(variable, est) # Reshape from long to wide form by extending values into their own columns
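The aggregation above works because census GEOIDs are hierarchical: a 15-character block GEOID is made up of a 2-digit state code, a 3-digit county code, a 6-digit tract code, and a 4-digit block code, and the first digit of the block code identifies the block group. The following sketch shows the same idea on a toy table; the GEOIDs follow the real layout, but the population values are invented for illustration.

```r
library(dplyr)

# Hypothetical block-level counts (made-up populations)
blocks <- tibble(
  GEOID   = c("530330001001000", "530330001001001",
              "530330001002000", "530330001002001"),
  P001001 = c(120, 80, 45, 55)
)

block_groups <- blocks %>%
  # Keep state (2) + county (3) + tract (6) + block group (1) = 12 characters
  mutate(GEOID = substr(GEOID, 1, 12)) %>%
  group_by(GEOID) %>%
  # Sum every numeric column within each block group
  summarise(across(where(is.numeric), ~ sum(.x, na.rm = TRUE)), .groups = "drop")

block_groups
```

Summing with across() handles any number of count variables in one step, so the reshape to long form and back is optional when all columns are aggregated the same way.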
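Compare the data sets.
# Join the block group totals retrieved directly (P001001.x) to the totals aggregated from blocks (P001001.y)
bgbl_2010_dc <- bg_2010_dc %>%
left_join(bl_2010_dc_to_bg, by = "GEOID")
# View the comparison data set
bgbl_2010_dc %>% head() %>% kable() %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))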
GEOID | NAME | P001001.x | P001001.y |
---|---|---|---|
530330001001 | Block Group 1, Census Tract 1, King County, Washington | 1250 | 1250 |
530330001002 | Block Group 2, Census Tract 1, King County, Washington | 1234 | 1234 |
530330001003 | Block Group 3, Census Tract 1, King County, Washington | 1337 | 1337 |
530330001004 | Block Group 4, Census Tract 1, King County, Washington | 1492 | 1492 |
530330001005 | Block Group 5, Census Tract 1, King County, Washington | 942 | 942 |
530330002001 | Block Group 1, Census Tract 2, King County, Washington | 1086 | 1086 |
In some cases, you may want to retrieve the tabular and spatial data separately (to avoid very large data sets during analysis) and join the data sets after analysis. In those cases, tidycensus can be used in combination with tigris, which is an R package that allows users to directly download and use TIGER/Line shapefiles from the US Census Bureau.
Retrieve variables and geometries for a geography within a larger geography (e.g., all tracts within a county) using:
tidycensus to retrieve tabular data
tigris to retrieve spatial geometries
Retrieve variables for a specified geography via the tidycensus package.
# Create a vector of variables for use in the `tidycensus` call.
DC2010_sf1_var <- c(pop_sex_total = 'P012001',
pop_sex_m_tot = 'P012002',
pop_sex_m_00_04 = 'P012003',
pop_sex_m_05_09 = 'P012004',
pop_sex_m_10_14 = 'P012005',
pop_sex_m_15_17 = 'P012006',
pop_sex_m_18_19 = 'P012007',
pop_sex_f_tot = 'P012026',
pop_sex_f_00_04 = 'P012027',
pop_sex_f_05_09 = 'P012028',
pop_sex_f_10_14 = 'P012029',
pop_sex_f_15_17 = 'P012030',
pop_sex_f_18_19 = 'P012031')
# Retrieve variables for specified geography
tr_2010_dc <- tidycensus::get_decennial( # API call
geography = 'tract', # Specify geography
variables = DC2010_sf1_var, # Vector of variables
state = 'OR', # Specify state abbreviation
county = county_vector$county_code, # Vector of all counties
year = 2010, # Specify year
geometry = F, # Include or omit spatial geometry
output = 'wide', # Set to wide or long/tidy format
cache_table = T # Cache the table so it can be called quicker in the future
)
Retrieve geometries for a specified geography via the tigris package.
By default, tigris retrieves the most recent vintage of a data set, so specify a different year if needed. The default coordinate reference system (CRS) is NAD83 (EPSG: 4269; https://spatialreference.org/ref/epsg/nad83/). The default geometry type is the TIGER/Line shapefile. If cb is set to TRUE, tigris will download a generalized (1:500k) set of geometries.
# A call of gc causes a garbage collection to take place. The primary purpose of calling gc is for the report on memory usage. Use this before and after a data-heavy processing task.
gc()
tr_2010_dc_shp <- tigris::tracts("OR", year = 2010, cb = FALSE) # Specify state abbreviation, year, and geometry type
gc()
Join the variables to the geometries.
tr_2010_dc_geo <- tr_2010_dc_shp %>%
inner_join(tr_2010_dc, by = c("GEOID10" = "GEOID")) # Join based on GEOID
class(tr_2010_dc_geo) # Note the object's class
st_crs(tr_2010_dc_geo) # Note the object's coordinate system
Note that the resulting object’s class is dependent on the join order. The left side’s class takes priority; therefore, in the above, the attributes (right side) are being joined to the geometries (left side), so the resulting object class is sf (simple feature). If the order is flipped (below) and the geometries (right side) are joined to the attributes (left side), the object class is not sf. To make it so, add %>% st_as_sf() to the end of the pipeline.
tr_2010_dc_geo_REVERSE1 <- tr_2010_dc %>%
inner_join(tr_2010_dc_shp, by = c("GEOID" = "GEOID10"))
class(tr_2010_dc_geo_REVERSE1)
st_crs(tr_2010_dc_geo_REVERSE1)
tr_2010_dc_geo_REVERSE2 <- tr_2010_dc %>%
inner_join(tr_2010_dc_shp, by = c("GEOID" = "GEOID10")) %>%
st_as_sf()
class(tr_2010_dc_geo_REVERSE2)
st_crs(tr_2010_dc_geo_REVERSE2)
The following visualizations are created using the ggplot2 package (which is part of the tidyverse), the mapview package, and the leaflet for R package.
Create a plot of 2010 decennial census state populations.
st_2010_dc %>%
ggplot(aes(x = P001001, y = reorder(NAME, P001001))) + # Set the x-axis to the variable and the y-axis to the observation (sorted by population)
ggtitle("2010 State Population") + # Add a title
xlab("Total Population") + # Name the x-axis
ylab("State Name") + # Name the y-axis
geom_point() # Add point geometry
Create a plot of 2014-2018 ACS 5-year estimates and MOEs for all Oregon counties.
co_OR_2018_acs5 %>%
mutate(NAME = gsub(" County, Oregon", "", NAME)) %>% # Remove unnecessary portion of county name
filter(variable == 'hh_medinc_total') %>% # Query a single variable
ggplot(aes(x = estimate, y = reorder(NAME, estimate))) + # Set the x-axis to the variable and the y-axis to the observation (and sort)
geom_errorbarh(aes(xmin = estimate - moe, xmax = estimate + moe)) + # Create error bars
geom_point(color = "red", size = 3) + # Symbolize the points
labs(title = "Median Household Income by County", # Add title
subtitle = "2014-2018 (5-year) American Community Survey", # Add subtitle
y = "County Name", # Label y-axis
x = "ACS estimate (bars represent margin of error)") # Label x-axis
Create a choropleth map of a single variable for a geography within a single county.
# Retrieve variables and geographies for census block groups in Multnomah County, Oregon
bg_Mult_2018_acs5 <- tidycensus::get_acs( # API call
geography = 'block group', # Specify geography
state = 'OR', # Specify state name
county = c('Multnomah'), # Specify county name
variables = c(hh_medinc_total = 'B19013_001'),
year = 2018, # Specify year
survey = 'acs5', # Specify survey type
geometry = T, # Include or omit spatial geometry
cb = FALSE, # Use TIGER
output = 'tidy', # Set to wide or tidy/long format
cache_table = T # Cache the table so it can be called quicker in the future, or not to save memory
)
# Create choropleth map
bg_Mult_2018_acs5 %>%
ggplot(aes(fill = estimate)) + # Specify the aesthetic as fill based on the estimate values
geom_sf(color = NA) + # Specify outline color
coord_sf(crs = 26910) + # Specify coordinate system/projection (e.g., UTM10N)
ggtitle("Multnomah County Median Household Income", subtitle = "2014-2018 (5-year) ACS Data") + # Specify title and subtitle
scale_fill_viridis_c(option = "magma") # Specify fill symbology
Create faceted choropleth maps of multiple variables across geographies within a single county.
As mentioned by Kyle Walker in his tidycensus tutorial, “one of the most powerful features of ggplot2 is its support for small multiples, which works very well with the tidy data format returned by tidycensus. Many Census and ACS variables return counts, which are generally inappropriate for choropleth mapping. In turn, get_decennial and get_acs have an optional argument, summary_var, that can work as a multi-group denominator when appropriate.” For example, view the racial/ethnic population distribution within a given county.
# Create a vector of variables
DC2010_sf1_race_vars <- c(White = "P005003",
Black = "P005004",
Asian = "P005006",
Hispanic = "P004003")
# Retrieve variables and geographies for census tracts in Multnomah County, Oregon
bg_Mult_2010_dc <- tidycensus::get_decennial( # API call
geography = 'tract', # Specify geography
state = 'OR', # Specify state name
county = c('Multnomah'), # Specify county name
variables = DC2010_sf1_race_vars, # Specify the vector of variables
year = 2010, # Specify year
geometry = T, # Include or omit spatial geometry
output = 'tidy', # Set to wide or tidy/long format
summary_var = "P001001" # Specify summary variable (e.g., total population)
)
bg_Mult_2010_dc %>%
mutate(pct = 100 * (value / summary_value)) %>% # Calculate the race/ethnicity proportion of the total population
ggplot(aes(fill = pct)) + # Specify the aesthetic as fill based on the proportion values
facet_wrap(~variable) + # Facet wrap on race/ethnicity
geom_sf(color = NA) + # Specify outline color
coord_sf(crs = 26910) + # Specify coordinate system/projection (e.g., UTM10N)
ggtitle("Multnomah County", subtitle = "Population by Race/Ethnicity") + # Specify title and subtitle
scale_fill_viridis_c() # Specify fill symbology
Create an interactive map using leaflet.
# Use leaflet's color ramp creator
medinc_colors <- leaflet::colorBin(
palette = 'viridis', # Accepts a viridis or ColorBrewer palette name
domain = bg_Mult_2018_acs5$estimate # Specify the variable
)
leaflet(bg_Mult_2018_acs5) %>%
addProviderTiles(providers$OpenStreetMap, group = 'Open Street Map') %>%
addProviderTiles(providers$Stamen.Toner, group = 'Stamen Toner') %>%
addProviderTiles(providers$CartoDB.Positron, group = 'Carto DB') %>%
addProviderTiles(providers$Esri.NatGeoWorldMap, group = 'Esri NatGeo') %>%
addPolygons(
weight = 1, # Specify outline width
color = 'black', # Specify outline color
fillColor = medinc_colors(bg_Mult_2018_acs5$estimate), # Use created color ramp
fillOpacity = 0.7, # Specify fill opacity
group = "Multnomah Co. Block Groups (2018 ACS 5-year)", # Specify legend item name
popup = paste0( # Create pop-up
"<b>2018 ACS 5-year Block Group</b> ",
paste0("<br><b>Median HH Income Estimate</b>: ", prettyNum(bg_Mult_2018_acs5$estimate, ','), "<br><b>Median HH Income MOE</b>: ", prettyNum(bg_Mult_2018_acs5$moe, ','))
)
) %>%
addLegend("bottomright",
pal = medinc_colors,
values = bg_Mult_2018_acs5$estimate,
title = "Median Household Income",
opacity = 1) %>%
# Layers control
addLayersControl(
overlayGroups = c("Multnomah Co. Block Groups (2018 ACS 5-year)"),
baseGroups = c('Open Street Map', 'Stamen Toner', 'Carto DB', 'Esri NatGeo'),
options = layersControlOptions(collapsed = FALSE)
)
Write out data to various file formats.
# readr::write_csv(DC2010_sf1, "./Data/DC2010_sf1.csv")
# library(writexl)
# write_xlsx(st_2010_dc, "./Data/st_2010_dc.xlsx")
# saveRDS(bg_Mult_2018_acs5,"./Data/bg_Mult_2018_acs5.rds")
# sf::st_write(bg_Mult_2010_dc,"./Data/bg_Mult_2010_dc.shp", delete_dsn = TRUE)
# library(arcgisbinding)
# arc.check_product() # Initialize connection to ArcGIS
# arc.write(path = "./Data/Data.gdb/co_OR_2018_acs5", data = co_OR_2018_acs5, shape_info = list(type = 'Polygon', WKT = 'GEOGCS["GCS_North_American_1983",DATUM["D_North_American_1983",SPHEROID["GRS_1980",6378137,298.257222101]],PRIMEM["Greenwich",0],UNIT["Degree",0.017453292519943295]]'), overwrite = TRUE)
Read in previously retrieved data in various file formats to avoid duplicative calls to Census APIs.
# readin_csv <- readr::read_csv("./Data/DC2010_sf1.csv")
# library(readxl)
# readin_xlsx <- read_excel("./Data/st_2010_dc.xlsx", sheet = "sheet_name")
# readin_rds <- readRDS("./Data/bg_Mult_2018_acs5.rds")
# readin_shp <- sf::st_read("./Data/bg_Mult_2010_dc.shp")
# library(arcgisbinding)
# arc.check_product() # Initialize connection to ArcGIS
# arc.open reads in the feature class as an object with "formal class arc.feature_impl"
# arc.select converts the data set into a "arc.data" / "data.frame" object
# arc.data2sf converts the data set into an equivalent sf object
# readin_fc <- arc.open("./Data/Data.gdb/co_OR_2018_acs5") %>% arc.select() %>% arc.data2sf()
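One detail worth checking when reading census data back in from text formats is whether GEOID keeps its leading zeros. The base R sketch below writes a few rows of the state-level output to a temporary CSV (a temporary file is assumed here instead of the ./Data folder) and reads it back twice, once with default type guessing and once with GEOID declared as character.

```r
# Values taken from the state-level 2010 decennial output earlier in this tutorial
st_pop <- data.frame(
  GEOID   = c("01", "02", "04"),
  NAME    = c("Alabama", "Alaska", "Arizona"),
  P001001 = c(4779736, 710231, 6392017),
  stringsAsFactors = FALSE
)

csv_path <- tempfile(fileext = ".csv")  # temporary file used for illustration
write.csv(st_pop, csv_path, row.names = FALSE)

# Default type guessing converts "01" to the integer 1
guessed <- read.csv(csv_path)

# Declaring GEOID as character keeps the leading zero
kept <- read.csv(csv_path, colClasses = c(GEOID = "character"))

guessed$GEOID
kept$GEOID
```

Keeping GEOID as character matters for later joins, since the tidycensus and tigris outputs store it as text.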
A work by Alex Brasch