Reference & Startup

The documentation and examples within this tutorial were gleaned from the following resources:

tidycensus
tigris
tidyverse
Census Developers
Census Geography Program
Leaflet for R

tidycensus is an R package that allows users to interface with the U.S. Census Bureau’s decennial census and American Community Survey (ACS) APIs, in order to retrieve demographic and economic data for specified geographies. As noted by its author, Kyle Walker, tidycensus “returns tidyverse-ready data frames of selected variables, and the option to include simple feature (sf) geometries”.

The main function of the decennial census is to provide counts of people for the purpose of congressional apportionment and federal funding allocation, while the primary purpose of the ACS is to measure the changing social and economic characteristics of the U.S. population, including education, housing, jobs, and more. Due to their differing purposes, variables that exist in the decennial census may not be included in the ACS and vice versa. The same is true between ACS years, due to the evolving nature of the surveys (i.e., questions may be added, removed, or revised).

The decennial census is an enumeration, meaning it aims to count the entire population of the country (at the location where each person usually lives). The decennial census asks a relatively small set of questions of people in homes and group living situations, including how many people live or stay in each home, as well as the sex, age, and race of each person.

ACS data differ from decennial census data in that ACS data are based on an annual sample of households, rather than a complete enumeration. In turn, ACS data points are estimates characterized by a margin of error (MOE). tidycensus will always return the estimate and MOE for any requested variables. When requesting ACS data with tidycensus, it is not necessary to specify the “E” or “M” suffix for a variable name. Available survey types include ACS 5-year estimates (acs5) or ACS 1-year estimates (acs1). Note that the latter is only available for geographies with populations of 65,000 and greater.
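
Because every ACS value comes with an MOE, an estimate is better read as a range than as a single number. The base R sketch below turns an estimate and its MOE into an interval and an implied standard error. It assumes the Census Bureau's published 90 percent confidence level (z = 1.645). The helper name acs_interval is illustrative and not part of tidycensus, and the input values are the Clackamas County median household income figures that appear later in this tutorial.

```r
# Hypothetical helper (not part of tidycensus): convert an ACS estimate and
# its margin of error into an interval and a standard error.
acs_interval <- function(estimate, moe, z = 1.645) {
  data.frame(
    estimate = estimate,
    lower = estimate - moe, # Lower bound of the interval
    upper = estimate + moe, # Upper bound of the interval
    se = moe / z            # Standard error implied by the MOE
  )
}

# 2014-2018 ACS 5-year median household income, Clackamas County, Oregon
clackamas <- acs_interval(estimate = 76597, moe = 1050)
print(clackamas)
```

The interval is simply the estimate plus or minus the MOE; dividing the MOE by z recovers the standard error if it is needed for further calculations.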

Install and load packages

To get started, load the tidycensus and tidyverse packages. Additional packages used within this RMarkdown file include tigris, readxl, writexl, arcgisbinding, leaflet, kableExtra, janitor, and extrafont.
A Census API key is also required, which can be obtained from http://api.census.gov/data/key_signup.html. Supply the key with the census_api_key function. When the function is called with install = TRUE, the key is saved to your .Renviron file, so it only needs to be entered once (i.e., it is available in future R sessions, rather than being tied to a single R script or Markdown file).

# Uncomment as-needed
# install.packages("tidyverse")
# install.packages("tidycensus")

# library(tidyverse)
# library(tidycensus)

# census_api_key("YOUR API KEY GOES HERE")
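
Before making API calls, it can help to confirm that a key is actually available. tidycensus looks for the key in the CENSUS_API_KEY environment variable, so a minimal base R check might look like the following. The helper name has_census_key is an assumption made for illustration.

```r
# Hypothetical helper: TRUE when a non-empty key is stored in the
# CENSUS_API_KEY environment variable.
has_census_key <- function() {
  nzchar(Sys.getenv("CENSUS_API_KEY"))
}

if (!has_census_key()) {
  message("No Census API key found; run census_api_key() first.")
}
```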

Usage

The following section contains examples of using the tidycensus package and Census API to retrieve, prepare, reshape, and blend demographic data for analysis and visualization. To save on processing time and avoid local memory limitations, data-intensive code chunks have been commented out (e.g., retrieving large numbers of census blocks using the tidycensus or tigris packages). Readers can still view the underlying code, while the input data is read in from the R Project’s local data.

Review Variables

# Load the 2010 decennial census variables
DC2010_sf1 <- load_variables(year = 2010, dataset = "sf1", cache = TRUE)

# Load the 2014-2018 5-year ACS variables
ACS2018_acs5 <- load_variables(year = 2018, dataset = "acs5", cache = TRUE)

# View a portion of the 2010 decennial census table
DC2010_sf1 %>% head() %>% kable() %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))

name	label	concept
H001001	Total	HOUSING UNITS
H002001	Total	URBAN AND RURAL
H002002	Total!!Urban	URBAN AND RURAL
H002003	Total!!Urban!!Inside urbanized areas	URBAN AND RURAL
H002004	Total!!Urban!!Inside urban clusters	URBAN AND RURAL
H002005	Total!!Rural	URBAN AND RURAL

# View a portion of the 2018 ACS table
ACS2018_acs5 %>% head() %>% kable() %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))

name	label	concept
B00001_001	Estimate!!Total	UNWEIGHTED SAMPLE COUNT OF THE POPULATION
B00002_001	Estimate!!Total	UNWEIGHTED SAMPLE HOUSING UNITS
B01001_001	Estimate!!Total	SEX BY AGE
B01001_002	Estimate!!Total!!Male	SEX BY AGE
B01001_003	Estimate!!Total!!Male!!Under 5 years	SEX BY AGE
B01001_004	Estimate!!Total!!Male!!5 to 9 years	SEX BY AGE
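
The label column encodes each variable's position in its table hierarchy, with levels separated by "!!". As a rough illustration, the base R sketch below splits those labels and filters by concept. The data frame is a hand-typed copy of a few rows shown above rather than a live load_variables() result, and the names acs_vars, label_depth, and label_leaf are illustrative.

```r
# Hand-typed stand-in for a few rows of the ACS variable table
acs_vars <- data.frame(
  name = c("B01001_001", "B01001_002", "B01001_003"),
  label = c("Estimate!!Total",
            "Estimate!!Total!!Male",
            "Estimate!!Total!!Male!!Under 5 years"),
  concept = "SEX BY AGE",
  stringsAsFactors = FALSE
)

# Split each label on the literal "!!" separator
label_parts <- strsplit(acs_vars$label, "!!", fixed = TRUE)

# Depth in the hierarchy and the most specific (last) level of each label
acs_vars$label_depth <- lengths(label_parts)
acs_vars$label_leaf <- vapply(label_parts, function(p) p[length(p)], character(1))

# Keep variables whose concept mentions "AGE"
age_vars <- acs_vars[grepl("AGE", acs_vars$concept, fixed = TRUE), ]
print(age_vars)
```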

Example 1 - Decennial Census

Retrieve the 2010 decennial census total population for all U.S. states.

Define a geography (e.g., states, counties, tracts)
Define a single variable

# Retrieve the 2010 decennial census total population for all U.S. states.
st_2010_dc <- tidycensus::get_decennial( # API call
  geography = 'state', # Specify geography
  variables = 'P001001', # Specify variable
  year = 2010, # Specify year
  geometry = FALSE, # Include or omit spatial geometry
  output = 'wide', # Set to wide or tidy/long format
  cache_table = TRUE # Cache the table so it can be called quicker in the future, or not to save memory
)

# View data
st_2010_dc %>% head() %>% kable() %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))

GEOID	NAME	P001001
01	Alabama	4779736
02	Alaska	710231
04	Arizona	6392017
05	Arkansas	2915918
06	California	37253956
22	Louisiana	4533372

Example 2 - Decennial Census

Retrieve 2010 decennial census variables and geometries for a select set of a specified geography.

Define a geography and specify a subset
Define and name multiple variables
Retrieve geometries

# Retrieve 2010 decennial census variables and geometries for a select set of a specified geography.
co_2010_dc <- tidycensus::get_decennial( # API call
  geography = 'county', # Specify geography
  state = 'OR', # Specify state abbreviation
  county = c('Multnomah', 'Washington', 'Clackamas'), # Specify a list of counties
  variables = c(pop_tot = 'P001001', # Specify and name multiple variables
                pop_sex_m_tot = 'P012002',
                pop_sex_f_tot = 'P012026'),
  year = 2010, # Specify year
  geometry = TRUE, # Include or omit spatial geometry
  output = 'wide', # Set to wide or tidy/long format
  cache_table = TRUE # Cache the table so it can be called quicker in the future, or not to save memory
)

# View data
co_2010_dc %>% st_set_geometry(NULL) %>% head() %>% kable() %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))

GEOID	NAME	pop_tot	pop_sex_m_tot	pop_sex_f_tot
41067	Washington County, Oregon	529710	260524	269186
41005	Clackamas County, Oregon	375992	184925	191067
41051	Multnomah County, Oregon	735334	363645	371689

Example 3 - ACS

Retrieve 2018 ACS 5-year variables and geometries for a select set of a specified geography.

Define a geography and specify a subset
Define and name multiple variables
Retrieve geometries, specifically the TIGER/Line shapefiles

Concerning geometries, tidycensus uses the geographic coordinate system NAD 1983 (EPSG: 4269), which is the default for Census spatial data files. By default, tidycensus uses the Census cartographic boundary shapefiles for faster processing; if you prefer the TIGER/Line shapefiles (i.e., Topologically Integrated Geographic Encoding and Referencing), set cb = FALSE in the function call.

Per Census documentation, the cartographic boundary files are simplified representations of selected geographic areas from the Census Bureau’s Master Address File (MAF)/TIGER geographic database. These boundary files are specifically designed for small scale thematic mapping. When possible, generalization is performed with intent to maintain the hierarchical relationships among geographies and to maintain the alignment of geographies within a file set for a given year. To improve the appearance of shapes, areas are represented with fewer vertices than detailed TIGER/Line equivalents. Some small holes or discontiguous parts of areas are not included in generalized files. Generalized boundary files are clipped to a simplified version of the U.S. outline. As a result, some off-shore areas may be excluded from the generalized files. Consult the Census Bureau’s TIGER Data Products Guide to determine which file type is best for your purposes.

# Retrieve 2018 ACS 5-year variables and geometries for a select set of a specified geography.
co_2018_acs5 <- tidycensus::get_acs( # API call
  geography = 'county', # Specify geography
  state = 'OR', # Specify state abbreviation
  county = c('Multnomah', 'Washington', 'Clackamas'), # Specify a list of counties
  variables = c(hh_medinc_total = 'B19013_001', # Specify and name multiple variables
                hh_foodst_total = 'B22003_001', 
                hh_foodst_rec = 'B22003_002'),
  year = 2018, # Specify year
  survey = 'acs5', # Specify survey type
  geometry = TRUE, # Include or omit spatial geometry
  cb = FALSE, # Specify cartographic boundary files or TIGER/Line shapefiles
  output = 'wide', # Set to wide or tidy/long format
  cache_table = TRUE # Cache the table so it can be called quicker in the future, or not to save memory
)

# View data
co_2018_acs5 %>% st_set_geometry(NULL) %>% head() %>% kable() %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))

GEOID	NAME	hh_medinc_totalE	hh_medinc_totalM	hh_foodst_totalE	hh_foodst_totalM	hh_foodst_recE	hh_foodst_recM
41005	Clackamas County, Oregon	76597	1050	155456	776	16694	859
41051	Multnomah County, Oregon	64337	779	321968	1216	55018	1390
41067	Washington County, Oregon	78010	1086	216507	1013	22213	1020

Concerning the structure of the data frame, a wide format contains a single row for each observation with many columns representing all variables (human-readable), while a long/tidy format contains many rows per observation (assuming more than one variable) with name-value pairs for each variable and associated value (machine-readable). For more details, see Hadley Wickham’s seminal paper Tidy Data.
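
The relationship between the two formats can be sketched without an API call. The example below starts from a hand-typed wide table (values copied from the output above, two counties and two variables only) and stacks it into the long/tidy layout, pairing each "E" column with its "M" column. It uses base R to stay self-contained; in a tidyverse workflow, tidyr's pivot_longer() would be a more typical choice. The object names wide, vars, and long are illustrative.

```r
# Wide format: one row per county, estimate (E) and MOE (M) columns per variable
wide <- data.frame(
  GEOID = c("41005", "41051"),
  hh_medinc_totalE = c(76597, 64337),
  hh_medinc_totalM = c(1050, 779),
  hh_foodst_recE = c(16694, 55018),
  hh_foodst_recM = c(859, 1390),
  stringsAsFactors = FALSE
)

vars <- c("hh_medinc_total", "hh_foodst_rec")

# Long format: one row per county-variable pair with estimate and moe columns
long <- do.call(rbind, lapply(vars, function(v) {
  data.frame(
    GEOID = wide$GEOID,
    variable = v,
    estimate = wide[[paste0(v, "E")]],
    moe = wide[[paste0(v, "M")]],
    stringsAsFactors = FALSE
  )
}))
print(long)
```

Two counties and two variables yield four rows in the long table, which is the shape ggplot2 expects when faceting or filtering by variable.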

Example 4 - ACS

Retrieve 2015 ACS 1-year variables and geometries for a select set of a specified geography.

Define a geography and specify a subset
Define and name multiple variables
Retrieve geometries, specifically the cartographic boundaries
Output in long/tidy format

# Retrieve 2015 ACS 1-year variables and geometries for a select set of a specified geography.
co_2015_acs1 <- tidycensus::get_acs( # API call
  geography = 'county', # Specify geography
  state = 'OR', # Specify state abbreviation
  county = c('Multnomah', 'Washington', 'Clackamas'), # Specify a list of counties
  variables = c(hh_medinc_total = 'B19013_001', # Specify and name multiple variables
                hh_foodst_total = 'B22003_001', 
                hh_foodst_rec = 'B22003_002'),
  year = 2015, # Specify year
  survey = 'acs1', # Specify survey type
  geometry = TRUE, # Include or omit spatial geometry
  # By default, cartographic boundary files are used
  output = 'tidy', # Set to wide or tidy/long format
  cache_table = TRUE # Cache the table so it can be called quicker in the future, or not to save memory
)

Compare the structure of the data sets.

# View the 2018 ACS 5-year table in wide format
co_2018_acs5 %>% st_set_geometry(NULL) %>% kable() %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))

GEOID	NAME	hh_medinc_totalE	hh_medinc_totalM	hh_foodst_totalE	hh_foodst_totalM	hh_foodst_recE	hh_foodst_recM
41005	Clackamas County, Oregon	76597	1050	155456	776	16694	859
41051	Multnomah County, Oregon	64337	779	321968	1216	55018	1390
41067	Washington County, Oregon	78010	1086	216507	1013	22213	1020

# View the 2015 ACS 1-year table in long/tidy format
co_2015_acs1 %>% st_set_geometry(NULL) %>% kable() %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))

GEOID	NAME	variable	estimate	moe
41005	Clackamas County, Oregon	hh_medinc_total	69629	2898
41005	Clackamas County, Oregon	hh_foodst_total	152414	1886
41005	Clackamas County, Oregon	hh_foodst_rec	17527	1737
41051	Multnomah County, Oregon	hh_medinc_total	59231	2100
41051	Multnomah County, Oregon	hh_foodst_total	311797	3172
41051	Multnomah County, Oregon	hh_foodst_rec	61844	2753
41067	Washington County, Oregon	hh_medinc_total	70447	1703
41067	Washington County, Oregon	hh_foodst_total	211139	2446
41067	Washington County, Oregon	hh_foodst_rec	24253	2324

Example 5 - ACS

Retrieve 2018 ACS 5-year variables and geometries for a geography within a larger geography (e.g., all counties within a state).

Create a vector of all members of a specified geography (e.g., all counties in Oregon)
Define and name multiple variables
Retrieve geometries, specifically the cartographic boundaries

# Retrieve 2018 ACS 5-year variables and geometries for all members of a specified geography within a larger geography (e.g., all counties within a state).
county_vector <- tidycensus::fips_codes %>% # Retrieve table of all state/county names and associated FIPS codes
  filter(state_name == "Oregon") %>% # Specify state
  select(county_code, county) # Maintain only the county code and name variables

# Retrieve variables and geometries for all counties within the vector
co_OR_2018_acs5 <- tidycensus::get_acs( # API call
  geography = 'county', # Specify geography
  state = 'OR', # Specify state abbreviation
  county = county_vector$county_code, # Vector of all counties
  variables = c(hh_medinc_total = 'B19013_001', # Specify and name multiple variables
                hh_foodst_total = 'B22003_001', 
                hh_foodst_rec = 'B22003_002'),
  year = 2018, # Specify year
  survey = 'acs5', # Specify survey type
  geometry = TRUE, # Include or omit spatial geometry
  cb = TRUE, # Specify cartographic boundary files or TIGER/Line shapefiles
  output = 'tidy', # Set to wide or tidy/long format
  cache_table = TRUE # Cache the table so it can be called quicker in the future, or not to save memory
)

# View data
co_OR_2018_acs5 %>% st_set_geometry(NULL) %>% head() %>% kable() %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))

GEOID	NAME	variable	estimate	moe
41005	Clackamas County, Oregon	hh_medinc_total	76597	1050
41005	Clackamas County, Oregon	hh_foodst_total	155456	776
41005	Clackamas County, Oregon	hh_foodst_rec	16694	859
41021	Gilliam County, Oregon	hh_medinc_total	42976	4505
41021	Gilliam County, Oregon	hh_foodst_total	848	60
41021	Gilliam County, Oregon	hh_foodst_rec	141	38
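
County GEOIDs such as 41005 are the two-digit state FIPS code followed by the three-digit county FIPS code. The sketch below builds them from a small hand-typed stand-in for tidycensus::fips_codes (same column names, three rows only), so it runs without the package. The object name or_counties is illustrative.

```r
# Hand-typed stand-in for three rows of tidycensus::fips_codes
or_counties <- data.frame(
  state = "OR",
  state_code = "41",
  state_name = "Oregon",
  county_code = c("005", "051", "067"),
  county = c("Clackamas County", "Multnomah County", "Washington County"),
  stringsAsFactors = FALSE
)

# County GEOID = state FIPS code + county FIPS code (kept as character
# so that leading zeros are preserved)
or_counties$GEOID <- paste0(or_counties$state_code, or_counties$county_code)
print(or_counties[, c("county", "GEOID")])
```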

Example 6 - ACS

Retrieve a complete table of 2018 ACS 5-year variables and geometries for a select set of a specified geography.

Define a geography and specify a subset
Define a table of variables
Retrieve geometries, specifically the TIGER/Line shapefiles

# Retrieve a complete table of 2018 ACS 5-year variables and geometries for a select set of a specified geography.
co_2018_acs5_income <- tidycensus::get_acs( # API call
  geography = 'county', # Specify geography
  state = 'OR', # Specify state abbreviation
  county = c('Multnomah', 'Washington', 'Clackamas'), # Specify a list of counties
  table = "B19001",
  year = 2018, # Specify year
  survey = 'acs5', # Specify survey type
  geometry = TRUE, # Include or omit spatial geometry
  cb = FALSE, # Specify cartographic boundary files or TIGER/Line shapefiles
  output = 'wide', # Set to wide or tidy/long format
  cache_table = TRUE # Cache the table so it can be called quicker in the future, or not to save memory
)

Example 7 - Decennial Census

At one point in 2019, retrieving all decennial census block group data for a specified state or county generated an error. This has since been resolved, but it provides a good example of how smaller nested geographies can be aggregated to larger geographies (e.g., blocks to block groups).

This works now…

Retrieve block group data for an entire county.

bg_2010_dc <- tidycensus::get_decennial( # API call
  geography = 'block group', # Specify geography for which to retrieve data
  state = 'WA', # Specify state abbreviation
  county = 'King', # Specify county
  variables = 'P001001', # Specify variable
  year = 2010, # Specify year
  geometry = FALSE, # Include or omit spatial geometry
  output = 'wide', # Set to wide or tidy/long format
  cache_table = TRUE # Cache the table so it can be called quicker in the future, or not to save memory
)

But if it didn’t…

Retrieve block data for an entire county.

# Retrieve decennial census variables for all blocks
bl_2010_dc <- tidycensus::get_decennial(
  geography = 'block', # Specify statistical area for which to retrieve data
  state = 'WA', # Specify state abbreviation
  county = 'King', # Specify county
  variables = 'P001001',
  year = 2010, # Specify year
  geometry = FALSE, # Include or omit spatial geometry
  output = 'wide', # Set to wide or tidy/long format
  cache_table = TRUE # Cache the table so it can be called quicker in the future, or not to save memory
)

Aggregate data to block groups via creation of the block group GEOID by removing the last 3 characters in the block GEOID and grouping by/summarizing to block groups.

bl_2010_dc_to_bg <- bl_2010_dc %>% 
  mutate(GEOID_BG = stringr::str_sub(GEOID, 1, -4)) %>% # Create BG GEOID
  select(GEOID_BG, 1:length(.), -GEOID, -NAME) %>% # Reorder/remove columns
  rename(GEOID = GEOID_BG) %>% # Rename GEOID
  gather(variable, est, 2:length(.)) %>% # Reshape from wide to long form by creating name-value pairs for all variables
  group_by(GEOID, variable) %>% # Group by GEOID and variable names
  summarize(est = sum(est, na.rm = T)) %>% # Sum values per group
  ungroup() %>% # Ungroup
  spread(variable, est) # Reshape from long to wide form by extending values into their own columns
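
The same aggregation can be traced on a few made-up rows. A 15-character block GEOID is state (2) + county (3) + tract (6) + block (4), and the block group is identified by the first digit of the block code, so the first 12 characters form the block group GEOID. The population values below are invented for illustration, and base R's aggregate() stands in for the group_by()/summarize() step.

```r
# Three invented blocks: two in block group 1 and one in block group 2
blocks <- data.frame(
  GEOID = c("530330001001000", "530330001001001", "530330001002000"),
  P001001 = c(10, 15, 7), # Invented populations
  stringsAsFactors = FALSE
)

# Keep the first 12 characters (i.e., drop the last 3) to get the block group GEOID
blocks$GEOID_BG <- substr(blocks$GEOID, 1, 12)

# Sum block populations within each block group
block_groups <- aggregate(P001001 ~ GEOID_BG, data = blocks, FUN = sum)
print(block_groups)
```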

Compare the data sets.

bgbl_2010_dc <- bg_2010_dc %>%
  inner_join(bl_2010_dc_to_bg, by = 'GEOID')

# View the comparison data set
bgbl_2010_dc %>% head() %>% kable() %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))

GEOID	NAME	P001001.x	P001001.y
530330001001	Block Group 1, Census Tract 1, King County, Washington	1250	1250
530330001002	Block Group 2, Census Tract 1, King County, Washington	1234	1234
530330001003	Block Group 3, Census Tract 1, King County, Washington	1337	1337
530330001004	Block Group 4, Census Tract 1, King County, Washington	1492	1492
530330001005	Block Group 5, Census Tract 1, King County, Washington	942	942
530330002001	Block Group 1, Census Tract 2, King County, Washington	1086	1086

Example 8 - Decennial Census and tigris

In some cases, you may want to retrieve the tabular and spatial data separately (to avoid very large data sets during analysis) and join the data sets after analysis. In those cases, tidycensus can be used in combination with tigris, which is an R package that allows users to directly download and use TIGER/Line shapefiles from the US Census Bureau.

Retrieve variables and geometries for a geography within a larger geography (e.g., all tracts within a state).

Create a vector of variables
Create a vector of geographies
Use tidycensus to retrieve tabular data
Use tigris to retrieve spatial geometries
Join the tabular and spatial data

Retrieve variables for a specified geography via the tidycensus package.

# Create a vector of variables for use in the `tidycensus` call.
DC2010_sf1_var <- c(pop_sex_total = 'P012001',
                pop_sex_m_tot = 'P012002',
                pop_sex_m_00_04 = 'P012003',
                pop_sex_m_05_09 = 'P012004',
                pop_sex_m_10_14 = 'P012005',
                pop_sex_m_15_17 = 'P012006',
                pop_sex_m_18_19 = 'P012007',
                pop_sex_f_tot = 'P012026',
                pop_sex_f_00_04 = 'P012027',
                pop_sex_f_05_09 = 'P012028', 
                pop_sex_f_10_14 = 'P012029', 
                pop_sex_f_15_17 = 'P012030', 
                pop_sex_f_18_19 = 'P012031')

# Retrieve variables for specified geography
tr_2010_dc <- tidycensus::get_decennial( # API call
  geography = 'tract', # Specify geography
  variables = DC2010_sf1_var, # Vector of variables
  state = 'OR', # Specify state abbreviation
  county = county_vector$county_code, # Vector of all counties
  year = 2010, # Specify year
  geometry = FALSE, # Include or omit spatial geometry
  output = 'wide', # Set to wide or long/tidy format
  cache_table = TRUE # Cache the table so it can be called quicker in the future
)

Retrieve geometries for a specified geography via the tigris package.

By default, tigris retrieves the most recent vintage of a data set, so specify a different year if needed. The default coordinate reference system (CRS) is NAD83 (EPSG 4269 https://spatialreference.org/ref/epsg/nad83/). The default geometry type is the TIGER/Line shapefile. If cb is set to TRUE, tigris will download a generalized (1:500k) set of geometries.

# A call of gc causes a garbage collection to take place. The primary purpose of calling gc is for the report on memory usage. Use this before and after a data-heavy processing task.
gc() 
tr_2010_dc_shp <- tigris::tracts("OR", year = 2010, cb = FALSE) # Specify state abbreviation, year, and geometry type
gc()

Join the variables to the geometries.

tr_2010_dc_geo <- tr_2010_dc_shp %>%
  inner_join(tr_2010_dc, by = c("GEOID10" = "GEOID")) # Join based on GEOID

class(tr_2010_dc_geo) # Note the object's class
st_crs(tr_2010_dc_geo) # Note the object's coordinate system

Note that the resulting object’s class depends on the join order. The left side’s class takes priority; in the above, the attributes (right side) are joined to the geometries (left side), so the resulting object class is sf (simple feature). If the order is flipped (below) and the geometries (right side) are joined to the attributes (left side), the object class is not sf. To make it so, add %>% st_as_sf() to the end of the pipeline.

tr_2010_dc_geo_REVERSE1 <- tr_2010_dc %>%
  inner_join(tr_2010_dc_shp, by = c("GEOID" = "GEOID10"))

class(tr_2010_dc_geo_REVERSE1)
st_crs(tr_2010_dc_geo_REVERSE1)

tr_2010_dc_geo_REVERSE2 <- tr_2010_dc %>%
  inner_join(tr_2010_dc_shp, by = c("GEOID" = "GEOID10")) %>%
  st_as_sf()

class(tr_2010_dc_geo_REVERSE2)
st_crs(tr_2010_dc_geo_REVERSE2)

Visualization

The following visualizations are created using the ggplot2 package (which is part of the tidyverse), mapview package, and leaflet for R package.

Dot Plot

Create a plot of 2010 decennial census state populations.

st_2010_dc %>%
  ggplot(aes(x = P001001, y = reorder(NAME, P001001))) + # Set the x-axis to the variable and the y-axis to the observation (sort in descending order)
  ggtitle("2010 State Population") + # Add a title
  xlab("Total Population") + # Name the x-axis
  ylab("State Name") + # Name the y-axis
  geom_point() # Add point geometry

Dot Plot MOEs

Create a plot of 2014-2018 ACS 5-year estimates and MOEs for all Oregon counties.

co_OR_2018_acs5 %>%
  mutate(NAME = gsub(" County, Oregon", "", NAME)) %>% # Remove unnecessary portion of county name
  filter(variable == 'hh_medinc_total') %>% # Query a single variable
  ggplot(aes(x = estimate, y = reorder(NAME, estimate))) +  # Set the x-axis to the variable and the y-axis to the observation (and sort)
  geom_errorbarh(aes(xmin = estimate - moe, xmax = estimate + moe)) + # Create error bars
  geom_point(color = "red", size = 3) + # Symbolize the points
  labs(title = "Median Household Income by County", # Add title
       subtitle = "2014-2018 (5-year) American Community Survey", # Add subtitle
       y = "County Name", # Label y-axis
       x = "ACS estimate (bars represent margin of error)") # Label x-axis
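
The error bars in this plot invite a simple visual comparison: do two counties' intervals overlap? The sketch below expresses that check in base R, using the Clackamas, Washington, and Multnomah County figures from the tables above. The function name moe_intervals_overlap is hypothetical, and overlap is only a rough heuristic rather than a formal significance test.

```r
# Hypothetical helper: TRUE when [est - moe, est + moe] for two estimates overlap
moe_intervals_overlap <- function(est_a, moe_a, est_b, moe_b) {
  (est_a - moe_a) <= (est_b + moe_b) && (est_b - moe_b) <= (est_a + moe_a)
}

# Clackamas (76597 +/- 1050) vs. Washington (78010 +/- 1086)
clack_vs_wash <- moe_intervals_overlap(76597, 1050, 78010, 1086)

# Clackamas (76597 +/- 1050) vs. Multnomah (64337 +/- 779)
clack_vs_mult <- moe_intervals_overlap(76597, 1050, 64337, 779)

print(c(clack_vs_wash = clack_vs_wash, clack_vs_mult = clack_vs_mult))
```

The Clackamas and Washington intervals share the range from 76,924 to 77,647, so their bars overlap in the plot, while the Clackamas and Multnomah intervals are well separated.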

Choropleth Map

Create a choropleth map of a single variable for a geography within a single county.

# Retrieve variables and geographies for census block groups in Multnomah County, Oregon
bg_Mult_2018_acs5 <- tidycensus::get_acs( # API call
  geography = 'block group', # Specify geography
  state = 'OR', # Specify state name
  county = c('Multnomah'), # Specify county name
  variables = c(hh_medinc_total = 'B19013_001'),
  year = 2018, # Specify year
  survey = 'acs5', # Specify survey type
  geometry = TRUE, # Include or omit spatial geometry
  cb = FALSE, # Use TIGER
  output = 'tidy', # Set to wide or tidy/long format
  cache_table = TRUE # Cache the table so it can be called quicker in the future, or not to save memory
)

# Create choropleth map
bg_Mult_2018_acs5 %>%
  ggplot(aes(fill = estimate)) + # Specify the aesthetic as fill based on the estimate values
  geom_sf(color = NA) + # Specify outline color
  coord_sf(crs = 26910) + # Specify coordinate system/projection (e.g., UTM10N)
  ggtitle("Multnomah County Median Household Income", subtitle = "2014-2018 (5-year) ACS Data") + # Specify title and subtitle
  scale_fill_viridis_c(option = "magma") # Specify fill symbology

Faceted Choropleth Map

Create faceted choropleth maps of multiple variables across geographies within a single county.

As mentioned by Kyle Walker in his tidycensus tutorial, “one of the most powerful features of ggplot2 is its support for small multiples, which works very well with the tidy data format returned by tidycensus. Many Census and ACS variables return counts, which are generally inappropriate for choropleth mapping. In turn, get_decennial and get_acs have an optional argument, summary_var, that can work as a multi-group denominator when appropriate.” For example, view the racial/ethnic population distribution within a given county.

# Create a vector of variables 
DC2010_sf1_race_vars <- c(White = "P005003", 
              Black = "P005004", 
              Asian = "P005006", 
              Hispanic = "P004003")

# Retrieve variables and geographies for census tracts in Multnomah County, Oregon
bg_Mult_2010_dc <- tidycensus::get_decennial( # API call
  geography = 'tract', # Specify geography
  state = 'OR', # Specify state name
  county = c('Multnomah'), # Specify county name
  variables = DC2010_sf1_race_vars, # Specify the vector of variables
  year = 2010, # Specify year
  geometry = TRUE, # Include or omit spatial geometry
  output = 'tidy', # Set to wide or tidy/long format
  summary_var = "P001001" # Specify summary variable (e.g., total population)
)

bg_Mult_2010_dc %>%
  mutate(pct = 100 * (value / summary_value)) %>% # Calculate the race/ethnicity proportion of the total population
  ggplot(aes(fill = pct)) + # Specify the aesthetic as fill based on the proportion values
  facet_wrap(~variable) + # Facet wrap on race/ethnicity
  geom_sf(color = NA) + # Specify outline color
  coord_sf(crs = 26910) + # Specify coordinate system/projection (e.g., UTM10N)
  ggtitle("Multnomah County", subtitle = "Population by Race/Ethnicity") + # Specify title and subtitle
  scale_fill_viridis_c() # Specify fill symbology
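
The percentage step can be checked on a few invented rows that mimic the long format returned when summary_var is set (the value and summary_value columns). The GEOIDs and counts below are made up, and the guard against a zero denominator is an added assumption for unpopulated tracts rather than something the code above does.

```r
# Invented rows mimicking get_decennial() output with a summary variable
tract_race <- data.frame(
  GEOID = c("tract_1", "tract_1", "tract_2", "tract_2"), # Placeholder IDs
  variable = c("White", "Asian", "White", "Asian"),
  value = c(600, 150, 0, 0),
  summary_value = c(1000, 1000, 0, 0), # Total population per tract
  stringsAsFactors = FALSE
)

# Percent of total population; NA where the tract has no population
tract_race$pct <- ifelse(
  tract_race$summary_value > 0,
  100 * tract_race$value / tract_race$summary_value,
  NA_real_
)
print(tract_race)
```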

mapview

Create an unformatted, interactive map using the mapview package.

mapview(co_2018_acs5)

leaflet

Create an interactive map using leaflet.

# Use leaflet's color ramp creator
medinc_colors <- leaflet::colorBin(
  palette = 'viridis', # Accepts a ColorBrewer or viridis palette name
  domain = bg_Mult_2018_acs5$estimate # Specify the variable
)

leaflet(bg_Mult_2018_acs5) %>% 
  addProviderTiles(providers$OpenStreetMap, group = 'Open Street Map')  %>%
  addProviderTiles(providers$Stamen.Toner, group = 'Stamen Toner')  %>%
  addProviderTiles(providers$CartoDB.Positron, group = 'Carto DB')  %>%
  addProviderTiles(providers$Esri.NatGeoWorldMap, group = 'Esri NatGeo') %>%
  addPolygons(
    weight = 1, # Specify outline width
    color = 'black', # Specify outline color
    fillColor = medinc_colors(bg_Mult_2018_acs5$estimate), # Use created color ramp
    fillOpacity = 0.7, # Specify fill opacity
    group = "Multnomah Co. Block Groups (2018 ACS 5-year)", # Specify legend item name
    popup = paste0( # Create pop-up
      "<b>2018 ACS 5-year Block Group</b> ",
      paste0("<br><b>Median HH Income Estimate</b>: ", prettyNum(bg_Mult_2018_acs5$estimate, ','), "<br><b>Median HH Income MOE</b>: ", prettyNum(bg_Mult_2018_acs5$moe, ','))
    )
  ) %>%
  addLegend("bottomright",
            pal = medinc_colors,
            values = bg_Mult_2018_acs5$estimate,
            title = "Median Household Income",
            opacity = 1) %>%
  # Layers control
  addLayersControl(
    overlayGroups = c("Multnomah Co. Block Groups (2018 ACS 5-year)"),
    baseGroups = c('Open Street Map', 'Stamen Toner', 'Carto DB', 'Esri NatGeo'),
    options = layersControlOptions(collapsed = FALSE)
  )
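
To get a feel for how a binned color ramp groups values, the base R sketch below cuts a handful of income values into intervals using pretty() breaks. This only approximates what colorBin() does when given a bin count; leaflet's exact break selection may differ. The income values are county-level figures borrowed from earlier tables purely for illustration.

```r
# Illustrative income values (borrowed from the county tables above)
income <- c(42976, 64337, 76597, 78010)

# Choose "round" break points that span the data, aiming for about 7 intervals
breaks <- pretty(range(income), n = 7)

# Assign each value to an interval; include.lowest keeps the minimum in range
income_bin <- cut(income, breaks = breaks, include.lowest = TRUE, dig.lab = 6)
print(table(income_bin))
```

Each interval would map to one color in the legend, which is why a binned palette produces a stepped legend rather than a continuous gradient.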

Output & Input

Write out data to various file formats.

# readr::write_csv(DC2010_sf1, "./Data/DC2010_sf1.csv")

# library(writexl)
# write_xlsx(st_2010_dc, "./Data/st_2010_dc.xlsx")

# saveRDS(bg_Mult_2018_acs5,"./Data/bg_Mult_2018_acs5.rds")

# sf::st_write(bg_Mult_2010_dc,"./Data/bg_Mult_2010_dc.shp", delete_dsn = TRUE)

# library(arcgisbinding)
# arc.check_product() # Initialize connection to ArcGIS
# arc.write(path = "./Data/Data.gdb/co_OR_2018_acs5", data = co_OR_2018_acs5, shape_info = list(type = 'Polygon', WKT = 'GEOGCS["GCS_North_American_1983",DATUM["D_North_American_1983",SPHEROID["GRS_1980",6378137,298.257222101]],PRIMEM["Greenwich",0],UNIT["Degree",0.017453292519943295]]'), overwrite = TRUE)

Read in previously retrieved data in various file formats to avoid duplicative calls to Census APIs.

# readin_csv <- readr::read_csv("./Data/DC2010_sf1.csv")

# library(readxl)
# readin_xlsx <- read_excel("./Data/st_2010_dc.xlsx", sheet = "sheet_name")

# readin_rds <- readRDS("./Data/bg_Mult_2018_acs5.rds")

# readin_shp  <- sf::st_read("./Data/bg_Mult_2010_dc.shp")

# library(arcgisbinding)
# arc.check_product() # Initialize connection to ArcGIS
# arc.open reads in the feature class as an object with  "formal class  arc.feature_impl"
# arc.select converts the data set into a "arc.data" / "data.frame" object
# arc.data2sf converts the data set into an equivalent sf object
# readin_fc <- arc.open("./Data/Data.gdb/co_OR_2018_acs5") %>% arc.select() %>% arc.data2sf()
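
One thing worth checking after a round trip through CSV is that GEOIDs keep their leading zeros, since they are identifiers rather than numbers. The sketch below uses base R's write.csv() and read.csv() with a temporary file (rather than the readr calls above) and three rows copied from the state table in Example 1. With read.csv(), declaring the column type through colClasses keeps GEOID as character; without it, the column may be converted to a number.

```r
# Three rows from the 2010 state population table
states <- data.frame(
  GEOID = c("01", "02", "04"),
  NAME = c("Alabama", "Alaska", "Arizona"),
  P001001 = c(4779736, 710231, 6392017),
  stringsAsFactors = FALSE
)

# Write to a temporary CSV file
csv_path <- tempfile(fileext = ".csv")
write.csv(states, csv_path, row.names = FALSE)

# Read back, forcing GEOID to stay character so leading zeros survive
typed <- read.csv(
  csv_path,
  colClasses = c(GEOID = "character"),
  stringsAsFactors = FALSE
)
print(typed)
```

The RDS format avoids this issue altogether because it stores the R object with its column types intact.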

A work by Alex Brasch

tidycensus Tutorial

10/11/2020