Here’s my first follow-up to my 7/16/2020 post on using tidycensus in a post-American
FactFinder era.
Attached to this e-mail is a short text file (“r” suffix) that can be edited for your
use.
Example #0. Setting up tidycensus.
This is an introduction to using the R package tidycensus to extract data from the
US Census Bureau’s American Community Survey. I’m adding snippets of R code from my
R scripts, and attaching the full R script to this message.
First things first: Acquaint yourself with the American Community Survey. What I would
strongly recommend is to download and print out copies of the various ACS survey
questionnaires. Know what was asked!
Decennial Census questionnaires:
https://www.census.gov/history/www/through_the_decades/questionnaires/
American Community Survey questionnaires:
https://www.census.gov/programs-surveys/acs/methodology/questionnaire-archive.html
Next, I would recommend downloading the “table shells” from the Census Bureau’s website,
rather than relying only on the tidycensus “load_variables” function. Get the table shells
for all of the years: the ACS does change every so often, and so do the tables! I find it
useful to keep part of my computer screen open with the table shells visible in Excel.
ACS Table Shells:
https://www.census.gov/programs-surveys/acs/technical-documentation/table-shells.html
I find it useful to have on hand a guide to the ACS table numbering scheme, so you know
your “B” and “C” and “S” and “GCT” tables and the two-digit subject indicator (“08” for
“Journey-to-Work”).
https://censusreporter.org/topics/table-codes/
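To make the numbering scheme a bit more concrete, here is a minimal sketch in base R that splits a detailed-table variable name such as B08301_001 into its pieces. The helper name parse_acs_variable is hypothetical (it is not part of tidycensus), and the pattern assumes only “B” and “C” style names; subject and profile table variables are laid out differently and will come back as NA.

```r
# Hypothetical helper, not part of tidycensus: split a "B"/"C" table
# variable name into table type, subject, table number, suffix and line.
parse_acs_variable <- function(x) {
  pattern <- "^([BC])([0-9]{2})([0-9]{3})([A-Z]*)_([0-9]{3})$"
  matches <- regmatches(x, regexec(pattern, x))
  # Each match holds the full string plus 5 groups; non-matches are empty.
  parts <- do.call(rbind, lapply(matches, function(p) {
    if (length(p) == 0) rep(NA_character_, 5) else p[2:6]
  }))
  data.frame(
    variable   = x,
    table_type = parts[, 1],  # "B" (base) or "C" (collapsed)
    subject    = parts[, 2],  # two-digit subject, e.g. "08"
    table_seq  = parts[, 3],  # table number within the subject
    suffix     = parts[, 4],  # optional letter suffix, often empty
    line       = parts[, 5],  # line number within the table
    stringsAsFactors = FALSE
  )
}

parse_acs_variable(c("B08301_001", "B08301_010"))
```

Both example names carry subject “08”, so they belong to the Journey-to-Work family of tables.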
Download and install the free software package RStudio. There are other YouTube videos
you can watch about learning/installing R and RStudio, and I won’t cover those here.
https://rstudio.com/products/rstudio/download/#download
Launch RStudio. There are a few add-on packages that first need to be installed onto your
computer, and then “loaded” into your working R session.
# Step 1: Install R packages. If installed in previous sessions, there is no need to
# re-install.
# You may need to install the packages "tidyr" and "sp" for
# "tidycensus" to be properly installed.
install.packages("tidyverse")
install.packages("tidycensus")
install.packages("janitor")
# Step 2: Load relevant libraries into each R-session.
library(tidyverse)
library(tidycensus)
library(janitor)
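If you would rather not re-run install.packages() every session, one option is to check first which packages are actually missing. This is only a sketch: find_missing_packages is a hypothetical helper name, and the install line is left commented out so that nothing is downloaded until you decide to.

```r
# Hypothetical helper: return the packages from 'pkgs' that are not installed.
find_missing_packages <- function(pkgs) {
  is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  unname(pkgs[!is_installed])
}

needed <- c("tidyverse", "tidycensus", "janitor")
missing_pkgs <- find_missing_packages(needed)
print(missing_pkgs)

# Un-comment the next line to install only what is missing.
# if (length(missing_pkgs) > 0) install.packages(missing_pkgs)
```

The library() calls in Step 2 are still needed in every new R session.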
Acquire a Census API key from the Census Bureau. It’s free. It’s a 40-character string
that identifies a unique API user and helps the Census Bureau improve their tools to
access census data. They’ll e-mail you a key in no time at all.
https://www.census.gov/data/developers/updates/new-discovery-tool.html
https://api.census.gov/data/key_signup.html
Install your 40-character API key into your R “environment.” You only need to do this
once: with install=TRUE, the key is saved to your .Renviron file and is picked up
automatically in later sessions.
# Step 3: Load the User's Census API Key.
# If the key was installed in a previous session, there is no need to re-install it.
# Otherwise, un-comment the following statement and insert your own API key.
# census_api_key("fortycharacterkeysentbyCensusBureau",install=TRUE)
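To check whether a key is already installed, you can look at the environment variable where tidycensus stores it, CENSUS_API_KEY. The sketch below uses only base R; has_census_key is a hypothetical helper name, not a tidycensus function.

```r
# Hypothetical helper: TRUE if a Census API key is visible to this R session.
has_census_key <- function() {
  nzchar(Sys.getenv("CENSUS_API_KEY"))
}

if (has_census_key()) {
  message("Census API key found; no need to re-install.")
} else {
  message("No Census API key found; run census_api_key() with install = TRUE.")
}
```

After installing a key with install=TRUE, either restart R or run readRenviron("~/.Renviron") so that the current session can see it.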
The last section of this introduction covers the “load_variables” function as a tool to
assist in selecting variables. I prefer to download the ACS Table Shells into
Excel, and then keep the appropriate Table Shells open alongside RStudio, to aid me in
variable selection and naming.
# Step 4: Explore the Data Variables using the load_variables() function
# Use the function load_variables() to view all of the possible variables for analysis
# load_variables works for both decennial census and American Community Survey databases
acs18_variable_list <- load_variables(year = 2018, dataset = "acs5", cache = TRUE)
acs18p_variable_list <- load_variables(year = 2018, dataset = "acs5/profile", cache = TRUE)
# Maybe write out the data frame to the working directory, for easier use in Excel?
write.csv(acs18_variable_list,'acs18_variable_list.csv', row.names=FALSE)
View(acs18_variable_list)
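The variable list is long, so it can help to pull out just the rows for one table. The sketch below assumes the data frame has the name, label and concept columns that load_variables() returns. So that it runs without an API key, it uses a small hand-made stand-in (toy_variable_list) with abbreviated, illustrative labels; find_table_variables is a hypothetical helper name.

```r
# Hypothetical helper: keep only the variables that belong to one table.
find_table_variables <- function(var_list, table_id) {
  keep <- startsWith(var_list$name, paste0(table_id, "_"))
  var_list[keep, , drop = FALSE]
}

# Hand-made stand-in for the output of load_variables(); labels are
# illustrative only and are not copied from the real variable list.
toy_variable_list <- data.frame(
  name    = c("B08301_001", "B08301_002", "B19013_001"),
  label   = c("Estimate!!Total", "Estimate!!Total!!Car, truck, or van",
              "Estimate!!Median household income"),
  concept = c("MEANS OF TRANSPORTATION TO WORK",
              "MEANS OF TRANSPORTATION TO WORK",
              "MEDIAN HOUSEHOLD INCOME"),
  stringsAsFactors = FALSE
)

find_table_variables(toy_variable_list, "B08301")
```

With the real list, the same call would be find_table_variables(acs18_variable_list, "B08301").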
As of summer 2020, tidycensus can be used to extract the “base” and “collapsed”
tables for all years of the ACS: the “single year” databases from 2005 through 2018, and
the five-year databases from 2005/09 through 2014/18. It can also extract the decennial
census files for 2010, 2000 and 1990. For the decennial censuses, databases include the SF1
(Summary File #1) for 1990, 2000 and 2010; and the SF3 (Summary File #3) for 1990 and
2000. (There was no long form in the 2010 Census, and thus no long-form-based SF3
data for 2010!)
(I have yet to explore how to pull data from the decennial censuses using tidycensus, and
would be grateful to hear news of successes/failures.)
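A word of warning: R is very case sensitive. View(acs18_variable_list) is the standard
spelling. A lowercase view(acs18_variable_list) is not defined in base R, and only works
if a package such as tibble (loaded with tidyverse) happens to supply one. The same goes
for object names: acs18_variable_list and ACS18_variable_list are two different things!!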
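A quick way to see the case sensitivity for yourself, using only base R (the object name acs_year is made up for this illustration):

```r
# Create an object with a lowercase name.
acs_year <- 2018

# R finds the name only when the case matches exactly.
exists("acs_year")   # TRUE
exists("ACS_YEAR")   # FALSE: a different name as far as R is concerned
```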
That’s the end of Example #0… Setting up tidycensus!
Chuck Purvis, Hayward, California
Retired Person (formerly of the Metropolitan Transportation Commission, San Francisco,
California)