Example #0. Setting up tidycensus.
This is an introduction to using the R package tidycensus to
extract data from the US Census Bureau’s American Community Survey (ACS). I’m
including snippets of R code from my R scripts, and attaching the full R script to
this message.
First things first: Acquaint yourself with the American Community Survey. What I would strongly recommend is to download and print out copies of the various ACS survey questionnaires. Know what was asked!
Decennial Census questionnaires:
https://www.census.gov/history/www/through_the_decades/questionnaires/
American Community Survey questionnaires:
https://www.census.gov/programs-surveys/acs/methodology/questionnaire-archive.html
Next, I would recommend downloading the “table shells” from the
Census Bureau’s website, rather than relying only on the tidycensus
“load_variables” function. Get the table shells for all of the years: the ACS
does change every so often, and so do the tables! I find it useful to keep part
of my computer screen open with the table shells visible in Excel.
ACS Table Shells:
https://www.census.gov/programs-surveys/acs/technical-documentation/table-shells.html
I find it useful to have on hand a guide to the ACS table numbering scheme, so you know your “B” and “C” and “S” and “GCT” tables and the two-digit subject indicator (for example, “08” is “Journey-to-Work”). https://censusreporter.org/topics/table-codes/
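To make the numbering scheme concrete, here is a minimal sketch in base R that splits a table ID into its letter prefix and two-digit subject code. The helper name parse_table_id is my own invention for illustration (it is not part of tidycensus), and it assumes the ID starts with capital letters followed by at least two digits.

```r
# Hypothetical helper (not part of tidycensus): split an ACS table ID into
# its letter prefix ("B", "C", "S", "GCT", ...) and two-digit subject code.
parse_table_id <- function(table_id) {
  pattern <- "^([A-Z]+)([0-9]{2})"
  parts <- regmatches(table_id, regexec(pattern, table_id))[[1]]
  if (length(parts) == 0) {
    stop("Unrecognized table ID: ", table_id)
  }
  # parts[1] is the full match; parts[2] and parts[3] are the two groups.
  list(prefix = parts[2], subject = parts[3])
}

parsed <- parse_table_id("B08301")
parsed$prefix   # "B"
parsed$subject  # "08", the Journey-to-Work subject
```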
Download and install the free software package RStudio. There are other YouTube videos you can watch about learning and installing R and RStudio, so I won’t cover that here.
https://rstudio.com/products/rstudio/download/#download
Launch RStudio. There are a few add-on packages that first need
to be installed onto your computer, and then “loaded” into your working R
session.
# Step 1: Install R packages. If installed in previous sessions, there is no need to re-install.
# You may need to install the packages "tidyr" and "sp" for "tidycensus" to be properly installed.
install.packages("tidyverse")
install.packages("tidycensus")
install.packages("janitor")
# Step 2: Load relevant libraries into each R session.
library(tidyverse)
library(tidycensus)
library(janitor)
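Since the packages only need installing once, a small check can save re-installing them every session. This is just a sketch using base R; the helper name missing_packages is hypothetical, and the actual install call is left commented out so nothing gets installed by accident.

```r
# Packages used in this walkthrough (same list as Steps 1 and 2).
needed <- c("tidyverse", "tidycensus", "janitor")

# Hypothetical helper: return the packages in `pkgs` that are not yet installed.
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

to_install <- missing_packages(needed)
if (length(to_install) > 0) {
  message("Not yet installed: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install them
}
```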
Acquire a Census API key from the Census Bureau. It’s free. It’s a
40-character string that identifies a unique API user and helps the Census
Bureau improve their tools to access census data. They’ll e-mail you a key in
no time at all.
https://www.census.gov/data/developers/updates/new-discovery-tool.html
https://api.census.gov/data/key_signup.html
Install your 40-character API key into your R “environment.” You only need
to do this once; with install=TRUE the key is saved for future sessions.
# Step 3: Load the user's Census API key.
# If the key was installed in a previous session, there is no need to re-install it.
# Otherwise, uncomment the following statement and substitute your own API key.
# census_api_key("fortycharacterkeysentbyCensusBureau", install = TRUE)
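If you are not sure whether a key is already installed, a quick check along these lines may help. It is a sketch with two assumptions: that tidycensus keeps the installed key in the CENSUS_API_KEY environment variable (via your .Renviron file), and that a valid key is simply a 40-character string. The helper name looks_like_census_key is my own.

```r
# Hypothetical helper: does the string look like a Census API key?
# Assumes only what is stated above: the key is a 40-character string.
looks_like_census_key <- function(key) {
  is.character(key) && length(key) == 1 && nchar(key) == 40
}

# Assumption: the installed key lives in the CENSUS_API_KEY environment variable.
stored_key <- Sys.getenv("CENSUS_API_KEY")
if (looks_like_census_key(stored_key)) {
  message("A Census API key appears to be installed.")
} else {
  message("No 40-character key found; run census_api_key(..., install = TRUE).")
}
```

If the key was installed during the current session, you may need to restart R before the environment variable is visible.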
The last section of this introduction relates to using the
“load_variables” function as a tool to assist in selecting variables. I prefer
to download the ACS Table Shells into Excel, and then have the appropriate Table
Shells open, alongside RStudio, to aid me in variable selection and naming.
# Step 4: Explore the data variables using the load_variables() function.
# Use load_variables() to view all of the possible variables for analysis.
# load_variables() works for both decennial census and American Community Survey databases.
acs18_variable_list <- load_variables(year = 2018, dataset = "acs5", cache = TRUE)
acs18p_variable_list <- load_variables(year = 2018, dataset = "acs5/profile", cache = TRUE)
# Optionally write the data frame out to a CSV file in the working directory, for easier use in Excel.
write.csv(acs18_variable_list, "acs18_variable_list.csv", row.names = FALSE)
View(acs18_variable_list)
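The full variable list is long, so it often helps to narrow it to one subject. The sketch below uses a tiny made-up data frame in place of the real load_variables() result, so it runs without an API key. I am assuming the real result has name, label and concept columns, and the three rows are illustrative rather than copied verbatim from the 2018 list.

```r
# Toy stand-in for the data frame returned by load_variables().
# Assumption: the real result has (at least) name, label, and concept columns.
toy_variable_list <- data.frame(
  name = c("B01001_001", "B08301_001", "B08301_010"),
  label = c("Estimate!!Total",
            "Estimate!!Total",
            "Estimate!!Total!!Public transportation (excluding taxicab)"),
  concept = c("SEX BY AGE",
              "MEANS OF TRANSPORTATION TO WORK",
              "MEANS OF TRANSPORTATION TO WORK"),
  stringsAsFactors = FALSE
)

# Keep only variables from the journey-to-work ("08") base tables.
journey_to_work <- toy_variable_list[grepl("^B08", toy_variable_list$name), ]
journey_to_work$name
```

The same grepl() filter should work on acs18_variable_list once it has been loaded.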
As of summer 2020, tidycensus can be used to extract the
“base” and “collapsed” tables for all years of the ACS: the “single year”
databases from 2005 through 2018; the five-year ACS databases from 2005/09
through 2014/18; and the decennial census files for 2010, 2000 and 1990. For
the decennial censuses, databases include the SF1 (Summary File #1) for 1990,
2000 and 2010; and the SF3 (Summary File #3) for 1990 and 2000. (There was no
long form in the 2010 Census, and thus no long-form-based SF3 data for
2010!)
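The ACS year ranges above can be written down as a small check, which is handy before requesting a year that does not exist. This is only a sketch of the coverage as described here (summer 2020); the helper name acs_year_available is hypothetical, and for the five-year data the year is taken to be the end year of the period, as in the load_variables() call with year = 2018 for 2014/18.

```r
# Hypothetical helper encoding the ACS coverage described above (summer 2020).
# `year` is the single year for "acs1", or the END year of the period for "acs5".
acs_year_available <- function(year, survey = c("acs1", "acs5")) {
  survey <- match.arg(survey)
  first_year <- if (survey == "acs1") 2005 else 2009
  year >= first_year && year <= 2018
}

acs_year_available(2005, "acs1")  # TRUE
acs_year_available(2008, "acs5")  # FALSE: the first five-year period is 2005/09
```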
(I have yet to explore how to pull data from the decennial
censuses using tidycensus, and would be grateful to hear news of successes/failures.)
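A word of warning: R is very case sensitive. In base R, View(acs18_variable_list) will
work okay, but view(acs18_variable_list) will fail with a “could not find function” error.
(A lowercase view() is available only because the tibble package, loaded with tidyverse,
happens to provide one, so don’t count on it elsewhere.)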
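A quick way to see the case sensitivity for yourself, using nothing but base R. The variable name acs_year is made up for this illustration.

```r
acs_year <- 2018

exists("acs_year")  # TRUE
exists("ACS_Year")  # FALSE: different capitalization, different name

# tryCatch() captures the error R raises for the mis-capitalized name,
# so the script keeps running and we can look at the message.
result <- tryCatch(ACS_Year, error = function(e) conditionMessage(e))
result
```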
That’s the end of Step #0… Setting up tidycensus!