Example #1. Tidycensus examples: one year, multiple geographies, multiple variables
This is an example of a simple script to “pull” select variables from the ACS using my
Census API key and the R package tidycensus.
Step #0. Always load the relevant packages/libraries when starting a new R session.
Otherwise, the functions used below won’t be found.
Comments start with a hashtag, “#”. They’re easier to spot when using RStudio.
# Step 0: Load relevant libraries into each R-session.
library(tidyverse)
library(tidycensus)
library(janitor)
Example #1.1 pulls the 2018 ACS data for San Francisco Bay Area counties, for table C03002
(population by race/ethnicity). It’s pulled into a “data frame” called “county1”.
Though the keywords (survey, year, geography, etc.) can be in any order within the
“get_acs()” statement, I prefer leading with:
1. survey="acs1" – am I using the 1-year or the 5-year database?
2. year=2018 – what’s the last year of the 1-year/5-year database?
3. geography="county" – what level of geography am I pulling? US? State? County?
Congressional District? Place?
See the tidycensus documentation, and the author’s website, for all of this and more!
https://walker-data.com/tidycensus/articles/basic-usage.html
https://cran.r-project.org/web/packages/tidycensus/tidycensus.pdf
https://www.rdocumentation.org/packages/tidycensus/
# Simple Example #1.1: Population by Race/Ethnicity, 2018, SF Bay, Table C03002
# Note that tidycensus can use either the County Name or the County FIPS Code number.
# Experiment with output="wide" versus output="tidy" ("tidy" is the default.)
#####################################################################################
county1 <- get_acs(survey="acs1", year=2018, geography = "county", state = "CA",
#                  county=c(1,13,41,55,75,81,85,95,97),
                   county=c("Alameda","Contra Costa","Marin","Napa","San Francisco",
                            "San Mateo","Santa Clara","Solano","Sonoma"),
                   show_call = TRUE, output="wide",
                   table="C03002")
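To get a feel for the two output shapes without calling the Census API, here is a small sketch. The numbers are made up (they are NOT real ACS estimates). The columns E and M stand in for the "estimate" and "moe" columns of tidy output, and I renamed them only so that the reshaped column names end in "E" and "M" the way wide output does. It uses pivot_wider() from tidyr, which loads with the tidyverse.

```r
library(tidyr)

# Long ("tidy") layout: one row per county per variable.
# Made-up numbers for illustration only, NOT real ACS estimates.
tidy_shape <- data.frame(
  GEOID    = c("06001", "06001", "06013", "06013"),
  NAME     = c("Alameda County, California", "Alameda County, California",
               "Contra Costa County, California", "Contra Costa County, California"),
  variable = c("C03002_001", "C03002_003", "C03002_001", "C03002_003"),
  E        = c(1000, 300, 800, 350),  # stands in for the "estimate" column
  M        = c(50, 25, 40, 30)        # stands in for the "moe" column
)

# Wide layout: one row per county, one E and one M column per variable.
wide_shape <- pivot_wider(tidy_shape,
                          names_from  = variable,
                          values_from = c(E, M),
                          names_glue  = "{variable}{.value}")

print(tidy_shape)
print(wide_shape)
```

The wide version is handier for spreadsheet-style tables; the tidy version is handier for grouping and plotting.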
Example #1.2 is a variation on the previous script and pulls population by
race/ethnicity for ALL California counties from the 2014-2018 five-year ACS. If I used
“acs1” and “2018”, I’d only obtain data for the largest counties, those with 65,000+
total population!
# Simple Example #1.2: Population by Race/Ethnicity, 2014-2018, All California Counties,
# Table B03002
# If the list of counties is excluded,
# then data is pulled for all counties in the State
######################################################################################
AllCalCounties <- get_acs(survey="acs5", year=2018, geography = "county",
                          state = "CA", show_call = TRUE, output="wide",
                          table="B03002")
Example #1.3 pulls out population by race/ethnicity for ALL Congressional Districts in
California, for the single year 2018 ACS.
# Simple Example #1.3: Population by Race/Ethnicity, 2018, California Congress Dists,
# Table C03002
# This example pulls the congressional districts from California. Eliminate
# state="CA" to get congressional districts from the entire United States
######################################################################################
congdist1 <- get_acs(survey="acs1", year=2018, geography = "congressional district",
                     state = "CA", show_call = TRUE, output="wide",
                     table="C03002")
Example #1.4 names the variables using mnemonic names for population by race/ethnicity,
2018, single-year ACS, Bay Area counties. I’m using the janitor package’s “adorn_totals”
function to sum up regional totals.
The tidycensus package will append “E” to variable estimates and “M” to variable margins
of error (90 percent confidence level, by default). So, the variable “White_NH_E” will
mean, to me, at least, “Estimates of White Non-Hispanic Population” and “White_NH_M” will
mean: “Margin of Error, 90% confidence level, of White Non-Hispanic Population.”
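Since the margins of error come back at the 90 percent level, it can help to see the arithmetic that relates them to a standard error and to a 95 percent margin. This is a minimal sketch, assuming the usual normal-approximation multipliers (1.645 for 90 percent, 1.96 for 95 percent) and a made-up estimate and margin, not real ACS values. The get_acs() function also takes a moe_level argument if you would rather have tidycensus return a different level directly.

```r
# Made-up values for illustration only, NOT real ACS data.
White_NH_E <- 50000   # hypothetical estimate
White_NH_M <- 1645    # hypothetical 90 percent margin of error

# Standard error implied by a 90 percent margin of error.
std_error <- White_NH_M / 1.645

# The same uncertainty expressed as a 95 percent margin of error.
moe_95 <- std_error * 1.96

# 90 percent confidence interval around the estimate.
ci_90 <- c(lower = White_NH_E - White_NH_M, upper = White_NH_E + White_NH_M)

print(std_error)
print(moe_95)
print(ci_90)
```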
# Simple Example #1.4.1: Population by Race/Ethnicity: Bay Counties: Naming Variables.
# User-defined mnemonic variable names, since "C03002_001E" doesn't fall
# trippingly on the tongue!
# The underscore is useful since tidycensus will append "E" to estimates and
# "M" to margin of error variables, e.g., "Total_E" and "Total_M"
######################################################################################
county2 <- get_acs(survey="acs1", year=2018, geography = "county", state = "CA",
                   county=c(1,13,41,55,75,81,85,95,97),
                   show_call = TRUE, output="wide",
                   variables = c(Total_    = "C03002_001",  # Universe is Total Population
                                 White_NH_ = "C03002_003",  # Non-Hispanic White
                                 Black_NH_ = "C03002_004",  # Non-Hispanic Black
                                 AIAN_NH_  = "C03002_005",  # NH, American Indian & Alaskan Native
                                 Asian_NH_ = "C03002_006",  # Non-Hispanic Asian
                                 NHOPI_NH_ = "C03002_007",  # NH, Native Hawaiian & Other Pacific Isl.
                                 Other_NH_ = "C03002_008",  # Non-Hispanic Other
                                 Multi_NH_ = "C03002_009",  # Two-or-More Races, Non-Hispanic
                                 Hispanic_ = "C03002_012")) # Hispanic/Latino
# Sometimes the results of TIDYCENSUS aren't sorted, so:
county2 <- county2[order(county2$GEOID),]
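The GEOID used for sorting here is the two-digit state FIPS code ("06" for California) followed by the three-digit county FIPS code. As a small sketch of how the numeric county codes passed to get_acs() line up with those GEOIDs, the following needs no API call. The vector name bay_area is mine, purely for illustration.

```r
# County FIPS codes for the nine Bay Area counties, as used in the county= argument.
bay_area <- c("Alameda" = 1, "Contra Costa" = 13, "Marin" = 41, "Napa" = 55,
              "San Francisco" = 75, "San Mateo" = 81, "Santa Clara" = 85,
              "Solano" = 95, "Sonoma" = 97)

# State FIPS "06" plus the county code zero-padded to three digits.
geoids <- sprintf("06%03d", bay_area)
names(geoids) <- names(bay_area)

print(geoids)
```

A made-up GEOID such as "06888" is safe as a filler for a regional total row because no California county carries the code 888.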
###########################################################################
# Simple Example #1.4.2: Add a new record: SF Bay Area, as sum of records 1-9
# adorn_totals is a function from the package janitor.
# The name="06888" is arbitrary, just a filler for the GEOID column.
tempxxx <- adorn_totals(county2,name="06888")
tempxxx[10,2]="San Francisco Bay Area"
county3 <- tempxxx
# Set a working directory, and write out CSV files as wanted.
# This is an example for a Mac, with the folder tidycensus_work on the desktop, and
# the folder output within tidycensus_work
setwd("~/Desktop/tidycensus_work/output")
write.csv(county3,"ACS18_BayAreaCounties.csv")
#############################################################################
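One caution about the regional total row: adorn_totals() adds up every numeric column, so the “_M” values in the Bay Area row are plain sums of the county margins of error. The usual approximation for the margin of error of a sum is the square root of the sum of the squared margins, which tidycensus offers as moe_sum(). Here is a sketch in base R with made-up margins (NOT real ACS values) showing how far apart the two can be.

```r
# Hypothetical county margins of error, for illustration only.
county_moe <- c(300, 400, 1200)

# What a plain column sum (as in a totals row) would report.
naive_sum <- sum(county_moe)

# Root-sum-of-squares approximation for the margin of error of the summed estimate.
rss_moe <- sqrt(sum(county_moe^2))

print(naive_sum)
print(rss_moe)
```

If the regional margins matter for your table, it may be worth overwriting the “_M” cells in the total row with values computed this way.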
At the end of this step I’m writing out CSV (comma-separated values) files, which I then
open in Excel for finishing touches to tables, manually editing the variable names to
something less cryptic.
That’s all for today!
Chuck Purvis,
Retired Person, Hayward, California
(Formerly of the Metropolitan Transportation Commission, San Francisco, California)
Take care!!