Example #1. Tidycensus examples: one year, multiple geographies, multiple variables
This is an example of a simple script to “pull” select variables from the ACS using my Census API key and the R package tidycensus.
Step #0. You always need to load the relevant packages (libraries) when starting a new R session. Otherwise, the code below won’t work.
# Step 0: Load relevant libraries into each R session.
library(tidyverse)
library(tidycensus)
library(janitor)
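A small sketch, not part of the original script, that reports which of these packages are missing before you call library(), and checks whether a Census API key is already stored. It assumes tidycensus reads the key from the CENSUS_API_KEY environment variable, which is where census_api_key(..., install = TRUE) writes it. The helper name missing_packages() and the key placeholder are hypothetical.

```r
# Hypothetical helper: return the packages in `pkgs` that cannot be loaded
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

needed <- c("tidyverse", "tidycensus", "janitor")
to_install <- missing_packages(needed)
if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install from CRAN
}

# Assumption: tidycensus looks for the key in CENSUS_API_KEY.
# census_api_key("YOUR_KEY_HERE", install = TRUE) saves it to ~/.Renviron once;
# "YOUR_KEY_HERE" is a placeholder, not a real key.
has_census_key <- nzchar(Sys.getenv("CENSUS_API_KEY"))
```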
Example #1.1 pulls 2018 single-year ACS data for the nine San Francisco Bay Area counties from table C03002 (population by race/ethnicity). The result goes into a “data frame” called “county1”.
1. survey = "acs1": am I using the 1-year or 5-year database?
2. year = 2018: what’s the last year of the 1-year/5-year database?
3. geography = "county": what level of geography am I pulling? US? State? County? Congressional District? Place?
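To make the survey/year pairing concrete, here is a hypothetical helper, acs_period(), that is not part of tidycensus. It assumes the rule described above: a 1-year release covers only `year`, while a 5-year release covers the five years ending in `year`.

```r
# Hypothetical helper: label the calendar years covered by an ACS release
acs_period <- function(survey, year) {
  stopifnot(survey %in% c("acs1", "acs5"))  # only the two surveys used here
  span  <- if (survey == "acs1") 1L else 5L
  start <- year - span + 1L
  if (start == year) as.character(year) else paste0(start, "-", year)
}

acs_period("acs1", 2018)  # "2018"
acs_period("acs5", 2018)  # "2014-2018"
```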
See the tidycensus documentation, and the author’s website, for all of this and more!
https://walker-data.com/tidycensus/articles/basic-usage.html
https://cran.r-project.org/web/packages/tidycensus/tidycensus.pdf
https://www.rdocumentation.org/packages/tidycensus/
# Simple Example #1.1: Population by Race/Ethnicity, 2018, SF Bay, Table C03002
#
# Note that tidycensus can use either the county name or the county FIPS code number.
#
# Experiment with output="wide" versus output="tidy" ("tidy" is the default.)
#####################################################################################
county1 <- get_acs(survey = "acs1", year = 2018, geography = "county", state = "CA",
                   # Alternatively, select the counties by FIPS code:
                   # county = c(1, 13, 41, 55, 75, 81, 85, 95, 97),
                   county = c("Alameda", "Contra Costa", "Marin", "Napa", "San Francisco",
                              "San Mateo", "Santa Clara", "Solano", "Sonoma"),
                   show_call = TRUE, output = "wide",
                   table = "C03002")
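To see what output="wide" does, here is a base-R sketch that turns a tidy-style table (one row per geography and variable, with estimate and moe columns) into a wide one with E and M suffixes. The numbers are made up, and tidy_to_wide() is a hypothetical helper; tidycensus does this reshaping for you when you pass output = "wide", and the column order it produces may differ.

```r
# Made-up rows in the tidy layout (GEOID, NAME, variable, estimate, moe)
tidy_example <- data.frame(
  GEOID    = c("06001", "06001", "06013", "06013"),
  NAME     = c("Alameda County, California", "Alameda County, California",
               "Contra Costa County, California", "Contra Costa County, California"),
  variable = c("C03002_001", "C03002_003", "C03002_001", "C03002_003"),
  estimate = c(100, 40, 200, 90),
  moe      = c(5, 3, 6, 4),
  stringsAsFactors = FALSE
)

# Spread each variable into an estimate column (suffix E) and an MOE column (suffix M)
tidy_to_wide <- function(tidy) {
  wide <- unique(tidy[, c("GEOID", "NAME")])
  for (v in unique(tidy$variable)) {
    rows <- tidy[tidy$variable == v, ]
    idx  <- match(wide$GEOID, rows$GEOID)  # align rows by geography
    wide[[paste0(v, "E")]] <- rows$estimate[idx]
    wide[[paste0(v, "M")]] <- rows$moe[idx]
  }
  rownames(wide) <- NULL
  wide
}

tidy_to_wide(tidy_example)
```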
Example #1.2 is a variation on the previous script and pulls population by race/ethnicity for ALL California counties from the 2014-2018 five-year ACS, this time using table B03002 (the detailed version of the collapsed table C03002). If I used “acs1” and 2018, I’d only obtain data for the largest counties, those with 65,000+ total population!
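The 65,000 threshold can be written as a tiny rule of thumb for picking a survey. choose_acs_survey() is a hypothetical helper, and the populations in the usage line are illustrative, not real county figures.

```r
# Rule of thumb: 1-year ACS estimates are published only for areas with 65,000+ people,
# so smaller areas need the 5-year ACS
choose_acs_survey <- function(population, threshold = 65000) {
  ifelse(population >= threshold, "acs1", "acs5")
}

choose_acs_survey(c(1500000, 40000))  # "acs1" "acs5"
```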
# Simple Example #1.2: Population by Race/Ethnicity, 2014-2018, All California Counties, Table B03002
#
# If the list of counties is excluded,
# then data is pulled for all counties in the state.
######################################################################################
AllCalCounties <- get_acs(survey = "acs5", year = 2018, geography = "county",
                          state = "CA",
                          show_call = TRUE,
                          output = "wide", table = "B03002")
Example #1.3 pulls out population by race/ethnicity for ALL Congressional Districts in California, from the single-year 2018 ACS.
# Simple Example #1.3: Population by Race/Ethnicity, 2018, California Congress Dists, Table C03002
#
# This example pulls the congressional districts from California.
# Eliminate state="CA" to get congressional districts from the entire United States.
######################################################################################
congdist1 <- get_acs(survey = "acs1", year = 2018, geography = "congressional district",
                     state = "CA",
                     show_call = TRUE, output = "wide", table = "C03002")
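If you drop state = "CA" and pull every district in the country, you may want to split the GEOID back into its parts. This sketch assumes the congressional-district GEOID is the 2-digit state FIPS code followed by a 2-digit district number, so California districts start with "06". split_cd_geoid() and congdist_all are hypothetical names.

```r
# Hypothetical helper: split 4-character district GEOIDs into state and district codes
split_cd_geoid <- function(geoid) {
  stopifnot(all(nchar(geoid) == 4))  # assumes 2-digit state + 2-digit district
  data.frame(
    GEOID      = geoid,
    state_fips = substr(geoid, 1, 2),
    district   = substr(geoid, 3, 4),
    stringsAsFactors = FALSE
  )
}

# Keeping only California rows from a hypothetical nationwide pull might look like:
# ca_only <- congdist_all[substr(congdist_all$GEOID, 1, 2) == "06", ]
split_cd_geoid(c("0601", "0612", "3601"))
```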
Example #1.4 gives the variables mnemonic names, again for population by race/ethnicity, 2018 single-year ACS, Bay Area counties. I’m using the janitor package’s adorn_totals function to sum up regional totals.
With output="wide", the tidycensus package appends “E” to variable estimates and “M” to variable margins of error (90 percent confidence level, by default). So, the variable “White_NH_E” will mean, to me at least, “Estimate of White Non-Hispanic Population,” and “White_NH_M” will mean “Margin of Error, 90% confidence level, of White Non-Hispanic Population.”
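Since the _M columns are 90 percent margins of error, a few standard conversions are handy. This sketch assumes the Census Bureau’s usual factors (1.645 for 90 percent, 1.96 for 95 percent); the helper names are hypothetical. If you want 95 percent MOEs from the start, get_acs() also accepts a moe_level argument.

```r
# Convert a 90% margin of error (MOE) to a standard error
moe90_to_se <- function(moe90) moe90 / 1.645

# Rescale a 90% MOE to a 95% MOE
moe90_to_moe95 <- function(moe90) moe90 * 1.96 / 1.645

# Coefficient of variation, in percent: standard error relative to the estimate
cv_percent <- function(estimate, moe90) 100 * moe90_to_se(moe90) / estimate
```

For example, once county2 is built, cv_percent(county2$White_NH_E, county2$White_NH_M) gives a CV for each county; a large CV flags an estimate to treat with caution.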
# Simple Example #1.4.1: Population by Race/Ethnicity: Bay Counties: Naming Variables.
#
# User-defined mnemonic variable names, since "C03002_001E"
# doesn't fall trippingly on the tongue!
#
# The trailing underscore is useful since tidycensus will append "E" to
# estimates and "M" to margins of error, e.g., "Total_E" and "Total_M".
######################################################################################
county2 <- get_acs(survey = "acs1", year = 2018, geography = "county", state = "CA",
                   county = c(1, 13, 41, 55, 75, 81, 85, 95, 97),
                   show_call = TRUE,
                   output = "wide",
                   variables = c(Total_    = "C03002_001",  # Universe is Total Population
                                 White_NH_ = "C03002_003",  # Non-Hispanic White
                                 Black_NH_ = "C03002_004",  # Non-Hispanic Black
                                 AIAN_NH_  = "C03002_005",  # NH, American Indian & Alaska Native
                                 Asian_NH_ = "C03002_006",  # Non-Hispanic Asian
                                 NHOPI_NH_ = "C03002_007",  # NH, Native Hawaiian & Other Pacific Isl.
                                 Other_NH_ = "C03002_008",  # Non-Hispanic Other
                                 Multi_NH_ = "C03002_009",  # Two-or-More Races, Non-Hispanic
                                 Hispanic_ = "C03002_012")) # Hispanic/Latino
# Sometimes the results of tidycensus aren't sorted, so:
county2 <- county2[order(county2$GEOID), ]
###########################################################################
# Simple Example #1.4.2: Add a new record: SF Bay Area, as sum of records 1-9
# adorn_totals is a function from the package janitor.
# The name="06888" is arbitrary, just a filler for the GEOID column.
# Caution: adorn_totals also adds up the "_M" (margin of error) columns;
# a simple sum is not a valid MOE for the regional total.
tempxxx <- adorn_totals(county2, name = "06888")
tempxxx[10, 2] <- "San Francisco Bay Area"
county3 <- tempxxx
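Because adorn_totals() adds the _M columns like any other numbers, the MOE in the Bay Area row is overstated. A common approximation is the square root of the sum of squared MOEs; tidycensus’s moe_sum() implements this idea, with extra handling for zero estimates that this sketch skips. moe_rss() and fix_total_moes() are hypothetical helpers, and the sketch assumes the totals row is a known row, as row 10 is in county3.

```r
# Approximate MOE of a sum: square root of the sum of squared component MOEs
moe_rss <- function(moe) sqrt(sum(moe^2, na.rm = TRUE))

# Replace the summed "_M" values in the totals row with root-sum-of-squares MOEs
fix_total_moes <- function(df, total_row) {
  moe_cols <- grep("_M$", names(df), value = TRUE)
  for (col in moe_cols) {
    df[[col]][total_row] <- moe_rss(df[[col]][-total_row])
  }
  df
}
```

Usage would be county3 <- fix_total_moes(county3, total_row = 10).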
# Set a working directory, and write out CSV files as wanted.
# This is an example for a Mac, with the folder tidycensus_work on the desktop, and
# the folder output within tidycensus_work.
setwd("~/Desktop/tidycensus_work/output")
write.csv(county3, "ACS18_BayAreaCounties.csv", row.names = FALSE)  # skip R's row-number column
#############################################################################
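As an alternative to setwd(), which changes the working directory for everything that follows, you could build the output path explicitly. write_output_csv() is a hypothetical helper; its default folder mirrors the Mac example above.

```r
# Write a data frame to CSV in an output folder, creating the folder if needed
write_output_csv <- function(df, filename,
                             out_dir = file.path("~", "Desktop", "tidycensus_work", "output")) {
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
  path <- file.path(out_dir, filename)
  write.csv(df, path, row.names = FALSE)  # no extra row-number column
  invisible(path)
}
```

Usage would be write_output_csv(county3, "ACS18_BayAreaCounties.csv"), which leaves the working directory unchanged.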
At the end of this step I’m writing out CSV (comma-separated values) files, which I then open in Excel for finishing touches to the tables, manually editing the variable names to something less cryptic: