Example #1. Tidycensus examples: one year, multiple geographies, multiple variables

This is an example of a simple script to “pull” selected variables from the American Community Survey (ACS) using my Census API key and the R package tidycensus.

Step #0. You always need to load the relevant packages (libraries) when starting a new R session; otherwise, the functions used below won't be found.

Comments start with a hash sign “#”. They're easier to spot in RStudio, which colors them differently.

# Step 0: Load relevant libraries into each R-session.

library(tidyverse)

library(tidycensus)

library(janitor)
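The script assumes tidycensus already has access to your Census API key. Below is a minimal sketch of registering one in the current session with tidycensus's census_api_key(). "YOUR_KEY_HERE" is a placeholder, not a real key. If you call it with install = TRUE instead, the key is written to your .Renviron file, so future sessions pick it up automatically.

```r
library(tidycensus)

# Placeholder: replace with the free key the Census Bureau sends you.
# Without install = TRUE this only sets the key for the current session.
census_api_key("YOUR_KEY_HERE")

# tidycensus reads the key from this environment variable in later get_acs() calls
Sys.getenv("CENSUS_API_KEY")
```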

Example #1.1 pulls 2018 1-year ACS data for the San Francisco Bay Area counties from table C03002 (population by race/ethnicity) into a data frame called “county1”.

Though the arguments (survey, year, geography, etc.) can appear in any order within the get_acs() call, I prefer leading with:

1. survey = "acs1": am I using the 1-year or the 5-year data?

2. year = 2018: the last year of the 1-year or 5-year period (with survey = "acs5", year = 2018 means 2014-2018).

3. geography = "county": what level of geography am I pulling? US? State? County? Congressional District? Place?

See the tidycensus documentation, and the author’s website, for all of this and more!

https://walker-data.com/tidycensus/articles/basic-usage.html

https://cran.r-project.org/web/packages/tidycensus/tidycensus.pdf

https://www.rdocumentation.org/packages/tidycensus/

# Simple Example #1.1: Population by Race/Ethnicity, 2018, SF Bay, Table C03002

# Note that tidycensus can use either the County Name or the County FIPS Code number.

# Experiment with output="wide" versus output="tidy" ("tidy" is the default.)

#####################################################################################

county1 <- get_acs(survey="acs1", year=2018, geography = "county", state = "CA",

# county=c(1,13,41,55,75,81,85,95,97),

county=c("Alameda","Contra Costa","Marin","Napa","San Francisco",

"San Mateo","Santa Clara","Solano","Sonoma"),

show_call = TRUE, output="wide",

table="C03002")
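Example #1.1 asks for output="wide". The difference from the default "tidy" layout is easiest to see on a small frame. The sketch below uses made-up numbers in the tidy shape (GEOID, NAME, variable, estimate, moe). It reshapes them with tidyr's pivot_wider() into roughly the layout that output = "wide" returns: one row per geography, with an estimate column ending in E and a margin-of-error column ending in M for each variable. The object names and values are hypothetical, and the column order may differ from tidycensus's own output.

```r
library(tidyverse)

# Hypothetical toy rows shaped like get_acs(output = "tidy"); numbers are made up
tidy_toy <- tibble(
  GEOID    = c("06001", "06001", "06013", "06013"),
  NAME     = c("County A", "County A", "County B", "County B"),
  variable = c("C03002_001", "C03002_003", "C03002_001", "C03002_003"),
  estimate = c(1000, 400, 800, 350),
  moe      = c(10, 20, 12, 25)
)

# One row per geography; each variable gets an "E" and an "M" column,
# e.g. C03002_001E and C03002_001M
wide_toy <- tidy_toy %>%
  rename(E = estimate, M = moe) %>%
  pivot_wider(names_from = variable,
              values_from = c(E, M),
              names_glue = "{variable}{.value}")

wide_toy
```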

Example #1.2 is a variation on the previous script and pulls population by race/ethnicity for ALL California counties from the 2014-2018 five-year ACS (table B03002). If I used survey = "acs1" with year = 2018, I'd only obtain data for the larger counties, those with a total population of 65,000 or more!

# Simple Example #1.2: Population by Race/Ethnicity, 2014-2018, All California Counties, Table B03002

# If the list of counties is excluded,

# then data is pulled for all counties in the State

######################################################################################

AllCalCounties <- get_acs(survey="acs5", year=2018, geography = "county",

state = "CA", show_call = TRUE, output="wide", table="B03002")

Example #1.3 pulls population by race/ethnicity for ALL congressional districts in California from the 2018 1-year ACS.

# Simple Example #1.3: Population by Race/Ethnicity, 2018, California Congress Dists, Table C03002

# This example pulls the congressional districts from California. Eliminate state="CA" to get congressional districts from the entire United States

######################################################################################

congdist1 <- get_acs(survey="acs1", year=2018, geography = "congressional district",

state = "CA", show_call = TRUE, output="wide", table="C03002")

Example #1.4 gives the variables mnemonic names: population by race/ethnicity, 2018 1-year ACS, Bay Area counties. I'm using the janitor package's adorn_totals() function to sum up regional totals.

The tidycensus package appends “E” to variable estimates and “M” to their margins of error (90 percent confidence level, by default). So, to me at least, “White_NH_E” means “Estimate of the Non-Hispanic White Population” and “White_NH_M” means “Margin of Error, 90% confidence level, of the Non-Hispanic White Population.”
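Because the default margins of error are at the 90 percent level (z = 1.645), converting them to another confidence level means going through the standard error. get_acs() can also return other levels directly through its moe_level argument (90, 95 or 99). The sketch below shows the arithmetic with a hypothetical helper, rescale_moe(), which is not part of tidycensus. The input MOEs are made-up toy values.

```r
# Hypothetical helper (not part of tidycensus): rescale a 90% ACS margin of
# error to another confidence level via the standard error.
rescale_moe <- function(moe90, level = 95) {
  z <- c(`90` = 1.645, `95` = 1.960, `99` = 2.576)
  se <- moe90 / z[["90"]]            # standard error = MOE90 / 1.645
  se * z[[as.character(level)]]      # MOE at the requested confidence level
}

rescale_moe(c(1645, 329))   # approximately 1960 and 392
```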

# Simple Example #1.4.1: Population by Race/Ethnicity: Bay Counties: Naming Variables.

# User-defined mnemonic variable names, since "C03002_001E" doesn't fall trippingly on the tongue!

# the underscore is useful since tidycensus will append "E" to estimates and "M" to margin of error

# variables, e.g., "Total_E" and "Total_M"

######################################################################################

county2 <- get_acs(survey="acs1", year=2018, geography = "county", state = "CA",

county=c(1,13,41,55,75,81,85,95,97),

show_call = TRUE, output="wide",

variables = c(Total_ = "C03002_001", # Universe is Total Population

White_NH_ = "C03002_003", # Non-Hispanic White

Black_NH_ = "C03002_004", # Non-Hispanic Black

AIAN_NH_ = "C03002_005", # NH, American Indian & Alaskan Native

Asian_NH_ = "C03002_006", # Non-Hispanic Asian

NHOPI_NH_ = "C03002_007", # NH, Native Hawaiian & Other Pacific Isl.

Other_NH_ = "C03002_008", # Non-Hispanic Other

Multi_NH_ = "C03002_009", # Two-or-More Races, Non-Hispanic

Hispanic_ = "C03002_012")) # Hispanic/Latino

# The rows returned by tidycensus aren't always sorted by GEOID, so sort them:

county2 <- county2[order(county2$GEOID),]
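If you instead pulled a whole table with table = (as in Example #1.1), you can apply the same mnemonic names afterwards. The sketch below assumes a wide result whose columns follow the C03002_001E / C03002_001M pattern. toy_wide and its values are made up, and only two of the nine variables are shown.

```r
library(tidyverse)

# Mnemonic prefixes as in Example #1.4.1 (subset shown)
mnemonics <- c(Total_ = "C03002_001", White_NH_ = "C03002_003")

# Build a new-name = old-name lookup covering both the E and M columns
lookup <- c(
  setNames(paste0(mnemonics, "E"), paste0(names(mnemonics), "E")),
  setNames(paste0(mnemonics, "M"), paste0(names(mnemonics), "M"))
)

# Hypothetical toy row shaped like a wide table= pull; values are made up
toy_wide <- tibble(GEOID = "06001", NAME = "County A",
                   C03002_001E = 1000, C03002_001M = 10,
                   C03002_003E = 400,  C03002_003M = 20)

# rename() keeps column positions; any_of() skips lookup entries not present
renamed <- rename(toy_wide, any_of(lookup))
names(renamed)
```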

###########################################################################

# Simple Example #1.4.2: Add a new record: SF Bay Area, as sum of records 1-9

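# adorn_totals is a function from the package janitor. It sums every numeric column,

# including the "_M" margin-of-error columns, so the regional "_M" values are not valid MOEs.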

# The name="06888" is arbitrary, just a filler for the GEOID column.

tempxxx <- adorn_totals(county2,name="06888")

tempxxx$NAME[nrow(tempxxx)] <- "San Francisco Bay Area"  # the totals row is the last row

county3 <- tempxxx
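Margins of error do not add linearly the way estimates do. A common approximation for the MOE of a sum is the square root of the sum of the squared MOEs, and tidycensus provides this as moe_sum(). The sketch below builds the regional row by hand on made-up numbers. toy and region are hypothetical names, and the values are not real ACS figures.

```r
library(tidyverse)
library(tidycensus)

# Hypothetical toy values, NOT real ACS numbers
toy <- tibble(GEOID   = c("06001", "06013"),
              NAME    = c("County A", "County B"),
              Total_E = c(100, 200),
              Total_M = c(30, 40))

region <- toy %>%
  summarise(across(ends_with("_E"), sum),       # estimates add directly
            across(ends_with("_M"), moe_sum)) %>%  # sqrt of the sum of squared MOEs
  mutate(GEOID = "06888", NAME = "San Francisco Bay Area", .before = 1)

toy_with_total <- bind_rows(toy, region)
toy_with_total
```

With the two toy counties, the regional MOE is sqrt(30^2 + 40^2) = 50, whereas a plain column sum would report 70.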

# Set a working directory, and write out CSV files as wanted.

# This is an example for a Mac, with the folder tidycensus_work on the desktop, and

# the folder output within tidycensus_work

setwd("~/Desktop/tidycensus_work/output")

write.csv(county3, "ACS18_BayAreaCounties.csv", row.names = FALSE)  # no extra column of row numbers
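An alternative to setwd() is to build the output path with file.path() and leave the working directory alone. The sketch below assumes a small made-up data frame in place of county3. It uses tempdir() only so that it runs anywhere; substitute your own folder. Reading the file back with GEOID as character keeps the leading zero of California's state FIPS code "06". A numeric read, and typically Excel, would drop it.

```r
library(readr)

# Made-up stand-in for county3
out <- data.frame(GEOID   = c("06001", "06888"),
                  NAME    = c("County A", "San Francisco Bay Area"),
                  Total_E = c(100, 300))

# Build the path instead of changing the working directory
out_dir <- file.path(tempdir(), "tidycensus_work", "output")
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
out_file <- file.path(out_dir, "ACS18_BayAreaCounties.csv")

write_csv(out, out_file)   # readr never writes row names

# Keep GEOID as text so "06001" doesn't become 6001
back <- read_csv(out_file, col_types = cols(GEOID = col_character()))
back
```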

#############################################################################

At the end of this step I'm writing out CSV (comma-separated values) files, which I then open in Excel to put the finishing touches on the tables, such as manually editing the variable names into something less cryptic: