Example #2. More Complex Tidycensus examples: multiple years, multiple geographies, multiple variables.

This is a more complex example of a script to “pull” select variables from the ACS using my Census API key and the R package tidycensus.

Step #0. Always need to load the relevant packages/libraries when starting up a new R-session. Otherwise, it won’t work.

# Step 0: Load relevant libraries into each R-session.

library(tidyverse)

library(tidycensus)

In this set of examples, I’m extracting single year ACS variables (2005-2018) for all large (65,000+ population) places in the State of California.

# The get_acs function is run for each year of the single-year ACS data, from 2005 to 2018.

# Note that group quarters data was not collected in 2005, but started in 2006.

# Note the "_05_" included in the variable name in the first data "pull". That's a # # mnemonic device that tells us it's for the year 2005.

# Example 2.1 through 2.14: Run get_acs for large California Places, 2005-2018

# Example 2.15: Merge together data frames into a VERY wide database...lots of columns!

# Example 2.16: Merge in a file of Large San Francisco Bay Area places, and subset file.

#-------------------------------------------------------------------------------

place05 <- get_acs(survey="acs1", year=2005, geography = "place", state = "CA",

show_call = TRUE, output="wide",

variables = c(TotalPop_05_ = "B06001_001", # Total Population

Med_HHInc_05_ = "B19013_001", # Median Household Income

Agg_HHInc_05_ = "B19025_001", # Aggregate Household Income

HHldPop_05_ = "B11002_001", # Population in Households

Househlds_05_ = "B25003_001", # Total Households

Owner_OccDU_05_= "B25003_002", # Owner-Occupied Dwelling Units

Rent_OccDU_05_ = "B25003_003", # Renter-Occupied Dwelling Units

Med_HHVal_05_ = "B25077_001")) # Median Value of Owner-Occ DUs

place05$Avg_HHSize_05 <- place05$HHldPop_05_E / place05$Househlds_05_E

place05$MeanHHInc_05 <- place05$Agg_HHInc_05_E / place05$Househlds_05_E

#------------------------------------------------------------------------------------

place06 <- get_acs(survey="acs1", year=2006, geography = "place", state = "CA",

show_call = TRUE, output="wide",

variables = c(TotalPop_06_ = "B06001_001", # Total Population

Med_HHInc_06_ = "B19013_001", # Median Household Income

Agg_HHInc_06_ = "B19025_001", # Aggregate Household Income

HHldPop_06_ = "B11002_001", # Population in Households

Househlds_06_ = "B25003_001", # Total Households

Owner_OccDU_06_= "B25003_002", # Owner-Occupied Dwelling Units

Rent_OccDU_06_ = "B25003_003", # Renter-Occupied Dwelling Units

Med_HHVal_06_ = "B25077_001")) # Median Value of Owner-Occ DUs

place06$Avg_HHSize_06 <- place06$HHldPop_06_E / place06$Househlds_06_E

place06$MeanHHInc_06 <- place06$Agg_HHInc_06_E / place06$Househlds_06_E

#------------------------------------------------------------------------------------

These sets of codes are repeated for each single year ACS of interest, say for 2005 through 2018. Smarter “R” programmers will be able to tell me about “do loops” to make this process more efficient with magical wild cards.

The following step merges the data frames using the GEOID/NAME variables. This create a very “wide” database. One record per geography, and each column representing the variable/year combinations.

The “merge” function in “R” allows only two data frames to be joined by common columns at a time. I have yet to find a “R” function that allows me to merge all of the data frames at once.

#####################################################################################

# Example 2.15: Merge together data frames into a VERY wide database...lots of columns!

# Merge the dataframes, adding a year in each step. All=TRUE is needed if # of places is different.

# (R-language newbie script...There are probably more terse/exotic ways of doing this!)

place0506 <- merge(place05, place06, by = c('GEOID','NAME'), all=TRUE)

place0507 <- merge(place0506,place07, by = c('GEOID','NAME'), all=TRUE)

place0508 <- merge(place0507,place08, by = c('GEOID','NAME'), all=TRUE)

place0509 <- merge(place0508,place09, by = c('GEOID','NAME'), all=TRUE)

place0510 <- merge(place0509,place10, by = c('GEOID','NAME'), all=TRUE)

place0511 <- merge(place0510,place11, by = c('GEOID','NAME'), all=TRUE)

place0512 <- merge(place0511,place12, by = c('GEOID','NAME'), all=TRUE)

place0513 <- merge(place0512,place13, by = c('GEOID','NAME'), all=TRUE)

place0514 <- merge(place0513,place14, by = c('GEOID','NAME'), all=TRUE)

place0515 <- merge(place0514,place15, by = c('GEOID','NAME'), all=TRUE)

place0516 <- merge(place0515,place16, by = c('GEOID','NAME'), all=TRUE)

place0517 <- merge(place0516,place17, by = c('GEOID','NAME'), all=TRUE)

place0518 <- merge(place0517,place18, by = c('GEOID','NAME'), all=TRUE)

place_all <- place0518

View(place_all)

Sometimes you want to create smaller data frames with just a select number of columns. Here’s a good approach for that.

# The following functions output useful lists to the R-studio console which can then be edited

names(place_all)

dput(names(place_all)) # most useful for subsetting variables

# The purpose here is to re-order and select variables into a much more compact

# database, for eventual exporting into a CSV file, and then into Excel for finishing touches.

selvars <- c("GEOID", "NAME",

"TotalPop_05_E", "TotalPop_06_E", "TotalPop_07_E", "TotalPop_08_E",

"TotalPop_09_E", "TotalPop_10_E", "TotalPop_11_E", "TotalPop_12_E",

"TotalPop_13_E", "TotalPop_14_E", "TotalPop_15_E", "TotalPop_16_E",

"TotalPop_17_E", "TotalPop_18_E")

# note the brackets for outputing new data frame from previous data frame....

place_all2 <- place_all[selvars]

# View the Selected Variables Table

View(place_all2)

# Set directory for exported data files, MacOS directory style

setwd("~/Desktop/tidycensus_work/output")

# Export the data frames to CSV files, for importing to Excel, and applying finishing touches

write.csv(place_all2,"ACS_AllYears_TotalPop_Calif_Places.csv")

write.csv(place_all, "ACS_AllYears_BaseVar_Calif_Places.csv")

In this last example, I’m reading in a file of large places in the Bay Area (manually derived from the CSV file created previous) in order to subset Bay Area “large places” from State of California “large places”.

#####################################################################################

# Example 2.16: Merge in a file of Large San Francisco Bay Area places, and subset file.

# Read in a file with the Large SF Bay Area Places, > 65,000 population

# and merge with the All Large California Places

bayplace <- read.csv("BayArea_Places_65K.csv")

Bayplace1 <- merge(bayplace,place_all, by = c('NAME'))

Bayplace2 <- merge(bayplace,place_all2, by = c('NAME'))

write.csv(Bayplace1,"ACS_AllYears_BaseVar_BayArea_Places.csv")

write.csv(Bayplace2,"ACS_AllYears_TotalPop_BayArea_Places.csv")

This concludes Example #2: “multiple geographies / multiple years/ multiple variables” with only one record (row) per each geography.