Here’s my writeup on creating what I call a “stacked” output from tidycensus: one record for each unique geography/year combination. This approach may be preferable if you’re trying to create a “data profile” for a specific geographic area, with rows representing the years and columns representing the variables of interest (total population, household population, workers by means of transportation to work, etc.).

I’m almost done! Hope this helps!

Chuck Purvis,

Hayward, California

Example #3. More Complex Tidycensus examples: multiple years, multiple geographies, multiple variables. “Stacked” results.

This is an example of stacking “R” data frames, where each record (row) represents a unique geography/year combination.

Step #0. You always need to load the relevant packages/libraries when starting a new R session. I’m loading the “R” package “plyr”, which provides a function for stacking / concatenating / pancaking data frames.

# Step 0: Load relevant libraries into each R-session.

library(plyr) # Needed for rbind.fill. Load it before tidyverse so plyr does not mask dplyr functions.

library(tidyverse)

library(tidycensus)

library(janitor)

In this set of examples, I’m extracting single-year ACS variables (2005-2018) for all large (65,000+ population) places in the State of California. This is very similar to Example #2, but with one record (row) for each geography/year combination.

# Example 3.1 through 3.14: Run get_acs for large California Places, 2005-2018

# Example 3.15: Concatenate (pancake) data frames: lots of records

# Example 3.16: Merge in a file of Large San Francisco Bay Area places, and subset file.

# Example 3.17: Extract data for one place using a string search on the place name

#------------------------------------------------------------------------------------

# Set a list of variables to extract in each iteration of get_acs

# This is a LOT more efficient for variable naming!!!

selvars <- c(TotalPop_ = "B06001_001", # Total Population

Med_HHInc_ = "B19013_001", # Median Household Income

Agg_HHInc_ = "B19025_001", # Aggregate Household Income

HHldPop_ = "B11002_001", # Population in Households

Househlds_ = "B25003_001", # Total Households

Owner_OccDU_= "B25003_002", # Owner-Occupied Dwelling Units

Rent_OccDU_ = "B25003_003", # Renter-Occupied Dwelling Units

Med_HHVal_ = "B25077_001") # Median Value of Owner-Occupied Housing Units

#------------------------------------------------------------------------------------

temp2005 <- get_acs(survey="acs1", year=2005, geography = "place", state = "CA",

show_call = TRUE,output="wide", variables = selvars)

temp2005$Year <- "2005"

#------------------------------------------------------------------------------------

temp2006 <- get_acs(survey="acs1", year=2006, geography = "place", state = "CA",

show_call = TRUE,output="wide", variables = selvars)

temp2006$Year <- "2006"

#------------------------------------------------------------------------------------

temp2007 <- get_acs(survey="acs1", year=2007, geography = "place", state = "CA",

show_call = TRUE,output="wide", variables = selvars)

temp2007$Year <- "2007"

#------------------------------------------------------------------------------------

This block of code is repeated for each single-year ACS of interest (2005 through 2018). Note that I’m adding a new variable “Year” to each data frame. Otherwise, nothing in the data would indicate the year, other than the name of the data frame itself!
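The same repetition can also be written as a loop over the years. Here is a minimal sketch, assuming a hypothetical stand-in function fetch_year() in place of get_acs() so that it runs without a Census API key. In real use, the body of fetch_year() would be the get_acs call shown above with year = yr. The GEOIDs, names, and values are made-up placeholders, not ACS data.

```r
# fetch_year() is a hypothetical stand-in for get_acs(); it returns a tiny
# made-up data frame so the looping pattern can be run offline.
fetch_year <- function(yr) {
  df <- data.frame(GEOID = c("0600001", "0600002"),
                   NAME = c("Place A", "Place B"),
                   TotalPop_E = c(100, 200),
                   stringsAsFactors = FALSE)
  df$Year <- as.character(yr)  # tag each row with its year, as in the steps above
  df
}

years <- 2005:2007
yearly <- lapply(years, fetch_year)  # one data frame per year
stacked <- do.call(rbind, yearly)    # stack them into a single data frame

nrow(stacked)         # 2 places x 3 years
unique(stacked$Year)
```

This trades the fourteen named data frames (temp2005, temp2006, ...) for a single list, which may or may not be preferable if you want to inspect an individual year.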

In the following “R” step, I’m using the “plyr” function “rbind.fill” to concatenate a lot of data frames!

# Example 3.15: Concatenate (pancake) data frames: lots of records

# Concatenate All Years .....

# rbind requires every data frame to have exactly the same columns. rbind.fill also

# stacks data frames whose columns differ, filling the gaps with NA. It's a plyr function.

# temp0506 <- rbind(temp2005,temp2006)

# temp0507 <- rbind(temp0506,temp2007)

tempall <- rbind.fill(temp2005,temp2006,temp2007,temp2008,temp2009,

temp2010,temp2011,temp2012,temp2013,temp2014,

temp2015,temp2016,temp2017,temp2018)
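To illustrate what the “fill” part buys you, here is a small base-R sketch. The helper fill_stack() is hypothetical and only imitates the idea behind rbind.fill (it is not plyr's implementation); the data frames are toy values.

```r
# Toy data frames; column names follow the ones used above, values are made up.
a  <- data.frame(GEOID = "1", Year = "2005", TotalPop_E = 100)
b  <- data.frame(GEOID = "1", Year = "2006", TotalPop_E = 110)
c3 <- data.frame(GEOID = "1", Year = "2007")  # TotalPop_E is missing here

# Base rbind() accepts any number of data frames when the columns match.
ab <- rbind(a, b, a)

# With mismatched columns rbind() stops with an error; capture it to show that.
rbind_failed <- inherits(try(rbind(a, b, c3), silent = TRUE), "try-error")

# fill_stack() is a hypothetical helper: add missing columns as NA, then rbind.
fill_stack <- function(...) {
  dfs <- list(...)
  all_cols <- unique(unlist(lapply(dfs, names)))
  dfs <- lapply(dfs, function(d) {
    for (col in setdiff(all_cols, names(d))) d[[col]] <- NA
    d[all_cols]  # put the columns in a common order
  })
  do.call(rbind, dfs)
}

filled <- fill_stack(a, b, c3)
filled$TotalPop_E  # the 2007 row gets NA
```

If every year returns the same set of columns, plain rbind is enough; the fill behavior only matters when a variable is missing from some years.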

# Add a couple of useful variables!

# Need an if/then to catch zero values; work on this later.

# tempall$Avg_HHSize <- tempall$HHldPop_E / tempall$Househlds_E

# tempall$MeanHHInc <- tempall$Agg_HHInc_E / tempall$Househlds_E
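One possible way to handle the zero-value problem is sketched below. In R, dividing a positive number by zero gives Inf and 0/0 gives NaN, so the sketch returns NA wherever the denominator is missing or not positive. The helper name safe_ratio() and the demo data frame are assumptions for illustration only.

```r
# Toy rows; one place has zero households and one is missing (values are made up).
tempall_demo <- data.frame(HHldPop_E = c(300, 0, 250),
                           Househlds_E = c(100, 0, NA))

# safe_ratio() is a hypothetical helper: divide only where the denominator
# is a positive number, otherwise return NA.
safe_ratio <- function(num, den) {
  ok <- !is.na(den) & den > 0
  out <- rep(NA_real_, length(num))
  out[ok] <- num[ok] / den[ok]
  out
}

tempall_demo$Avg_HHSize <- safe_ratio(tempall_demo$HHldPop_E,
                                      tempall_demo$Househlds_E)
tempall_demo
```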

# Sort the Results by GEOID and then by Year

tempalls <- tempall[order(tempall$GEOID,tempall$Year),]

setwd("~/Desktop/tidycensus_work/output")

# Export the data frames to CSV files, for importing to Excel, and applying finishing touches

write.csv(tempalls,"ACS_AllYears_Calif_Places_Stacked.csv")

In the following step I’m extracting data for large places in the San Francisco Bay Area.

# Example 3.16: Merge in a file of Large San Francisco Bay Area places, and subset file.

# Read in a file with the Large SF Bay Area Places, > 65,000 population

# and merge with the All Large California Places

bayplace <- read.csv("BayArea_Places_65K.csv")

Bayplace1 <- merge(bayplace,tempalls, by = c('NAME'))

Bayplace1 <- Bayplace1[order(Bayplace1$GEOID.x,Bayplace1$Year),]

write.csv(Bayplace1,"ACS_AllYears_BaseVar_BayArea_Places_Stacked.csv")

dput(names(Bayplace1))
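A note on where “GEOID.x” comes from: when both inputs to merge share a column that is not in the “by” list, merge keeps both copies and adds the suffixes “.x” and “.y”. The toy sketch below assumes the Bay Area file has its own GEOID column (which the GEOID.x name above suggests); the names and values are made up.

```r
# Toy stand-ins for the two inputs; names and values are made up.
bay_demo <- data.frame(NAME = c("Place A", "Place B"),
                       GEOID = c("01", "02"),
                       NAME2 = c("A", "B"),
                       stringsAsFactors = FALSE)
all_demo <- data.frame(NAME = c("Place A", "Place A", "Place C"),
                       GEOID = c("01", "01", "03"),
                       Year = c("2005", "2006", "2005"),
                       stringsAsFactors = FALSE)

# merge() defaults to an inner join: only NAME values present in both survive.
m <- merge(bay_demo, all_demo, by = "NAME")

names(m)  # GEOID appears twice, as GEOID.x (from bay_demo) and GEOID.y
nrow(m)   # only the two "Place A" rows match
```

Because the default is an inner join, any place whose NAME is spelled differently in the two files silently drops out, so it is worth checking the row count after the merge.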

In the following step I’m extracting data for just “Hayward” city in the San Francisco Bay Area. This uses the “R” function “grepl”. (That’s grep-ell, not grep-one).

# Example 3.17: Extract data for one place using a string search on the place name

# Extract one place at a time from Bayplace1

Hayward <- filter(Bayplace1, grepl("Hayward",NAME,fixed=TRUE))

Hayward <- Hayward[order(Hayward$Year),]

Hayward$Avg_HHSize <- Hayward$HHldPop_E / Hayward$Househlds_E

Hayward$MeanHHInc <- Hayward$Agg_HHInc_E / Hayward$Househlds_E

selvarxxx <- c("Year","NAME", "GEOID.x", "NAME2",

"TotalPop_E", "Med_HHInc_E",

"Agg_HHInc_E", "HHldPop_E", "Househlds_E",

"Owner_OccDU_E", "Rent_OccDU_E",

"Med_HHVal_E", "Avg_HHSize", "MeanHHInc" )

Hayward2 <- Hayward[selvarxxx]

write.csv(Hayward2,"ACS_AllYears_BaseVar_Hayward_Stacked.csv")
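For readers new to grepl, here is a small standalone sketch of what fixed=TRUE does. The place names are just sample strings in the style the ACS uses.

```r
place_names <- c("Hayward city, California",
                 "Oakland city, California",
                 "San Jose city, California")

# fixed = TRUE matches the pattern as literal text, not as a regular expression.
is_hayward <- grepl("Hayward", place_names, fixed = TRUE)
place_names[is_hayward]

# Where it matters: "." is a wildcard in a regex but a plain period when fixed.
grepl("St.", "Stockton city, California")                # regex: "." matches the "o"
grepl("St.", "Stockton city, California", fixed = TRUE)  # literal "St." is not present
```

For a plain word like “Hayward” both modes give the same answer; fixed=TRUE simply guards against punctuation in the search string being read as regex syntax.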

#####################################################################################

This concludes Example #3: “multiple geographies / multiple years / multiple variables” with only one record (row) for each geography/year combination, or the “stacked” output.