Here’s my writeup on creating what I call a “stacked” output from tidycensus: one
record per unique geography / year combination. This approach may be preferred if
you’re trying to create a “data profile” for a specific geographic area, with rows
representing the years and columns representing the various variables of interest (total
population, household population, workers by means of transportation to work, etc.).
I’m almost done! Hope this helps!
Chuck Purvis,
Hayward, California
Example #3. More Complex Tidycensus examples: multiple years, multiple geographies,
multiple variables. “Stacked” results.
This is an example of stacking “R” data frames, where each record (row) represents a
unique geography/year combination.
Step #0. You always need to load the relevant packages/libraries when starting up a new
R session. Here I’m also loading the “R” package “plyr”, which helps in stacking /
concatenating / pancaking data frames.
# Step 0: Load relevant libraries into each R-session.
library(tidyverse)
library(tidycensus)
library(janitor)
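library(plyr) # Needed for rbind.fill(), which concatenates a lot of data frames in one statement!
# Loading plyr after the tidyverse prints a warning and masks some dplyr functions
# (mutate(), summarise(), arrange(), ...); this example doesn't use those.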
In this set of examples, I’m extracting single-year ACS variables (2005-2018) for all
large (65,000+ population) places in the State of California. (ACS 1-year estimates are
only published for areas with 65,000 or more people.) This is very similar to Example #2,
but with one record (row) per geography/year combination.
# Example 3.1 through 3.14: Run get_acs for large California Places, 2005-2018
# Example 3.15: Concatenate (pancake) data frames: lots of records
# Example 3.16: Merge in a file of Large San Francisco Bay Area places, and subset file.
# Example 3.17: Extract data for one place using a string search on the place name
#------------------------------------------------------------------------------------
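# Set a list of variables to extract in each iteration of get_acs.
# Naming the elements is a LOT more efficient for variable naming: with output="wide",
# each name becomes a column prefix, followed by E (estimate) or M (margin of error)!!!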
selvars <- c(TotalPop_    = "B06001_001", # Total Population
             Med_HHInc_   = "B19013_001", # Median Household Income
             Agg_HHInc_   = "B19025_001", # Aggregate Household Income
             HHldPop_     = "B11002_001", # Population in Households
             Househlds_   = "B25003_001", # Total Households
             Owner_OccDU_ = "B25003_002", # Owner-Occupied Dwelling Units
             Rent_OccDU_  = "B25003_003", # Renter-Occupied Dwelling Units
             Med_HHVal_   = "B25077_001") # Median Home Value
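Because of that naming rule, the trailing underscores in selvars produce readable column names such as TotalPop_E and TotalPop_M, which is why later steps refer to HHldPop_E and Househlds_E. The short sketch below only reproduces that naming rule in base R to preview the columns, without calling the Census API; expected_cols is a hypothetical name used for illustration, and the exact column order in real get_acs output may differ.

```r
# A subset of the named ACS variable codes used above
selvars <- c(TotalPop_  = "B06001_001",
             Med_HHInc_ = "B19013_001",
             Househlds_ = "B25003_001")

# In wide output each named variable becomes two columns:
# <name>E (estimate) and <name>M (margin of error), plus GEOID and NAME.
expected_cols <- c("GEOID", "NAME",
                   as.vector(rbind(paste0(names(selvars), "E"),
                                   paste0(names(selvars), "M"))))
print(expected_cols)
```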
#------------------------------------------------------------------------------------
temp2005 <- get_acs(survey = "acs1", year = 2005, geography = "place",
                    state = "CA",
                    show_call = TRUE, output = "wide", variables = selvars)
temp2005$Year <- "2005"
#------------------------------------------------------------------------------------
temp2006 <- get_acs(survey = "acs1", year = 2006, geography = "place",
                    state = "CA",
                    show_call = TRUE, output = "wide", variables = selvars)
temp2006$Year <- "2006"
#------------------------------------------------------------------------------------
temp2007 <- get_acs(survey = "acs1", year = 2007, geography = "place",
                    state = "CA",
                    show_call = TRUE, output = "wide", variables = selvars)
temp2007$Year <- "2007"
#------------------------------------------------------------------------------------
This block of code is repeated for each single-year ACS release of interest, 2005 through
2018 (only 2005-2007 are shown here). Note that I’m adding a new variable, “Year”, to each
data frame. Otherwise, I’d have no indication of the year of each data frame, other than
the name of the data frame itself!
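The fourteen near-identical blocks could also be written as one loop. Here is a minimal sketch under some assumptions: stack_years() and fake_fetch() are hypothetical helpers, and fake_fetch() returns made-up toy data so the sketch runs without a Census API key. In real use, the fetch function would wrap the same get_acs() call shown above. Base rbind() is enough here because every year returns the same columns.

```r
# Hypothetical helper: call fetch_year() once per year, tag each result
# with a character Year column, and stack all the results.
stack_years <- function(years, fetch_year) {
  frames <- lapply(years, function(y) {
    df <- fetch_year(y)
    df$Year <- as.character(y)  # same convention as temp2005$Year <- "2005"
    df
  })
  do.call(rbind, frames)        # base rbind() accepts any number of data frames
}

# Stand-in for the real download (toy values). In practice it might be:
# function(y) get_acs(survey = "acs1", year = y, geography = "place",
#                     state = "CA", output = "wide", variables = selvars)
fake_fetch <- function(y) {
  data.frame(GEOID = c("0600001", "0600002"),
             NAME  = c("Place A", "Place B"),
             TotalPop_E = c(100, 200) + (y - 2005))
}

tempall <- stack_years(2005:2007, fake_fetch)
print(tempall)
```

With the real fetch function, stack_years(2005:2018, ...) would make fourteen API calls in a row.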
In the following “R” step, I’m using the “plyr” function “rbind.fill” to concatenate all
fourteen data frames in one statement!
# Example 3.15: Concatenate (pancake) data frames: lots of records
# Concatenate All Years .....
# Base rbind() can stack any number of data frames, but only if they all have exactly
# the same columns. plyr's rbind.fill() also stacks 2-or-more data frames, and fills
# any missing columns with NA.
# The old two-at-a-time approach:
# temp0506 <- rbind(temp2005,temp2006)
# temp0507 <- rbind(temp0506,temp2007)
tempall <- rbind.fill(temp2005, temp2006, temp2007, temp2008, temp2009,
                      temp2010, temp2011, temp2012, temp2013, temp2014,
                      temp2015, temp2016, temp2017, temp2018)
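Where rbind.fill() really earns its keep is when the data frames don't all share the same columns, for example if a variable were missing in one year. The sketch below is a rough base-R illustration of that idea, not plyr's actual implementation; stack_fill() and the toy frames a and b are hypothetical.

```r
# Hypothetical base-R helper illustrating the idea behind plyr::rbind.fill():
# pad each data frame with NA for the columns it lacks, then stack them all.
stack_fill <- function(...) {
  frames   <- list(...)
  all_cols <- unique(unlist(lapply(frames, names)))
  padded <- lapply(frames, function(df) {
    for (col in setdiff(all_cols, names(df))) df[[col]] <- rep(NA, nrow(df))
    df[all_cols]                 # same column order in every frame
  })
  do.call(rbind, padded)
}

# Toy frames: the 2005 frame lacks Med_HHVal_E
a <- data.frame(GEOID = "1", Year = "2005", TotalPop_E = 100)
b <- data.frame(GEOID = "1", Year = "2006", TotalPop_E = 110, Med_HHVal_E = 500000)

stacked <- stack_fill(a, b)
print(stacked)
# A plain rbind(a, b) would stop with an error, because the column counts differ.
```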
# Add a couple of useful variables!
# Need an if/then check to catch zero (or missing) household counts before dividing;
# otherwise R returns Inf or NaN. Work on this later.
# tempall$Avg_HHSize <- tempall$HHldPop_E / tempall$Househlds_E
# tempall$MeanHHInc <- tempall$Agg_HHInc_E / tempall$Househlds_E
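One way to handle the “catch zero values” check is to divide only where the denominator is positive and return NA everywhere else. This is a minimal sketch; safe_divide() is a hypothetical helper and the household counts are toy values.

```r
# Hypothetical helper: divide only where the denominator is present and positive;
# everywhere else return NA instead of Inf or NaN.
safe_divide <- function(num, den) {
  ifelse(!is.na(den) & den > 0, num / den, NA_real_)
}

hhpop  <- c(2500, 0, 1800)   # toy household population
hholds <- c(1000, 0, NA)     # toy household counts (a zero and a missing value)
avg_size <- safe_divide(hhpop, hholds)
print(avg_size)

# Applied to the stacked file, the two commented-out lines would become:
# tempall$Avg_HHSize <- safe_divide(tempall$HHldPop_E, tempall$Househlds_E)
# tempall$MeanHHInc  <- safe_divide(tempall$Agg_HHInc_E, tempall$Househlds_E)
```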
# Sort the Results by GEOID and then by Year
tempalls <- tempall[order(tempall$GEOID,tempall$Year),]
setwd("~/Desktop/tidycensus_work/output")
# Export the data frames to CSV files, for importing to Excel, and applying finishing touches
write.csv(tempalls,"ACS_AllYears_Calif_Places_Stacked.csv")
In the following step, I’m extracting data for the large places in the San Francisco Bay
Area by merging in a list of those places.
# Example 3.16: Merge in a file of Large San Francisco Bay Area places, and subset file.
# Read in a file with the Large SF Bay Area Places, > 65,000 population
# and merge with the All Large California Places
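bayplace <- read.csv("BayArea_Places_65K.csv")
# merge() keeps only places found in both files (an inner join, by default).
# Both files have a GEOID column, so merge() renames them GEOID.x (bayplace)
# and GEOID.y (tempalls).
Bayplace1 <- merge(bayplace,tempalls, by = c('NAME'))
Bayplace1 <- Bayplace1[order(Bayplace1$GEOID.x,Bayplace1$Year),]
write.csv(Bayplace1,"ACS_AllYears_BaseVar_BayArea_Places_Stacked.csv")
# Print the column names as a ready-to-paste c(...) vector
dput(names(Bayplace1))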
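To see what merge() does with two files that share more than the key column, here is a small sketch on made-up toy data. The real contents of BayArea_Places_65K.csv aren't shown, so these two frames are only stand-ins. Places missing from either file are dropped, and the shared GEOID column comes back as GEOID.x and GEOID.y.

```r
# Toy stand-ins for the two inputs (made-up values)
bayplace <- data.frame(NAME  = c("Hayward city, California", "Oakland city, California"),
                       GEOID = c("A", "B"))
tempalls <- data.frame(NAME  = c("Hayward city, California", "Hayward city, California",
                                 "Fresno city, California"),
                       GEOID = c("A", "A", "C"),
                       Year  = c("2005", "2006", "2005"))

# Inner join on NAME: Oakland (no data) and Fresno (not in the Bay Area list) drop out
Bayplace1 <- merge(bayplace, tempalls, by = "NAME")
print(Bayplace1)
dput(names(Bayplace1))  # the key column first, then x's other columns, then y's
```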
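In the following step, I’m extracting data for just “Hayward” city in the San Francisco Bay
Area. This uses the “R” function “grepl”, which returns TRUE or FALSE for each record
depending on whether the place name contains the search string. (That’s grep-ell, for
“logical”, not grep-one.)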
# Example 3.17: Extract data for one place using a string search on the place name
# Extract one place at a time from Bayplace1.
# fixed=TRUE matches "Hayward" as a literal string rather than a regular expression.
Hayward <- filter(Bayplace1, grepl("Hayward",NAME,fixed=TRUE))
Hayward <- Hayward[order(Hayward$Year),]
Hayward$Avg_HHSize <- Hayward$HHldPop_E / Hayward$Househlds_E
Hayward$MeanHHInc <- Hayward$Agg_HHInc_E / Hayward$Househlds_E
selvarxxx <- c("Year","NAME", "GEOID.x",
"NAME2",
"TotalPop_E", "Med_HHInc_E",
"Agg_HHInc_E", "HHldPop_E", "Househlds_E",
"Owner_OccDU_E", "Rent_OccDU_E",
"Med_HHVal_E", "Avg_HHSize", "MeanHHInc" )
Hayward2 <- Hayward[selvarxxx]
write.csv(Hayward2,"ACS_AllYears_BaseVar_Hayward_Stacked.csv")
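Here is a small base-R sketch of the same string search (rather than dplyr's filter(), so it runs without the tidyverse), using a made-up three-row table. It shows the logical vector grepl() returns and, for comparison, an exact-match alternative that avoids catching any other place whose name merely contains the search string.

```r
# Toy table of place names (made-up rows)
places <- data.frame(NAME = c("Hayward city, California",
                              "Oakland city, California",
                              "Fremont city, California"),
                     Year = c("2018", "2018", "2018"))

# grepl() returns one TRUE/FALSE per element of NAME
hits <- grepl("Hayward", places$NAME, fixed = TRUE)
Hayward <- places[hits, ]
print(Hayward)

# Exact match on the full place name, as a stricter alternative
exact <- places[places$NAME == "Hayward city, California", ]
print(exact)
```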
#####################################################################################
This concludes Example #3: “multiple geographies / multiple years / multiple variables”,
with only one record (row) per geography/year combination: the “stacked” output.