Life After AFF - Tidycensus Example Set #3 - ctpp-news

31 Aug 2020

Here’s my writeup on creating what I call a “stacked” output from tidycensus: one
record for each unique geography / year combination. This approach may be preferred if
you’re trying to create a “data profile” for a specific geographic area, with rows
representing the years, and columns representing the various variables of interest (total
population, household population, workers by means of transportation to work, etc.)

I’m almost done! Hope this helps!

Chuck Purvis,
Hayward, California

Example #3. More Complex Tidycensus examples: multiple years, multiple geographies,
multiple variables. “Stacked” results.

This is an example of stacking “R” data frames, where each record (row) represents a
unique geography/year combination.

Step #0. Always need to load the relevant packages/libraries when starting up a new
R-session. I’m loading the “R” package “plyr” which helps in stacking / concatenating /
pancaking data frames.

# Step 0: Load relevant libraries into each R-session.

library(tidyverse)

library(tidycensus)

library(janitor)

library(plyr) # Needed for a function that concatenates a lot of data frames in one statement!

In this set of examples, I’m extracting single year ACS variables (2005-2018) for all
large (65,000+ population) places in the State of California. Very similar to Example #2,
but with one record (row) per each geography/year combination.

#  Example 3.1 through 3.14: Run get_acs for large California Places, 2005-2018

#  Example 3.15:             Concatenate (pancake) data frames: lots of records

#  Example 3.16:             Merge in a file of Large San Francisco Bay Area places, and subset file.

#  Example 3.17:             Extract data for one place using a string search on the place name

#------------------------------------------------------------------------------------

# Set a list of variables to extract in each iteration of get_acs

#  This is a LOT more efficient for variable naming!!!

selvars  <- c(TotalPop_   = "B06001_001", # Total Population

              Med_HHInc_  = "B19013_001", # Median Household Income

              Agg_HHInc_  = "B19025_001", # Aggregate Household Income

              HHldPop_    = "B11002_001", # Population in Households

              Househlds_  = "B25003_001", # Total Households

              Owner_OccDU_= "B25003_002", # Owner-Occupied Dwelling Units

              Rent_OccDU_ = "B25003_003", # Renter-Occupied Dwelling Units

              Med_HHVal_  = "B25077_001") # Median Value, Owner-Occupied Housing Units
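
The trailing underscore in each name matters: with output="wide", get_acs appends "E" (estimate) and "M" (margin of error) to each supplied name, which is where column names such as TotalPop_E come from later in this example. A minimal sketch that only previews those names, with no API call; `selvars_demo` and `wide_cols` are illustrative names, not part of the original script.

```r
# Two of the named variables from selvars, copied here for illustration.
selvars_demo <- c(TotalPop_  = "B06001_001",
                  Med_HHInc_ = "B19013_001")

# Preview the column names expected in a wide get_acs() result:
# each supplied name gets "E" (estimate) and "M" (margin of error) appended.
wide_cols <- as.vector(rbind(paste0(names(selvars_demo), "E"),
                             paste0(names(selvars_demo), "M")))
print(wide_cols)
```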

#------------------------------------------------------------------------------------

temp2005  <- get_acs(survey = "acs1", year = 2005, geography = "place", state = "CA",

                     show_call = TRUE, output = "wide", variables = selvars)

temp2005$Year       <- "2005"

#------------------------------------------------------------------------------------

temp2006  <- get_acs(survey = "acs1", year = 2006, geography = "place", state = "CA",

                     show_call = TRUE, output = "wide", variables = selvars)

temp2006$Year       <- "2006"

#------------------------------------------------------------------------------------

temp2007  <- get_acs(survey = "acs1", year = 2007, geography = "place", state = "CA",

                     show_call = TRUE, output = "wide", variables = selvars)

temp2007$Year       <- "2007"

#------------------------------------------------------------------------------------

This block of code is repeated for each ACS single-year of interest (2008 through 2018 are
not shown here). Note that I’m adding a new variable “Year” to each data frame. Otherwise,
I have no indication of the year of each data frame, other than the name of the data frame itself!
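
The same repetition could be written as a loop. The sketch below is one possible way, not the method used in this example set. `fetch_year()` is a hypothetical stand-in that returns made-up numbers so the code runs without a Census API key. In real use its body would be the get_acs call shown above, with year = yr.

```r
# Hypothetical stand-in for get_acs(); values are made up for illustration.
# In real use, replace the body with:
#   get_acs(survey = "acs1", year = yr, geography = "place", state = "CA",
#           output = "wide", variables = selvars)
fetch_year <- function(yr) {
  data.frame(GEOID      = c("0633000", "0667000"),
             NAME       = c("Hayward city, California",
                            "San Francisco city, California"),
             TotalPop_E = c(100, 200) + yr,
             stringsAsFactors = FALSE)
}

years <- 2005:2007

# One data frame per year, each tagged with its year as a character string
# (the same "2005" style used in the blocks above).
temp_list <- lapply(years, function(yr) {
  df <- fetch_year(yr)
  df$Year <- as.character(yr)
  df
})
names(temp_list) <- paste0("temp", years)

str(temp_list$temp2006)
```

A list built this way can be passed straight to plyr's rbind.fill, since its first argument may be a list of data frames.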

In the following “R” step, I’m using the “plyr” function “rbind.fill” to concatenate a
lot of data frames!

#  Example 3.15:             Concatenate (pancake) data frames: lots of records

#  Concatenate All Years .....

#  Base rbind can also take more than two data frames, but they all need the same columns.

#   rbind.fill (a plyr function) stacks 2-or-more data frames and fills missing columns with NA.

# temp0506 <- rbind(temp2005,temp2006)

# temp0507 <- rbind(temp0506,temp2007)

tempall <- rbind.fill(temp2005,temp2006,temp2007,temp2008,temp2009,

                      temp2010,temp2011,temp2012,temp2013,temp2014,

                      temp2015,temp2016,temp2017,temp2018)
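
To see what rbind.fill does when the data frames do not have identical columns, here is a small sketch on toy data. The values are made up, and the missing column in the second data frame is an assumption for illustration only; it does not describe any actual ACS year.

```r
library(plyr)

# Toy data frames; the second one lacks Med_HHVal_E.
a <- data.frame(GEOID = "0633000", Year = "2005",
                TotalPop_E = 100, Med_HHVal_E = 500,
                stringsAsFactors = FALSE)
b <- data.frame(GEOID = "0633000", Year = "2006",
                TotalPop_E = 110,
                stringsAsFactors = FALSE)

# Rows are stacked; the column missing from b is filled with NA.
stacked <- rbind.fill(a, b)
print(stacked)

# The first argument may also be a list of data frames.
stacked2 <- rbind.fill(list(a, b))
```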

# Add a couple of useful variables!

# need to have an if/then to catch zero values.. work on this later.

# tempall$Avg_HHSize <- tempall$HHldPop_E / tempall$Househlds_E

# tempall$MeanHHInc  <- tempall$Agg_HHInc_E / tempall$Househlds_E
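
One possible way to handle the zero-value check mentioned above is a vectorized ifelse. This is a sketch on toy data, and `safe_ratio()` is a hypothetical helper name, not part of the original script.

```r
# Toy data: a normal row, a zero-household row, and a missing-value row.
hh <- data.frame(HHldPop_E   = c(3000, 0, 1200),
                 Househlds_E = c(1000, 0, NA))

# Divide only where the denominator is a positive number; otherwise return NA.
safe_ratio <- function(num, den) {
  ifelse(!is.na(den) & den > 0, num / den, NA_real_)
}

hh$Avg_HHSize <- safe_ratio(hh$HHldPop_E, hh$Househlds_E)
print(hh)
```

The same helper would apply to the mean household income calculation, with Agg_HHInc_E as the numerator.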

# Sort the Results by GEOID and then by Year

tempalls <- tempall[order(tempall$GEOID,tempall$Year),]

setwd("~/Desktop/tidycensus_work/output")

# Export the data frames to CSV files, for importing to Excel, and applying finishing touches

write.csv(tempalls,"ACS_AllYears_Calif_Places_Stacked.csv")

In the following step I’m extracting data for large places in the San Francisco Bay Area.

#  Example 3.16: Merge in a file of Large San Francisco Bay Area places, and subset file.

# Read in a file with the Large SF Bay Area Places, > 65,000 population

# and merge with the All Large California Places

bayplace <- read.csv("BayArea_Places_65K.csv")

Bayplace1 <- merge(bayplace,tempalls,  by = c('NAME'))

Bayplace1 <- Bayplace1[order(Bayplace1$GEOID.x,Bayplace1$Year),]

write.csv(Bayplace1,"ACS_AllYears_BaseVar_BayArea_Places_Stacked.csv")

dput(names(Bayplace1))
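
The GEOID.x column name used in the sort above comes from how merge handles columns that appear in both inputs but are not part of the "by" key: it adds the suffixes .x and .y. A small sketch on toy data. The layout of the real BayArea_Places_65K.csv file is an assumption here (NAME, GEOID, and NAME2 columns), and all values are illustrative.

```r
# Toy stand-in for the Bay Area lookup file (column layout assumed).
bay_demo <- data.frame(NAME  = "Hayward city, California",
                       GEOID = 633000,
                       NAME2 = "Hayward",
                       stringsAsFactors = FALSE)

# Toy stand-in for the stacked statewide file.
acs_demo <- data.frame(NAME  = c("Hayward city, California",
                                 "Fresno city, California"),
                       GEOID = c("0633000", "0627000"),
                       Year  = c("2005", "2005"),
                       stringsAsFactors = FALSE)

# merge() keeps only names found in both inputs (an inner join by default)
# and disambiguates the shared GEOID column as GEOID.x and GEOID.y.
merged_demo <- merge(bay_demo, acs_demo, by = "NAME")
print(names(merged_demo))
print(nrow(merged_demo))
```

Because the default is an inner join, this merge is also what subsets the statewide file down to the Bay Area places.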

In the following step I’m extracting data for just “Hayward” city in the San Francisco Bay
Area. This uses the “R” function “grepl”. (That’s grep-ell, not grep-one).
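
As a quick illustration of what grepl returns, here is a sketch on a made-up vector of place names. It gives one TRUE/FALSE value per element, which is what filter then uses to keep rows.

```r
place_names <- c("Hayward city, California",
                 "Oakland city, California",
                 "hayward (lowercase test)")

# fixed = TRUE treats "Hayward" as a literal string rather than a regular
# expression; the match is case-sensitive.
hits <- grepl("Hayward", place_names, fixed = TRUE)
print(hits)
print(place_names[hits])
```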

#  Example 3.17:  Extract data for one place using a string search on the place name

#     Extract one place at a time from Bayplace1

#

Hayward <- filter(Bayplace1, grepl("Hayward",NAME,fixed=TRUE))

Hayward <- Hayward[order(Hayward$Year),]

Hayward$Avg_HHSize <- Hayward$HHldPop_E / Hayward$Househlds_E

Hayward$MeanHHInc  <- Hayward$Agg_HHInc_E / Hayward$Househlds_E

selvarxxx <- c("Year", "NAME", "GEOID.x", "NAME2",

             "TotalPop_E", "Med_HHInc_E",

             "Agg_HHInc_E", "HHldPop_E", "Househlds_E",

             "Owner_OccDU_E", "Rent_OccDU_E",

             "Med_HHVal_E", "Avg_HHSize", "MeanHHInc" )

Hayward2 <- Hayward[selvarxxx]

write.csv(Hayward2,"ACS_AllYears_BaseVar_Hayward_Stacked.csv")

#####################################################################################

This concludes Example #3: “multiple geographies / multiple years / multiple variables”
with only one record (row) for each geography/year combination, or the “stacked” output.