For some reason my Tidycensus Example #2 didn’t get posted. I’ll retry. Maybe there’s a size limit, since the previous attempt had a fairly large pic embedded in it??
Chuck Purvis, Hayward, California….
Example #2. More Complex Tidycensus examples: multiple years, multiple geographies, multiple variables.
This is a more complex example of a script to “pull” select variables from the ACS using my Census API key and the R package tidycensus.
Step #0. Always need to load the relevant packages/libraries when starting up a new R-session. Otherwise, it won’t work.
# Step 0: Load relevant libraries into each R-session.
library(tidyverse)
library(tidycensus)
In this set of examples, I’m extracting single year ACS variables (2005-2018) for all large (65,000+ population) places in the State of California.
# The get_acs function is run for each year of the single-year ACS data, from 2005 to 2018.
# Note that group quarters data was not collected in 2005, but started in 2006.
# Note the "_05_" included in the variable name in the first data "pull". That's a # # mnemonic device that tells us it's for the year 2005.
# Example 2.1 through 2.14: Run get_acs for large California Places, 2005-2018
# Example 2.15: Merge together data frames into a VERY wide database...lots of columns!
# Example 2.16: Merge in a file of Large San Francisco Bay Area places, and subset file.
#-------------------------------------------------------------------------------
place05 <- get_acs(survey="acs1", year=2005, geography = "place", state = "CA",
show_call = TRUE, output="wide",
variables = c(TotalPop_05_ = "B06001_001", # Total Population
Med_HHInc_05_ = "B19013_001", # Median Household Income
Agg_HHInc_05_ = "B19025_001", # Aggregate Household Income
HHldPop_05_ = "B11002_001", # Population in Households
Househlds_05_ = "B25003_001", # Total Households
Owner_OccDU_05_= "B25003_002", # Owner-Occupied Dwelling Units
Rent_OccDU_05_ = "B25003_003", # Renter-Occupied Dwelling Units
Med_HHVal_05_ = "B25077_001")) # Median Value of Owner-Occ DUs
place05$Avg_HHSize_05 <- place05$HHldPop_05_E / place05$Househlds_05_E
place05$MeanHHInc_05 <- place05$Agg_HHInc_05_E / place05$Househlds_05_E
#------------------------------------------------------------------------------------
place06 <- get_acs(survey="acs1", year=2006, geography = "place", state = "CA",
show_call = TRUE, output="wide",
variables = c(TotalPop_06_ = "B06001_001", # Total Population
Med_HHInc_06_ = "B19013_001", # Median Household Income
Agg_HHInc_06_ = "B19025_001", # Aggregate Household Income
HHldPop_06_ = "B11002_001", # Population in Households
Househlds_06_ = "B25003_001", # Total Households
Owner_OccDU_06_= "B25003_002", # Owner-Occupied Dwelling Units
Rent_OccDU_06_ = "B25003_003", # Renter-Occupied Dwelling Units
Med_HHVal_06_ = "B25077_001")) # Median Value of Owner-Occ DUs
place06$Avg_HHSize_06 <- place06$HHldPop_06_E / place06$Househlds_06_E
place06$MeanHHInc_06 <- place06$Agg_HHInc_06_E / place06$Househlds_06_E
#------------------------------------------------------------------------------------
These sets of codes are repeated for each single year ACS of interest, say for 2005 through 2018. Smarter “R” programmers will be able to tell me about “do loops” to make this process more efficient with magical wild cards.
The following step merges the data frames using the GEOID/NAME variables. This create a very “wide” database. One record per geography, and each column representing the variable/year combinations.
The “merge” function in “R” allows only two data frames to be joined by common columns at a time. I have yet to find a “R” function that allows me to merge all of the data frames at once.
#####################################################################################
# Example 2.15: Merge together data frames into a VERY wide database...lots of columns!
# Merge the dataframes, adding a year in each step. All=TRUE is needed if # of places is different.
#
# (R-language newbie script...There are probably more terse/exotic ways of doing this!)
place0506 <- merge(place05, place06, by = c('GEOID','NAME'), all=TRUE)
place0507 <- merge(place0506,place07, by = c('GEOID','NAME'), all=TRUE)
place0508 <- merge(place0507,place08, by = c('GEOID','NAME'), all=TRUE)
place0509 <- merge(place0508,place09, by = c('GEOID','NAME'), all=TRUE)
place0510 <- merge(place0509,place10, by = c('GEOID','NAME'), all=TRUE)
place0511 <- merge(place0510,place11, by = c('GEOID','NAME'), all=TRUE)
place0512 <- merge(place0511,place12, by = c('GEOID','NAME'), all=TRUE)
place0513 <- merge(place0512,place13, by = c('GEOID','NAME'), all=TRUE)
place0514 <- merge(place0513,place14, by = c('GEOID','NAME'), all=TRUE)
place0515 <- merge(place0514,place15, by = c('GEOID','NAME'), all=TRUE)
place0516 <- merge(place0515,place16, by = c('GEOID','NAME'), all=TRUE)
place0517 <- merge(place0516,place17, by = c('GEOID','NAME'), all=TRUE)
place0518 <- merge(place0517,place18, by = c('GEOID','NAME'), all=TRUE)
place_all <- place0518
View(place_all)
Sometimes you want to create smaller data frames with just a select number of columns. Here’s a good approach for that.
# The following functions output useful lists to the R-studio console which can then be edited
names(place_all)
dput(names(place_all)) # most useful for subsetting variables
# The purpose here is to re-order and select variables into a much more compact
# database, for eventual exporting into a CSV file, and then into Excel for finishing touches.
selvars <- c("GEOID", "NAME",
"TotalPop_05_E", "TotalPop_06_E", "TotalPop_07_E", "TotalPop_08_E",
"TotalPop_09_E", "TotalPop_10_E", "TotalPop_11_E", "TotalPop_12_E",
"TotalPop_13_E", "TotalPop_14_E", "TotalPop_15_E", "TotalPop_16_E",
"TotalPop_17_E", "TotalPop_18_E")
# note the brackets for outputing new data frame from previous data frame....
place_all2 <- place_all[selvars]
# View the Selected Variables Table
View(place_all2)
# Set directory for exported data files, MacOS directory style
setwd("~/Desktop/tidycensus_work/output")
# Export the data frames to CSV files, for importing to Excel, and applying finishing touches
write.csv(place_all2,"ACS_AllYears_TotalPop_Calif_Places.csv")
write.csv(place_all, "ACS_AllYears_BaseVar_Calif_Places.csv")
In this last example, I’m reading in a file of large places in the Bay Area (manually derived from the CSV file created previous) in order to subset Bay Area “large places” from State of California “large places”.
#####################################################################################
# Example 2.16: Merge in a file of Large San Francisco Bay Area places, and subset file.
# Read in a file with the Large SF Bay Area Places, > 65,000 population
# and merge with the All Large California Places
bayplace <- read.csv("BayArea_Places_65K.csv")
Bayplace1 <- merge(bayplace,place_all, by = c('NAME'))
Bayplace2 <- merge(bayplace,place_all2, by = c('NAME'))
write.csv(Bayplace1,"ACS_AllYears_BaseVar_BayArea_Places.csv")
write.csv(Bayplace2,"ACS_AllYears_TotalPop_BayArea_Places.csv")
This concludes Example #2: “multiple geographies / multiple years/ multiple variables” with only one record (row) per each geography.
This is a followup to my 8/6/20 e-mail to the CTPP-News listserv. Just two more after this one. An “R” script is attached.
Example #2. More Complex Tidycensus examples: multiple years, multiple geographies, multiple variables.
This is a more complex example of a script to “pull” select variables from the ACS using my Census API key and the R package tidycensus.
Step #0. Always need to load the relevant packages/libraries when starting up a new R-session. Otherwise, it won’t work.
# Step 0: Load relevant libraries into each R-session.
library(tidyverse)
library(tidycensus)
In this set of examples, I’m extracting single year ACS variables (2005-2018) for all large (65,000+ population) places in the State of California.
# The get_acs function is run for each year of the single-year ACS data, from 2005 to 2018.
# Note that group quarters data was not collected in 2005, but started in 2006.
# Note the "_05_" included in the variable name in the first data "pull". That's a # # mnemonic device that tells us it's for the year 2005.
# Example 2.1 through 2.14: Run get_acs for large California Places, 2005-2018
# Example 2.15: Merge together data frames into a VERY wide database...lots of columns!
# Example 2.16: Merge in a file of Large San Francisco Bay Area places, and subset file.
#-------------------------------------------------------------------------------
place05 <- get_acs(survey="acs1", year=2005, geography = "place", state = "CA",
show_call = TRUE, output="wide",
variables = c(TotalPop_05_ = "B06001_001", # Total Population
Med_HHInc_05_ = "B19013_001", # Median Household Income
Agg_HHInc_05_ = "B19025_001", # Aggregate Household Income
HHldPop_05_ = "B11002_001", # Population in Households
Househlds_05_ = "B25003_001", # Total Households
Owner_OccDU_05_= "B25003_002", # Owner-Occupied Dwelling Units
Rent_OccDU_05_ = "B25003_003", # Renter-Occupied Dwelling Units
Med_HHVal_05_ = "B25077_001")) # Median Value of Owner-Occ DUs
place05$Avg_HHSize_05 <- place05$HHldPop_05_E / place05$Househlds_05_E
place05$MeanHHInc_05 <- place05$Agg_HHInc_05_E / place05$Househlds_05_E
#------------------------------------------------------------------------------------
place06 <- get_acs(survey="acs1", year=2006, geography = "place", state = "CA",
show_call = TRUE, output="wide",
variables = c(TotalPop_06_ = "B06001_001", # Total Population
Med_HHInc_06_ = "B19013_001", # Median Household Income
Agg_HHInc_06_ = "B19025_001", # Aggregate Household Income
HHldPop_06_ = "B11002_001", # Population in Households
Househlds_06_ = "B25003_001", # Total Households
Owner_OccDU_06_= "B25003_002", # Owner-Occupied Dwelling Units
Rent_OccDU_06_ = "B25003_003", # Renter-Occupied Dwelling Units
Med_HHVal_06_ = "B25077_001")) # Median Value of Owner-Occ DUs
place06$Avg_HHSize_06 <- place06$HHldPop_06_E / place06$Househlds_06_E
place06$MeanHHInc_06 <- place06$Agg_HHInc_06_E / place06$Househlds_06_E
#------------------------------------------------------------------------------------
These sets of codes are repeated for each single year ACS of interest, say for 2005 through 2018. Smarter “R” programmers will be able to tell me about “do loops” to make this process more efficient with magical wild cards.
The following step merges the data frames using the GEOID/NAME variables. This create a very “wide” database. One record per geography, and each column representing the variable/year combinations.
The “merge” function in “R” allows only two data frames to be joined by common columns at a time. I have yet to find a “R” function that allows me to merge all of the data frames at once.
#####################################################################################
# Example 2.15: Merge together data frames into a VERY wide database...lots of columns!
# Merge the dataframes, adding a year in each step. All=TRUE is needed if # of places is different.
#
# (R-language newbie script...There are probably more terse/exotic ways of doing this!)
place0506 <- merge(place05, place06, by = c('GEOID','NAME'), all=TRUE)
place0507 <- merge(place0506,place07, by = c('GEOID','NAME'), all=TRUE)
place0508 <- merge(place0507,place08, by = c('GEOID','NAME'), all=TRUE)
place0509 <- merge(place0508,place09, by = c('GEOID','NAME'), all=TRUE)
place0510 <- merge(place0509,place10, by = c('GEOID','NAME'), all=TRUE)
place0511 <- merge(place0510,place11, by = c('GEOID','NAME'), all=TRUE)
place0512 <- merge(place0511,place12, by = c('GEOID','NAME'), all=TRUE)
place0513 <- merge(place0512,place13, by = c('GEOID','NAME'), all=TRUE)
place0514 <- merge(place0513,place14, by = c('GEOID','NAME'), all=TRUE)
place0515 <- merge(place0514,place15, by = c('GEOID','NAME'), all=TRUE)
place0516 <- merge(place0515,place16, by = c('GEOID','NAME'), all=TRUE)
place0517 <- merge(place0516,place17, by = c('GEOID','NAME'), all=TRUE)
place0518 <- merge(place0517,place18, by = c('GEOID','NAME'), all=TRUE)
place_all <- place0518
View(place_all)
Sometimes you want to create smaller data frames with just a select number of columns. Here’s a good approach for that.
# The following functions output useful lists to the R-studio console which can then be edited
names(place_all)
dput(names(place_all)) # most useful for subsetting variables
# The purpose here is to re-order and select variables into a much more compact
# database, for eventual exporting into a CSV file, and then into Excel for finishing touches.
selvars <- c("GEOID", "NAME",
"TotalPop_05_E", "TotalPop_06_E", "TotalPop_07_E", "TotalPop_08_E",
"TotalPop_09_E", "TotalPop_10_E", "TotalPop_11_E", "TotalPop_12_E",
"TotalPop_13_E", "TotalPop_14_E", "TotalPop_15_E", "TotalPop_16_E",
"TotalPop_17_E", "TotalPop_18_E")
# note the brackets for outputing new data frame from previous data frame....
place_all2 <- place_all[selvars]
# View the Selected Variables Table
View(place_all2)
# Set directory for exported data files, MacOS directory style
setwd("~/Desktop/tidycensus_work/output")
# Export the data frames to CSV files, for importing to Excel, and applying finishing touches
write.csv(place_all2,"ACS_AllYears_TotalPop_Calif_Places.csv")
write.csv(place_all, "ACS_AllYears_BaseVar_Calif_Places.csv")
In this last example, I’m reading in a file of large places in the Bay Area (manually derived from the CSV file created previous) in order to subset Bay Area “large places” from State of California “large places”.
#####################################################################################
# Example 2.16: Merge in a file of Large San Francisco Bay Area places, and subset file.
# Read in a file with the Large SF Bay Area Places, > 65,000 population
# and merge with the All Large California Places
bayplace <- read.csv("BayArea_Places_65K.csv")
Bayplace1 <- merge(bayplace,place_all, by = c('NAME'))
Bayplace2 <- merge(bayplace,place_all2, by = c('NAME'))
write.csv(Bayplace1,"ACS_AllYears_BaseVar_BayArea_Places.csv")
write.csv(Bayplace2,"ACS_AllYears_TotalPop_BayArea_Places.csv")
Here’s a screenshot of a summary table showing total population in large places in the San Francisco Bay Area, 2005-2018. (It’s a big table, better viewed in landscape mode, letter size!)
This concludes Example #2: “multiple geographies / multiple years/ multiple variables” with only one record (row) per each geography.
This is the last tidycensus example that I’ve prepared. Hope this proves useful!
Stay safe!
Chuck Purvis,
Hayward, California
formerly of the Metropolitan Transportation Commission, San Francisco California.
clpurvis(a)att.net
Example #4. Explore the Geographies that Tidycensus can pull out.
This is an example of using tidycensus to extract 2014/18 ACS data for all available geographies for the entire USA. It’s just a test of capabilities.
Some geographic levels (county subdivision, state upper and lower houses) don’t appear to be working.
There is some extra code in the “PUMA step” to tally the number of PUMAs per US state, and the minimum and maximum total population levels for each PUMA in the US.
# Step 0: Load relevant libraries into each R-session.
library(tidyverse)
library(tidycensus)
library(janitor)
library(plyr) # This is needed for a function to concatenate a lot of files in one statement!
library(dplyr)
# Add the variable Geography_Name to each data frame. Maybe concatenate/pancake these dataframes?
selvars <- c(TotalPop_ = "B06001_001", # Total Population
SamplePop_ = "B00001_001", # Unweighted Sample Count of Population
HHUnits_ = "B25002_001", # Total Housing Units
Househlds_ = "B25002_002", # Total Households
SampleDU_ = "B00002_001") # Unweighted Sample Count of Dwelling Units
#------------------------------------------------------------------------------------
us <- get_acs(survey="acs5", year=2018, geography = "us",
show_call = TRUE,output="wide", variables = selvars)
us$Geography_Name <- "us"
#------------------------------------------------------------------------------------
region <- get_acs(survey="acs5", year=2018, geography = "region",
show_call = TRUE,output="wide", variables = selvars)
region$Geography_Name <- "region"
#------------------------------------------------------------------------------------
division <- get_acs(survey="acs5", year=2018, geography = "division",
show_call = TRUE,output="wide", variables = selvars)
division$Geography_Name <- "division"
#------------------------------------------------------------------------------------
state <- get_acs(survey="acs5", year=2018, geography = "state",
show_call = TRUE,output="wide", variables = selvars)
state$Geography_Name <- "state"
setwd("~/Desktop/tidycensus_work/output")
write.csv(state,"ACS1418_USA_State_1.csv")
#------------------------------------------------------------------------------------
county <- get_acs(survey="acs5", year=2018, geography = "county",
show_call = TRUE,output="wide", variables = selvars)
county$Geography_Name <- "county"
#------------------------------------------------------------------------------------
# County Subdivision isn't working ...returns an API error (unknown/unsupported geography)
countysubdiv <- get_acs(survey="acs5", year=2018, geography = "county subdivision",
show_call = TRUE,output="wide", variables = selvars)
countysubdiv$Geography_Name <- "countysubdiv"
#------------------------------------------------------------------------------------
# Pull just the tracts in Alameda County, California
tract <- get_acs(survey="acs5", year=2018, geography = "tract", state="CA",
county="Alameda",show_call = TRUE,output="wide", variables = selvars)
tract$Geography_Name <- "tract"
#------------------------------------------------------------------------------------
# Pull just the block groups in Alameda County, California
blockgroup<- get_acs(survey="acs5", year=2018, geography = "block group", state="CA", county="Alameda",show_call = TRUE,output="wide", variables = selvars)
blockgroup$Geography_Name <- "blockgroup"
#------------------------------------------------------------------------------------
place <- get_acs(survey="acs5", year=2018, geography = "place",
show_call = TRUE,output="wide", variables = selvars)
place$Geography_Name <- "place"
#------------------------------------------------------------------------------------
urban <- get_acs(survey="acs5", year=2018, geography = "urban area",
show_call = TRUE,output="wide", variables = selvars)
urban$Geography_Name <- "urban"
#------------------------------------------------------------------------------------
congdist <- get_acs(survey="acs5", year=2018, geography = "congressional district",
show_call = TRUE,output="wide", variables = selvars)
congdist$Geography_Name <- "congdist"
#------------------------------------------------------------------------------------
puma <- get_acs(survey="acs5", year=2018, geography = "public use microdata area",
show_call = TRUE,output="wide", variables = selvars)
puma$TotalPop2 <- puma$TotalPop_E * 1.0
puma$Tally <- 1.0
puma$State <- substr(puma$GEOID,1,2)
puma$Geography_Name <- "puma"
pumas <- puma[order(puma$State,puma$GEOID),]
summary(pumas)
sum1 <- aggregate(pumas[,3:12],
by = list(pumas$State),
FUN = sum, na.rm=TRUE)
min1 <- aggregate(pumas[,3:12],
by = list(pumas$State),
FUN = min, na.rm=TRUE)
max1 <- aggregate(pumas[,3:12],
by = list(pumas$State),
FUN = max, na.rm=TRUE)
setwd("~/Desktop/tidycensus_work/output")
write.csv(sum1,"ACS1418_USA_PUMA_sum_by_State_1.csv")
write.csv(min1,"ACS1418_USA_PUMA_min_by_State_1.csv")
write.csv(max1,"ACS1418_USA_PUMA_max_by_State_1.csv")
write.csv(pumas,"ACS1418_USA_PUMA_All_1.csv")
sum2 <- pumas %>%
group_by(State) %>%
summarize_at("TotalPop_E",
list(name=sum))
#------------------------------------------------------------------------------------
csa <- get_acs(survey="acs5", year=2018, geography = "combined statistical area",
show_call = TRUE,output="wide", variables = selvars)
csa$Geography_Name <- "csa"
#------------------------------------------------------------------------------------
msamisa <- get_acs(survey="acs5", year=2018, geography = "metropolitan statistical area/micropolitan statistical area",
show_call = TRUE,output="wide", variables = selvars)
msamisa$Geography_Name <- "msamisa"
#------------------------------------------------------------------------------------
zcta <- get_acs(survey="acs5", year=2018, geography = "zcta",
show_call = TRUE,output="wide", variables = selvars)
zcta$Geography_Name <- "zcta"
#------------------------------------------------------------------------------------
# State Senate and House aren't working ...returns an API error (unknown/unsupported geography)
statesenate <- get_acs(survey="acs5", year=2018, geography = "state legislative district (upper chamber)",
show_call = TRUE,output="wide", variables = selvars)
statesenate$Geography_Name <- "statesenate"
#------------------------------------------------------------------------------------
statehouse <- get_acs(survey="acs5", year=2018, geography = "state legislative district (lower chamber)",
show_call = TRUE,output="wide", variables = selvars)
statehouse$Geography_Name <- "statehouse"
#------------------------------------------------------------------------------------
This concludes Example #4: “exploring tidycensus geography.”
Here’s my writeup on creating on what I call a “stacked” output from tidycensus: one record per each unique geography / year combination. This approach may be preferred if you’re trying to create a “data profile” for a specific geographic area, with rows representing the years, and columns representing the various variables of interest (total population, household population, workers by means of transportation to work, etc.)
I’m almost done! Hope this helps!
Chuck Purvis,
Hayward, California
Example #3. More Complex Tidycensus examples: multiple years, multiple geographies, multiple variables. “Stacked” results.
This is an example of stacking “R” data frames, where each record (row) represents a unique geography/year combination.
Step #0. Always need to load the relevant packages/libraries when starting up a new R-session. I’m loading the “R” package “plyr” which helps in stacking / concatenating / pancaking data frames.
# Step 0: Load relevant libraries into each R-session.
library(tidyverse)
library(tidycensus)
library(janitor)
library(plyr) # This is needed for a function to concatenate a lot of files in one statement!
In this set of examples, I’m extracting single year ACS variables (2005-2018) for all large (65,000+ population) places in the State of California. Very similar to Example #2, but with one record (row) per each geography/year combination.
# Example 3.1 through 3.14: Run get_acs for large California Places, 2005-2018
# Example 3.15: Concatenate (pancake) data frames: lots of records
# Example 3.16: Merge in a file of Large San Francisco Bay Area places, and subset file.
# Example 3.17: Extract data for one place using a string search on the place name
#------------------------------------------------------------------------------------
# Set a list of variables to extract in each iteration of get_acs
# This is a LOT more efficient for variable naming!!!
selvars <- c(TotalPop_ = "B06001_001", # Total Population
Med_HHInc_ = "B19013_001", # Median Household Income
Agg_HHInc_ = "B19025_001", # Aggregate Household Income
HHldPop_ = "B11002_001", # Population in Households
Househlds_ = "B25003_001", # Total Households
Owner_OccDU_= "B25003_002", # Owner-Occupied Dwelling Units
Rent_OccDU_ = "B25003_003", # Renter-Occupied Dwelling Units
Med_HHVal_ = "B25077_001")
#------------------------------------------------------------------------------------
temp2005 <- get_acs(survey="acs1", year=2005, geography = "place", state = "CA",
show_call = TRUE,output="wide", variables = selvars)
temp2005$Year <- "2005"
#------------------------------------------------------------------------------------
temp2006 <- get_acs(survey="acs1", year=2006, geography = "place", state = "CA",
show_call = TRUE,output="wide", variables = selvars)
temp2006$Year <- "2006"
#------------------------------------------------------------------------------------
temp2007 <- get_acs(survey="acs1", year=2007, geography = "place", state = "CA",
show_call = TRUE,output="wide", variables = selvars)
temp2007$Year <- "2007"
#------------------------------------------------------------------------------------
These sets of codes are repeated for each ACS single-year of interest. Note that I’m adding a new variable “Year” to each data frame. Otherwise, I have no indication of the year of each data frame, other than the actual name of the file!
In the following “R” step, I’m using the “dplyr” function “rbind.fill” to concatenate a lot of data frames!
# Example 3.15: Concatenate (pancake) data frames: lots of records
# Concatenate All Years .....
# rbind can only concatenate two dataframes at a time. rbind.fill can do 2-or-more data
# frames to concatenate. It's a plyr function.
# temp0506 <- rbind(temp2005,temp2006)
# temp0507 <- rbind(temp0506,temp2007)
tempall <- rbind.fill(temp2005,temp2006,temp2007,temp2008,temp2009,
temp2010,temp2011,temp2012,temp2013,temp2014,
temp2015,temp2016,temp2017,temp2018)
# Add a couple of useful variables!
# need to have a if/then to catch zero values.. work on this later.
# tempall$Avg_HHSize <- tempall$HHldPop_E / tempall$Househlds_E
# tempall$MeanHHInc <- tempall$Agg_HHInc_E / tempall$Househlds_E
# Sort the Results by GEOID and then by Year
tempalls <- tempall[order(tempall$GEOID,tempall$Year),]
setwd("~/Desktop/tidycensus_work/output")
# Export the data frames to CSV files, for importing to Excel, and applying finishing touches
write.csv(tempalls,"ACS_AllYears_Calif_Places_Stacked.csv")
In the following step I’m extracting data for large places in the San Francisco Bay Area.
# Example 3.16: Merge in a file of Large San Francisco Bay Area places, and subset file.
# Read in a file with the Large SF Bay Area Places, > 65,000 population
# and merge with the All Large California Places
bayplace <- read.csv("BayArea_Places_65K.csv")
Bayplace1 <- merge(bayplace,tempalls, by = c('NAME'))
Bayplace1 <- Bayplace1[order(Bayplace1$GEOID.x,Bayplace1$Year),]
write.csv(Bayplace1,"ACS_AllYears_BaseVar_BayArea_Places_Stacked.csv")
dput(names(Bayplace1))
In the following step I’m extracting data for just “Hayward” city in the San Francisco Bay Area. This uses the “R” function “grepl”. (That’s grep-ell, not grep-one).
# Example 3.17: Extract data for one place using a string search on the place name
# Extract one place at a time from Bayplace1
#
Hayward <- filter(Bayplace1, grepl("Hayward",NAME,fixed=TRUE))
Hayward <- Hayward[order(Hayward$Year),]
Hayward$Avg_HHSize <- Hayward$HHldPop_E / Hayward$Househlds_E
Hayward$MeanHHInc <- Hayward$Agg_HHInc_E / Hayward$Househlds_E
selvarxxx <- c("Year","NAME", "GEOID.x", "NAME2",
"TotalPop_E", "Med_HHInc_E",
"Agg_HHInc_E", "HHldPop_E", "Househlds_E",
"Owner_OccDU_E", "Rent_OccDU_E",
"Med_HHVal_E", "Avg_HHSize", "MeanHHInc" )
Hayward2 <- Hayward[selvarxxx]
write.csv(Hayward2,"ACS_AllYears_BaseVar_Hayward_Stacked.csv")
#####################################################################################
This concludes Example #3: “multiple geographies / multiple years/ multiple variables” with only one record (row) per each geography/year combination, or the “stacked” output.
Apologies if you have already seen this. According to my read Table
B98001 (Unweighted Housing Unit Sample) will be published for nation,
state, county and place in both the 1-year and 5-year ACS beginning with
the 2019 ACS release, and I assume moving forward. What struck me was
that there was no reference to the B98001 information ever being
available for tracks or below ever again.
-------- Forwarded Message --------
Subject: Research Matters Blog: ACS Updates on Disclosure Avoidance and
Release Plans
Date: Thu, 20 Aug 2020 12:20:48 -0500
From: U.S. Census Bureau <census(a)subscriptions.census.gov>
Reply-To: census(a)subscriptions.census.gov
To: edc(a)berwyned.com
Research Matters Blog: ACS Updates on Disclosure Avoidance and Release
Plans
Registered United States Census Bureau Logo
<https://lnks.gd/l/eyJhbGciOiJIUzI1NiJ9.eyJidWxsZXRpbl9saW5rX2lkIjoxMDAsInVy…>
*ACS Updates on Disclosure Avoidance and Release Plans*
*RESEARCH MATTERS BLOG*
AUG. 20, 2020**
Written By: Dr. John M. Abowd, chief scientist and associate director
for Research and Methodology, and Donna M. Daily, chief, American
Community Survey Office
Despite changes and delays during this unprecedented time, we are happy
to report that the U.S. Census Bureau is on track to release the 2019
American Community Survey (ACS) 1-year estimates as scheduled Sept. 17,
2020. Check out our website for the full data release schedule and other
details
<https://lnks.gd/l/eyJhbGciOiJIUzI1NiJ9.eyJidWxsZXRpbl9saW5rX2lkIjoxMDEsInVy…>_._
As we prepare to release the 2019 ACS products, we want to remind data
users of our commitment to protect respondent privacy and
confidentiality. Prior blogs
<https://lnks.gd/l/eyJhbGciOiJIUzI1NiJ9.eyJidWxsZXRpbl9saW5rX2lkIjoxMDIsInVy…>
outlined steps the Census Bureau is taking to modernize the procedures
we use to protect respondent data. Our adoption of formal privacy will
allow us to strengthen safeguards and increase transparency about the
impact of privacy protections on data accuracy.
As Deputy Director Ron Jarmin previously stated
<https://lnks.gd/l/eyJhbGciOiJIUzI1NiJ9.eyJidWxsZXRpbl9saW5rX2lkIjoxMDMsInVy…>,
we do not plan to implement formal privacy for the full suite of ACS
data products before 2025. In the interim, we will continue to evaluate
existing privacy protections and bolster them as necessary to address
privacy risks that emerge. Our goal is to maintain the utility of the
ACS as the preeminent federal survey for federal, state and local data
users, while remaining committed to our legal obligations to protect
confidentiality.**
One area where we are strengthening our disclosure avoidance methods is
the count of final interviews published in our quality measures and
other detailed tables. We will continue to publish the quality measures
tables that provide this information for /select/ geographies. But we
are discontinuing the tables that included this information for /all/
geographies.^^[1] <#_ftn1> In addition, we are adding “noise” to the
interview counts to cut the risk of disclosure while still providing a
general indicator of data quality for the geography of interest. We will
continue to publish the household sample sizes selected for invitation
to complete the ACS, without added noise or rounding. The reason: These
sample sizes are properties of the ACS design, not its realized sample,
and provide a more robust indicator of data quality for very small
geographies.
Table B98001 (Unweighted Housing Unit Sample) will be published for
nation, state, county and place in both the 1-year and 5-year ACS
beginning with the 2019 ACS release. Previously, the table was not
published for places in the 1-year ACS. A new table will be created of
final person interviews, which will be released for nation, state,
county and place in the 1-year and 5-year ACS. This will provide the
same information as the former B00001, but will be restricted to the
same summary levels as B98001.
The Census Bureau has a tradition and public expectation of producing
high quality statistics while protecting the confidentiality of
respondents. We will work closely with our scientific and data user
communities as we explore options for modernizing ACS privacy
protections while ensuring the data products’ continued high quality and
fitness-for-use. The Census Bureau is funding collaboration
opportunities with external researchers on the issue of formal privacy
for sample surveys. A key deliverable of such collaboration will be
establishing effective data user engagement. We will provide more
information about this effort as it becomes available.
Send comments or questions to <ACSprivacy(a)census.gov>.
------------------------------------------------------------------------
[1]
<https://lnks.gd/l/eyJhbGciOiJIUzI1NiJ9.eyJidWxsZXRpbl9saW5rX2lkIjoxMDQsInVy…>
These data were published in B00001 and B00002 in the 1-year and 5-year
ACS products as well as K200001 and K200002 in the 1-year supplemental
ACS product.
Learn More
<https://lnks.gd/l/eyJhbGciOiJIUzI1NiJ9.eyJidWxsZXRpbl9saW5rX2lkIjoxMDUsInVy…>
Resources for the Media
We're here to help you get the most out of our tipsheets, press
releases, trainings, and media services. If you have a question, give us
a call at 301-763-3030 or contact our Public Information Office at
pio(a)census.gov <mailto:pio@census.gov>.
Share This
<https://lnks.gd/l/eyJhbGciOiJIUzI1NiJ9.eyJidWxsZXRpbl9saW5rX2lkIjoxMDYsInVy…>
*Stay connected with us!*
Join the conversation on social media.
facebook
<https://lnks.gd/l/eyJhbGciOiJIUzI1NiJ9.eyJidWxsZXRpbl9saW5rX2lkIjoxMDcsInVy…>twitter
<https://lnks.gd/l/eyJhbGciOiJIUzI1NiJ9.eyJidWxsZXRpbl9saW5rX2lkIjoxMDgsInVy…>linkedin
<https://lnks.gd/l/eyJhbGciOiJIUzI1NiJ9.eyJidWxsZXRpbl9saW5rX2lkIjoxMDksInVy…>youtube
<https://lnks.gd/l/eyJhbGciOiJIUzI1NiJ9.eyJidWxsZXRpbl9saW5rX2lkIjoxMTAsInVy…>instagram
<https://lnks.gd/l/eyJhbGciOiJIUzI1NiJ9.eyJidWxsZXRpbl9saW5rX2lkIjoxMTEsInVy…>
SUBSCRIBER SERVICES:
Subscriber Settings
<https://lnks.gd/l/eyJhbGciOiJIUzI1NiJ9.eyJidWxsZXRpbl9saW5rX2lkIjoxMTIsInVy…>
| Remove me from All Subscriptions
<https://lnks.gd/l/eyJhbGciOiJIUzI1NiJ9.eyJidWxsZXRpbl9saW5rX2lkIjoxMTMsInF1…> |
Help
<https://lnks.gd/l/eyJhbGciOiJIUzI1NiJ9.eyJidWxsZXRpbl9saW5rX2lkIjoxMTQsInVy…>
------------------------------------------------------------------------
This is an official email from the U.S. Census Bureau. If you have any
questions or comments, please contact us
(http://www.census.gov/about/contact-us.html
<https://lnks.gd/l/eyJhbGciOiJIUzI1NiJ9.eyJidWxsZXRpbl9saW5rX2lkIjoxMTUsInVy…>).
Good Morning,
The marketing department at AASHTO is taking a crack at helping us get more on the map. One major benefit to this is that when our next funding cycle rolls around (2023) we will be a bit ahead in the "materials for decision makers" department. To this end, if you have some success story or testimonial regarding the CTPP, I would love to hear about it. It may be tweeted, posted to Facebook or Instagram (if visual), or used in future marketing. If you would rather send them to me, and not the whole list, that would be great: pweinberger(a)aashto.org<mailto:pweinberger@aashto.org>. I've included a couple examples below my signature, to give you ideas.
Thanks,
Penelope Weinberger
CTPP Program Manager
AASHTO
Ctpp.transportation.org
"The CTPP program is a vital component for understanding travel in the state of Florida. The data, along with the technical support provided by AASHTO helps Florida understand the nature of the workforce in Florida, how, when, and where they travel for work, and the impacts on congestion and transportation operations. The CTPP data is a cost effective tool for helping Florida DOT achieve its mission of providing for the mobility of people and ensuring economic prosperity by helping provide a data driven solution to transportation problems."
- Florida Department of Transportation
Several major model development projects in the state of Colorado have used CTPP Journey-to-Work data, including development of activity-based models for the Denver region and for the entire state. CTPP is one of the "go-to" sources of travel pattern data that is genuinely independent of the travel survey data commonly used to estimate these models. Unbiased, independent data of this type is very hard to come by, and the CTPP is a critical piece of just about any model development project puzzle.
Example #1. Tidycensus examples: one year, multiple geographies, multiple variables
This is an example of a simple script to “pull” select variables from the ACS using my Census API key and the R package tidycensus.
Step #0. Always need to load the relevant packages/libraries when starting up a new R-session. Otherwise, it won’t work.
Comments start with hashtag “#”. It’s more obvious when using R-Studio.
# Step 0: Load relevant libraries into each R-session.
library(tidyverse)
library(tidycensus)
library(janitor)
Example #1.1 pulls the 2018 ACS data for San Francisco Bay Area counties, for table C03002 (population by race/ethnicity). It’s pulled into a “data frame” called “county1”.
Though the keywords (survey, year, geography, etc.) can be in any order within the “get_acs()” statement, I prefer leading with:
1. Survey=”acs1” – am I using the 1-year or 5-year databases
2. Year=2018 - what’s the last year of the 1-yr/5-year database
3. Geography=”county” – what level of geography am I pulling? US, State? County? Congressional District? Place?
See the tidycensus documentation, and the author’s website, for all of this and more!
https://walker-data.com/tidycensus/articles/basic-usage.html <https://walker-data.com/tidycensus/articles/basic-usage.html>
https://cran.r-project.org/web/packages/tidycensus/tidycensus.pdf <https://cran.r-project.org/web/packages/tidycensus/tidycensus.pdf>
https://www.rdocumentation.org/packages/tidycensus/ <https://www.rdocumentation.org/packages/tidycensus/>
# Simple Example #1.1: Population by Race/Ethnicity, 2018, SF Bay, Table C03002
# Note that tidycensus can use either the County Name or the County FIPS Code number.
# Experiment with output="wide" versus output="tidy" ("tidy" is the default.)
#####################################################################################
county1 <- get_acs(survey="acs1", year=2018, geography = "county", state = "CA",
# county=c(1,13,41,55,75,81,85,95,97),
county=c("Alameda","Contra Costa","Marin","Napa","San Francisco",
"San Mateo","Santa Clara","Solano","Sonoma"),
show_call = TRUE, output="wide",
table="C03002")
Example #1.2 is a variation on the previous script portion and pulls out population by race/ethnicity for ALL California counties, 2014/18 five-year ACS. If I used “ACS1” and “2018”, I’d only obtain data for the largest counties with 65,000+ total population!
# Simple Example #1.2: Population by Race/Ethnicity, 2014-2018, All California Counties, Table B03002
# If the list of counties is excluded,
# then data is pulled for all counties in the State
######################################################################################
AllCalCounties <- get_acs(survey="acs5", year=2018, geography = "county",
state = "CA", show_call = TRUE, output="wide", table="B03002")
Example #1.3 pulls out population by race/ethnicity for ALL Congressional Districts in California, for the single year 2018 ACS.
# Simple Example #1.3: Population by Race/Ethnicity, 2018, California Congress Dists, Table C03002
# This example pulls the congressional districts from California. Eliminate state="CA" to get congressional districts from the entire United States
######################################################################################
congdist1 <- get_acs(survey="acs1", year=2018, geography = "congressional district",
state = "CA", show_call = TRUE, output="wide", table="C03002")
Example #1.4 Names the variables using mnemonic names for population by race/ethnicity, 2018, single year ACS, Bay Area counties. I’m using the janitor package “adorn_totals” function to sum up regional totals.
The tidycensus package will append “E” to variable estimates and “M” to variable margins of error (90 percent confidence level, by default). So, the variable “White_NH_E” will mean, to me, at least, “Estimates of White Non-Hispanic Population” and “White_NH_M” will mean: “Margin of Error, 90% confidence level, of White Non-Hispanic Population.”
# Simple Example #1.4.1: Population by Race/Ethnicity: Bay Counties: Naming Variables.
# User-defined mnemonic variable names, since "C03002_001_E" doesn't fall trippingly on the tongue!
# the underscore is useful since tidycensus will append "E" to estimates and "M" to margin of error
# variables, e.g., "Total_E" and "Total_M"
######################################################################################
county2 <- get_acs(survey="acs1", year=2018, geography = "county", state = "CA",
county=c(1,13,41,55,75,81,85,95,97),
show_call = TRUE, output="wide",
variables = c(Total_ = "C03002_001", # Universe is Total Population
White_NH_ = "C03002_003", # Non-Hispanic White
Black_NH_ = "C03002_004", # Non-Hispanic Black
AIAN_NH_ = "C03002_005", # NH, American Indian & Alaskan Native
Asian_NH_ = "C03002_006", # Non-Hispanic Asian
NHOPI_NH_ = "C03002_007", # NH, Native Hawaiian & Other Pacific Isl.
Other_NH_ = "C03002_008", # Non-Hispanic Other
Multi_NH_ = "C03002_009", # Two-or-More Races, Non-Hispanic
Hispanic_ = "C03002_012")) # Hispanic/Latino
# Sometimes the results of TIDYCENSUS aren't sorted, so:
county2 <- county2[order(county2$GEOID),]
###########################################################################
# Simple Example #1.4.2: Add a new record: SF Bay Area, as sum of records 1-9
# adorn_totals is a function from the package janitor.
# The name="06888" is arbitrary, just a filler for the GEOID column.
tempxxx <- adorn_totals(county2,name="06888")
tempxxx[10,2]="San Francisco Bay Area"
county3 <- tempxxx
# Set a working directory, and write out CSV files as wanted.
# This is an example for a Mac, with the folder tidycensus_work on the desktop, and
# the folder output within tidycensus_work
setwd("~/Desktop/tidycensus_work/output")
write.csv(county3,"ACS18_BayAreaCounties.csv")
#############################################################################
At the end of this step I’m writing out CSV (comma separated value) files which I then open in Excel for finishing touches to tables, manually editing the variable names to something les cryptic:
That’s all for today!
Chuck Purvis,
Retired Person, Hayward, California
(Formerly of the Metropolitan Transportation Commission, San Francisco, California)
Take care!!
Here’s my first followup to my 7/16/2020 post on using tidycensus in a post-American Factfinder era.
Attached to this e-mail is a short text file (“r” suffix) that can be edited for your use.
Example #0. Setting up tidycensus.
This is an introduction to the use of the R-package tidycensus in extracting data from the US Census Bureau’s American Community Survey. I’m adding snippets of R code from my R-scripts, and attaching the full r-script to this message.
First things first: Acquaint yourself with the American Community Survey. What I would strongly recommend is to download and print out copies of the various ACS survey questionnaires. Know what was asked!
Decennial Census questionnaires:
https://www.census.gov/history/www/through_the_decades/questionnaires/ <https://www.census.gov/history/www/through_the_decades/questionnaires/>
American Community Survey questionnaires:
https://www.census.gov/programs-surveys/acs/methodology/questionnaire-archi… <https://www.census.gov/programs-surveys/acs/methodology/questionnaire-archi…>
Next, I would recommend downloading the “table shells” from the Census Bureau’s website, and not rely on just on the tidycensus “load_variables” function. Get the table shells for all of the years: the ACS does change ever so often, and so do the tables! I find it useful to have part of my computer screen opened with the table shells visible in Excel.
ACS Table Shells:
https://www.census.gov/programs-surveys/acs/technical-documentation/table-s… <https://www.census.gov/programs-surveys/acs/technical-documentation/table-s…>
I find it useful to have on hand a guide to the ACS table numbering scheme, so you know your “B” and “C” and “S” and “GCT” tables and the two-digit subject indicator (“08” – Journey-to-Work”). https://censusreporter.org/topics/table-codes/ <https://censusreporter.org/topics/table-codes/>
Download and install the free software package R Studio. There are other YouTube videos you can watch about learning/installing R and R Studio, and I won’t cover those here.
https://rstudio.com/products/rstudio/download/#download <https://rstudio.com/products/rstudio/download/#download>
Launch R Studio. There are a few add-on packages that first need to be installed onto your computer, and then “loaded” into your working R session.
# Step 1 Install R packages. If installed in previous sessions, there is no need to re-install.
# You may need to install the packages "tidyr" and "sp" for "tidycensus" to be properly installed.
install.packages("tidyverse")
install.packages("tidycensus")
install.packages("janitor")
# Step 2: Load relevant libraries into each R-session.
library(tidyverse)
library(tidycensus)
library(janitor)
Acquire a Census API key from the Census Bureau. It’s free. It’s a 40 character string that identifies a unique API user and helps the Census Bureau improve their tools to access census data. They’ll e-mail you a key in no time at all.
https://www.census.gov/data/developers/updates/new-discovery-tool.html <https://www.census.gov/data/developers/updates/new-discovery-tool.html>
https://api.census.gov/data/key_signup.html <https://api.census.gov/data/key_signup.html>
Install your 40-character API key into your R “environment.” Just one time and no need to concern yourself ever again about this key.
# Step 3: Load the User's Census API Key.
# Census API Key was installed in previous sessions, so no need to re-install
# un-comment out the following statement with the user's API key.
# census_api_key("fortycharacterkeysentbyCensusBureau",install=TRUE)
The last section of this introduction relates to using the “load_variables” as a tool to assist in selecting various variables. I prefer to download the ACS Table Shells into Excel, and then have appropriate Table Shells opened, alongside R Studio, to aid me in variable selection and naming.
# Step 4: Explore the Data Variables using the load_variables() function
# Use the function load_variables() to view all of the possible variables for analysis
# load_variables works for both decennial census and American Community Survey databases
acs18_variable_list <- load_variables(year = 2018, dataset = "acs5", cache = TRUE)
acs18p_variable_list <- load_variables(year = 2018, dataset = "acs5/profile", cache = TRUE)
# Maybe write out the data frame to the desktop, for easier in use in Excel?
write.csv(acs18_variable_list,'acs18_variable_list.csv', row.names=FALSE)
View(acs18_variable_list)
As of this summer 2020, tidycensus can be used to extract the “base” and “collapsed” tables for all years of the ACS, from 2005 through 2018 “single year” databases; the five-year ACS databases starting with 2005/09 through to 2014/18; and the decennial census files for 2010, 2000 and 1990. For the decennial censuses, databases include the SF1 (Summary File #1) for 1990, 2000 and 2010; and the SF3 (Summary File #3) for 1990 and 2000. (There was no long form census in the 2010 Census, so, thus no long-form-based SF3 data for 2010!)
(I have yet to explore how to pull data from the decennial censuses using tidycensus, and would be grateful to hear news of successes/failures.)
A word of warning: R is very case sensitive. Something like View(acs18_variable_list) will work okay, but view(acs18_variable_list) will not work!!
That’s the end of Step #0… Setting up Tidycensus!
Chuck Purvis, Hayward, California
Retired Person (formerly of the Metropolitan Transportation Commission, San Francisco, California)\
ICYMI Panel to discuss the politics of the census at the American
Sociological Association. The event will take place this Sunday in
cyberspace.
https://umn.zoom.us/j/98520151584
Krishnan Viswanathan
5628 Burnside Circle
Tallahassee FL 32312