(I’ve been having outgoing mail issues, as well…)
Just a reminder to all that the single-year 2019 American Community Survey data was released to the public on Thursday, September 17, 2020.
This is the fifteenth year of single-year ACS data (2005 to 2019). (Reminder that group quarters data collection didn’t start until 2006.)
The data was actually under “embargo” (since 9/15/20), and only accessible to the media for analysis. But that’s just a short two-day window of opportunity to get the scoop, so to speak.
Here’s a nice summary article, on health insurance coverage, by the nonpartisan Center on Budget and Policy Priorities:
https://www.cbpp.org/research/health/uninsured-rate-rose-again-in-2019-furt…
The bad news is that the share of the population without health insurance is up to 9.2 percent in 2019, compared to a historic low of 8.6 percent in 2016 and a historic high of 15.5 percent in 2010. (Health insurance was first collected in the ACS in 2008.)
And here’s the link to a 9/15/20 20-page report by the US Census Bureau on Health Insurance Coverage in the US based on data from the Current Population Survey (CPS) and the American Community Survey (ACS).
https://www.census.gov/content/dam/Census/library/publications/2020/demo/p6…
https://www.census.gov/library/publications/2020/demo/p60-271.html
Note that the 2019 CPS data was collected February through April 2020, just at the start of our ongoing pandemic. I’d recommend reading both the CBPP and Census Bureau reports!!
If any of the national or local media has released any 2019 ACS results on transportation-related topics, that would be useful to post here!
Stay safe!
Chuck Purvis,
Hayward, California
formerly of the Metropolitan Transportation Commission (San Francisco, California).
Can I get a list of places within my multi-county region? Yes, but it takes a little work!
Finding Places within Counties for Your State and Region.
This is an “R” script that uses readily downloadable files from the decennial census to build a file of places within counties within states. This can then be subsetted to extract lists of places within a one-county or multi-county region. The following examples use 2010 Census data for the states of California, Texas, and New York, and can hopefully be easily adapted once the 2020 Census data becomes available in spring 2021.
The standard Census geography hierarchy diagram shows that “places” are below “states.” “Places” are NOT below “County” geographic levels. This is because there are some “places” in the United States that straddle two-or-more counties! This is an inconvenience if the analyst is interested in, say, the population characteristics of all places (cities, census designated places) within a one-county or multi-county region.
A simple, creative way to find places-within-counties is to use GIS software to layer both county boundaries and place boundaries to select your places of interest. But this process doesn’t say anything about the population characteristics of places that potentially straddle two-or-more counties!
But there are two Census “summary levels” that can be used to clearly identify places-within-counties: summary level 160 (state-place); and summary level 155 (state-place-county). The #160 summary level is the more commonly used of the two. The #155 summary level is pretty much a secret summary level only used by curious census data wonks.
The two summary levels, combined, provide an accurate database of places-within-counties, including lists of places that straddle two-or-more counties.
The simplest approach is to download the available PL 94-171 “Redistricting File” for the US state of interest. Here’s the link to the Census 2010 Redistricting Files:
https://www2.census.gov/census_2010/redistricting_file--pl_94-171/
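For what it’s worth, the per-state archives in that directory can be fetched directly from R. A minimal sketch, assuming the California archive sits in a “California” subdirectory under the hypothetical name ca2010.pl.zip (verify the exact path against the directory listing); the download lines are commented out since they need a network connection:

```r
# Hypothetical file name -- check the directory listing for the real one.
state_dir <- "California"
url <- paste0("https://www2.census.gov/census_2010/redistricting_file--pl_94-171/",
              state_dir, "/ca2010.pl.zip")
# download.file(url, "ca2010.pl.zip", mode = "wb")
# unzip("ca2010.pl.zip")
```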
What is important is the “geographic header” file that includes all geographic summary levels of interest, including summary levels 155 and 160.
The following example extracts summary level 155 and 160 data for places in California, from Summary File #1 (SF1), though the PL 94-171 geographic header file is identical!
The filename “cageo2010.sf1” means “California” + “Geographic” + “2010” from “SF1”
This is a good example of building an “R” data frame from a fixed-format data file.
###########################################################################
## Extract the California Place Names and their Counties from the
## 2010 Decennial Census master geographic file, California, SF #1
###########################################################################
# install.packages("tidyverse")
# install.packages("dplyr")
library(dplyr)
# setwd("~/Desktop/tidycensus_work/output") # (overridden by the next line)
setwd("~/Desktop/Census/2010_Census/ca2010.sf1")
# setwd("~/Desktop/ca2010.sf1")
x <- readLines(con="cageo2010.sf1") # Very large fixed format file, 843,287 observations for Calif
CalGeo <- data.frame(# fileid = substr(x, 1, 6),
# stusab = substr(x, 7, 8),
sumlev = substr(x, 9, 11),
# geocomp= substr(x, 12, 13),
state = substr(x, 28, 29),
county = substr(x, 30, 32),
place = substr(x, 46, 50),
# tract = substr(x, 55, 60),
# blkgrp = substr(x, 61, 61),
# block = substr(x, 62, 65),
                     arealand = as.numeric(substr(x, 199, 212)),
                     areawatr = as.numeric(substr(x, 213, 226)),
                     name = substr(x, 227, 316),
                     pop100 = as.numeric(substr(x, 319, 327)), # numeric, so later pop100.x < pop100.y comparisons behave as expected
                     hu100 = as.numeric(substr(x, 328, 336)),
                     stringsAsFactors = FALSE) # keep text fields as character (pre-R 4.0 default is factor)
CalGeo$GEOID <- paste(CalGeo$state,CalGeo$place,sep="")
The following statements extract all summary level 155 and 160 records from the master geo-header file. The two files are then merged by the variables “state” and “place,” and variables are renamed to something more recognizable. Note that if the variable names are the same in the merged data frames, “R” uses the convention of “variablename.x” for the first data frame and “variablename.y” for the second data frame used in the merge.
sumlev155 <- subset(CalGeo, sumlev == 155) # state-place-county summary level
sumlev160 <- subset(CalGeo, sumlev == 160) # state-place summary level
coplace1 <- merge(sumlev155, sumlev160, by = c('state','place'), all=TRUE)
coplace2 <- dplyr::rename(coplace1, county_name = name.x, # name is eg "Contra Costa County (part)"
place_name = name.y, # name is eg "Acalanes Ridge CDP"
county = county.x,
GEOID = GEOID.x)
The following statements are intended to identify places that straddle two-or-more counties in the state. In California (in 2010) we had four places that each straddle two counties, for a total of eight place-county records! These four places are Aromas (Monterey/San Benito Counties), Kingvale (Nevada/Placer Counties), Kirkwood (Alpine/Amador Counties) and Tahoma (El Dorado/Placer Counties).
# Extract places that straddle two-or-more counties.
# pop100.x = 2010 population count for, perhaps, part of the place (sumlev=155)
# pop100.y = 2010 population count for the FULL place (sumlev=160)
# This yields 4 places that straddle 2 counties, each, for 8 records in this file.
splittown <- subset(coplace2, pop100.x < pop100.y)
View(splittown)
And lastly, I wanted to extract the places within the nine-county San Francisco Bay Area. This is probably the simplest script for this extraction.
# Subset the Bay Area places from the SUMLEV=155/160 file
bayco <- c("001","013","041","055","075","081","085","095","097")
BayArea <- subset(coplace2, county %in% bayco)
Of course, write out the data frames to CSV files for further analysis.
setwd("~/Desktop/tidycensus_work/output")
write.csv(BayArea,"Census2010_BayArea_Places.csv")
write.csv(coplace2,"Census2010_California_Places.csv")
Let’s check Texas!
Just to check this procedure, I downloaded the PL 94-171 data files for the State of Texas. The geo-header file for Texas was even larger than California’s (n=1,158,742 records in Texas versus n=843,287 in California!).
Texas has 1,752 places (sumlev=160) and 1,934 place-county parts (sumlev=155). Upon inspection, there are places in Texas (Dallas!) that straddle five counties. That caught me by surprise!
The population-based split procedure for California (subset(coplace2, pop100.x < pop100.y)) didn’t work for Texas since there are a few place-county parts in Texas with zero population. I found that “arealand” works just fine for Texas. The following code works for Texas:
# Extract places that straddle two-or-more counties.
# pop100.x = 2010 population count for, perhaps, part of the place (sumlev=155)
# pop100.y = 2010 population count for the FULL place (sumlev=160)
# Find the Texas places straddling two-or-more counties
splittown <- subset(coplace2, pop100.x < pop100.y)
View(splittown)
# This works better for Texas, since there are a few place-county parts with zero population,
# and 100 percent of population in the other place-county part.
splittown2 <- subset(coplace2, arealand.x < arealand.y)
View(splittown2)
That’s as far as I’ve carried the Texas example, since I’m not on the lookout for a master national list of places split by county boundaries! Maybe this is something that the Census Bureau Geography Division has ready access to?
Let’s check New York!
This process also works for the State of New York. It’s still best to use the “AREALAND” differences, sumlev=155 versus sumlev=160, to find places that straddle two-or-more-counties (or boroughs in the case of New York City).
Yes, the City of New York straddles/encompasses five boroughs/counties. And there are 13 other places in New York State that straddle two counties: Almond, Attica, Brewerton, Bridgeport, Deposit, Dodgeville, Earlville, Geneva, Gowanda, Keeseville, Peach Lake, Rushville, and Saranac Lake.
These are “R” scripts that don’t use “tidycensus” but are clean methods for answering such a simple question as “what are the census places within my multi-county region?”
I’ve been having trouble sending my example set #2 for my introduction to tidycensus. Hopefully it’ll get through this time.
Chuck Purvis,
Hayward, California
Example #2. More Complex Tidycensus examples: multiple years, multiple geographies, multiple variables.
This is a more complex example of a script to “pull” select variables from the ACS using my Census API key and the R package tidycensus.
Step #0. Always need to load the relevant packages/libraries when starting up a new R-session. Otherwise, it won’t work.
# Step 0: Load relevant libraries into each R-session.
library(tidyverse)
library(tidycensus)
In this set of examples, I’m extracting single year ACS variables (2005-2018) for all large (65,000+ population) places in the State of California.
# The get_acs function is run for each year of the single-year ACS data, from 2005 to 2018.
# Note that group quarters data was not collected in 2005, but started in 2006.
# Note the "_05_" included in the variable name in the first data "pull". That's a
# mnemonic device that tells us it's for the year 2005.
# Example 2.1 through 2.14: Run get_acs for large California Places, 2005-2018
# Example 2.15: Merge together data frames into a VERY wide database...lots of columns!
# Example 2.16: Merge in a file of Large San Francisco Bay Area places, and subset file.
#-------------------------------------------------------------------------------
place05 <- get_acs(survey="acs1", year=2005, geography = "place", state = "CA",
show_call = TRUE, output="wide",
variables = c(TotalPop_05_ = "B06001_001", # Total Population
Med_HHInc_05_ = "B19013_001", # Median Household Income
Agg_HHInc_05_ = "B19025_001", # Aggregate Household Income
HHldPop_05_ = "B11002_001", # Population in Households
Househlds_05_ = "B25003_001", # Total Households
Owner_OccDU_05_= "B25003_002", # Owner-Occupied Dwelling Units
Rent_OccDU_05_ = "B25003_003", # Renter-Occupied Dwelling Units
Med_HHVal_05_ = "B25077_001")) # Median Value of Owner-Occ DUs
place05$Avg_HHSize_05 <- place05$HHldPop_05_E / place05$Househlds_05_E
place05$MeanHHInc_05 <- place05$Agg_HHInc_05_E / place05$Househlds_05_E
#------------------------------------------------------------------------------------
place06 <- get_acs(survey="acs1", year=2006, geography = "place", state = "CA",
show_call = TRUE, output="wide",
variables = c(TotalPop_06_ = "B06001_001", # Total Population
Med_HHInc_06_ = "B19013_001", # Median Household Income
Agg_HHInc_06_ = "B19025_001", # Aggregate Household Income
HHldPop_06_ = "B11002_001", # Population in Households
Househlds_06_ = "B25003_001", # Total Households
Owner_OccDU_06_= "B25003_002", # Owner-Occupied Dwelling Units
Rent_OccDU_06_ = "B25003_003", # Renter-Occupied Dwelling Units
Med_HHVal_06_ = "B25077_001")) # Median Value of Owner-Occ DUs
place06$Avg_HHSize_06 <- place06$HHldPop_06_E / place06$Househlds_06_E
place06$MeanHHInc_06 <- place06$Agg_HHInc_06_E / place06$Househlds_06_E
#------------------------------------------------------------------------------------
This block of code is repeated for each single-year ACS of interest, say 2005 through 2018. Smarter “R” programmers will be able to tell me about loops to make this process more efficient with magical wild cards.
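One loop-based alternative, sketched under the assumption that the yearly pulls differ only in the year and the variable-name suffix. The get_acs() call itself is the same as in the per-year blocks above; it is commented out here because it needs a Census API key:

```r
years <- 2005:2018
base_vars <- c(TotalPop = "B06001_001", Med_HHInc = "B19013_001",
               Agg_HHInc = "B19025_001", HHldPop = "B11002_001",
               Househlds = "B25003_001")
pulls <- vector("list", length(years))
names(pulls) <- years
for (yr in years) {
  suffix <- sprintf("_%02d_", yr %% 100)              # "_05_", "_06_", ...
  yr_vars <- base_vars
  names(yr_vars) <- paste0(names(base_vars), suffix)  # e.g. "TotalPop_05_"
  # pulls[[as.character(yr)]] <- get_acs(survey = "acs1", year = yr,
  #   geography = "place", state = "CA", output = "wide", variables = yr_vars)
}
```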
The following step merges the data frames using the GEOID/NAME variables. This creates a very “wide” database: one record per geography, with each column representing a variable/year combination.
The “merge” function in “R” allows only two data frames to be joined by common columns at a time. I have yet to find an “R” function that allows me to merge all of the data frames at once.
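For what it’s worth, base R’s Reduce() can chain the pairwise merges over a list of data frames in one call. A minimal sketch with toy data frames (the same pattern would apply to a list of place05 through place18):

```r
dfs <- list(
  data.frame(GEOID = c("06001","06013"), NAME = c("A","B"), pop05 = c(10, 20)),
  data.frame(GEOID = c("06001","06013"), NAME = c("A","B"), pop06 = c(11, 21)),
  data.frame(GEOID = c("06001","06075"), NAME = c("A","C"), pop07 = c(12, 32))
)
# Fold merge() across the list: merge(merge(dfs[[1]], dfs[[2]]), dfs[[3]]), etc.
merged <- Reduce(function(x, y) merge(x, y, by = c("GEOID","NAME"), all = TRUE), dfs)
```

With all=TRUE, places missing in some years are kept with NA values, matching the behavior of the step-by-step merges.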
#####################################################################################
# Example 2.15: Merge together data frames into a VERY wide database...lots of columns!
# Merge the dataframes, adding a year in each step. All=TRUE is needed if # of places is different.
#
# (R-language newbie script...There are probably more terse/exotic ways of doing this!)
place0506 <- merge(place05, place06, by = c('GEOID','NAME'), all=TRUE)
place0507 <- merge(place0506,place07, by = c('GEOID','NAME'), all=TRUE)
place0508 <- merge(place0507,place08, by = c('GEOID','NAME'), all=TRUE)
place0509 <- merge(place0508,place09, by = c('GEOID','NAME'), all=TRUE)
place0510 <- merge(place0509,place10, by = c('GEOID','NAME'), all=TRUE)
place0511 <- merge(place0510,place11, by = c('GEOID','NAME'), all=TRUE)
place0512 <- merge(place0511,place12, by = c('GEOID','NAME'), all=TRUE)
place0513 <- merge(place0512,place13, by = c('GEOID','NAME'), all=TRUE)
place0514 <- merge(place0513,place14, by = c('GEOID','NAME'), all=TRUE)
place0515 <- merge(place0514,place15, by = c('GEOID','NAME'), all=TRUE)
place0516 <- merge(place0515,place16, by = c('GEOID','NAME'), all=TRUE)
place0517 <- merge(place0516,place17, by = c('GEOID','NAME'), all=TRUE)
place0518 <- merge(place0517,place18, by = c('GEOID','NAME'), all=TRUE)
place_all <- place0518
View(place_all)
Sometimes you want to create smaller data frames with just a select number of columns. Here’s a good approach for that.
# The following functions output useful lists to the R-studio console which can then be edited
names(place_all)
dput(names(place_all)) # most useful for subsetting variables
# The purpose here is to re-order and select variables into a much more compact
# database, for eventual exporting into a CSV file, and then into Excel for finishing touches.
selvars <- c("GEOID", "NAME",
"TotalPop_05_E", "TotalPop_06_E", "TotalPop_07_E", "TotalPop_08_E",
"TotalPop_09_E", "TotalPop_10_E", "TotalPop_11_E", "TotalPop_12_E",
"TotalPop_13_E", "TotalPop_14_E", "TotalPop_15_E", "TotalPop_16_E",
"TotalPop_17_E", "TotalPop_18_E")
# note the brackets for outputting a new data frame from the previous data frame....
place_all2 <- place_all[selvars]
# View the Selected Variables Table
View(place_all2)
# Set directory for exported data files, MacOS directory style
setwd("~/Desktop/tidycensus_work/output")
# Export the data frames to CSV files, for importing to Excel, and applying finishing touches
write.csv(place_all2,"ACS_AllYears_TotalPop_Calif_Places.csv")
write.csv(place_all, "ACS_AllYears_BaseVar_Calif_Places.csv")
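Instead of typing out every column name, base R’s grep() can build the selvars vector by pattern. A small sketch on a toy data frame (the same grep() call would work on names(place_all)); the pattern keeps only the estimate ("_E") columns:

```r
toy <- data.frame(GEOID = "06001", NAME = "A",
                  TotalPop_05_E = 1, TotalPop_05_M = 2, TotalPop_06_E = 3)
# Keep the ID columns plus every TotalPop estimate column, skipping the margins of error
keep <- c("GEOID", "NAME", grep("^TotalPop_.*_E$", names(toy), value = TRUE))
toy2 <- toy[keep]
```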
In this last example, I’m reading in a file of large places in the Bay Area (manually derived from the CSV file created previously) in order to subset Bay Area “large places” from the State of California “large places”.
#####################################################################################
# Example 2.16: Merge in a file of Large San Francisco Bay Area places, and subset file.
# Read in a file with the Large SF Bay Area Places, > 65,000 population
# and merge with the All Large California Places
bayplace <- read.csv("BayArea_Places_65K.csv")
Bayplace1 <- merge(bayplace,place_all, by = c('NAME'))
Bayplace2 <- merge(bayplace,place_all2, by = c('NAME'))
write.csv(Bayplace1,"ACS_AllYears_BaseVar_BayArea_Places.csv")
write.csv(Bayplace2,"ACS_AllYears_TotalPop_BayArea_Places.csv")
This concludes Example #2: “multiple geographies / multiple years / multiple variables,” with only one record (row) per geography.