Today’s (7/28/2021) blog post by acting Census Bureau director Dr. Ron Jarmin is essential reading, and the embedded YouTube videos help explain what’s going on.
Here’s a link to the blog post:
The What is Redistricting video:
The Protecting Privacy video:
Here’s a snippet from the Director’s post:
"With these [privacy protecting] parameters, some small areas like census blocks may look
“fuzzy,” meaning that the data for a particular block may not seem
correct. Importantly, our approach yields high quality data as users
combine these “fuzzy” blocks to form more significant geographic units
like census tracts, cities, voting districts, counties, and American
Indian/Alaska Native tribal areas. Our calibration was designed to
achieve acceptable quality thresholds for these levels of geography.
So, if you’re looking at block-level data, you may notice situations like the following:
- Occupancy status doesn’t match population counts. Some blocks
may show that the housing units are all occupied, but the population
count is zero. Other blocks may show the reverse: the housing units are
vacant, but the population count is greater than zero.
- Children appear to live alone. Some blocks may show a population count for people under age 18 but show no people age 18 and older.
- Households appear unusually large. For example, you may find blocks with 45 people, but only three housing units.
Though unusual, situations like these in the data help confirm that confidentiality is being protected.
Noise in the block-level data will require a shift in how some data users typically approach using these census data.
Instead of looking for precision in an individual block, we strongly
encourage data users to aggregate, or group, blocks together. As blocks
are grouped together, the fuzziness disappears. And when you step back
with more blocks in view, the details add together and make a sharp
picture."
###
So, as I understand it, the Bureau is not taking complete microdata records (the household, household members, etc.) and sprinkling them randomly within a census tract. Instead, each individual variable is independently (?) modeled/simulated using the Bureau’s privacy protection parameters. More or less, I guess.
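A minimal Python sketch of the "fuzziness disappears" claim, for anyone who wants to see the arithmetic. It is not the Bureau's actual algorithm, which is considerably more involved. Everything here is an assumption made for illustration: the block counts are made up, the noise is independent Laplace noise with a hypothetical scale of 5 persons, and the 1,000 blocks are simply summed into one larger area.

```python
import random

def laplace_noise(rng, scale):
    # The difference of two exponential draws with mean `scale`
    # follows a Laplace distribution centered on zero.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

rng = random.Random(0)   # fixed seed so the run is repeatable
N_BLOCKS = 1000          # hypothetical number of blocks being aggregated
NOISE_SCALE = 5.0        # hypothetical noise scale, in persons

# Made-up "true" block populations (illustration only, not census data).
true_counts = [rng.randint(0, 60) for _ in range(N_BLOCKS)]

# Assumption: each block gets its own independent noise draw.
noisy_counts = [c + laplace_noise(rng, NOISE_SCALE) for c in true_counts]

# Typical error for a single block, relative to the average block size.
mean_block = sum(true_counts) / N_BLOCKS
block_abs_error = sum(abs(n - t) for n, t in zip(noisy_counts, true_counts)) / N_BLOCKS
block_rel_error = block_abs_error / mean_block

# Error for the aggregate of all blocks, relative to the true total.
true_total = sum(true_counts)
noisy_total = sum(noisy_counts)
tract_rel_error = abs(noisy_total - true_total) / true_total

print(f"typical single-block relative error: {block_rel_error:.1%}")
print(f"aggregate relative error:            {tract_rel_error:.1%}")
```

Under these assumptions, the positive and negative noise draws largely cancel when the blocks are summed, so the aggregate's relative error comes out much smaller than a typical single block's. Whether the real block-level noise behaves this simply is exactly the open question above.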
If I were a city planner, the variable I would (and should) be most certain of is the count of dwelling units at my city block level. I wouldn’t readily know whether those housing units were occupied or vacant, but I could see with my own eyes (and aerial photography) that a housing unit is present. But if “housing units” are fuzzified (?) using differential privacy, then, meh…
###
Fuzzy Wuzzy was a bear, Fuzzy Wuzzy had no hair, Fuzzy Wuzzy wasn’t fuzzy, was he?
###