Block group data is and always has been important to the public sector. It
has been essential to my activities as a transportation planner over the
years. I cannot speak for other areas, but there is nothing "accidental"
about the geography of block groups in the areas in which I work. Perhaps
this is due to my active interest and involvement in the delineation of all
census statistical areas affecting my work. The necessity of meeting
minimum goals of persons and households for the sake of statistical analysis
does occasionally lead to somewhat arbitrary aggregations of small areas.
The necessity of delineating the block group boundaries prior to the actual
census does occasionally lead to odd-looking results, especially in rapidly
developing or redeveloping areas.
Errors made by the Bureau of the Census in field review, in digitizing, or
in geocoding are the only cause of "accidental" geography in my experience.
For the 1990 census, the Bureau would not allow local entities to demand
corrections to Bureau errors below the Census Tract level. In 2000, they
are allowing us to demand corrections that affect block groups, but not
individual blocks. (My metropolitan area had about half a mile of a major
drainage feature eliminated during field review. The Bureau admitted it was
their mistake but would not restore the block boundary.) In short, I'll
fight over any decisions that diminish my flexibility in performing
demographic analyses and would encourage others to do the same.
Robert R. Allen, AICP
Transportation Planning Director
Abilene Metropolitan Planning Organization
ph.: 915-676-6243
fax: 915-676-6242
-----Excerpt from Original Message-----
From: owner-ctpp-news(a)chrispy.net [mailto:owner-ctpp-news@chrispy.net]On
Behalf Of Patty Becker
Sent: Sunday, January 27, 2002 12:04 PM
To: ctpp-news(a)chrispy.net
Subject: [CTPP] Some more points
I am skeptical about block group data. They have never been very reliable
anyhow, and are mostly important to the private sector because they permit
more precise aggregation in what I call radius analyses (1 mile circle,
three mile circle, etc.). The long form data at the individual block group
level have very high sampling and non-sampling error rates. In addition,
block groups are largely "accidental" geography, because the lines between
them are most often arbitrary and not based on any real-life
criteria. Tracts (and TAZs), on the other hand, are deliberately drawn to
meet local criteria, or they should be.
I think it would serve us all better to refer to TAZs as tract-equivalents
rather than block-group equivalents. In areas without high employment
density, that's what they are. I don't think a fight over block group data
will serve anyone well at this point.
Thanks to everyone who commented on my comments.
Regarding ACS data delivery, there is a plan in the works which is going to
be tested this year. It's called "Tier 3," referring to American
FactFinder (tier 1 is the profiles, and tier 2 is the summary file tables,
both now available). In Tier 3, the idea is that the user could specify
the precise geography and precise cross-tabulation(s) desired, and the
system will check to see if the requested table meets confidentiality
criteria before delivering it. The table would run off the basic 2000
census record.
If Tier 3 works out, it would probably be extended to the ACS files. Thus,
for example, a run could be done for a group of tracts with population 65K
or higher. This OUGHT to include a standard profile for the custom geography.
I definitely agree with the idea of sub-MCD ACS tabulation areas--these are
not PUMAs because they're not for microdata. This is an idea to be worked
on for the future.
The "rolling average" means that while data released at the tract level in
2008 were collected in 2003-7, the 2009 release will include 2004-8 data,
and so forth.
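To make the rolling window concrete, here is a minimal sketch. The function name is made up for illustration, and it assumes each release appears the year after the last collection year, which is what the 2008 and 2009 examples above imply.

```python
def acs_collection_years(release_year, window=5):
    """Return (first, last) collection years pooled into a given release.

    Assumption: the release comes out the year after the last collection
    year, as with the 2008 release covering 2003-2007.
    """
    last = release_year - 1
    first = last - window + 1
    return first, last


for year in (2008, 2009, 2010):
    first, last = acs_collection_years(year)
    print(f"{year} release: data collected {first}-{last}")
```

Each new release drops the oldest collection year and adds the newest, so consecutive releases share four of their five years.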
One of the most important things about the ACS is that it is in the field
on a 12 month basis. This is a distinct difference from the census
long-form, which is collected as of April 1 (at least in theory) and for
which the income data represent the previous calendar year. In ACS, the
income data will reflect the past 12 months. Mobile populations will be
enumerated where they are at any given time, which means that a town like
Gainesville, FL, which has a much lower population count in the summer than
in the winter (40K to 100K) will show up with less than 100K population in
the ACS. Analysis of these various differences is one of the main purposes
of the supplementary surveys, especially the C2SS in 2000.
Elaine--good luck on getting 14 tables on the TAZ/TAZ matrix. I'm pretty
skeptical given the current climate, especially since the fact that the
data go only into models doesn't carry much weight with the DRB. I will be
interested to see what happens. I would recommend, though, that you
consider giving up some of the detail in the interest of having the counts.
About the rule of 100: that is the arbitrary cutoff for presentation of
data on SF2. See
http://www.census.gov/Press-Release/www/2001/sumfile2.html. SF4 will
probably be higher. But again, remember, that the cutoff on these files is
designed for the presentation of data for these small race/ethnic
groups. It makes sense not to deliver 47 tables for a population group
with only 35 (or 99) people in a given geographic area such as a tract or
small MCD. You can always look at a higher level of geography, such as the
county, to get these characteristics for the (perhaps slightly) larger
population count for the group.
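A rough sketch of how the cutoff plays out, reading it as "100 or more people in the group within the area." The group names and counts are invented for illustration; they are not census figures, and the function is not anything the Bureau publishes.

```python
THRESHOLD = 100  # the SF2 cutoff discussed above


def groups_shown(group_counts, threshold=THRESHOLD):
    """Return the groups whose count in one area meets the threshold."""
    return sorted(g for g, n in group_counts.items() if n >= threshold)


# Hypothetical counts for the same groups in a tract and in its county.
tract = {"Group A": 35, "Group B": 99, "Group C": 100, "Group D": 412}
county = {"Group A": 180, "Group B": 640, "Group C": 2100, "Group D": 9800}

print("tract :", groups_shown(tract))
print("county:", groups_shown(county))
```

Groups A and B drop out at the tract level but reappear at the county level, which is the "look at a higher level of geography" point.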
An important point about SF2, and what I said above, is that the file is
PUBLIC. It will be out there on AFF and anyone can look at it. The case
needs to be made that the CTPP, data for TAZs, is NOT PUBLIC in the same
sense. Eventually they're going to have to work out licensing agreements
or something similar to handle this problem. Right now, we're in the
crosshairs of the problem without good decisions having been made.
As I said before, eternal vigilance is required.
Patty Becker
Patricia C. (Patty) Becker 248/354-6520
APB Associates/SEMCC FAX 248/354-6645
28300 Franklin Road Home 248/355-2428
Southfield, MI 48034 pbecker(a)umich.edu
First, let me thank Patty for her insightful comments. I wish I could have attended the conference on privacy and confidentiality that occurred just before TRB!
The CTPP Working Group has been working very diligently with the Disclosure Review Board on the final tables to be included in the CTPP 2000. The DRB approved the residence-only and workplace-only tables. However, the home-to-work tabulations required a lot of discussion.
The TAZ-to-TAZ tables are MORE THAN A COUNT OF WORKERS. As it stands now, we have a list of 14 tables, of which 5 are subject to a threshold of 3 unweighted records and 9 will be reported without a threshold. Please see the January 2002 CTPP Status Report at www.trbcensus.com for a near-final list. Table 3-2 on the list has been replaced with Means of Transportation (7 categories) by Vehicles Available (3 categories).
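For readers who have not worked with a record threshold, here is a minimal sketch of the idea. It assumes a flow is kept only when the TAZ pair has at least 3 unweighted sample records; the TAZ numbers and counts are made up, and the actual DRB procedure may differ in detail.

```python
MIN_RECORDS = 3  # threshold of 3 unweighted records


def apply_threshold(flows, min_records=MIN_RECORDS):
    """Keep TAZ pairs with at least min_records unweighted sample records.

    flows maps (home_taz, work_taz) -> (unweighted_records, weighted_workers).
    """
    return {pair: vals for pair, vals in flows.items() if vals[0] >= min_records}


# Hypothetical home-to-work flows.
flows = {
    ("101", "205"): (5, 31),
    ("101", "206"): (2, 11),
    ("102", "205"): (3, 19),
    ("103", "207"): (1, 6),
}

for pair, (records, workers) in sorted(apply_threshold(flows).items()):
    print(pair, records, workers)
```

The test is on the unweighted record count, not the weighted number of workers, so a pair with 11 weighted workers can still be suppressed.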
Another question that arose at the TRB Workshop was the definition of STANDARD tabulations from ACS. Nancy Gordon of the Census Bureau said that she would like increased communication with the transportation community on all aspects of ACS.
HERE ARE THE CURRENT PLANS FOR ACS STANDARD TABULATIONS (residence geography only):
"Core" tables with transportation variables:
1. Means of transportation to work (13 cat)
2. Travel Time to work (13 cat)
3. Travel Time (4) by Means of Transportation to Work (2)
a. Public Transit, and
b. Other (includes drive alone, carpool, walk, bike, etc)
4. Aggregate Travel time by Means of Transportation (2) (see item 3 above)
5. Tenure (owner/renter) by Vehicles Available (6)
"Non-Core" tables planned for release in Spring 2002
1. Tenure (owner/renter) by Veh Avail (2 cat: 0/1+ veh) by Age of Householder (3 cat: 15-34/ 35-64/65 and over)
2. Time leaving Home to go to Work (15 cat)
3. Private vehicle occupancy (5 categories for carpooling, plus drove alone, and other)
Question for the Census Bureau:
How can the transportation community provide recommendations for standard tabulations planned for the ACS for residence geography? And, is there a possibility for the standard tabulations to include 1) workplace geography and 2) home-to-work flow tabulation, or will this need to be prepared in a CTPP-like special tabulation under contract?
Question for the data user community:
What tabulations would you want from a standard ACS product?
My PERSONAL opinion is that on any travel time table by means of transport, drive alone should not be combined with walk and bike! Also, instead of so many tables using housing tenure (owner/renter), we might want to have # of workers by vehicles avail, or household income by vehicles avail.
Elaine Murakami
Federal Highway Administration
Dave:
You raise some excellent points, and I'll try to respond.
Yes, what the CENSUS BUREAU means by areas of 65,000-plus population are places, counties, MSAs, etc., with at least 65,000 total population.
But consider the example of BIG CITY, say, BIG CITY in New Mexico with a current population of 448,607. Now, wouldn't YOU like to split this city into 6 "ACS Districts" of at least 65,000+ population, where you could then get ANNUAL DATA for each of these sub-city districts??? (I pitched this idea at our TRB "ACS Workshop" this past Sunday, January 13th. Census Bureau folks were interested - - didn't say yes, didn't say no to what I was pushing....)
If we can define PUMAs of 100,000-plus population, then certainly we can also define ACS districts of 65,000-plus population?
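The arithmetic behind the "6 districts" figure, as a quick sketch. It only checks how many districts of at least 65,000 the city total allows; real districts would have to be assembled from whole tracts, which this ignores.

```python
def max_districts(total_population, minimum=65_000):
    """Largest number of districts that can each reach the minimum size."""
    return total_population // minimum


city_population = 448_607  # the BIG CITY example above
n = max_districts(city_population)
print(n, "districts, averaging about", round(city_population / n), "people each")
```

A seventh district would need 455,000 people in total, so six is the ceiling for this city.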
In terms of TAZ-to-TAZ commute flow data, it will be very necessary to aggregate five years of ACS data in order to get reliable, very small area (tract, taz) data. So, the first release of "very small area" data from the ACS should be 2008, to represent aggregated data from 2003-2007.
Block groups. In terms of tract versus block group, I've only seen the Census Bureau publicly admit to providing tract-level data only. I don't recall us raising the block group issue at our ACS workshop, but the Census Bureau *was* saying that TAZ data would be available by 2008 (my USDOT colleagues should correct me if I'm wrong). Since TAZes are similar in size (population, employment-wise) to block groups, it would make sense to me that block group ACS data is released in 2008, as well. I totally agree with you about block group data being made available!!!
I haven't seen any statistical research by Census Bureau staff that discusses the standard errors of census tract data versus standard error of block group data, based on 5-year accumulation of ACS data. What I have seen is the research that justifies the 65,000+ threshold and thresholds for other multi-year (2-year, 3-year, 4-year, 5-year) accumulation of ACS data (see ACS web page, paper by Chip Alexander on this particular issue.)
Also, I don't think the Census Bureau has ruled out (or positively committed TO) multi-year products other than the five-year census tract data. So, we COULD see 2-year or 3-year or 4-year accumulation of data, if that's what the user community is REALLY after! Maybe with 6-year or 7-year accumulation of data we get "block groups"? Who knows? (And whether these multi-year tabs are FREE or COST-REIMBURSABLE is yet another issue in the "to be determined" column of issues.)
Hopefully (sooner rather than later) our USDOT colleagues will post our powerpoint presentations from the TRB ACS workshop on our subcommittee's website (www.trbcensus.com). (nudge, nudge ;-)
Chuck Purvis, MTC
***********************************************
Charles L. Purvis, AICP
Senior Transportation Planner/Analyst
Metropolitan Transportation Commission
101 Eighth Street
Oakland, CA 94607-4700
(510) 464-7731 (office)
(510) 464-7848 (fax)
www: http://www.mtc.ca.gov/
Census WWW: http://census.mtc.ca.gov/
***********************************************
>>> David Abrams <dabrams(a)mrgcog.org> 01/23/02 03:36PM >>>
This sounds fairly reasonable, but I would like at least one clarification.
You stated that there would be one-year tables for areas of 65,000 or more
and two-year data for areas of 30,000 or more. Are these areas defined as
places or counties? In the same paragraph you referred to a five-year cycle
in a manner that implied that the CTPP was based on five years of data, is
this a correct interpretation?
I am also concerned that the minimum geographic level for sample data
appears to be the tract. The block group level for sample data has been
very valuable. Is it correct that we would lose block group level sample
data?
Dave Abrams
Information Services Manager
MRGCOG, Albuquerque, NM
-----Original Message-----
From: Patty Becker [mailto:pbecker@umich.edu]
Sent: Wednesday, January 23, 2002 11:32 AM
To: ctpp-news(a)chrispy.net
Subject: [CTPP] Data detail
Let me try to address some of the issues that Liz Hartmann raises:
Over the past two or three years, the concerns about privacy and
confidentiality in data release have grown exponentially. At the census
bureau, this issue exploded two years ago with the question of what would
be on the Public Use Microdata Sample file (PUMS). The original plan would
have, essentially, made the file useless. After a user meeting convened by
the Population Division at the bureau, and a LOT of internal negotiating,
the final outline for the 2000 PUMS file is quite reasonable. I have no
problem with top coding and rounding, because, in fact, the data are not
reliable at the per minute level for journey to work, or at the details of
very high income levels. My only problem is with the 100,000 cutoff, but
it was clear that we could not fight that this time around.
The long-form data are weighted to represent the total population in
publications and summary files. You can see those weights on the PUMS
file. Theoretically, they would be 6 to 1, since the sample is 1 in 6, but
a lot of things can make them different from that. I have not heard what
the cutoff level is going to be on SF4 for the long form data; the critical
point is that we won't have any cutoff level on SF3. Remember, again, that
SF2 and SF4 are a completely different concept than they were in
1980/90. They have exactly the same tables as SF1/SF3, but are iterated
for the small race/ethnic groups such as Indian tribes when there is a big
enough N to do so. That makes perfect sense to me.
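On the weighting point at the start of that paragraph, a small sketch of the difference between the theoretical 6-to-1 weight and the weights actually carried on the records. The weight values are invented for illustration and are not taken from any PUMS file.

```python
NOMINAL_WEIGHT = 6  # 1-in-6 long-form sample


def nominal_estimate(sample_records):
    """Estimate if every sampled record carried the theoretical weight."""
    return sample_records * NOMINAL_WEIGHT


def weighted_estimate(weights):
    """Estimate using the weight attached to each individual record."""
    return sum(weights)


# Hypothetical weights for 8 sampled persons in one small area.
weights = [6, 6, 5, 7, 9, 4, 6, 8]

print("nominal :", nominal_estimate(len(weights)))
print("weighted:", weighted_estimate(weights))
```

With only eight records behind the estimate, the published total looks like a population of about 50 but rests on very few actual responses, which is why the base N matters for small areas.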
The important point that needs to be made for TAZ files (especially the TAZ
to TAZ) is that these files are used only in modeling. No one makes a
printed (or web-based) report out of them! We all know, or should, that
the data are very, very thin, and thus really are not very reliable. But
there are so many TAZ pairs that the model handles the problems--or at
least I hope it does.
I think there is a reasonable chance that we will be able to get the CTPP
tables needed for transportation planning when the ACS reaches its full
five-year cycle, planned for 2008. The data will still be thinner than
long-form data; we will have a chance to measure what all that means for
geographic areas larger than tracts/TAZs as the one year, two year, and
three year accumulations become available. We are supposed to get one-year
tables for areas of 65,000 or more, and two-year data for areas of 30,000
or more. That's probably enough for the districts used in the models.
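The population cutoffs in that paragraph can be written as a small lookup. This is only a sketch of the plan as described here: the 65,000 and 30,000 figures come from the text above, while treating everything smaller as waiting for the five-year product is my simplifying assumption, since no cutoff is given for the three-year accumulation.

```python
def shortest_accumulation(population):
    """Years of ACS data needed before an area of this size gets tables."""
    if population >= 65_000:
        return 1
    if population >= 30_000:
        return 2
    return 5  # assumption: smaller areas wait for the full five-year cycle


for pop in (448_607, 65_000, 42_000, 4_000):
    print(f"{pop:>7,} people: {shortest_accumulation(pop)}-year data")
```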
"Is there room for collapsing across some of the "people categories" in
order to have data for the smaller geographic levels? How much would that
cost in dollars and timeliness?" That's already handled in the design of
the tables for any given product, including the CTPP. The TAZ/TAZ file has
nothing but the count of workers, right? We get, or impute,
characteristics from the wider data set that's on the TAZ of residence and
TAZ of work files.
Yes, I completely agree with "But I'd hate to lose the baby with the
bathwater. There's so much to be gained from this incredible data source.
It's one of the best things going (in my opinion), and crucial to good
Planning." Personally, I plan to remain in the fight as long as
possible. We will have to deal with it product by product. Two weeks ago,
I attended a DC conference on privacy and confidentiality. Of perhaps 275
people there, at least 240 were from federal agencies. There were very few
users. We have to make sure the user community is heard from.
I will be happy to work with the CTPP community on this. I am a
demographer rather than a transportation planner, but have more experience
than most demographers with the CTPP (and UTPP) and the use of these data.
Patty Becker
-------------------------------------------------------------------------------------------------
Patricia C. (Patty) Becker 248/354-6520
APB Associates/SEMCC FAX 248/354-6645
28300 Franklin Road Home 248/355-2428
Southfield, MI 48034 pbecker(a)umich.edu
This was helpful to me:
"From Celia Boertlein:
http://www.census.gov/prod/cen2000/doc/sf2.pdf
Appendix H "Characteristic Iterations" discusses the use of the population
threshold in showing matrices for specific race groups, Indian and Alaska
Native tribes, and Hispanic or Latino groups."
And although initially confusing, this also helped:
http://factfinder.census.gov/home/en/sf2.html
Patty Becker's comments on the listserve were also helpful. I'm still feeling confused about the data privacy issue and why the rule applies to some, and not all of the SF releases. Seems a little inconsistent, but it may make more sense as the details unfold over time.
******************************************
I agree with your statement about the amount of info to digest when understanding the census. Can you explain what is meant by "Rule of 100"?
Thanks!
Kelli Peterson
John:
I would recommend you contact Mr. Phil Salopek or Ms. Clara Reschovsky at the Census Bureau (Population Division, Journey-to-Work & Migration Statistics Branch). Phil or Clara can look up the Metro Planning Organization contacts for the "Urban Element" contacts for New England areas.
For the Boston MPO (Central Transportation Planning Staff) I would recommend talking to Marc Desmarais (their Census person) or Mr. Karl Quackenbush (Deputy Director at CTPS). I don't know who's who in the other New England metro areas.
All TAZes in the 1990 CTPP (Census Transportation Planning Package) AS WELL AS the 2000 CTPP are USER-DEFINED by the MPOs and STATE DOTs. I'm fairly certain there weren't rules that the TAZes nested on MCDs (it's nice if they were, but that's an issue for the local DOT or MPO to address! I can tell you that I nested my TAZes within Census Tracts, but NOT to PLACES....)
Also, the CTPP/Urban Element CDs include a GIS layer (Caliper's TRANSVU format) of the TAZ, which can be overlaid on your MCD boundary layer. Your hotshot GIS analysts can then try to develop a correspondence between your places and the TAZes.
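Once the overlay is done in GIS, building the correspondence itself is simple bookkeeping. Here is a minimal sketch that assumes the overlay has already been exported as rows of (TAZ, MCD) pairs, one row per overlapping piece. The TAZ numbers and town names are made up, and this stands in for whatever your GIS package produces; it is not a TRANSVU feature.

```python
from collections import defaultdict


def taz_to_mcds(overlay_rows):
    """Map each TAZ to the set of MCDs it overlaps."""
    mapping = defaultdict(set)
    for taz, mcd in overlay_rows:
        mapping[taz].add(mcd)
    return dict(mapping)


def crossing_tazs(mapping):
    """Return the TAZs that fall in more than one MCD."""
    return sorted(taz for taz, mcds in mapping.items() if len(mcds) > 1)


# Hypothetical overlay output: one row per TAZ/MCD overlap.
overlay_rows = [
    ("1001", "Town A"),
    ("1002", "Town A"),
    ("1003", "Town A"),
    ("1003", "Town B"),
    ("1004", "Town B"),
]

mapping = taz_to_mcds(overlay_rows)
print(crossing_tazs(mapping))
```

Any TAZ that shows up in the crossing list does not nest within a single MCD and would need to be split or apportioned before summing TAZ data to towns.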
I am crossposting my response to our "CTPP listserv" since that's where a lot of our MPO and Census Bureau colleagues "hang out" to discuss CTPP and Census Issues that are of interest to our MPO and Council of Governments communities.
cheers,
Chuck Purvis, MTC-Oakland CA
***********************************************
Charles L. Purvis, AICP
Senior Transportation Planner/Analyst
Metropolitan Transportation Commission
101 Eighth Street
Oakland, CA 94607-4700
(510) 464-7731 (office)
(510) 464-7848 (fax)
www: http://www.mtc.ca.gov/
Census WWW: http://census.mtc.ca.gov/
***********************************************
>>> gigsjr(a)MISER.UMASS.EDU 01/23/02 11:10AM >>>
Dear SDCers,
Stefan is looking for a document that shows how traffic analysis zones (TAZs)
from the 1990 "Census Transportation Planning Package: Urban Element"
relate to minor civil divisions (MCDs) in New England. Specifically, he
would like to find out which TAZs are located within the boundaries of
each MCD. He also needs to know if TAZs ever cross MCD boundaries in New
England (e.g., like urbanized areas do). If so, to what level of
geography do TAZs sum up to? Finally, are there any areas that are not
covered by TAZs?
Thanks in advance,
John Gaviglio, Manager
Data Group and the Massachusetts State Data Center
Massachusetts Institute for Social and Economic Research
University of Massachusetts at Amherst
413-545-3460
www.umass.edu/miser
Thanks for the clarifications on SF2 data. There's so much information to
digest when understanding census...any and all assists from more experienced users are very much appreciated (and very much needed)!
I still, though, have concerns about losing the fine-geo-level of data. There's been a lot of talk about the effects of the "rule of 100" on data availability at the TAZ level, and I could see these same concerns affecting other long-form data. "Rule of 100" is currently affecting availability of 100% data on the SF2; the long-form is sample data, and will have less base N, even if totals are imputed (although I don't quite understand this methodology either). Is there room for collapsing across some of the "people categories" in order to have data for the smaller geographic levels? How much would that cost in dollars and timeliness?
Data privacy is a really compelling, knotty issue; how much privacy, and for whom? From whom? Is it public data, or government data? It seems like a decade ago, we didn't have the technology or data (so widely) available to even dream of the possible complications. And there are issues, too, about the representativeness/generalizability of summary data based on small N.
But I'd hate to lose the baby with the bathwater. There's so much to be gained from this incredible data source. It's one of the best things going (in my opinion), and crucial to good Planning.
So, is this just a transportation issue, or do other areas (health, human services, education, academia) deal with these issues in other ways?
Just thinkin',
Liz Hartmann