TO: CTPP-News
RE: ACS and Decennial Census
I'm forwarding this extensive message from John Blodgett of the Missouri State Data
Center (forwarded from the State Data Center listserv).
Chuck Purvis, MTC
***** John Blodgett's e-mail:
***************************************************************************
Let me just throw in a few other issues that no one has mentioned yet in this discussion
regarding the ACS as a replacement for the decennial long form. These are issues that
are built into the ACS design, and have nothing to do with any possible problems with
implementation or subtle differences in the way questions are asked, etc. (In the software
development world, this would be like the distinction between a "bug" and a
"feature". These are features of ACS that we will probably all have to learn to
live with.) They mostly have to do with the universes being measured by the 2 surveys
(decennial vs ACS), and with the joys of moving averages.
The universes of these 2 surveys are not really the same. The differences that bother me
have to do with the time dimension. The census looks at the population on a single day,
April 1 of the census year. ACS looks at the population for all months of the year, and
in the case of smaller geographic areas, will be based on a "moving average" of
persons living there over a period of 2 or more years. For geographic areas such as
states, metro areas, large cities and counties, this will probably not be such a big
difference. But for smaller areas, and especially for small areas with
"seasonal" populations, there will be major differences.
1. For example: A (real) census tract here in Boone County, MO (Columbia) has about 3000
people living there according to SF1. They all live in group quarters -- dormitories.
That is what was counted on April 1, 2000. But ACS would not count such persons on a
single day. They would instead be using data based on 60 months of surveys over 5 years.
A good guess is that at least 20% of these students would be living somewhere else during
the summer months, or during the various academic break periods occurring throughout the
year. The ACS is going to count people at their residence on some day in Jan, July or
August -- whenever they are conducting the survey. If 20% of the students, on average,
are not there on any given day of the year, then we have a built-in mismatch. We will
simply not have sample characteristics of the 3000 people counted in the census from the
ACS. This is not a mistake, it's a "feature" of the ACS. Just like the
new race detail is a "feature" of the 2k census and makes it impossible to
compare its race data with earlier censuses. Live with it.
(Not only do we not have sample characteristics of the population, we no longer can talk
about "the" population of this census tract. We have the census count of
4-1-2000, but we also have a moving-average estimate from ACS. This applies not just to
the census tract, but also to the county and city. There will be significant differences
in these numbers for college towns and resort areas.)
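
A rough sketch of the arithmetic in item 1 may help. Only the 3000 census count and the 20% average absence come from the message above; the month-by-month pattern is invented so that it averages out to 20% absent, and treating the ACS figure as a plain average of monthly head counts is a simplification, not a description of the Bureau's estimation method.

```python
# Hypothetical residents present in the dormitory tract, by calendar month.
# Only the 3000 census count and the 20% average absence come from the
# message; the monthly pattern below is an assumption for illustration.
CENSUS_COUNT = 3000  # counted on April 1, 2000

present_by_month = {
    "Jan": 2400, "Feb": 3000, "Mar": 3000, "Apr": 3000,
    "May": 2400, "Jun": 1200, "Jul": 1200, "Aug": 1200,
    "Sep": 3000, "Oct": 3000, "Nov": 3000, "Dec": 2400,
}

def rolling_average(monthly_counts, years=5):
    """Average residents present over `years` of monthly surveys,
    assuming the same seasonal pattern repeats every year."""
    months = list(monthly_counts.values()) * years  # 60 months when years=5
    return sum(months) / len(months)

acs_style_average = rolling_average(present_by_month)
shortfall = 1 - acs_style_average / CENSUS_COUNT

print(f"Census count (April 1):   {CENSUS_COUNT}")
print(f"60-month average present: {acs_style_average:.0f}")
print(f"Shortfall vs. census:     {shortfall:.0%}")
```

Under these assumed numbers the single-day count and the 60-month average describe two different populations, which is the mismatch item 1 is pointing at.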
2. The "moving average" feature of the ACS is not a bad way to go as long as the
area being described is relatively stable over the period of the averaging. The data will
only be really bad for the areas where it would be most interesting. The Bureau has
indicated they will probably do some kind of over-sampling in areas perceived as
undergoing rapid change, but for now that's a pig in a poke. Another aspect of the
moving average problem is that of "moving geography". How do we get
characteristics for those jurisdictions that operate in "Continuous Annexation"
mode? When I get my data in 2008 for the city of Ashland, MO, based on data collected in
2003-2007, is it all going to be for the then-current (2008) city boundaries? So then when
I get another set in 2009, will they have to once again "move the chains" so
that I can get fresh data for the previous year for the current year boundaries?
(Complicated, isn't it? We have data over time, for a geographic entity that is
changing over time. Really tough to pin things down. And nearly impossible to do any
kind of analysis that may involve trends for the city based on its incorporated limits
over time. Very messy. Even if it could be done, could it be explained?) This problem
also pertains to ZIP/ZCTA areas and school districts. Not that we can expect those to be
published.
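
The stability point at the start of item 2 can be illustrated with a small sketch. All of the yearly values are hypothetical, and the unweighted five-year mean is an assumption standing in for whatever weighting the Bureau actually uses.

```python
def five_year_average(values):
    """Plain unweighted mean of the yearly values (an assumed simplification)."""
    return sum(values) / len(values)

years = [2003, 2004, 2005, 2006, 2007]
stable_area = [1000, 1000, 1000, 1000, 1000]    # hypothetical: no change
growing_area = [1000, 1100, 1200, 1300, 1400]   # hypothetical: rapid growth

stable_avg = five_year_average(stable_area)
growing_avg = five_year_average(growing_area)

for name, series, avg in [("stable", stable_area, stable_avg),
                          ("growing", growing_area, growing_avg)]:
    latest = series[-1]
    print(f"{name:8s} latest year ({years[-1]}): {latest}  "
          f"5-year average: {avg:.0f}  gap: {latest - avg:.0f}")
```

For the stable area the average and the latest year agree. For the steadily growing area the average matches the middle year (2005) rather than the latest one, so the estimate trails the change it is supposed to describe.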
3. I am concerned about the possible proliferation of alternative versions of sample
data. The ACS folks have announced their plan to publish data for areas based on their
population size. Areas of 65,000+ get new data each year based on a single year of ACS
survey results, while areas somewhat smaller will get data based on a 2 or 3 or 4-year
moving average. A question arises as to what the "best" numbers are for an
area (and also as to whether we'll be given a choice). Suppose my county of 70,000
gets data published in 2008 based on households sampled in 2007. But I want to look at
the characteristics of single Hispanic mothers in the county, and this subpopulation is
small enough that the single-year estimates are garbage. So can I also get a 5-year
moving average for the county that will have smaller standard errors, but which could be
misleading if there was a dramatic shift in the subpopulation being studied over the
period? Since the Bureau will, by then, be publishing data down to the block group (BG) level, I guess
I could do my own custom aggregation of the BG data even if I could not get it directly
& easily from the Bureau. That's a little extra work, but some people will not
see it as a big deal (others definitely will). What really bothers me about this is the
"flexibility factor" -- I think I see that there will be many ways to answer the
same question. Right now, if someone asks me what the poverty rate is for a county I
can tell them we don't really know except for the numbers collected at the last
census, or you can take your chances with the SAIPE estimates. But in the era of ACS data
I can tell them we only have estimates for some counties so far, and for those we have
several versions we can get, based on the number of years of data we use to do the
estimate. And then the user says, "Use the numbers that yield the highest (or
lowest) value so I can use those in my grant application." Uh oh.
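
The standard-error trade-off in item 3 can be sketched as follows. This assumes simple random sampling and a 5-year sample exactly five times the 1-year sample; the real ACS design is more complex, and the rate and sample sizes below are hypothetical, chosen only to show the scaling.

```python
import math

def se_proportion(p, n):
    """Standard error of a sample proportion under simple random sampling
    (an assumption; it ignores the actual ACS sample design)."""
    return math.sqrt(p * (1 - p) / n)

p = 0.30                      # hypothetical rate in the small subpopulation
n_one_year = 40               # hypothetical sampled persons in one year
n_five_year = 5 * n_one_year  # assumes equal sample sizes pooled over 5 years

se_1 = se_proportion(p, n_one_year)
se_5 = se_proportion(p, n_five_year)

print(f"1-year SE: {se_1:.3f}")
print(f"5-year SE: {se_5:.3f}")
print(f"ratio:     {se_1 / se_5:.2f}")  # equals sqrt(5) under these assumptions
```

Pooling five years shrinks the standard error by a factor of the square root of 5, but only by averaging over the whole period, so a sharp shift within those years is smoothed away. The two estimates can legitimately differ, which is what leaves room for picking whichever number suits the grant application.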
(Paragraph numbers are just for ease of reference.)
John Blodgett
OSEDA - Office of Social & Economic Data Analysis
U. of Missouri Outreach and Extension
626 Clark Hall - UMC
Columbia, MO 65211
(573) 882-2727
blodgettj(a)umsystem.edu