Hi Everyone –
Ed Christopher and I were invited speakers at last
week’s Joint Statistical Meetings, the statisticians’ equivalent to
TRB.
For those of you interested in survey methodology, it was
fun to talk to people who deal with the same kinds of issues that we
do: cell phones, paper diaries, non-response bias. We could learn
a lot, especially about implementing and analyzing the longitudinal surveys that
are often used in health studies, so I would recommend attending JSM once in a
while, especially if travel survey methods are your field.
(Note: Promod Chandok from BTS chairs a “transportation
statistics” group that he is trying to formalize).
1. The term “worker flow” has a different meaning to labor economists than it does to us: they use it to measure worker entrances to and exits from employers. (Session 471)
2. Administrative records. There is a LOT of discussion about using administrative records to replace or augment traditional household or housing-unit surveys. This topic was discussed in both of the sessions on Census 2020 (Sessions 389 and 563).
a. But this means that a lot more research would need to be done on StARS (the Statistical Administrative Records System), a compilation of federal administrative records, mostly from the IRS, Social Security, Medicare, and Selective Service.
b. John Czajka (Session 331) compared income data from the Current Population Survey (CPS) to the ACS. Overall the numbers look OK, but when you start to break them down by income class (high vs. low) or type of income (wages vs. assets), the comparisons start to fall apart. One point he made was that the ACS counts college students where they are living at the time of the survey, while the CPS counts them at their parents’ home. (I think StARS would also most likely count college students at their parents’ home, because students are not likely to change the address they report to Social Security or the IRS.)
c. WHERE in the decennial census process administrative records would be used is a big question – would they be used for follow-up, or at the front end, with surveys used for follow-up? The CB said that it could not rely on the USPS still being in existence for Census 2020.
3. The CB is talking about using the internet to collect responses on a large scale (not just as a test) for Census 2020.
4. Using satellite image processing came up in the “visionary” session (Session 389) on Census 2020; Joe Salvo, NYC, raised the topic. Maybe the CB has something like USDOT’s EARP (Exploratory Advanced Research Program) and could work with NASA and DoD to do some testing, maybe for the 2030 Census.
5. Several people (CB and non-CB) said that the per-unit cost of a completed decennial census survey is too high and that the CB needs to hold to no more than the Census 2010 per-unit cost (the current estimate for Census 2010 is $108/household). I think the CB had a slide saying that this was 60% higher per household than Census 2000.
6. The CB plans to issue a Federal Register notice about the 5-year ACS tabulations in August (this month). The 5-year (2005-2009) ACS tabulations are planned for release around December 2010. Census 2000 tract and block group boundaries are used (not the Census 2010 geography). These ACS tabulations will not have the benefit of the Census 2010 results for sample weighting.
Debbie Griffin (Session 511) said tables are NOT restricted based on population thresholds or reliability; they WILL be restricted based on disclosure risk. The criteria used to identify disclosure risk were not defined, but Laura Z (Session 158) said that 100 cells per table is a threshold that would result in suppression, and also that GQ (group quarters) data will use partial synthesis to protect confidentiality.
The BLOCK GROUP data from the 5-year ACS (2005-2009) will NOT be available from American FactFinder, may not be available in the usual “data download” area, and may be restricted to Summary File download and DataFerrett. Ken Hodges of Nielsen said he did not understand why the CB was applying different data access rules to BG data when, by his estimation, there are many more towns that are smaller than the average BG, and these access limitations are not being applied to them. The Census Bureau says that BG data should ONLY be used to build up larger geographic areas, because the Margins of Error (MOEs) are too large otherwise (a quick illustration of the build-up follows below).
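To illustrate that build-up point, here is a minimal sketch (my own, with made-up numbers, not from any session) of the Census Bureau’s standard approximation for the MOE of a sum of ACS estimates: the square root of the sum of the squared MOEs. The aggregate estimate grows faster than the aggregate MOE, so the relative error shrinks as you build up.

```python
import math

def aggregated_moe(moes):
    # Census Bureau's standard approximation for the MOE of a sum
    # of ACS estimates: square root of the sum of squared MOEs.
    return math.sqrt(sum(m * m for m in moes))

# Hypothetical block groups being built up into one larger area
# (estimates and MOEs below are made up for illustration).
estimates = [120, 85, 200, 150]   # e.g., workers per block group
moes = [45, 60, 70, 55]           # published 90% MOEs

total = sum(estimates)
total_moe = aggregated_moe(moes)
print(f"built-up estimate: {total} +/- {total_moe:.0f}")

# Relative error for the aggregate vs. the worst single block group:
print(f"aggregate MOE ratio: {total_moe / total:.0%}")
print(f"worst single-BG MOE ratio: {max(m / e for m, e in zip(moes, estimates)):.0%}")
```

With these made-up numbers the aggregate is 555 +/- 116 (about 21%), versus a worst single block group at about 71% – which is the Bureau’s point about only using BG data to build up.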
We, the transportation community, should be ready to do some small-area (tract) analysis when these data are released, and then make recommendations as to whether or not to use these 5-year small-area tabulations. We know that sub-county estimates, e.g. place and PUMA estimates, have been problematic in some areas.
7. The CB is working on web-based tutorials to help data users understand the ACS and multiple years of data accumulation and reporting. These materials are going through 508-compliance review; the CB hopes to have them available in September.
8. Freddie Navarro reported several factors explaining why the ACS is less reliable than the Census 2000 long form, including:
a. The mail-back + CATI “cooperation rate” was estimated (when the ACS was in its test period, based on Census 2000) to be about 78%, but it has been closer to 50%.
b. The lack of tract-level controls (population totals by age/sex/race) has resulted in a 15-25% increase in standard errors.
c. In summary, the most current results show that the coefficient of variation (C.V.) is 75% higher than the C.V. from the Census 2000 LF; the original estimate was that the ACS C.V. would be only about 33% higher. This is consistent with our estimate that the sample size is about 50% of the Census 2000 LF after 5 years of data accumulation (a quick arithmetic check follows below).
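A back-of-the-envelope check of that consistency claim (my own arithmetic, not from the session): halving the sample inflates standard errors by a factor of the square root of 2, roughly 41%, and compounding that with the 15-25% increase from the loss of tract-level controls lands right around the reported 75%.

```python
import math

# Back-of-the-envelope check (my arithmetic, not from the session).
# Standard errors scale as 1/sqrt(n), so half the sample inflates
# the C.V. by sqrt(2) ~ 1.41 before any other effects.
sample_ratio = 0.5                           # ACS 5-yr vs. Census 2000 LF
base_inflation = math.sqrt(1 / sample_ratio)

# Compound with the 15-25% increase from the lack of tract-level controls.
for control_effect in (1.15, 1.25):
    print(f"implied C.V. ratio: {base_inflation * control_effect:.2f}")
# prints ~1.63 and ~1.77, bracketing the reported 75% increase
```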
9. Ramesh Dandekar from EIA presented a paper on why rounded values don’t really protect confidentiality: by using linear programming on the independently rounded table marginals and the rounded cell values, he could recover many of the cell values exactly, and many of the others to within 1. I think that because so many of the other cells were off by 1 or 2, the tables would still meet the DRB’s protection of individual confidentiality. Conceptually, though, his approach could be extended to solve for values across multiple tables, e.g. Table A: MOT (means of transportation) by income combined with Table B: MOT by vehicles available, and then the number of exactly matched cells would likely increase. He used the CTPP2000 tables with rounded values as the example; he did not have “real” data, but created an example of an unrounded table. Of course, he used a small table as the example, and how much computing power you would need to implement this on a real CTPP tabulation is not clear, but since computing power is cheap, I think it could be done if someone really wanted to try (see the sketch below for the basic idea). The approach being taken by NCHRP 08-79 (please see the synopsis of this project in the August 2010 CTPP Status Report, http://www.fhwa.dot.gov/ctpp/sr1008.htm) will avoid the use of rounding for disclosure protection.
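For the curious, here is a minimal sketch of the basic idea (my own reconstruction under stated assumptions, not Dandekar’s code): treat each unknown interior cell as an LP variable, constrain it to the interval its rounded value implies, constrain the row and column sums to the intervals the rounded marginals imply, and then minimize and maximize each cell in turn. Where the two bounds coincide, the rounding has leaked that cell exactly. The 3x3 table and the rounding base of 5 below are hypothetical; CTPP 2000’s actual rounding rules differ in detail.

```python
import numpy as np
from scipy.optimize import linprog

BASE = 5  # hypothetical rounding base; CTPP 2000's rules differ in detail

# A made-up 3x3 table of rounded cell values and independently
# rounded row/column marginals (all values are illustrative).
cells = np.array([[10, 25, 5],
                  [15, 10, 20],
                  [5, 30, 10]])
row_sums = np.array([40, 45, 45])
col_sums = np.array([30, 65, 35])

n_rows, n_cols = cells.shape
n = n_rows * n_cols  # one LP variable per interior cell

# A value rounded to r must lie in [r - BASE/2, r + BASE/2] (and be >= 0).
bounds = [(max(0, r - BASE / 2), r + BASE / 2) for r in cells.ravel()]

# Row and column sums are likewise pinned to intervals around the
# rounded marginals; encode lo <= A x <= hi as two <= constraint sets.
A, lo, hi = [], [], []
for i in range(n_rows):
    coeff = np.zeros(n)
    coeff[i * n_cols:(i + 1) * n_cols] = 1.0
    A.append(coeff); lo.append(row_sums[i] - BASE / 2); hi.append(row_sums[i] + BASE / 2)
for j in range(n_cols):
    coeff = np.zeros(n)
    coeff[j::n_cols] = 1.0
    A.append(coeff); lo.append(col_sums[j] - BASE / 2); hi.append(col_sums[j] + BASE / 2)
A_ub = np.vstack([np.array(A), -np.array(A)])
b_ub = np.concatenate([np.array(hi), -np.array(lo)])

# Minimize, then maximize, each cell; a tight interval means the
# rounded publication effectively discloses the true value.
for k in range(n):
    c = np.zeros(n); c[k] = 1.0
    low = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds).fun
    high = -linprog(-c, A_ub=A_ub, b_ub=b_ub, bounds=bounds).fun
    print(f"cell {divmod(k, n_cols)}: true value in [{low:.1f}, {high:.1f}]")
```

An integer-programming variant (requiring whole-number cells, and adding the cross-table constraints mentioned above) would tighten these intervals further, which is presumably part of why NCHRP 08-79 is moving away from rounding.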
So, I hope that I haven’t
made any big errors, but if you find errors in my email, please let me know so that
I can correct them!
Elaine Murakami
FHWA Office of Planning (Wash DC)
206-220-4460 (in Seattle)