Hi everyone,
Ed Christopher and I were invited speakers at last week's Joint
Statistical Meetings, the statisticians' equivalent to TRB.
For those of you interested in survey methodology, it was fun to talk to
people who deal with the same kinds of issues that we have: cell
phones, paper diaries, non-response bias. We could learn a lot,
especially on implementing and analyzing longitudinal surveys that are
often used in health studies, so I would recommend attending JSM once in
a while, especially if travel survey methods is your field. (Note:
Promod Chandok from BTS chairs a "transportation statistics" group that
he is trying to formalize.)
1. The term "worker flow" has a different meaning to labor
economists than it does to us. They use this term to measure worker
entrances and exits from employers. (Session 471)
2. Administrative records. There was a LOT of discussion about
using administrative records to replace or augment traditional household
or housing unit surveys. This topic came up in both of the
sessions on Census 2020 (Sessions 389 and 563).
a. But this means that a lot more research would need to be
completed on StARS (Statistical Administrative Records System). StARS
is a compilation of federal administrative records, mostly IRS,
Social Security, Medicare, and Selective Service.
b. John Czajka (Session 331) compared income data from the Current
Population Survey (CPS) to ACS. Overall the numbers look OK, but when
you start to break them down by income class (high vs. low) or type of
income (wages vs. assets), the comparisons start to break down. One
point he made was that ACS counts college students where they are living
at the time of the survey, and CPS counts them at their parents' home.
(I think that StARS would also most likely count college students at
their parents' home, because students are not likely to change their
address for reporting to SS or IRS.)
c. WHERE the use of administrative records would occur in the
decennial census process is a big question: would they be used as a
follow-up, or would they be used at the front end, with surveys used as a
follow-up? The CB said that they could not rely on the USPS to be in
existence for Census 2020.
3. CB is talking about using the internet to collect responses on
a large scale (not just as a test) for Census 2020.
4. Satellite image processing came up in the "visionary" session
(#389) on Census 2020. Joe Salvo, NYC, raised this topic. Maybe the
CB has an EARP (Exploratory Advanced Research Program) like USDOT has,
and could work with NASA and DoD to do some testing, maybe for the 2030
Census.
5. Several people (CB and non-CB) said that the per-unit cost of a
completed decennial census survey is too high and that the CB needs to
hold it to no higher than the 2010 per-unit cost (current estimate for
Census 2010 cost is $108/household). I think CB had a slide that said
the 2010 per-household cost was 60% higher than for Census 2000.
6. The CB plans to issue a Fed Reg notice about the 5-year ACS
tabulations in August (this month). The 5-year (2005-2009) ACS
tabulations are planned to be released around December 2010. Census
2000 tract and block group boundaries are used (not the Census 2010
geography). These ACS tabulations will not have the benefit of the
Census 2010 results to be used for sample weighting.
Debbie Griffin (Session 511) said tables are NOT restricted based on
population thresholds or reliability. The CB WILL restrict based on
disclosure risk. They did not define what criteria were used to
identify the disclosure risk, but Laura Z (Session 158) said that 100
cells per table was a threshold that would result in suppression, and
also that GQ data will use partial synthesis to protect confidentiality.
The BLOCK GROUP data from the 5-year ACS (2005-2009) will NOT be
available from American FactFinder, may not be available in the
usual "data download" area, and may be restricted to Summary File
download and DataFerrett. Ken Hodges of Nielsen said he did not
understand why the CB was applying different data access rules to BG
data when, by his estimation, there are many more towns that are
smaller than the average BG, and these data access limitations are not
being applied to them. The Census Bureau says that BG data should ONLY
be used to build up larger geographic areas, because the
Margins of Error (MOEs) are too large otherwise.
We, the transportation community, should be ready to do some small area
(tract) analysis when these data are released and then make
recommendations as to whether or not to use these 5-year small area
tabulations. We know that sub-county estimates, e.g. place and PUMA
estimates, have been problematic in some areas.
7. The CB is working on web-based tutorials to help data users
understand ACS and multiple years of data accumulation and reporting.
These materials are going through 508-compliance review. The CB hopes
that they will have materials available in September.
8. Freddie Navarro reported several reasons why the ACS is less
reliable than the Census 2000 long form, including:
a. The mail-back + CATI "cooperation rate" was estimated to be
about 78% (when the ACS was in its test period), based on Census 2000,
but the actual rate has been closer to 50%.
b. The lack of tract-level controls (pop totals by age/sex/race)
has resulted in an increase of 15-25% in the standard errors.
c. In summary, the most current results show that the Coefficient
of variation (C.V.) is 75% higher than the C.V. from the Census 2000 LF.
The original estimate was that the ACS C.V. would be about 33% higher.
This is consistent with our estimates that the sample size is about 50%
the size of Census 2000 LF after 5 years of data accumulation.
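A rough back-of-envelope check on the numbers in item 8, offered only as
a sketch: it assumes simple random sampling (so the C.V. scales with
1/sqrt(sample size)) and assumes the standard error increase from the
missing tract-level controls simply multiplies on top of the sample size
effect. Neither assumption comes from the session, and the function name
is made up for illustration.

```python
from math import sqrt

def cv_ratio_from_sample_fraction(sample_fraction):
    """C.V. relative to the full-size sample, ASSUMING simple random
    sampling, where C.V. is proportional to 1/sqrt(n)."""
    return 1.0 / sqrt(sample_fraction)

# ACS 5-year sample assumed to be about 50% of the Census 2000 long form.
half_sample = cv_ratio_from_sample_fraction(0.5)

# ASSUMPTION: the 15-25% standard error increase from the lack of
# tract-level controls multiplies the sample size effect.
with_low_penalty = half_sample * 1.15
with_high_penalty = half_sample * 1.25

print(f"half-size sample only:      {half_sample:.2f}")        # 1.41
print(f"plus 15% from no controls:  {with_low_penalty:.2f}")   # 1.63
print(f"plus 25% from no controls:  {with_high_penalty:.2f}")  # 1.77
```

Under these assumptions, a half-size sample alone would give a C.V.
about 41% higher, and adding the 15-25% increase from the missing
controls gives a range of roughly 63-77% higher, which brackets the 75%
figure reported.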
9. Ramesh Dandekar from EIA presented a paper on why rounded
values don't really protect confidentiality. Using linear
programming with the independently rounded table marginals and the
rounded cell values, he could guess many of the cell values
exactly, and many of the other cells within 1. I think that because so
many of the other cells were off by 1 or 2, the tables would still meet
the DRB protection of individual confidentiality. However, conceptually,
his approach could be extended to use linear programming to solve for
values across multiple tables, e.g. Table A: MOT by income combined with
Table B: MOT by vehicles available, and then the number of cells with an
exact match would likely increase. He used the CTPP2000 tables with
rounded values as the example; he did not have "real" data, but created
an example of an unrounded table. Of course, he used a small table as
the example, and how much computing power you would need to implement
this on a real CTPP tabulation is not clear, but since computing power
is cheap, I think it could be done if someone really wanted to try.
The approach being taken by NCHRP 08-79 (please see the synopsis of this
project in the August 2010 CTPP Status Report
http://www.fhwa.dot.gov/ctpp/sr1008.htm ) will avoid the use of rounding
for disclosure protection.
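To make the idea in item 9 concrete, here is a minimal toy sketch. It is
NOT Dandekar's method: it swaps linear programming for brute-force
enumeration, which only works on tiny tables, and it assumes plain
rounding to the nearest multiple of 5, which may not match the actual
CTPP2000 rounding rules. The 2x2 table and all function names are
hypothetical, and the example is hand-picked so that the constraints
bind tightly.

```python
from itertools import product

BASE = 5  # ASSUMED rounding base; not necessarily the CTPP2000 rule

def round_to_base(x, base=BASE):
    """Round a non-negative integer to the nearest multiple of base."""
    return base * ((x + base // 2) // base)

def candidates(rounded, base=BASE):
    """All non-negative integers that round to the published value."""
    lo = max(0, rounded - base // 2)
    hi = rounded + base // 2
    return [v for v in range(lo, hi + 1) if round_to_base(v, base) == rounded]

def feasible_tables(cells, row_totals, col_totals, grand_total):
    """Enumerate every integer table consistent with the rounded cells
    AND the independently rounded row, column, and grand totals."""
    n_rows, n_cols = len(cells), len(cells[0])
    options = [candidates(v) for row in cells for v in row]
    found = []
    for flat in product(*options):
        table = [list(flat[r * n_cols:(r + 1) * n_cols]) for r in range(n_rows)]
        rows_ok = all(round_to_base(sum(table[r])) == row_totals[r]
                      for r in range(n_rows))
        cols_ok = all(round_to_base(sum(table[r][c] for r in range(n_rows)))
                      == col_totals[c] for c in range(n_cols))
        total_ok = round_to_base(sum(flat)) == grand_total
        if rows_ok and cols_ok and total_ok:
            found.append(table)
    return found

# Hypothetical published (rounded) 2x2 table with rounded marginals.
published_cells = [[5, 10], [5, 10]]
solutions = feasible_tables(published_cells,
                            row_totals=[10, 10],
                            col_totals=[5, 15],
                            grand_total=20)
print(len(solutions), solutions)  # 1 [[[3, 8], [3, 8]]]
```

In this toy case each rounded cell alone allows 5 possible values (625
candidate tables), but only one table also matches the rounded
marginals, so every cell is recovered exactly. Real tables would
generally leave more than one feasible solution, and enumeration would
not scale to them, which is where a linear programming formulation
would come in.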
So, I hope that I haven't made any big errors, but if you find errors in
my email, please let me know so that I can correct them!
Elaine Murakami
FHWA Office of Planning (Wash DC)
206-220-4460 (in Seattle)