Hi Everyone –
Ed Christopher and I were invited speakers at last
week’s Joint Statistical Meetings, the statisticians’ equivalent to
TRB.
For those of you interested in survey methodology, it was
fun to talk to people who deal with the same kinds of issues that we
do: cell phones, paper diaries, non-response bias. We could learn
a lot, especially about implementing and analyzing the longitudinal surveys that
are often used in health studies, so I would recommend attending JSM once in a
while, especially if travel survey methods are your field.
(Note: Promod Chandok from BTS chairs a “transportation
statistics” group that he is trying to formalize).
1. The term “worker flow” has a different meaning to labor economists than it does to us: they use it to measure worker entrances to and exits from employers. (Session 471)
2. Administrative records. There is a LOT of discussion about using administrative records to replace or augment traditional household or housing-unit surveys. This topic was discussed in both of the sessions on Census 2020 (Sessions 389 and 563).
a. But this means that a lot more research would need to be done on StARS (the Statistical Administrative Records System), a compilation of federal administrative records, mostly from the IRS, Social Security, Medicare, and Selective Service.
b. John Czajka (Session 331) compared income data from the Current Population Survey (CPS) to the ACS. Overall the numbers look OK, but when you start to break them down by income class (high vs. low) or type of income (wages vs. assets), the comparisons start to fall apart. One point he made was that the ACS counts college students where they are living at the time of the survey, while the CPS counts them at their parents’ home. (I think StARS would also most likely count college students at their parents’ home, because students are not likely to change the address they report to Social Security or the IRS.)
c. WHERE in the decennial census process administrative records would be used is a big question – would they be used for follow-up, or at the front end, with surveys used for follow-up? The CB said that it could not rely on the USPS still being in existence for Census 2020.
3. The CB is talking about using the internet to collect responses on a large scale (not just as a test) for Census 2020.
4. Using satellite image processing came up in the “visionary” session (Session 389) on Census 2020; Joe Salvo, NYC, raised the topic. Maybe the CB has something like USDOT’s EARP (Exploratory Advanced Research Program) and could work with NASA and DoD to do some testing, maybe for the 2030 Census.
5. Several people (CB and non-CB) said that the per-unit cost of a completed decennial census survey is too high and that the CB needs to hold to no more than the Census 2010 per-unit cost (the current estimate for Census 2010 is $108/household). I think the CB had a slide saying that this was 60% higher per household than Census 2000.
6. The CB plans to issue a Federal Register notice about the 5-year ACS tabulations in August (this month). The 5-year (2005-2009) ACS tabulations are planned for release around December 2010. Census 2000 tract and block group boundaries are used (not the Census 2010 geography). These ACS tabulations will not have the benefit of the Census 2010 results for sample weighting.
Debbie Griffin (Session 511) said tables are NOT restricted based on population thresholds or reliability; they WILL be restricted based on disclosure risk. The criteria used to identify disclosure risk were not defined, but Laura Z (Session 158) said that 100 cells per table is a threshold that would result in suppression, and also that GQ (group quarters) data will use partial synthesis to protect confidentiality.
The BLOCK GROUP data from the 5-year ACS (2005-2009) will NOT be available from American FactFinder, may not be available in the usual “data download” area, and may be restricted to Summary File download and DataFerrett. Ken Hodges of Nielsen said he did not understand why the CB was applying different data access rules to BG data when, by his estimation, there are many more towns that are smaller than the average BG, and these access limitations are not being applied to them. The Census Bureau says that BG data should ONLY be used to build up larger geographic areas, because the Margins of Error (MOEs) are too large otherwise (a quick illustration of the build-up follows below).
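To illustrate that build-up point, here is a minimal sketch (my own, with made-up numbers, not from any session) of the Census Bureau’s standard approximation for the MOE of a sum of ACS estimates: the square root of the sum of the squared MOEs. The aggregate estimate grows faster than the aggregate MOE, so the relative error shrinks as you build up.

```python
import math

def aggregated_moe(moes):
    # Census Bureau's standard approximation for the MOE of a sum
    # of ACS estimates: square root of the sum of squared MOEs.
    return math.sqrt(sum(m * m for m in moes))

# Hypothetical block groups being built up into one larger area
# (estimates and MOEs below are made up for illustration).
estimates = [120, 85, 200, 150]   # e.g., workers per block group
moes = [45, 60, 70, 55]           # published 90% MOEs

total = sum(estimates)
total_moe = aggregated_moe(moes)
print(f"built-up estimate: {total} +/- {total_moe:.0f}")

# Relative error for the aggregate vs. the worst single block group:
print(f"aggregate MOE ratio: {total_moe / total:.0%}")
print(f"worst single-BG MOE ratio: {max(m / e for m, e in zip(moes, estimates)):.0%}")
```

With these made-up numbers the aggregate is 555 +/- 116 (about 21%), versus a worst single block group at about 71% – which is the Bureau’s point about only using BG data to build up.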
We, the transportation community, should be ready to do some small-area (tract) analysis when these data are released, and then make recommendations as to whether or not to use these 5-year small-area tabulations. We know that sub-county estimates, e.g. place and PUMA estimates, have been problematic in some areas.
7. The CB is working on web-based tutorials to help data users understand the ACS and multiple years of data accumulation and reporting. These materials are going through 508-compliance review; the CB hopes to have them available in September.
8. Freddie Navarro reported several factors explaining why the ACS is less reliable than the Census 2000 long form, including:
a. The mail-back + CATI “cooperation rate” was estimated (when the ACS was in its test period, based on Census 2000) to be about 78%, but it has been closer to 50%.
b. The lack of tract-level controls (population totals by age/sex/race) has resulted in a 15-25% increase in standard errors.
c. In summary, the most current results show that the coefficient of variation (C.V.) is 75% higher than the C.V. from the Census 2000 LF; the original estimate was that the ACS C.V. would be only about 33% higher. This is consistent with our estimate that the sample size is about 50% of the Census 2000 LF after 5 years of data accumulation (a quick arithmetic check follows below).
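A back-of-the-envelope check of that consistency claim (my own arithmetic, not from the session): halving the sample inflates standard errors by a factor of the square root of 2, roughly 41%, and compounding that with the 15-25% increase from the loss of tract-level controls lands right around the reported 75%.

```python
import math

# Back-of-the-envelope check (my arithmetic, not from the session).
# Standard errors scale as 1/sqrt(n), so half the sample inflates
# the C.V. by sqrt(2) ~ 1.41 before any other effects.
sample_ratio = 0.5                           # ACS 5-yr vs. Census 2000 LF
base_inflation = math.sqrt(1 / sample_ratio)

# Compound with the 15-25% increase from the lack of tract-level controls.
for control_effect in (1.15, 1.25):
    print(f"implied C.V. ratio: {base_inflation * control_effect:.2f}")
# prints ~1.63 and ~1.77, bracketing the reported 75% increase
```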
9. Ramesh Dandekar from EIA presented a paper on why rounded values don’t really protect confidentiality: by using linear programming on the independently rounded table marginals and the rounded cell values, he could recover many of the cell values exactly, and many of the others to within 1. I think that because so many of the other cells were off by 1 or 2, the tables would still meet the DRB’s protection of individual confidentiality. Conceptually, though, his approach could be extended to solve for values across multiple tables, e.g. Table A: MOT (means of transportation) by income combined with Table B: MOT by vehicles available, and then the number of exactly matched cells would likely increase. He used the CTPP2000 tables with rounded values as the example; he did not have “real” data, but created an example of an unrounded table. Of course, he used a small table as the example, and how much computing power you would need to implement this on a real CTPP tabulation is not clear, but since computing power is cheap, I think it could be done if someone really wanted to try (see the sketch below for the basic idea). The approach being taken by NCHRP 08-79 (please see the synopsis of this project in the August 2010 CTPP Status Report, http://www.fhwa.dot.gov/ctpp/sr1008.htm) will avoid the use of rounding for disclosure protection.
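For the curious, here is a minimal sketch of the basic idea (my own reconstruction under stated assumptions, not Dandekar’s code): treat each unknown interior cell as an LP variable, constrain it to the interval its rounded value implies, constrain the row and column sums to the intervals the rounded marginals imply, and then minimize and maximize each cell in turn. Where the two bounds coincide, the rounding has leaked that cell exactly. The 3x3 table and the rounding base of 5 below are hypothetical; CTPP 2000’s actual rounding rules differ in detail.

```python
import numpy as np
from scipy.optimize import linprog

BASE = 5  # hypothetical rounding base; CTPP 2000's rules differ in detail

# A made-up 3x3 table of rounded cell values and independently
# rounded row/column marginals (all values are illustrative).
cells = np.array([[10, 25, 5],
                  [15, 10, 20],
                  [5, 30, 10]])
row_sums = np.array([40, 45, 45])
col_sums = np.array([30, 65, 35])

n_rows, n_cols = cells.shape
n = n_rows * n_cols  # one LP variable per interior cell

# A value rounded to r must lie in [r - BASE/2, r + BASE/2] (and be >= 0).
bounds = [(max(0, r - BASE / 2), r + BASE / 2) for r in cells.ravel()]

# Row and column sums are likewise pinned to intervals around the
# rounded marginals; encode lo <= A x <= hi as two <= constraint sets.
A, lo, hi = [], [], []
for i in range(n_rows):
    coeff = np.zeros(n)
    coeff[i * n_cols:(i + 1) * n_cols] = 1.0
    A.append(coeff); lo.append(row_sums[i] - BASE / 2); hi.append(row_sums[i] + BASE / 2)
for j in range(n_cols):
    coeff = np.zeros(n)
    coeff[j::n_cols] = 1.0
    A.append(coeff); lo.append(col_sums[j] - BASE / 2); hi.append(col_sums[j] + BASE / 2)
A_ub = np.vstack([np.array(A), -np.array(A)])
b_ub = np.concatenate([np.array(hi), -np.array(lo)])

# Minimize, then maximize, each cell; a tight interval means the
# rounded publication effectively discloses the true value.
for k in range(n):
    c = np.zeros(n); c[k] = 1.0
    low = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds).fun
    high = -linprog(-c, A_ub=A_ub, b_ub=b_ub, bounds=bounds).fun
    print(f"cell {divmod(k, n_cols)}: true value in [{low:.1f}, {high:.1f}]")
```

An integer-programming variant (requiring whole-number cells, and adding the cross-table constraints mentioned above) would tighten these intervals further, which is presumably part of why NCHRP 08-79 is moving away from rounding.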
So, I hope that I haven’t
made any big errors, but if you find errors in my email, please let me know so that
I can correct them!
Elaine Murakami
FHWA Office of Planning (Wash DC)
206-220-4460 (in Seattle)