The following is the full text of an article on Census 2000 Sampling
from the July 1998 issue of Scientific American.
**************************************************************
http://www.sciam.com:80/1998/0798issue/0798infocus.html
**************************************************************
Statistical Uncertainty
Researchers warn that continued debate over the 2000 census could
doom it to failure
Censuses in the U.S. have always seemed straightforward (it's just a
head count, right?) and have always proved, in practice, to be just
the opposite: logistically complex, politically contentious and
statistically inaccurate. Clerks were still tabulating the results of
the 1880 census eight years later. The 1920 count revealed such a
dramatic shift in population from farms to cities that Congress
refused to honor the results. And a mistake in doling out electoral
college seats based on the 1870 census handed Rutherford B. Hayes the
presidency when Samuel J. Tilden had in fact won the most votes.
But after 1940 the accuracy of the census at least improved each
decade, so that only 1.2 percent of the population slipped past the
enumerators in 1980, according to an independent demographic
analysis. That trend toward increasing accuracy reversed in 1990,
however. The Census Bureau paid 25 percent more per home to count
people than it had in 1980, and its hundreds of thousands of workers
made repeated attempts to collect information on every person in every
house (what is called a full enumeration). Nevertheless, the number of
residents left off the rolls for their neighborhood rose to 15
million, while 11 million were counted where they should not have
been. The net undercount of four million amounted to 1.8 percent of
the populace.
Less than 2 percent might be an acceptable margin of error were it
not that some groups of people were missed more than others. A
quality-check survey found that blacks, for example, were
undercounted by 4.4 percent; rural renters, by 5.9 percent. Because
census data are put to so many important uses (from redrawing voting
districts and siting schools to distributing congressional seats and
divvying up some $150 billion in annual federal spending), all agree
that this is a problem.
In response, Congress unanimously passed a bill in 1991 commissioning
the National Academy of Sciences (NAS) to study ways to reduce cost
and error in the census. The expert panel arrived at an unequivocal
conclusion: the only way to reduce the undercount of all racial
groups to acceptable levels at an acceptable cost is to introduce
scientific sampling into the April 1, 2000, census and to give up the
goal of accounting directly for every individual. Other expert groups,
including a special Department of Commerce task force, two other NAS
panels, the General Accounting Office and both statisticians' and
sociologists' professional societies, have since added their strong
endorsement of a census that incorporates random sampling of some
kind.
After some waffling, the Census Bureau finally settled last year on a
plan to use two kinds of surveys. The first will begin after most
people have mailed back the census forms sent to every household.
Simulations predict that perhaps one third of the population will
neglect to fill out a form, more in some census tracts (clusters of
adjacent blocks, housing 2,000 to 7,000 people) than in others, of
course. To calculate the remainder of the population, census workers
will visit enough randomly selected homes to ensure that at least 90
percent of the households in each tract are accounted for directly.
So if only 600 out of 1,000 homes in a given tract fill out forms,
enumerators will knock on the doors of random nonrespondents until
they add another 300 to the tally. The number of denizens in the
remaining 100 houses can then be determined by extrapolation,
explains Howard R. Hogan, who leads the statistical design of the
census.
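The arithmetic Hogan describes can be sketched in a few lines of Python. The function below is purely illustrative (the Bureau's actual procedures are far more involved), and it assumes the unvisited homes resemble the randomly visited ones:

```python
import random

def estimate_tract_population(mailed_back, nonrespondent_sizes, target_rate=0.90):
    """Estimate a tract's population with a 90-percent follow-up rule.

    mailed_back:         dict of address -> household size for homes
                         that returned a census form.
    nonrespondent_sizes: dict of address -> household size, revealed
                         only when an enumerator visits (a stand-in
                         for the door-to-door interview).
    """
    total_homes = len(mailed_back) + len(nonrespondent_sizes)
    # Visit enough randomly chosen nonrespondents to reach 90 percent
    # direct coverage of all homes in the tract.
    needed = max(0, round(target_rate * total_homes) - len(mailed_back))
    visited = random.sample(list(nonrespondent_sizes), needed)
    visited_total = sum(nonrespondent_sizes[a] for a in visited)
    # Extrapolate the unvisited homes from the visited homes' average.
    unvisited = total_homes - len(mailed_back) - needed
    avg = visited_total / needed if needed else 0.0
    return sum(mailed_back.values()) + visited_total + unvisited * avg
```

With 600 of 1,000 homes responding, the function visits 300 random nonrespondents and extrapolates the remaining 100, matching the example in the article.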
After the initial count is nearly complete, a second wave of census
takers will fan out across the country to conduct a much smaller
quality-control survey of 750,000 homes. Armed with a more meticulous
(and much more expensive) list of addresses than the census used,
this so-called integrated coverage measurement (ICM) will be used to
gauge how many people in each socioeconomic stratum were overcounted
or undercounted in the first stage. The results will be used to
inflate or deflate the counts for each group in order to arrive at
final census figures that are closer to the true population in each
region.
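One simplified way to picture that adjustment step is a per-group dual-system (capture-recapture) estimate: if the follow-up survey independently finds more people in a group than the census matched, the group's count is inflated by that ratio. The function below is an illustrative sketch, not the Bureau's actual ICM methodology:

```python
def adjust_counts(raw_counts, icm_found, icm_matched):
    """Post-stratified adjustment in the spirit of the ICM.

    For each group g, the quality-check survey independently finds
    icm_found[g] people, of whom icm_matched[g] also appear in the raw
    census. The ratio found/matched estimates how far the census under-
    or over-counted that group (a simplified dual-system estimate).
    """
    adjusted = {}
    for group, raw in raw_counts.items():
        factor = icm_found[group] / icm_matched[group]
        adjusted[group] = raw * factor
    return adjusted
```

For example, if the census counted 1,000 people in a group but the survey matched only 95 of the 100 it found there, the adjusted count rises to about 1,053; a ratio below one would deflate the count instead.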
"We endorsed the use of sampling [in the first stage] for two
reasons," reports James Trussell, director of population research at
Princeton University and a member of two NAS panels on the census.
"It saves money, and it at least offers the potential for increased
accuracy, because you could use a smaller, much better trained force
of enumerators." The Census Bureau puts the cost of the recommended,
statistics-based plan at about $4 billion. A traditional full
enumeration, it estimates, would cost up to $800 million more.
The ICM survey is important, says Alan M. Zaslavsky, a statistician
at Harvard Medical School, because it will reduce the lopsided
undercounting of certain minorities. "If we did a traditional
enumeration," he comments, "then we would in effect be saying one
more time that it is okay to undercount blacks by 3 or 4
percent; we've done it in the past, and we'll do it again."
Republican leaders in Congress do not like the answers given by such
experts. Two representatives and their advocates, including House
Speaker Newt Gingrich, filed suits to force the census takers to
attempt to enumerate everyone. Oral arguments in one trial were set
for June; the cases may not be decided until 1999.
The Republicans' main concern, explains Liz Podhoretz, an aide to the
House subcommittee on the census, is "that the ICM is five times
bigger than the [quality-check survey performed] in 1990, and they
plan to do it in half the time with less qualified people. And it
disturbs them that statisticians could delete a person's census data"
to adjust for overcounted socioeconomic groups.
Although the great majority of researchers support the new census
plan, there are several well-respected dissenters. "I think the 2000
design is going to have more error than the 1990 design," says David
A. Freedman of the University of California at Berkeley. The errors
to worry about, he argues, are not the well-understood errors
introduced by sampling but systematic mistakes made in collecting and
processing the data.
As an example, Freedman points out that a computer coding error made
in the quality check during the last census would have erased one
million people from the country and erroneously moved a congressional
seat from Pennsylvania to Arizona had the survey data been used to
correct the census. That mistake was not caught until after the
results were presented to Congress. "Small mistakes can have large
effects on total counts," adds Kenneth W. Wachter, another Berkeley
statistician.
"There are ways to improve the accuracy without sampling," Podhoretz
asserts. "Simplifying the form and offering it in several languages,
as is planned, should help. They should use [presumably more
familiar] postal workers as enumerators. They should use
administrative records, such as welfare rolls."
"That shows appalling ignorance," Trussell retorts. "Our first report
addressed that argument head-on and concluded that you cannot get
there by doing it the old way. You're just wasting a lot of money."
Representative Dan Miller of Florida was planning to introduce a bill
in June that would make it illegal to delete any nonduplicated census
form from the count. Such a restriction would derail the census,
Trussell warns. "The idea behind sampling is not to eliminate anybody
but to arrive at the best estimate of what the actual population is.
Surely the goal is not just to count as many people as possible?"
As the debate drags on, the brinkmanship is making statisticians
nervous. Podhoretz predicts that "some kind of a showdown is likely
next spring." That may be too late. "You don't want to redesign a
census at the last minute," Freedman says.
"I think the two sides should just agree to flip a coin," Trussell
says. "To think next year about what we're going to do is madness."
Wachter concurs: "We must not let the battle over sampling methods
destroy the whole census." Otherwise April 1, 2000, may make all
involved look like April fools.
--W. Wayt Gibbs in San Francisco
****************************************************************