When you talk about the number of variables, do you mean combinations or sequential tables? 

A combination is, e.g., means of travel to work by sex, by race/Hispanic origin. That's 3 variables.

Sequential is, e.g., looking first at means of travel, then at travel time, then sex of workers, then race/Hispanic origin of workers. That's four variables, but they are not being cross-tabbed with one another.

I agree that three is a reasonable limit in a combination. The data get stretched so thin that even 3 may be too many.

In sequential, you should be able to look at as many variables as needed.

Overall, it's still all about the number of cases. The more cases, the lower the MOE and vice versa.

Patty Becker

On Thu, Mar 14, 2013 at 3:13 PM, Steven Farber <Steven.Farber@geog.utah.edu> wrote:
Ed, Slide 39 (and a few preceding) lay the issue out on the table. It does indeed have to do with covariance. The true equation for the SE of a sum of random errors must include their covariances in the sum.  I suppose when the number of variables being added together is small, the omission of covariances does not lead to a large difference in MOEs. But the estimation of MOEs gets worse and worse, the more covariances that you leave out. Since covariances are pairwise, the number being left out grows quadratically with the number of variables being added together (n(n-1)/2 pairs for n variables).
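
To make the covariance point concrete, here is a minimal sketch. It is not taken from the Census Bureau material; the function names and the covariance matrix are hypothetical. It relies only on two facts: the variance of a sum is the sum of every variance plus twice every pairwise covariance, and n summands have n(n-1)/2 pairs.

```python
import math

def n_pairwise_covariances(n):
    """Number of distinct covariance terms among n summands."""
    return n * (n - 1) // 2

def se_of_sum(cov):
    """SE of a sum from the full covariance matrix (list of lists).

    Summing every entry gives the variances plus each covariance twice.
    """
    return math.sqrt(sum(sum(row) for row in cov))

def se_of_sum_ignoring_covariance(cov):
    """SE of a sum using only the diagonal (the variances)."""
    return math.sqrt(sum(cov[i][i] for i in range(len(cov))))

# Hypothetical covariance matrix for three estimates (made-up numbers).
cov = [
    [100.0, -20.0, -10.0],
    [-20.0, 144.0, -30.0],
    [-10.0, -30.0, 81.0],
]

for n in (3, 5, 17):
    print(n, "summands leave out", n_pairwise_covariances(n), "covariances")

print("SE with covariances:   ", round(se_of_sum(cov), 2))
print("SE without covariances:", round(se_of_sum_ignoring_covariance(cov), 2))
```

With these made-up negative covariances the shortcut overstates the SE; with positive covariances it would understate it. Published ACS tables do not include the covariances, so this full calculation is generally not available to data users.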

Nancy's email below:

We were told to limit our aggregations to four items by Mark Asiala from the ACS staff at our Annual California State Data Center meeting.  It is mentioned in his presentation slides online at http://www.dof.ca.gov/research/demographic/state_census_data_center/meetings/documents/CASDC_AnnualMtg2012_Asiala-ACSUpdate.pdf   Slide 39

There is other information in this presentation about the new sample frame in the 2011 survey which may be interesting.

Nancy Gemignani
California State Census Data Center
Demographic Research Unit
(916) 327-0103 ext 2550

Steven Farber, Ph.D
Assistant Professor
Department of Geography
University of Utah
http://stevenfarber.wordpress.com

-----Original Message-----
From: ctpp-news-bounces@chrispy.net [mailto:ctpp-news-bounces@chrispy.net] On Behalf Of Ed Christopher
Sent: March-14-13 12:57 PM
To: ctpp-news@chrispy.net
Subject: Re: [CTPP] Working with County flow data

Warning! This may ramble so if you do not care about the issue delete.

Steve, I am looking for specific references to the "limit of 3".  I know I have heard this many times and in fact tested it myself.  Using data from the Missouri State Data Center, I got Tract data on the modes people use to go to work for my neighborhood Tract. The Missouri data included the total commuters, published with a MOE, along with the total workers. I then went and pulled the 4 block groups for my neighborhood from the census website. At the time the Missouri Data Center did not have Block Group data published.  I do not know if they have them now, I did not check. While I could get the breakdown of the modes for the BGs, the table did not have the total commuters as a subtotal with a corresponding MOE. I figured I could just calculate my own by adding up the 5 modes and doing the calculation.  Before I went off to do this, the scientist in me took over and I tested the formula on the tract data just to see if I could replicate the published MOE for the total number of commuters.  I could not do it.  Fortunately, Liang Long came to my rescue and suggested that I just take the Total number of workers and subtract those who work at home (both of which have MOEs) and try that.  It worked!  I could replicate the published MOE.  What this showed me was that as you add more variables to the mix, the formula for calculating the MOE breaks down.
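
The two routes described above can be sketched as follows. This is only an illustration: the estimates and MOEs are invented, not the Missouri figures, and the names are hypothetical. The one formula used is the documented ACS approximation for a sum or difference, the square root of the summed squared MOEs.

```python
import math

def moe_combined(moes):
    """Approximate MOE of a sum or difference: root of the summed squared MOEs."""
    return math.sqrt(sum(m * m for m in moes))

# Hypothetical tract: (estimate, MOE) per commute mode. Not real data.
modes = {
    "drove alone": (1500, 180),
    "carpool": (200, 90),
    "transit": (150, 80),
    "walked": (60, 50),
    "other": (40, 45),
}
total_workers = (2050, 200)   # hypothetical
worked_at_home = (100, 60)    # hypothetical

# Route A: add up the five modes (five summands).
commuters_a = sum(est for est, _ in modes.values())
moe_a = moe_combined(moe for _, moe in modes.values())

# Route B: total workers minus worked at home (two terms).
commuters_b = total_workers[0] - worked_at_home[0]
moe_b = moe_combined([total_workers[1], worked_at_home[1]])

print("Route A:", commuters_a, "+/-", round(moe_a, 1))
print("Route B:", commuters_b, "+/-", round(moe_b, 1))
```

Both routes give the same estimate but different approximate MOEs. Which one lands closer to a published MOE depends on the real data; the sketch only shows that the answer depends on the route taken.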

For what I was doing, I was able to find a way of working with only two variables, but many times you cannot.

When I presented this at a transportation census conference in October of 2011, several users in the "power users session" confirmed that they had heard that 3 was the most variables you wanted to use at a time.  I did find this document on the Census site, which says "limit the number of variables":
http://www.census.gov/acs/www/Downloads/data_documentation/Statistical_Testing/2011StatisticalTesting3and5year.pdf


A few days ago I talked with Elaine Murakami about this and she had the perfect rule of thumb for me.  Since the whole MOE thing is just an approximation anyway "just take the largest MOE in the string of numbers you are aggregating and use that".  If you think about it, this does make some mathematical and more importantly intuitive sense.  I wish we could get some statisticians to help out here. We need easy, quick to use methods.
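
For comparison, here is a small sketch of that rule of thumb next to the root-sum-of-squares approximation. The MOEs are hypothetical and the function names are mine; neither method is presented here as the right answer.

```python
import math

def moe_root_sum_squares(moes):
    """Approximate MOE of a sum: root of the summed squared MOEs."""
    return math.sqrt(sum(m * m for m in moes))

def moe_largest(moes):
    """Rule of thumb: use the largest MOE among the items being aggregated."""
    return max(moes)

moes = [180, 90, 80, 50, 45]  # hypothetical MOEs for five items

rss = moe_root_sum_squares(moes)
largest = moe_largest(moes)

print("Root sum of squares:", round(rss, 1))
print("Largest MOE:        ", largest)
# For non-negative MOEs the largest one can never exceed the root-sum-of-squares
# value, and that value can never exceed sqrt(n) times the largest MOE.
print("Upper bound sqrt(n) * largest:", round(math.sqrt(len(moes)) * largest, 1))
```

The two figures are close when one item dominates the others and drift apart when many items have MOEs of similar size.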


Steven Farber wrote:
> I think I jumped the gun before when stating concerns over exploding MOE's.
>
> Going back to the New York State Data Center document, you'll notice that the MOE has increased in absolute terms when summing over areas, but dropped in relative terms in comparison to the sum.
>
> So MOE has increased but the Coefficient of Variation has dropped. In other words, our aggregated estimate is more precise than each of the smaller area estimates.
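
A minimal sketch of that point, with hypothetical areas and made-up numbers. It assumes the published MOEs are at the 90 percent confidence level (so SE = MOE / 1.645) and uses the root-sum-of-squares approximation for the MOE of the sum.

```python
import math

Z90 = 1.645  # ACS MOEs are published at the 90 percent confidence level

def cv(estimate, moe):
    """Coefficient of variation as a fraction: standard error over estimate."""
    return (moe / Z90) / estimate

# Hypothetical small areas as (estimate, MOE). Not real data.
areas = [(1000, 200), (1200, 220), (800, 180)]

total_est = sum(e for e, _ in areas)
total_moe = math.sqrt(sum(m * m for _, m in areas))

for est, moe in areas:
    print(est, "+/-", moe, "CV =", round(cv(est, moe), 3))
print(total_est, "+/-", round(total_moe, 1), "CV =", round(cv(total_est, total_moe), 3))
```

In this example the MOE of the sum is larger than any single area's MOE, while its CV is smaller than every single area's CV.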
>
> http://www.census.gov/acs/www/Downloads/handbooks/ACSResearch.pdf - Appendix 3 contains all the calculations required.
>
> Ed, do you recall where you saw that this type of calculation should be limited to 3 summands at a time?
>
>
>
>
> Steven Farber, Ph.D
> Assistant Professor
> Department of Geography
> University of Utah
> http://stevenfarber.wordpress.com
>
> -----Original Message-----
> From: ctpp-news-bounces@chrispy.net
> [mailto:ctpp-news-bounces@chrispy.net] On Behalf Of liang.long@dot.gov
> Sent: March-12-13 10:18 AM
> To: ctpp-news@chrispy.net
> Subject: Re: [CTPP] Working with County flow data
>
> I can see why Census doesn't recommend doing more than three variables at a time.  When you add 17 counties together, you get a much bigger area with more households sampled.  In theory, you should get smaller MOEs compared to each individual county.  But if you derive MOEs from those 17 counties, you will get much bigger MOEs, which contradicts the theory.
>
>
> ________________________________________
> From: ctpp-news-bounces@chrispy.net [ctpp-news-bounces@chrispy.net] on
> behalf of Ed Christopher [edc@berwyned.com]
> Sent: Tuesday, March 12, 2013 11:15 AM
> To: ctpp-news@chrispy.net
> Subject: Re: [CTPP] Working with County flow data
>
> Thanks--I know the spreadsheet allows you to recalculate MOEs for more than three variables, but I remember doing more than 3 a while back and I was getting some wild MOEs.  When I dug into it I found something in the Census Compass reports that said not to do more than three variables at a time.  I was hoping that someone had figured out a way around this.
>
> Ed C
>
> On Mar 12, 2013, at 9:59 AM, "Hoctor Mulmat, Darlanne" <Darlanne.Mulmat@sandag.org> wrote:
>
> The New York State Data Center developed a Statistical Calculations Menu that includes an option for computing the margin of error for the sum of three or more estimates. See attached.
>
> Darlanne Hoctor Mulmat
> Applied Research Division - Criminal Justice/Public Policy San Diego
> Association of Governments
> 619-699-7326
>
> From: ctpp-news-bounces@chrispy.net [mailto:ctpp-news-bounces@chrispy.net] On Behalf Of Ed.Christopher@dot.gov
> Sent: Tuesday, March 12, 2013 6:57 AM
> To: ctpp-news@chrispy.net
> Subject: [CTPP] Working with County flow data
>
> Has anyone come up with some easy ways for collapsing and grouping counties together using last week's county flow data and recalculating new MOEs? I have so many counties that I want to group together that I am looking for a quick way that can handle "lots" of counties.  Another issue I am struggling with is that we are always told not to group more than three variables at a time or the formulas for calculating the new MOE do not really work.  This is particularly troublesome if I am trying to group 17 counties together.  What it comes down to is 9 different calculations, given that I can only group 3 counties at a time.  Has anyone figured out any shortcuts or ways around this, short of disregarding the MOEs altogether?  Given all the clustering that I am looking at, using the "cheat" sheets I am used to, I will be recalculating MOEs for weeks.
>
>
> Ed Christopher
> <StatisticalCalculationsMenu.xls>
> _______________________________________________
> ctpp-news mailing list
> ctpp-news@ryoko.chrispy.net<mailto:ctpp-news@ryoko.chrispy.net>
> http://ryoko.chrispy.net/mailman/listinfo/ctpp-news

--
Ed Christopher
708-283-3534 (V)
708-574-8131 (cell)

FHWA RC-TST-PLN
4749 Lincoln Mall Drive, Suite 600
Matteson, IL  60443



--
Patricia C. (Patty) Becker
APB Associates/Southeast Michigan Census Council (SEMCC)
28300 Franklin Rd, Southfield, MI 48034
office: 248-354-6520
home:248-355-2428
pbecker@umich.edu