Thanks Elaine. I haven't actually used OnTheMap other than to explore the web
application and try some downloads, but I thought I would forward the caveats that I
have run across.
Re: Version 3 OTM (see:
http://www.vrdc.cornell.edu/onthemap/data/v3/notes-otm-v3.0.pdf)
V3 Caveats:
• QWI numbers are drawn from QWI release R2008Q2. Stable jobs and statistics related to
'stable jobs' have a bug which has been fixed in QWI release R2008Q4. An update to
the data is expected later this year. Consult the LEHD site for more details.
• Current QWI numbers are considered experimental. Since they are computed using the same
confidentiality protection technology as the general QWI numbers, but using much finer
geographic cells, there are more suppressions (more smaller numbers to protect). Users
should be aware that when aggregating numbers to levels that are comparable to the general
QWI data, the numbers they generate will be systematically lower.
• Only one implicate has been released at this point. Future updates should be counted (and
used) as additional implicates. Additional implicates may be released in the future as
well.
• WAC, OD, and QWI files are only available for states participating in the LEHD program.
RAC files are available for all states, even those not actively participating in the LEHD
program, but coverage is limited. For example, a worker of a NJ company (NJ participates
in LEHD) may live in NY (which has not yet been integrated). Thus, a residence area for this
and other workers is defined, and available here for download. However, the residence area
information will NOT include information on workers of NY companies, since that
information was not available at the time that OnTheMap v3.0 was created. (This applies
for v3.0 to: CT, DC, MA, NY, NY, OH, PR, VI). These RAC files can be accessed through the
links in the OD AUX files of states where these workers are tabulated.
It appears that only one implicate has been released in Version 3.
Version 2 documentation from section 1.2.3 at
http://www.vrdc.cornell.edu/onthemap/doc/otm_public_master.pdf indicates that three
implicates were available and adds the following warning:
This version of the data provides 3 implicates (independent draws in the synthesizing
algorithm) for the OD matrix and the Residence Area Characteristics (RAC) files. This is
reflected in the filenames, see Sections 1.4.3 and 1.4.4. For further information on how
to properly analyze multiply synthesized or imputed datasets, see Raghunathan et al.
(2003); Reiter (2004b) and Reiter (2004a), or consult Sessions 8a and 8b of the online
INFO 747 class at Cornell University's CISER at
http://vrdc.ciser.cornell.edu/info747/. A note of warning is in order, though: It is
statistically incorrect to use the average of the 3 implicates unless the aggregator
function is strictly linear. Adding geographic areas is linear, and forming ratios from
two linearly aggregated quantities (earnings over employment, for example) can be done
correctly as long as the numerator and the denominator are averaged separately.
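
To make that last warning concrete, here is a rough Python sketch of how I read it. The block IDs and numbers are made up purely for illustration (they are not OnTheMap data), and the function names are my own. It sums earnings and employment across blocks within each implicate (a linear step), then compares the ratio built from separately averaged numerator and denominator against the naive average of per-implicate ratios.

```python
from statistics import mean

# Hypothetical values for 3 implicates (invented numbers, not real OnTheMap data).
# Each implicate maps a block id to (total_earnings, employment).
implicates = [
    {"blockA": (1000.0, 10), "blockB": (600.0, 5)},
    {"blockA": (1200.0, 12), "blockB": (500.0, 4)},
    {"blockA": (900.0, 8), "blockB": (700.0, 6)},
]


def area_totals(implicate):
    """Sum earnings and employment over all blocks (a linear aggregation)."""
    earnings = sum(e for e, _ in implicate.values())
    employment = sum(n for _, n in implicate.values())
    return earnings, employment


def ratio_of_averages(implicates):
    """Average numerator and denominator separately, then divide."""
    totals = [area_totals(imp) for imp in implicates]
    avg_earnings = mean(t[0] for t in totals)
    avg_employment = mean(t[1] for t in totals)
    return avg_earnings / avg_employment


def average_of_ratios(implicates):
    """Divide within each implicate, then average the ratios (the approach warned against)."""
    totals = [area_totals(imp) for imp in implicates]
    return mean(e / n for e, n in totals)


if __name__ == "__main__":
    print("ratio of averages:", ratio_of_averages(implicates))
    print("average of ratios:", average_of_ratios(implicates))
```

The two results differ because division is not a linear operation, which, as I understand it, is the point of the warning. With only one implicate in Version 3 the question does not arise yet, but it would if later releases are treated as additional implicates.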