From: Howard Slavin [mailto:email@example.com]
Sent: Friday, November 15, 2013 11:32 AM
Subject: RE: [CTPP] Adventures in large datasets
Thanks to Krishnan for the kind mention of TransCAD. I wanted to let everyone know that
Caliper is now distributing the CTPP flow data in matrix form for TransCAD users. While
the data have been provided at the Transportation Analysis Zone level, only the Census
tract to Census tract flow data are complete, and these are the data that Caliper is now
providing to TransCAD users in the United States. The flow data are provided in the form
of two TransCAD matrices that provide the estimated flows between all of the Census tracts
in the U.S. and that distinguish estimated work trip flows with two different breakdowns
by means of transportation and margins of error. The dimensions of these matrices are
roughly 70,000 by 70,000, but remarkably they compress to only 1.5 GB or less and so are
downloadable. With TransCAD, it is easy for users to subset the matrices to cover any
geographic subarea of interest. These files can be downloaded by users at
We expect to make the TAZ to TAZ flow data available as soon as it is corrected by the
[mailto:firstname.lastname@example.org] On Behalf Of Krishnan Viswanathan
Sent: Thursday, November 14, 2013 9:59 PM
Subject: Re: [CTPP] Adventures in large datasets
Besides SQL Server, I have the following suggestions:
1) the ff package in R (
2) HDF5 seems like a decent option though I have not used it. Link to rhdf5 (
). Also, SFCTA has some code
for getting data into and out of HDF5 (
3) I have found TransCAD to be efficient in processing large datasets.
Hope this helps.
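For readers who want a database route without installing a server, here is a minimal sketch in Python using SQLite from the standard library. SQLite is not one of the options named in this thread; it is swapped in here as an assumption because it keeps the data on disk and supports the table joins that Access cannot handle at this size. The function names, table names, and the shared key column are hypothetical and would need to match the actual layout described in the CTPP documentation.

```python
import csv
import itertools
import sqlite3


def load_csv(conn, table, csv_path, chunk_size=50_000):
    """Stream a delimited file into a SQLite table, chunk_size rows at a time,
    so the whole file never has to fit in memory."""
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)  # first row is assumed to hold column names
        cols = ", ".join(f'"{name}"' for name in header)
        marks = ", ".join("?" for _ in header)
        # Columns are created without declared types, so values are stored as text.
        conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({cols})')
        while True:
            chunk = list(itertools.islice(reader, chunk_size))
            if not chunk:
                break
            conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', chunk)
            conn.commit()


def join_on_key(conn, left, right, key):
    """Inner-join two loaded tables on a shared key column (hypothetical name)."""
    sql = (
        f'SELECT * FROM "{left}" AS l '
        f'JOIN "{right}" AS r ON l."{key}" = r."{key}"'
    )
    return conn.execute(sql).fetchall()
```

To keep the data on disk rather than in memory, open the connection with a file path, for example `sqlite3.connect("ctpp_md.sqlite")` (a hypothetical file name). On tables of this size, creating an index on the key column before joining would likely be worthwhile, and results should be iterated rather than fetched all at once.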
I downloaded the Maryland state raw data (the whole enchilada) that Penelope was good
enough to provide me. It came with documentation that clearly explains what needs to be
done but I am being hampered by the sheer size of the dataset. It's 10 GB and
that's without going into joining tables, transposing them to meet my needs, etc.
Even after breaking the parts into separate databases, it can't be handled in Access. I can
fit Part 1 into an ESRI geodatabase, but I don't have the flexibility in linking tables
that Access has.
Does anyone have any suggestions for dealing with large databases? SQL Server is one
option. Are there others?
Mara Kaminowitz, GISP
Baltimore Metropolitan Council
Offices @ McHenry Row
1500 Whetstone Way
Baltimore, MD 21230
410-732-0500 ext. 1030
ctpp-news mailing list