From: Howard Slavin [mailto:email@example.com]
Sent: Friday, November 15, 2013 11:32 AM
Subject: RE: [CTPP] Adventures in large datasets
Thanks to Krishnan for the kind mention of TransCAD. I wanted to let everyone know that Caliper is now distributing the CTPP flow data in matrix form for TransCAD users. Although the data have been provided at the Transportation Analysis Zone (TAZ) level, only the Census tract-to-tract flow data are complete, and these are the data Caliper is now providing to TransCAD users in the United States. The flow data come as two TransCAD matrices containing the estimated flows between all Census tracts in the U.S., with estimated work trip flows broken down in two different ways by means of transportation, along with margins of error. The dimensions of these matrices are roughly 70,000 by 70,000, but remarkably they compress to 1.5 GB or less, so they are downloadable. With TransCAD, it is easy for users to subset the matrices to cover any geographic subarea of interest. These files can be downloaded at http://www.caliper.com/Press/census-ctpp-worker-flow-data-download.htm
We expect to make the TAZ to TAZ flow data available as soon as it is corrected by the Census Bureau.
On Behalf Of Krishnan Viswanathan
Sent: Thursday, November 14, 2013 9:59 PM
Subject: Re: [CTPP] Adventures in large datasets
Besides SQL Server, I have the following suggestions:
1) the ff package in R ( http://www.bnosac.be/index.php/blog/22-if-you-are-into-large-data-and-work-a-lot-package-ff)
2) HDF5 seems like a decent option, though I have not used it. Link to rhdf5 ( http://bioconductor.org/packages/release/bioc/html/rhdf5.html). Also, SFCTA has some code for getting data into and out of HDF5 ( https://github.com/sfcta/TAutils/tree/master/hdf5)
3) I have found TransCAD to be efficient in processing large datasets.
Hope this helps.
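For anyone leaning toward the SQL route for flow tables like these, here is a minimal sketch in Python's built-in sqlite3 module. The table layout, column names, and sample tract-to-tract records are hypothetical illustrations, not the actual CTPP schema or data:

```python
import sqlite3

# In-memory database for illustration; point connect() at a file for real data.
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE flows (
        origin_tract TEXT,    -- hypothetical column names, not the CTPP schema
        dest_tract   TEXT,
        workers      INTEGER, -- estimated work trip flow
        moe          INTEGER  -- margin of error
    )
""")

# A few made-up tract-to-tract records for demonstration.
sample = [
    ("12073000100", "12073000200", 150, 40),
    ("12073000100", "12073000300", 75, 25),
    ("12073000200", "12073000100", 30, 15),
]
con.executemany("INSERT INTO flows VALUES (?, ?, ?, ?)", sample)

# An index keeps subsetting by origin fast even with millions of rows.
con.execute("CREATE INDEX idx_origin ON flows (origin_tract)")

# Pull the flows leaving one tract -- the kind of geographic subset
# that is painful in a flat file but trivial in SQL.
rows = con.execute(
    "SELECT dest_tract, workers FROM flows "
    "WHERE origin_tract = ? ORDER BY workers DESC",
    ("12073000100",),
).fetchall()
print(rows)  # [('12073000200', 150), ('12073000300', 75)]
```

The same pattern scales to the full tract-to-tract file: bulk-load once, index on origin and destination, then query subareas as needed rather than holding the whole matrix in memory.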