From: Howard Slavin [mailto:howard@caliper.com]
Sent: Friday, November 15, 2013 11:32 AM
To: 'ctpp-news@chrispy.net'
Subject: RE: [CTPP] Adventures in large datasets

Thanks to Krishnan for the kind mention of TransCAD. I wanted to let everyone know that Caliper is now distributing the CTPP flow data in matrix form for TransCAD users. While the data have been provided at the Transportation Analysis Zone level, only the Census tract to Census tract flow data are complete, and these are the data that Caliper is now providing to TransCAD users in the United States. The flow data come as two TransCAD matrices that give the estimated flows between all Census tracts in the U.S., with estimated work trip flows under two different breakdowns by means of transportation, along with margins of error. The dimensions of these matrices are roughly 70,000 by 70,000, but remarkably they compress to 1.5 GB or less and so are downloadable. With TransCAD, it is easy for users to subset the matrices to cover any geographic subarea of interest. Users can download the files at http://www.caliper.com/Press/census-ctpp-worker-flow-data-download.htm
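
To illustrate why matrices of this size can shrink so much, here is a minimal Python sketch. It assumes (this is not stated above) that most tract pairs have no recorded flow, so only the nonzero cells need to be stored. The tract IDs, flow values, and the `subset_flows` helper are hypothetical, and this is not TransCAD's matrix format or API.

```python
# Hypothetical sparse flow table: only tract pairs with a nonzero flow are stored.
# Keys are (origin_tract, destination_tract); values are estimated worker flows.
# All IDs and values below are made up for illustration.
flows = {
    ("24005400100", "24510040100"): 35,
    ("24005400100", "24005400200"): 120,
    ("24510040100", "24510040100"): 60,
    ("51059415100", "11001006202"): 15,
}


def subset_flows(flows, tracts):
    """Keep only flows whose origin and destination are both in `tracts`."""
    keep = set(tracts)
    return {(o, d): v for (o, d), v in flows.items() if o in keep and d in keep}


# For comparison: a dense 70,000 x 70,000 matrix of 4-byte values.
dense_bytes = 70_000 * 70_000 * 4
print(f"dense size: {dense_bytes / 1e9:.1f} GB")

# Subset to a hypothetical two-tract study area.
study_area = ["24005400100", "24510040100"]
sub = subset_flows(flows, study_area)
print(sub)
```

The dense figure covers a single matrix core at an assumed 4 bytes per cell, which is why storing only the nonzero pairs makes such a difference.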

We expect to make the TAZ to TAZ flow data available as soon as they are corrected by the Census Bureau.

Howard

From: ctpp-news-bounces@chrispy.net [mailto:ctpp-news-bounces@chrispy.net] On Behalf Of Krishnan Viswanathan
Sent: Thursday, November 14, 2013 9:59 PM
To: ctpp-news@chrispy.net
Subject: Re: [CTPP] Adventures in large datasets

Mara

Besides SQL Server, I have the following suggestions:
1) The ff package in R (http://www.bnosac.be/index.php/blog/22-if-you-are-into-large-data-and-work-a-lot-package-ff)
2) HDF5 seems like a decent option, though I have not used it. Link to rhdf5 (http://bioconductor.org/packages/release/bioc/html/rhdf5.html). Also, SFCTA has some code for getting data into and out of HDF5 (https://github.com/sfcta/TAutils/tree/master/hdf5)
3) I have found TransCAD to be efficient in processing large datasets.
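
The common idea behind options 1 and 2 is to keep the data on disk and bring only a slice into memory at a time. A rough Python sketch of that idea follows. It uses only the standard library rather than ff or HDF5, and the column names (`geoid`, `est`) and the `sum_by_key` helper are made up for illustration; they are not the actual CTPP layout.

```python
import csv
import io
from collections import defaultdict
from itertools import islice


def sum_by_key(rows, key, value, chunk_size=50_000):
    """Aggregate `value` by `key`, reading `chunk_size` rows at a time."""
    totals = defaultdict(float)
    reader = csv.DictReader(rows)
    while True:
        # Only this slice of the file is held in memory at once.
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            break
        for row in chunk:
            totals[row[key]] += float(row[value])
    return dict(totals)


# Tiny in-memory stand-in for a large file. For a real file, pass an open
# file handle instead, e.g. open(path, newline="").
sample = io.StringIO("geoid,est\nA,10\nB,5\nA,2.5\n")
result = sum_by_key(sample, key="geoid", value="est", chunk_size=2)
print(result)
```

Memory use depends on `chunk_size` and the number of distinct keys, not on the size of the file.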

Hope this helps.

Krishnan

I downloaded the Maryland state raw data (the whole enchilada) that Penelope was good enough to provide me. It came with documentation that clearly explains what needs to be done, but I am being hampered by the sheer size of the dataset. It's 10 GB, and that's before joining tables, transposing them to meet my needs, etc. Even with the parts broken into separate databases, Access can't handle it. I can fit Part 1 into an ESRI geodatabase, but then I don't have the flexibility in linking tables that Access has.

Does anyone have any suggestions for dealing with large databases? SQL Server is one option. Are there others?

Mara Kaminowitz, GISP
GIS Coordinator
.........................................................................
Baltimore Metropolitan Council
Offices @ McHenry Row
1500 Whetstone Way
Suite 300
Baltimore, MD 21230
410-732-0500 ext. 1030
mkaminowitz@baltometro.org
www.baltometro.org
