Hary,

Thanks for the clarification and for pointing out the drawbacks of using HDF5. I am all for open-source software, so if MySQL etc. works, then great.

I have found TransCAD really fast at processing large datasets (though not 10 GB ones), especially when sorting, and much faster than the standard statistical packages.

Krishnan




On Fri, Nov 15, 2013 at 9:03 AM, hprawiranata mitcrpc.org <hprawiranata@mitcrpc.org> wrote:
Krishnan,

HDF is a data format; a lot of climate data is stored in HDF format (I
have experience writing C code for this data format on SPARC computers
a long time ago, and from HDF I converted it to MySQL!). CTPP is just a
flat data set, plain txt. Converting it into HDF is a different,
complicated story and a step in the wrong direction.
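
For the curious, that HDF-to-database step looks roughly like this in
Python today (using the h5py package instead of C; the file name,
dataset name, and table layout below are made up purely for
illustration):

    import sqlite3
    import h5py   # third-party HDF5 reader for Python

    # Hypothetical example: pull one dataset out of an HDF5 file...
    with h5py.File("climate.h5", "r") as f:
        values = f["temperature"][:]   # reads the dataset as a NumPy array

    # ...and land it in an ordinary SQL table.
    conn = sqlite3.connect("climate.db")
    conn.execute("CREATE TABLE temperature (value REAL)")
    conn.executemany("INSERT INTO temperature VALUES (?)",
                     ((float(v),) for v in values.ravel()))
    conn.commit()
    conn.close()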

Linking data across many tables has to be done with a database
engine. MS Access won't cut it, and MS SQL Server is expensive. A
free, open-source engine is the only way to go, and it is fast.
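
As a rough illustration of that workflow, here is a minimal Python
sketch with SQLite standing in for MySQL or another free engine. The
file names and column layout are invented; the real record layout
comes from the CTPP documentation:

    import csv
    import sqlite3

    conn = sqlite3.connect("ctpp.db")   # a single file on disk; no server to install
    conn.execute("CREATE TABLE part1 (geoid TEXT, table_id TEXT, estimate REAL)")
    conn.execute("CREATE TABLE geo (geoid TEXT PRIMARY KEY, name TEXT)")

    def load(path, table, ncols):
        # Stream the delimited text file row by row so a 10 GB extract
        # never has to fit in memory.
        placeholders = ",".join("?" * ncols)
        with open(path, newline="") as f:
            conn.executemany(
                "INSERT INTO %s VALUES (%s)" % (table, placeholders),
                csv.reader(f))

    load("part1.txt", "part1", 3)   # hypothetical flat CTPP extract
    load("geo.txt", "geo", 2)       # hypothetical geography lookup
    conn.execute("CREATE INDEX idx_geoid ON part1 (geoid)")
    conn.commit()

    # The table linking that chokes Access is one SQL statement here.
    for name, table_id, estimate in conn.execute(
            "SELECT g.name, p.table_id, p.estimate "
            "FROM part1 p JOIN geo g ON g.geoid = p.geoid LIMIT 10"):
        print(name, table_id, estimate)

    conn.close()

The same approach carries over to MySQL (or PostgreSQL) with the
corresponding Python client library in place of sqlite3.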

TransCAD... I have never tested its limits, but linking simple tables
is fine. Large and many tables? Better to use data-modeling software
and link to a database engine.

Hary(ono) Prawiranata
Transportation Analyst/Modeler
Tri-County Regional Planning Commission
3135 Pine Tree Rd. Ste 2C
Lansing, MI 48911

On Thu, Nov 14, 2013 at 9:58 PM, Krishnan Viswanathan
<krisviswanathan@gmail.com> wrote:
> Mara,
>
> Besides SQL Server, I have the following suggestions:
> 1) the ff package in R (
> http://www.bnosac.be/index.php/blog/22-if-you-are-into-large-data-and-work-a-lot-package-ff)
> 2) HDF5 seems like a decent option though I have not used it. Link to rhdf5
> ( http://bioconductor.org/packages/release/bioc/html/rhdf5.html). Also,
> SFCTA has some code for getting data into and out of HDF5 (
> https://github.com/sfcta/TAutils/tree/master/hdf5)
> 3) I have found TransCAD to be efficient in processing large datasets.
>
> Hope this helps.
>
> Krishnan
>
> I downloaded the Maryland state raw data (the whole enchilada) that Penelope
> was good enough to provide me.  It came with documentation that clearly
> explains what needs to be done, but I am being hampered by the sheer size of
> the dataset.  It's 10 GB, and that's without going into joining tables,
> transposing them to meet my needs, etc.  Even after breaking the parts into
> different databases, it can't be handled in Access.  I can fit Part 1 into an
> ESRI geodatabase, but I don't have the flexibility in linking tables that
> Access has.
>
>
>
> Does anyone have any suggestions for dealing with large databases?  SQL
> Server is one option.  Are there others?
>
>
>
> Mara Kaminowitz, GISP
> GIS Coordinator
> .........................................................................
> Baltimore Metropolitan Council
> Offices @ McHenry Row
> 1500 Whetstone Way
> Suite 300
> Baltimore, MD  21230
> 410-732-0500 ext. 1030
> mkaminowitz@baltometro.org
> www.baltometro.org
>
>
>
>
_______________________________________________
ctpp-news mailing list
ctpp-news@ryoko.chrispy.net
http://ryoko.chrispy.net/mailman/listinfo/ctpp-news



--
Krishnan Viswanathan
5628 Burnside Circle
Tallahassee FL 32312