Hello everyone,
I recently finished a summer internship at the City of Cambridge, MA Community Development Department doing web development and demographic data gathering. Much of my job during July and August consisted of mining data from the Census Summary File 1 and American Community Survey 5-year estimates for use in a public profile of the city's neighborhood characteristics.
To make this process easier, I wrote a simple utility to gather data using the census API. While much of its functionality overlaps with American FactFinder, it is also able to retrieve data from the block and block group level, making it useful for areas like Cambridge where neighborhood boundaries do not exactly align with census tracts. I'd like to release it open source to anyone who'll find it useful.
Basic features:
- .EXE file that runs from the Windows or Linux terminal. Unfortunately it won't run on Macintosh (but will work in Parallels or Boot Camp)
- Simple text-based interface
- Search any available geography anywhere in the United States, down to the block/block group level (where available)
- Retrieve up to 5 variables at a time for unlimited geographies at a time. The 5 variable limit is on the end of the census API, not this utility.
- Automatically saves retrieved variables into a .csv file that can be opened in Excel.
- Can only search ACS 5-year for the time being. Adding SF1 functionality is fairly straightforward.
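To give a rough idea of what the .csv export involves, here is a minimal sketch. It is not the utility's actual code, and the helper names csvField and csvRow are made up for illustration. The main thing it shows is that fields containing commas need quoting, which matters because census geography names often contain commas.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: quote a field only when it contains a comma,
// a double quote, or a newline. Embedded quotes are doubled.
std::string csvField(const std::string& value) {
    if (value.find_first_of(",\"\n") == std::string::npos) {
        return value;
    }
    std::string quoted = "\"";
    for (char c : value) {
        if (c == '"') quoted += '"';  // double any embedded quote
        quoted += c;
    }
    quoted += '"';
    return quoted;
}

// Hypothetical helper: join one row of fields into a single CSV line.
std::string csvRow(const std::vector<std::string>& fields) {
    std::string line;
    for (std::size_t i = 0; i < fields.size(); ++i) {
        if (i > 0) line += ',';
        line += csvField(fields[i]);
    }
    line += '\n';
    return line;
}
```

Each row returned by csvRow could then be written to a std::ofstream, with one header row of variable IDs followed by one row per geography.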
The national use section of the utility is fairly narrow, but has potential for expansion. The most robust functionality is specific to the City of Cambridge, and I added national searches as an afterthought.
The interface is divided into two modes, National and Cambridge specific search, and both work similarly. When you choose to conduct a national search of the ACS 5-year, you will be prompted with the following menu:
Typing 10 will prompt you to enter the variable IDs (up to 5) and the geographies. Note that you can search as many geographies as you like, but they must all be of the same type (the lowest level in the chosen hierarchy). So for 10: state-county-tract-block group, you can search up to 5 variables for any number of block groups in the chosen census tract.
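For anyone curious how a state-county-tract-block group search might translate into a Census API request, here is a small sketch using the API's get, for, and in parameters. This is not necessarily how the utility builds its requests. The function names are hypothetical, the base URL is passed in because the exact dataset path depends on the ACS release you want (that part is an assumption on your end), and the API key parameter is left out.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: join strings with a separator.
std::string join(const std::vector<std::string>& items, const std::string& sep) {
    std::string out;
    for (std::size_t i = 0; i < items.size(); ++i) {
        if (i > 0) out += sep;
        out += items[i];
    }
    return out;
}

// Hypothetical helper: build a block-group-level request.
// baseUrl is an assumption: the endpoint of whichever ACS 5-year
// dataset you are querying. Spaces are encoded as %20.
std::string buildBlockGroupUrl(const std::string& baseUrl,
                               const std::vector<std::string>& variables,
                               const std::string& state,
                               const std::string& county,
                               const std::string& tract,
                               const std::vector<std::string>& blockGroups) {
    return baseUrl + "?get=" + join(variables, ",")
         + "&for=block%20group:" + join(blockGroups, ",")
         + "&in=state:" + state
         + "%20county:" + county
         + "%20tract:" + tract;
}
```

The lowest level of the hierarchy goes in the for clause and may list several values, while every level above it goes in the in clause. That is why all geographies in one search have to be of the same type.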
The format for entering a search for query type 10 would be:
tableID1,tableID2,tableID3 stateFIPScode countyFIPScode tractFIPScode blockGroup1,blockGroup2
For example, to search ACS estimates B02001_002E and B02001_003E (off the top of my head I think _002E is race: white and _003E is race: black / African American) in the state of Massachusetts (code 025), in Middlesex County (code 017), census tract 353500, block groups 1 and 2, you would enter:
B02001_002E,B02001_003E 025 017 353500 1,2
This isn't the most friendly format but it's nice for grabbing large amounts of information on several geographies at a time. For general ease of use I'd suggest using an external notepad program to assemble your queries and then paste them in so you don't have to retype it all in case of a typo. All queries follow this format of the table/variable IDs separated by commas followed by the geographies in the order they appear separated by spaces.
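The query format described above can be parsed along these lines. This is only a sketch under my own assumptions, not the utility's actual parser: the Query struct and the split and parseQuery functions are hypothetical names, and it assumes tokens are separated by single spaces with no spaces around the commas.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container for one parsed query line.
struct Query {
    std::vector<std::string> variables;  // e.g. B02001_002E, B02001_003E
    std::vector<std::string> parents;    // e.g. state, county, tract codes
    std::vector<std::string> targets;    // e.g. block groups 1 and 2
};

// Split text on a delimiter, skipping empty pieces.
std::vector<std::string> split(const std::string& text, char delim) {
    std::vector<std::string> parts;
    std::istringstream in(text);
    std::string part;
    while (std::getline(in, part, delim)) {
        if (!part.empty()) parts.push_back(part);
    }
    return parts;
}

// Parse "var1,var2 parent1 parent2 ... target1,target2".
// Returns false when the line is malformed or asks for more than 5 variables.
bool parseQuery(const std::string& line, Query& out) {
    std::vector<std::string> tokens = split(line, ' ');
    if (tokens.size() < 2) return false;  // need variables and a geography
    out.variables = split(tokens.front(), ',');
    out.targets = split(tokens.back(), ',');
    out.parents.assign(tokens.begin() + 1, tokens.end() - 1);
    if (out.variables.empty() || out.variables.size() > 5) return false;
    return !out.targets.empty();
}
```

The first token is always the variable list and the last token is always the list of lowest-level geographies, so everything in between can be treated as the enclosing geographies in hierarchy order.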
The main functionality that I'd like to add in the future is the ability to create custom geographies. Since Cambridge has 13 neighborhoods that do not exactly line up with census tracts, I hard coded in each neighborhood's component tracts and/or block groups so the neighborhoods could be searched as a whole. This can be easily written to work dynamically for the entire country, allowing you to save custom geographies of multiple census tracts, counties, or any combination of counties, subcounties, block groups, etc. There is a half-written geoGroup class that is intended to do this.
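One possible shape for such a custom geography is sketched below. To be clear, this is not the half-written geoGroup class from the source. The GeoPart and CustomGeography structs and the partKey and sumEstimate functions are hypothetical and only illustrate the idea of a named group of component tracts or block groups whose estimates get added together.

```cpp
#include <map>
#include <string>
#include <vector>

// One component geography (hypothetical layout), identified by its codes.
struct GeoPart {
    std::string state;
    std::string county;
    std::string tract;
    std::string blockGroup;  // left empty when the part is a whole tract
};

// A named group of components, e.g. one neighborhood.
struct CustomGeography {
    std::string name;
    std::vector<GeoPart> parts;
};

// Build a lookup key such as "25-017-353500-1" for one component.
std::string partKey(const GeoPart& p) {
    return p.state + "-" + p.county + "-" + p.tract + "-" + p.blockGroup;
}

// Add up one count variable across all components of the group.
// estimates maps partKey(...) to the value retrieved for that component;
// components with no retrieved value are skipped.
long sumEstimate(const CustomGeography& geo,
                 const std::map<std::string, long>& estimates) {
    long total = 0;
    for (const GeoPart& p : geo.parts) {
        std::map<std::string, long>::const_iterator it = estimates.find(partKey(p));
        if (it != estimates.end()) total += it->second;
    }
    return total;
}
```

Plain addition like this only makes sense for count estimates such as population by race. Medians, percentages, and margins of error cannot simply be summed across components.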
The program is written in C++, which is not the most user-friendly language, but the code is commented and fairly repetitive, so it shouldn't be too arcane. I'll keep making changes, and I'd invite anyone who's interested in this utility's use / potential uses to give me feedback or suggestions or make changes of your own. If there is interest in improving it, I would also like to, at some point, rewrite the utility in Java with a graphical interface and the ability to use custom geographies. I am in no way a professional programmer (I'm a 4th year sociology undergrad with a computer science minor) so I'm sure there are plenty of things in the code that could be cleaned up or improved.
It was written in a Cygwin environment using a Cygwin build of libcurl so the program could read data over HTTP, and is released under the GNU General Public License (which is compatible with Cygwin's GPL and curl's MIT-style license).
The current source code can be found in a GitHub repository here. The compiled executable of the current version and its dependent libraries can be downloaded here.
Regards,
Samuel Ennis