Hello everyone,
I recently finished a summer internship at the City of Cambridge, MA
Community Development Department doing web development and demographic data
gathering. Much of my job during July and August consisted of mining
data from the Census Summary File 1 and American Community Survey 5-year
estimates for use in a public profile of the city's neighborhood
characteristics.
To make this process easier, I wrote a simple utility to gather data using
the Census API. While much of its functionality overlaps with American
FactFinder, it can also retrieve data at the block and block group
level, making it useful for areas like Cambridge where neighborhood
boundaries do not exactly align with census tracts. I'd like to release it
as open source for anyone who finds it useful.
Basic features:
- .EXE file that runs from the Windows or Linux
terminal. Unfortunately it won't run on Macintosh (but it will work in Parallels
or Boot Camp)
- Simple text-based interface
- Search any available geography anywhere in the United States, down to the
block/block group level (where available)
- Retrieve up to 5 variables at a time for any number of geographies.
The 5-variable limit is imposed by the Census API, not by this utility.
- Automatically saves retrieved variables into a .csv file that can be
opened in Excel.
- Can only search the ACS 5-year estimates for the time being. Adding SF1
functionality should be fairly straightforward.
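The post doesn't show how the utility writes its .csv files, so the following is only a minimal C++ sketch of that last step. It assumes the API response has already been parsed into rows of strings (a header row, then one row per geography); the function names `csvEscape` and `toCsv` are hypothetical. The main thing that matters for Excel is quoting any field that contains a comma, quote, or line break.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: quote one field following the usual CSV convention
// (RFC 4180). Fields containing a comma, quote, CR, or LF are wrapped in
// double quotes, and embedded quotes are doubled.
std::string csvEscape(const std::string& field) {
    if (field.find_first_of(",\"\r\n") == std::string::npos) {
        return field;  // nothing special, write as-is
    }
    std::string out = "\"";
    for (char c : field) {
        if (c == '"') {
            out += '"';  // a literal quote is written as two quotes
        }
        out += c;
    }
    out += '"';
    return out;
}

// Hypothetical helper: join already-parsed rows (header row first) into
// CSV text with CRLF line endings.
std::string toCsv(const std::vector<std::vector<std::string>>& rows) {
    std::string csv;
    for (const auto& row : rows) {
        for (std::size_t i = 0; i < row.size(); ++i) {
            if (i > 0) {
                csv += ',';
            }
            csv += csvEscape(row[i]);
        }
        csv += "\r\n";
    }
    return csv;
}
```

For instance, rows `{"B02001_002E", "state"}` and `{"1234", "25"}` come out as `B02001_002E,state` and `1234,25` on separate CRLF-terminated lines; the resulting string can then be written to a file with `std::ofstream`.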
The national search portion of the utility is fairly narrow, but has potential
for expansion. The most robust functionality is specific to the City of
Cambridge; I added national searches as an afterthought.
The interface is divided into two modes, National and Cambridge-specific
search, and both work similarly. When you choose to conduct a national
search of the ACS 5-year estimates, you will be prompted with the following menu:
[image: national search menu]
Typing 10 will prompt you to enter the variable IDs (up to 5) and the
geographies. Note that you can search as many geographies as you like, but
they must all be of the same type, namely the lowest level in the hierarchy.
For example, with query type *10: state-county-tract-block group*, you can
search up to 5 variables for any number of block groups in the chosen census
tract.
The format for entering a search for query type 10 would be:
*tableID1,tableID2,tableID3 stateFIPScode countyFIPScode tractFIPScode blockGroup1,blockGroup2*
For example, to search ACS estimates B02001_002E and B02001_003E
(_002E is race: White alone and _003E is race: Black or African American
alone) in the state of Massachusetts (code 025), Middlesex County
(code 017), census tract 353500, block groups 1 and 2, you would enter:
*B02001_002E,B02001_003E 025 017 353500 1,2*
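For anyone curious how a query like the one above might translate into a Census API request, here is a minimal sketch. The base URL (something like `https://api.census.gov/data/<year>/acs5`, where the exact path depends on the dataset and vintage), the `key` parameter, and the function names `join` and `buildBlockGroupUrl` are assumptions for illustration, not taken from the utility's source; check them against the Census API documentation before relying on them.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: join strings with a separator.
std::string join(const std::vector<std::string>& parts, const std::string& sep) {
    std::string out;
    for (std::size_t i = 0; i < parts.size(); ++i) {
        if (i > 0) {
            out += sep;
        }
        out += parts[i];
    }
    return out;
}

// Hypothetical sketch: build a request for several block groups in one tract.
// ASSUMPTION: baseUrl looks like "https://api.census.gov/data/<year>/acs5";
// the real path varies by dataset and year.
std::string buildBlockGroupUrl(const std::string& baseUrl,
                               const std::vector<std::string>& variables,
                               const std::string& state,
                               const std::string& county,
                               const std::string& tract,
                               const std::vector<std::string>& blockGroups,
                               const std::string& apiKey) {
    // "%20" is a URL-encoded space, used both in the geography name
    // "block group" and between the enclosing geographies in "in=".
    return baseUrl + "?get=" + join(variables, ",") +
           "&for=block%20group:" + join(blockGroups, ",") +
           "&in=state:" + state + "%20county:" + county +
           "%20tract:" + tract + "&key=" + apiKey;
}
```

Called with the variables, tract, and block groups from the example and the two-digit state code `25`, it yields a URL ending in `?get=B02001_002E,B02001_003E&for=block%20group:1,2&in=state:25%20county:017%20tract:353500&key=...`, which is the kind of string the utility would hand to libcurl.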
This isn't the most user-friendly format, but it's handy for grabbing large
amounts of information on several geographies at a time. For ease of use,
I'd suggest assembling your queries in an external text editor such as
Notepad and pasting them in, so you don't have to retype everything after
a typo. All queries follow this format: the table/variable IDs separated
by commas, followed by the geographies, separated by spaces, in the order
they appear in the query type.
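The query format described above is simple enough to parse with the standard library alone. This is a minimal sketch, not the utility's actual parser: the `Query` struct and the `splitOn` and `parseQuery` names are hypothetical, and it assumes the last whitespace-separated field holds the comma-separated lowest-level geographies while everything between the first and last fields names the enclosing geographies in hierarchy order.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical representation of one parsed query line.
struct Query {
    std::vector<std::string> variables;  // e.g. B02001_002E
    std::vector<std::string> parents;    // enclosing geographies, e.g. state, county, tract
    std::vector<std::string> targets;    // lowest-level geographies, e.g. block groups
};

// Split s on a delimiter character, skipping empty pieces.
std::vector<std::string> splitOn(const std::string& s, char delim) {
    std::vector<std::string> parts;
    std::stringstream ss(s);
    std::string piece;
    while (std::getline(ss, piece, delim)) {
        if (!piece.empty()) {
            parts.push_back(piece);
        }
    }
    return parts;
}

// Parse "var1,var2 parent1 parent2 ... target1,target2".
Query parseQuery(const std::string& line, std::size_t maxVariables = 5) {
    std::istringstream in(line);
    std::vector<std::string> fields;
    std::string field;
    while (in >> field) {
        fields.push_back(field);  // whitespace-separated fields
    }
    if (fields.size() < 2) {
        throw std::invalid_argument("expected variable IDs followed by geographies");
    }
    Query q;
    q.variables = splitOn(fields.front(), ',');
    if (q.variables.empty() || q.variables.size() > maxVariables) {
        throw std::invalid_argument("too many or too few variable IDs");
    }
    // Every field between the first and the last is an enclosing geography.
    q.parents.assign(fields.begin() + 1, fields.end() - 1);
    q.targets = splitOn(fields.back(), ',');
    return q;
}
```

Parsing the example line `B02001_002E,B02001_003E 025 017 353500 1,2` gives two variables, the parents `025`, `017`, `353500`, and the block groups `1` and `2`; a line with six variable IDs throws `std::invalid_argument`, mirroring the API's 5-variable limit.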
The main functionality that I'd like to add in the future is the ability to
create custom geographies. Since Cambridge has 13 neighborhoods that do not
exactly line up with census tracts, I hard-coded each neighborhood's
component tracts and/or block groups so the neighborhoods could be searched
as a whole. This could easily be generalized to work dynamically for the entire
country, allowing you to save custom geographies of multiple census tracts,
counties, or any combination of counties, subcounties, block groups, etc.
There is a half-written geoGroup class that is intended to do this.
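The half-written geoGroup class isn't shown here, so the following is only a hypothetical sketch of what a custom geography could look like: a named list of component geographies whose count estimates are summed. The class name `GeoGroup`, the string keys used to identify components, and the map-based inputs are all assumptions. It also approximates the margin of error of a summed ACS count as the square root of the sum of the squared component margins, the Census Bureau's usual approximation for aggregates; adding the margins directly would overstate them.

```cpp
#include <cmath>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical custom geography: a named set of component geographies
// (tracts, block groups, ...) identified by caller-chosen key strings.
class GeoGroup {
public:
    explicit GeoGroup(std::string name) : name_(std::move(name)) {}

    void addComponent(const std::string& geoId) { components_.push_back(geoId); }

    const std::string& name() const { return name_; }

    // Sum a count estimate over all components. `estimates` maps key -> value;
    // map::at throws std::out_of_range if a component is missing.
    double sumEstimate(const std::map<std::string, double>& estimates) const {
        double total = 0.0;
        for (const auto& id : components_) {
            total += estimates.at(id);
        }
        return total;
    }

    // Approximate MOE of the summed count: sqrt of the sum of squared MOEs.
    double sumMarginOfError(const std::map<std::string, double>& moes) const {
        double sumSquares = 0.0;
        for (const auto& id : components_) {
            const double m = moes.at(id);
            sumSquares += m * m;
        }
        return std::sqrt(sumSquares);
    }

private:
    std::string name_;
    std::vector<std::string> components_;
};
```

Only additive counts can be combined this way; medians and percentages for a custom geography would need to be derived from the underlying counts rather than summed.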
The program is written in C++, which is not the most user-friendly language,
but the code is commented and fairly repetitive, so it shouldn't be too
arcane. I'll keep making changes, and I'd invite anyone who's
interested in this utility's use or potential uses to give me feedback or
suggestions, or to make changes of their own. If there is interest in improving
it, I would also like to eventually rewrite the utility in Java with a
graphical interface and the ability to use custom geographies. I am in no
way a professional programmer (I'm a fourth-year sociology undergrad with a
computer science minor), so I'm sure there are plenty of things in the code
that could be cleaned up or improved.
It was written in a Cygwin environment, using a Cygwin build of libcurl so
the program can read data over HTTP, and it is released under the GNU General
Public License (which is compatible with Cygwin's GPL and curl's MIT/X-style license).
The current source code can be found in a GitHub repository
here: <https://github.com/UpQuark/CensusRetrieverSource>.
The compiled executable of the current version and its dependent libraries
can be downloaded here <http://www.fileswap.com/dl/01ZiyjiDlY/>.
Regards,
Samuel Ennis