This e-mail is a follow-up to the another posted a while ago. Daryl Scott from South
Western Regional Planning Agency wrote to me a note on how he converted SF1 data into an
Oracle database, and subsequently related it to Arcview. Attached is his note.
SF1 has a number of tables. Oracle and other databases are especially capable of handling
huge tables and files. Darryl's note will be useful to anyone interested in
developing extensive Census 2000 databases for their planning region.
Thanks
Nanda Srinivasan
**********************************************
INCOMING from Darryl Scott:
Here is my experience with importing the data into Oracle.
My goal was to load all the data for the State of Connecticut, all Counties
in Connecticut, all Towns in Connecticut, and all tracts, block groups, and
blocks in Fairfield County from Census 2000 Summary File 1 into Oracle and
link it to GIS. Below are the steps I took to accomplish that task. I do
not describe the failed attempts in detail.
1. I downloaded the data in ASCII format for Connecticut from the Census Web
Site and extracted all the files.
2. I imported the ctgeo file into Oracle.
3. Then I created a view that contained all the records that I wanted to
import into Oracle.
---
create view ctfcgeo as
SELECT sumlev, geocomp, logrecno, state, county, cousub, cousubcc, place,
placecc, placedc, tract, blkgrp, block, msacmsa, cmsa, macci,
pmsa, necma, necmacci, ua, uatype, ur, sldu, sldl, vtd, vtdi, zcta3, zcta5,
arealand, areawatr, arealand/2590000 as area_sqmi, name,
funcstat, gcuni, pop100, hu100, sdelm, sdsec, sduni, taz, macc, uacp, stfid
FROM ctgeo
WHERE (sumlev = '040' and geocomp='00') or (sumlev='050') or
(sumlev = '060' and Name Like '% town') or (sumlev = '140' and
county =
'001') or
(sumlev = '150' and county = '001') or (sumlev = '101' and county
= '001');
---
4. I tried to import one data file and delete the unneeded records with a
SQL command, but the process proved to be very inefficient. So I wrote an
Avenue Script that extracts the desirable records from sf1 data files. I
saved the list of good values from the logrecno field to a text file. The
script created one .dat file for each each .uf1 file. In addition, the
script removed the first four fields from the uf1 file. The script caused
ArcView and even the computer to hang, but that was how I knew the script
was working.
---
mypath="d:\gis\gisdata_temp\Census2000\sumfile1\"
file_list={"ct00005","ct00006","ct00007","ct00008","ct00009",
"ct00010","ct00011","ct00012","ct00013","ct00014","ct00015","ct00016","ct000
17","ct00018","ct00019",
"ct00020","ct00021","ct00022","ct00023","ct00024","ct00025","ct00026","ct000
27","ct00028","ct00029",
"ct00030","ct00031","ct00032","ct00033","ct00034","ct00035","ct00036","ct000
37","ct00038","ct00039"}
rec_file=TextFile.Make((mypath+"ctfcgeo_logrecno.txt").AsFileName,#FILE_PERM
_READ)
rec_source=rec_file.Read(rec_file.GetSize)
rec_file.Close
rec_list_string = {}
rec_list_number = {}
rec_list_string=rec_source.AsTokens(nl)
for each rec in rec_list_string
rec_list_number.add(rec.AsNumber)
end
for each file_prefix in file_list
data_file_name=(mypath+file_prefix+".uf1").AsFileName
newdata_file_name=(mypath+file_prefix+".dat").AsFileName
data_file=LineFile.Make(data_file_name,#FILE_PERM_READ)
newdata_file=LineFile.Make(newdata_file_name,#FILE_PERM_WRITE)
for each i in 0..(data_file.GetSize-1)
dline=data_file.ReadElt
if (dline=nil) then continue end
dline=dline.Right(dline.Count-15)
d_recno=dline.Left(7)
d_num=d_recno.AsNumber
results=rec_list_number.FindByValue(d_num)
if (results= -1) then continue end
newdata_file.WriteElt(dline)
end
newdata_file.Close
data_file.Close
end
msgbox.info("Finished Processing Files","Script")
---
5. The following day, I created the SQL statements to create the tables in
Oracle and control files to import the information in the dat files. (These
files were compressed into a zip file and attached to this document.) I
decided to use the same field names as described in the data dictionary. I
sought help from Nanda Srinivasan at FHWA who told me about the sf1combo.xls
file found at
http://mcdc2.missouri.edu/data/sf12000/Tools/. That file
saved hours of work because I was able to create most of my SQL and control
files from it. I still had to explore the data files to see which fields
had decimal places, but fortunately most of the data did not have decimal
places. I also added the number of the datafile to the logrecno field name
to make sure that each table had unique fields (e.g. logreno1 for ct00001,
logrecno2 for ct00002, and so on.) I also used TextPad
(
http://www.textpad.com) to create the SQL and control files because of its
ability to select text as blocks rather than lines. The SQL and control
files worked and I was able to import the data from SF1 into Oracle.
6. Then I had to link the data to TIGER 2000 files in shape file format. I
already had the shape files from
http://www.geographynetwork.com/. I
created some views in Oracle where I joined the data files to the geography
file and used the [logreno] field as the common field. Then, I used
ArcView's Database Access Extension to add the Oracle views to a ArcView
project. Except for the town level, I discovered that I needed to create a
common field to link the Oracle views to the shape files. Thus, I created a
stfid field in the ctgeo table with the SQL statements below.
---
alter table ctgeo add
stfid VARCHAR2(16);
update ctgeo
set stfid = state||county||tract
where sumlev='140';
update ctgeo
set stfid = state||county||tract||blkgrp
where sumlev='150';
update ctgeo
set stfid = state||county||tract||block
where sumlev='101';
---
7. To improve performance, I created indexes on the logrecno* field on each
sf1 table. I also indexed the sumlev field and stfid field in the ctgeo
field.
It took a little bit longer than expected, but I accomplished my goals. I
imported the sf1 data into Oracle and became able to produce thematic maps
in ArcView. Because of this approach, the sf1 data can be used in ArcView,
Microsoft Access, and any other software that can connect to the Oracle
database through an ODBC connection.
<<sf1_oracle_sql.zip>>
--
Daryl Scott
South Western Regional Planning Agency
Stamford Government Center
888 Washington Blvd., 3rd Floor
Stamford, CT 06901
Tel: (203) 316-5190
Fax: (203) 316-4995
E-mail: dscott(a)swrpa.org