This e-mail is a follow-up to the another posted a while
ago. Daryl Scott from South Western Regional Planning Agency wrote to me a
note on how he converted SF1 data into an Oracle database, and subsequently
related it to Arcview. Attached is his note.
SF1 has a number of tables. Oracle and other databases
are especially capable of handling huge tables and files. Darryl's note
will be useful to anyone interested in developing extensive Census 2000
databases for their planning region.
Thanks
Nanda Srinivasan
**********************************************
INCOMING from Darryl Scott:
Here is my experience with importing the data into Oracle.
My goal was to load all the data for the State of Connecticut, all
Counties
in Connecticut, all Towns in Connecticut, and all tracts, block
groups, and
blocks in Fairfield County from Census 2000 Summary File 1 into
Oracle and
link it to GIS. Below are the steps I took to accomplish that
task. I do
not describe the failed attempts in detail.
1. I downloaded the data in ASCII format for Connecticut from the Census
Web
Site and extracted all the files.
2. I imported the ctgeo file into
Oracle.
3. Then I created a view that contained all the records that I wanted
to
import into Oracle.
---
create view ctfcgeo as
SELECT sumlev, geocomp, logrecno, state, county, cousub, cousubcc,
place,
placecc, placedc, tract, blkgrp, block, msacmsa, cmsa, macci,
pmsa,
necma, necmacci, ua, uatype, ur, sldu, sldl, vtd, vtdi, zcta3,
zcta5,
arealand, areawatr, arealand/2590000 as area_sqmi, name,
funcstat,
gcuni, pop100, hu100, sdelm, sdsec, sduni, taz, macc, uacp, stfid
FROM
ctgeo
WHERE (sumlev = '040' and geocomp='00') or (sumlev='050') or
(sumlev = '060' and Name Like '% town') or (sumlev = '140' and county
=
'001') or
(sumlev = '150' and county = '001') or (sumlev = '101' and
county = '001');
---
4. I tried to import one data file and delete the unneeded records with
a
SQL command, but the process proved to be very inefficient. So I
wrote an
Avenue Script that extracts the desirable records from sf1 data
files. I
saved the list of good values from the logrecno field to a
text file. The
script created one .dat file for each each .uf1
file. In addition, the
script removed the first four fields from the
uf1 file. The script caused
ArcView and even the computer to hang, but
that was how I knew the script
was working.
---
mypath="d:\gis\gisdata_temp\Census2000\sumfile1\"
file_list={"ct00005","ct00006","ct00007","ct00008","ct00009",
"ct00010","ct00011","ct00012","ct00013","ct00014","ct00015","ct00016","ct000
17","ct00018","ct00019",
"ct00020","ct00021","ct00022","ct00023","ct00024","ct00025","ct00026","ct000
27","ct00028","ct00029",
"ct00030","ct00031","ct00032","ct00033","ct00034","ct00035","ct00036","ct000
37","ct00038","ct00039"}
rec_file=TextFile.Make((mypath+"ctfcgeo_logrecno.txt").AsFileName,#FILE_PERM
_READ)
rec_source=rec_file.Read(rec_file.GetSize)
rec_file.Close
rec_list_string = {}
rec_list_number = {}
rec_list_string=rec_source.AsTokens(nl)
for each rec in
rec_list_string
rec_list_number.add(rec.AsNumber)
end
for each file_prefix in file_list
data_file_name=(mypath+file_prefix+".uf1").AsFileName
newdata_file_name=(mypath+file_prefix+".dat").AsFileName
data_file=LineFile.Make(data_file_name,#FILE_PERM_READ)
newdata_file=LineFile.Make(newdata_file_name,#FILE_PERM_WRITE)
for each i in 0..(data_file.GetSize-1)
dline=data_file.ReadElt
if (dline=nil) then continue
end
dline=dline.Right(dline.Count-15)
d_recno=dline.Left(7)
d_num=d_recno.AsNumber
results=rec_list_number.FindByValue(d_num)
if (results=
-1) then continue end
newdata_file.WriteElt(dline)
end
newdata_file.Close
data_file.Close
end
msgbox.info("Finished Processing Files","Script")
---
5. The following day, I created the SQL statements to create the tables
in
Oracle and control files to import the information in the dat files.
(These
files were compressed into a zip file and attached to this
document.) I
decided to use the same field names as described in the
data dictionary. I
sought help from Nanda Srinivasan at FHWA who told
me about the sf1combo.xls
file found at
http://mcdc2.missouri.edu/data/sf12000/Tools/.
That file
saved hours of work because I was able to create most of my SQL and
control
files from it. I still had to explore the data files to see
which fields
had decimal places, but fortunately most of the data did not
have decimal
places. I also added the number of the datafile to the
logrecno field name
to make sure that each table had unique fields (e.g.
logreno1 for ct00001,
logrecno2 for ct00002, and so on.) I also used
TextPad
(
http://www.textpad.com) to
create the SQL and control files because of its
ability to select text as
blocks rather than lines. The SQL and control
files worked and I was
able to import the data from SF1 into Oracle.
6. Then I had to link the data to TIGER 2000 files in shape file
format. I
already had the shape files from
http://www.geographynetwork.com/.
I
created some views in Oracle where I joined the data files to the
geography
file and used the [logreno] field as the common field. Then,
I used
ArcView's Database Access Extension to add the Oracle views to a
ArcView
project. Except for the town level, I discovered that I needed
to create a
common field to link the Oracle views to the shape files.
Thus, I created a
stfid field in the ctgeo table with the SQL statements
below.
---
alter table ctgeo add
stfid VARCHAR2(16);
update ctgeo
set stfid = state||county||tract
where
sumlev='140';
update ctgeo
set stfid = state||county||tract||blkgrp
where
sumlev='150';
update ctgeo
set stfid = state||county||tract||block
where
sumlev='101';
---
7. To improve performance, I created indexes on the logrecno* field on
each
sf1 table. I also indexed the sumlev field and stfid field in the
ctgeo
field.
It took a little bit longer than expected, but I accomplished my
goals. I
imported the sf1 data into Oracle and became able to produce
thematic maps
in ArcView. Because of this approach, the sf1 data can be
used in ArcView,
Microsoft Access, and any other software that can connect to
the Oracle
database through an ODBC connection.
<<sf1_oracle_sql.zip>>
--
Daryl Scott
South Western Regional Planning Agency
Stamford
Government Center
888 Washington Blvd., 3rd Floor
Stamford, CT
06901
Tel: (203) 316-5190
Fax: (203) 316-4995
E-mail:
dscott@swrpa.org