Shannon Conners' Blog

shannon_conners · Oct 2, 2015 06:15 AM

Recently, a customer asked me about how to get a table of 9 digit zip codes with latitude and longitude values that could be imported into JMP to merge with an existing JMP data table that contained 9 digit zip codes. This question was new to me, and I had a suggestion from someone else that perhaps SAS PROC GEOCODE would be helpful. However, I knew the customer wasn't a SAS programmer so I was hoping to find a happy path that didn't include needing to learn how to program in SAS.

I first checked out the JMP Geocoder add-in, but it seemed to only Geocode to the 5 digit zip level. The customer already had those X/Y lat/lon locations but needed something more granular, hence the desire for the 9 digit zips. I next investigated PROC GEOCODE and it turned out that the proc required two tables-your data table and a table with the 9 digit zips linked to latitude and longitude values. But I already had the zip codes I needed-if I could just find the index table, it seemed I wouldn't need the proc to join the tables, although it did offer some fuzzy matching that a straight join wouldn't.

I checked in with SAS tech support and they said that the data table they have that includes the 9 digit zip files is found on a link on this page down under the Zip+4 Geocoding section. You do need a SAS profile ID in order to download the zip file, which you already have if you're reading this and are a member of the JMP community. In any case, it's free to set up a SAS profile. The file is from 2006, so it is out of date, but apparently the current version of the TIGER data doesn’t include zip+4 data. The next section on the page mentions a paid source (Melissa Data) that provides more current info and keeps it updated for a fee.

Once you download the zip file, you can see that it contains a readme file, a CSV file (ZIP4.CSV), and a SAS program. I thought I'd import the CSV straight into JMP at that point, but this is where I hit snag #1: the CSV file lacked column headers! I went back to the SAS program and imported the data using it, and the column headers were added so I could see what they were. However, the program that added column names also appeared to reorder the the columns, so it became a bit of a matching game to determine which columns in the CSV matched the named columns in the SAS table.

Now I went back to JMP to pull in the imported SAS file containing the column headers, and this is where I ran into snag #2. Some of the 5 digit zips started with "00" or "0" and the column was being treated as numeric data when imported from CSV into JMP, so the leading 0's were being dropped from the zips upon import. I checked the Column Properties and there was a Z5 SAS format set on the column, but I talked to Xan about it and he confirmed that JMP didn't currently understand that zip code format.

While I could have fixed this with a conditional formula column after the fact by changing the column to character and adding 0, 1 or 2 zeroes based on the length of each zip, he suggested that the fastest solution might be to re-import the CSV file using JMP's Text Import Preview mode, which would allow me to both change the column type to character and rename the columns I needed before importing. I opened my SAS file and determined the column header names for just the columns I needed and did just that.

To import the ZIP4.CSV file yourself, you will need to cllick File > New Script in JMP, then paste the text below into a script window. Put your cursor at the start of the first line and click the Run Script button on the toolbar. You can hover to find that button with a tooltip, or else right click in the script window and choose Run Script. As noted below, you'll need to point to the correct path on your machine to the file you downloaded and unzipped from the SAS support website linked to above.

Names Default to Here(1);

// Replace the path below with the path to the CSV file saved on your machine

zipdata = Open(

"C:\<Your folder location>\ZIP4.csv",

columns(

Column( "ZIP", Character, "Nominal" ),

Column( "ZIP+4", Character, "Nominal" ),

Column( "c000003", Character, "Nominal" ),

Column( "City", Character, "Nominal" ),

Column( "c000005", Character, "Nominal" ),

Column( "c000006", Character, "Nominal" ),

Column( "c000007", Character, "Nominal" ),

Column( "c000008", Numeric, "Continuous", Format( "Best", 10 ) ),

Column( "c000009", Numeric, "Continuous", Format( "Best", 10 ) ),

Column( "c0000010", Numeric, "Continuous", Format( "Best", 10 ) ),

Column( "X", Numeric, "Continuous", Format( "Best", 10 ) ),

Column( "Y", Numeric, "Continuous", Format( "Best", 10 ) ),

Column( "c0000013", Character, "Nominal" ),

Column( "c0000014", Character, "Nominal" )

),

Import Settings(

End Of Line( CRLF, CR, LF ),

End Of Field( Comma, CSV( 0 ) ),

Strip Quotes( 1 ),

Use Apostrophe as Quotation Mark( 0 ),

Scan Whole File( 1 ),

Treat empty columns as numeric( 0 ),

CompressNumericColumns( 0 ),

CompressCharacterColumns( 0 ),

CompressAllowListCheck( 0 ),

Labels( 0 ),

Column Names Start( 1 ),

Data Starts( 1 ),

Lines To Read( "All" ),

Year Rule( "20xx" )

)

);

// You can select and delete the unnamed columns in the table-they are Census related details that I don't think you really need. If you find that you do, let me know.

As I mentioned above, I didn't go through the matching exercise to get all the correct column names for all columns in this file because I didn't need the census-related attributes, just the zip code, city, and X/Y coordinates corresponding to latitude and longitude. If you have access to SAS, you can run a script to import the file into SAS to see all of the column names and tweak this script to name all the generic c000003, c000004 etc columns correctly, or if you need these and don't have SAS, please add a comment here and I'll go back to complete that work.

After you run the script to generate the table and save it, you can drag the X and Y to the drop zones in graph builder to see the locations of the zip codes. They’re so dense that they largely show up the shape of the country!

The imported table is ~16.6 M rows and 14 columns. My final table was still half a gig when trimmed down to just the 5 columns I needed.

Next, you will need to join this table with your existing JMP table containing 9 digit zips. One important factor is to determine if your zip code format matches the format of the imported table, which has the 5 digit zip code and 4 digit extension in separate columns. If your data table does too, then you should be good to go doing Tables > Join, choosing to join the two tables, By Matching Columns, and choosing the right columns to match up both columns.

However, if you need a column in the format of XXXXX-XXXX because that’s the format your data table has the zip codes in, then you can do the following to the table you just imported.

Right click on the blank column at the right of the table and choose New Columns…
In the dialog that pops up, name the new column ZIP CODE, and under the Column Properties pulldown, choose Formula. This opens JMP’s formula editor.
Click on ZIP in the column chooser under the Table Columns pulldown, then choose Character in the column selection box under the Functions section and choose Concat from the popup list. Type ‘-‘ in the empty box that opens.
Click Character > Concat again, and with the new blank box selected, choose the Zip+4 column from the table column selection box.
You should see a formula like this:

Once your formula looks like the one above, click OK in the formula editor and OK again in the column dialog. That should add a new column to your table called ZIP CODE that contains zips in the format 00601-2103. You can then join with your original data table containing zips in that format.

The first two rows in the CSV file seem to have gotten a bit messed up in the import, but I’m not sure why unless the problems existed in the original CSV file. They're easily fixed.

I hope this information might be helpful to someone else needing 9 digit zip code data. Again, it's not the most recent version of this data that exists, but might be enough for some applications. If you need better fuzzy matching of address or zip code data, and you have access to SAS, you can check out PROC GEOCODE as well.

P.S. I'm certain there are better ways to do this via scripting and SAS integration, but when helping a customer that isn't a scripter and doesn't have access to SAS, I had to find a JMP-only and largely interactive route.

jeff.perkinson