The Federal Reserve has a great service for data called FRED (Federal Reserve Economic Data), which is maintained by the St. Louis FED. It is one of the best sources of economic data about the United States. One of the datapoints they provide, (other than the national overview) is more detailed slices of data by states and metro areas. The ‘FRED’ also provides any easy was to download the data with .txt link. So if I wanted to see the seasonally adjusted unemployment data in Alaska, then I could just click on this link: http://research.stlouisfed.org/fred2/data/AKUR.txt
Now if I wanted to gather that information for all 50-ish states (DC and PR may be there) then I ‘could’ click every link and then download every file. That would be a time-consuming endeavor, especially for data sets that may come out monthly.
Since the FRED is very consistent, and we know the data and exist in a simple .txt extension, all we have to do it gather it using code and then load it into the database. Just by looking at the file, I could see it was <STATE CODE><METRIC>.txt. Thankfully that was mercifully easy. I then wrote a data loading program in Java using Eclipse. I ran it twice, once for seasonally adjusted unemployment (SA) data and once for not seasonally adjusted (NSA) data. Java Code: savefiles.java
This data is saved in a default location which is based on your Eclipse installation, and for me it was saved here: C:\Users\<username>\workspace\savefiles\
Once you execute that java code, then you have to go through them and load them all. You could manually open up every file and load them, but that also would take a long time. Some open source methods include Talend and KNIME, BOTH of which have java modules. If you want to productionize this specific example, you will want to explore those first. For this first attempt though, I used SQL Server Integration Services (SSIS) to easily load the files since I have it and I’m very familiar with it. Here are the basic parts of the SSIS package:
- The File-name variable: We need to know what state and what attribute we’re trying to save. Create one called ‘filename’ and default it to the first file in the directory.
- The Container: Add a ForEach Loop Container and add a DFT into it.
- Container properties: Click on Collection, ensure the ‘Enumerator’ is set to ‘Foreach File Enumerator’; change the folder to wherever you saved the data previous, change ‘Files’ to *.* (ensure no other files are there)
- Click on Variable Mappings and choose the filename system variable you made earlier.
- Next create your source and destination connections Your source will be a flat-file connection, and your destination will be your database.
- Flat File Connection: In this instance, make sure you skip the first 11 rows for the FED data:
- Next click on Preview and verify that everything looks okay:
- Next go into the The Data Flow Task (DFT). Add in ‘Flat-File Source,’ add in ‘Derived Column’, and OLE DB Destination. Connect the modules like this:
- Click on the ‘Derived Column’ module (called Get File Name in the image above) and add in a derived column called ‘File’ and configure it like this:
Choose a destination table and you will be able to load a bunch of FED data easily. Modify the file names and locations and you can then download & load a variety of state-level data sources fairly easily.
Why did I use Java and SSIS to do all of this? Well I had pulled files from the internet using Java in the past… and I had also used SSIS to load multiple files from a directory. So I just mashed together two easy things I had done before and it didn’t take much time. I knew Java had ways to interface with the internet, and I knew SSIS could loop and connect easily to a DB. Both of these are obvious. Unfortunately SSIS is not open and requires someone to have a SQL Server so this method is pretty restrictive for the part-time data engineer out there. Regardless, I was able to quickly capture 102 text files and load them into a database and build this visualization comparing the Seasonal vs Non-seasonal unemployment rates: