Grabbing tons of data via KNIME loop. US Traffic fatalities. Part 1

Andy Cotgreave did an amazing Tableau workbook about Seasonality in US road fatalities and it is located here:
https://public.tableausoftware.com/profile/digitalteam#!/vizhome/Seasonality-AndyCotgreave/SeasonalityinUSRoadFatalities

It is a cool wonderful vizualization, that’s obvious as he’s a Zen Master.  In addition, he went into detail about his design choices.  But how did he get all that data (or rather how can you get it)?  Well the US has several great government resources for various pieces of data.  The National Highway Traffic Safety Administration (NHTSA) has a reporting system called FARS which contains a wealth of data.  Now FARS which is short for exactly what it is: Fatality Analysis Reporting System.  I would like to personally call out this administration for having an extraordinarily clear name for something.  I mean even if you Googled something approximate to those words you would probably find it.

Here is an easy KNIME loop to download the data:
KNIME_ftp_loops

KNIME came out with some new features, several of these were related to grabbing files.  No longer do you have to program it in Java or Python.  KNIME now offers an easy way to do it.  What I like the most about KNIME is it is essentially self-documenting.  This data flow clearly shows me what is happening, plus allowing for some annotation.

The settings of the nodes used:

Node62:  The FTP Connection is simply “ftp.nhtsa.dot.gov” the port is the default FTP port of 21.  Make sure ‘none’ is on Authentication.  Then test your connection and BAZINGA:
KNIME_FTP_success

Node 63:  Change directory to /FARS/

Node 64:  This extracts the Uniform resource identifier (URI) information.  Just make sure “Path” is checked.

Node 65:  This String Manipulation module adds in string($Path$+”/DBF/”), this is just going to append a 2nd subdirectory, that thankfully is consistent throughout all files.

Node 67:  No changes.

Node 61:  Download.  Show the variable ports, and take the directory you constructed with the above nodes and feed it into the source file or folder.  Change your target folder to wherever you want the data to be downloaded.  THIS IS VERY IMPORTANT.  Do not download all files, as it is a waste to grab things you don’t need.  Make sure you check “Only download files that match” and then choose F*.zip and a wildcard.  This will limit the number of downloaded files.  Also, if you are only interested in a subset of the years then add a  2nd string manupulation, use “substr($Path$,6 , 4)” and a row filter node to remove unneeded folders from the list.

Node 68:  No changes.

“1 Row Test Case”  This node can replace node 67 if you don’t want a loop and just want to download 1 at a time.

Vanpoolin’ Using Tableau’s table-calcs to visualize the change in commuting miles driven.

Rolling with the homies to work. How my vehicle miles traveled (VMT)  have changed.

Impatient? download the workbook.

Here you can see how my commuting patterns have changed over the past few years.   I have owned a 1998 Volvo S70, a Ford Focus, and a Chevy Equinox.  The Volvo was driven exclusively by me, and the Focus/Equinox have been a mix of me and my wife.  For the past several years I have been recording my mileage in a program called Vehical.  It is available for the iPhone.  Unfortunately it only records 1 car, but since my cars were all a very different miles, its was easy to distinguish over the years.  This brings me to the point of this visualization, which is adjusting Tableau Table calcs to display disparate quantities in a comprehensible way.

Problem:  A simple graphing of the data shows little information because all odometers and dates are very different.  Trend line slopes and VMT are not comparable.
Page1_car_border

By using table calcs, we can force all lines into the same range so that they can be compared easily.  Now we can see some interesting things.  First off, although my wife and I have traded off driving the Equinox and Focus, for the entirety of my vanpooling I was driving the Focus (orange).  You can see a significant departure in cumulative mileage of the focus from the other two cars.  In addition, although the Equinox is no longer used to commuting, we put more road-trip miles on it and those are far more visible than the above graph.

Page0_border

In addition there are other questions that remain unanswered.  For instance, how am I doing compared to the US average.  Although diagonal reference lines are very-much possible, I will choose another approach and create a rate table calculation.  I used nearly the same formula from above, but used “-1″ to tell Tableau to make the calculation relative to the previous value instead of “first().”  More tends become visible and road trips become far more obvious.

Page4_car_border

Sweet!  Now as you can see some of the trends are way more obvious.  Road trips are obvious and a job change during the Volvo days starts to become more visible.  In additiona, I added a reference line at the 40 Mi/Day mark.  This represents the current VMT that I found from the Federal Reserve.  If you’re interested in US economic data, then the FRED is it.  They also have a nice Excel add-in, and their data is also in .txt files directly linkable via the web.

US 12 month rolling VMT Per working-age population:

Now we will add a moving average table calc.  This mellows out the data and you can see how the trends have moved over time.  Tada!  I’m driving much less than average.

Page0_car_border_movingAvg

 

Check out the full workbook.

Custom Tableau Color Palettes

Tableau comes pre-packaged with many useful color palettes. Tableau also allows you to use a custom sequential or custom diverging palette from withing Tableau. Unfortunately the custom option only allows a two color options. But lets say you see a Tableau packages 3-color-palette you like and you want to make a few adjustments to it and make it lighter or darker.  Here is an example of what I did to the temperature palette, I added 2 lighter lights.

Temp_diveragence

The first think you will need to do is of course take a screen-shot of the tableau palette.  You can then paste that into a paint program and use the eyedropper tool.  I suggest that you download a free tool such as Paint.Net.  That program gives you an eye-dropper tool and gives you the color code in hexadecimal (ie #FFFFFF is white).  Unfortunately Microsoft’s pre-loaded paint program doesn’t give you this info.  It can be converted from the link below though if you don’t want to download Paint.net.

Go to http://colllor.com  (I have no idea how to pronounce that, I can only assume its similar to the many ‘L’s after someone scores a goal in soccer), choose paste in the hexadecimal (or choose it from the selector).  Collor will give you a wide range of similar colors.  I was interested in the shades/tones section right on top.  I wanted to take the default temperature palette that Tableau provides and lighten it up for a bit.   Another option for finding similar colors and hex numbers is this website http://encycolorpedia.com/.   It could be useful in certain circumstances, but it provides far too many options.  So instead the links below will be via colllor.  I used the eyedropper tool and found that Tableau has these 5 colors for that first temperature palette.

http://colllor.com/529985
http://colllor.com/78A062
http://colllor.com/DACE47
http://colllor.com/F3BA4E
http://colllor.com/C26B51

I have gone ahead and picked two sets of lighter Tableau Temperature colors:  “Lighter” and “Lightest”.

Navigate to here:  C:\Users\<username>\Documents\My Tableau Repository\  and edit Preferences.tps in a text editor (make a copy in case).   Between the <preferences> tag add in this XML

<color-palette name=”Temperature_Lighter” type=”ordered-diverging”>
<color>#6DB09D</color>
<color>#91B380</color>
<color>#E1D666</color>
<color>#F6CA79</color>
<color>#CF8C77</color>
</color-palette>
<color-palette name=”Temperature_Lightest” type=”ordered-diverging”>
<color>#A2CDC1</color>
<color>#BDD1B3</color>
<color>#E7DF88</color>
<color>#F8D9A0</color>
<color>#E4BFB4</color>
</color-palette>

This will give you two additional temperature palettes.

 

UPDATE:

Tableau Zen Master Joe Mako gave me a incredibly simple tip.  This is exactly why the Tableau Zen masters are recognized by Tableau.  They are actively looking in the community and offering their advice.  Joe advised taking the current temperature palette, then clicking on one of the colors, making no changes to the values, and then pressing ‘add to custom palette.’ This changes the xml in the book and reveals the exact colors Tableau uses for their continuous palettes.  I’m not going to pretend that I regularly look over my Tableau books’ xml but it was fairly easy to locate the palette.  Name your sheet something you will remember then search for that name within the XML.  For instance I named mine ‘CustomDiverg’ and found this line immediately:

<worksheet name=’CustomDiverg’>

under the style header, you should be able to find this line, and the 5 colors under it:

<color-palette custom=’true’ name=” type=’ordered-diverging’>

                <color>#529985</color>
                <color>#79a163</color>
                <color>#dbcf47</color>
                <color>#ce974c</color>
                <color>#c26b51</color>

Tableau_custom_divg_Capture

Here is an animated example courtesy of Joe:

http://ericksondata.com/wp/wp-content/uploads/2014/11/JoeMako_Example.gif

 

US Economic Progress

So I decided to create an Economic index using data from the Federal Reserve.  I decided on these 6 metrics to monitor our economic progress:

1. Civilian Labor Force Participation Rate
2. Compensation of Employees: Wages & Salary Accruals vs. ½ GDP
3. GDP (x2) vs All Total Debts
4. GDP vs Consumer Price Index
5. Current Real Median Household Income in the United States vs Max
6. M2 Velocity: Velocity of Money.

Link to Tableau Workbook: https://public.tableausoftware.com/views/EconomicProgressIndex/ReportDashboard

Full Detailed Explanation:
1. Civilian Labor Force Participation Rate: This is a better measure than unemployment because it captures actual workers vs the rest of the population. Discouraged workers are included in this. Assumed that higher is better.

2. Compensation of Employees: Wages & Salary Accruals vs. ½ GDP: (Private industries): How much of the American Pie is going to private workers? It is related to metric #1 but captures whether workers are being paid better for the work they are doing. Various factors could structurally alter this number. Obviously automation can and will lower this number. However even in the late 90s it peaked to .4.

3. GDP (x2) vs All Total Debts: A board measure of all debts to all income. Gov’t, Student Loan, Mortgage, Credit, Business etc. to 2x GDP. How leveraged are we? The higher this metric the more risk we have. One can take out debt, and improve the other metrics but it exposes the country to much more risk than before. This metric assumes a 2:1 Debt to income level is ideal.

4. GDP vs Consumer Price Index: CPI is a flawed measure, but still somewhat valid. CPI continues to rise, but how does it track in relation to GDP? If the CPI goes up faster than GDP, we are relatively becoming less rich. If level, then we are holding ground. I’d rising slower than GDP then we are getting richer. More is better.

5. Real Median Household Income in the United States vs max: The household is the basic block of America. How is the average household doing relative to the best year? As long as REAL median household income shows continued gains this metric will be better.

6. M2 Velocity: Velocity of Money. Overall this tells how fast money passes from one individual holder to the next. “If the velocity of money is increasing, then transactions are occurring between individuals more frequently.” according to Wikipedia. Hard to define an ideal, but generally assumed that more is better.