Custom Tableau Color Palettes

Tableau comes pre-packaged with many useful color palettes. It also lets you build a custom sequential or custom diverging palette from within Tableau. Unfortunately, the custom option only allows two colors. But let’s say you see one of Tableau’s packaged 3-color palettes that you like, and you want to make a few adjustments to it to make it lighter or darker. Here is an example of what I did to the temperature palette: I added 2 lighter versions.

Temp_diveragence

The first thing you will need to do is, of course, take a screenshot of the Tableau palette. You can then paste that into a paint program and use the eyedropper tool. I suggest that you download a free tool such as Paint.NET. That program gives you an eyedropper tool and shows the color code in hexadecimal (i.e. #FFFFFF is white). Unfortunately, Microsoft’s pre-loaded Paint program doesn’t give you this info. The values it does give can be converted using the link below, though, if you don’t want to download Paint.NET.

Go to http://colllor.com (I have no idea how to pronounce that; I can only assume it’s similar to the many ‘L’s after someone scores a goal in soccer) and paste in the hexadecimal code (or choose the color from the selector). Colllor will give you a wide range of similar colors. I was interested in the shades/tones section right at the top, because I wanted to take the default temperature palette that Tableau provides and lighten it up a bit. Another option for finding similar colors and hex numbers is http://encycolorpedia.com/. It could be useful in certain circumstances, but it provides far too many options, so the links below all go to colllor. I used the eyedropper tool and found that Tableau uses these 5 colors for that first temperature palette.

http://colllor.com/529985
http://colllor.com/78A062
http://colllor.com/DACE47
http://colllor.com/F3BA4E
http://colllor.com/C26B51

I have gone ahead and picked two sets of lighter Tableau Temperature colors:  “Lighter” and “Lightest”.

Navigate to C:\Users\<username>\Documents\My Tableau Repository\ and edit Preferences.tps in a text editor (make a copy first, just in case). Between the <preferences> tags, add this XML:

<color-palette name="Temperature_Lighter" type="ordered-diverging">
<color>#6DB09D</color>
<color>#91B380</color>
<color>#E1D666</color>
<color>#F6CA79</color>
<color>#CF8C77</color>
</color-palette>
<color-palette name="Temperature_Lightest" type="ordered-diverging">
<color>#A2CDC1</color>
<color>#BDD1B3</color>
<color>#E7DF88</color>
<color>#F8D9A0</color>
<color>#E4BFB4</color>
</color-palette>

This will give you two additional temperature palettes.
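
If you would rather compute lighter tints than pick them by eye, the idea can be sketched in Python. This is only a sketch: it assumes a tint is a straight linear blend of each RGB channel toward white, which is not necessarily how colllor derives its shades, so it will not reproduce the hand-picked values above. The function names `lighten` and `palette_xml` and the palette name `Temperature_Blend25` are made up for illustration.

```python
def lighten(hex_color, amount):
    """Blend a '#RRGGBB' color toward white; 0 keeps it, 1 gives pure white."""
    if not 0 <= amount <= 1:
        raise ValueError("amount must be between 0 and 1")
    digits = hex_color.lstrip("#")
    channels = [int(digits[i:i + 2], 16) for i in (0, 2, 4)]
    # Move each channel the same fraction of the way toward 255 (white).
    blended = [round(c + (255 - c) * amount) for c in channels]
    return "#" + "".join(f"{c:02X}" for c in blended)


def palette_xml(name, colors, palette_type="ordered-diverging"):
    """Build a <color-palette> element in the same shape as the XML above."""
    lines = [f'<color-palette name="{name}" type="{palette_type}">']
    lines += [f"<color>{c}</color>" for c in colors]
    lines.append("</color-palette>")
    return "\n".join(lines)


# The five colors sampled from Tableau's temperature palette.
TEMPERATURE = ["#529985", "#78A062", "#DACE47", "#F3BA4E", "#C26B51"]

if __name__ == "__main__":
    lighter = [lighten(c, 0.25) for c in TEMPERATURE]
    print(palette_xml("Temperature_Blend25", lighter))
```

The printed element can be pasted between the <preferences> tags in the same way as the two palettes above; raise or lower the 0.25 to taste.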

US Economic Progress

I decided to create an economic index using data from the Federal Reserve. I chose these 6 metrics to monitor our economic progress:

1. Civilian Labor Force Participation Rate
2. Compensation of Employees: Wages & Salary Accruals vs. ½ GDP
3. GDP (x2) vs All Total Debts
4. GDP vs Consumer Price Index
5. Current Real Median Household Income in the United States vs Max
6. M2 Velocity: Velocity of Money.

Link to Tableau Workbook: https://public.tableausoftware.com/views/EconomicProgressIndex/ReportDashboard

Full Detailed Explanation:
1. Civilian Labor Force Participation Rate: This is a better measure than unemployment because it captures actual workers vs the rest of the population. Discouraged workers are included in this. Assumed that higher is better.

2. Compensation of Employees: Wages & Salary Accruals vs. ½ GDP: (Private industries): How much of the American Pie is going to private workers? It is related to metric #1 but captures whether workers are being paid better for the work they are doing. Various factors could structurally alter this number. Obviously automation can and will lower this number. However, even in the late 90s it peaked at 0.4.

3. GDP (x2) vs All Total Debts: A broad measure of all debts to all income: government, student loan, mortgage, credit, business, etc., compared to 2x GDP. How leveraged are we? The higher this metric the more risk we have. One can take out debt and improve the other metrics, but it exposes the country to much more risk than before. This metric assumes a 2:1 debt-to-income level is ideal.

4. GDP vs Consumer Price Index: CPI is a flawed measure, but still somewhat valid. CPI continues to rise, but how does it track in relation to GDP? If the CPI goes up faster than GDP, we are becoming relatively less rich. If level, then we are holding ground. If it is rising slower than GDP, then we are getting richer. More is better.

5. Real Median Household Income in the United States vs max: The household is the basic block of America. How is the average household doing relative to the best year? As long as REAL median household income shows continued gains this metric will be better.

6. M2 Velocity: Velocity of Money. Overall this tells how fast money passes from one individual holder to the next. “If the velocity of money is increasing, then transactions are occurring between individuals more frequently.” according to Wikipedia. Hard to define an ideal, but generally assumed that more is better.

Dueling Chartists. Ebola vs Auto Accidents

The first chart comes from an organization called Sightline, which runs a Pacific Northwest sustainability blog. Sustainability roughly translates to ‘don’t use cars’ for the lay person. As such, their POV is against cars and the deaths and pollution they cause. The author’s point is that we underestimate risk and that the coverage of Ebola is way overblown.

Anyway, here is their chart:

ebola-v-cars-2

Source:  Sightline: http://daily.sightline.org/2014/10/16/ebola-versus-cars/

Now one of the commenters on the above link contributed this chart:

Ebola_v_traffic_Liberia

What I love about this is that neither person is wrong. They are both properly using bar charts with the axis starting at zero. However, the second person is missing the point of the author’s article. I also love how both are backing up their stories with DATA AND CHARTS! Properly sourced data and properly used charts to boot! The first chart is showing how we improperly treat risk: we fear a rare event with a high mortality and completely ignore a very frequent event with a very low mortality. Regardless, I thought this was a great view of dueling charts.

Algorithm influenced human behavior. The possible dangers of mass customization

A friend of mine clued me in to this interesting article: https://www.yahoo.com/tech/i-liked-everything-i-saw-on-facebook-for-2-days-heres-94435047974.html   You should read the article, but the gist is that a person ‘liked’ everything they saw on Facebook and their feeds quickly went to a place devoid of actual human posts. The resulting posts were especially polarized and extreme, and were not driven by regular people. Mobile went that way quicker than the desktop version, probably due to less screen real estate and the higher importance of mobile ads. Much of FB’s content is consumed through mobile. It was an interesting experiment that the author, Mat Honan, conducted. It led me to think and then write this post.

I’ll go on a tangent here about the nature of humor. I think other human experiences are similar, though their mechanisms will necessarily differ. Situational humor is funny for many groups of people because many people experience the exact same situations and the comedian uses that to craft something funny. Something is funny (there are many theories about this) because, in general, your brain rewards you for strengthening existing-but-weak neural connections. Something is funny because the incident is not common, but not so far away as to have no connection (excluding any absurdists). Two people will think something is funny when their brains share similar strengths of connections for the topic. Now let’s take all the things that a person thinks are ‘funny’ and call this a person’s humor-space. This space is greatly influenced by what you thought was funny in the past; your sense of humor develops over time. We can say that if two people think something is funny, then their humor-spaces intersect at that point in time.

Humorspace

Now go to Pinterest and type in humor in the search bar (or just click http://www.pinterest.com/all/humor/). Try it via different IP addresses, or the same IP address but logged in and not. Are there differences in the content displayed? I was able to see completely different information when logged in vs not. Now what happens if you and I experience different content? Would our humor-spaces not drift apart over time? With such a vast amount of content, it’s improbable that we would see the same content. Are the algorithms that feed us content divergent, convergent, or simply pseudo-random walks? Now suppose that we discover a way to live forever; a likely way would be some method of uploading our consciousness to a virtual environment. We will likely still be fed and consume media content in the immortal age. After decades and decades, or centuries, it will probably be inevitable that our experiences drift farther from those of other people.

What would happen to two people after many years? Would their humor-spaces diverge so far from each other that they are no longer similar? Would we be able to make the other person laugh?

Live-Event-TV

Is Live-Event TV growing? I’ve heard some chatter off and on about the way that networks are combating streaming TV services like Netflix, Hulu, and Amazon Prime by offering more live-event TV. (I use the term Live-Event-TV to include live TV and simple time-shifted TV viewing, like on a DVR.) Think about it for a minute… What was the last show you routinely watched and then went into the office/school the next day and said “Hey did you see XYZ last night? Pretty awesome/thrilling/funny huh?” For me it was some of the earlier seasons of ‘The Office’, which ended in 2013. It seems to me that networks are driving more content to live events that cannot be streamed later. This is a way of converging people towards one set of content and it differs. It seems to me that many people are interested in this as they sense a drift in their experience-space from other people they know. The Super Bowl is still a fun event, even though pro football doesn’t interest me. I love it because we have parties and everyone is there sharing the same experience. Our experience-spaces are becoming more similar and we like it. Growth of live-event TV will continue as our media space becomes increasingly balkanized.

I use ‘balkanization’ on purpose.  The definition seems to indicate subdivisions with increasing hostility between them.  The article referenced above indicates that content was driven to hostile and incompatible extremes.

Data Engineering example. Java + SSIS to gather Macroeconomic data from the FED.

The Federal Reserve has a great data service called FRED (Federal Reserve Economic Data), which is maintained by the St. Louis Fed. It is one of the best sources of economic data about the United States. Beyond the national overview, it provides more detailed slices of data by state and metro area. FRED also provides an easy way to download the data through a .txt link. So if I wanted to see the seasonally adjusted unemployment data for Alaska, I could just click on this link:  http://research.stlouisfed.org/fred2/data/AKUR.txt

Now if I wanted to gather that information for all 50-ish states (DC and PR may be there) then I ‘could’ click every link and then download every file. That would be a time-consuming endeavor, especially for data sets that may come out monthly.

Since FRED is very consistent, and we know the data exists as simple .txt files, all we have to do is gather it using code and then load it into the database. Just by looking at the file name, I could see the pattern was <STATE CODE><METRIC>.txt, which was mercifully easy. I then wrote a data-loading program in Java using Eclipse. I ran it twice, once for seasonally adjusted (SA) unemployment data and once for not seasonally adjusted (NSA) data.  Java Code: savefiles.java
This data is saved in a default location which is based on your Eclipse installation; for me it was saved here: C:\Users\<username>\workspace\savefiles\
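
For readers without Java handy, the same download step can be sketched in Python. This is not the savefiles.java program, just an illustration of the idea under a few assumptions: the URL pattern is taken from the AKUR.txt link above and may no longer be served in that form, the `URN` metric code for the not seasonally adjusted series is my guess to be checked against FRED, and the names `series_url`, `download_all`, and the short `STATE_CODES` list are hypothetical.

```python
import urllib.request
from pathlib import Path

BASE_URL = "http://research.stlouisfed.org/fred2/data/"

# Illustrative subset only; extend with the remaining state codes as needed.
STATE_CODES = ["AK", "AL", "AR"]

# "UR" comes from the AKUR.txt example above. "URN" (not seasonally
# adjusted) is an assumption that should be verified on the FRED site.
METRICS = ["UR", "URN"]


def series_url(state_code, metric):
    """Build <STATE CODE><METRIC>.txt under the base URL."""
    return f"{BASE_URL}{state_code}{metric}.txt"


def download_all(target_dir, states=STATE_CODES, metrics=METRICS):
    """Download one text file per state and metric; return the saved paths."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    saved = []
    for metric in metrics:
        for state in states:
            destination = target / f"{state}{metric}.txt"
            urllib.request.urlretrieve(series_url(state, metric), str(destination))
            saved.append(destination)
    return saved


# Example call (performs network requests, so it is left commented out):
# download_all("savefiles")
```

Keeping the URL construction in its own function makes it easy to check the pattern against a link you have already opened in a browser before downloading everything.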

Once you execute that Java code, you have to go through the files and load them all. You could manually open every file and load it, but that would also take a long time. Some open source options include Talend and KNIME, both of which have Java modules. If you want to productionize this specific example, you will want to explore those first. For this first attempt, though, I used SQL Server Integration Services (SSIS) to load the files, since I have it and I’m very familiar with it. Here are the basic parts of the SSIS package:

  • The File-name variable: We need to know what state and what attribute we’re trying to save. Create one called ‘filename’ and default it to the first file in the directory.
    blog2_Pic1
  • The Container: Add a ForEach Loop Container and add a Data Flow Task (DFT) into it.
    blog2_Pic2
  • Container properties: Click on Collection and ensure the ‘Enumerator’ is set to ‘Foreach File Enumerator’; change the folder to wherever you saved the data previously, and change ‘Files’ to *.* (ensure no other files are there).
    blog2_Pic3
  • Click on Variable Mappings and choose the filename system variable you made earlier.
    blog2_Pic4
  • Next create your source and destination connections. Your source will be a flat-file connection, and your destination will be your database.
  • Flat File Connection: In this instance, make sure you skip the first 11 rows for the FED data:
    blog2_Pic5
  • Next click on Preview and verify that everything looks okay:
    blog2_Pic6
  • Next go into the Data Flow Task (DFT). Add in ‘Flat-File Source’, ‘Derived Column’, and ‘OLE DB Destination’. Connect the modules like this:
    blog2_Pic7
  • Click on the ‘Derived Column’ module (called Get File Name in the image above) and add in a derived column called ‘File’ and configure it like this:
    blog2_Pic8

Choose a destination table and you will be able to load a bunch of FED data easily. Modify the file names and locations and you can then download & load a variety of state-level data sources fairly easily.
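
The logic of the package (loop over every file, skip the header rows, tag each row with its file name) can also be sketched in a few lines of Python. This is an illustration rather than a replacement for the SSIS package: it assumes each data line is a date followed by a value separated by whitespace, it stops at returning rows instead of writing to a database, and the names `parse_fred_text` and `load_directory` are hypothetical.

```python
from pathlib import Path

HEADER_ROWS = 11  # the same number of rows skipped in the flat-file connection


def parse_fred_text(text, file_name, header_rows=HEADER_ROWS):
    """Return (file_name, date, value) tuples from one downloaded file's text."""
    rows = []
    for line in text.splitlines()[header_rows:]:
        parts = line.split()
        if len(parts) != 2:
            continue  # not a date/value pair, e.g. a blank line
        date, raw_value = parts
        try:
            value = float(raw_value)
        except ValueError:
            continue  # value is not a number, so skip the line
        # The file name plays the role of the 'File' derived column.
        rows.append((file_name, date, value))
    return rows


def load_directory(folder):
    """Loop over every .txt file in a folder, like the ForEach Loop Container."""
    all_rows = []
    for path in sorted(Path(folder).glob("*.txt")):
        all_rows.extend(parse_fred_text(path.read_text(), path.name))
    return all_rows
```

From here the rows could be handed to whichever database driver you use; the file name column is what lets you recover the state and metric later.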

Why did I use Java and SSIS to do all of this? Well I had pulled files from the internet using Java in the past… and I had also used SSIS to load multiple files from a directory. So I just mashed together two easy things I had done before and it didn’t take much time. I knew Java had ways to interface with the internet, and I knew SSIS could loop and connect easily to a DB. Both of these are obvious. Unfortunately SSIS is not open and requires someone to have a SQL Server so this method is pretty restrictive for the part-time data engineer out there. Regardless, I was able to quickly capture 102 text files and load them into a database and build this visualization comparing the Seasonal vs Non-seasonal unemployment rates:

Final:
https://public.tableausoftware.com/views/Unemployment_33/AdjstmentDashboard?:embed=y&:display_count=no

 

Rules and Probabilities for Double Monopoly

Tableau workbook with complete probabilities based on 4 board orientations and 2 8-sided dice with the doubles rule: bit.ly/1qEXfnK

So Double Monopoly has 2 boards. It’s twice the fun! The boards are set corner-to-corner, and people move in a figure 8. Boards can be joined at any corner, but to keep things consistent, let’s try Go to Go (GO2GO), Jail to Jail (J2J), Free Parking to Free Parking (FP2FP), or Go to Jail to Go to Jail (GJ2GJ). As players will cross the shared space twice, nothing is diminished from the space’s probability. If you were to use different corners, one corner would be hit only once. Click on this:
4_DoubleMonop_boards_Small2


 

A complete monopoly can consist of the correctly corresponding properties from either reality, i.e. Park Place and Imperial Palace are a match, and Yoda’s Hut and Farmer Maggot’s are a match. If you have Boardwalk and Imperial Palace, that’s not a match. Nor are Hey Jude, Abbey Road, and Ganondorf, etc. Owning 2 complete monopolies of the same color doubles the value of the rent. Any unimproved properties charge quadruple the unimproved price. Having hotels on Boardwalk and Park Place, but nothing on Mt Doom and Barad Dur, still doubles the rent on BW and PP.

In order to speed things up, 8-sided dice are used.  Or three 6 sided ones, if you ensure 2 are the same color so you can still use the 3 doubles-go-to-jail rules.  Also to ensure those hard-to-land on properties are hit, we can modify the free parking rule.  Free parking allows you to move to the next un-purchased property.  After all properties are purchased FP reverts to however you’d usually use it.  Another method of speeding the game up is to shuffle and deal 3-5 properties at the beginning and have the person pay for them at the start.  These should help speed up the game.

Cards apply only to the board of their origin. Go-to cards will then move the player to a space on the same board, enabling them to ‘bypass’ the 2nd board. The Go-to-Jail card works the same way. The Go-to-Jail (G2J) space applies to the board they had just left if the boards are joined at G2J. Any special rules for an alternate-reality board apply to that board only.

$3k is given out.  Due to inflation :), $1s are not used, but now become $1,000s.  Any prices are rounded to nearest 5, Mediterranean Ave must be landed on twice for any pay.  Thus the only change to the starting money is 1 – $1k, and an additional 1-$500 and 1-$5.

Currency options. It is possible to keep both monies separate and require debts and purchases to be paid in the currency of each reality. So if you need imperial credits, and all you have is ‘love’, then you would have to trade or use the bank as a currency exchanger of last resort. Since ‘money can’t buy you love’, you could make it a rule that the bank cannot exchange Monopoly money for Beatles ‘love’ bucks. If you let the bank charge a large fee, say 50% ($100 Monopoly money becomes 50 imperial credits), then other players can act as currency traders and arbitrageurs. Forcing a very high fee, or a ‘money can’t buy you love’ rule, adds an additional level of screwage that other players can enact on the person. However, the bank will need to use a previously-agreed-upon exchange rate for bankruptcy proceedings. Preferably with something like this line: “Republic credits? Republic credits are no good out here. I need something more real.”

Railroad rents:   Several multiplier options, haven’t decided on which

RR Options
RRs owned    x2      Half    x1.90
1            25      13      25
2            50      25      48
3            100     50      90
4            200     100     171
5            400     200     326
6            800     400     619
7            1600    800     1,176
8            3200    1600    2,235
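
The three columns can be regenerated with a short Python sketch. The rules here are inferred from the numbers in the table rather than stated anywhere: a base rent of 25 for one railroad, doubling per extra railroad for the x2 column, half of that for the Half column, a factor of 1.9 per extra railroad for the last column, and rounding halves up. The function names are made up for illustration.

```python
import math
from fractions import Fraction

BASE_RENT = 25  # rent with a single railroad, taken from the first table row


def round_half_up(value):
    """Round to the nearest whole number, with .5 going up (12.5 -> 13)."""
    return math.floor(value + Fraction(1, 2))


def railroad_rents(max_owned=8):
    """Return (owned, x2, half, x1.90) rows for 1..max_owned railroads."""
    rows = []
    for owned in range(1, max_owned + 1):
        doubling = BASE_RENT * 2 ** (owned - 1)
        half = round_half_up(Fraction(doubling, 2))
        # Fractions keep 1.9 exact so rounding is not skewed by float error.
        x190 = round_half_up(BASE_RENT * Fraction(19, 10) ** (owned - 1))
        rows.append((owned, doubling, half, x190))
    return rows


if __name__ == "__main__":
    for row in railroad_rents():
        print(*row, sep="\t")
```

Swapping the 19/10 for another factor is a quick way to try other multiplier options before settling on one.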

Utility rents:

Utilities owned    Multiplier (x2.5)
1                  4
2                  10
3                  25
4                  62.5

With two 8-sided dice, a roll of 16 with all 4 utilities owned would yield 1,000 (16 x 62.5), making the utilities pretty lucrative, if only rarely (1/64 chance with two 8-sided dice).
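
Both numbers are easy to check with a small Python sketch that enumerates every roll of two 8-sided dice and rebuilds the utility multipliers. It ignores the doubles rule and board position, so it is far simpler than the full probability workbook; the function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def two_dice_distribution(sides=8):
    """Exact probability of each total when rolling two fair dice."""
    counts = Counter(a + b for a, b in product(range(1, sides + 1), repeat=2))
    total = sides * sides
    return {roll: Fraction(n, total) for roll, n in sorted(counts.items())}


def utility_multipliers(max_owned=4, base=4, factor=Fraction(5, 2)):
    """Multiplier per number of utilities owned: 4, then x2.5 for each extra one."""
    return [base * factor ** (owned - 1) for owned in range(1, max_owned + 1)]


if __name__ == "__main__":
    dist = two_dice_distribution()
    multipliers = utility_multipliers()
    print("P(16) =", dist[16])                      # chance of the maximum roll
    print("rent on a 16 =", 16 * multipliers[-1])   # with all 4 utilities owned
```

The same distribution shows that 9 is the most likely total with two 8-sided dice, which shifts the commonly hit spaces compared with regular 6-sided dice.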

 


Java was used to calculate the probabilities. It should work in a standard Eclipse installation with Java 7: dub.java

Grade Inflation UGA vs GaTech

I have a quick chart here showing grade inflation by schools. It would seem that my Alma Mater experienced less grade inflation than UGA. UGA had a significant bump during the 1990s. Were professors just more liberal with their grades during that time? It could also be that Georgia experienced an increase in higher quality applicants.

Other GA schools are also available; play with the Tableau viz below:

GPA Inflation

Discrete and Continuous coloring in Tableau.

This has long bothered me in certain circumstances.  If you have a null value in a continuous pill then Tableau will color that null as if it were a zero.  Under most circumstances this is an okay solution, and I have to credit the programmers; it is better than the alternative.  When in doubt, show something and show it predictably.

So here is the problem I have faced. Let’s say zero is good, 10 is bad, and NULL is indifferent. Tableau will display null as the same color as zero. Now let’s say zero is good, 10 is bad, and NULL is BAD. Or let’s say 10 is good, -10 is bad, and NULL is something else. In every case, when you place the data into Tableau it will color nulls as zero.

However, Tableau is awesome and I was able to find an easy solution:

FinishedProduct

The secret is not really hard. Basically all you need is two calculated fields and a “transparent” image placed in your custom icons.

Formula 1
Float:  FLOAT([Type1])

This will take a number and convert it into a float. If the value is NULL or text, the result will be NULL.

Formula 2&3
NullFill_1: case [Float]  when null then 'NULL'
else 'FILL'
end

Now just duplicate NullFill_1 as NullFill_2.
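
To make the intent of the two calculated fields concrete, here is the same logic as a small Python sketch. It is only an analogy for the Tableau calculations, with `None` standing in for NULL and with hypothetical function names; the point is that zero stays a real value while nulls and text land in their own bucket.

```python
def to_float(value):
    """Mimic the Float field: numbers convert, NULL or non-numeric text gives None."""
    if value is None:
        return None
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def null_fill(value):
    """Mimic NullFill_1: label a row 'NULL' or 'FILL' after the float conversion."""
    return "NULL" if to_float(value) is None else "FILL"


if __name__ == "__main__":
    for raw in [0, 10, None, "abc", "3.5"]:
        print(repr(raw), "->", null_fill(raw))
```

Because 0 is labeled FILL and only the missing or non-numeric values are labeled NULL, the two groups can be given different shapes or colors.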

Formula 4:
1:1

This “1” is solely a placeholder; call it anything and set it to 1. Now take “1” and place it on the column shelf twice. Place NullFill_1 on one shelf and NullFill_2 on the other, then make one transparent and give the other a real shape. If you need a transparent icon, just use this one:

NA blank trans

Now your Tableau workbook should look like this:

DualAxisThis

Now just dual axis those, adjust some of the legends and you will get this awesomeness!

 

Simple Hex Binning using R

R has, of course, numerous packages available. One of them is hexbin. Hexbinning gives the user a way to visualize high-density scatterplots. There is a way to build it in Tableau without R, but it involves many more calculated fields. A very simple way is to use hexbin() in R together with a Tableau custom shape.

Now this method relies on using transparency to duplicate the ‘look’ of a hexbin by overlaying the hexagons on top of each other. You can see that if you highlight one of the hexagons it will say ‘5 items selected.’ This method does not actually group things into a hexagonal bin. Here is the problem with this simple solution: Tableau likes to receive the exact same number of rows as it sends to R. Thus it is not possible (yet…) to send 50 rows and receive back 5 rows with a count, which is essential for a true hexbin implementation (I am hot on the trail of an idea around this).

Here are the formulas

Hexbin X:
SCRIPT_REAL('library(hexbin);hbin<-hexbin(.arg1,.arg2,xbins = .arg3,xbnds = c(-1,1),ybnds = c(-1,1));xys <- hcell2xy(hbin);xys$x' ,avg(randx),avg(randy),[Bins])

Hexbin Y:
SCRIPT_REAL('library(hexbin);hbin<-hexbin(.arg1,.arg2,xbins = .arg3,xbnds = c(-1,1),ybnds = c(-1,1));xys <- hcell2xy(hbin);xys$y',avg(randx),avg(randy),[Bins])

 (Notice the distinction here xys$x vs xys$y)

Here is what the script is doing. library(hexbin) loads the library; you may need to install hexbin on your R instance first. Here is what the hexbin() function expects:

x: The x values
y: The y values
xbins: The # of bins
xbnds: The +/- bound for x
ybnds: The +/- bound for y

The next command, hcell2xy, simply returns the hexagon’s coordinates for each row so that Tableau can receive them back and display them.
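
The difference between “one hexagon coordinate per row” and “one row per hexagon with a count” is easier to see outside of Tableau. Below is a minimal Python sketch of hexagonal binning. It is not the algorithm inside R’s hexbin package; it simply snaps each point to the nearest center on a hexagonal grid, and the function names and the `width` parameter are my own.

```python
import math
from collections import Counter


def hex_center(x, y, width):
    """Return the center of the hexagon containing (x, y).

    Centers sit `width` apart within a row; rows are width * sqrt(3) / 2
    apart, and every other row is shifted sideways by width / 2.
    """
    row_height = width * math.sqrt(3) / 2
    # Nearest center among the even rows.
    ax = round(x / width) * width
    ay = round(y / (2 * row_height)) * 2 * row_height
    # Nearest center among the odd rows (shifted half a step both ways).
    bx = (round(x / width - 0.5) + 0.5) * width
    by = (round(y / (2 * row_height) - 0.5) + 0.5) * 2 * row_height
    dist_a = (x - ax) ** 2 + (y - ay) ** 2
    dist_b = (x - bx) ** 2 + (y - by) ** 2
    return (ax, ay) if dist_a <= dist_b else (bx, by)


def bin_rows(points, width):
    """One hexagon center per input row, so output length equals input length."""
    return [hex_center(x, y, width) for x, y in points]


def bin_counts(points, width):
    """The aggregated form: one entry per occupied hexagon with its point count."""
    return Counter(bin_rows(points, width))
```

`bin_rows` corresponds to what the overlay trick draws (many semi-transparent hexagons stacked on one spot), while `bin_counts` is the aggregated result that a true hexbin needs.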

Possible errors:

  • R error:  “xbnds[1] < xbnds[2] (or ybnds[1] < ybnds[2] )”

This specific error means that the ‘xbnds’ would like to see the lowest bound to the highest bound.

  • Hexagons do not match up and there is a star-looking negative space between the ‘bins’. You just need to swap the axes.

But luckily this is a 1 button fix :)  :

 

Oh one more thing, here are two good hexbin files that you can use for custom shapes.  Add these to the \My Tableau Repository\Shapes\My Custom Shapes\ directory.

Hexagon_M_Filled

Hexagon_M_Hollow

Here is the working Tableau Packaged workbook.  Note that due to the R integration, I cannot upload this to Tableau Public.