Simple Hex Binning using R

R of course has numerous packages available, and one of them is hexbin.  Hexbinning gives the user a way to visualize high-density scatterplots.  There is a way to build hex bins in Tableau without R, but it involves many more calculated fields.  A much simpler way is to call hexbin() in R and display the results with a Tableau custom shape.

Now this method relies on the transparency field to duplicate the ‘look’ of a hexbin by overlaying the hexagons on top of each other.  You can see that if you highlight one of the hexagons it will say ‘5 items selected.’  In other words, this method does not actually group things into a hexagonal bin.  Here is the problem with this simple solution: Tableau likes to receive back exactly the same number of rows as it sends to R.  Thus it is not possible (yet…) to send 50 rows and receive back 5 rows with a count, which is essential for a true hexbin implementation (I am hot on the trail of an idea around this).

Here are the formulas:

Hexbin X:
SCRIPT_REAL('library(hexbin);hbin<-hexbin(.arg1,.arg2,xbins = .arg3,xbnds = c(-1,1),ybnds = c(-1,1));xys <- hcell2xy(hbin);xys$x', avg(randx), avg(randy), [Bins])

Hexbin Y:
SCRIPT_REAL('library(hexbin);hbin<-hexbin(.arg1,.arg2,xbins = .arg3,xbnds = c(-1,1),ybnds = c(-1,1));xys <- hcell2xy(hbin);xys$y', avg(randx), avg(randy), [Bins])

(Notice the only distinction here: xys$x vs. xys$y.)

Here is what the script is doing.  library(hexbin) loads the library (you may need to install hexbin on your R instance first).  Here are the arguments hexbin() likes to see:

x: the x values
y: the y values
xbins: the number of bins across the x range
xbnds: the lower and upper bounds for x, as c(lower, upper)
ybnds: the lower and upper bounds for y, as c(lower, upper)

The next command, hcell2xy, returns the coordinates of the hexagon cell centers so that Tableau can receive them back, one per row, and display them.
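If you want to see what the script does outside Tableau, below is a minimal standalone sketch you can paste into RStudio.  The random data and variable names are mine, and the IDs/cID mapping at the end is one way (an assumption on my part, not necessarily how the workbook does it) to hand back one hexagon center per input row, which is what Tableau’s same-number-of-rows rule demands:

library(hexbin)                            # install.packages("hexbin") if needed
set.seed(42)
x <- runif(50, -1, 1)                      # 50 fake points within the same -1..1 bounds
y <- runif(50, -1, 1)
hbin <- hexbin(x, y, xbins = 10, xbnds = c(-1, 1), ybnds = c(-1, 1), IDs = TRUE)
xys <- hcell2xy(hbin)                      # centers of the occupied hexagon cells
cx <- xys$x[match(hbin@cID, hbin@cell)]    # one center per input point...
cy <- xys$y[match(hbin@cID, hbin@cell)]    # ...so 50 points in, 50 centers out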

Possible errors:

  • R error: "xbnds[1] < xbnds[2] (or ybnds[1] < ybnds[2])"

This error means that xbnds expects c(lowest bound, highest bound): the first value must be less than the second, e.g. xbnds = c(-1,1) rather than c(1,-1).

  • Hexagons do not match up and there is a star-shaped negative space between the ‘bins’.

You just need to swap the axes, and luckily that is a one-button fix :)  :

 

Oh, one more thing: here are two good hexagon files that you can use for custom shapes.  Add these to the \My Tableau Repository\Shapes\My Custom Shapes\ directory.

Hexagon_M_Filled

Hexagon_M_Hollow

Here is the working Tableau Packaged workbook.  Note that due to the R integration, I cannot upload this to Tableau Public.

Using an R function and Tableau 8.1 to map custom areas

The R Project for Statistical Computing has a smörgåsbord of functions available to use, which is obvious to anyone who has spent 30 minutes with the program.  Tableau is equally awesome and really excels at displaying maps.  Again, this should be manifest to anyone who’s spent 3 minutes with the product.
(Finished product: Convex_hull)

Steps to Mapping a District using Tableau and R.

Tableau 8.1 now includes R integration, and this is my first major application of R and Tableau.  I had played around with it before, using seasonal decomposition to remove the seasonal swings, but that was mainly for a personal view of the data and not anything to display publicly.  This post will outline how to do something very cool by combining Tableau’s easy mapping features AND R’s powerful packages.  (Want to skip over all these boring instructions? Here is the finished workbook.)

Basic setup needed before you attempt anything in Tableau (hold your horses, we’ll get there soon):

Step 1:  Install R and R Studio.
Step 2:  Within R Studio, install the Rserve package.

Tools -> Install Packages
blog_pic1

Type “Rser” and choose Rserve (and type the “ve” too if you’re the OCD, intellisense-hater type).
blog_pic2
Choose Rserve and click Install.

Step 3:  Start up Rserve from the Rstudio Console:

> library(Rserve);
> Rserve()

Step 4: Connect Tableau to your local Rserve and test the connection
blog_pic3

Use localhost and then test the connection:
blog_pic4

BAM! You’re ready to rock with the data rock stars!   \m/ (>.<) \m/
But now onto the hard stuff in Tableau (see, that didn’t take much time):

Fields you need:

  • Points you want to map.
  • A Group or Grouping Hierarchy (if need be, just create a calculated field Group = "group")
  • Latitude
  • Longitude

Tableau calculated Fields:

Size: = SIZE()

R-Derived Calculated Fields (In Tableau):

Script_PolyOrder:
SCRIPT_REAL("X<- matrix(,nrow=.arg1, ncol=2 );X[,1]<-.arg2;X[,2]<-.arg3;"+
"Y <- (ifelse(X[,1] %in% X[chull(X)],1,0));"+
"Z <- matrix(0,ncol = 1,nrow =.arg1);Z[c(chull(X)),1] <- seq_along(chull(X));Z",[Size],attr([Lat]),attr([Long]) )

2nd Tableau calculated Field:

Script_In_Ext:
case [Script_PolyOrder] when 0.0 then 'aInterior'
else 'Exterior'
end

The first field, “Script_PolyOrder”, simply determines the polygon drawing order of the points.  The second field then determines whether each point is interior or exterior.  I forced ‘Interior’ into ‘aInterior’ to make it sort before ‘Exterior’.  “Why didn’t you just rename ‘Exterior’ to ‘Outside’?”  Okay, I realize that now as I write this blog post, but I was going for pure function when I wrote this Tableau file.  Below are two additional optional fields, which are useful for experimenting and for seeing how Tableau handles things; I will go into this more later.  (A standalone sketch of what Script_PolyOrder computes follows.)
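If you want a feel for what Script_PolyOrder returns before wiring it into Tableau, here is a minimal sketch with random points (the data and names are my own; chull() is base R’s convex-hull function):

set.seed(1)
X <- matrix(stats::rnorm(20), ncol = 2)    # 10 fake points: col 1 = lat, col 2 = long
hull <- chull(X)                           # row indices of the hull vertices, in polygon order
Z <- matrix(0, ncol = 1, nrow = nrow(X))   # 0 = interior point
Z[hull, 1] <- seq_along(hull)              # 1..k = drawing order around the hull
Z                                          # what Tableau receives as Script_PolyOrder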

Script_Lat:
SCRIPT_REAL("X<- matrix(,nrow=.arg1, ncol=2 );X[,1]<-.arg2;X[,2]<-.arg3;"+
"Y <- (ifelse(X[,1] %in% X[chull(X)],1,0));"+
"Z <- X*Y;X[,1]",[Size],attr([Lat]),attr([Long]) )

Script_Lng:
SCRIPT_REAL("X<- matrix(,nrow=.arg1, ncol=2 );X[,1]<-.arg2;X[,2]<-.arg3;"+
"Y <- (ifelse(X[,1] %in% X[chull(X)],1,0));"+
"Z <- X*Y;X[,2]",[Size],attr([Lat]),attr([Long]) )

Building the Map:

Drag the pills onto the map like this: blog_pic5

And dual-axis the 2nd Lat pill.  The dual-axis part can be done later if you’d like; it is somewhat nice for clarity to build the different parts separately and THEN see them come together.

Make the first Lat a Polygon.  Drag “Group” onto the Detail shelf.  Uncheck Aggregate Measures in the Analysis menu.  Drag Script_PolyOrder onto Path and WHAM, you have a polygon!

Now this is the important part for the visualization.  Make the 2nd Lat pill the Circle (or a shape if you want) and make sure the first Lat pill is the Polygon.  This matters because Tableau draws the 2nd pill on top of the first one.

Shelf Examples:

BAD shelf / bad viz example:

blog_pic6

Good shelf / good viz example:

blog_pic7

 

Now you have an outline for the “Group”, and the interior points for that group do not matter to the polygon, which prevents any sort of ugly, jagged border.

Last Step:

Drag the calculated field Script_In_Ext onto the Pages shelf.  Check “Show History”.  Move the page control to the last page, “Exterior”.  Adjust the “Fade” setting to fade out the previous page.  Now you are finished, and it should look amazing.  Keep reading if you want to know the details of how it was achieved:

Alternate strategy:

There is another option I toyed with: http://community.tableausoftware.com/thread/140023

Basically, if you already know your polygon order and do not wish to have dynamic polygon drawing, you can skip the Pages shelf and R entirely.  Create a UniqueID which keeps each interior point separate and groups the exterior points together.

UniqueID

case [Script_In_Ext] when 'aInterior' then "Str"+str(attr([LowerHierarchy]))
else "D" + str(attr([NextLevelUpHierarchy]))
end

Add this to the shelf in place of your lowest hierarchy.  In essence, this calculated field dumps each interior point into its own unique one-point polygon and groups all the exterior points into one polygon.  This was the first way I solved it, but looking back it is slightly more complicated AND requires me to know the polygon order in advance.  Adding the polygon order in Excel is another post, but it is possible and maybe preferable in some circumstances.

One benefit of this approach is that by integrating R somewhere earlier in the pipe (say, with RExcel or KNIME) you don’t bog down the visualization server with calculations (and recalculations).  This is a good solution for many cases.

Tableau and R

Tableau talks to R in terms of vectors and matrices, but it can only receive a single vector back, so you can only pass one column of a matrix back to Tableau.

Let’s go over Script_Lat.  In this calculated field, you pass the latitude and longitude to R and have R pass the latitude back.  This is purely illustrative and not needed for the viz to actually function.

SCRIPT_REAL("X<- matrix(,nrow=.arg1, ncol=2 );X[,1]<-.arg2;X[,2]<-.arg3;"+
"Y <- (ifelse(X[,1] %in% X[chull(X)],1,0));"+
"Z <- X*Y;X[,1]",[Size],attr([Lat]),attr([Long]) )

First function:

X<- matrix(,nrow=.arg1, ncol=2 );

This creates a matrix X with a variable number of rows and 2 columns.  The number of rows is based on the Size calculated field, which is the first argument after the R code in the Tableau calculated field.

2nd & 3rd Functions:

X[,1]<-.arg2;
X[,2]<-.arg3;

Load Column 1 of matrix X with the 2nd argument, which is latitude, and load longitude into the 2nd column.

4th function, the confusing part:

Y <- (ifelse(X[,1] %in% X[chull(X)],1,0));

Basically, save a vector Y holding the result of an if-then test: 1 for points of X that are among the convex-hull vertices of X, 0 otherwise.  I know it’s hard to describe; see these links and the tiny worked example after them:

http://stackoverflow.com/questions/22072194/basic-r-how-to-populate-a-vector-with-results-from-a-function

http://stackoverflow.com/questions/22096182/r-retain-order-from-a-vector-apply-it-to-another-vector
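Here is my own tiny worked example of that flagging step, using five made-up points where only point 3 is interior:

X <- rbind(c(0, 0), c(2, 0), c(1, 1), c(2, 2), c(0, 2))
chull(X)                                   # hull vertex row indices, e.g. 2 4 5 1
Y <- ifelse(X[, 1] %in% X[chull(X)], 1, 0)
Y                                          # 1 1 0 1 1 -- the interior point is flagged 0
# Caveat: X[chull(X)] indexes the matrix as a plain vector, so the test matches
# on first-column (latitude) values; an interior point that happens to share a
# latitude with a hull vertex would be flagged 1 as well.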

5th function:

Z <- X*Y;

Save a matrix Z which is matrix X times vector Y.  Note that this is an element-wise multiplication and not true matrix multiplication; it zeros out the interior points, leaving only exterior points, which the Tableau calculated field then uses.  This is a screen grab of Z within RStudio using random data, showing zeros for the interior points.
blog_pic8

6th function:

X[,1]

Now return the latitude.  X[,1], X[,2], or Z[,1] would each work here; the last expression evaluated is what Tableau receives back, and this is the last step Tableau expects to see.  Finished calc’d field in Tableau:
blog_pic9

Possible Errors:

You may get this error (possibly many times :D ): blog_pic10

Press Details.  It could be that you need to start Rserve.  Or you may need to re-do the Size field (configuring the Size field was difficult; play with Aggregate Measures, use Tableau’s total fields, or generally adjust until you’re sending the correct number of elements to R).  Or there is some error within the R code itself.  If the problem is R syntax, I suggest following the trail within RStudio to find the error.  Use this to generate random points and start debugging:
X <- matrix(stats::rnorm(100), ncol = 2)
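Building on that line, here is a sketch of the whole calculated field as plain R you can step through in RStudio (random data, so your results will vary):

X <- matrix(stats::rnorm(100), ncol = 2)   # 50 fake lat/long points
Y <- ifelse(X[, 1] %in% X[chull(X)], 1, 0) # 1 = hull vertex, 0 = interior
Z <- matrix(0, ncol = 1, nrow = nrow(X))
Z[chull(X), 1] <- seq_along(chull(X))      # polygon drawing order, 0 for interior
head(cbind(X, Y, Z))                       # inspect before moving back to Tableau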

 

Personal Traffic patterns

For the past two years I have been using OpenPaths to analyze my personal movement data.  I looked at the data through Tableau and came up with some cool views, then decided to slice the data in new ways using longitude and latitude binning.  This easy Excel map hack gave me ideas, and I then created a variable latitude/longitude binning method in Tableau.  One problem with the simple approach is that the bin sizes are pretty static.  The variable-zoom method uses the formula Round([x]*[Zoom],0)/[Zoom] instead of just rounding to a power of 10.  For instance, placing the formula =ROUND(A1*4,0)/4 into Excel rounds the values to the nearest quarter, and =ROUND(A1*100,0)/100 rounds to two digits.
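For reference, here is the same variable binning written as a small R function (my own wrapper around the post’s formula):

bin_coord <- function(x, zoom) round(x * zoom) / zoom
bin_coord(33.7634, 4)     # 33.75 -- nearest quarter degree
bin_coord(33.7634, 100)   # 33.76 -- nearest hundredth of a degree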

I used that binning method to create a series of maps tracking my personal location.  In this first map I have a general record of two paths I took to work.  Each bin is colored by the number of records: red is high, green is low.

Atlanta_Metro_numberofRecords2

The next map shows those two routes, with each bin colored by its average velocity.  Lower velocity is in blue, while higher velocity is in red.  What is interesting is that the route to job 1 was much faster, while route 2 had several sections of slowness (bottlenecks).

Routes_to_job

This third map is interesting, showing the effect of a traffic light on velocity.  I only analyzed this location on the afternoon commute (when I traveled north), so you can see a blue coloring leading up to the intersection and immediately after, followed by a bright red (fast) bin.  Since velocity is calculated using the distance from the previous measurement, and a congested intersection takes a little time to clear out, it makes sense that the speed-up would lag the intersection some.  (A sketch of the velocity calculation follows the map.)

Intersection
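For the curious, here is one way to compute such a velocity in R from consecutive GPS fixes.  This is my own sketch, not the workbook’s actual formula; it assumes a data frame df with lat, lon, and POSIXct timestamp columns, sorted by time:

haversine_km <- function(lat1, lon1, lat2, lon2) {
  to_rad <- pi / 180                       # degrees -> radians
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 + cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  6371 * 2 * asin(sqrt(a))                 # great-circle distance, Earth radius ~6371 km
}
n <- nrow(df)
dist_km <- haversine_km(df$lat[-n], df$lon[-n], df$lat[-1], df$lon[-1])
hours <- as.numeric(difftime(df$timestamp[-1], df$timestamp[-n], units = "hours"))
df$velocity_kmh <- c(NA, dist_km / hours)  # each fix's speed relative to the previous one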

Full listing of Obamacare Health Care Plans. Price visualization.

So here is a Tableau visualization of all the Obamacare health plans for states that elected to use the federal exchange (I am not sure about the states where the feds and the state government split some responsibilities).  Here is a breakout map.  The values differ by county, so I am showing the averages.  The full visualization of all the health plans can be found here.

Clicking on any circle, or highlighting a section will bring up those plans below. You can then sort and filter as needed.

Tenure and the importance of direction and movement in a graphic.

Here is a tenure-change report that I did a while ago in a data viz class.  I wanted to post it because I think the Pages shelf in Tableau is under-utilized.  All the pages are shown at once by using the ‘Show History’ option in Tableau.  This example shows how tenure in some positions has changed while in others it has not.  I have two graphs showing the same thing, but in the chart at the bottom I have turned everything to the right.  Here are some tips I’ve stumbled upon:

  • A dotted line seems to show ‘something is changing’ better than a solid. It shows impermanence.
  • The past tends to fade. Fade (adding transparency or whiteness) seems to indicate the past better than a change in hue.
  • Keep the change one-dimensional.  After all, our dimension of time is one-dimensional (and, so far, one-directional).  Sometimes you can get away with two, though, as Hans Rosling has done at http://www.gapminder.org/
  • Up does not mean the same as Down, Left, or Right.

In this chart, I have deliberately chosen Up rather than right.

2nd chart: Notice how everything looks ‘faster’

Although the 1st graph has a wider professor-category area, the Number of Positions axes on both are relatively close in size.  (Please note that you only need to measure to 160k; I forced the axis to 180k on the 2nd graph to fit the label in.)  However, the small changes in perspective change this graph completely.

Top paid Georgia University Employees

Recently I read an article from the AJC showing the top-paid Georgia state employees; their article can be found here (note that at the time of this publishing, their website does not link to the article).  I have made a dashboard showing who gets paid the most at each university.  Travel and salary expenses are included.  First of all, this excludes the coaches, who get paid a large amount.  Secondly, if a person started the position halfway through the year, then their pay shows as only half of what it would normally be (e.g. Savannah State’s president).  Third, there was no part-time or full-time designation in the data I received, so you will have to decide on an arbitrary ‘part-time cutoff’ using the slider at the top.  I have decided on a $10,000 level; you can put in $15k if you wish.  It’s really not certain what part-time is, and this could make a huge difference.

The ‘ratio’ column is the ratio between the highest-paid person’s pay and the average pay for the university (again, that average depends on the slider).

The graph shows each university’s average pay, median pay (both based on the filter), max pay, and the president’s pay.  The person’s name to the right is the highest-paid person for that university.  If you hover over the colored box next to their name, you will see their position.

All the data came from OpenGeorgia.gov and includes travel and salary.  One special thing to note is that apparently some people may also be paid by the university foundations, and that pay may not be included in this dashboard.

The full view is here if the ads get in the way.

NSA no vote and defense spending, a simple analysis

This article in Wired seems to imply that there is a correlation between defense/intelligence-industry contributions and whether a congressman voted to continue NSA funding.  Let’s be clear: the amendment had a snowball’s chance in a Georgia July (although, what the heck GA, this summer has been very mild!) of passing through the Senate and Obama’s veto, but Wired had a data section detailing the two-year contributions and the result of the vote.

I first took the data, sorted it by contributions from largest to smallest, and gave a -1 for a ‘No’ vote, a 0 for ‘not voting’, and a 1 for a ‘Yes’ vote.  (A ‘No’ vote is a vote to continue the NSA’s phone spying.)  I then did a running total and produced this graphic in Excel:
DefenseSpendingNSAInflectionpoint
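For anyone who wants to reproduce the tally, here is a sketch of the same running total in R.  The data frame and column names (votes, contribution, vote) are my assumptions about the shape of Wired’s data:

votes <- votes[order(-votes$contribution), ]            # largest contributions first
votes$score <- ifelse(votes$vote == "Yes", 1,
                      ifelse(votes$vote == "No", -1, 0))
votes$running_total <- cumsum(votes$score)
plot(votes$running_total, type = "l",
     xlab = "Congressmen, largest contributions first",
     ylab = "Running vote total")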

Just eyeballing it, it seems to me that down to ~$23k in contributions (I think over two years) a congressman was more likely to vote to continue funding domestic surveillance, while at around $18k and lower a congressman was much less likely to vote for it, though a no vote was not guaranteed; there were still some yes votes in the low-contribution range.  Note that the axis is a discrete listing of all contributions and not a range.

I produced another quick chart breaking each congressman into deciles based on campaign contributions.
DefenseSpendingNSA_stackedBar

It would seem to me that although contributions appear to be a deciding factor for voting yes in some of the higher deciles, receiving fewer contribution dollars was by no means a deciding factor in voting yes or no for the bulk of the congressmen.  (Hey, maybe those guys getting zero dollars really want the contributions, so maybe that’s why they voted yes.)  Also note that these dollar amounts don’t take into account the total contributions given to a candidate; it is likely that the percentage matters, and I didn’t have time to collect all that data.

Zero Hedge June Retail Sales graph. Opposing views or how to add more context to a graphic

First of all, sorry for the delay in posting.  Life has been very different these last six months, and I have a backlog of ideas I need to work through.  Anyway, a few weeks ago I read an article from Zero Hedge titled “Spot The Grotesque Retail Sales Seasonal Adjustment Outlier”, which seemed to imply that the government grossly over/under-adjusted the seasonal retail numbers for June.  While I admit the original article does seem to make a good point with the outlier, I wanted to provide more context than their original chart.

For reference here is their original chart:
June Retail Sales Seasonal Adjustment_0

I have provided some other charts using Tableau to show how this June adjustment is not very remarkable (although the sign change from – to + is interesting).

In this first graph I am showing all the adjustments by month for the past few years.  As you can see, June (in red) is hardly as dramatic an outlier compared to other months:
JuneVsEverything

In this second chart I have plotted the log of the difference in adjustments and labeled the minimum and maximum years.  (Without the log of the difference, 1992 seemed to show up more.)
AllMonthsAndYear_Log2

So here is the entire view.  You can see that other months ‘straddle’ the zero seasonal-adjustment axis, and June ‘was always close’ to the zero line, so it is ‘conceivable’ that there is nothing wrong with June 2013’s adjustment.  However, even at this larger view, June 2013 does seem to stand out!  Is it a large deviation?  Not really, considering the whole picture.  This is either a legitimate change in American retail sales for June (is Memorial Day becoming a larger retail event than in the past?) or a one-time adjustment (a June 2013 weather event?).  I have no opinion, but I wanted to present a different picture.