Loads of Soccer Data

In case you missed it and are interested in sports analytics, Manchester City Football Club is releasing its data to the public. You must sign up for it and wait a few days (scroll to the bottom of the main analytics page for the sign-up). I got the data late last week, looked over a few things, and was quite impressed. First of all, the ‘lite’ data is 10k+ rows and 200+ columns. To me, that is a large enough sample to draw some conclusions with real confidence.

Give it a try.

Posted in Uncategorized

Slice by Juice Analytics

I have been very busy making dashboards for work. I am currently using Microsoft PerformancePoint, and it leaves much to be desired in the formatting area. Since I am neither a graphic designer nor a web developer, I am a bit hamstrung when it comes to delivering the ‘last mile’ aha moment in SharePoint. Believe me, what we have now in SharePoint is a million times better than having no visibility. I have several good-looking reports, but there are just a few tweaks I would like to make. Anyway, I was recently exploring Juice Analytics’ Slice product, and since their Atlanta office is right near mine, I was able to visit, and they kindly let me demo their software. I used some soccer data that I collected with @gregesque and made a fun dashboard located here. (This dashboard displays nicely in a vertical portrait format. This might blow your mind, but pick your monitor up, turn it on its end, and change the display settings to portrait mode.)

This brings me to another subject: what is the best layout for a dashboard? Square, portrait, or landscape? I really enjoyed working with my Slice dashboard in portrait mode at 1050×1680. Obviously I can’t require my users to switch their screens to portrait mode to view my dashboards, but for public interactive dashboards, portrait mode seems awesome.

(Note that this tree map is built off of selected players and doesn’t encompass all the goals scored that season)

Posted in Uncategorized

The effect of the Sarbanes–Oxley Act on Small IPOs

Last week Facebook held its IPO, one of the biggest IPOs in history, and the New York Times did a fantastic job of providing an interactive graphic of US IPOs. Page 2 shows, very effectively, how abnormal the IPO was:

The scale there was chosen to show the unique character of Facebook’s IPO, but the chart on the next page shows the same information on a log scale. I have also added some annotations to that chart.

In the type of data I typically work with (business data), I usually look at things on both a normal scale and a log scale, because the natural distribution of many things follows the Pareto Principle. By simply including a log scale as one of the options, the NYT chart correctly lets users view both massive and small values easily. This log-scale chart (on page 3) really jumped out at me and made me ask a question: “Why are small IPOs virtually non-existent after 2001?” Before 2001 there was a forest of small dots on this chart, except during the recessions.
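To see why the log scale helps, here is a minimal Python sketch that computes where each value would sit (as a fraction of the axis height) on a linear axis versus a log10 axis. The offering sizes are hypothetical numbers chosen only to span several orders of magnitude; they are not data from the NYT chart.

```python
import math

def axis_positions(values):
    """Fraction of the axis height (0 to 1) at which each value sits,
    on a linear axis and on a log10 axis. Values must be positive."""
    lo, hi = min(values), max(values)
    linear = [(v - lo) / (hi - lo) for v in values]
    log_lo, log_hi = math.log10(lo), math.log10(hi)
    log = [(math.log10(v) - log_lo) / (log_hi - log_lo) for v in values]
    return linear, log

# Hypothetical offering sizes in $ millions: several small deals and one giant.
sizes = [10, 50, 100, 1000, 16000]
linear, log = axis_positions(sizes)
for size, lin, lg in zip(sizes, linear, log):
    print(f"{size:>6}: linear {lin:.3f}   log {lg:.3f}")
```

With these made-up sizes, the $50M and $100M deals sit below 1% of the linear axis but roughly a fifth and a third of the way up the log axis, which is why small offerings only become visible on a log-scale view.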

I was going through school around that time and was probably more politically astute than my age would indicate (taking two accounting classes soon afterwards really helped). I had learned about the Sarbanes–Oxley Act, and I immediately saw its effect when I looked at the chart.

Everyone knows that regulations cost ‘something.’ Even if a regulation only costs the time needed to comply with it, that is time not spent doing something else. These costs may of course be offset by greater social benefits and the public good, but what is the net effect of a regulation on people’s behavior? The increase in costs likely falls hardest on those least able to pay it: smaller businesses and poorer people. One knee-jerk law affecting business was the Sarbanes–Oxley Act of 2002. Another recent knee-jerk law (both were signed by the ‘business friendly’ Bush) is the Consumer Product Safety Improvement Act of 2008. This law required lead testing on all kids’ items, even those that never had a lead problem before (children’s books) or inherently never contain lead (wool). It was passed after a series of tainted products came from China, yet it applies to US- and European-made products, even though those areas never supplied ‘leaded’ toys. Many criticisms of the law can be found.

Sadly, I fear that the data on small businesses in the toy manufacturing industry will be very hard to come by and the same chart as I have shown above will never be created.

Original chart via Flowing Data

Posted in Data Journalism, Jobs

Guide to Getting a Job: for the 1 in 2 new graduates that are jobless or underemployed

First off, sorry I have been so busy. I have spent the bulk of the last few months working in four major areas:

1)  Processing:  This is more fun than anything else I’ve been doing.  I love Processing!

2) Microsoft Project / SharePoint / PerformancePoint: I have spent a big chunk of time at work getting an implementation of Project and SharePoint going and then getting the dashboards built. Truly awesome. We have basically transformed the company’s process from one that largely existed in people’s heads and in some spreadsheets on a shared drive into something that generates data that can be observed, reported, and visualized in real time. Those programs are not the best out there, but they are a starting point for turning any business into something measurable.

3) PintsOfData:  A Data Driven product!

4) Talend / SQL Server Agent Jobs:  Learning how to move large quantities of data automatically.  It also helps that both Processing and Talend use Java.

Now, I mention those four points because last week this article found its way in front of me: “1 in 2 new graduates are jobless or underemployed.” I can’t understand that! I’ve been so busy on many fronts of the data business, and I can’t understand why kids with half a brain and a basic understanding of computers cannot find a job! I have literally been doing everything along the data science pyramid, and I would really like to concentrate more on the upper layers.

So I have decided to describe my strange path through college to give current college students a guide to getting a job the moment they graduate.

I got out of high school not knowing what I wanted to do. I liked both computers and art, but I did not know what to do with that. I went to a local university in Savannah, Georgia for two years and then transferred to Georgia Tech. Here are the courses that best helped me get into my field, with links to syllabi that roughly match what I learned at the schools where I took them:

Armstrong Atlantic State Univ:  Computer Science, Sculpture, and Physics
Compsci:  CSCI 1301, 1302, 2401.  I learned Java here and I think these were only two classes when I took them.  I took both with Stephen Jodis:  I got a B in the first class and I think a C or D in the second.  He was really hard!
Physics: PHYS 2211 Principles of Physics I and PHYS 2212 II, very useful for making cool stuff in Processing.
Art: ARTS 3300 Ceramics and ARTS 3330 Sculpture. I can find neither syllabus, but a description can be found here in the full PDF of the course catalog. I took both with John Jensen, Professor of Art. I (eventually) created beautiful things out of base material. This is needed in the data biz so your spreadsheets don’t look like total crap. My first ones did. They were ugly garbage. People judge you by how your spreadsheets look. Good design is good communication!

Georgia Tech:  Math and Business.
Math: Intro to Graph Theory 4022 (dropped it, but very interesting), Applied Combinatorics 3012 (loved it), Numerical Analysis 4640 (loved it! It is the science of estimating), Statistics I and II.

I ultimately got my degree in Business, and the things I learned that got me a job in the data business were naturally in the IT Management field. Now, I honestly can’t remember which class I took, but something on this list (course details) introduced me to Excel and Access. Those two things got me an interview.
MGT 4058, Database Management: This class taught me basic SQL and the fundamentals of a good database: keys, indexes, etc. I used all of these soon after getting my first job. This is your entry ticket to getting and keeping a job in the data industry.
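For anyone who hasn't seen these ideas yet, here is a small sketch of primary keys, a foreign key, an index, and a basic SQL join, using Python's built-in `sqlite3` module. The tables and columns (`customers`, `orders`) are hypothetical examples, not material from the course.

```python
import sqlite3

conn = sqlite3.connect(":memory:")          # throwaway in-memory database
conn.execute("PRAGMA foreign_keys = ON")    # SQLite enforces foreign keys only when asked

# Primary key: a column that uniquely identifies each row.
conn.execute("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    )""")
# Foreign key: each order must point at an existing customer.
conn.execute("""
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        amount      REAL NOT NULL
    )""")
# Index: speeds up lookups and joins on orders.customer_id.
conn.execute("CREATE INDEX idx_orders_customer ON orders(customer_id)")

conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(10, 1, 25.0), (11, 1, 15.0), (12, 2, 40.0)])
conn.commit()

# Basic SQL: join the two tables and total each customer's orders.
rows = conn.execute("""
    SELECT c.name, SUM(o.amount) AS total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.name
    ORDER BY c.name""").fetchall()
print(rows)
```

The join is the everyday workhorse; the keys keep the data consistent, and the index keeps the join fast as the tables grow.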

To summarize, if you’re in college and have at least two semesters left, I would say try these 8 basic classes:

  • Intro Computer Science I and II.
  • Math. One statistics class, either basic (business school) or advanced (math). One logic / proof class (NOTs, ANDs/intersections, ORs/unions, etc.)
  • Business IT Database class (for SQL and such) and a Business computer class (to speak the lingo on that job interview: “Yes I know Excel very well… Vlookups, sure I can do them”)
  • Two Arts, so your first work project doesn’t look so ugly.  Color theory is recommended.
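The logic/proof class maps directly onto set operations. A minimal Python sketch with made-up groups, showing AND as intersection, OR as union, NOT as complement within a universe, plus a check of De Morgan's law:

```python
# Hypothetical universe of respondents and two groups within it.
universe = set(range(1, 11))
likes_stats = {1, 2, 3, 4, 5}
likes_art = {4, 5, 6, 7}

both = likes_stats & likes_art        # AND -> intersection
either = likes_stats | likes_art      # OR  -> union
not_stats = universe - likes_stats    # NOT -> complement within the universe

# De Morgan's law: NOT (A OR B) equals (NOT A) AND (NOT B).
assert universe - either == (universe - likes_stats) & (universe - likes_art)
print(sorted(both), sorted(either), sorted(not_stats))
```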

Notes: I have completely ignored Calculus I, II, III and Diff Eq. What I learned in them has not really been ‘useful’ per se, but those classes had a large impact on making me a better problem solver, visualizer, and error checker. These skills are VERY useful in my profession, but I assume there are many other ways to get them. I have also ignored English. As you can tell from reading this blog, I kind of suck at it. I probably learned more in my high-school English classes than I ever did in college. That’s not the college’s fault; I just never pursued it further.

Accounting: I’m sorry, but the intro classes shouldn’t be college courses. We should be teaching this stuff in middle school. It is that fundamental to civilization. My accounting professor was great and I hold her in high esteem.
Economics: Supply and demand determine price. Again, this should probably be introduced in elementary school and expanded on in middle school. Kids need to realize that if they get the same degree everyone else gets, in an industry with little demand, they will probably end up as a barista like the people in that article. Read Mises.org

Other Essentials:  Video Games!
Games that are slow and complicated: the Civilization or SimCity series. Complicated rules, menus, controls, options, and restrictions. This is life in business.
Games that are fast and accurate: shooter games. The faster you move at work, the more you will get done, but it must be done right. Also, no one likes to play with rude loudmouths who spawn-camp.

Posted in Uncategorized

Government employment during two recessions.

I was reading this article from a Savannah local, and it in turn pointed me to this Calculated Risk blog post. I checked the data and found some weirdness. I’m not going to say that CR is cherry-picking data or being careless, because we are using different data. I took data from FRED and am using ‘Job Openings’ as a proxy, because I could not find actual employment levels for state and regional governments on their website, and I think it will serve as a useful enough proxy. Obviously proxy data is not the best, but it would seem that neither president has increased federal employment levels.

I ask my readers to look at CR’s blog and then review these three graphs:

Public Sector jobs since Obama came into office:

Public Sector jobs since Bush and Obama came into office:

Public Sector jobs for the Bush administration:

The problem is that, to me, it doesn’t look like Bush expanded the public-sector workforce as much as the Calculated Risk blog seems to indicate. I will say that I have heard that certain laws passed by Congress increased the money sent to the states, which they then used to increase their employment, so Bush could have indirectly tweaked state numbers in some way.

In conclusion, though, it is hard to use “Public Employment” as a proxy for how well or badly a president is doing at providing public/private jobs, due to the large difference between the federal and state workforces. Plus, each party has its own ‘good’ classes of public employment, like the military or teachers. We would have to break this government figure into multiple types of government employment. I’ll give a shout-out to Political Math for their work on BLS data. If I took either graph and made a per capita baseline, it would look like both presidents hated the public workforce. Look at this population graph:

Have either local or federal levels kept up? To add a few more ideas to your brain, check this graph over several administrations:
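A per capita baseline is simple arithmetic: divide the workforce by the population (here, per 1,000 residents). A minimal sketch with hypothetical numbers, not FRED data, showing how a workforce that grows in raw terms can still shrink per capita when population grows faster:

```python
def per_1000_residents(workers, population):
    """Workers per 1,000 residents for each period."""
    return [1000 * w / p for w, p in zip(workers, population)]

# Hypothetical public workforce and population over three periods.
workers = [20_000, 20_500, 21_000]
population = [1_000_000, 1_060_000, 1_120_000]
rates = per_1000_residents(workers, population)
print([round(r, 2) for r in rates])
```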

Posted in Data Journalism, Jobs

Poor Atlantans have experienced a drastic drop in home values.

I was examining property prices in Atlanta and came up with this basic chart showing how Atlanta fared vs. Charlotte and Detroit. This is not to pick on Detroit; I chose it because many of the cities on the composite index were in the Sunbelt and Western regions, and Detroit did not experience as large a bubble (not to say that it did not experience some type of bubble, it is just not like Las Vegas). I also chose it because I read this article about how Michigan is the #4 state for underwater homes, and Georgia is a close 5th. I have also placed Charlotte’s numbers in this chart to give you an idea of another Southeastern city outside Florida.

The chart above shows that Atlanta home prices have taken a strong hit in the past few months. I had also heard that Atlanta’s lower-priced third of homes has suffered worse than the other tiers. Recently the Occupy Atlanta movement has moved from mere ‘protesting’ to actually preventing foreclosures. I was curious how bad the situation was for Atlanta’s lower third of home-owners. Luckily, there is information on Atlanta housing prices broken out into three tiers. Using information from the Case-Shiller index, I plotted the prices of three groups of houses: Tier 1 (homes under $88,819), Tier 2 ($88,819 – $180,756) and Tier 3 (over $180,756). All these prices were indexed to 100 in January of 2000.
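Indexing to 100 at a base month divides each value by the base-month value and multiplies by 100, which puts tiers with very different price levels on one comparable axis. A minimal sketch; `rebase` is a hypothetical helper, and the prices are made up, not Case-Shiller figures:

```python
def rebase(series, base_period):
    """Rescale a series so that its value in base_period equals 100."""
    base = series[base_period]
    return {period: 100 * value / base for period, value in series.items()}

# Hypothetical average prices for one tier, keyed by "YYYY-MM".
low_tier = {"2000-01": 80_000, "2006-07": 120_000, "2011-12": 60_000}
print(rebase(low_tier, "2000-01"))
```

Read this way, an index of 75 means prices are 25% below the January 2000 level, whatever the tier's dollar prices were.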

I was actually kind of shocked to find such a large disconnect between the three tiers. If we can assume that lower-class people live in lower-class housing, then we can see that the poorest third in Atlanta has suffered the largest loss of value in their homes. I have also read that the poor tend to hold most of their personal wealth in their property. Thus it is also likely that the poorest Atlantans have lost a large share of their personal wealth. It’s almost shocking that the bottom third of Atlantan home-owners are doing worse than the average Detroiter! (Detroitian?)


Posted in Data Journalism

A great movie idea!

So, as I was looking through Redbox last week, I saw an ad for a particular vampire/werewolf/human love-triangle movie and thought that there must have been a Venn diagram behind the writing of that series. I bet the thought process was like “let’s try a book about the intersection of all three!” I am not very fond of that series, but I am fond of other movie topics, so I thought I would like to see a movie that is the intersection of these three topics.

I don’t know of a Nazi-Zombie-Robot movie, and I could not find one. If there isn’t one, then whichever movie studio makes it stands to make a boatload. Dead Snow was awesome, and Iron Sky looks to be equally, if not more, impressive. I am not certain there are Nazi robots in that movie, but it looks like it from the previews.

Note: Terminator is a robot zombie movie. If you think I need to defend this, think of the classic zombie traits: scarring on half/most/all of the face, the need to crush the skull to kill it, an unrelenting desire to kill humans, no ability to reason with it, etc. So YES, I think the classic Terminators are clearly the intersection of zombies and robots.

Also, I promise not to turn this into a Venn diagram website, but I just had to post this one.

Posted in Uncategorized

Venn Diagram of Data Science

Here is a Venn diagram of an idea I had. Honestly, I was dreaming about a Venn diagram (I know that’s dorky, but it’s cooler than dreaming about Excel), and my dream only revealed the data corner, but when I awoke I immediately realized that the other two had to be art and statistics. I seem to be interested in every part of this graphic. I have worked solely on the IT data-troll side, I have designed forms that people use, I have played with the alpha transparency of images, used math to draw images, and used stats to examine large amounts of data... now if I could just focus all that experience towards the middle! ;)

I would say that this highlights my interests and should serve as a rough ‘guide’ for kids in school on which areas to concentrate on. It is of course neither exhaustive nor detailed; it is just a fun graphic. The Data Science world seems best described as the intersection of information collected by data-trolls, displayed by graphic designers (is there a slang term for them?), and understood by stats nerds.

Notes:
Not sure if ‘GUI designers’ is the best term for someone interested in both graphic design and data. I am sure there are other types of people who fit, but they should be equally interested in both.
Complexity Artists: People like this and this

Posted in Jobs

Is Driving Making Us Poorer?

So I took some more numbers from my previous post, added another piece of data, and saw this cool relationship. It would appear that there is a strong negative correlation between miles driven and the savings rate: as the miles we drive increase, the rate at which we save decreases, and the two have a -0.75 correlation. I used the number of miles driven over the previous 12 months (available from the US DOT). The other national number I found was the US Savings Rate (PSAVERT), which is also available from the Fed.
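The -0.75 figure is a correlation coefficient; assuming it is the usual Pearson coefficient, here is a minimal Python sketch of that calculation. The miles and savings values are hypothetical placeholders, not the DOT or FRED series used in the post.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical: 12-month miles driven (billions) and savings rate (%).
miles = [2_900, 2_950, 3_000, 3_020, 3_050]
savings = [5.0, 4.2, 3.5, 3.6, 2.8]
print(round(pearson(miles, savings), 2))
```

A value near -1 means the two move in opposite directions almost in lockstep; -0.75 is strong but leaves room for other influences.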

Now, of course, here is my obligatory correlation-is-not-causation disclaimer: it could be that cars are not making us poorer, but that Americans want to spend money on other things and use their cars to drive to retail stores more often. Our overall consumerist culture could be the variable that is driving down (pun intended) our savings rate and modifying our driving behavior. Also, in the past year the total miles driven and the savings rate have both declined, although I’ll have to analyze any autocorrelation at a later date.
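Two series that each trend over time can correlate strongly in levels even when their month-to-month movements are unrelated. One common first check, offered here as a generic technique rather than an analysis from this post, is to correlate period-over-period changes instead of levels. The series below are hypothetical:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def diffs(series):
    """Period-over-period changes: series[i] - series[i - 1]."""
    return [b - a for a, b in zip(series, series[1:])]

# Hypothetical series that both rise steadily but whose monthly changes are unrelated.
a = [10, 13, 14, 17, 18]
b = [5, 6, 9, 12, 13]
print(round(pearson(a, b), 2), round(pearson(diffs(a), diffs(b)), 2))
```

Here the levels correlate strongly while the changes do not correlate at all, so the shared trend alone produced the strong figure. A full autocorrelation analysis would go further than this.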

Another cool relationship is between driving and the number of jobs. I found a strong positive correlation between these two, using total non-farm employment (PAYEMS), which is again from the St. Louis Fed. This correlation is extremely strong, except for the past year. In the past, when the number of people employed dipped, the number of miles driven also dipped. As you can see in the chart below, the last year shows that the number of miles driven has declined while employment has gone up! I think the US is actually turning a corner and becoming a less car-dependent society. I was surprised, because I thought that people looking for any job would be willing to drive much farther. Could the last year be showing the growth of the work-from-home job?

Below is the Tableau workbook showing the relationship between Miles Driven, Non-Farm Employment, Gas Prices, and Savings Rate.

Posted in Cars

The US drove the fewest miles since Nov ’04

I was looking to duplicate this graphic from the New York Times when I discovered that miles driven in November 2011 (the most recent data) were at a local minimum compared to the last few years. The government has some fabulous data, with raw Excel files and charts for miles driven available here. Likewise, I found BLS data on gasoline prices here. I realized that I probably need more information to account for seasonality, so I continued to explore the data. While looking it over, I discovered that the moving 12-month total of US miles driven in November 2011 was at its lowest level since November 2004! (Unless, of course, there are problems with the data.) The graphs below show the 12-month total miles driven (which is Figure 1 of this PDF) and gas prices for the same time period.
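A moving 12-month total smooths out seasonality: each point is the sum of the latest 12 months. The sketch below computes it with a running sum and also finds the most recent earlier total at or below the latest one, which is what a "lowest since" comparison means. Both function names are hypothetical helpers, and the inputs are made up rather than taken from the DOT data.

```python
def moving_12_month_total(monthly):
    """Trailing 12-month sums; the first total covers months 0 through 11."""
    totals = []
    window_sum = 0
    for i, value in enumerate(monthly):
        window_sum += value
        if i >= 12:
            window_sum -= monthly[i - 12]  # drop the month leaving the window
        if i >= 11:
            totals.append(window_sum)
    return totals

def last_time_at_or_below(totals, i):
    """Index of the most recent earlier total <= totals[i], or None if none exists."""
    for j in range(i - 1, -1, -1):
        if totals[j] <= totals[i]:
            return j
    return None

totals = moving_12_month_total(list(range(1, 14)))  # 13 hypothetical months -> 2 totals
print(totals)
recent = [3000, 2950, 3010, 3040, 2990]              # hypothetical 12-month totals
print(last_time_at_or_below(recent, len(recent) - 1))
```

The running sum adds one month and drops another at each step, so each total costs constant work instead of re-adding all 12 months.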

So here is my chart, similar to the New York Times’ gas/miles scatterplot, showing my monthly data, followed by a similar chart using US Government data:

However this result is not nearly as interesting to me as the first discovery!

Posted in Cars