Alteryx, football, Tableau

The relationship between away team performance and distance travelled in the English football league

If you follow football, you often hear about arduous away trips to the other side of the country. This seems to imply that the further an away trip is, the more difficult it is for the away team.

However, is that actually true? Do away teams really do worse when they’ve travelled a long way to get there, or is there no difference?

The football league season has just finished, so I’ve taken each match result from the Championship, League One, and League Two in the 2016-17 season. After some searching, I got the coordinates of each football league team’s stadium, and used the spatial tools in Alteryx to calculate the distance between each stadium. I then joined that to a dataset of the match results, and you can download and play with that dataset here. I stuck that into Tableau, and you can explore the interactive version here.

First, let’s have a look at how many points away teams win on average when travelling different distances. I’ve broken the distance travelled into bins of 25 miles as the crow flies from the away team’s stadium to the home team’s stadium, then found the average number of points an away team wins when travelling distances in that bin (I excluded the games where the away team travelled over 300 miles as there were only two match ups in that bin – Plymouth vs Hartlepool and Plymouth vs Carlisle).

It turns out that it actually seems easier for away teams when they travel further away:

Teams travelling under 25 miles win just under a point on average, while teams travelling over 200 miles win between 1.3 and 1.6 points on average.

This is surprising, but there could be several reasons contributing to this:

  1. Local rivalries. It’s possible that away teams do worse in derby matches than in other matches; this is something to investigate further.
  2. Team bonding. It’s possible that travelling a longer distance together is a shared experience that can help with team bonding.
  3. Southern economic dominance. England is relatively centralised, economically speaking; most of the wealth is in the south. Teams in the South travel further than average to away games, so perhaps the distance advantage actually shows a southern economic advantage; teams in richer areas can buy better players.
  4. Centralisation vs. sparser regions. England is relatively centralised, geographically speaking; most of the population lives in the bits in the middle, and teams in the Midlands travel the least distance on average. Perhaps teams in more centralised areas (e.g. Walsall, Coventry) have more competition for resources like new talent and crowd attendance, while teams in less centralised areas (e.g Exeter, Newcastle) might have less competition for those resources.

I also used Tableau’s clustering algorithm to separate out teams and their away performances based on distance travelled, and it resulted in four basic away performance phenotypes (which you can explore properly and search for your own team here):

Since I had the stadium details, I had a look at whether the stadium capacity made a difference. This isn’t a sophisticated analysis – better teams tend to be more financially successful and therefore invest in bigger stadiums, so it’s probably just a proxy for how good the home team is overall, rather than capturing how a large home crowd could intimidate an away team.

Finally, this heat map combines the two previous graphs and shows that away teams tend to do better when they travel further to a smaller ground. This potentially shows the centralisation issue discussed earlier; the lack of data in the bottom right corner of the graph shows that there are very few big stadiums in parts of the country like the far North West, North East, and South West, where away teams have to travel a long way to get to.

So, it looks like the further an away team travels, the better they tend to do… although that could reflect more complicated economic and geographic factors.

Standard
Alteryx

Now you’re making me cross(tab)… getting around character glitches in Alteryx’s crosstab tool.

I was building an Alteryx app for a client this week, and spent an hour or two tripping up over a really straightforward issue. My workflow worked just fine for a small subset of the data that I was testing it on… but when I fed in the rest of the data, I got this error message:

1-error-message

This isn’t helpful. My data is perfectly clean, thank you very much. I’m not having that. The workflow was working fine for a subset of the data, so there’s no reason it should have tripped up just because more data was added. Or so I thought… but it turns out that Alteryx’s Crosstab tool has a problem with special characters.

Let’s start from the beginning. I’m building an app with a drop down menu which lets you filter the data to a single value. That looks a little bit like this:

3-simple-app-set-up

You can manually type in the possibilities in the drop down tool, but if there’s a lot of them (which there generally are), it’s a bit of an arse ache, and it’s not dynamic either in case the data changes in future… so the best option is to populate the drop down menu with the field names of a connected tool:

2-app-drop-down-configuration

Irritatingly, there isn’t an option in the drop down tool configuration to take distinct values from the rows of a particular field of a connected tool. This means that you have to take the field where the interesting stuff is and crosstab it, so that all the values become a column heading.

This is pretty straightforward. First, I used a summarize tool and grouped the data by the field which has the values which you want to be in the drop down tool. Then, because you can’t crosstab a single field, I simply grouped by the same field again. That gave me this output:

5-first-group-by-twice

…and I just crosstabbed it so that I’d get A * B, A + B, and A – B as the field names, and also A * B, A + B, and A – B as the first row of data.

But no:

7-error-message

The warning message is more informative than the error message here. What’s going on with the multiple fields named “A___B”?

It turns out that the crosstab tool automatically changes special characters, like *, +, and -, to underscores in field names. In my subset of data at work, I wasn’t working with any values with special characters in them; but when I brought the rest of the data in, there were values that were textually different, like A * B and A + B, which became the same thing when replacing the special characters with underscores. I’m not sure why it does this; my guess is that it’s something to do with making field names compatible with programmes like SQL and R, which are more restrictive in the characters they allow in field names.

I wasted quite a bit of time trying to work out what was going on here, but luckily, there’s a simple solution. Instead of grouping by the field in the summarize tool twice, just group by that field once. Then, add a Record ID tool in, so that you get something like this:

9-record-id

Now, you can crosstab successfully. Put the Record ID field as the new column headers, and the thing you’re actually interested in as the values:

8-cross-tab-successfully

The next step is to use a dynamic rename tool to take the column names from the first row of data. Unlike the crosstab tool, the dynamic rename tool doesn’t change special characters when assigning new column names:

9-dynamic-rename

…and there you go. Now you have an app where the populated drop down menu works with special characters!

Standard