Alteryx, football, Tableau

The relationship between away team performance and distance travelled in the English football league

If you follow football, you often hear about arduous away trips to the other side of the country. This seems to imply that the further an away trip is, the more difficult it is for the away team.

However, is that actually true? Do away teams really do worse when they’ve travelled a long way to get there, or is there no difference?

The football league season has just finished, so I’ve taken each match result from the Championship, League One, and League Two in the 2016-17 season. After some searching, I got the coordinates of each football league team’s stadium, and used the spatial tools in Alteryx to calculate the distance between each stadium. I then joined that to a dataset of the match results, and you can download and play with that dataset here. I stuck that into Tableau, and you can explore the interactive version here.

First, let’s have a look at how many points away teams win on average when travelling different distances. I’ve broken the distance travelled into bins of 25 miles as the crow flies from the away team’s stadium to the home team’s stadium, then found the average number of points an away team wins when travelling distances in that bin (I excluded the games where the away team travelled over 300 miles as there were only two match ups in that bin – Plymouth vs Hartlepool and Plymouth vs Carlisle).

It turns out that it actually seems easier for away teams when they travel further away:

Teams travelling under 25 miles win just under a point on average, while teams travelling over 200 miles win between 1.3 and 1.6 points on average.

This is surprising, but there could be several reasons contributing to this:

  1. Local rivalries. It’s possible that away teams do worse in derby matches than in other matches; this is something to investigate further.
  2. Team bonding. It’s possible that travelling a longer distance together is a shared experience that can help with team bonding.
  3. Southern economic dominance. England is relatively centralised, economically speaking; most of the wealth is in the south. Teams in the South travel further than average to away games, so perhaps the distance advantage actually shows a southern economic advantage; teams in richer areas can buy better players.
  4. Centralisation vs. sparser regions. England is relatively centralised, geographically speaking; most of the population lives in the bits in the middle, and teams in the Midlands travel the least distance on average. Perhaps teams in more centralised areas (e.g. Walsall, Coventry) have more competition for resources like new talent and crowd attendance, while teams in less centralised areas (e.g Exeter, Newcastle) might have less competition for those resources.

I also used Tableau’s clustering algorithm to separate out teams and their away performances based on distance travelled, and it resulted in four basic away performance phenotypes (which you can explore properly and search for your own team here):

Since I had the stadium details, I had a look at whether the stadium capacity made a difference. This isn’t a sophisticated analysis – better teams tend to be more financially successful and therefore invest in bigger stadiums, so it’s probably just a proxy for how good the home team is overall, rather than capturing how a large home crowd could intimidate an away team.

Finally, this heat map combines the two previous graphs and shows that away teams tend to do better when they travel further to a smaller ground. This potentially shows the centralisation issue discussed earlier; the lack of data in the bottom right corner of the graph shows that there are very few big stadiums in parts of the country like the far North West, North East, and South West, where away teams have to travel a long way to get to.

So, it looks like the further an away team travels, the better they tend to do… although that could reflect more complicated economic and geographic factors.

Standard
Alteryx, R

How to be an R soul: an introduction to the R Tool in Alteryx

Alteryx is great for a lot of analysis, and the in-built tools improve with every release. But sometimes you just need to work with the R code directly; maybe you’ve inherited an R document that you need to reproduce, or maybe you need to use a specific package for sentiment analysis, or maybe you’re just far more used to R syntax and want to make sure the model is running exactly as you intend.

This is where the R tool comes in handy.

For this blog, I re-ran a section of one of the experiments I did for my PhD. You can find the data and R analysis script here (better still, download the Rmarkdown html and view in your browser to see the code and the command line output), and you can read the paper here. One section of the analysis compared mixed models using the lme4 package, which I’m not sure how to do in Alteryx. I’m sure there’s a way, but the R tool is perfect for making sure that I reproduce the results exactly.

First, drop the R tool into the workflow:

1

It’s not enough to just connect the previous tool to the R tool input, though; you have to specifically tell the R tool to load the data in. You can do that with this bit of code at the top of the scripting panel. The R tool takes multiple inputs so you can bring in various different pieces of data; the R tool recognises them as #1, #2, #3, etc. This line says “read input #1 into the R tool as a dataframe and store it as behdata within the R script”:

2

You then need to load the R packages you’ll be using. It’s a bit tricky to install extra R packages in Alteryx if the installer doesn’t match your version, but Alteryx comes with quite a lot of useful R packages pre-installed anyway (see here for Alteryx 10 and here for Alteryx 11). However, even if the packages are already installed, they need to be loaded each time.

3

Now, you can continue with the R code… for the most part.

4

Once you’ve done your coding, you’ll need to write the results to the R tool output. This code is pretty similar to the input; it reads “write the object modelcomparison to R tool output 1”:

5

However, because Alteryx works with dataframes, you can only write dataframes to Alteryx. This means you’ll have to convert matrices into dataframes, and if you’re dealing with lists, you’ll have to coerce them to dataframes before you can do anything with them.

Sadly, the R tool doesn’t have a command line. When I want to look at the properties of the model, in R I’d simply type summary(modelname) and get a nice result in the command line:

6

One way of doing this in Alteryx would be to store the summary as an object and then write to one of the outputs. However, a model summary like this is a list in R, which can’t be written to Alteryx without converting it to a dataframe first. If you try it, you’ll see this error:

7.png

and it’s a little more complicated than that, but that’s another blog for another time.

 

Standard
Alteryx

Now you’re making me cross(tab)… getting around character glitches in Alteryx’s crosstab tool.

I was building an Alteryx app for a client this week, and spent an hour or two tripping up over a really straightforward issue. My workflow worked just fine for a small subset of the data that I was testing it on… but when I fed in the rest of the data, I got this error message:

1-error-message

This isn’t helpful. My data is perfectly clean, thank you very much. I’m not having that. The workflow was working fine for a subset of the data, so there’s no reason it should have tripped up just because more data was added. Or so I thought… but it turns out that Alteryx’s Crosstab tool has a problem with special characters.

Let’s start from the beginning. I’m building an app with a drop down menu which lets you filter the data to a single value. That looks a little bit like this:

3-simple-app-set-up

You can manually type in the possibilities in the drop down tool, but if there’s a lot of them (which there generally are), it’s a bit of an arse ache, and it’s not dynamic either in case the data changes in future… so the best option is to populate the drop down menu with the field names of a connected tool:

2-app-drop-down-configuration

Irritatingly, there isn’t an option in the drop down tool configuration to take distinct values from the rows of a particular field of a connected tool. This means that you have to take the field where the interesting stuff is and crosstab it, so that all the values become a column heading.

This is pretty straightforward. First, I used a summarize tool and grouped the data by the field which has the values which you want to be in the drop down tool. Then, because you can’t crosstab a single field, I simply grouped by the same field again. That gave me this output:

5-first-group-by-twice

…and I just crosstabbed it so that I’d get A * B, A + B, and A – B as the field names, and also A * B, A + B, and A – B as the first row of data.

But no:

7-error-message

The warning message is more informative than the error message here. What’s going on with the multiple fields named “A___B”?

It turns out that the crosstab tool automatically changes special characters, like *, +, and -, to underscores in field names. In my subset of data at work, I wasn’t working with any values with special characters in them; but when I brought the rest of the data in, there were values that were textually different, like A * B and A + B, which became the same thing when replacing the special characters with underscores. I’m not sure why it does this; my guess is that it’s something to do with making field names compatible with programmes like SQL and R, which are more restrictive in the characters they allow in field names.

I wasted quite a bit of time trying to work out what was going on here, but luckily, there’s a simple solution. Instead of grouping by the field in the summarize tool twice, just group by that field once. Then, add a Record ID tool in, so that you get something like this:

9-record-id

Now, you can crosstab successfully. Put the Record ID field as the new column headers, and the thing you’re actually interested in as the values:

8-cross-tab-successfully

The next step is to use a dynamic rename tool to take the column names from the first row of data. Unlike the crosstab tool, the dynamic rename tool doesn’t change special characters when assigning new column names:

9-dynamic-rename

…and there you go. Now you have an app where the populated drop down menu works with special characters!

Standard