Alteryx, football, Maps, Tableau

Centre of Gravity, Metaphorically: Plotting time-based changes on maps

I haven’t written a blog in far too long. My bad. So, to get back into the swing of things, here’s something I’ve been playing with this week: centre of gravity plots.

It started with an accident. I had some EU member data, and I was simply trying to make a filled map based on the year each country joined, just to see if it was worth plotting. You know, something like this:

1 eu filled.png

Except that I’d been having a clumsy day (the kind of day where I spilled coffee on my desk, twice), and accidentally missed the filled map option and clicked line instead:

2 broken line.png

Now, I normally don’t like connected scatterplots, but realised that I could change a couple of things to this accident to make quite a nice connected scatterplot on a map, joining up the central latitude and longitude of each country, so I thought I’d follow through with it and see what happened.

(by the way, the colour palette I use is the Viridis Palette, which I absolutely love. You can find the text to copy/paste into your Tableau preferences file here)

Firstly, I changed my “year joined” field from a discrete dimension into a continuous measure so that I could make it a continuous line with AVG(Year joined):

3 connected line left right.png

This connects all the countries by their central latitude and longitude as generated by Tableau, but it joins them up in order from left to right on the map. So, I then added AVG(Year joined) to the path shelf as well, which means that each country is joined in chronological order, or in alphabetical order when there’s a tie (as with Belgium, France, Germany, Italy, Luxembourg, and the Netherlands, who formed the EU in 1958):

4 connected line year.png

I was pretty happy with this; it shows the EU’s expansion eastwards over time far, far better than the filled map did.

I got talking to Mark and Neil online, who introduced me to the idea of “centre of gravity” plots, which show the average latitude and longitude of something and how it changes with respect to something else (usually time). In this case, a centre of gravity plot of the EU would show the average central point of Belgium, France, Germany, Italy, Luxembourg, and the Netherlands in 1958, then the average central point of Belgium, France, Germany, Italy, Luxembourg, the Netherlands, Denmark, Ireland, and the UK in 1973… and so on. I figured it should be easy enough, I’d just take Country off detail, replace it with Year joined, and average the latitudes and longitudes together.

Sadly, it doesn’t work that way. The Latitude (generated) and Longitude (generated) fields that Tableau automatically generates when it detects a geographic field like country can’t be aggregated, and can’t be used if the geographic field they’re based on isn’t in the view. That meant I couldn’t average the latitudes and longitudes over multiple countries without creating lots of different groups.

But, there’s a simple way around this! You can create a text table of the latlongs, copy/paste them into Excel or whatever, then read that in as another data source. Firstly, drag your geographic field into the view, and put the latitude on text, like so:

5 create table.png

Then copy and paste it all (I just click on there randomly, hit ctrl+A, ctrl+C, switch to Excel, ctrl-V). Now do the same for the longitude. Save the document, and read it in as a separate data source in Tableau. Now you can blend the data on Country, or whatever your geographic field is, and you’ve got actual latlongs that you can use like proper measures.

And so I did. I recreated the line chart with the new fields, but took Country off detail, and made AVG(Latitude) and AVG(Longitude) into moving average table calculations which take the current value and an arbitrarily high number of previous values (I put in 100, just because). This looked pretty good:

6 cog flawed averages.png

…but then I realised that it wasn’t accurate data. Look at the point for 1973, after the UK, Ireland, and Denmark joined. Doesn’t that seem a little far north?

7 cog flawed illustrated

To investigate it fully, I duplicated the sheet as a crosstab, because sometimes, tables are the best way to go. What I found is that I’ve got a bit of Simpson’s Paradox going on; the calculation is taking averages of averages:

8 cog flawed explained.png

Not so great. If we add Country to the view after the Year joined pill, you can see what it should be:

9 what it should be doing.png

But the problem is, how do we put Country on detail but then get the moving average to ignore it? I tried various LODs, but couldn’t get it to work exactly – if you have a solution, I would love to hear it! My default approach is to try to restructure the data in Alteryx – because that generally solves everything – but I feel like I’m becoming too reliant on restructuring the data rather than working with what Tableau can do.

Anyway, I ended up restructuring the data by generating a row for each country and year that the country has been a member of the EU. That means I can create a data table like this:

10 restructured table

…which removes the need for a moving average calculation entirely, because the entire data is moving with the year instead. Just take country off detail / out of the view, and you get the right averages:

11 restructured table 2

Much more accurate:

12 EU cog.png

This is a better way of structuring the data for this particular instance, because the dataset is tiny; 28 countries, 60-ish years, 913 rows in my Excel file. It’s not going to be a good, sustainable solution for a centre of gravity plot over a much bigger dataset though. I did the same thing for the UN – 193 countries, 70-ish years – and ended up with 10,045 rows in my Excel file. It’s easy to see how this could explode with much more data.

It does look interesting, though; I’d never have guessed that the UN’s centre of gravity hadn’t really left the Sahara since its inception:

12 UN cog

Finally, since I was on a roll, I plotted the centre of gravity for the English football champions since the first ever professional season in 1888-89. Conceptually, this was slightly different; unlike the EU and the UN, the champion isn’t a group of teams constantly joining over the years (although it is possible to plot that too). Rather, I wanted to create a rolling average of the centre of gravity over the last N years. If you set it to five years, it’s a bit messy, moving around the country quite a lot:

13 english football 5 years.png

But if you set it to 20 years, the line tells a nice story. You can see how English football started out with the original northern teams being the most powerful, then it moves south after the Second World War, then it moves north-west during the Liverpool/Manchester era of domination, and finally it’s moving south again more recently:

14 english football 20 years.png

Many thanks to Ian, who showed me how to parameterise this. Firstly, put your hard-coded (i.e. not Tableau generated!) latitude or longitude field in the view, and create a moving average over the last ten years. Or two, or thirteen, or ninety-eight, it doesn’t really matter. Next, drag the moving average latitude/longitude pill from the rows/columns into the measures pane in order to store it. This creates a calculated field. Meanwhile, create a parameter to let you select a number. This will change the period to calculate the moving average over. Open up the new calculated fields, and replace the number ten/two/thirteen/ninety-eight with your newly-created parameter, remembering to leave the minus sign in front of it:

15 calc mov avg param

This will let you parameterise your moving average centre of gravity.

It was a lot of fun to play around with these maps this week. I’ve packaged them all up in a Tableau Public workbook here; I hope you find it as interesting as I did!

(title inspiration: Touché Amoré – Gravity, Metaphorically)

Standard
Alteryx

Between order and (statistical) model: how the crosstab tool in Alteryx orders things alphabetically but inconsistently

I was using my Mahalanobis Distance calculation recently on some of my Spotify listening data, and I ran into difficulty. When I calculated the MD value of one song compared to the benchmark group, it gave me a value of 3.12. Nice. But, when I calculated the MD values of several songs at once compared to the same benchmark group, I got a value of 0.67 for that same song that was 3.12 when calculated individually. The same thing happened for lots of other songs; I got one value when calculating it individually, and another when calculating a whole bunch of them together.

This was weird, and after several hours of diagnosing what was going on, I finally found it. There’s an inconsistency with the Crosstab tool that I’d never noticed before, and this had a critical knock-on effect.

I’ll walk through it step by step with some random data. Here’s the content in a text input tool; note the variety of capitals, lower case, and numbers:

2

For the MD calculation, what I need is two tables; one where there’s a column for each Thing Name, like this:

3

And one where there’s a row for each Thing Name, like this:

4

It should be simple to generate this, but it isn’t because the Crosstab tool orders the Thing Name alphabetically.

First, let’s see what happens when generating the table with Thing Name as columns. Set the Crosstab tool up like this (for the aggregation, you can choose First, Average, Sum, it doesn’t make a difference with this dataset):

5

Run the workflow, and this is the output. Note how the output has reordered the Thing Name alphabetically:

6

It’s put the Thing Names beginning with numbers first, put those in ascending order, then taken all the Thing Names beginning with letters, and put those in alphabetical order a through z.

Right. Let’s now look at what happens when generating a table with one row per Thing Name. Set up the Crosstab tool like this (again, aggregation method doesn’t matter):

7

And here’s the output:

8

 

This time, it’s put the Thing Name in the rows in alphabetical order slightly differently. First come the Thing Names beginning with numbers in ascending numerical order as before… but then it’s treating Thing Names beginning with CAPITAL LETTERS and Thing Names beginning lower case letters separately. It runs through the capital-first Thing Names A through Z, and then and only then does it run through the lower case-first Thing Names a-z.

Considering that the MD calculation involves matrix multiplication where it’s assumed that the order of items in the rows and columns is identical, this creates a massive problem down the line!

There are two solutions. One is to CAPITALISE EVERYTHING before even starting, which will probably work in most cases… but if your Thing Names are identical except for case (e.g. if XXX, xXx, and xxX are different variables), it will collapse them together a bit like it does for punctuation. This is not ideal.

The other solution is to use record IDs and manual reordering to ensure that the rows and columns stay in the same order, like this (which is how I generated the first two tables in this blog):

9

This was an incredibly simple thing that was messing up my calculations, but it took me hours to find. If you’re running into issues – and even if you don’t think you are – check the order of things in your tables!

Standard
Tableau

Time is money, money is power, power is pizza, and pizza is knowledge.

I’m not proud to admit it. I was recently in a pub and pulled out my phone to calculate the area of a set of pizzas we were ordering. There were two sizes of pizza on the menu – 12″ and 18″, for £12 and £18 each – and my group were going to order five 12″ pizzas.

I couldn’t let that slide.

“Nah, let’s get three 18″ pizzas instead”, I said. “More pizza, less money.”

Intuitively, it feels wrong. Yes, an 18″ pizza is bigger, but surely five 12″ pizzas are more than only three 18″ pizzas? This is when I worked out the total area of the pizzas and their price per square inch, and for the record, buying three 18″ pizzas at £18 gets you 763 sq inch of pizza for £54, whereas buying five 12″ pizzas at £12 gets you 566 sq inch of pizza for £60. We bought the big ones.

This got me thinking about how to visualise this in Tableau, and I created the following handy dashboard for settling your pub pizza problems (click here for desktop version, or here for mobile version):

mobile pizza dash.png

This was a fun one to build, not least because there’s basically no underlying data. I knew I wanted the inputs for the calculations to depend entirely on parameters, so all I needed was the names of the two pizzas. This is the spreadsheet I created:

1

Simple.

The next thing was to create three parameters per pizza; diameter, price, and quantity. The values entered here can be passed to the data with a case statement, e.g.:

2

…or directly within a formula:

3

Things were starting to get complicated, so I built out a table to make sure the numbers were working properly:

4.png

From here on out, it was relatively straightforward. It’s really hard to accurately judge area when looking at circles and squares (which is one of the reasons I hate packed bubbles and tree maps), so I started out by visualising the pizzas as actual circles vs. bar graphs.

This makes the two pizzas look relatively similar in size:

5.png

…whereas this shows that one of them is more than double the other:

6

One useful little trick I found was to create a calculated field to place the two pizza icons in specific places:

9.png

This spaced them out nicely rather than piling them on top of each other or putting them side by side and too close together:

10.png

The most fiddly bit was creating the text summary. I dragged a lot of calculated string fields onto the label shelf, and configured it like this:

8

And an example of some of the fields going into making a sentence which automatically updates is the best pizza deal field here:

7

I really enjoyed playing around with this viz. People who use Tableau often talk about liberating your data, but this was a nice exercise in what’s possible with basically no data at all!

April pizza

Standard
Tableau

Paint your target: how to create a bullseye graph in Tableau

Ever thought of tracking whether something’s hitting the target by showing an actual target? I was looking through some old radial blogs recently, and realised you could use a scatterplot on x-y coördinates to show accuracy on a bullseye target.

(to cut straight to the viz on Tableau Public, go here; to find out how to create it, keep reading!)

For example, you could set up an image of a target a bit like this as a background image:

1

…and plot simple x-y points in Tableau over the top to show how close people/departments/countries are to meeting their target:

10

I used MS Publisher to design these concentric circles in the background image because when you select them all and save as an image, it doesn’t save the blank space as white, so you can use it with any colour worksheet or dashboard. Feel free to download this target image here (and if you use it, give me a shout! I’d love to see what kind of cool things you’re using it for).

Sadly, Superstore doesn’t have target data on it, so I’ve mocked up a quick dataset of sales and target sales per country as follows:

2

The next thing to do is calculate the “accuracy” on the bullseye; the nearer the sales figure is to the target, the closer to the middle of the bullseye it should be. However, once you’ve met or exceeded your target, you don’t want the points to keep moving. Countries with sales at 101% of the target and at 300% of the target should both still be in the middle.

The exact middle of the target is going to be at coördinates (0,0), which means that we actually want to create a field which takes some kind of inverse of the accuracy, where the greater the accuracy, the smaller the number in that field.

3

So, let’s create the following calculated fields. I’ve adapted them a bit from this blog on radial bar charts so that you can put the X field on columns and the Y field on rows, which is more intuitive.

Accuracy:

IF [Sales] > [Target] THEN 1.05
ELSE [Sales] / [Target]
END

Radial Field:

1.1 - [Accuracy]

Why have I set the Accuracy equaliser bit to 1.05 instead of 1? And why have I set the Radial Field calculation to 1.1 instead of 1?

Well, it’s a bit fiddly, but it’s about plotting. I want everything where Sales is 100% of Target or more to be in the gold bullseye, so I want that to fill a certain amount of space. If I set the Accuracy equaliser bit to 1 and the Radial Field calculation part to 1, then it plots everything that’s at 100% or more at (0,0) and includes everything that’s 90% or more in the gold bullseye. I want to space out the 100% or more points so that they’re not on top of each other, and I want only the 100% or more points to be included in the gold bullseye. Setting the Radial Field calculation part to 1.1 makes it so that the edge of the gold bullseye denotes 100%. That means that the 100% or more points will be plotted on the edge of the bullseye. So, the 1.05 part in the Accuracy calculation moves those points further inside the bullseye, but not into the exact middle where they’ll be on top of each other.

While I’m at it, there’s a limitation to the Accuracy calculation. Have you spotted how it’s [Sales] > [Target] rather than SUM([Sales]) > SUM([Target])? This is because the angle calculation needs to aggregate the Radial Field further, and will break if the Accuracy or Radial Field calculations are already aggregated. This means that you’ll probably need to do some data processing to make sure that it’s just one row per thing in the dataset.

The next fields to calculate are:

Radial Angle:

(INDEX() -1 ) *
(1/WINDOW_COUNT(COUNT([Radial Field])))
* 2 * PI()

X:

MAX([Radial Field]) * SIN([Radial Angle])

Y:

MAX([Radial Field]) * COS([Radial Angle])

Now you can drag X onto columns, Y onto rows, and put Country on detail:

4

This looks pretty rubbish, but that’s because the INDEX() function in the Radial Angle calculation is a table calc, and Tableau needs to know how to compute it. Edit the X and Y fields to compute using Country instead:

5

…and now you’ll get some points spaced out properly:

6

The calculations mean you can plot the points in a radial way; it’ll go through whatever field you’ve put on detail, and plot the points the right distance away from the centre, starting at 12 o’clock and looping round clockwise. With Country, it’s done alphabetically, so Belgium is on the X axis zero line. If you want to order the points differently, you can add a numeric field to the detail shelf, and change the table calculation to compute using that numeric field (but make sure to keep Country there!).

The next thing to do is to put the target in as a background image. Confusingly, this is under the Map options. Once you’ve found it, you have to specify exactly where the background image is going to sit on the x-y grid of coördinates. With this target, there are four concentric circles, denoting 70%, 80%, 90%, and 100%+. Because I set the Radial Field to be 1.1 – [Accuracy], there’s a distance of 0.1 in a circle around point (0,0) and the edge of the bullseye denoting 100%. That means that the bullseye section is going to run from -0.1 to 0.1 on both axes. Counting backwards, there are three other circles which are the same thickness; the light grey one denoting 90% will run from -0.2 to 0.2 on both axes, the mid grey one denoting 80% will run from -0.3 to 0.3 on both axes, and the dark grey one denoting 70% will run from -0.4 to 0.4 on both axes. As the edge of the dark grey circle is the edge of the background image, this is where you need to tell Tableau to position the background image:

7

Click OK, and there we are! A nice target, with the different points on it:

8

You can change the formatting to remove the zero lines if you like:

9

…but I actually kinda like them on this graph. I’ve also put Country on the colour shelf, and made the borders around the circles black.

10

I haven’t seen this approach before, so I’m not really sure what to call it. I’ve plumped for a bullseye graph, but maybe it already exists under another name. Let me know if somebody else has covered this, and definitely let me know if you find this useful! You can download the Tableau workbook I used to make this example here (it’s 10.2, by the way, but the same approach should work for older versions).

Standard
Tableau

If data is a supermarket, Tableau is a self-checkout machine

I’ve been helping to introduce Tableau to people who are used to getting their data by putting their query through IT and then plotting it in Excel when they get the extract hours or days later. It can be hard to convince people to use it; some say the current system works okay already, others say they don’t want to learn something new which might be difficult, others say they’ve tried it but it doesn’t work perfectly. While working on this, we came across a great analogy for Tableau, what it’s for, and why we’re getting people to help themselves by using it. It’s long and a bit convoluted, but it fits most businesses pretty well:

Remember a few years ago when self-checkout machines turned up at the supermarkets? You’d stand in line for the regular checkout, looking at the new machines, but you didn’t use them because you didn’t know how, and nobody else was sure either. And even when people did start to use them, there were teething problems; every so often there’d be an item it didn’t recognise, or an item too light for the bagging area to detect, so the process would break and it felt like the whole thing was worse than the old checkouts.

But the more you used them, the more errors happened, and the better the self-checkouts got by learning from those errors. Now, they work really smoothly, much faster than going through the old checkouts. And remember, just because things seemed to work in the old checkouts, that doesn’t mean that they did. I often bought a bag of apples but got charged too much because the cashier thought it was a more expensive variety, like I’d buy Braeburns but get charged for Pink Ladies. That’s an error in the process, but you wouldn’t know if you didn’t check your receipt with an expert eye anyway. And how often does something go wrong and the cashier calls over somebody else? It was never a smooth process to begin with, you’re just used to its flaws.

That’s the difference between opening a ticket with IT and going through their queue, and getting the data you need yourself. We’re the people who hover round the self-checkouts, ready to help if there’s an issue. It’s a bit daunting at first, but just try it out; scan your stuff, see how it works, and we’ll be there as soon as you’ve got an unexpected item.

This analogy has been really useful for helping contextualise Tableau as a way of working that speeds up their day, rather than an additional tool to learn that slows down their day.

If you’ve got any similar analogies, I’d love to hear them!

Standard
Alteryx

Now you’re making me cross(tab)… getting around character glitches in Alteryx’s crosstab tool.

***

UPDATE: this blog was written for macro inputs, but I’ve now written about a more general, and better, way of sorting them out. If you’re trying to sort out your underscores in your field names, give this blog a go!

https://gwilymlockwood.com/2019/10/11/alteryx-crosstab-underscores-how-to-dynamically-rename-them-with-metadata/

***

I was building an Alteryx app for a client this week, and spent an hour or two tripping up over a really straightforward issue. My workflow worked just fine for a small subset of the data that I was testing it on… but when I fed in the rest of the data, I got this error message:

1-error-message

This isn’t helpful. My data is perfectly clean, thank you very much. I’m not having that. The workflow was working fine for a subset of the data, so there’s no reason it should have tripped up just because more data was added. Or so I thought… but it turns out that Alteryx’s Crosstab tool has a problem with special characters.

Let’s start from the beginning. I’m building an app with a drop down menu which lets you filter the data to a single value. That looks a little bit like this:

3-simple-app-set-up

You can manually type in the possibilities in the drop down tool, but if there’s a lot of them (which there generally are), it’s a bit of an arse ache, and it’s not dynamic either in case the data changes in future… so the best option is to populate the drop down menu with the field names of a connected tool:

2-app-drop-down-configuration

Irritatingly, there isn’t an option in the drop down tool configuration to take distinct values from the rows of a particular field of a connected tool. This means that you have to take the field where the interesting stuff is and crosstab it, so that all the values become a column heading.

This is pretty straightforward. First, I used a summarize tool and grouped the data by the field which has the values which you want to be in the drop down tool. Then, because you can’t crosstab a single field, I simply grouped by the same field again. That gave me this output:

5-first-group-by-twice

…and I just crosstabbed it so that I’d get A * B, A + B, and A – B as the field names, and also A * B, A + B, and A – B as the first row of data.

But no:

7-error-message

The warning message is more informative than the error message here. What’s going on with the multiple fields named “A___B”?

It turns out that the crosstab tool automatically changes special characters, like *, +, and -, to underscores in field names. In my subset of data at work, I wasn’t working with any values with special characters in them; but when I brought the rest of the data in, there were values that were textually different, like A * B and A + B, which became the same thing when replacing the special characters with underscores. I’m not sure why it does this; my guess is that it’s something to do with making field names compatible with programmes like SQL and R, which are more restrictive in the characters they allow in field names.

I wasted quite a bit of time trying to work out what was going on here, but luckily, there’s a simple solution. Instead of grouping by the field in the summarize tool twice, just group by that field once. Then, add a Record ID tool in, so that you get something like this:

9-record-id

Now, you can crosstab successfully. Put the Record ID field as the new column headers, and the thing you’re actually interested in as the values:

8-cross-tab-successfully

The next step is to use a dynamic rename tool to take the column names from the first row of data. Unlike the crosstab tool, the dynamic rename tool doesn’t change special characters when assigning new column names:

9-dynamic-rename

…and there you go. Now you have an app where the populated drop down menu works with special characters!

Standard