I’ve been doing a lot of market basket analysis at The Information Lab lately. Market basket analysis is a way of looking for things that people buy at the same time (or that people never buy at the same time) in order to spot trends in people’s behaviour. For example, it’s probably obvious that if somebody buys cereal, they’ll probably also buy milk. Or that if somebody buys tofu, they’re not going to be buying sausages. This is a really nice example of how it all works.
Thing is, after a while, using bread and butter or cereal and milk or sausages and tofu as an example gets kinda dry. And talking about Lego shovels and milligram-level accurate scales is sometimes a little unprofessional, even if it is a perfect example of consumer behaviour.
So, I’ve been analysing the Eurovision Song Contest. The jury votes lend themselves pretty well to market basket analysis, because they’re pretty similar to transactions: each country’s jury (or customer) votes for (or buys) ten countries (or items) at a time (in a basket), and the fact that these countries (or items) are a subset of all possible countries (or items) to vote for (or buy) means that you can make the same selection vs. non-selection distinction. And we all know that some countries always vote for some other countries, regardless of how good the song is, which is part of what makes it fun.
I took the historic Eurovision data collected by Stephan Okhuijsen of Datagraver. Then, using Alteryx, I filtered it to all contests from 1993 onwards, because European countries have been relatively consistent since then. I also filtered it to the final only, and to the jury votes only.
I set the minimum support for a rule to 0.01, which is kind of high for a regular market basket analysis using tens of thousands of SKUs in a supermarket, but works fine for such a closed set of possible choices of countries. I also set the minimum confidence to 0.05. That gave me almost 33,000 association rules, of which about 1,600 were one-to-one country mappings.
The full results are in an interactive dashboard here.
In the matrix at the bottom, you can see who consistently votes for who, and it’s pretty predictable. Cyprus and Greece, for example, almost always give each other the most possible points. There’s a big love in between Moldova and Romania, and between Turkey and Azerbaijan. The Nordics are a bit too cool to give each other full marks every time, but it’s still a bit of a Scandi circle jerk. Andorra love Spain, although it doesn’t seem like that’s reciprocated. Azerbaijan have never voted for Armenia, funnily enough. And Austria have given Australia full marks twice, which I like to believe is because they were hoping to exploit a poor fuzzy matching process in the background scoring:
But market basket analysis shows how countries behave as a group, where we can see how some associations are Europe-wide, and some are just confined to the two countries. For example, some of the Scandi trends are reflected in votes across Europe; if a country, any country, votes for Denmark, they’re also likely to vote for Norway:
And surprise, surprise, countries that vote for Ukraine will also vote for Russia:
But the Greece/Cyprus love in is special just for them; in fact, if anything, there’s a slightly negative association between them, meaning that if a country votes for Cyprus, they’re slightly less likely to vote for Greece as well:
Likewise with Turkey and Azerbaijan. Just because they give each other full points all the time, other European countries don’t link the two together in their voting behaviour at all:
Meanwhile, even though Azerbaijan will never give points to Armenia, and Armenia have only ever given one point to Azerbaijan, other European countries are far more optimistic. Maybe they hope that voting for both Armenia and Azerbaijan at Eurovision can resolve the Nagorno-Karabakh dispute. Or maybe they just don’t know anything about the Caucasus region and think they’re the same place, I don’t know.
This is quite nice to illustrate, because the market basket analysis allows you to make the distinction; while there are some obvious associations between countries, like how Greece and Cyprus always vote for each other, it shows that those associations aren’t necessarily transferred to other countries’ voting behaviour.
Click through to the interactive version here to explore in more detail. I’m going to be using this in my teaching examples more often.
Pingback: Market Basket Analysis in, but not limited to, Alteryx | Vizzee Rascal