For the last year and a half, I’ve been working as an applied mathematician at Sport Performance Analytics Inc. (https://www.sportperformanceanalytics.ca/), where we research all aspects of sport, from team movement to player fatigue to ball passing. We view sport as a complex, dynamical system in which the team is more than just the sum of its players, and the technical, tactical & physical aspects of the game integrate seamlessly into team and player performance. Here I present a series of blog posts aimed at sparking discussion with like-minded mathematicians, scientists, analysts, and anyone who shares a passion for analyzing the game beyond the pitch.

Over the last month, StatsBomb has progressively released detailed data covering the senior career of Lionel Andrés Messi Cuccittini, “La Pulga Atómica”, from his debut for the FC Barcelona first team in 2004 up to the 2015/16 season. Every dribble, pass and shot from every match the Argentinian played during that time has been recorded and made publicly available at: https://github.com/statsbomb/open-data

Even after more than 10 years at FC Barcelona in the Spanish La Liga, Lionel Messi seems to keep outperforming himself season after season. While Messi’s unbelievable goal and assist tallies (600 goals from 2004-2018!) are immediately cited as measures of his performance, they don’t quite paint the full picture of how the #10 has developed his role and established his presence over the years.

So how can we measure the importance and influence of Messi? One way to approach this question is through an examination of the “passing network” of a team. Mathematically, a passing network is an example of a weighted, directed graph (“digraph”), where vertices in the graph are the players themselves and edges in the graph represent the passes made between players, oriented from passer to receiver. Consider the image below, showing a passing network from Barcelona’s 2015/16 campaign:

Players are connected by lines whose thickness represents the number of passes made, with arrows indicating who passed to whom. From the diagram we can see, for example, a prominent passing connection from Dani Alves to Neymar. Capturing passing patterns with graphs gives rise to further metrics for measuring the importance of players. Now here’s where we need to start being more precise with our language. A player like Sergio Busquets, for example, might be considered crucial in setting up goals, but less so in finishing them. So when we speak of importance, we usually mean importance in relation to a particular task or role.
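To make the graph concrete, here is a minimal sketch of how the pass counts behind such a network could be tallied from event data. It assumes StatsBomb-style event dictionaries (event type under `type.name`, passer under `player.name`, receiver under `pass.recipient.name`, and an `outcome` entry present only on unsuccessful passes); those field names are an assumption to check against the open-data documentation, and `build_pass_network` is a hypothetical helper, not part of any StatsBomb library.

```python
from collections import Counter


def build_pass_network(events):
    """Tally completed passes as a weighted, directed graph.

    Returns a Counter mapping (passer, receiver) -> number of passes.
    Assumed StatsBomb-style fields (verify against the data docs):
      event["type"]["name"] == "Pass"      marks a pass,
      event["player"]["name"]              is the passer,
      event["pass"]["recipient"]["name"]   is the receiver,
      event["pass"]["outcome"]             exists only for unsuccessful passes.
    """
    edges = Counter()
    for event in events:
        if event.get("type", {}).get("name") != "Pass":
            continue  # ignore shots, dribbles, etc.
        details = event.get("pass", {})
        if "outcome" in details or "recipient" not in details:
            continue  # keep only completed passes with a known receiver
        passer = event["player"]["name"]
        receiver = details["recipient"]["name"]
        edges[(passer, receiver)] += 1  # edge oriented passer -> receiver
    return edges
```

Each key is a directed edge from passer to receiver and each value is its weight, so the line thickness in the diagram corresponds to the stored count.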

As such, there turn out to be a number of ways to measure a player’s significance using pass networks. These are collectively called “centrality” measures, referring to how “central” (relevant, important) a player is in the team’s pass network. Here we consider three types of centrality: in-closeness, out-closeness, and betweenness.

**In-Closeness & Out-Closeness Centrality**

I’ve grouped these first two centrality measures together because they’re conceptually quite similar, even though they measure almost opposite effects. Both are based on the idea of “closeness” to other players in the network, where closeness is determined by the number of passes made. Recalling the network diagram from before, Neymar and Jordi Alba are quite “close”; more specifically, Neymar demonstrates “in-closeness” while Alba demonstrates “out-closeness”, since the ball is delivered “in” to Neymar and passed “out” by Alba. If we measure the distance between two players as the inverse of the number of passes made between them, then a player’s in-/out-closeness centrality is the inverse of the average distance along the shortest paths into/out from that player. The pass count is inverted because more passes should make players “closer”, i.e. a smaller distance, and the average distance is inverted again so that a higher closeness means a more central player. Mathematically, we can define the closeness centrality as follows:
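The original equation isn’t reproduced in this text. One standard form that is consistent with the symbols defined in the next paragraph is the Wasserman and Faust variant of closeness, which multiplies the reciprocal of a player’s average distance by the fraction of teammates he can actually reach (the sum runs over the *A_i* players reachable from *i*). Treat it as a plausible reconstruction rather than the exact formula behind the plots:

$$
CC(i) = \frac{A_i}{N-1} \cdot \frac{A_i}{\sum_{n=1}^{A_i} d_n}
$$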

Here *CC(i)* is the in-/out-closeness centrality of player *i*, *A_i* is the number of players reachable by player *i*, *N* is the total number of players, and *d_n* is the distance (measured by passes) between player *i* and player *n*; to distinguish in- from out-closeness, the distance is calculated using only incoming or outgoing passes, respectively.
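The definition above can be sketched in a few lines of Python. This is an illustration under assumptions, not the code behind the plots: edge lengths are 1 / (number of passes), distances are shortest-path lengths found with Dijkstra’s algorithm, the scaling follows the Wasserman and Faust form (reachable fraction times reciprocal average distance), and in-closeness is obtained by reversing every pass so that paths end at the player. The pass network is a plain `{(passer, receiver): count}` dictionary, and the function names are hypothetical.

```python
import heapq


def shortest_distances(edges, source):
    """Dijkstra from `source`; an edge with k passes has length 1 / k."""
    graph = {}
    for (u, v), passes in edges.items():
        graph.setdefault(u, []).append((v, 1.0 / passes))
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def closeness(edges, player, direction="out"):
    """In- or out-closeness of `player` in a {(passer, receiver): count} network."""
    players = {p for edge in edges for p in edge}  # N = all players in the network
    if direction == "in":
        # Reverse every pass so distances measure paths that END at `player`.
        edges = {(v, u): count for (u, v), count in edges.items()}
    dist = shortest_distances(edges, player)
    reachable = [d for p, d in dist.items() if p != player]
    a_i = len(reachable)  # A_i: number of players reachable
    if a_i == 0:
        return 0.0
    n = len(players)
    # (fraction of teammates reached) * (reciprocal of average distance)
    return (a_i / (n - 1)) * (a_i / sum(reachable))
```

For example, with Xavi passing 10 times to Messi, Messi 5 times to Neymar, and Xavi once directly to Neymar, Xavi’s shortest route to Neymar goes through Messi (0.1 + 0.2 = 0.3 rather than 1.0), giving him an out-closeness of (2/2) × (2/0.4) = 5.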

So what does Messi’s in-/out-closeness tell us? Let’s see:

It’s clear that Messi’s in- and out-closeness centralities have gone up over the years, meaning his involvement as both a passing source and a passing target has increased. In fact, by the 2014-2016 period his role as a passing target had become dramatically greater than that of his teammates. From the out-closeness plot we can see he also began taking on a prominent responsibility as a source of passes as early as 2010, on the level of a midfielder like Andrés Iniesta; however, Xavi was clearly the dominant source for the Catalan side until the tail end of his Barcelona career in 2014/15. But a player’s importance in team passing isn’t only about who starts or ends the move; it’s also about who can connect players to string together passing plays. This leads us to our next centrality measure: betweenness centrality.

**Betweenness Centrality**

A metric like betweenness centrality seems like the perfect descriptor for midfielders, especially for a team that loves controlled build-up play with measured passing as much as Barcelona does. To get a sense of why, let’s first look at the pass network example below:

In the network diagram above, Xavi (red) acts as a “bridge” in the network, allowing a chain of passes to form between the players highlighted in green and the players highlighted in blue. Xavi’s importance in this role is captured by betweenness centrality, which measures how often a player lies on the shortest paths between pairs of other players in the pass network. The formal mathematical definition is as follows:
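Writing the player pair explicitly (the subscripts *s* and *t* are added notation, since the original equation isn’t reproduced here), the standard definition of betweenness centrality reads as follows, where *n_st(i)* and *N_st* play the roles of *n(i)* and *N* in the next paragraph:

$$
BC(i) = \sum_{s \neq i,\; t \neq i,\; s \neq t} \frac{n_{st}(i)}{N_{st}}
$$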

Here *BC(i)* is the betweenness centrality of player *i*, *n(i)* is the number of shortest paths between a given pair of other players that pass through player *i*, and *N* is the total number of shortest paths between that pair (note that *N* counts shortest paths here, not players); the ratio is summed over all such pairs. The metric can be normalized, by dividing by the number of ordered pairs of other players, so that its value lies between 0 and 1, inclusive.
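A brute-force sketch of this definition, small enough for a matchday squad, is below. It assumes the same `{(passer, receiver): count}` pass dictionary and 1 / passes edge lengths as in the closeness discussion, counts shortest paths with Dijkstra’s algorithm, and credits player *i* with the fraction of shortest *s*→*t* paths passing through him. Exact fractions are used so that equal-length routes are recognized as ties rather than lost to floating-point rounding. For larger graphs, Brandes’ algorithm (used by libraries such as NetworkX) is the efficient alternative; the function names here are hypothetical.

```python
import heapq
from fractions import Fraction


def _dijkstra_with_counts(graph, source):
    """Shortest distances from `source` and the number of shortest paths to each player."""
    dist = {source: Fraction(0)}
    sigma = {source: 1}  # number of shortest paths from source
    done = set()
    heap = [(Fraction(0), source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue  # stale heap entry
        done.add(u)
        for v, w in graph.get(u, []):
            nd = d + w
            if v not in dist or nd < dist[v]:
                dist[v], sigma[v] = nd, sigma[u]  # strictly shorter route found
                heapq.heappush(heap, (nd, v))
            elif nd == dist[v]:
                sigma[v] += sigma[u]  # another route of equal length
    return dist, sigma


def betweenness(edges, normalized=True):
    """Betweenness of every player in a {(passer, receiver): count} pass network."""
    graph, players = {}, set()
    for (u, v), passes in edges.items():
        graph.setdefault(u, []).append((v, Fraction(1, passes)))  # length = 1 / passes
        players.update((u, v))
    paths = {p: _dijkstra_with_counts(graph, p) for p in players}
    bc = {p: Fraction(0) for p in players}
    for s in players:
        dist_s, sigma_s = paths[s]
        for t in players:
            if t == s or t not in dist_s:
                continue
            for i in players:
                if i in (s, t) or i not in dist_s:
                    continue
                dist_i, sigma_i = paths[i]
                # i lies on a shortest s -> t path iff d(s,i) + d(i,t) == d(s,t)
                if t in dist_i and dist_s[i] + dist_i[t] == dist_s[t]:
                    bc[i] += Fraction(sigma_s[i] * sigma_i[t], sigma_s[t])
    if normalized and len(players) > 2:
        pairs = (len(players) - 1) * (len(players) - 2)  # ordered pairs excluding i
        bc = {p: value / pairs for p, value in bc.items()}
    return {p: float(value) for p, value in bc.items()}
```

In a simple chain where Alba passes to Xavi and Xavi passes to Messi, Xavi lies on the only shortest Alba→Messi path and gets a normalized betweenness of 1/2 (one of the two ordered pairs of other players), mirroring the bridge role in the diagram.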

Let’s repeat the season-by-season comparison between Messi, Iniesta and Xavi, this time looking at the betweenness centrality:

Without a doubt, Xavi was outstanding as Barcelona’s link-up player in build-up play, as the plot above confirms. For Messi, we notice a steady rise over the years, likely as his prominence in the side grew, before a meteoric rise in his betweenness centrality in 2014/15, Xavi’s last season at the club. The following season, both Iniesta and Messi stepped up to fill the hole Xavier Hernández had left behind.

So those are some centrality measures and how they can be used to measure a player’s importance in a particular passing role… but what about player influence beyond passing? How much of an impact does a player actually make? Can we measure that too, and if so, how?

**Expected Force**

To come up with a metric for measuring player influence, we can take inspiration from a field that seems completely unrelated: epidemiology, the science of how disease spreads. Imagine a city made up of several districts connected by a variety of roads and highways. If one district becomes infected with a disease, how much of an impact does that have on the rest of the city? That basically boils down to how well connected the district is to other districts via its immediate neighbours. From this idea, a metric called the *“expected force”* is derived: the expected value of the force of disease transmission (measured by node degree) over all local clusters surrounding an infected node in the network.

Let’s try to translate this concept into football and see if we can find a way to measure Messi’s impact at Barcelona using the following example:

First, since we now want to consider more than just passing behaviour, we add a node in the network labelled as the “Goal”, and connections are made to the goal by taking shots or scoring goals. Hence, similar to the “closeness” between players outlined before, a player and the goal become “closer” as that player takes more shots or scores more goals. More importantly, we can now measure each player’s “distance” to the goal as the length of the shortest path in the network from the player to the goal.
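A minimal sketch of the “Goal” node idea, assuming the same `{(passer, receiver): count}` pass dictionary and 1 / count edge lengths used for closeness: each player gets an edge to a node labelled `"Goal"` weighted by his number of shots (goals count as shots here; whether goals should carry extra weight isn’t specified, so this sketch leaves that out), and distance to goal is the shortest-path length to that node. The helper names and the shot dictionary are hypothetical.

```python
import heapq

GOAL = "Goal"  # hypothetical label for the extra node


def add_goal_node(pass_counts, shot_counts):
    """Copy the pass network and add player -> Goal edges weighted by shot count."""
    edges = dict(pass_counts)
    for player, shots in shot_counts.items():
        if shots > 0:
            edges[(player, GOAL)] = shots  # more shots => "closer" to goal
    return edges


def distance_to_goal(edges, player):
    """Length of the shortest player -> Goal path, with edge length = 1 / count."""
    graph = {}
    for (u, v), count in edges.items():
        graph.setdefault(u, []).append((v, 1.0 / count))
    dist = {player: 0.0}
    heap = [(0.0, player)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == GOAL:
            return d  # the first time Goal is popped, d is the shortest distance
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return float("inf")  # the player never reaches the goal
```

A player who rarely shoots can still be “close” to goal if he passes often to someone who shoots a lot, which is exactly the indirect effect the network view is meant to capture.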

Next, if we want to measure the impact Messi has, let’s first look at who his immediate neighbours are, i.e. who he passes to the most. In the example above, that would be Iniesta, Rakitic and Suarez. Then we can create a pairing of Messi and each of his neighbours to examine the combined effect they would have, both in terms of their degree of connection in the pass network (*yellow lines*) and their “distance” to the goal (*blue lines*).

The *expected force* then becomes the average effect of all the pairings, weighted by the probability that each pairing occurs. More formally:

Here *xF(i)* is the expected force for player *i*, *C* is the total number of local player clusters around player *i*, and for every cluster *c*: *p_c* is the probability that the cluster forms and *f_c* is the normalized force of the cluster (the product of the cluster degree and the inverse distance to goal). The logarithm factor at the end accounts for the different number of clusters formed by each player, so that a player with fewer neighbours has a lower impact than one with more neighbours in the network.
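The formula itself isn’t reproduced in this text, so the sketch below only follows the verbal description and fills the gaps with labelled assumptions. A cluster is the player plus one pass receiver; *p_c* is the share of the player’s passes that go to that receiver; the cluster degree is the number of passes the pair makes to players outside the pair, divided by all team passes (our own normalization); the cluster’s distance to goal is the smaller of the two members’ distances; and the logarithm factor is taken as log(1 + *C*), a placeholder for the unspecified form. It is neither the epidemiological expected-force formula verbatim nor necessarily the one behind the plots, and `expected_force` is a hypothetical name. The `goal_distance` argument is a precomputed mapping from each player to his shortest-path distance to the Goal node, as described above.

```python
import math

GOAL = "Goal"  # hypothetical label for the goal node


def expected_force(edges, player, goal_distance):
    """Hypothetical xF(i) sketch; every modelling choice here is an assumption.

    edges:         {(passer, receiver): count}, may include (p, "Goal") shot edges.
    goal_distance: {player: shortest-path distance to the Goal node}.
    """
    passes_out = {v: c for (u, v), c in edges.items() if u == player and v != GOAL}
    total_out = sum(passes_out.values())
    if total_out == 0:
        return 0.0  # no neighbours, no clusters
    team_passes = sum(c for (u, v), c in edges.items() if v != GOAL)
    clusters = []
    for neighbour, count in passes_out.items():
        members = {player, neighbour}
        # Assumed cluster degree: passes from the pair to players outside the pair.
        degree = sum(c for (u, v), c in edges.items()
                     if u in members and v not in members and v != GOAL)
        # Assumed cluster distance: the closer of the two members to goal.
        dist = min(goal_distance.get(m, math.inf) for m in members)
        force = (degree / team_passes) / dist  # degree x inverse distance (0 if unreachable)
        p_c = count / total_out  # assumed pairing probability: share of passes to neighbour
        clusters.append((p_c, force))
    weighted = sum(p * f for p, f in clusters)
    return weighted * math.log(1 + len(clusters))  # placeholder logarithm factor
```

Because the log factor grows with *C*, a player who distributes the ball to more teammates scores higher, all else equal, which matches the behaviour described in the text.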

Now that we have a way to measure player impact in a team, let’s see how Messi performs with this metric over his career at Barcelona:

Understandably, Messi shows a short ramp-up period at the start of his career, but he quickly climbs the ranks at Barcelona, and from 2008 onwards he *consistently had the highest impact in the side* compared with both midfielders and forwards. We also notice the influence Eto’o had between 2004 and 2008, and a trend between Eto’o and Messi similar to the one between Xavi and Messi in betweenness centrality.

Ultimately, while goal and assist tallies already give a sense of the tremendous player that Lionel Messi is, once we analyze his importance in Barcelona’s passing via network centralities and capture his overall impact with a metric like expected force, the magnitude of Messi’s influence becomes even more vivid and impressive than before.

I enjoyed reading this. I like the fact that you’re using this to analyze previous performance and quantify it. Are you also planning on working towards predicting this for the future? Say, using the collected data from other players to select best potential teammates.

Glad you enjoyed it! And prediction is definitely in the works! Have a brief outline of that project under the "Work" tab called "Hypernetworks for Team Selection"… hopefully I’ll be able to write a post about that soon!
