Biased Data
Data inflation in Football
Football has seen a tremendous increase in the use of data in recent years. With the advancement of technology, the way football is played, managed, and consumed changes at an ever-accelerating pace. And when things change fast, markets necessarily adapt, often badly. In principle, Big Data offers the potential to break the game down into ever smaller pieces and hence gather valuable information for transfer decision-making, training, and in-game coaching. But it is crucial to understand that data is not a panacea. Misused data can easily mislead and cause bad decisions.
One major reason data is mishandled is that it feels objective and therefore appears to be ideal. Such a utopian Big Data vision, however, is a superficial simplification. Data selection and usage can be highly subjective. Fundamentally, we have to differentiate between data and information. Data itself can be pure noise, and more data means more noise, which in turn hides the signal.
In this article, we will highlight such challenges in data analysis and football.
Why is subjectivity still a part of Data?
Why isn’t all Data “equal”?
And finally, why we “believe” that Goalimpact offers a lot of value to football managers.
You cannot get rid of subjectivity
Humans are not raw data processors. To see the “world,” we must have an interpretation, a story, mainly based on our underlying motivations. [Podcast] Therefore, fundamentally, we do not see objects; we see tools. We see the world as a place to act in. Of course, the scientific method has stripped subjectivity away from the world. However, in everyday life and from an evolutionary perspective, we still perceive objects as tools and do not see them as material facts without meaning. Therefore, implicit in the usage of data are the questions: “What does the data mean?”, “How should it be utilized?”, or “What information is contained in the data, and what do we learn from that?”
These questions point to the reality that interpretation is an inescapable part of data usage, perhaps even especially in football, since sports, in general, is an extremely emotional enterprise and emotions cloud rational analysis. Subjectivity is thus implicit in the data analysis process, often unconsciously.
For example, two different football analysts may interpret the same data differently based on their own biases, experiences, and understanding of the sport. This subjectivity in interpretation is a primary reason why big data in football cannot be considered objective. But not only can subjectivity lead to different conclusions based on the same data set. It could also lead to too similar (perhaps wrong) conclusions.
Why? Because everybody falls for the same dominant narrative (as if narratives exerted a gravitational force, so that a Pareto effect underlies how social narratives form). A football example: “a keeper with a high save percentage must be a good keeper.” A high save percentage could also mean that the keeper takes too few risks in leaving the line to prevent shots from happening in the first place, which is what someone like Manuel Neuer is insanely good at.
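A small arithmetic sketch makes the keeper example concrete. The numbers are invented purely for illustration and do not describe any real keeper; the names `Keeper`, `line_keeper`, and `sweeper_keeper` are hypothetical, and the sketch assumes every shot on target that is not saved becomes a goal.

```python
from dataclasses import dataclass


@dataclass
class Keeper:
    name: str
    shots_faced: int  # shots on target over the same number of games (invented)
    saves: int

    @property
    def save_percentage(self) -> float:
        return self.saves / self.shots_faced

    @property
    def goals_conceded(self) -> int:
        # Assumption: every shot on target that is not saved is a goal.
        return self.shots_faced - self.saves


# Hypothetical keepers: one stays on the line and faces many shots,
# the other leaves the line early and prevents shots from happening.
line_keeper = Keeper("line keeper", shots_faced=150, saves=120)
sweeper_keeper = Keeper("sweeper keeper", shots_faced=90, saves=68)

for k in (line_keeper, sweeper_keeper):
    print(f"{k.name}: save % = {k.save_percentage:.1%}, "
          f"goals conceded = {k.goals_conceded}")
```

In this made-up case the keeper with the higher save percentage also concedes more goals, simply because more shots reach the goal. The descriptive number and the outcome that matters point in opposite directions.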
The data hierarchy
But more importantly, different data sets have different qualities.
Not all information is valuable, and it is up to the analyst to determine which data is crucial and which is not. Data thus first has to be selected and prioritized before it can be analyzed (“feature selection”). This selection process is mainly subjective (even though it is usually left to machine learning) because the analyst must judge which data is qualitatively better. Merely gathering data does not solve the problem of which data sets are more valuable.
(We will discuss below why Goalimpact is, in our maybe not-so-humble opinion, a higher-quality decision-making tool for transfers than many other data analyses.)
Ideally, data is prioritized based on what is valuable. There is a hierarchy of data value, and every analyst must address this problem. This, however, is especially tricky in football. Why?
Because a football game is a complex system, and complex systems are causally opaque - meaning we don’t know (in an absolute sense) what kind of effects specific actions have. In complex systems, causality hides in the web of interdependence.
Let me explain.
Butterfly effects in football
Suppose we have different descriptive data about a football player's performance, including shots taken, goals scored, dribbles completed, tackles made and won, and interceptions. How do you know which of these data sets is more relevant than the others? You can't, because football is a complex system. Hence you may be able to identify factors that correlate with others, but that does not mean that one is instrumental in causing the other.
Therefore, in a football game, the interdependence of various factors makes it especially difficult to value the data with respect to its information content. Most data, as shown in the example above, is descriptive. The consequences that should be derived from the data are not at all self-evident. Furthermore, such data sets may only provide a snapshot of the game (or a player), but they cannot provide enough context or insight into the underlying factors that influence the game's outcome.
The idea of the butterfly effect can be applied to football to illustrate this point. The popular version says that the flap of a butterfly's wings in one part of the world can change the weather in another part of the world. (Strictly speaking, that is not what was claimed: the scientific paper rather artistically asked whether a butterfly's flap can change the weather anywhere. The general principle still applies, however.) In the same way, the smallest action by a player in one part of the field can have a cascading impact on the outcome of a situation and the game. Such an action can even happen without the ball (indeed, that is the more probable case) and is thus potentially ignored by the data-capturing mechanism. Think of, for example, the communication between the players. This means that a player's performance cannot be accurately evaluated based on cherry-picked data, as that ignores the web of interactions that impact a game. Descriptive statistics entirely miss such complex influences.
Consider a player who loses the ball with a pass to the opponent but whose team still manages to score within the next 10 seconds. The player's action of losing the ball may seem negative, but because of the team's gegenpressing tactics, they quickly win back the ball and score. Descriptive statistics, such as passing ratio, would not accurately reflect the player's impact on the game in this scenario.
This example, however, is too superficial to capture the notion of complexity itself. In complex systems, knowledge reaches epistemological limits. Thus, complex systems teach humility because we cannot know why things happen, at least not in an absolute sense. Knowledge slips through the web of explanation like water slips through a net. Consequently, relying on statistics to evaluate a player is potentially deceptive, as it ignores the underlying complexity of the sport, in which explanations hide in the web of interdependence.
This meta-problem is often ignored in football, and hence we are already beginning to see data inflation in football: more and more data, less and less valuable information.
A Bias-Free Approach
Goalimpact is rigorous in that regard.
We do not claim to disentangle the causalities in the web of football. Our philosophy is to provide a basis for bias-free decision-making for player transfers. We stop looking for explanatory models, even though we know humans do it naturally. We can analyze the data without hypotheses and thus make objective statements about the quality of a player. We thus aspire to solve the problem of subjective interpretation in scouting.
We acknowledge the fact that we cannot know why a player is good;
but we appreciate that we can define and measure good.
What I find personally so attractive about this approach is that it reflects the foundation of the West based on its Greek origin.
Socrates was wise because he was epistemologically humble: “I know that I know nothing.” His student Plato made a case for an objective definition of The Good (although his idea was focused on ethics). And the father of empirical measurement and science was Aristotle. Goalimpact merges these three philosophical giants into our player rating system.
How?
Our algorithm evaluates a player's performance by measuring the player's influence on the goal difference, independent of how they contribute to it. Thus off-ball actions, as explained above, are implicit in our rating. This approach considers the player's impact on the game's score (the only thing that objectively matters) rather than just describing a player and inferring quality based on descriptive stats. (Technically, our name is, therefore, Goal-difference-impact, but that does not sound as catchy as Goalimpact.) Our algorithm is also corrected for factors such as home field advantage, the opposition's strength, red cards, and levels of exhaustion.
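The general idea of rating a player by goal difference rather than by descriptive actions can be sketched with a naive on/off comparison. This is not the Goalimpact algorithm, whose details are not given in this article. The sketch applies none of the corrections mentioned above (home field advantage, opposition strength, red cards, exhaustion), and the segment data, the `Segment` type, and the function name `on_off_goal_difference` are all hypothetical.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    """A stretch of play with a fixed set of players on the pitch (invented data)."""
    minutes: float
    players_on: frozenset
    goals_for: int
    goals_against: int


def on_off_goal_difference(segments, player):
    """Goal difference per 90 minutes with the player on the pitch, minus without."""
    on_min = off_min = 0.0
    on_gd = off_gd = 0
    for s in segments:
        gd = s.goals_for - s.goals_against
        if player in s.players_on:
            on_min += s.minutes
            on_gd += gd
        else:
            off_min += s.minutes
            off_gd += gd
    if on_min == 0 or off_min == 0:
        raise ValueError("need minutes both with and without the player")
    return 90 * on_gd / on_min - 90 * off_gd / off_min


# Invented segments for one team; only three squad members are tracked here.
segments = [
    Segment(60, frozenset({"A", "B"}), goals_for=2, goals_against=0),
    Segment(30, frozenset({"B", "C"}), goals_for=0, goals_against=1),
    Segment(90, frozenset({"A", "C"}), goals_for=1, goals_against=1),
    Segment(90, frozenset({"B", "C"}), goals_for=1, goals_against=2),
]

for p in ("A", "B", "C"):
    print(p, round(on_off_goal_difference(segments, p), 2))
```

Note that the function never looks at what a player did with or without the ball. It only compares the score while the player was on the pitch with the score while they were not, which is why off-ball contributions are captured implicitly in this kind of measure.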
Therefore, Goalimpact sits high in the data value pyramid.
To be of high quality, data needs to fulfill the following criteria:
Relevance:
It needs to tell us something about the decision to be made. If a metric does not correlate with the win percentage, it is only second-order relevant for scouting.
Accessibility:
Information needs to be presented in a form that makes it usable in decision-making. If the data is too multi-dimensional or raw, it may not be accessible to the decision maker.
Predictive:
Information needs to tell you something about the period in which your decision is implemented (i.e., the time after the transfer). It is not enough to present how a player played last season if the real question is how they will play over the next four seasons.
All of that is implicit in the Goalimpact rating, and this is the main reason why Goalimpact contains a lot of value.
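As a rough illustration of the relevance criterion, one could check how strongly a candidate metric correlates with win percentage. The sketch below uses a plain Pearson correlation on invented per-team season values; the data, the variable names, and the helper `pearson` are assumptions made for illustration, and a correlation on its own says nothing about causation.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented per-team season values, for illustration only.
win_pct = [0.70, 0.55, 0.50, 0.40, 0.30, 0.25]
goal_diff_pg = [1.2, 0.5, 0.3, -0.1, -0.6, -0.9]  # goal difference per game
dribbles_pg = [9.0, 12.0, 8.0, 11.5, 9.5, 10.0]   # completed dribbles per game

print("goal difference vs. wins:", round(pearson(goal_diff_pg, win_pct), 2))
print("dribbles vs. wins:", round(pearson(dribbles_pg, win_pct), 2))
```

In this made-up data set, goal difference tracks win percentage closely while completed dribbles barely do, so by the relevance criterion the dribble count would be only second-order relevant.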
It is now on us to communicate the value to the clubs…
What is Goalimpact?
Goalimpact measures the influence of a player on the goal difference.
It is thus an objective player rating system and a risk management tool for signing football players.
For more about Goalimpact, visit our homepage here or call us online right away to explore how we can help your club!
Nice post. Honest objectivity is difficult to find.
https://dweversole.substack.com/p/when-risk-is-an-imaginary-stop-sign
It also reminded me of a quote by Daniel Kahneman,
"*As the 'what you see is all there is' rule implies, neither the quantity nor the quality of the evidence counts for much in subjective confidence. The confidence that individuals have in their beliefs depends mostly on the quality of the story they can tell about what they see, even if they see little. We often fail to allow for the possibility that evidence that should be critical to our judgement is missing.