Analytics has become a powerful tool in sports, just as it has in every other industry. Fueled by the sabermetrics revolution, first in baseball and then in other sports, data-driven decision making in sports is finally here and (hopefully) not going anywhere soon. Although it was late to the party, football has finally started receiving its fair share of analytics work, at both the professional and the personal level.
There is a growing football analytics community on Twitter, along with many excellent websites that focus on the statistical side of the game we are all in love with. Here at Row-Z Report, I will take a shot (pun intended) at being an additional source for the football analytics community. Let’s hope for the best and get started!
Recently, I had the opportunity to gain access to one of the richest datasets in football, provided by Stratagem. Unless stated otherwise, throughout this post (and in others) all data mentioned is provided by Stratagem.
Starting from the 2016-17 season, this rich dataset contains every shot in every match for 20+ leagues, including information such as the team, player, time and assister of each shot. On top of that come some not-so-typical yet excellent pieces of data: exact (x,y) coordinates for the shot (and the assist, if relevant), defensive pressure, shot quality and chance rating (more on these later), and much more.
Location is probably the most important factor in determining a shot’s outcome. Almost all expected goals models use some form of shot location as a major input. It makes sense: controlling for everything else, a shot from right in front of the goal is much more likely to find the net than a shot from, say, 30 meters.
Below is an image of 20,000+ shots taken in the Italian Serie A, the Dutch Eredivisie and the Turkish Super League in 2016-17:
Yes, the dark blue dots are the goals, and that little rectangle on the right is the goal itself. There is no denying the power of data visualization, especially when we see 20,000+ shots at once.
Now, let’s take a look at the following set of images related to selected shot distances to the goal. (1)
Nice to see our hunches were in fact true. The distribution of dark blue dots (goals) and light blue dots (misses) on the pitch is just as one would expect: up to 10 meters, the shot map shows (almost) nothing but dark blue. In the arc between 20 and 30 meters, dark blue dots seldom appear.
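For readers who want to try the distance bands themselves, here is a minimal sketch in Python. It uses the definition from the footnote (straight line distance to the center of the goal), but everything else is an assumption on my part for illustration: the coordinate system (meters, with the center of the goal at (0, 0)), the 10-meter band width, the function names and the handful of made-up shots. It is not how the StrataData set is actually laid out.

```python
import math
from collections import defaultdict

# Assumed coordinate system (hypothetical, not taken from the dataset):
# meters, with the center of the goal at (0, 0).
GOAL_CENTER = (0.0, 0.0)

def shot_distance(x, y, goal=GOAL_CENTER):
    """Straight line distance from the shot location to the center of the goal."""
    return math.hypot(x - goal[0], y - goal[1])

def conversion_by_band(shots, band_width=10.0):
    """Return {band_start: (goals, shots, goal_rate)} per distance band."""
    counts = defaultdict(lambda: [0, 0])  # band_start -> [goals, shots]
    for x, y, is_goal in shots:
        band = int(shot_distance(x, y) // band_width) * band_width
        counts[band][0] += int(is_goal)
        counts[band][1] += 1
    return {b: (g, n, g / n) for b, (g, n) in sorted(counts.items())}

# Made-up shots for illustration only: (x, y, is_goal)
example_shots = [
    (5.0, 1.0, True), (8.0, -2.0, True), (6.0, 3.0, False),
    (15.0, 4.0, False), (12.0, -5.0, True),
    (25.0, 0.0, False), (22.0, 8.0, False),
]
bands = conversion_by_band(example_shots)
for start, (goals, total, rate) in bands.items():
    print(f"{start:.0f}-{start + 10:.0f} m: {goals}/{total} goals ({rate:.0%})")
```

With real data, the goal rate per band should fall off with distance, which is the same pattern the shot maps show.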
Let’s end this first (and probably too long) post with the right-most image. Building on the shot locations and distances above, I divided the football pitch into smaller rectangular areas and calculated the likelihood of a goal in each of these shot zones. A darker color denotes a higher chance of scoring, whereas a larger size denotes more shots taken in that zone.
Evidence from 20,000+ shots shows that 1) shots in the 6-yard box have the highest chance of ending up as a goal and 2) more shots are taken around the penalty spot than from any other location.
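The zone calculation can be sketched along these lines. This is only an illustration under assumptions of my own: square 5-meter cells, the same hypothetical coordinate system in meters, invented function names and a few made-up shots. The zones in the image were not necessarily built this way.

```python
from collections import defaultdict

def zone_of(x, y, cell=5.0):
    """Map a shot location to the (column, row) index of its rectangular zone."""
    return (int(x // cell), int(y // cell))

def zone_stats(shots, cell=5.0):
    """Return {zone: {"shots": count, "goal_rate": goals / count}}."""
    counts = defaultdict(lambda: [0, 0])  # zone -> [goals, shots]
    for x, y, is_goal in shots:
        zone = zone_of(x, y, cell)
        counts[zone][0] += int(is_goal)
        counts[zone][1] += 1
    return {z: {"shots": n, "goal_rate": g / n} for z, (g, n) in counts.items()}

# Made-up shots for illustration only: (x, y, is_goal)
example_shots = [
    (2.0, 1.0, True), (3.0, 2.0, False), (4.0, -1.0, True),
    (11.0, 0.0, False), (12.0, 3.0, False), (13.0, 4.0, True),
    (27.0, 6.0, False),
]
stats = zone_stats(example_shots)
for zone, s in sorted(stats.items()):
    print(zone, s["shots"], round(s["goal_rate"], 2))
```

In a plot like the one above, "goal_rate" would drive the color of each zone and "shots" its size.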
Shot zones feel like a good starting point for building an expected goals model, which I hope to get to in the coming weeks.
Until next time!
This article was written with the aid of StrataData, which is property of Stratagem Technologies. StrataData powers the StrataBet Sports Trading Platform, in addition to StrataBet Premium Recommendations.
Follow us on Twitter: @rowzreport
(1) Shot distance is defined as straight line distance to the center of the goal.