Alicia Woods and an attempt to classify midfielder roles using data

Daryl Dao
Jan 19, 2025

Yes, I am procrastinating again! I have a big idea/project that I want to finish, yet I keep getting distracted by other ideas and cannot seem to focus on seeing it through. This is one of those smaller ideas that is distracting me from finishing my team analysis of Brisbane Roar Women in the A-League. I will stop rambling now, but I just want to say that this is actually sorta related to that analysis piece and to the ongoing Moneyball on Football Manager series that I am writing on the side. Let’s get into the problem.

Problem

Alicia Woods is a young, talented midfielder who is at the heart of the Brisbane Roar women’s 3–2–4–1 in-possession system. She is good, like very, very good, and I was blown away by her quality when I watched her play at home against Canberra United in the league. I knew straight away that she was a key player for the Roar alongside Sharn Freier, whose performance in the 2023/24 A-League Women season I covered in a data dashboard. Here is the dashboard for anyone curious, though I think it looks quite rough because I just put every single visualisation on the same plot and did not do much else with it.

Sharn Freier’s data dashboard as a winger in the 2023/24 A-League Women season

Back to Woods and me watching her performance against Canberra United live. This is going to be a small spoiler for the analysis piece, but at this point I do not even know when I will get that out, so I might as well use it here. In build-up, Roar preferred a 3+1/3+2 shape, with left-back Chelsea Blissett staying deep to form a back three, while the double pivot of Momo Hayashi and Woods tended to position themselves just behind the opposition’s forward line. This put both midfielders in a good position to receive the ball between the lines and progress it forward.

This is where Woods thrived, because she was good at receiving the ball in tight spaces, quickly turning to face forward, and finding teammates attacking the final third with a through ball. She was technically good enough to keep the ball under pressure, while her passing range allowed her to find teammates like Sharn Freier on the left or Deborah-Anne De La Harpe on the right, stretching the opposition’s defensive shape and creating more space between the lines and in the channels. She could also play through balls into those opened channels for Laini Freier or Tameka Yallop to run onto, which caused the opposition a few problems: it was an easy way to enter the penalty box, and it also pulled one of the full-backs out of the opposition’s backline.

Yes, I have close-up pictures now! No more TV footage!

Immediately my mind linked her to a playmaker role, but watching her more gave me a bit of a dilemma. Woods was also good at carrying the ball forward, and she could get past the opposition’s defenders quite easily. So…is she a ball-carrier then? But she also made tackles in the middle and final thirds to help the team regain possession quickly, and she proved to be quite aggressive with her challenges out of possession. So…is she also a ball-winner? Woods cannot be a complete package, right?

I made a data dashboard about her performance this season (originally still intended for the analysis piece!) and it sort of confirmed my observation. Yes, Woods is a complete package. She excels at almost everything except goal-scoring when compared with other midfielders in the league who have played regularly for their respective teams. As expected, her chance creation and passing numbers stand out because she is a playmaker, but her defending numbers are a huge outlier: she makes the most tackles in the middle third per 90 (3.15), and she is also among the midfielders who make the most interceptions per 90.

Alicia Woods’ data dashboard as a playmaker after 10 rounds of the 2024/25 A-League Women season

With this dashboard, I naturally grouped her into the playmaker role because she seems to thrive in it. But that still leaves me with a question: which role is Woods actually playing for Roar Women? I will be remixing the work of The Athletic’s Mark Carey and BiscuitChaserFC, who both used Principal Component Analysis to identify midfielder roles in the top five European leagues and in the second divisions of those leagues respectively. Here, I will be using FBRef’s A-League Women data after 10 matches to try and answer my question.

Explanations

I will not bore you with the technical details of Principal Component Analysis (or PCA from now on), but I will try my best to keep my explanation simple. PCA is a dimensionality-reduction technique, usually filed under unsupervised machine learning, that combines the original metrics into a smaller number of new variables, called principal components, which capture as much of the variation in the dataset as possible. Because there can be a lot of noise and redundant data in a dataset, PCA highlights the combinations of metrics that say a lot about that dataset and takes the spotlight away from data that adds little to no value to the analysis.
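For anyone who does want a peek under the hood, here is a minimal sketch of what PCA does, written in Python with NumPy. It is not the exact code or tool I used, and the random data and the `pca` helper are purely illustrative assumptions: it standardises each metric, uses a singular value decomposition to find the principal components, and reports how much of the variation each component explains.

```python
import numpy as np


def pca(X, n_components):
    """Minimal PCA via SVD.

    X: array of shape (n_players, n_metrics), one row per player.
    Returns (loadings, scores, explained_variance_ratio) for the first
    n_components components. Assumes no metric is constant (zero std).
    """
    X = np.asarray(X, dtype=float)
    # Standardise each metric so big-number stats (e.g. passes) do not
    # dominate small-number stats (e.g. interceptions) just because of scale.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value.
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    # The variance captured by each component is proportional to S**2.
    explained = S**2 / np.sum(S**2)
    loadings = Vt[:n_components]   # (n_components, n_metrics): metric weights
    scores = Z @ loadings.T        # (n_players, n_components): player positions
    return loadings, scores, explained[:n_components]


# Illustrative data only: 47 "players" x 6 "metrics", where metric 1 is
# built to be strongly correlated with metric 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(47, 6))
X[:, 1] = 2 * X[:, 0] + rng.normal(scale=0.1, size=47)

loadings, scores, ratio = pca(X, n_components=3)
print(ratio)  # share of total variation explained by each of the 3 components
```

Because metrics 0 and 1 move together in this made-up data, you would expect the first component to be mostly a blend of those two, which is exactly the kind of redundancy PCA is good at collapsing.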

PCA is important for what I am doing because FBRef provides me with about 40–50 advanced metrics, and some of them are definitely not suitable for identifying midfielder roles. The more obvious ones are save percentage, clean sheets, or goal kicks attempted. I can filter those out easily because they are obviously not used to describe midfielders, duh. The more dubious ones are goals scored, aerial duels, or tackles attempted, because you can make a case for each of them to be included in different roles. For example, there are different types of #10/central attacking midfielder, and goals scored can be important if you want to evaluate a second striker or an attacking midfielder who usually rotates with the striker and makes runs into the penalty box more often. This is where PCA steps in to show which stats are actually important for which role, and which ones are not.
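To make “PCA shows which stats matter” a bit more concrete: each component comes with a loading (a weight) for every metric, and the metrics with the largest positive or negative loadings are the ones that define that component. A small helper like the hypothetical `top_metrics` below lists them. The loadings in the example are made-up numbers for illustration, not results from this analysis.

```python
def top_metrics(loadings_row, metric_names, k=3):
    """Return (most positive, most negative) metrics for one component.

    loadings_row: one weight per metric, in the same order as metric_names.
    """
    ranked = sorted(zip(metric_names, loadings_row), key=lambda p: p[1], reverse=True)
    positive = [name for name, w in ranked[:k] if w > 0]
    # Walk the tail backwards so the most negative loading comes first.
    negative = [name for name, w in reversed(ranked[-k:]) if w < 0]
    return positive, negative


# Made-up loadings for a hypothetical "defensive" component.
names = ["tackles_mid_3rd", "interceptions", "take_ons_att", "key_passes", "npxG"]
weights = [0.62, 0.55, 0.31, -0.40, -0.12]
print(top_metrics(weights, names, k=2))
# (['tackles_mid_3rd', 'interceptions'], ['key_passes', 'npxG'])
```

A positive loading pushes a player towards that component, a negative one pushes them away, and loadings close to zero are the “little to no value” metrics for that component.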

Mark and BiscuitChaserFC narrowed the number of roles down to just five: creator, engine, playmaker, ball carrier, and defensive-minded. While I will loosely follow these roles, I will also make some changes and introduce a couple of new roles based on Football Manager’s midfield roles, while taking the data limitations into account.

  • The creator and playmaker roles feel a bit too close in definition for me because they overlap with each other. Instead, I will group both under playmaker and introduce a new role called progressor for a midfielder who tends to dictate the play from deep. This role does not necessarily focus on creating chances like a creator or a playmaker normally would. Rather, it tends to receive the ball from deep, makes passes to progress the team’s play, and is more involved in the team’s build-up, which usually takes place in the team’s own half or the middle third. In Football Manager, a progressor would be closer to a Deep-Lying Playmaker or a Half Back.
  • I am somewhat fine with the engine role, but it would be quite close to the progressor based on the examples that both Mark and BiscuitChaserFC gave. So I will change the definition of this role to be closer to a Box-to-Box midfielder. The engine is now literally an engine who works tirelessly on and off the ball and appears everywhere on the pitch. As the saying goes:

70% of the world is covered by water, the remaining 30% is covered by <insert midfielder name here>.

The caveat for this role is that FBRef’s data are event data rather than tracking data, so it is impossible to measure off-ball runs. Instead, this role will focus more on location-based actions like tackles in the middle third, touches in the final third, etc.

  • A ball carrier also seems fine to me because there will be players who prefer to dribble or carry the ball through the centre of the pitch. It is still a useful role because a ball carrier can bring the team forward and open up spaces in the channels or between the lines with his/her technical ability and dribbles, while also pushing the opposition closer to their own goal (I am taking this from American football, where the aim is to move the ball into the opposition’s red zone and, ultimately, the end zone for a touchdown).
  • I will split the defensive-minded role into two narrower roles: reader and aggressor (which I also call ball winner). Readers are midfielders who are less involved in challenges and do not make tackles that often; instead, they rely on their ability to read the game and position themselves well to intercept and block passes. Aggressors, on the other hand, love throwing themselves into challenges and will make tackles whenever possible, which can result in too many fouls committed and cards received.
  • I am tempted to also introduce a second striker role, where a midfielder is literally a goal-scorer from midfield. But I do not think this role is significant enough, and it does not need a lot of metrics to back it up: goal-scoring metrics like non-penalty xG, xG difference, and non-penalty goals scored would be enough.

To recap, these are the roles that I will attempt to classify: playmaker, progressor, engine, ball carrier, reader, and aggressor. That is a total of six roles just for midfielders, which shows how much football has evolved in the modern day that a midfielder can take on so many different roles at the same time.

Analysis

The dataset that I will be using is a lot smaller than the datasets used by Mark and BiscuitChaserFC, as they focused on five leagues in total and had a few hundred midfielders to evaluate. After classifying player positions using data from Soccerdonna (because FBRef cannot be bothered to do so!) and keeping defensive, central, and central attacking midfielders with at least three 90s (minutes played/90) after 10 rounds of the 2024/25 A-League Women, I am left with…47 midfielders. That is not a lot, is it? But I will make do with what I have.
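The filtering step is simple enough to sketch. This assumes the position labels (taken from a source like Soccerdonna) have already been joined onto each player’s FBRef minutes; the `Player` class and the `DM`/`CM`/`AM` codes are hypothetical names for illustration.

```python
from dataclasses import dataclass

# Hypothetical position codes for defensive, central and attacking midfielders.
MIDFIELD_POSITIONS = {"DM", "CM", "AM"}


@dataclass
class Player:
    name: str
    position: str  # assumed to come from a separate source, e.g. Soccerdonna
    minutes: int


def eligible_midfielders(players, min_90s=3.0):
    """Keep midfielders whose 90s (minutes played / 90) reach min_90s."""
    return [
        p for p in players
        if p.position in MIDFIELD_POSITIONS and p.minutes / 90 >= min_90s
    ]


squad = [
    Player("Player A", "CM", 270),  # exactly three 90s -> kept
    Player("Player B", "AM", 260),  # under three 90s -> dropped
    Player("Player C", "CB", 900),  # not a midfielder -> dropped
    Player("Player D", "DM", 810),  # nine 90s -> kept
]
print([p.name for p in eligible_midfielders(squad)])  # ['Player A', 'Player D']
```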

Like Mark and BiscuitChaserFC, I will also use metrics that highlight a player’s style more than their end product, because end product does not fit the purpose, which is classifying roles rather than measuring efficiency. I will also use possession-adjusted defensive stats, following Wyscout’s definition, to make sure every team and player is on a level playing field and their data are not skewed by the fact that some teams have to defend more than others. The data have already been adjusted per 90 by FBRef, and I also normalised and z-scored the dataset so that every metric sits on the same scale and unusual data points are easier to spot.
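Here is a rough sketch of those two preprocessing steps. I am assuming the common linear form of the possession adjustment, `value * 50 / (100 - team possession %)`, which is how I understand Wyscout’s PAdj metrics to work; their exact definition may differ, so treat the formula as an assumption. The z-scores use the population standard deviation.

```python
from statistics import mean, pstdev


def possession_adjust(value_per90, team_possession_pct):
    """Rescale a defensive per-90 stat as if the team had 50% of the ball.

    Assumed formula: value * 50 / (100 - team possession %). The less a team
    has the ball, the more defending it has to do, so its raw numbers are
    scaled down; high-possession teams are scaled up.
    """
    opponent_possession = 100.0 - team_possession_pct
    return value_per90 * 50.0 / opponent_possession


def z_scores(values):
    """Standardise values to mean 0 and standard deviation 1.

    This rescales metrics onto a common scale; it does not remove outliers.
    Assumes the values are not all identical (non-zero spread).
    """
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


# The same 3.0 tackles per 90 at two possession levels (54.7% and 44.3%).
print(round(possession_adjust(3.0, 54.7), 2))  # 3.31
print(round(possession_adjust(3.0, 44.3), 2))  # 2.69
```

Those two possession figures are the Newcastle Jets and Canberra United numbers quoted in the playmaker section: under this formula, the same raw tackle count is worth more for the high-possession side, which is worth keeping in mind when sanity-checking the adjusted rankings.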

Table of metrics used for PCA

This table might look like it contains a lot of metrics, and it sort of does (29 in total), even though I narrowed the list down from an initial 41 through testing and multiple iterations of running the model. I want to be as detailed as possible, but I do expect PCA to give very little weight to most of the metrics listed and only highlight the ones that are important for each role. As mentioned earlier, there are also a couple of roles for which I have to rely on location-based actions, rather than tracking data, to quantify things like pressures or work rate, so it is important to include metrics that contain location information.

Results

It took a bit of tweaking, clustering, and frustration over why the results did not look like what I expected, but I have managed to fit most of the metrics into specific roles. However, and this will be the theme for the rest of the analysis, I do not feel very happy about the results, mainly because some irrelevant metrics are treated as contributing positively to a role that does not need them, while others contribute negatively even though they are supposed to be among the more important ones.

Take the ball-winner role, for example. Even though the model has done a good job of highlighting that the defensive metrics contribute positively to the role and some passing metrics do not, it also grouped metrics like attempted take-ons and progressive passes received into the role. Some might argue that a ball winner does not need to dribble or receive progressive passes that much, since they tend to pass the ball to a more creative midfielder after regaining it.

There are other examples as well, and you might be able to pick out a few. But for the most part, the metrics for each role paint a relatively good picture of what each role requires.

Progressor

Progressors are midfielders who like to get the whole team forward, either through dribbles or passes, and will aim to get the ball into the final third or the penalty box in any way possible. Some of the standout metrics for progressors include progressive carries and attempted take-ons to highlight a player’s dribbling ability, progressive passes and passes into the final third to highlight their passing ability, and touches in the defensive and middle thirds to highlight their tendency to operate and progress the ball from deep.

For each role, I will also look at the top five or ten players’ raw data and compare it with the percentile rankings to see whether the model has identified players correctly. The top progressors, Melbourne Victory’s Alana Jancevski, Melbourne City’s Lourdes Bosch, Central Coast Mariners’ duo Peta Trimis and Isabel Gomez, and Newcastle Jets’ Sophie Hoban, all share standout metrics like attempted take-ons, progressive carries, successful take-ons, progressive passes received, and touches in the opposition’s penalty box. This surprised me a bit because it shows that some of the top players identified as progressors are wingers or attacking midfielders who like to dribble a lot. Their pizza charts (I am using FBCharts’ plugin for FBRef because I am too lazy to create 30 to 40 pizzas for this article) are also very similar to each other if you classify them as wingers.
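Percentile rankings like the ones on those pizza charts boil down to “what share of the comparison group does this player beat?”. Here is a minimal version that counts ties as half. FBRef and FBCharts may well use a slightly different convention, so this is an illustration rather than their method, and the numbers below are made up.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(value, population):
    """Percentage of `population` below `value`, counting ties as half."""
    ordered = sorted(population)
    below = bisect_left(ordered, value)
    ties = bisect_right(ordered, value) - below
    return 100.0 * (below + 0.5 * ties) / len(ordered)


# Made-up progressive carries per 90 for a small group of midfielders.
carries = [0.8, 1.1, 1.5, 2.0, 2.4, 3.1, 3.6, 4.2]
print(percentile_rank(3.6, carries))  # 81.25
```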

I should not be that surprised, though, because I assume many of the top minds in football analytics already know that PCA is a better tool for finding similar players than comparing raw data and producing overall ratings. There has probably been some work done on that already, and after a quick Google search, another of Mark Carey’s pieces comes up once more!

Besides wingers with a high volume of progressive carries and take-ons (which will be examined in just a bit), there are also midfielders who make plenty of progressive passes, or a combination of both, like Brisbane Roar’s duo Alicia Woods and Tameka Yallop, along with Sydney’s Mackenzie Hawkesby and Adelaide United’s Emily Condon. This is where the progressor role differs from a traditional ball carrier: it focuses on the ability to progress the ball regardless of whether that progression comes from a pass or a dribble, so the model also identifies playmakers with good progressive numbers as progressors.

Ball carrier

Similar to progressors, ball carriers like to dribble with the ball, either into wide spaces or down the centre, with the aim of bringing the whole team forward and dragging the opposition’s defenders out of position. Some of the standout metrics for a ball carrier include attempted take-ons and carries to highlight their tendency to dribble with the ball, passes/progressive passes received to highlight that they tend to be the focal point for passes in build-up and help the team transition into attack, and through balls and passes into the penalty box to highlight their effectiveness in creating chances after their dribbles.

Unsurprisingly, I continue to find Peta Trimis, Alana Jancevski, and Lourdes Bosch at the top of the suitability ranking for ball carrier. There is a slight overlap between ball carriers and progressors, which explains why those three, along with Sophie Hoban, rank highly for both roles. Where the two roles differ is in the high volume of dribbles and the low volume of passes made by a ball carrier compared with a progressor. A ball carrier can be closely identified with a traditional winger or even a mezzala, someone who likes to dribble or carry the ball into the half-spaces, so the role highlights players like Western Sydney Wanderers’ Sienna Saveska, Trimis, Jancevski, Bosch, and Hoban, since they currently play a similar role for their respective teams.
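One way a suitability ranking like this can be produced (a sketch under my own assumptions, not necessarily exactly how Mark Carey or BiscuitChaserFC did it) is to score each player on a role’s component: multiply each of their z-scored metrics by that metric’s loading and add them up. A negative loading on passes, as in the made-up example below, is what would separate a ball carrier from a progressor. The player names and numbers are hypothetical.

```python
def role_ranking(players, loadings):
    """Rank players for one role, highest score first.

    players: {player name: {metric: z-score}}
    loadings: {metric: loading on the role's component}
    The score is the dot product of a player's z-scores with the loadings.
    """
    scores = {
        name: sum(weight * z[metric] for metric, weight in loadings.items())
        for name, z in players.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Made-up loadings: dribbling counts towards the role, passing volume against it.
ball_carrier = {"take_ons_att": 0.6, "prog_carries": 0.5, "passes_att": -0.3}
players = {
    "Player A": {"take_ons_att": 1.5, "prog_carries": 1.0, "passes_att": -0.5},
    "Player B": {"take_ons_att": 0.2, "prog_carries": 0.4, "passes_att": 1.2},
    "Player C": {"take_ons_att": -0.8, "prog_carries": -0.2, "passes_att": 0.0},
}
for name, score in role_ranking(players, ball_carrier):
    print(f"{name}: {score:.2f}")
```

Player B has decent dribbling numbers but a high passing volume, so the negative passing loading drags them below Player A, which mirrors the “more dribbles, fewer passes” distinction above.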

Engine

I prefer to see an engine as an all-round midfielder who is good at most or all things. They are the heartbeat of the team: they tend to run the play in midfield or appear almost everywhere on the pitch, getting involved in almost every challenge possible. Some of the standout metrics for an engine are location-based touches to highlight their involvement across the whole pitch, location-based tackles to show their tendency to get involved in challenges, and carries along with attempted take-ons to highlight their ability to carry the ball forward for the team.

The dashboard from the beginning pretty much confirmed that Woods is an all-round midfielder, and her showing up at the top of the engine rankings only adds to that observation. Woods is accompanied by her midfield partner Momo Hayashi, who, based on my observations from the Canberra United match and a couple of previous Roar Women matches, also excels in different areas like ball-winning, progressing the ball, and scoring goals from set pieces or long shots.

This is also where names better known to the league, like Melbourne Victory’s Alex Chidiac and Newcastle Jets captain Cassidy Davis, pop up, as they are, quite literally, the engines of their respective teams. They have played that role for many A-League Women seasons and excelled in it. Other players, like Western Sydney Wanderers’ duo Talia Younis and Amy Chessari, feel like misidentified cases to me after checking their raw data. But Younis’ numbers for goal-scoring, chance creation, and ball carrying are quite decent compared with the rest of the league, while FBRef lists Alicia Woods among Chessari’s similar players (using data from the last 365 days), so I could be wrong here.

Playmaker

Because the creator and playmaker roles have been grouped under the same umbrella, the playmaker role now also includes chance creation, along with the usual responsibilities of dictating the play in midfield and distributing the ball to attacking players. A playmaker can also take a few goal-scoring chances themselves and have a few shots at the opposition’s goal. Some of the standout metrics for this role are key passes and passes into the penalty box to highlight their ability to create chances, touches in the middle third to highlight where they tend to operate, and passing-related metrics to highlight their playmaking ability.

This is where chance creators like Newcastle Jets’ Libby Copus-Brown and Canberra United’s Mary Stanic-Floody, or deep operators like Canberra United’s Darcey Malone and Central Coast Mariners’ Taylor Ray, are highlighted for their chance creation or passing from deep. It is also where players who have played both as a centre-back and as a defensive/central midfielder get identified at the top, which is slightly strange to me, because players like Taylor Ray and Darcey Malone do not have high passing numbers, although their defensive numbers are quite good compared with other midfielders in the league.

I have also noticed that, even though the defensive metrics have been adjusted for possession, a lot of players from bottom-half teams like Newcastle Jets or Canberra United are at the top of the list, while players from Brisbane Roar, Melbourne City, or Melbourne Victory are in the middle or at the bottom. This could be because I applied the wrong calculation when adjusting for possession, since the two teams have very different possession profiles: Jets are among the top teams in the A-League Women for possession (54.7%), but the same cannot be said for Canberra United (44.3%). My knowledge of the ALW is also limited, so I cannot say whether Jets or Canberra are defensively oriented teams. Overall, the playmaker rankings do not sit too well with me.

Reader

Readers are defensive-minded midfielders who do not get involved in challenges that often, as they rely on their reading of the game to be in the right place at the right time and smartly regain possession for their team. This role looks for players with high numbers of ball recoveries, clearances, and interceptions to highlight their passive defending, and also long passes attempted as an indication of their willingness to play the ball far away from their team’s penalty box after regaining possession.

Perth Glory’s Isobel Dalton stands out for this role, since she is in the 93rd percentile for interceptions and the 72nd for ball recoveries, but only the 66th for tackles in the defensive third this season. This suggests that Dalton leans towards passive defending rather than front-foot defending, which means getting stuck into challenges and is more typical of an aggressor. FBRef’s similar-player suggestions for Dalton also follow the top rankings for this role quite closely, including Brisbane Roar’s Momo Hayashi, Newcastle Jets’ Cassidy Davis, Western United’s Chloe Logarzo, and Melbourne Victory’s Alex Chidiac, in the order of FBRef’s similar players list.

One player whose ranking I do not quite agree with is Brisbane Roar’s Laini Freier, since she is an attacking midfielder and her defending numbers do not seem to stand out even when compared with just Roar players. This might be another case of misidentification caused by irrelevant metrics like shots attempted, crosses, and key passes, where Laini does stand out, being grouped into this role.

Ball winner

Somewhat in contrast to a reader, a ball winner loves getting stuck into challenges and making tackles to regain possession for their team. They are naturally strong, which gives them an edge in challenges, and they are aggressive enough to come out on top. Some of the standout metrics for a ball winner include tackles to highlight their tendency to get involved in challenges to win the ball, along with challenges attempted, interceptions, and ball recoveries to highlight their defensive responsibilities.

This might be another role where I do not agree with some of the identifications. Even though the model has correctly identified Western Sydney Wanderers’ Ena Harada and Western United’s Sara Eggesvik as highly suitable for the ball winner role, other players like Melbourne City’s Laura Hughes or Perth Glory’s Miku Sunaga are completely misidentified since, even after adjusting for possession, their defensive numbers are nowhere near those of Woods, Mariners’ Isabel Gomez, or Wellington Phoenix’s Maya McCutcheon.

It is frustrating because I thought this would be the easiest role to identify, since it is very defence-focused and, naturally, players with good defensive numbers would be classified as ball winners. But, based on the metrics list, it seems like the model also puts players with high ball-carrying numbers high up this list instead of closer to the bottom. I still chose to include this ranking because the list is not a total write-off, but it should be treated with caution.

Recap

I set out to answer the question of which role Alicia Woods is currently playing for Brisbane Roar Women, and I came back with some frustration and somewhat of an answer. As the dashboard at the beginning suggests, Woods is a complete package, an all-round midfielder who is most likely an engine in Roar’s midfield.

Alicia Woods’ role suitability rankings

A machine learning technique (not a model!) seems to confirm that observation too, but it also comes back with a lot of misidentified cases that I will have to look into further. By no means are the results presented here perfect, and I would still recommend using live observations from matches and raw data to confirm them. It is also not the technique’s fault that some results are wrong; it is most likely because I have missed some steps between inputting the raw data and getting these results, like adding weights to the important metrics for each role or reducing the number of metrics that PCA has to consider.

Still, the results are at least promising and give me a foundation to build upon in future work. As Mark and, potentially, others in the football analytics sphere have said, PCA can be a good technique for player recruitment, identifying players who suit a team’s playing style and reducing the chance of signing the wrong players, which can only hurt the club’s business. As I also found while analysing the results, PCA can be used to identify similar players too, since players with similar profiles and data tend to be grouped together for most of the roles above.
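For the similar-players idea, a simple approach (again a sketch, and not a claim about how FBRef builds its similar-players lists) is to compare players’ component scores with cosine similarity. The profiles below are made-up scores on three hypothetical components.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine of the angle between two score vectors (1 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm


def most_similar(target, profiles, k=3):
    """Return the k players whose component-score profiles are closest to target's."""
    sims = [
        (name, cosine_similarity(profiles[target], scores))
        for name, scores in profiles.items()
        if name != target
    ]
    return sorted(sims, key=lambda kv: kv[1], reverse=True)[:k]


# Made-up scores on three components (e.g. engine, playmaker, reader).
profiles = {
    "Player X": [1.2, 0.8, -0.3],
    "Player Y": [1.0, 0.9, -0.2],
    "Player Z": [-1.0, 0.2, 1.5],
    "Player W": [0.5, 0.4, -0.1],
}
for name, sim in most_similar("Player X", profiles):
    print(f"{name}: {sim:.3f}")
```

Cosine similarity compares the shape of a profile rather than its volume, which is why the made-up Player W (roughly a scaled-down Player X) comes out as the closest match. If volume matters too, Euclidean distance on the same scores is the obvious alternative.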

This might be one of the rare occasions where I post a technically heavy article here, but this is a topic I want to explore further and try myself, having read the great work that Mark Carey and others have done. I will return to the regularly scheduled Notebook very soon, as I would love to get back into watching matches and doing live analysis, now that I am freshly back from a small break after the ASEAN Cup.

Hope you’ve enjoyed this small deep dive with me, and any suggestions on how I can improve the accuracy of my results are appreciated, as always!

Originally published at https://www.talking-tactics.com.
