Biology of Sport
eISSN: 2083-1862
ISSN: 0860-021X
1/2026
vol. 43
 
Original paper

Key performance indicators of offensive transitions in elite women’s football: a machine learning and explainability approach

Claudio A. Casal¹, José Luis Losada², Ana M. de Benito Trigueros³, Rubén Maneiro⁴, Iyán Iván-Baragaño⁵

  1. Faculty of Physical Education & Sport Sciences, Catholic University of Valencia, San Vicente Mártir, Valencia, Spain
  2. Department of Social Psychology and Quantitative Psychology, University of Barcelona, Barcelona, Spain
  3. HeQoL Research Group. Department of Physical and Sport Education, University of León, León, Spain
  4. Faculty of Education and Sport, University of Vigo, Vigo, Spain
  5. Department of Sports Sciences, Faculty of Medicine, Health and Sports, European University of Madrid, Madrid, Spain
Biol Sport. 2026; 43: 53–64
Online publish date: 2025/08/05
 

INTRODUCTION

Interest in women’s football is experiencing unprecedented growth. According to the latest FIFA report [1], the number of players participating in organised football has increased by 24% over the past four years, reaching 16 million worldwide [1]. In certain cases, such as Spain, this growth has been even more remarkable, with the number of registered female players doubling over the same period [2] and, for the first time in history, exceeding 100,000 federated licences. Moreover, the media, economic, and sporting impact of the World Cup has played a pivotal role in this expansion, generating an economic impact of $1.32 billion in Australia alone [3].

In the scientific domain, while women’s football has been explored from various perspectives – including those of players, fans, and sports organisations [4] – a significant research gap remains in studies focused exclusively on the women’s game [5]. Some authors have highlighted the limited attention and information available regarding key determinants in women’s football, such as match analysis [6]. However, although research in this area began later than in the men’s game, particularly in fields such as match analysis [7], academic publications have nonetheless achieved significant impact within the research community.

Technical and tactical performance analysis has evolved significantly since 2020, driven by the increasing availability of data from various technological sources. A key example is FIFA’s use of optical tracking, which has enhanced understanding of the relationship between team level and conditional performance [8, 9], as well as its application in tactical analysis through deep learning techniques [10]. Similarly, access to event data from high-level women’s competitions has facilitated studies such as that of Trower et al. [11], which identified 10 distinct player profiles based on the analysis of recorded events across five consecutive seasons of the Women’s Super League (WSL). Likewise, Narayanan & Pifer [12] conducted a comparative study of the U.S. women’s national team and its rivals using data provided by Statsbomb [https://statsbomb.com/es/]. Their findings indicated that the technical performance of the American team exceeded that of its opponents in key indicators such as shots on target and successful dribbles.

At the collective level, various investigations have examined the relationship between tactical indicators and team success – both globally and partially (i.e., based on match outcomes or possession outcomes, respectively) – as well as their association with game strategies. A notable example is the study of Iván-Baragaño et al. [13], which developed a multivariate model combining tactical indicators to estimate the probability of success in a possession. Regarding contextual variables, Harkness-Armstrong et al. [14] analysed approximately 200 youth players in England and found that the phase of play (in-possession or out-of-possession), in combination with match status, significantly modified players’ conditional demands. Likewise, match status was identified as a key factor influencing teams’ offensive strategies during the FIFA Women’s World Cup 2015 (FWWC 2015) [15]. However, a subsequent study on the FWWC 2019 found that teams maintained consistent offensive strategies and success rates in their attacking play, regardless of match status [16].

On the other hand, studies such as that of Martínez-Hernández et al. [17], which used the WSL as a sample, analysed movement patterns preceding conceded goals, identifying the most frequent as linear advancing motion, deceleration, or turning, among others. In any case, it is evident that the two competitions that have generated the greatest research interest regarding technical-tactical indicators in women’s football have been FWWC 2019 and 2023. In relation to these tournaments, the two studies conducted by Bradley [8, 9] provided a detailed analysis of conditional demands based on both player position and team characteristics. Their findings highlighted differences in physical demands according to positional roles and the quality of opposition, aligning with evidence presented a decade earlier by Hewitt et al. [18].

Other studies have focused on evaluating the influence of match outcomes and tournament stages in the FWWC 2023 [19], as well as comparing performance across confederations in the same edition. In this regard, Ju et al. [20] found that the teams from the Union of European Football Associations (UEFA) in the FWWC 2023 demonstrated superior technical performance in key metrics such as pass completion percentage. Meanwhile, variability in high-intensity distances covered was greater among teams from the Confederation of African Football (CAF) than among those from other confederations.

Finally, one of the primary challenges facing women’s football remains the disparity between domestic leagues [21], a factor that directly influences spectator interest in the sport. In this context, studies such as that of Casal et al. [22], which focus on analysing a single league, provide valuable insights into the game strategies employed within a specific competition.

Regarding the development of predictive models capable of classifying event outcomes, a key challenge is class imbalance, particularly when dealing with low-frequency events. In football research, various studies have attempted to predict possession outcomes [23, 24] or the occurrence of injuries [25, 26], predominantly using binary classification models. In many cases, the event of interest has a low relative frequency, which negatively impacts the model’s ability to predict positive outputs (recall). This challenge arises from the difficulty algorithms face in identifying consistent patterns within the selected training features. To mitigate this limitation, oversampling techniques have been developed [27, 28], generating synthetic samples from the original dataset to enhance the model’s predictive capacity.
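The idea of generating synthetic minority samples can be illustrated with a minimal, SMOTE-style sketch. It is not the implementation used in the study: the function name `smote_like`, the toy points and the choice of `k` are hypothetical, and the example assumes numeric features, whereas the criteria analysed here are categorical.

```python
import random
from math import dist


def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a sampled
    minority point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority points
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: dist(base, p),
        )[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> mate
        synthetic.append(tuple(b + gap * (m - b) for b, m in zip(base, mate)))
    return synthetic


# Hypothetical toy data: four rare "positive" events with two numeric features.
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote_like(minority, n_new=6)
```

Because every synthetic point lies on a segment between two observed minority points, the new samples stay inside the region already occupied by the minority class, which is also why oversampling of this kind can encourage overfitting.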

This research pursues three main objectives: (i) to identify the technical-tactical indicators associated with goals and shots during dynamic offensive transitions in elite women’s football; (ii) to train a supervised machine learning model, optimized through hyperparameter tuning, to predict possession outcomes; and (iii) to analyse the influence of each criterion on the model’s output using extrinsic explainability techniques.

To maximise the external validity of the findings, three international competitions – both at the club and national team levels – were analysed, as they are considered among the most prestigious tournaments worldwide. As a novel contribution, statistical and machine learning techniques – such as oversampling – were applied, which remain largely unexplored in the context of women’s football.

MATERIALS AND METHODS

Design and participants

This research was developed within the framework of observational methodology [29], employing the following design: nomothetic, through the analysis of 3,610 dynamic offensive transitions executed by various teams; punctual intra-sessional, as the dynamic offensive transitions correspond to different matches; and multidimensional, because we analysed the multiple dimensions that constitute the ad hoc instrument used, formed by a system of categories and field formats [30].

In order to control for situational variables that may influence teams’ tactical and strategic behaviours – such as opponent quality or match status – a total of 35 matches from the final stages of the FWWC 2023 (n = 1,535), UEFA Women’s Euro 2022 (n = 750) and UEFA Women’s Champions League 2023/24 (n = 1,325) were selected. Across these matches, 3,610 dynamic offensive transitions were analysed, following the exclusion of 119 transitions due to lack of observability. Match recordings were obtained from WyScout (www.hudl.com, n.d.) and analysed post-event. Matches were observed for the regular period (i.e. 90 min, excluding extra time). Information was recorded respecting the spontaneity of the players’ behaviour in their natural environment. According to the Belmont Report [31], the use of public images for research purposes does not require informed consent or approval by an ethics committee.

Observation and recording instrument

The construction of the observation instrument rested on three pillars: (i) a previous theoretical framework; (ii) criteria and categories collected empirically in other observational studies; and (iii) novel criteria that were tested in this work. The methodological steps implemented were as follows. First, the problem was identified, and a scientific group of experts was formed, composed of two academics (with PhDs in Physical Activity and Sport Sciences) and UEFA PRO coaches, with more than ten years of experience in observational methodology and performance analysis in football. After consulting the empirical evidence and based on the defined objectives, a first selection of the instrument’s tactical indicators was made, and a first exploratory post-event observation was carried out. Subsequently, after a discussion by the expert group, the instrument was readjusted, and another post-event observation was carried out. This process was repeated four times until the ad hoc observation instrument, called Transfootb (Table 1), was finalised. It consists of 18 criteria covering contextual variables, the start, development and completion of the offensive phase of the observed team, and the way the opposing team executed its defensive transition.

TABLE 1

Criteria, categories and codes of the observational instrument, Transfootb

| Dimension | Criterion | Category | Code |
|---|---|---|---|
| Contextual variables | Tournament (TN) | FWWC 2023: FIFA Women’s World Cup | FWWC |
| | | Women’s EURO 2022 | EURO |
| | | UWCL 2023/24: UEFA Women’s Champions League | UWCL |
| | Stage (ST) | Round of 16 | R16 |
| | | Quarter-final | QF |
| | | Semi-final | SF |
| | | Final | FF |
| | Period of Match (T) | 0–15 Minutes: 0–15 minutes of the match time | 0–15 |
| | | 16–30 Minutes: 16–30 minutes of the match time | 16–30 |
| | | 31–45 Minutes: 31 minutes – half time | 31–45 |
| | | 46–60 Minutes: 46–60 minutes of the match time | 46–60 |
| | | 61–75 Minutes: 61–75 minutes of the match time | 61–75 |
| | | 76–90 Minutes: 76 minutes – full time | 76–90 |
| | Match Status (MS) | Winning: The observed team has scored more goals than the opposing team at the moment of regaining possession of the ball | WN |
| | | Drawing: The observed team has scored the same number of goals as the opposing team, or no goals have been scored | DR |
| | | Losing: The observed team has scored fewer goals than the opposing team at the moment of regaining possession of the ball | LS |
| | Match Outcome (MO) | Win: The observed team scored more goals than the opponent and won the match | FW |
| | | Draw: The observed team scored the same number of goals as the opponent and drew the match | FD |
| | | Loss: The observed team scored fewer goals than the opponent and lost the match | FL |
| Dynamic offensive transition | Ball Recovery (BR) | Steal: A defending player prevents the ball passed by an opponent from reaching its intended receiver by contacting the ball and maintaining their team’s possession of the ball | ST |
| | | Duel: A defending player dispossesses an opponent of the ball through a physical challenge or defensive pressure | DL |
| | | Turnover: A defending player collects the ball lost (via clearance or a missed pass) by the opposing team | TR |
| | | Goalkeeper Action: The goalkeeper recovers the ball after an opponent’s shot, cross, turnover, etc. | GK |
| | Recovery Zone (RZ) (Fig. 1) | Defensive zone | DF |
| | | Middle defensive zone | MD |
| | | Middle offensive zone | MO |
| | | Offensive zone | OF |
| | Start Interaction Context (CEI) | The goalkeeper regains possession of the ball with the opposing team’s forward line ahead | PA |
| | | The defensive line regains possession of the ball with the forward line ahead | RA |
| | | The defensive line regains possession of the ball with the midfield line ahead | RM |
| | | The midfield line regains possession of the ball against the rearmost line | MR |
| | | The midfield line regains possession of the ball against the midfield line | MM |
| | | The midfield line regains possession of the ball with the forward line ahead | MA |
| | | The forward line regains possession of the ball against the rearmost line | AR |
| | | The forward line regains possession of the ball against the midfield line | AM |
| | | The forward line regains possession of the ball against the goalkeeper | |
| | Type of Initial Attack (TA) | Positional: Possession starts by gaining the ball in play; the first or second player makes short, horizontal, and non-penetrating passes in an attempt to destabilize the organized defensive system of the opposing team | PT |
| | | Direct: Possession starts by gaining the ball in play; the first or second player in action uses long vertical penetrating passes. This type of possession aims to quickly reach the opponent’s goal, challenging the organized defensive system of the opposing team | DT |
| | | Counterattack: Possession starts by gaining the ball in play; the first or second player in action uses penetrating passes or dribbles to penetrate; the progression towards the opposing goal involves a high percentage of quick penetration passes (evaluated qualitatively). This type of possession aims to deny the opponent the opportunity to minimize surprise, reorganize their system, and be defensively prepared. It cannot begin with a goalkeeper pass if the goalkeeper controls the ball for more than 4 seconds | CT |
| | Passes (PS) | 0: The attacking team fails to make any passes | 0 |
| | | 1–2: The attacking team makes between 1 and 2 passes | 1–2 |
| | | 3–4: The attacking team makes between 3 and 4 passes | 3–4 |
| | | ≥ 5: The attacking team makes 5 or more passes | ≥ 5 |
| | Penetrative Passes (PP) | 0: The team does not make any passes towards the opposing goal, failing to surpass any player or defensive line of the opposing team | 0 |
| | | 1–2: The team makes 1 or 2 passes towards the opposing goal, successfully surpassing some players or defensive lines of the opposing team | 1–2 |
| | | ≥ 3: The team makes more than 2 passes towards the opposing goal, successfully surpassing some players or defensive lines of the opposing team | 3 |
| | Attack Player (AP) | 1–2: During the team’s possession, between 1 and 2 players voluntarily contact the ball. If a player contacts the ball more than once, it is counted only once | 1–2 |
| | | 3–4: During the team’s possession, between 3 and 4 players voluntarily contact the ball | 3–4 |
| | | ≥ 5: During the team’s possession, 5 or more players voluntarily contact the ball | ≥ 5 |
| | End Zone (EZ) (Fig. 1) | Defensive zone | DFF |
| | | Middle defensive zone | MDF |
| | | Middle offensive zone | MOF |
| | | Offensive zone | OFF |
| | Type of Possession (TP) | Short possession: one or two passes per team possession | SH |
| | | Medium possession: three or four passes per team possession | MP |
| | | Long possession: five or more passes per team possession | LG |
| Dynamic defensive transition of the opposing team | General Defensive Approach (PTGD) | Persistent (pressure): Several opposing players press the attackers during the first 3 seconds of possession. The defenders position themselves near the ball possessor, trying to hinder their actions, and close to the attackers closest to the ball, attempting to prevent passes. Pressing defensive model. | PR |
| | | Expectant (no pressure): A player pressures the ball possessor, or no player pressures the attackers during the first 3 seconds of possession. Containment defensive model. | EP |
| | Number of Defenders (DOP) | 1–3: At the moment of regaining possession, the opposing team has between 1 and 3 players positioned between the ball and their own goal, excluding the goalkeeper. | 1–3 |
| | | 4–6: At the moment of regaining possession, the opposing team has between 4 and 6 players positioned between the ball and their own goal, excluding the goalkeeper. | 4–6 |
| | | ≥ 7: At the moment of regaining possession, the opposing team has 7 or more players positioned between the ball and their own goal, excluding the goalkeeper. | ≥ 7 |
| | Defensive Position (POT) | High: The furthest-back opponent is in the opposing half | HG |
| | | Medium: The furthest-back opponent is closer to the midline than to their own goal (Middle def zone) | ME |
| | | Low: The furthest-back opponent is closer to their own goal than the midline (Def zone) | LW |
| Transition outcome | Outcome (OUT) | Goal: The whole of the ball crosses the line, between the goal posts and under the crossbar, provided no offence has been committed by the scoring team. The referee awarded a goal | GO |
| | | Attempt ON Target: An attempt on goal by the attacking team that was heading towards the goal and was saved by the goalkeeper or blocked by a defensive player of the opposing team | AO |
| | | Attempt OFF Target: An attempt by the attacking team which was not directed between the dimensions of the goal, including hitting the crossbar or goal posts | AF |
| | | Set-play: A set piece was awarded to the attacking team in the form of a free kick, penalty kick or throw-in | SP |
| | | Corner kick: The attacking team wins a corner kick | CK |
| | | Enter offensive zone: advance the ball into the offensive zone, free kicks, and throw-ins in the offensive zone. | OZ |
| | | Loss of Possession: The attacking team lost possession of the ball through the ball going out of the dimensions of the pitch or an opposing team player regaining possession of the ball, with enough control to have a deliberate influence over the ball’s subsequent direction | LP |

Procedure and reliability

Data were coded by one observer. Prior to coding, and to reduce intra-observer variability, ten training sessions were carried out in which 250 transitions, not included in the final sample, were coded. The criterion of consensual agreement [32] between the observer and the principal investigator was applied, so that a record was made only when both agreed. Cohen’s Kappa coefficient was calculated to test intra- and inter-observer reliability through the reassessment of 361 randomly selected offensive transitions (10%) four weeks after the initial analysis. The reliability of each criterion is presented in Table 2; the general defensive approach of the opposing team presented the lowest values (0.89 intra-observer, 0.84 inter-observer), which are still considered almost perfect according to the scale of Fleiss et al. [33].
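For readers unfamiliar with the agreement statistic, the following is a minimal sketch of Cohen’s Kappa for two codings of the same transitions. The function name and the ten example codes are hypothetical and are not data from the study; the codes PR and EP are borrowed from Table 1 only for illustration.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa: agreement between two codings, corrected for chance."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both codings must be non-empty and of equal length")
    n = len(ratings_a)
    # Proportion of items on which the two codings agree
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance from each coding's marginal frequencies
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        return 1.0  # both codings use a single identical category
    return (observed - expected) / (1 - expected)


# Hypothetical codings of ten transitions (PR = pressure, EP = no pressure)
first_pass = ["PR", "PR", "EP", "EP", "PR", "EP", "EP", "PR", "EP", "EP"]
second_pass = ["PR", "PR", "EP", "EP", "EP", "EP", "EP", "PR", "EP", "EP"]
kappa = cohens_kappa(first_pass, second_pass)
```

In this toy case the two codings agree on nine of ten items, yet kappa is only about 0.78, because part of that agreement would be expected by chance.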

TABLE 2

Intra and inter-observer agreement

| Criteria | Kappa coefficient (intra-observer) | Kappa coefficient (inter-observer) |
|---|---|---|
| Period of Match | 1.00 | 1.00 |
| Match Status | 1.00 | 1.00 |
| Match Outcome | 1.00 | 1.00 |
| Ball Recovery | 0.98 | 0.95 |
| Recovery Zone | 1.00 | 1.00 |
| Start Interaction | 0.95 | 0.90 |
| General Defensive Approach Opposing Team | 0.89 | 0.84 |
| Number of Defenders Opposing Team | 1.00 | 0.98 |
| Defensive Position Opposing Team | 1.00 | 1.00 |
| Type of Initial Attack | 0.97 | 0.89 |
| Passes | 1.00 | 1.00 |
| Penetrative Passes | 0.94 | 0.91 |
| Attack Player | 1.00 | 1.00 |
| End Zone | 1.00 | 1.00 |
| Type of Possession | 1.00 | 1.00 |
| Possession Outcome | 1.00 | 1.00 |

Statistical analysis

First, a bivariate analysis was carried out using contingency tables and the analysis of absolute and relative frequencies. The existence of a statistically significant association between the analysed criteria was tested with the χ² statistic (p < .05). For the criteria that showed an association, the effect size was calculated from the Contingency Coefficient and considered small (ES = .10), medium (ES = .30) or large (ES = .50) [34].
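The bivariate step can be sketched in a few lines. The example below assumes the usual Pearson definition of the Contingency Coefficient, C = sqrt(χ² / (χ² + N)), which the text does not spell out; the function names are hypothetical, and the counts are the Penetrative Passes row of Table 3.

```python
from math import sqrt


def chi_square(table):
    """Pearson chi-square statistic and sample size for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, n


def contingency_coefficient(table):
    """Pearson's contingency coefficient, C = sqrt(chi2 / (chi2 + N))."""
    stat, n = chi_square(table)
    return sqrt(stat / (stat + n))


# Penetrative Passes counts from Table 3
# rows: Success, No success; columns: 0, 1-2, >= 3 penetrative passes
penetrative_passes = [[38, 206, 131], [1109, 1812, 314]]
chi2, n = chi_square(penetrative_passes)
es = contingency_coefficient(penetrative_passes)
```

Under that assumption the coefficient for this criterion rounds to .25, in line with the effect size reported in Table 3.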

Next, a Random Forest [35] supervised model was trained that included all predictor variables except tournament, stage and end zone; these were excluded to avoid collinearity problems and to increase the generalizability of the results. To this end, the Outcome variable was mapped to a binary variable with the categories Success (Goal, Attempt on Target, Attempt off Target) and No Success (all remaining categories of the Outcome variable). This decision was made to obtain greater predictive capacity from a binary problem than from a multiclass one. Once the variable had been mapped, and because of the class imbalance in the target (Success = 10.4%; No Success = 89.6%), the minority category was oversampled using the SMOTE technique. Although this technique may lead to overfitting in the training dataset, it was considered essential to increase the model’s recall, because previous work has shown that it improves model classification [27]. The Random Forest model, based on the random assembly of decision trees, has been used in previous studies with classification objectives in the field of sport [36, 37] due to its high predictive capacity. For its training, the resampled dataset was split in a stratified way, based on the target variable, into 70% training and 30% testing. In addition, different combinations of hyperparameters were tested using 5-fold cross-validation on the training sample. The predictive performance of the model was evaluated from the classification matrix, both on the resampled dataset and on the original dataset, as well as by calculating the area under the curve (AUC), considered excellent (0.90 < AUC < 1.00), good (0.80 < AUC < 0.90), fair (0.70 < AUC < 0.80), poor (0.60 < AUC < 0.70), or fail (0.50 < AUC < 0.60) [38].
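Two of the preparation steps described above, the binary mapping of Outcome and the stratified 70/30 split, can be sketched with the standard library alone. This is an illustration under stated assumptions, not the authors’ code: the helper names and the sample of 100 coded outcomes are hypothetical, and the sketch leaves out the SMOTE resampling, the Random Forest itself and the hyperparameter search.

```python
import random
from collections import defaultdict

# Outcome codes from Table 1 that count as Success:
# Goal, Attempt ON target, Attempt OFF target
SUCCESS_CODES = {"GO", "AO", "AF"}


def to_binary(outcome_code):
    """Map an Outcome code to 1 (Success) or 0 (No Success)."""
    return 1 if outcome_code in SUCCESS_CODES else 0


def stratified_split(labels, test_fraction=0.30, seed=0):
    """Return (train, test) index lists that preserve the class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Hypothetical sample of 100 coded outcomes (10 Success, 90 No Success)
outcomes = (["GO"] * 2 + ["AO"] * 5 + ["AF"] * 3
            + ["LP"] * 50 + ["OZ"] * 25 + ["SP"] * 10 + ["CK"] * 5)
labels = [to_binary(code) for code in outcomes]
train_idx, test_idx = stratified_split(labels)
```

Stratifying on the target keeps the share of Success cases the same in both partitions, which matters when the positive class is as rare as it is here.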

Finally, the explainability technique SHAP [39] was applied to each category analysed. The technique consists of calculating SHapley Additive exPlanations values. This approach quantifies the contribution of each category by integrating all variables based on their expected value in the model output, which makes it possible to attribute to each analysed variable its share of the change in the model’s prediction and thus to interpret black-box models such as Random Forest. For this reason, other authors have used this technique in applied fields such as data analysis in football [40, 41].
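The attribution idea behind SHAP can be illustrated with an exact Shapley computation on a toy problem. The sketch assumes the standard Shapley definition, in which a feature’s value is the weighted average of its marginal contributions over all subsets of the other features. The value function `toy_value`, its numbers and the three feature labels are hypothetical; SHAP libraries estimate these quantities from a trained model rather than from an explicit function like this one.

```python
from itertools import combinations
from math import factorial


def shapley_values(value, features):
    """Exact Shapley values for a coalition value function value(frozenset)."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for size in range(n):
            # Shapley weight for subsets of this size: |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


def toy_value(present):
    """Hypothetical predicted probability of Success given the categories present."""
    p = 0.10                                   # baseline prediction
    if "PP>=3" in present:
        p += 0.20                              # three or more penetrative passes
    if "CT" in present:
        p += 0.10                              # counterattack
    if "PP>=3" in present and "CT" in present:
        p += 0.06                              # interaction between the two
    return p                                   # "PR" (pressure) has no effect


features = ["PP>=3", "CT", "PR"]
phi = shapley_values(toy_value, features)
```

The interaction term is shared equally between the two categories involved, and the three values add up to the difference between the prediction with every category present and the baseline.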

RESULTS

Bivariate results are presented in Table 3. A total of 13 criteria showed a statistically significant association with the mapped Outcome variable.

TABLE 3

Bivariate analysis between the dependent criterion Possession Outcome and the other criteria analysed.

| Criteria | Categories | Success (n = 375, 10.4%) | No success (n = 3,235, 89.6%) | p-value [ES] |
|---|---|---|---|---|
| Tournament | FWWC 2023 | 139 (37.1%) | 1,396 (43.2%) | p > .05 [-] |
| | Women’s Euro 2022 | 72 (19.2%) | 678 (20.9%) | |
| | UWCL 2023/24 | 164 (43.7%) | 1,161 (35.9%) | |
| Stage | Final | 35 (9.3%) | 224 (6.9%) | p > .05 [-] |
| | Semifinal | 76 (20.3%) | 686 (21.2%) | |
| | Quarter Final | 186 (49.6%) | 1,547 (47.8%) | |
| | Round of 16 | 78 (20.8%) | 778 (24.0%) | |
| Period of Match | 1–15 | 58 (15.5%) | 589 (18.2%) | p > .05 [-] |
| | 16–30 | 59 (15.7%) | 457 (14.1%) | |
| | 31–45 | 53 (14.1%) | 523 (16.2%) | |
| | 46–60 | 58 (15.5%) | 552 (17.1%) | |
| | 61–75 | 64 (17.1%) | 466 (14.4%) | |
| | 76–90 | 83 (22.1%) | 648 (20.0%) | |
| Match Status | Drawing | 177 (47.2%) | 1,755 (54.3%) | p < .005 [.05] |
| | Losing | 89 (23.7%) | 749 (23.2%) | |
| | Winning | 109 (29.1%) | 731 (22.6%) | |
| Match Outcome | Win | 181 (48.3%) | 1,211 (37.4%) | p < .001 [.07] |
| | Draw | 89 (23.7%) | 844 (26.1%) | |
| | Loss | 105 (28.0%) | 1,180 (36.5%) | |
| Ball Recovery | Steal | 107 (28.5%) | 918 (28.4%) | p < .05 [.05] |
| | Duel | 80 (21.3%) | 584 (18.1%) | |
| | Turnover | 168 (44.8%) | 1,402 (43.3%) | |
| | Goalkeeper Action | 20 (5.3%) | 331 (10.2%) | |
| Recovery Zone | Defensive Zone | 84 (22.4%) | 1,307 (40.4%) | p < .05 [.20] |
| | Middle Defensive Zone | 109 (29.1%) | 1,127 (34.8%) | |
| | Middle Offensive Zone | 132 (35.2%) | 715 (22.1%) | |
| | Offensive Zone | 50 (13.3%) | 86 (2.7%) | |
| Start Interaction Context | PA | 21 (5.6%) | 335 (10.4%) | p < .001 [.19] |
| | RA | 117 (31.2%) | 1,500 (46.4%) | |
| | RM | 4 (1.1%) | 23 (0.7%) | |
| | MR | 2 (0.5%) | 17 (0.5%) | |
| | MM | 181 (48.3%) | 1,212 (37.5%) | |
| | MA | 2 (0.5%) | 53 (1.6%) | |
| | AR | 36 (9.6%) | 67 (2.1%) | |
| | AM | 9 (2.4%) | 28 (0.9%) | |
| | | 3 (0.8%) | 0 (0.0%) | |
| General Defensive Approach Opposing Team | Persistent (Pressure) | 166 (44.3%) | 1,467 (45.3%) | p > .05 [-] |
| | Expectant (No Pressure) | 209 (55.7%) | 1,768 (54.7%) | |
| Number of Defenders Opposing Team | 1–3 | 35 (9.3%) | 81 (2.5%) | p < .001 [.13] |
| | 4–6 | 93 (24.8%) | 545 (16.8%) | |
| | ≥ 7 | 247 (65.9%) | 2,609 (80.6%) | |
| Defensive Position Opposing Team | High | 72 (19.2%) | 1,170 (36.2%) | p < .001 [.14] |
| | Medium | 105 (28.0%) | 1,039 (32.1%) | |
| | Low | 198 (52.8%) | 1,026 (31.7%) | |
| Type of Initial Attack | Positional Attack | 94 (25.1%) | 1,347 (41.6%) | p < .001 [.17] |
| | Direct Attack | 175 (46.7%) | 1,546 (47.8%) | |
| | Counterattack | 106 (28.0%) | 342 (10.6%) | |
| Passes | 0 | 29 (7.7%) | 436 (13.5%) | p < .001 [.10] |
| | 1–2 | 117 (31.2%) | 1,316 (40.7%) | |
| | 3–4 | 87 (23.2%) | 639 (19.8%) | |
| | ≥ 5 | 142 (37.9%) | 844 (26.1%) | |
| Penetrative Passes | 0 | 38 (10.1%) | 1,109 (34.3%) | p < .001 [.25] |
| | 1–2 | 206 (54.9%) | 1,812 (56.0%) | |
| | ≥ 3 | 131 (34.9%) | 314 (9.7%) | |
| Attack Player | 1–2 | 78 (20.8%) | 1,206 (37.3%) | p < .001 [.12] |
| | 3–4 | 148 (39.5%) | 1,218 (37.7%) | |
| | ≥ 5 | 149 (39.7%) | 811 (25.1%) | |
| End Zone | Defensive Zone | 0 (0.0%) | 176 (5.4%) | p < .001 [.42] |
| | Middle Defensive Zone | 2 (0.5%) | 821 (25.4%) | |
| | Middle Offensive Zone | 8 (2.1%) | 1,391 (43.0%) | |
| | Offensive Zone | 365 (97.3%) | 847 (26.2%) | |
| Type of Possession | Short Possession | 46 (12.3%) | 759 (23.5%) | p < .001 [.10] |
| | Medium Possession | 75 (20.0%) | 805 (24.9%) | |
| | Long Possession | 254 (67.7%) | 1,671 (51.7%) | |

For the trained Random Forest model, the combination of hyperparameters with the highest performance in the cross-validation procedure was: ‘bootstrap’: False, ‘max_depth’: 20, ‘min_samples_leaf’: 2, ‘min_samples_split’: 2, ‘n_estimators’: 300. The area under the curve was excellent on the resampled dataset (AUC = .99) and fair on the original dataset (AUC = .78). The confusion matrices are presented in Figure 2 and the evaluation metrics for both test sets (original and resampled) in Table 4. Overall, the model showed a high classification capacity (accuracy), with 95% of cases correctly classified in the test set of the resampled dataset and 88% in the test set of the original dataset. In contrast, the model’s ability to correctly classify the positive output (recall) was much lower in the original dataset (18%) than in the resampled dataset (94%), while the true negative rate (specificity) was 95% and 96% for the resampled and original test sets, respectively. The results highlight substantial improvements in recall following resampling, while maintaining high levels of accuracy and specificity.
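As an illustration of how an AUC value is obtained and then labelled, the sketch below computes AUC as the share of positive-negative pairs in which the positive case receives the higher score, and applies the bands quoted in the statistical analysis section. The function names and the five scores are hypothetical. The assignment of boundary values such as exactly 0.90 to the lower band is an assumption, since the quoted intervals are open, as is the labelling of any value at or below 0.60 as fail.

```python
def auc_from_scores(y_true, scores):
    """AUC as the probability that a random positive case is scored above
    a random negative case (ties count one half)."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes are required")
    wins = 0.0
    for p in positives:
        for q in negatives:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(positives) * len(negatives))


def auc_band(auc):
    """Label an AUC value with the bands quoted in the text."""
    for threshold, label in ((0.90, "excellent"), (0.80, "good"),
                             (0.70, "fair"), (0.60, "poor")):
        if auc > threshold:
            return label
    return "fail"


# Hypothetical predicted probabilities for five possessions (1 = Success)
y_true = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.3, 0.2]
toy_auc = auc_from_scores(y_true, scores)
```

With these bands, the two values reported above (.99 and .78) fall in the excellent and fair categories, respectively.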

FIG. 1

Spatial division of the football pitch.

FIG. 2

Confusion matrices of the Random Forest model over the resampled and original dataset.

TABLE 4

Performance metrics for the success prediction model evaluated on the resampled and original test sets.

Success Prediction

| Metric | Original test set | Resampled test set |
|---|---|---|
| Accuracy | 0.88 | 0.95 |
| Recall | 0.18 | 0.94 |
| Specificity | 0.96 | 0.95 |

[i] Results highlight substantial improvements in recall following resampling, while maintaining high levels of accuracy and specificity.
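The three metrics in Table 4 follow directly from a confusion matrix. The sketch below uses hypothetical counts for 1,000 possessions, chosen only to mimic the class balance (10.4% Success) and the rates reported for the original test set; they are not the counts shown in Figure 2.

```python
def classification_metrics(tp, fn, fp, tn):
    """Accuracy, recall and specificity from confusion-matrix counts."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,    # share of all cases classified correctly
        "recall": tp / (tp + fn),         # share of Success cases detected
        "specificity": tn / (tn + fp),    # share of No Success cases detected
    }


# Hypothetical counts: 104 Success and 896 No Success possessions
metrics = classification_metrics(tp=19, fn=85, fp=36, tn=860)
rounded = {name: round(value, 2) for name, value in metrics.items()}
```

The example shows why accuracy can stay high while recall is low: with roughly nine negatives for every positive, correctly rejecting most negatives dominates the overall percentage even when most positives are missed.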

Regarding the influence of different criteria and categories on the binary classification model, Penetrative Passes emerged as the most impactful criterion, increasing the likelihood of the model predicting Success when its value reached ≥ 3. The second most influential criterion was Type of Initial Attack, with a higher probability of a positive output when the category corresponded to Counterattack (CT). Similarly, Defensive Position of the Opposing Team affected the probability of success when classified as Low, as did Type of Possession when it corresponded to Long. The influence of each criterion and its respective categories is illustrated in Figure 3, which ranks them in descending order based on their impact on the model’s classification.

FIG. 3

Influence of the analysed criteria and categories on the model’s output.

Note: The red-pink colours to the right of the value 0 on the vertical axis indicate that the presence of the corresponding category increases the likelihood that the model will predict the positive category of the target variable.


DISCUSSION

This study aimed to identify key performance indicators (KPIs) associated with offensive success in dynamic offensive transitions in elite women’s football, develop a predictive model for offensive success, and evaluate the influence of these KPIs on the model’s outcome. To enhance the applicability of the findings, an extrinsic explainability technique was employed, enabling a clearer understanding of the impact of each variable on the model.

A total of 3,729 dynamic offensive transitions were recorded across 35 matches, averaging 106.5 transitions per match – consistent with data reported in men’s football by Casal et al. [42]. Of the 3,610 transitions analysed, only 1.3% resulted in a goal, 9.1% in an attempt and 21.1% in an entry into the offensive zone, while 48.9% led to possession loss. These results, aligning with previous studies such as that of Iván-Baragaño et al. [22], highlight the relatively low offensive effectiveness of the transitions examined.

Among all indicators analysed, the number of penetrating passes exhibited the strongest relationship with outcome. Specifically, performing three or more penetrating passes had a greater positive influence on success, whereas making none or only one or two was more frequently associated with no success outcomes. As shown in Table 3, 34.9% of successful possessions involved three or more penetrating passes, compared with only 9.7% of unsuccessful ones. Similar patterns have been observed in men’s football, with studies by Tenga et al. [43] and Zani et al. [44] demonstrating that teams executing a higher number of penetrating passes are more likely to create goal-scoring opportunities. This trend may be explained by the disruptive effect of these passes on the opposing team’s defensive structure, as they facilitate disorganisation, hinder defensive reactions, and allow the ball to be received in more advanced areas with reduced defensive pressure.

The counterattack, as a type of initial attack following ball recovery, was the second most influential indicator in relation to outcome: counterattacks accounted for 28.0% of successful transitions, compared with 10.6% of unsuccessful ones (Table 3). Conversely, positional attacks proved less effective, accounting for 25.1% of successful transitions but 41.6% of unsuccessful ones. These results reinforce the notion that, in dynamic offensive transitions, capitalising on the opponent’s defensive imbalance immediately after regaining possession is crucial for creating goal-scoring opportunities.

The defensive position of the opposing team was the third most significant indicator linked to outcome. Specifically, a low defensive position was associated with higher offensive effectiveness, whereas a high defensive position reduced attacking efficiency. This relationship may be explained by the statistical association between defensive positioning and the recovery zone (p < .001). The closer to the opponent’s goal possession is regained, the deeper the opposing defensive line tends to be. Figure 3 illustrates that ball recoveries in advanced zones (OF, MO) promote offensive success, whereas recoveries in deeper zones (DF, MD) hinder it. These results align with previous research, such as Iván-Baragaño et al. [22] and Scanlan et al. [45], which also establish a link between recovery zones and offensive success. Although Scanlan et al. [45] employed a different field segmentation, their findings similarly highlight the middle offensive and offensive zones as the most effective, further supporting our conclusions.

Additionally, the start interaction context demonstrated a strong relationship with the aforementioned indicators and significantly influenced outcomes. Specifically, the MM category was associated with success, whereas RA was linked to no success outcomes. These findings, consistent with Iván-Baragaño et al. [22], suggest that in MM situations, the team regaining possession only needs to bypass the opponent’s midfield and deeper defensive lines to advance towards goal. In contrast, in RA situations, the team must overcome all the opponent’s defensive lines, which complicates progression and diminishes offensive effectiveness. This aligns with the observation that a greater number of opposing defenders negatively impacts the success of an attacking action.

Regarding the type of possession or attacking approach, the results indicate that long possessions are the most effective. Closely linked to this, the number of passes made also plays a crucial role in offensive success. Specifically, executing five or more passes is associated with a higher likelihood of success, whereas making only one or two passes is linked to no success. As shown in Table 3, 37.6% of possessions involving five or more passes resulted in success, compared to 26.1% that did not. Conversely, when only one or two passes were made, 40.7% of possessions ended in no success, while just 31.2% led to a successful outcome. These findings align with those of Iván-Baragaño et al. [22], who also identify the number of passes as a key indicator of offensive performance. Additionally, ball recovery had a notable impact on outcome. Consistent with the study by Scanlan et al. [45], turnovers were found to be the most effective defensive actions for regaining possession and generating goal-scoring opportunities.

Finally, the contextual variables of match outcome and period of the match influenced outcome. Winning teams demonstrated greater offensive effectiveness (48.3%, Table 3), suggesting a better ability to capitalise on attacking opportunities. This finding corroborates Iván-Baragaño et al. [22], where this factor was strongly associated with goal-scoring success in the FWWC 2023. Likewise, offensive transitions executed in the final minutes of the match (76–90) were the most successful. This supports previous research, which suggests that physical and mental fatigue towards the end of a match increases the probability of defensive errors, thereby enhancing offensive effectiveness – although this specific aspect was not directly analysed in the present study [22, 46]. However, the influence of this factor was limited, and no significant relationship was found in the bivariate analysis (p > .05).

Regarding match status, while it was not included in the final predictive model, it did exhibit a significant relationship in the bivariate analysis. Specifically, winning teams held a slight advantage in offensive effectiveness (29.1% success compared to 22.6% no success, Table 3). Nevertheless, Iván-Baragaño et al. [22] suggest that offensive effectiveness does not fluctuate significantly depending on match status.

Regarding the model’s predictive capacity, the evaluation of the confusion matrix on 30% of the validation sample from the original dataset yielded an overall accuracy of 86%, with an 18% success rate in correctly classifying shots or goals. This represents an eight-percentage-point improvement compared to the relative frequency of these events in the original dataset. Compared to the study by Iván-Baragaño et al. [22], the predictive capacity for positive events in this model was five percentage points higher, suggesting that (i) a larger sample enhances model training and performance, and (ii) incorporating a greater number of features and fine-tuning hyperparameters is crucial for improving target variable prediction. However, as seen in other studies addressing imbalanced datasets in football [24, 25], the inherent complexity of the sport poses challenges in identifying consistent patterns, which limits the model’s ability to accurately predict goals or shots. In this regard, it is important to consider that goals and shots occur in approximately 2% and 10% of ball possessions, respectively, which highlights the low frequency of these actions and, consequently, the difficulty in predicting them. Nonetheless, a strong concordance was observed between the results obtained through the ShAP explainability technique and the bivariate analyses, reinforcing the utility of this approach in interpreting black-box models and facilitating the practical application of findings in the sports domain. In practical terms, the ShAP analysis highlights key predictors of offensive transition success, offering coaches evidence-based guidance for training and tactical planning. Priority should be given to transitions involving at least three penetrating passes and exploiting disorganised low defensive blocks through structured high pressing. Coordinated build-up involving multiple players, along with training under end-of-half fatigue conditions, may further enhance execution and decision-making in high-pressure scenarios.
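To illustrate why a high overall accuracy can coexist with a much lower rate for the rare positive class, the following minimal sketch derives both figures from a 2×2 confusion matrix. The counts are hypothetical and chosen only to mimic an imbalanced validation set; they are not the study's confusion matrix, and the function name is illustrative. Because the text does not state whether the positive-class rate is a precision or a recall, both are computed.

```python
def classification_summary(tp, fp, fn, tn):
    """Summarise a binary confusion matrix (positive class = shot or goal)."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        # Share of predicted positives that were real positives
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Share of real positives that the model detected
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        # Relative frequency of the positive class in the sample
        "prevalence": (tp + fn) / total,
    }


# Hypothetical counts for 1000 possessions (NOT the study's results)
summary = classification_summary(tp=20, fp=60, fn=80, tn=840)

# Accuracy of a trivial model that always predicts "no success"
majority_baseline = 1 - summary["prevalence"]

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
print(f"majority baseline accuracy: {majority_baseline:.2f}")
```

With these illustrative counts the accuracy is 0.86, yet a model that never predicts a shot or goal would reach 0.90. This is why positive-class metrics, rather than accuracy alone, are the informative figures for imbalanced targets.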

Finally, the observation instrument developed in this study has proven to be robust and reliable for analysing dynamic offensive transitions, as 13 out of the 17 evaluated indicators showed a significant relationship with outcome. Furthermore, the consistency of these findings with previous research [22, 42–45] reinforces their external validity. However, no significant relationship was found between the general defensive approach of the opposing team and outcome. In contrast, previous studies have demonstrated the impact of defensive pressure on offensive effectiveness [22].

This study presents a novel methodological approach by identifying offensive transition styles in elite women’s football through a clustering technique that integrates both offensive and defensive variables. The large and diverse sample, drawn from major international competitions, adds to the ecological validity of the findings. Additionally, the inclusion of model interpretability techniques (ShAP) strengthens the practical utility of the results for applied settings. However, limitations include the reliance on notational data rather than spatiotemporal tracking, which may restrict the depth of tactical analysis. Furthermore, the outcomes are sensitive to the selected features and may not fully capture in-game tactical adjustments. The exclusive focus on elite-level matches also limits the generalizability of the findings to other levels of play.

Future research should therefore explore the general defensive approach of the opposing team in greater depth, examining contextual factors that may modulate its influence and further refining our understanding of its role in offensive transitions. Additionally, optimising predictive models by incorporating new success-related criteria while eliminating those with limited impact could enhance dataset efficiency and improve model performance. The findings of this study provide crucial scientific insights into the execution of dynamic offensive transitions and offer valuable practical applications for coaches and analysts in elite women’s football.

CONCLUSIONS

In this study, 13 of the 17 criteria analysed showed an association with the outcome of dynamic offensive transitions. Of these, the number of penetrating passes and counterattack strategies contributed the most, as shown by their ShAP values. Similarly, the opponent’s low defensive position, the MM starting interaction context, the OF recovery zone, ball recovery through turnovers, and long possessions involving five or more players increased the model’s predicted probability of a positive outcome. Additionally, contextual variables such as match status and the period of the match also demonstrated a significant impact on offensive effectiveness. In particular, winning teams and transitions executed in the final minutes of the match were found to be more effective. Encouraging long possessions with a high number of penetrating passes can facilitate ball control and goal-scoring opportunities. Furthermore, employing high pressing to recover possession in advanced areas – particularly through turnovers – and capitalising on the opponent’s initial defensive disorganisation via counterattacks are highly effective strategies. Finally, teams should strategically manage physical exertion to sustain intensity in the final phase of the match, as offensive transitions tend to be more successful in the closing minutes of play. Regarding the trained predictive model, the application of an oversampling technique, the inclusion of a greater number of features, and the use of a larger sample enhanced its predictive capacity compared to previous studies. However, despite these improvements, the model correctly predicted a goal or shot in only one out of five instances, reflecting the inherent complexity of forecasting such events in a highly dynamic and unpredictable sport such as football. Nevertheless, the application of ensemble models such as Random Forest improved predictive capacity compared to models with greater intrinsic explainability (i.e., decision trees and binary logistic regression). Moreover, the calculation of ShAP values allowed for external interpretability of the model, mitigating one of the main limitations of this type of model.
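The idea behind the ShAP values referred to above can be illustrated with exact Shapley values for a toy model. This is a minimal sketch under stated assumptions: the scoring function, feature names and baseline are hypothetical, and the brute-force enumeration of feature orderings is not the algorithm the SHAP library applies to tree ensembles, nor the study's model.

```python
from itertools import permutations


def shapley_values(predict, instance, baseline):
    """Exact Shapley values: average marginal contribution over all orderings."""
    features = list(instance)
    totals = {name: 0.0 for name in features}
    orderings = list(permutations(features))

    for order in orderings:
        current = dict(baseline)      # start from the reference input
        previous = predict(current)
        for name in order:
            current[name] = instance[name]   # switch this feature on
            value = predict(current)
            totals[name] += value - previous
            previous = value

    return {name: total / len(orderings) for name, total in totals.items()}


def toy_score(x):
    """Hypothetical success score with an interaction term (NOT the study's model)."""
    score = 0.10
    score += 0.15 * x["three_plus_penetrating_passes"]
    score += 0.10 * x["counterattack"]
    score += 0.05 * x["three_plus_penetrating_passes"] * x["counterattack"]
    return score


baseline = {"three_plus_penetrating_passes": 0, "counterattack": 0}
instance = {"three_plus_penetrating_passes": 1, "counterattack": 1}
phi = shapley_values(toy_score, instance, baseline)
print(phi)
```

The contributions sum to the difference between the score for the instance and the score for the baseline. This additivity is the property that allows per-feature contributions to be read as pushes towards or away from the positive outcome.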

Data Availability Statement

The data that support the findings of this study are openly available in Zenodo at https://zenodo.org/records/15349140

Conflict of interest declaration

The authors declared no conflict of interest.

REFERENCES

1 

FIFA. FIFA women’s football – member associations survey report 2023. Zurich (Switzerland): FIFA; 2023. https://inside.fifa.com/womens-football/memberassociations-survey-report-2023.

2 

Real Federación Española de Fútbol. Memoria de actividades. Número de licencias federativas. Madrid (Spain): RFEF; 2024. https://rfef.es/es/federacion/transparencia/licencias.

3 

Football Australia. Legacy’23. Post tournament report. Sydney (Australia): Football Australia; 2024. https://www.footballaustralia.com.au/legacy23.

4 

Valenti M, Scelles N, Morrow S. Women’s football studies: an integrative review. Sport Bus Manag. 2018; 8(5):511–28. https://doi.org/10.1108/SBM-09-2017-0048.

5 

McCall A, Mountjoy M, Witte M, Serner A, Massey A. Driving the future of health and performance in women’s football. Sci Med Footb. 2022; 6(5):545–6. https://doi.org/10.1080/24733938.2022.2152543.

6 

Ventaja-Cruz J, Cuevas Rincón JM, Tejada-Medina V, Martín-Moya R. A bibliometric study on the evolution of women’s football and determinants behind its growth over the last 30 years. Sports. 2024; 12:333. https://doi.org/10.3390/sports12120333.

7 

Wang S, Qin Y, Jia Y, Igor KE. A systematic review about the performance indicators related to ball possession. PLoS One. 2022; 17(3):e0265540. https://doi.org/10.1371/journal.pone.0265540.

8 

Bradley PS. ‘Setting the Benchmark’ Part 3: Contextualising the match demands of specialised positions at the FIFA Women’s World Cup Australia and New Zealand 2023. Biol Sport. 2025; 42(1):99–111. https://doi.org/10.5114/biolsport.2025.139857.

9 

Bradley PS. ‘Setting the Benchmark’ Part 4: Contextualising the match demands of teams at the FIFA Women’s World Cup Australia and New Zealand 2023. Biol Sport. 2025; 42(2):57–69. https://doi.org/10.5114/biolsport.2025.142638.

10 

Shen L, Tan Z, Li Z, Li Q, Jiang G. Tactics analysis and evaluation of women football team based on convolutional neural network. Sci Rep. 2024; 14:50056. https://doi.org/10.1038/s41598-023-50056-w.

11 

Trower M, Graham N, Cottrell N, Hengster Y. Clustering women’s football players: identifying functional patterns for performance optimisation. Statsbomb Conference; 2023. https://statsbomb.com/wp-content/uploads/2023/10/Clustering-Womens-Football-Players-Identifying-Functional-Patternsfor-Performance-Optimisation.pdf.

12 

Narayanan S, Pifer ND. A data-driven framing of player and team performance in U.S. women’s soccer. Front Sports Act Living. 2023; 5:1125528. https://doi.org/10.3389/fspor.2023.1125528.

13 

Iván-Baragaño I, Maneiro R, Losada JL, Ardá A. Multivariate analysis of the offensive phase in high-performance women’s soccer: a mixed methods study. Sustainability. 2021; 13(11):6379. https://doi.org/10.3390/su13116379.

14 

Harkness-Armstrong A, Till K, Datson N, Emmonds S. Influence of match status and possession status on the physical and technical characteristics of elite youth female soccer match-play. J Sports Sci. 2023; 41(15):1437–49. https://doi.org/10.1080/02640414.2023.2273653.

15 

Maneiro R, Losada JL, Casal CA, Ardá A. The influence of match status on ball possession in high performance women’s football. Front Psychol. 2020; 11:487. https://doi.org/10.3389/fpsyg.2020.00487.

16 

Iván-Baragaño I, Maneiro R, Losada JL, Ardá A. Influence of match status in ball possessions in the FIFA Women’s World Cup France 2019. Proc Inst Mech Eng P J Sports Eng Technol. 2022; 239(1):12–19. https://doi.org/10.1177/17543371221133624.

17 

Martínez-Hernández D, Quinn M, Jones P. Most common movements preceding goal scoring situations in female professional soccer. Sci Med Footb. 2024; 8(3):60–8. https://doi.org/10.1080/24733938.2023.2214106.

18 

Hewitt A, Norton K, Lyons K. Movement profiles of elite women soccer players during international matches and the effect of opposition’s team ranking. J Sports Sci. 2014; 32(20):1874–80. https://doi.org/10.1080/02640414.2014.898854.

19 

Oliva-Lozano JM, Yousefian F, Chmura P, Gabbett TJ, Cost R. Analysis of FIFA 2023 Women’s World Cup match performance according to match outcome and phase of the tournament. Biol Sport. 2025; 42(2):71–84. https://doi.org/10.5114/biolsport.2025.142643.

20 

Ju W, Cost R, Oliva-Lozano JM. Analysis of match performance of elite soccer players across confederations during the Men’s and Women’s World Cup. Sci Med Footb. 2024; 1–13. https://doi.org/10.1080/24733938.2024.2409679.

21 

Mondal S. She kicks: the state of competitive balance in the top five women’s football leagues in Europe. J Glob Sport Manag. 2021; 8(1):432–54. https://doi.org/10.1080/2470467.2021.18875629.

22 

Casal C, Stone J, Iván-Baragaño I, Losada J. Effect of goalkeepers’ offensive participation on team performance in the women Spanish La Liga: a multinomial logistic regression analysis. Biol Sport. 2023; 40(1):29–39. https://doi.org/10.5114/biolsport.2024.125592.

23 

Iván-Baragaño I, Ardá A, Losada JL, Maneiro R. Goal and shot prediction in ball possessions in FIFA Women’s World Cup 2023: a machine learning approach. Front Psychol. 2025; 16:1516417. https://doi.org/10.3389/fpsyg.2025.1516417.

24 

Markopoulou C, Papageorgiou G, Tjortjis C. Diverse machine learning for forecasting goal-scoring likelihood in elite football leagues. Mach Learn Knowl Extr. 2024; 6(3):1762–81. https://doi.org/10.3390/make6030086.

25 

Freitas DN, Mostafa SS, Caldeira R, Santos F, Fermé E, Gouveia ÉR, Morgado-Dias F. Predicting noncontact injuries of professional football players using machine learning. PLoS One. 2025; 20(1): e0315481. https://doi.org/10.1371/journal.pone.0315481.

26 

Saberisani R, Barati AH, Zarei M, Santos P, Gorouhi A, Ardigò LP, Nobari H. Prediction of football injuries using GPS-based data in Iranian professional football players: a machine learning approach. Front Sports Act Living. 2025; 7:1425180. https://doi.org/10.3389/fspor.2025.1425180.

27 

Last F, Douzas G, Bacao F. Oversampling for imbalanced learning based on K-means and SMOTE. Inf Sci. 2017; 465:1. https://doi.org/10.1016/j.ins.2018.06.056.

28 

Lemaitre G, Nogueira F, Aridas CK. Imbalanced-learn: a Python toolbox to tackle the curse of imbalanced datasets in machine learning. J Mach Learn Res. 2017; 18:1–5. https://doi.org/10.48550/arXiv.1609.06570.

29 

Anguera-Argilaga MT. Observational typology. Qual Quant. 1979; 13(6):449–484. https://doi.org/10.1007/BF00222999.

30 

Anguera MT, Blanco Villaseñor Á, Hernández Mendo A, Losada López JL. Diseños observacionales: ajuste y aplicación en psicología del deporte. Cuad Psicol Deporte. 2011; 11(2):63–76. https://revistas.um.es/cpd/article/view/133241.

31 

Belmont RT. Ethical principles and guidelines for the protection of human subjects of research. (The National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research); 1978.

32 

Anguera MT. Metodología observacional. In: Arnau J, Anguera MT, Gómez J, editors. Metodología de la investigación en Ciencias del Comportamiento. Murcia (Spain): Universidad de Murcia; 1990. p. 125–236.

33 

Fleiss JL, Levin B, Paik MC. Statistical methods for rates and proportions. 3rd ed. Hoboken (NJ): John Wiley & Sons; 2003.

34 

Gravetter FJ, Wallnau LB. Essentials of statistics for the behavioral sciences. Belmont (CA): Wadsworth; 2007.

35 

Breiman L. Random forests. Mach Learn. 2001; 45:5–32. https://link.springer.com/article/10.1023/A:1010933404324.

36 

Mandorino M, Tessitore A, Leduc C, Persichetti V, Morabito M, Lacome M. A new approach to quantify soccer players’ readiness through machine learning techniques. Appl Sci. 2023; 13(15):8808. https://doi.org/10.3390/app13158808.

37 

Rico-González M, Pino-Ortega J, Méndez A, Clemente FM, Baca A. Machine learning application in soccer: a systematic review. Biol Sport. 2023; 40(1):249–63. https://doi.org/10.5114/biolsport.2023.112970.

38 

Ciocan A, Hajjar N al, Graur F, Oprea VC, Ciocan RA, Bolboacă SD. Receiver Operating Characteristic Prediction for Classification: Performances in Cross-Validation by Example. Maths. 2020; 8(10):1741. https://doi.org/10.3390/math8101741.

39 

Lundberg SM, Lee SI. A Unified Approach to Interpreting Model Predictions. Adv Neural Inf Process Syst. 2017. p. 4766–75. https://arxiv.org/abs/1705.07874.

40 

Moustakidis S, Plakias S, Kokkotis C, Tsatalas T, Tsaopoulos D. Predicting Football Team Performance with Explainable AI: Leveraging SHAP to Identify Key Team-Level Performance Metrics. Future Internet. 2023; 15(5):174. https://doi.org/10.3390/fi15050174.

41 

Pappalardo L, Rossi A, Natilli M, Cintia P. Explaining the difference between men’s and women’s football. PLoS One. 2021; 16(8):e0255407. https://doi.org/10.1371/journal.pone.0255407.

42 

Casal-Sanjurjo CA, Andujar MÁ, Ardá A, Maneiro R, Rial A, Losada JL. Multivariate Analysis of Defensive Phase in Football: Identification of Successful Behavior Patterns of 2014 Brazil FIFA World Cup. J Hum Sport Exer. 2021; 16(3):503–16. https://doi.org/10.14198/jhse.2021.163.03.

43 

Tenga A, Mortensholm A, O’Donoghue P. Opposition interaction in creating penetration during match play in elite soccer: evidence from UEFA champions league matches. Int J Perf Anal Sport. 2017; 17(5):802–12. https://doi.org/10.1080/24748668.2017.1399326.

44 

Zani J, Fernandes T, Santos R, Barreira D. Penetrative passing patterns: Observational analysis of senior UEFA and FIFA tournaments. Apunts, Educ Fis Deport. 2021; (146):42–51. https://doi.org/10.5672/apunts.2014-0983.es.(2021/4).146.05.

45 

Scanlan M, Harms C, Cochrane Wilkie J, Ma’ayah F. The creation of goal scoring opportunities at the 2015 women’s world cup. Int J Sports Sci Coach. 2020; 15(5–6):803–8. https://doi.org/10.1177/1747954120942051.

46 

Sanmiguel-Codina J, Ballester R, Casal CA, Huertas F. Analysis of goal scoring patterns in the UEFA Women’s EURO 2022. Biol Sport. 2024; 42(2):45–56. https://doi.org/10.5114/biolsport.2025.142646.

Copyright: Institute of Sport. This is an Open Access article distributed under the terms of the Creative Commons CC BY License (https://creativecommons.org/licenses/by/4.0/). This license enables reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use.
 