Differences in running technique between runners with better and poorer running economy and lower and higher mileage: An artificial neural network approach

Prior studies investigated selected discrete sagittal-plane outcomes (e.g., peak knee flexion) in relation to running economy, thereby discarding the potential relevance of running technique parameters during noninvestigated phases of the gait cycle and in other movement planes.


| INTRODUCTION
For example, Williams and Cavanagh 6 found that 54% of the variation in running economy could be explained by two kinematic variables and one kinetic variable (i.e., shank angle at footstrike, peak plantar flexion angle, and net positive power). A similar finding was reported by Folland and colleagues, 2 who showed that three kinematic variables (i.e., pelvis vertical oscillation during ground contact normalized to height, minimum knee joint angle during ground contact, and minimum horizontal pelvis velocity) explained 39% of the variance in running economy. Moreover, Lundby and coworkers 4 found that vertical displacement explained 94% of the gross energetic cost of running, with various other physiological, biochemical, and anthropometric outcomes not contributing to the explained variance. Yet most studies that investigate the correlation between running technique and running economy solely study the association between sagittal-plane kinematics and running economy. 5 Although frontal- and transverse-plane kinematics contain relevant information for subject identification during running, 7 research regarding these planes of motion and their association with running economy remains limited. 5,8 For example, Folland and colleagues 2 investigated the association between peak knee flexion during stance and running economy, among other joint/segment angles. Studies also compare discrete variables between groups differing in running economy. Kyröläinen, Komi, and Belli, 12 for example, compared the angular velocities of the knee and ankle during the braking phase between groups of runners with better and poorer running economy. The a priori selection of discrete variables (e.g., peak knee flexion), however, discards a considerable amount of potentially relevant data, thereby inevitably omitting unselected but potentially important variables. Further, temporal information and time dependency between different variables are discarded when discretising variables.
Various statistical methods exist to overcome the limitations associated with a priori selections of outcome variables. Statistical parametric mapping, for example, has been used to determine differences in time series of lower limb joint angles between younger/older runners 13 and different shoes. 14 Support vector machine classifiers have identified kinematic differences between higher- and lower-mileage groups based on principal component scores. 15 Finally, an artificial neural network (ANN) has also been used to classify runners into low or high mileage groups based on time series of 3D kinematics and kinetics. 16 For example, Giarmatzis and associates 18 showed that an ANN was able to provide more accurate predictions of the knee joint contact force during walking than support vector regression. The superiority of ANNs derives in part from their ability to "learn" relationships among multiple variables within large datasets. Due to their ability to store experimental knowledge acquired through a learning process as weights between different nodes, ANNs represent a generalizable method to perform classification and regression tasks which are established in biomechanical contexts. 20
Artificial neural networks, however, act as black boxes as they do not provide information about the relationship between input data and a particular output/classification result. Due to the various nonlinear mappings within the hidden layers, what the model has learned is difficult to discern. However, for most applications of human research, knowing which variables contribute to the classification is as important as the classification result itself. In other words, knowing which variables contribute to better running economy might be as relevant as knowing whether a participant demonstrates better or poorer running economy. For instance, coaches need to know which components of running technique to modify to improve running economy. Layer-wise Relevance Propagation (LRP) calculates the contribution of each input variable (e.g., data point in a time series of joint angles) to the overall prediction and can therefore be used to gain insight into the importance of single input variables of time series data to a classification task. This method has been used to increase the transparency of ANNs to decode the characteristics of individual running techniques 7 and to identify kinematic differences between low and high mileage runners. 16 Similarly, it may be applied to study the importance of sagittal-, frontal-, and transverse-plane time series data to differentiate between runners with higher and lower running economy.
The primary aim of this study is to use an ANN approach in combination with LRP to investigate (A) whether this approach can accurately classify individuals into a group of runners with higher or lower running economy, and (B) which kinematic components differentiate the two groups from each other. Since previous research has shown differences in running kinematics between males and females, 15,21 we also explored the kinematic components that differentiated groups of runners with higher or lower running economy for each sex. Finally, weekly running distance has also been associated with specific alterations in running kinematics 15,16,22 and running economy. 23 By contrasting the relation between running kinematics and weekly distance on the one hand, and running economy on the other hand, this study aims to yield insights into the kinematic alterations that may predominantly contribute to running economy and thus performance, and the kinematics that may contribute to reduced injury risk by allowing runners to complete a high weekly distance. As a secondary aim, we investigate the ability of the ANN to classify individuals into a group of higher and lower weekly distances, as well as the kinematic components that differentiate the two groups differing in distance.

| Participants
Forty-one participants (22 males and 19 females; mean ± SD age 25.8 ± 4.67 years, body height 173 ± 17.2 cm, body mass 70.0 ± 10.2 kg) volunteered to participate in this study. All participants were free of any moderate (for previous 3 months) or minor (for previous 1 month) musculoskeletal injuries, were aged 18-35, were comfortable with treadmill running, and had a body mass index (BMI) of <26 (to account for extra weight due to clothing). The participants included in this study were measured as part of two (larger) studies: (1) a validation of wearable sensors during running 24 and (2) a pre-measurement of a randomized controlled trial on the effect of using wearable technology to reduce running injuries. 25 Both studies were approved by the local ethics committee (nr. 2019-1138 and NL72989.068.20), and all procedures were in line with the Declaration of Helsinki. Mean ± SD weekly training distance was 32.0 ± 35.9 km (range 0-140 km), and the training experience and level differed widely between participants (i.e., nonrunners to Olympic level athlete) to ensure a large range in running economy. Weekly training distance was obtained from self-reports. This has been shown to agree closely with weekly running distance obtained from global positioning systems. 26

| General design of the study
All participants completed a single test session and were instructed to avoid strenuous activity for 36 h, alcohol for 24 h, caffeine for 6 h, and a heavy meal 1 h before the session. When entering the laboratory, anthropometric measurements were taken using standardized procedures and the participants completed a questionnaire about their weekly training volume, running experience, and seasonal best times. Participants then ran on a treadmill while three-dimensional kinematics and gas exchange data were collected simultaneously.

| Instruments
All experiments were performed on the computer-assisted rehabilitation environment (CAREN, Motek, The Netherlands) system. 27 This system combines an instrumented split-belt treadmill (belt length and width 2.15 × 0.5 m, 6.28-kW motor per belt, 60 Hz belt speed update frequency, and 0-18 km·h⁻¹ speed range) with three-dimensional motion capture. Kinematics were collected at 100 Hz by a 12-camera motion capture system (VICON NEXUS v1.8, Oxford Metrics Group, Oxford, UK) and low-pass filtered using a zero-lag fourth-order Butterworth filter with a cutoff frequency of 20 Hz. Each participant wore their own habitual training shoes during all trials.
For respiratory gas analyses, participants wore a face mask (Hans Rudolph Inc, Shawnee, KS, USA) over the nose and mouth without detectable leakage. The mask was connected to a T-piece that was placed in a free airstream (200 L·min⁻¹). Respiratory gases were captured using a total-capture indirect calorimeter (Omnical, Maastricht Instruments, Maastricht, The Netherlands). 28 The system was calibrated automatically every 15-30 min using room air and a gas mixture of known composition. Laboratory temperature was kept constant at 18°C-21°C and relative humidity at 50%-55% during all test sessions.

| Data collection
Twenty-six retroreflective markers were attached to the skin with double-sided tape using a modified lower limb and trunk marker set (Human Body Model v2) as described previously. 27 Modifications included: toe markers were placed on the head of the 2nd metatarsal bone, and markers were added to the left and right medial malleolus and epicondyle to improve the reliability of calibration, while greater trochanter, sacrum, and navel markers were removed to avoid soft tissue artifacts. Further, the medial malleolus and epicondyle markers were removed after calibration to avoid interference with the running motion.
In both studies, participants first performed an 8-min familiarization at either a fixed speed of 2.78 m·s⁻¹ (in the case of the wearable study) or a self-determined comfortable speed (in the case of the randomized controlled trial) to familiarize themselves with treadmill running. 29 Data collection was started during the last minute of the familiarization trial at 2.78 m·s⁻¹ (wearable study), or during the last minute of a separate 4-min trial at 2.78 m·s⁻¹ (randomized controlled trial). Steady-state oxygen consumption is typically reached within approximately 2-3 min. Therefore, all runs had a minimum duration of 4 min to ensure the final 1-min steady-state period could be used for biomechanical and gas exchange data analysis. Steady-state oxygen consumption and carbon dioxide production were also visually confirmed during the trials, with the duration of the trial being extended if a steady state was not reached. Participants were instructed to run as if they were running outside and to focus on the simulated virtual forest environment.

| Data preprocessing
Three-dimensional marker positions were labeled in Vicon Nexus and then exported to the Gait Offline Analysis Tool (GOAT) 4.2.1 (Motek ForceLink B.V., Amsterdam) to determine joint angles using a lower body and trunk musculoskeletal model (Human Body Model v2, consisting of nine rigid body segments, 21 degrees of freedom, and 86 muscles). The following angles were computed for each leg: hip, knee, and ankle flexion-extension, hip ab-/adduction and internal/external rotation, ankle pronation, pelvic tilt, rotation and obliquity, and trunk tilt, lateral flexion and rotation.
The output from GOAT analysis was further processed using custom-written algorithms in Matlab (v 2019a, The MathWorks Inc., Natick, Massachusetts, USA) to extract variables of interest and normalize each outcome. Specifically, each outcome was first time-normalized to 100 data points per gait cycle (i.e., from right footstrike to right footstrike), with footstrike being identified when the vertical ground reaction force exceeded 20 N. We then selected the time-normalized time series from the last 65 steps over a period of steady-state running of each individual for further analysis. Time series from multiple steps rather than a single (averaged) step were used to improve the ability of the ANN to learn relevant features and thereby classify individuals in the appropriate group. Sixty-five steps were chosen since this number was achieved for all participants in the analyzed time periods, and thus ensured an equal number of steps for all participants. Further, this number is also well above the number of steps required to achieve stable biomechanical outcomes 30 and to get an appropriate approximation of the true technique at a speed of 2.78 m·s⁻¹. 31
The input data to an artificial neural network need to be scaled to a range of ±1 to avoid common pitfalls such as exploding/vanishing gradients. Because it is not yet understood how data normalization influences the relevance scores assigned via LRP, we explored two normalization techniques: (1) normalizing all joint angles from ±180° to ±1 and (2) normalizing all joint angles using a common z-transform. With the z-transformation, the average of the participant's value for the specific joint angle was subtracted from its time series and the resulting trajectory was then divided by the standard deviation of the participants' value for the specific joint angle and scaled to a range of −1 to 1. Both normalization approaches ensure a strict separation between the training and test datasets. For both scaling approaches, the time series for each joint were then concatenated into a single vector of dimensions 1 × 1800 (18 joint angular time series × 100 data points), resulting in 65 vectors per participant (i.e., one vector per step; Figure 1).
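The preprocessing chain described above (time-normalization, scaling, concatenation) can be illustrated with a minimal Python sketch. This is not the authors' Matlab code: the function names are hypothetical, linear interpolation is an assumed resampling method, and dividing the z-scores by their largest absolute value is only one assumed way to reach the stated −1 to 1 range.

```python
def time_normalize(series, n_points=100):
    """Resample one gait cycle to n_points samples by linear interpolation (assumed method)."""
    last = len(series) - 1
    out = []
    for i in range(n_points):
        pos = i * last / (n_points - 1)  # fractional index into the original series
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(series[lo] * (1 - frac) + series[hi] * frac)
    return out


def scale_fixed_range(series, max_abs_deg=180.0):
    """Approach 1: map joint angles from +/-180 degrees to +/-1."""
    return [v / max_abs_deg for v in series]


def scale_z(series, ref_mean, ref_sd):
    """Approach 2: z-transform, then rescale into [-1, 1].

    Dividing by the largest absolute z-score is an assumption; the text
    does not state how the final rescaling was done.
    """
    z = [(v - ref_mean) / ref_sd for v in series]
    peak = max(abs(v) for v in z) or 1.0
    return [v / peak for v in z]


def step_vector(joint_series):
    """Concatenate the per-joint series of one step into a single flat vector."""
    vec = []
    for s in joint_series:
        vec.extend(s)
    return vec
```

With 18 joint-angle series of 100 points each, `step_vector` returns the 1 × 1800 input vector for one step.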
The rate of oxygen consumption (V̇O₂) and carbon dioxide production (V̇CO₂) were measured continuously and computed at 5-second intervals throughout the running trials. V̇O₂ and V̇CO₂ were subsequently used to determine substrate utilization using nonprotein equations, 32 with energy cost being determined as the sum of fat and carbohydrate use. The energy cost was then expressed as J·kg^−0.75·m^−1 (i.e., allometrically scaled) to better account for differences in body mass in heterogeneous samples as opposed to linear scaling. 33-37
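As a worked illustration of the allometric scaling step only, the sketch below converts a metabolic power into an energy cost per kg^0.75 per metre. It assumes the metabolic power (in W) has already been derived from substrate utilization; the nonprotein equations themselves are not reproduced here, and the function name and example values are hypothetical.

```python
def energy_cost_scaled(metabolic_power_w, body_mass_kg, speed_m_s, exponent=0.75):
    """Energy cost in J per kg^exponent per metre.

    metabolic_power_w is assumed to be given in W (J/s); exponent=0.75 gives
    the allometric scaling, exponent=1.0 the linear (per kg) scaling.
    """
    return metabolic_power_w / (body_mass_kg ** exponent * speed_m_s)


# Hypothetical runner: 1000 W at 2.78 m/s with a body mass of 70 kg.
allometric = energy_cost_scaled(1000.0, 70.0, 2.78)
linear = energy_cost_scaled(1000.0, 70.0, 2.78, exponent=1.0)
```

Because the mass exponent is below 1, the allometric value is numerically larger than the linearly scaled one for any body mass above 1 kg, so the two are not directly comparable.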

| Data analysis
For the primary aim, we created two groups that differed in their running economy by determining the average energy cost over the complete sample and subsequently splitting the sample into a group with higher energy cost (> mean + 0.2 × standard deviation [SD]) or lower energy cost (< mean − 0.2 × SD); Figure S1. We used a 0.2 SD threshold as this produced a balance between obtaining a dataset with sufficiently different groups, while also maintaining sufficient individuals in each group. A similar procedure was used with 1 × SD to assess the sensitivity of our results to a stricter threshold (Figure S1). Further, we also split the complete sample into males and females and then applied a similar procedure for creating two groups differing in running economy within each sex.
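The thresholding described above can be sketched as follows. The participant IDs and the function name are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, stdev


def split_by_sd(energy_cost, k=0.2):
    """Group participants by energy cost relative to mean +/- k * SD.

    energy_cost maps a participant ID to an energy cost value.
    Participants inside the band are excluded from the classification.
    """
    values = list(energy_cost.values())
    m = mean(values)
    sd = stdev(values)  # sample SD (assumption)
    groups = {"lower_cost": [], "higher_cost": [], "excluded": []}
    for pid, cost in energy_cost.items():
        if cost > m + k * sd:
            groups["higher_cost"].append(pid)   # poorer running economy
        elif cost < m - k * sd:
            groups["lower_cost"].append(pid)    # better running economy
        else:
            groups["excluded"].append(pid)
    return groups
```

Calling the function with `k=1.0` gives the stricter split, which leaves fewer participants in each group.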
For the secondary aim, we created two groups based on weekly running distance. For this purpose, the seven individuals with the highest distance were grouped in the high distance group, whereas the eight individuals with the lowest distance were grouped in the low distance group (Figure S2). All other individuals were not included in this analysis. This division was motivated by the distribution of the data showing this approach yielded two distinct groups. Differences between all created groups on the primary differentiating variable were assessed using a linear model.
A shallow neural network that consisted of one input, one hidden, and one output layer with 1800, 3600, and 2 nodes, respectively, was used to classify individuals into the two running economy or distance groups based on their kinematics (Figure 1). The number of nodes for the input layer was derived from the size of the input data, and the number of nodes in the output layer corresponded to the number of classification results (i.e., higher and lower energy cost of running or higher or lower weekly distance). For the hidden layer, twice as many nodes as in the input layer were chosen and the hyperbolic tangent was used as an activation function. This three-layer architecture was chosen since it is known that a single hidden layer is sufficient for learning most functional relationships. 38 Further, this architecture achieved the best trade-off between training efficiency and model accuracy using a mini-batch size of 16, 1000 iterations, and a learning rate of 0.01. Training was performed on the whole training set, using randomly selected subsets for validation. The network's performance was evaluated by classifying one subject that was not used for training (leave-one-out validation), and this process was repeated for all subjects.

FIGURE 1 Overview of the biomechanical data acquisition and data analysis approach for one participant. (A) 3D marker positions were recorded during running at 2.78 m·s⁻¹ and the lower limb plus trunk joint angles were computed using a musculoskeletal model. (B) Joint angles in all three planes derived from the model were scaled (example shown depicts the z-transform) and concatenated into a single vector per step. (C) The concatenated vectors from 65 steps per participant were used as input to an artificial neural network that tried to learn features relevant to predict group assignment. Note that the number of nodes in the figure for the input and hidden layers is reduced for visualization purposes. (D) The relevance of each data point was then decomposed using layer-wise relevance propagation and averaged over all steps within individuals and between individuals.
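The leave-one-out evaluation holds out all steps of one participant at a time, so that no steps from the test participant are seen during training. A minimal sketch of such a split is given below; the data layout (a dictionary from participant ID to a list of `(step_vector, label)` pairs) and the function name are assumptions made for illustration.

```python
def leave_one_participant_out(steps_by_participant):
    """Yield (held_out_id, train_samples, test_samples) for every participant.

    steps_by_participant maps a participant ID to a list of samples, where
    each sample is a (step_vector, group_label) pair. All steps of the
    held-out participant go to the test set; everything else is training data.
    """
    for held_out in steps_by_participant:
        train = [
            sample
            for pid, samples in steps_by_participant.items()
            if pid != held_out
            for sample in samples
        ]
        test = list(steps_by_participant[held_out])
        yield held_out, train, test
```

Splitting by participant rather than by step matters here because the 65 steps of one runner are highly similar; a step-wise split would let the network recognize the individual instead of the group.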
The classification accuracy per participant p was calculated by dividing the number of correctly classified steps for that participant, n_p, by the total number of steps from that participant, N_p, and multiplying by 100 (Equation 1):

Accuracy_p = (n_p / N_p) × 100 (1)

A step was classified into one group based on the probability obtained with the ANN. For example, if there was a 51% probability for the data points within a step to belong to the better economy group, the complete step was classified into the better economy group. For the overall evaluation of the neural network's performance, the accuracies were averaged over all participants.
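The accuracy computation can be sketched in a few lines of Python. The function names and group labels are hypothetical, and assigning a step with a probability of exactly 0.5 to the poorer economy group is an assumption, as the text does not describe tie handling.

```python
def step_is_correct(prob_better, true_group):
    """Classify one step from the network's probability for the 'better' group."""
    predicted = "better" if prob_better > 0.5 else "poorer"  # ties go to 'poorer' (assumption)
    return predicted == true_group


def participant_accuracy(probs_better, true_group):
    """Percentage of a participant's steps assigned to the correct group (n_p / N_p * 100)."""
    n_correct = sum(step_is_correct(p, true_group) for p in probs_better)
    return 100.0 * n_correct / len(probs_better)


def overall_accuracy(per_participant):
    """Mean of the per-participant accuracies."""
    return sum(per_participant) / len(per_participant)
```

Averaging per participant rather than pooling all steps gives every runner the same weight, which is equivalent here because each participant contributes the same number of steps.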
After training and testing the neural network participant-wise, layer-wise relevance propagation was applied to the correctly classified steps to obtain the relevance scores for each input variable. The relevance scores indicate how relevant each variable was to the model's prediction and are obtained by propagating backwards through the neural network from the correctly assigned group to the input variables (see "glow" lines in Figure 1). The result is a relevance score for each joint angle at a specific time point. Details on the method and the mathematical background can be found in Bach and colleagues. 39 The resulting relevance pattern, therefore, consisted of 1800 relevance scores. Since the neighboring data points in the input signals were dependent and represented time-related information, each relevance pattern was smoothed to reduce variability in the signal without influencing the original pattern. For this purpose, we weighted the current point with 50% of its relevance score and each neighboring point with 25% of its relevance score. This smoothing process was repeated twice. The weights for smoothing were chosen to sum to 1, so that repeated application mimics a Gaussian filter. 7 Finally, the normalized scores were averaged over all participants and rescaled to values ranging between zero and one, with values close to zero indicating the lowest relevance and values close to one the highest relevance. For better readability and comprehensibility, the relevance scores were visualized using a heat map that shows the relevance scores for each input variable over time.
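The smoothing and rescaling of the relevance scores can be sketched as below. This is an illustration and not the published code: the function names are hypothetical, boundary points are handled by repeating the edge value (an assumption), and two passes in total is one reading of "repeated twice".

```python
def smooth_once(scores):
    """Weighted moving average with weights 0.25 / 0.5 / 0.25."""
    n = len(scores)
    out = []
    for i in range(n):
        left = scores[i - 1] if i > 0 else scores[i]       # repeat edge value (assumption)
        right = scores[i + 1] if i < n - 1 else scores[i]  # repeat edge value (assumption)
        out.append(0.25 * left + 0.5 * scores[i] + 0.25 * right)
    return out


def smooth(scores, passes=2):
    """Apply the three-point filter several times (two passes assumed)."""
    for _ in range(passes):
        scores = smooth_once(scores)
    return scores


def rescale_unit(scores):
    """Min-max rescale to the range [0, 1]; a constant input maps to zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]
```

Two passes of the 0.25/0.5/0.25 kernel are equivalent to a single five-point kernel with weights 1/16, 4/16, 6/16, 4/16, 1/16, a binomial approximation of a Gaussian. Whether smoothing was applied across the boundaries between concatenated joint-angle series is not stated, so the sketch is best applied to one 100-point series at a time.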
All data analysis was performed using MATLAB version R2022b (The MathWorks Inc., Natick, Massachusetts, USA). The layer-wise relevance propagation toolbox by Lapuschkin and colleagues 40 was used for the implementation of the ANNs and all analyses based on LRP. The visualization and related processing steps are based on code published by Hoitz et al. 7

3 | RESULTS

| Group information
The number of individuals per group, running economy, and running distance for the different groups are provided in Table 1. All groups differed significantly in either running economy or weekly running distance.

| Effect of average versus multiple steps and different scaling methods
We explored the effect of an average versus multiple steps and two different scaling methods on the accuracy of running economy classifications with the 0.2 SD split as this dataset was considered the primary outcome of the study (see also section 3.3).
Using only the averaged kinematics from each participant resulted in a classification accuracy no better than chance (50% accuracy for running economy). Therefore, all 65 steps were used as input to the neural network. Rescaling the time-normalized joint angle time series to ±1, without any further normalization for intra- or interindividual differences, resulted in the highest accuracy score for running economy classification (60%). Using the z-transformation resulted in a slightly lower accuracy (57%). For the distance classification, the accuracy was 71% and 61%, respectively, for the different scaling methods.

| Classification based on running economy
When averaged over all leave-one-out validations, the average percentage of steps across all participants that were correctly classified in the poorer or better running economy group was 60% with the 0.2 SD split, and 62% with the 1 SD split. Random guessing would yield an accuracy of ~51% for both classification tasks (due to the uneven numbers in each group), thus demonstrating that both classifications performed better than random guessing. When separating only the female or male runners into groups of good and poor running economy, the classification accuracy was 32% for males and 61% for females. Because the predictive accuracy did not increase with separate analysis for males and females, the remainder of the paper will focus on the findings with the mixed-sex group and the 0.2 SD split. The 0.2 SD split dataset was used as the primary outcome given its larger sample size and thus better generalizability compared to the 1 SD split. Figure 2A depicts the percentage of steps correctly classified per participant for the 0.2 SD split, showing that the model performed with an accuracy of >80% for a large proportion of the participants (18/34), with an accuracy between 20% and 60% for 5/34 participants, and with very poor accuracy for the remaining 11 participants. Among the participants with very poor accuracy, eight were consistently grouped in the incorrect group (0% accuracy/100% inaccuracy). Figure S3 shows the accuracy for the 1 SD split.

| Relevance of kinematics to running economy classification
When using the scaling approach that normalized joint angles from ±180° to ±1, joints with larger angular magnitudes and range of motion had the largest relevance among the steps that were correctly classified. Specifically, knee flexion had the largest relevance, followed by hip and ankle flexion (Figure 3B). Pelvis tilt also had some relevance to the classification. Right leg mid-swing phase (60%-80% of gait cycle, also corresponding to left leg stance) and right leg stance phase (12%-25% of gait cycle, also corresponding to left leg swing) had the largest relevance to the classification (Figure 3A). Figure S4 shows the same figure, but with only the 20% most important contributors to the running economy classification. The contribution of sagittal-plane kinematics to the classification was largest (78.0%), followed by the frontal-plane (14.1%) and transverse-plane (7.9%) kinematics. Figure 4 shows the time-normalized joint angles for the correctly classified steps in the better and poorer running economy groups. Results for the z-normalization are provided in Figures S4-S7.
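One way to obtain per-plane contributions such as those reported above is to sum the relevance scores of all joint angles belonging to a plane and express each sum as a percentage of the total. The sketch below follows that idea; how the percentages were computed in the study is not described, so the aggregation rule, the function name, and the angle-to-plane mapping are all assumptions.

```python
def plane_contributions(relevance_by_angle, plane_of_angle):
    """Percentage of total relevance attributed to each movement plane.

    relevance_by_angle maps an angle name to its list of relevance scores;
    plane_of_angle maps the same angle name to 'sagittal', 'frontal',
    or 'transverse' (hypothetical mapping supplied by the caller).
    """
    totals = {}
    for angle, scores in relevance_by_angle.items():
        plane = plane_of_angle[angle]
        totals[plane] = totals.get(plane, 0.0) + sum(scores)
    grand_total = sum(totals.values())
    return {plane: 100.0 * t / grand_total for plane, t in totals.items()}
```

Because more sagittal-plane angles with larger ranges of motion enter the input vector, a sum-based share of this kind partly reflects the number and magnitude of the signals per plane.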

| Classification based on weekly training distance
When averaged over all leave-one-out validations, the average percentage of steps that were correctly classified across all participants in the higher or lower weekly distance group was 63% when using the ±180° to ±1 normalization. Random guessing would produce 50% accuracy for this classification. Figure 2B shows the classification accuracies per unseen participant. For most of the participants (8/15), the majority of steps (>80%) were classified correctly. Three participants had moderate classification accuracy (20%-80%), and four participants were consistently classified in the wrong group (<5% accuracy).
Figure 5 shows the relative contribution of each kinematic variable to the classification over a right leg stride. From this figure, it can be observed that the mid right swing (~55%-70%) and left swing phases (~0%-15%) were particularly relevant. The kinematic components that were most important included knee flexion, followed by hip flexion, ankle flexion, and trunk rotation (Figure 5). Figure S8 shows the 20% most important contributors to the distance classification. The relevance of each plane was 71.1%, 17.4%, and 11.5% for the sagittal, frontal, and transverse planes, respectively. Figure S9 shows the time-normalized joint angles for the correctly classified steps in the high and low distance groups. Figures S10-S12.

| DISCUSSION
The primary aims of this study were to investigate (A) whether an ANN can accurately classify individuals as having higher or lower running economy based on their running technique, and (B) which kinematic components differentiate runners with higher or lower running economy from each other. A secondary aim was to explore the ability to differentiate runners with a higher and lower weekly distance, and to identify which kinematic components differentiate the two groups differing in distance.
The ANN classified individuals in groups of runners with better and poorer running economy based on their running technique, with accuracies up to 62%. When classifying individuals in groups with a higher or lower weekly running distance, the classification accuracy improved up to 71%.

FIGURE 2 The accuracy of classifying participants in a group with better or poorer running economy (A; 0.2 SD split) or higher and lower weekly distance (B) per unseen participant using the scaling approach with joint angles rescaled from ±180° to ±1. The percentage depicts the relative number of steps that were correctly classified to the group with poorer or better running economy. For example, in panel B, ~90% of the steps from participant 8 were correctly classified in the poorer running economy group. Note that the participant IDs in panel A do not necessarily correspond to those in panel B.

| Classification accuracy for running economy
We allocated participants to two groups based on their energetic cost being 0.2 standard deviations higher or lower than the sample's mean. The ANN subsequently classified individuals in these groups based on their running technique with an accuracy of 60%. This is better than random guessing, which would yield an accuracy of 51%. The network's performance slightly increased to 62% when two groups were created based on a stricter 1 SD deviation from the mean energetic cost (Figure S1). The stricter separation may have increased the contrast between the groups, thus improving the classification accuracy, whereas the smaller training set may have reduced the ability to learn relevant features, thus only slightly improving accuracy overall. The smaller training dataset likely also explains the poorer classification accuracy when using an averaged step per participant as input, rather than multiple steps per participant, even though the use of multiple steps may also decrease the classification accuracy by introducing intra-individual stride-to-stride variability with some steps being less or more economical. Since previous research has shown differences in running kinematics between males and females, 15,21 we also explored whether the classification accuracy increased when separating groups by sex and subsequently classifying individuals as more or less economical. However, this resulted in no better classification accuracy than the mixed-sex classification, potentially due to the smaller training dataset. The findings from this analysis are therefore not further discussed.
The accuracy by which the ANN classified individuals as more or less economical is slightly better than that in a study where both novice and experienced coaches were unable to accurately classify runners as more or less economical based on visual analysis of their running technique. 41 While coaches may not be able to discern and use all information that is present in the 3D kinematics from a video, an ANN can use high-dimensional data as input and consider all information present in the 3D kinematics of multiple steps to classify individuals as exhibiting higher or lower running economy. When our findings are combined with this previous study, 41 they therefore suggest that the poor accuracy of coaches may have partly reflected their inability to discern all information relevant to running economy classification. However, it is important to recognize that coaches are "trained" on performance data, with performance being determined by a combination of running economy, V̇O₂max, fractional utilization, and durability. 42,43 Moreover, the observation that the ANN was not able to classify individuals according to their running economy with high accuracy also indicates that runners can have a largely similar running economy with various techniques. This is likely because running economy is determined not only by running kinematics, but also by other factors such as fiber type 4,6,44-46 and tendon stiffness. 47,48 For example, a participant who has been practising various nonrunning sports (e.g., swimming and cycling) may have developed a relatively good running economy due to alterations in metabolic efficiency, despite a suboptimal running technique. Moreover, the most economical running technique may also depend on the anthropometrics (e.g., leg length and mass) of the individual, 5,49-53 which further complicates estimation of running economy from running technique. As such, the participation in other sports and variety in anthropometrics in our cohort may have reduced the ANN's ability to correctly classify individuals.

FIGURE 4 Time-normalized joint angles for correctly classified steps in runners with better running economy (green) and poorer running economy (red) with the ±180° to ±1 normalization. Bold lines reflect the group mean, while shaded areas indicate the standard deviation. Background colors depict the relevance of the kinematic component at a given time point as determined by layer-wise relevance propagation, with darker areas reflecting higher relevance. Vertical dashed lines at 40% depict toe-off for the right leg.

| Kinematics related to running economy
Considering that numerous other factors influence the most economical running technique, the classification accuracies of 60% (0.2 SD split) or 62% (1 SD split) may be regarded as reasonable. For this reason, we will discuss the most notable kinematic components that differentiated runners with poorer and better running economy. Such information may guide future research and training, although it should be interpreted with caution due to the moderate classification accuracy.
LRP showed that the mid-swing phase had overall the largest relevance to the running economy classification, followed by the mid-stance phase (Figure 3B). This is partially in line with the high metabolic cost of supporting body mass during the stance phase. 54 Knee flexion, hip flexion, and ankle flexion were most relevant to the classification (Figure 3C). Knee flexion exhibited the highest relevance during the mid-swing phase for both legs. During this period, more economical runners exhibited less knee flexion compared to runners with poorer economy (Figure 4). This finding is in line with a study by Folland and colleagues. 2 While greater knee flexion reduces the energy cost by lowering the leg moment of inertia, it may also increase energy costs by requiring additional contraction of the gastrocnemii or hamstrings to further flex the knee, and of the knee extensors to extend the knee during the later swing phase. 5 Interestingly, the period around peak knee flexion during the stance phase had a relatively low relevance score and also differed minimally between the groups (Figure 4). This is in partial contrast to the findings of a recent systematic review that found a trend for larger peak knee flexion during the stance phase to be associated with a higher cost of running, likely because this results in a larger knee extension moment. 5 However, the authors also noted that there were conflicting findings, possibly because a larger knee flexion angle also optimizes the force-length potential of the quadriceps. 5 The net effect of these mechanisms is that both more flexion and more extension may be economical, which may explain the relatively low relevance of this phase to the classification in our study. The runners with poorer running economy in the present study did, however, exhibit a large knee range of motion during the stance phase, indicating a lower leg stiffness, in line with the findings of Van Hooren and colleagues. 5
Hip flexion was mostly relevant during the mid/late swing phase and mid stance phases. The hip flexion range of motion was typically larger in individuals with poorer economy, and the peak flexion angle occurred earlier in the swing phase. The larger and earlier flexion may require additional work of the hip flexor muscles, thereby increasing energy cost. Moreover, a notable difference during the stance phase was that the group with poorer economy showed a more pronounced hip flexion after initial contact (10%-15% for right leg hip flexion, 55%-60% for left leg hip flexion), while the group with better running economy showed a more continuous hip extension. The small hip flexion upon ground contact in the poorer economy group may reflect more braking, which in turn forces the hip into greater flexion. Indirect support for this hypothesis is provided in Figure S13, showing a slightly larger braking impulse during the initial contact phase in poorer economy runners.
Ankle extension exhibited the highest relevance during the early swing and mid-stance phases (Figure 3B). Runners with poorer running economy generally showed a larger ankle extension after toe-off as compared to runners with better running economy (Figure 4). This finding is also consistent with the findings of the systematic review by Van Hooren and coworkers 5 and with a study where reductions in ankle plantar flexion at toe-off were observed when runners became more economical. 55 A smaller ankle plantar flexion may optimize the force-length potential 56 and thereby reduce activation-related energy cost. 55 Interestingly, during the stance phase the ankle angle was very similar between both groups, yet exhibited a high relevance (Figures 3B, 4). However, the variability in ankle angle was higher in the more economical group (Figure 4). Speculatively, this higher variability in the ankle angle during the stance phase may reflect a better ability to adapt to the varying constraints with each step. 57
Pelvis tilt was another kinematic variable with some degree of relevance to the running economy classification (Figure 3C). The poorer economy group showed a higher overall anterior pelvic tilt, particularly after right leg toe-off (~40%-48% of gait cycle; Figure 4). This was followed by a more rapid posterior tilt during the late swing phase and early ground contact of the left leg, although the absolute magnitude of anterior pelvic tilt remained higher for the poorer economy group. Mechanistically, anterior pelvic tilt can lengthen the hamstrings of the swing leg and this may store elastic energy in the tendinous tissues. 58 The stronger posterior pelvic tilt in the runners with poorer economy prior to ground contact may, however, lead to more dissipation of this elastic energy, which in turn may require more energetically costly concentric work of the hamstrings and gluteus during the stance phase to ensure sufficient horizontal force production to maintain speed. While no other studies investigating pelvic tilt in relation to running economy have been identified in a recent systematic review, 5 a previous study also showed that highly trained (and presumably more economical) runners exhibited less anterior pelvic tilt throughout the gait cycle in comparison with recreational runners, in particular after toe-off and in the late swing phase, similar to our findings. 59 When combined with our results, these findings question the usefulness of instructions that promote a more anterior pelvic tilt as in "Chi-running". 60
While pelvis obliquity and trunk flexion did not show a high overall relevance to the running economy classification, there were periods with higher relevance. For example, there was a larger contralateral pelvic drop in the poorer economy group, in particular during the left stance phase (55%-65% of gait cycle, Figure 4). Further, the poorer economy group exhibited a more backward trunk lean during the early stance phases and immediately after toe-off (Figure 4). This backward position may require increased activation of the abdominal muscles. 61 Further, because the trunk accounts for approximately 50% of body mass, 62 the more backward position during the stance phases may also increase the knee extension moment, thereby increasing energy cost as compared to a more upright/forward trunk. This trunk angle may interact with the knee flexion angle in determining an economical technique. Specifically, runners with more backward trunk lean may extend their knee more to reduce the knee extension moment, whereas runners with a more forward trunk lean may have a reduced need for a large knee extension, and thereby speculatively adopt a larger knee flexion angle during stance to optimize the quadriceps force-length potential.

| accuracy for running distance
Weekly running distance has also been associated with altered running kinematics 15,16,22 and running economy. 23 As a secondary aim, we therefore investigated the ability of the ANN to classify individuals into a group with higher or lower weekly running distance, and we explored which kinematic components differentiated the two groups.
The ANN classified individuals better based on their weekly running distance than based on running economy, with the accuracy being up to 71% as opposed to 50% with random guessing. The better accuracy of the distance classification compared to the running economy classification may reflect a more direct relationship between running distance and kinematics as opposed to a more indirect relationship between running kinematics and running economy. The classification accuracy of our ANN (71%) is, however, lower than observed by previous studies that classified runners based on kinematics, 15,16 with accuracies varying between 90% and 93%. The lower accuracy is likely primarily explained by the smaller sample size (seven and eight individuals per group in the present study vs. 40-41 per group in 15,16), which in turn results from the more stringent group allocation criteria. However, our stricter allocation resulted in groups that differed substantially more in weekly running distance (97 and 3 km·w⁻¹ in the present study, Table 1, vs. 44 and 15 km·w⁻¹ in 16 or 50 and 20 km·w⁻¹ in 15). Our comparison therefore better reflects a comparison of well-trained runners versus novice runners (tier 2-3 vs. tier 1 according to 63), whereas previous studies compared a broad mix of all recreational runners (all tier 1-2 according to 63). As such, our findings may provide more detailed information on potential kinematic adaptations with higher training volumes used by better trained individuals.

| Kinematics related to weekly running distance
The LRP analysis indicates that the stance phases had a lower overall relevance than the swing phases for distance classification (Figure 5). This contrasts with the running economy classification, where both the mid-stance and mid-swing phases had a relatively large relevance. While these differences should be interpreted with caution due to the small group sizes and low accuracies in both classifications, they may reflect real differences in the relevance of running technique to running economy or weekly distance. Specifically, while a high weekly running distance may contribute to reductions in the energetic cost via both kinematic and metabolic alterations (as also confirmed by a moderate correlation between weekly distance and the energetic cost of running of r = −0.32 in our study), being able to complete a high running distance may require optimization of running technique components that may not directly contribute to a better running economy. Instead, optimization of these components may contribute to lower injury risk that in turn allows the completion of a high training volume.
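The way relevance scores are condensed into the per-time-point and per-component summaries shown in panels B and C of Figures 3 and 5 can be illustrated with a minimal sketch. It assumes the LRP scores are available as an array with one relevance value per stride, kinematic component, and time point. The function name `summarize_relevance`, the array layout, and the random input are assumptions for illustration only and are not taken from the study's implementation.

```python
import numpy as np

def summarize_relevance(relevance):
    """Condense LRP scores of shape (n_strides, n_components, n_timepoints)."""
    # Average absolute relevance per component and time point (panel A).
    avg_abs = np.abs(relevance).mean(axis=0)
    # Summed relevance for each time point across components (panel B).
    per_time = avg_abs.sum(axis=0)
    # Summed relevance for each kinematic component across time (panel C).
    per_component = avg_abs.sum(axis=1)
    return avg_abs, per_time, per_component

# Synthetic placeholder scores: 20 strides, 5 components, 100 time points.
rng = np.random.default_rng(0)
relevance = rng.normal(size=(20, 5, 100))
avg_abs, per_time, per_component = summarize_relevance(relevance)
print(avg_abs.shape, per_time.shape, per_component.shape)
```

Summing `per_time` over a window of time points (for example, the indices that make up a stance or swing phase) would give the phase-level relevance that is compared in the text. The phase boundaries themselves would have to come from the gait events of the data at hand.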
Similar to the running economy classification, knee flexion had the highest overall relevance, with the swing phase being particularly relevant (Figure 5B). However, in contrast to the running economy classification, runners with a higher weekly distance exhibited a larger peak knee flexion during the swing phase, and this higher flexion also occurred earlier during the swing phase (Figure S6). Since the running economy classification showed more economical runners to exhibit less knee flexion (Figure 4), with this also being in line with a previous study, 2 the larger peak knee flexion in the higher distance group is unlikely to be a feature that improves running economy. Rather, the resulting lower limb moment of inertia may speculatively reduce local fatigue of the hip flexors, thereby contributing to a better durability at the expense of economy. 5,64 In parallel, it has previously been argued that ultra-endurance runners may sacrifice running economy to reduce musculotendinous and osteoarticular damage, amongst others. 65 In partial agreement with the running economy classification, runners with a higher weekly distance showed a smaller hip flexion angle at initial contact, a smaller hip flexion angle range of motion during stance, and also smaller peak hip flexion during the late swing phase (Figure S9). Similarly, runners with a higher weekly distance showed a smaller ankle extension at toe-off. These latter differences may therefore represent adaptations due to a high running distance that also optimize running economy.
Trunk rotation was another kinematic component with a relatively high overall relevance to the distance classification (Figure 5C). Its relevance was particularly high during the early swing phases, with runners with a lower weekly distance showing a higher and delayed trunk rotation during these phases as compared to runners with a higher weekly distance (Figure S9). Moreover, runners with lower weekly distance showed an overall higher trunk rotation range of motion. While one prior study showed greater trunk rotation to be associated with a higher running energy cost, 2 our study did not show a high relevance of trunk rotation to running economy classification, nor a clear trend for higher or lower trunk rotation in the higher or lower economy group (Figure 4). Similarly, Williams and Cavanagh 6 found no significant differences in trunk rotation between groups differing in running economy. This may indicate that trunk rotation reflects a kinematic adaptation that is not directly related to running economy. Speculatively, smaller trunk range of motion may also contribute to less overburdening of individual muscles, thereby allowing for the completion of a higher distance.

| The effect of normalization
Our findings also show that the method used to normalize the input data for an ANN has a large impact on the resulting relevance scores obtained with LRP (cf. Figure 3 and Figure S4), and also some impact on the classification accuracy. Specifically, the normalization method that only rescaled the kinematic data from ±180° to ±1 had a higher accuracy for the running economy classification than the z-transform method (60% vs. 57%, respectively). The z-transform ensures an equal contribution of all joints to the classification and thus prevents joints with larger joint angles or range of motion from being given greater relevance than those with smaller joint angles. 66 For example, with this normalization procedure, a difference of 0.16° between individuals for a joint with a small range of motion could be as important as a difference of 6.7° for a joint with a large range of motion. Indeed, with the z-transform, the relevance was more evenly distributed among different kinematic components, whereas the other normalization method resulted in higher relevance assigned to joints with larger angles and joint angular range of motion (i.e., sagittal-plane knee, hip, and ankle angles; cf. Figure 3 and Figure S4). The (slightly) lower accuracy for the z-transform method for the running economy classification may be explained by the reduced interindividual variability with this scaling approach. This, therefore, also suggests that absolute differences in joint angles and joint angular range of motion are relevant to running economy, in addition to the relative changes during a gait cycle. Nevertheless, regardless of the normalization method, sagittal-plane kinematics showed the highest relevance to both the running economy and distance classification.
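The contrast between the two normalization approaches can be illustrated with a minimal sketch on synthetic data. It assumes the z-transform is applied to each kinematic component separately over its time series; the exact axis used in the study is not stated here, so this is an assumption. The function names and the two sinusoidal "joint angle" curves are hypothetical and serve only to show the scaling effect.

```python
import numpy as np

def rescale_degrees(angles_deg):
    """Rescale joint angles from the ±180° range to ±1 (preserves relative size)."""
    return np.asarray(angles_deg, dtype=float) / 180.0

def z_transform(angles_deg):
    """Z-transform each component (row) to zero mean and unit SD.

    Assumption: normalization is done per component over its time series.
    """
    x = np.asarray(angles_deg, dtype=float)
    mean = x.mean(axis=1, keepdims=True)
    sd = x.std(axis=1, keepdims=True)
    return (x - mean) / sd

# Two synthetic components over 100 time points of one stride:
# one with a large range of motion and one with a small range of motion.
t = np.linspace(0.0, 2.0 * np.pi, 100, endpoint=False)
large_rom = 60.0 * np.sin(t)   # hypothetical sagittal-plane angle, ±60°
small_rom = 2.0 * np.sin(t)    # hypothetical non-sagittal angle, ±2°
stride = np.vstack([large_rom, small_rom])

rescaled = rescale_degrees(stride)
z = z_transform(stride)

range_ratio = (rescaled[0].max() - rescaled[0].min()) / (rescaled[1].max() - rescaled[1].min())
print("range ratio after ±180° rescaling:", range_ratio)
print("identical after z-transform:", np.allclose(z[0], z[1]))
```

After the ±180° rescaling, the component with the large range of motion still spans a range 30 times wider than the small one, so its absolute differences dominate the input. After the z-transform, both components occupy the same scale, which is why a small absolute difference in one joint can weigh as much as a large absolute difference in another.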

| Methodological considerations
This study has several strengths, but also some limitations. Strengths include the inclusion of a wide range of performance levels, ranging from an Olympic level athlete to start-to-runners as opposed to a smaller range in previous studies. 15,16,22 Moreover, we standardized running speed to reduce the confounding effect of speed on running biomechanics, while this may have influenced the findings of previous comparisons. 15 A final strength is the inclusion of both legs, a large number of steps, and both the stance and swing phases for the classification.
The first consideration relates to the relatively small sample size. This may lead to overfitting of the training data, which decreases the generalizability of the findings. However, different hyperparameters were tested to find a combination with a high classification accuracy without overfitting toward single subjects. Further, we used the average result over multiple leave-one-participant-out validations to improve the generalizability and focussed our discussion solely on the results generated with the larger group (i.e., 0.2 SD split) for running economy classification. A second consideration includes the use of unstandardized shoes. This may influence running economy and running biomechanics and thereby reduce the accuracy for both the running economy and distance classification based on kinematics. However, we expect this effect to be small since we allowed no racing flats, nor shoes with thick cushioning or carbon plates, and thus ensured relatively homogeneous shoe wear. Moreover, forcing athletes to wear standardized shoes to which they are not accustomed may also alter their preferred running biomechanics and thereby introduce error. Therefore, we believe that the use of unstandardized shoes, while limiting the range of shoes used, presents a good trade-off between altering biomechanics and energetics due to unaccustomed shoes on the one hand, and altering biomechanics and energetics due to nonstandardization on the other hand.
Third, a combination of kinematics and kinetics (e.g., ground reaction forces and joint moments) would likely improve the accuracy for both running economy and distance classification. For example, Xu et al. 16 showed the distance classification to improve to 95% when combining kinematics and kinetics as opposed to 90% with kinematics only. However, we purposely chose to only use kinematics as input to the ANN as these are more directly modifiable in practice. Further, we used time-normalized data as input to the ANN while it has been suggested that complete time series of arbitrary length instead of time-normalized time series may benefit the predictive accuracy of an ANN. 67,68 However, we purposely chose to use time-normalized data to aid interpretation with LRP.
The fourth and final limitation relates to the ANN itself. Specifically, readers may wonder whether the moderate classification accuracy for running economy could be due to the machine learning method used (i.e., ANN), and to decisions in the ANN implementation such as the activation function and number of layers. Concerning the former, Horst and colleagues 69 found that different machine learning methods resulted in largely similar conclusions with regard to the relevance of kinematics and kinetics to classification, suggesting that other machine learning methods would have yielded largely similar results. This also confirms that the observed findings do not reflect arbitrary, random kinematics, but rather dynamically meaningful features that have functional relevance, despite some of the differences being small. Concerning the latter, different activation functions have been shown to have a negligible impact on the predictive ability of the ANN, 19 and a single hidden layer has been reported to be sufficient for learning most functional relationships. 38 While some studies indeed report that a single layer yields similar predictive accuracy for biomechanical classification tasks compared with multiple hidden layers, 69 other studies have reported improvements in accuracy with multiple layers. 70 We therefore explored the impact of an additional hidden layer on the classification accuracy, but found the improvement to be only marginal (up to 2% for the 0.2 SD split in running economy classification). This, therefore, suggests that these decisions had only a minor impact on the classification ability. Similarly, we also found that different normalization methods produced largely similar classification accuracies, although less normalization generally improved the classification accuracy (Section 3.2). Related to this, the ANN implicitly assumed that each data point is statistically independent (i.e., coming from a different participant). However, time series data from the same participant are positively correlated (e.g., two data points representing the knee angle at a specific instant in time such as during two different steps are likely to exhibit relatively similar values when they come from the same participant). Modeling of this dependency may further improve the predictive ability of the ANN (e.g., 71). Indeed, within a recurrent neural network, prior inputs can influence the current input and overall output, and this approach has been shown to provide more accurate predictions of biomechanical time series data (~25/50% reduction in root mean squared error) than other neural network approaches such as the ANN used in our study. 17 A final consideration related to the classification ability is that individuals clustered in one group based on their kinematics at one speed may not be classified in the same group at other speeds, thus cautioning against generalization to multiple speeds. 72
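Because strides from the same participant are correlated, a leave-one-participant-out validation such as the one mentioned above keeps all strides of the held-out participant out of the training set. The following minimal sketch shows only the splitting logic; the function name `leave_one_participant_out` and the example participant identifiers are hypothetical, and the classifier that would be trained on each fold is omitted.

```python
import numpy as np

def leave_one_participant_out(participant_ids):
    """Yield (held_out_id, train_idx, test_idx), holding out one participant per fold.

    All strides of the held-out participant go to the test set, so no
    participant contributes data to both training and testing in a fold.
    """
    ids = np.asarray(participant_ids)
    for held_out in np.unique(ids):
        test_mask = ids == held_out
        yield held_out, np.flatnonzero(~test_mask), np.flatnonzero(test_mask)

# Hypothetical example: three participants contributing 3, 2, and 4 strides.
participant_ids = np.array([1, 1, 1, 2, 2, 3, 3, 3, 3])
folds = list(leave_one_participant_out(participant_ids))
for held_out, train_idx, test_idx in folds:
    print(f"held out {held_out}: {len(train_idx)} training strides, {len(test_idx)} test strides")
```

Splitting by stride instead of by participant would let near-duplicate strides of the same runner appear on both sides of the split, which would tend to inflate the estimated accuracy.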

| Perspective
The findings of this study have several implications for practitioners and researchers. First, the finding that the ANN was able to correctly classify the individuals as more or less economical based on their running technique better than random guessing (Figure 2A) indicates that running technique does contribute to running economy. Nevertheless, the moderate accuracy also suggests that running economy is a multifactorial component, with running technique being only one of the many influencing factors. This complicates visual assessment of the most economical runners based on their technique, as attempted in a recent study. 41 Nevertheless, our ANN revealed several components that contributed to the running economy classification, with more economical runners showing less knee flexion during swing, a more continuous hip extension during initial contact and less hip flexion during late swing, and a smaller ankle extension after toe-off, among others (Figures 4 and 5). However, it is important to emphasize that these findings should be interpreted with caution due to the moderate accuracy of the ANN, and the small differences for some components. 75,76
Second, our findings indicate that running technique is associated with running distance. Although we cannot infer any cause-effect relationship with our cross-sectional design, previous studies have shown alterations in running technique following a start-to-run program that gradually increased training volume. 55 This suggests that the differences between the high and low distance runners may at least partly be a consequence of a high weekly distance. While we observed a moderate inverse correlation between the weekly distance and the energetic cost of running (r = −0.32), the kinematics with the highest relevance were only partly similar between the running economy and distance classification (Figure 4 and Figure S9). While these findings should also be interpreted with caution due to the low sample size, they suggest that a high running distance may optimize running technique components that may not always directly contribute to a better running economy. Instead, optimization of these components may contribute to minimizing overburdening of individual structures 5 and thereby lower injury risk, which in turn allows the completion of a high training volume. This higher volume in turn may contribute to metabolic adaptations that may indirectly improve running economy.
Third, although previous research showed that frontal- and transverse-plane kinematics contain the most relevant information for subject identification during running, 7 our findings suggest that sagittal-plane kinematics have the most important contribution to running economy or weekly distance classification. In other words, while transverse-plane kinematics may be unique to an individual, they may not be most informative for assessing running economy or weekly distance in general. Importantly, this effect was similar across different scaling methods (i.e., also when joints with larger angles were scaled to a similar range to give equal importance to joints with smaller angles).

| CONCLUSION
The findings of this study demonstrate that an ANN was able to classify runners as being more or less economical based on their running technique with an accuracy of up to 62%. Moreover, classifying runners as having higher or lower weekly distances resulted in an accuracy of up to 71%. The kinematic components that contributed to the classification may inform future research and training that aims to improve running economy, although it should be emphasized that the classification accuracy from which the kinematic components were derived was only moderate.

TABLE 1 Mean ± SD energetic cost and weekly running distance for the different groups.

FIGURE 3 Average absolute relevance of each kinematic component within the right stride for classifying participants based on their running economy (0.2 SD split) with the ±180° to ±1 normalization. In the centre (A), darker colors indicate variables of high relevance, while lighter colors indicate variables of low relevance to the group classification. The top part (B) shows the summed relevance for each of the 100 time points. The right part of the figure (C) highlights the summed relevance of each kinematic component to the classification.

FIGURE 5 Average absolute relevance of each kinematic component within a stride pattern for classifying participants according to their weekly distance with the ±180° to ±1 normalization. In the centre (A), darker colors indicate variables of high relevance, while lighter colors indicate variables of low relevance to the group classification. The top part (B) shows the summed relevance for each of the 100 time points. The right part of the figure (C) highlights the summed relevance of each joint angle to the classification.