Artificial Intelligence‐Powered Acoustic Analysis System for Dysarthria Severity Assessment

Dysarthria is common in movement disorders, such as Wilson's disease (WD), Parkinson's disease, or Huntington's disease. Dysarthria severity assessment is often indispensable for the management of these diseases. However, such assessment is usually labor‐intensive, time‐consuming, and expensive. To seek efficient and cost‐effective solutions for dysarthria assessment, an artificial intelligence (AI)‐powered acoustic analysis system is proposed and its performance in a valuable sample of WD, an ideal disease model with mainly mixed dysarthria, is verified. A test‐retest reliability analysis yields excellent reproducibility in the acoustic measures (mean intraclass correlation coefficient [ICC] = 0.81). Then, a system for dysarthria assessment is trained with WD patients (n = 65) and sex‐matched healthy controls (n = 65) using a machine learning approach. The system achieves reasonable performance in evaluating dysarthria severity with either stepwise classification or regression (all areas under the curve >80%; mean absolute error = 6.25, r = 0.79, p < 0.0001). The diadochokinesis and sustained phonation tasks contribute the most to prediction, and the corresponding acoustic features can provide significant and independent contributions. The present study demonstrates the feasibility and good performance of the AI‐powered acoustic analysis framework, offering the potential to facilitate early screening and subsequent management of dysarthria.

conducts the test, which might restrict its usage in developing countries where professional neuropsychological faculties are lacking. [6,7] Meanwhile, additional adaptation and validation efforts are required for application in different countries, further increasing assessment costs. However, from the viewpoint of speech signal processing, dysarthric symptoms, such as voice tremor, rhythm disorder, low tone, and speech dysfunction, present distinct acoustic feature patterns. [10] Recent years have witnessed an emergence of automated acoustic measures in movement disorders, with encouraging and promising results. [8,11,12] Vásquez-Correa and colleagues used acoustic measures to predict modified Frenchay Dysarthria Assessment (m-FDA) scores and several speech deficits in patients with Parkinson's disease (PD). [13,14] In addition, they distinguished PD patients with severe dysarthria from those with mild dysarthric symptoms using glottal-related acoustic features, providing a potential acoustic marker of dysarthria severity. [15] Similarly, Riad et al. indicated an association between phonatory features and the clinical manifestations of Huntington's disease (HD). [16]

On the other hand, the data-intensive nature of healthcare makes it one of the most promising application areas for artificial intelligence (AI), facilitating the development of AI-powered clinical decision support systems (CDSSs). [17,18] CDSSs enhance medical decisions by reasoning over computer-encoded clinical knowledge and healthcare information to improve medical practice and to provide assessment, treatment, monitoring, and long-term care planning recommendations. [19,20] Therefore, it is highly promising to develop automated, accurate, cost-effective, and easily deployable CDSSs based on acoustic measures for dysarthria assessment.
Current research on automated acoustic measures focuses primarily on aiding disease diagnosis and rarely on severity evaluation, let alone a suite of clinical decision-making or standardized assessment protocols for dysarthria. Reliability and validity evaluation of a new medical method or tool is a prerequisite for clinical use.
To bridge the gap, we developed an AI-powered acoustic analysis system in the CDSS framework for dysarthria severity assessment, as illustrated in Figure 1. This system explicitly assigns people to healthy, mild dysarthria, or moderate-severe dysarthria categories based on acoustic analysis. Meanwhile, the proposed system provides a report that assesses and explains the dysarthria severity of the individual. There are four core decision-making steps, sketched in code below. 1) Prescreening: At the outpatient clinic, classification model 1 (CLF1) is applied to initially screen the subject as a healthy or dysarthric individual. 2) Subdividing: In clinical practice, classification model 2 (CLF2) is further applied to the subject with dysarthria to determine whether he/she has mild or moderate-severe dysarthria. 3) Refining: Because the acoustic characterization of healthy and mildly dysarthric individuals is similar, they are easily confused, leaving a possibility of under- and misdiagnosis for the healthy and mild dysarthric individuals obtained from STEPS 1 and 2, respectively. Therefore, classification model 3 (CLF3) is implemented at this stage to further differentiate between healthy and mildly dysarthric individuals. 4) Scoring: A regression model (REG) quantitatively scores the severity of subjects classified with mild or moderate-severe dysarthria for subsequent treatment and monitoring.
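As a compact illustration of these four steps, the following minimal Python sketch wires the models together. It assumes scikit-learn-style classifiers (clf1, clf2, clf3) and a regressor (reg) trained elsewhere, a single-subject feature vector x of shape (1, n_features), and an illustrative label convention (0 = healthy/mild, 1 = dysarthric/moderate-severe); none of these names come from the paper's code.

```python
# Hypothetical wiring of the four decision-making steps; all model objects,
# label conventions, and names are illustrative assumptions.
def assess_dysarthria(x, clf1, clf2, clf3, reg):
    """Return (severity label, predicted m-FDA score or None) for one subject."""
    # STEP 1 - Prescreening: 0 = healthy, 1 = dysarthric.
    if clf1.predict(x)[0] == 1:
        # STEP 2 - Subdividing: 0 = mild, 1 = moderate-severe.
        label = "moderate-severe" if clf2.predict(x)[0] == 1 else "mild"
    else:
        label = "healthy"
    # STEP 3 - Refining: healthy and mild calls are easily confused, so both
    # are re-checked with a dedicated healthy-vs-mild classifier.
    if label in ("healthy", "mild"):
        label = "mild" if clf3.predict(x)[0] == 1 else "healthy"
    if label == "healthy":
        return "healthy", None
    # STEP 4 - Scoring: quantitative m-FDA estimate for dysarthric subjects.
    return label, float(reg.predict(x)[0])
```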
To verify the feasibility and validity of the proposed AI-powered acoustic analysis system, we also explored the reproducibility of automated acoustic measures in a healthy population. Finally, the validity of the proposed system was investigated in a WD patient cohort. Dysarthria is the most common prodromal neurological symptom in WD patients, [21-24] and the clinical manifestations of dysarthria in WD are complex and mostly of mixed type compared with other movement disorders. [1,22] For example, PD patients present with hypokinetic dysarthria, and HD patients present with hyperkinetic dysarthria. A successful analysis system in WD would therefore make it possible to extend the approach to other diseases with different types of dysarthria, such as PD or HD.
Our results demonstrate for the first time that computerized, automated acoustic measurement is a promising approach to dysarthria severity evaluation for WD patients. The proposed AI-powered acoustic analysis system is easily deployed on a large scale, providing clinicians with effective and efficient decision aids for the early detection and monitoring of dysarthria-related disorders.

Results
We limited the number of acoustic features in the model to reduce the probability of a type II error and potential overfitting. Subjects completed three structured speech paradigms specially designed for dysarthria assessment, in the order of sustained phonation (SP), [25,26] fast syllable repetition (also known as sequential motion rates or diadochokinesis, DDK), [25,27] and the [s:] to [z:] (S/Z) ratio. [28] These task paradigms are simple to administer and have proven effective in assessing dysarthria. We calculated six representative acoustic features from the three speech tasks. [29-31] The procedures for the three speech tasks can be found in Table S1, Supporting Information.

Reliability and Validity of Acoustic Features
Figure 2 shows ICC estimates with 95% confidence intervals (CIs) for the six acoustic features extracted from the healthy population. According to Koo and Li's guidelines, [32] almost all features achieved good reliability overall (ICC = 0.81 ± 0.07). More specifically, interpreting the reliability level from the reported 95% CI of the estimated ICC, maximum phonation time (MPT) was good to excellent; DDK rate, voice onset time (VOT), and pause duration were moderate to excellent; DDK regularity was moderate to good; and the S/Z ratio (SZR) was poor to good.
The median and range of each acoustic feature, together with the results of normality tests and group comparisons, are listed in Table S2, Supporting Information. All features had a significant effect on dysarthria severity. As shown in Figure 3, post hoc Conover's tests with Benjamini-Hochberg FDR correction revealed significant differences in MPT between the healthy and moderate-severe dysarthric groups (p < 0.05); in both DDK rate and pause duration between all groups (healthy vs mild: p < 0.0001; healthy vs moderate-severe: p < 0.0001; mild vs moderate-severe: p < 0.001); in DDK regularity between healthy vs moderate-severe (p < 0.05) and mild vs moderate-severe (p < 0.05); in VOT between healthy vs mild (p < 0.01) and healthy vs moderate-severe (p < 0.01); and in SZR between healthy vs mild (p < 0.05) and healthy vs moderate-severe (p < 0.05).
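This omnibus-plus-post-hoc procedure can be reproduced with standard Python tooling. The sketch below assumes a hypothetical long-format DataFrame df with a "group" column (healthy / mild / moderate-severe) and one column per acoustic feature, here "ddk_rate"; the column names are illustrative, not taken from the paper.

```python
# Kruskal-Wallis omnibus test followed by Conover's post hoc test with
# Benjamini-Hochberg FDR correction; df, "group", and "ddk_rate" are assumed.
import scipy.stats as stats
import scikit_posthocs as sp  # provides Conover's post hoc test

groups = [g["ddk_rate"].values for _, g in df.groupby("group")]
H, p = stats.kruskal(*groups)  # omnibus test across the three groups
print(f"Kruskal-Wallis H = {H:.2f}, p = {p:.4g}")

# Pairwise Conover comparisons, p-values adjusted by Benjamini-Hochberg FDR.
pairwise_p = sp.posthoc_conover(df, val_col="ddk_rate", group_col="group",
                                p_adjust="fdr_bh")
print(pairwise_p)
```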
The heatmap shows the results of correlation analyses adjusted for multiple comparisons (Figure 4). We found significant positive correlations of the m-FDA score with MPT and DDK rate, and significant negative correlations with DDK regularity and pause duration. These results reveal the potential of acoustic features to predict dysarthria severity. In addition, correlations were found between some features, including significant negative correlations of DDK rate with DDK regularity, VOT, and pause duration, and a significant positive correlation between VOT and pause duration, indicating possible collinearity among a few features.

Principal Factors of Acoustic Features Based on PCA
The principal component analysis (PCA) results showed that all three speech tasks had strong loadings on the principal components, and each task contributed in widely different directions (Figure 5). The corresponding six acoustic features showed different magnitudes and directions in the coordinate space spanned by the first two principal components, indicating that the information they carry differs. In particular, the DDK rate had the strongest loading on the first principal component, with a negative sign, while DDK regularity had the strongest loading on the second principal component, with a positive sign.
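A PCA of this kind is straightforward to reproduce. The following sketch assumes a feature matrix X of shape (n_subjects, 6) and a feature_names list; both are placeholders, and the loading convention (components scaled by the square root of the explained variance, as commonly drawn in biplots) is an assumption about how Figure 5 was produced.

```python
# PCA biplot ingredients: z-scored features projected onto the first two
# principal components, plus per-feature loadings. X and feature_names assumed.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X_std = StandardScaler().fit_transform(X)   # z-score each acoustic feature
pca = PCA(n_components=2).fit(X_std)
scores = pca.transform(X_std)               # subject coordinates on PC1/PC2

# Loadings: component weights scaled by sqrt(explained variance); their
# length and angle drive the arrows of the biplot.
loadings = pca.components_.T * np.sqrt(pca.explained_variance_)
for name, (l1, l2) in zip(feature_names, loadings):
    print(f"{name}: PC1 = {l1:+.2f}, PC2 = {l2:+.2f}")
```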

AI-Powered Decision-Making for Dysarthria Assessment
The performance metrics of each model in discriminating the dysarthria severity of WD, together with the corresponding optimal hyperparameters, are shown in Table S3, Supporting Information. Figure 6a illustrates the receiver operating characteristic (ROC) curves for each binary classification task on the test set. In the task of separating healthy controls from the dysarthric group, the classifier achieved an area under the curve (AUC) of 88.84% on leave-one-out cross-validation (LOOCV) and 86.05% on the test set, with 84.21% sensitivity and 80% specificity. The model detected mild dysarthria in WD against the healthy group with an AUC of 83.18%. Moreover, in distinguishing dysarthria severity within WD, the final model had an AUC of 80.81% for discriminating between mild and moderate-severe dysarthria. Figure 6b indicates that, based on acoustic features, the REG predicted the dysarthria scores of WD patients well (mean absolute error (MAE) = 6.25; Spearman's rank correlation coefficient r = 0.79, p < 0.0001).

Global Explanations for REG
A quantitative analysis for a global explanation of the REG was conducted with the Shapley Additive exPlanations (SHAP) algorithm. The SHAP value generated by the algorithm quantifies the contribution or impact of each feature on the model output: the larger the absolute SHAP value, the more important the corresponding feature. As shown in Figure 7, the SHAP summary plot concisely visualizes the range and distribution of each feature's SHAP values. In the figure, each dot, representing the SHAP value of one feature in one sample, is plotted horizontally and stacked vertically to show the density of identical SHAP values. The dots are colored by feature value from low (blue) to high (red). The features are sorted by their mean absolute SHAP values, which indicate their global importance. DDK rate contributes most to the REG and has a positive effect on its output, meaning that as the feature value increases, the regression value, that is, the predicted m-FDA score, gets larger. In contrast, pause duration is the strongest negative-effect feature: higher values of pause duration lead the model to predict lower m-FDA scores and thus more severe dysarthria.
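For tree-ensemble models such as a random forest, this analysis is a few lines with the shap package. The sketch assumes a fitted regressor reg and a DataFrame X_train of the six acoustic features; both names are placeholders.

```python
# Global SHAP explanation for a tree-based regressor; reg and X_train assumed.
import shap

explainer = shap.TreeExplainer(reg)           # exact SHAP values for tree models
shap_values = explainer.shap_values(X_train)  # (n_samples x n_features) matrix

# Beeswarm summary: features sorted by mean |SHAP|, dots colored by value,
# matching the layout described for Figure 7.
shap.summary_plot(shap_values, X_train)
```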

Discussion
Dysarthria is the most common neurological manifestation of movement disorders and is the initial symptom in about half of WD patients. [22,33] The speech signal is an essential digital biomarker for aiding early diagnosis and intervention in dysarthria-related disorders. [27,34] The present study proposes an AI-powered acoustic analysis system in the CDSS framework that integrates machine learning methods. Our results indicate good reproducibility of the acoustic measures. Because WD can present with both hypokinetic and hyperkinetic, mostly mixed, types of dysarthria, it is an ideal disease model for developing and validating an automatic acoustic analysis system; we therefore verified the system's efficacy with a valuable sample of WD. As far as we know, the present study is the first attempt to build an acoustic analysis system capable of assessing WD dysarthria.

Reliability and Validity of Automated Acoustic Measures
Rigorous science requires prior knowledge about the measurement error of the research tools used; it is therefore necessary to first estimate the reliability of the assessment tools. [35] Printz et al. reviewed the literature on the test-retest accuracy of automated voice range profile (VRP) assessment for the treatment of voice disorders, reporting high reliability and clinical usefulness of the VRP, as well as uncertainty due to low sample sizes and differing procedures. [36] A recent study by Almaghrabi et al. investigated the reproducibility of bioacoustic features, showing that their stability was easily affected by sample duration, speech task, and gender. [37] Despite widespread interest, studies on the reliability of automated acoustic measures remain scarce; most previous research has focused on voice disorders or psychiatric illnesses, and there has been no research on the reliability of acoustic tasks and features specific to dysarthria. To our knowledge, the present study is the first attempt to explore the reproducibility of automated acoustic measures in dysarthria using AI. Our findings suggest good robustness of the proposed acoustic analysis system. In particular, each feature of the SP and DDK tasks can be regarded as an indicator with good reliability (Figure 2). In contrast, the reproducibility of SZR is slightly poorer, and a further study on test-retest reliability in the dysarthric population will be performed later.
Significant differences in acoustic features between healthy groups and other movement disorder groups have been reported. [27,38] We further demonstrated the validity of automated acoustic measures for differentiating dysarthria severity through between-group difference analyses. Notably, DDK rate and pause duration in the DDK task showed the most significant differences in every group comparison, and both effect sizes were large (η² ≥ 0.14; Figure 3b,e and Table S2, Supporting Information), indicating that the more severe the dysarthria, the fewer syllables per second and the more silent gaps.
Furthermore, a similar conclusion can be drawn from the correlation analysis. The m-FDA scores were positively correlated with MPT and DDK rate and negatively correlated with the other acoustic features. Thus, our experimental results again demonstrate that acoustic features can quantitatively characterize dysarthria severity.

Independence of Acoustic Features
The cross-information provided by large numbers of acoustic features from multiple speech tasks can reveal important connections that individual features cannot capture, but it may also introduce duplicate or useless information across tasks and features. To address this challenge, feature engineering is widely used to reduce the number of features and enhance the generalization performance of models. In this study, we systematically examined the independence of acoustic tasks and features, especially the correlations between features and the distribution of principal components, aiming to provide practical methods for selecting tasks and features that enhance assessment validity and efficiency. As seen in Figure 5, the speech tasks and features showed vectors of different magnitude and direction, indicating that their roles are independent and nonoverlapping and can thus provide complementary information for dysarthria assessment. [39] Moreover, in contrast to the nonsignificant correlations between features across tasks, there were significant negative correlations of DDK rate with DDK regularity, VOT, and pause duration within the DDK task, and significant positive correlations between VOT and pause duration (Figure 4). This is attributable to the fact that these acoustic features, to some extent, all reflect the precise coordination ability of the articulators. [27] Although this implies that a few features in the DDK task might provide overlapping information in predicting dysarthria, the PCA results suggest that their contributions to the principal components were not consistent, so they can still provide additional information to the prediction model.

Contributions of Acoustic Measures in Dysarthria Assessment
For classification tasks across different levels of dysarthria, the AUC of the final model exceeded 80% in all cases. The best performance was found for CLF1 in screening dysarthria, with an AUC of 86.05% on the test set. In comparison, the AUC for CLF3 in detecting mild dysarthria against the healthy group was 83.18%. Although the acoustic characteristics of some healthy samples were similar to mild dysarthria, leading to easy confusion and a specificity of 75%, the model was sensitive enough for the mild dysarthric samples, with a sensitivity of 90.91%. Notably, CLF2 performed the worst in discriminating between mild and moderate-severe dysarthria, indicating that the boundaries between the acoustic manifestations are unclear, making it challenging to further distinguish dysarthria severity. For the regression analysis, the MAE of the REG was 6.25, and there was a highly significant correlation between the predicted and actual scores, further demonstrating that automated acoustic measures are well suited to assessing dysarthria severity.
The interpretability analysis of the REG showed that DDK rate, pause duration, MPT, and DDK regularity contributed significantly to the prediction performance, consistent with the findings of the correlation analysis. MPT is an easy-to-measure but meaningful acoustic indicator in the SP task. Compared with healthy populations, patients with dysarthria have glottal dysfunction and inadequate closure of the vocal folds, which probably reduces glottal resistance and increases airflow, resulting in shorter phonation times. [28,40] As expected, MPT showed a downward trend with worsening dysarthria.
In particular, the contribution of the DDK task was the most prominent. The DDK task measures the stability and coordination of articulation. Deficits on this task indicate that WD patients may suffer from uncoordinated movement of the vocal organs or inefficient lip and tongue movements, which inevitably slow the DDK rate and increase silent gaps and the instability of syllable voicing. [25,27] A previous study of hypokinetic dysarthria provided the first multicenter, multilanguage study of acoustic biomarkers, [41] showing cross-language consistency of DDK-based acoustic measures in hypokinetic dysarthria. At the same time, trends in speech changes were generally consistent across languages at different dysarthria severity levels among patients with idiopathic rapid eye movement sleep behavior disorder. For hyperkinetic dysarthria, a recent study found increased DDK regularity and prolonged pauses in patients with manifest-stage HD compared with controls; both prodromal and manifest HD exhibited slowed DDK rates and unstable articulator steady-state positions. [42] This evidence suggests that acoustic measures can distinguish between different disease stages of HD and thus have the potential to assess disease progression. The previous and current evidence indicates that acoustic measures, especially those from the DDK task, can provide supporting diagnostic biomarkers for different dysarthrias in a multilanguage setting. Thus, a CDSS based on automated acoustic measures can serve as a potential language-independent method to assess different types of dysarthria disorders.
Studies have shown that different etiologies or types of dysarthria can exhibit some common acoustic features; that is, different neuropathophysiologies may have similar representations at the acoustic speech level. [43] Therefore, the same acoustic measures and features may generalize to identifying different dysarthria types and etiologies. However, this raises a new question: how can we observe type-specific acoustic features in different dysarthrias? Well-trained clinicians, or even individuals with little intensive training, can make reliable distinctions between different dysarthria types, [44] suggesting the existence of perceptual information or relevant acoustic properties that distinguish them. On the other hand, the type and severity of dysarthria are mostly related to the basal ganglia, cerebellum, and thalamus, whereas intracranial lesions in WD patients with dysarthria are mostly concentrated in the lentiform nucleus, midbrain, pons, and caudate nucleus. Therefore, future studies should incorporate brain magnetic resonance imaging (MRI) to explore the perceptual and acoustic indicators that distinguish different etiologies or types of dysarthria.

AI-Powered CDSS Framework based on Acoustic Measures
AI-powered CDSSs represent a paradigm shift in healthcare. [19] Hospitals are increasingly interested in evaluating and implementing CDSSs to improve clinical diagnoses and outcomes, where automated systems may be seen as regulated advisors. [17,45] This article developed an AI-powered acoustic analysis system in the CDSS framework for dysarthria assessment to enhance the complex decision-making process of clinicians. The proposed acoustic analysis system is fully automated: the clinician records only the interviewee's speech data, and the system returns decision support in the form of a dysarthria assessment report.
Moreover, the proposed system can be utilized as a component-based framework with an open architecture in which each functional component is replaceable and extensible. Any speech task, feature extraction process, AI algorithm, or interpretation module can be integrated into the framework to achieve universal, language-independent decision support. In turn, models trained for other movement disorders with dysarthric symptoms can serve as domain-specific knowledge bases to aid clinical decision-making. The last module of the framework interprets results with CIs and feature contributions to help clinicians make fast and accurate decisions and to inform treatments and interventions. In this way, the framework addresses the imbalance between clinicians' trust and skepticism in AI systems, [17] achieving an optimal partnership between clinicians and CDSS and realizing the clinical benefits of AI-powered CDSSs.

Limitations
The current research has several limitations. First, although the study used a reasonably large patient cohort given the scarcity of WD data, and the Synthetic Minority Over-sampling Technique (SMOTE) was adopted to balance the data between subgroups, the sample size may still be insufficient for constructing a robust machine learning model, especially since all the data were collected from a single medical center. Meanwhile, the control group is five years younger than the WD group on average. Although an age effect may not be expected at such low participant ages, we cannot exclude that this imbalance biased the discrimination results. Multicenter studies with larger, better-matched samples will be necessary as a next step. Second, the present study only used cross-sectional data from WD and did not conduct a test-retest reliability survey in the disease population, so it cannot shed light on longitudinal changes during WD progression. Third, the study is limited to Mandarin-speaking subjects, and we plan to evaluate the CDSS for cohorts of different language groups to assess its generalizability.
Finally, compared with functional vocal tasks, connected speech provides broader and deeper language-level information on phonology, lexical semantics, morphosyntax, syntax, and discourse pragmatics, while also reflecting cognitive status, although it makes some dysarthria-specific metrics, such as MPT, SZR, and VOT, difficult to measure. Moreover, although we have carefully considered the reliability and interpretability of the acoustic features, the AI system still lacks a neurobiological-level interpretation. Applying advanced neuroimaging techniques would allow the correlated brain structures and functions to be observed. Therefore, we plan to combine neuroimaging, naturally connected speech tasks, and AI technology to explore the cognitive factors and neural mechanisms of dysarthria.

Conclusion
In this article, we proposed an AI-powered acoustic analysis system for dysarthria assessment and verified its performance in WD patients. The reproducibility of the acoustic measures was validated in a healthy population. Our results suggest that the DDK and SP tasks are two beneficial paradigms that provide mutually complementary acoustic features for subsequent automatic analysis. Equally important, the proposed system is acoustics based and thus language independent, making it a potentially promising avenue for large-scale screening, monitoring, and even remote assessment of dysarthria. Finally, to the best of our knowledge, the present study is the first attempt to develop and validate an AI-automated acoustic analysis system for WD dysarthria severity assessment.

Experimental Section
Subjects: We recruited 130 native Mandarin speakers, including 65 WD patients with dysarthria and 65 sex-matched healthy controls without dysarthria. Although WD is a rare disease, the sample size meets the standards of this field and the requirements of dysarthria research guidelines. [3,27] All patients were diagnosed according to the Chinese WD guidelines and clinically assessed for dysarthria using the Chinese m-FDA. [46] The m-FDA contains 29 sub-items assessing articulatory organ function, each scored 0-4 points, for a total score of 116. The dysarthria was then divided into two levels based on the final m-FDA score: mild dysarthria (96-115 points) and moderate-severe dysarthria (<96 points), as sketched below. Two professionally trained neurologists determined the dysarthria type of each patient from perceptual evaluation and clinical manifestations, yielding 59 mixed types (54 ataxic-spastic and 5 ataxic-flaccid), 4 ataxic, and 2 spastic types. As WD commonly presents in adolescents, [47] the sample was concentrated on young subjects, sex matched between the healthy and WD groups. Table 2 shows the demographic and clinical characteristics of each subgroup. For more detailed clinical information, including blood biochemical measures, brain MRI diagnoses, dysarthria type, medication use, and standardized neuropsychological testing scales for each WD patient, see Table S4, Supporting Information. All subjects provided written informed consent and had no cognitive impairment, mental disorder, primary language deficit, or substance abuse. The study was approved by the ethics committee of the Hefei Institutes of Physical Science, Chinese Academy of Sciences, and was conducted following the Declaration of Helsinki (reference number: SWYX-Y-2021-51).
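The severity binning reduces to a small helper function. The sketch below follows the cutoffs stated above; treating the full score of 116 as "no dysarthria" is our assumption, since the paper notes that healthy individuals usually receive full scores.

```python
# m-FDA severity binning per the cutoffs above; the "no dysarthria" branch at
# the full score of 116 is an assumption, not stated explicitly in the paper.
def mfda_severity(score: int) -> str:
    if not 0 <= score <= 116:
        raise ValueError("m-FDA total must lie in [0, 116]")
    if score == 116:
        return "no dysarthria"
    if score >= 96:
        return "mild"          # 96-115 points
    return "moderate-severe"   # <96 points
```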
Standardized Speech Data Acquisition: The automated speech assessments were performed in a quiet examination room with low ambient background noise (less than 45 dB, C-weighted). Speech was recorded through a cardioid condenser microphone (AT2035, Audio-Technica, Japan) placed about 10 cm in front of the subject's mouth. The microphone was connected to a professional USB audio interface (Scarlett Solo 3rd Gen, Focusrite Audio Engineering, High Wycombe, UK). In addition, we fitted a cylindrical physical noise-reduction sponge (PF8, Alctron, Ningbo, Zhejiang, China) on the microphone to further reduce background noise. The audio signal was transferred to a laptop computer (ThinkPad T430, Lenovo, Beijing, China) at a sampling frequency of 44.1 kHz with 16-bit resolution and saved in the mono lossless WAV audio format.
A trained neurologist assessed all subjects with the help of the 'Quick Cognitive Linguistic Test' (QCLT) GUI developed by our group. The QCLT is an integrative solution ranging from dysarthria evaluation to language and cognitive ability assessment, and all assessment operations were conducted through convenient mouse-click interactions. Subjects completed the three structured speech paradigms. For the test-retest reliability study, we acquired two recording sessions, 3-4 days apart, from 41 of the healthy individuals; recordings from the other subjects were collected in a single session.
Acoustic Feature Extraction: Feature extraction was performed using Parselmouth, [48] a publicly available Python package, while audio preprocessing and segmentation were conducted using embedded algorithms of Praat, an open-source linguistic software. [49] A minimal extraction sketch is given below.

Sustained Phonation Task: The SP task is a standard speech assessment paradigm reflecting the stability of vocal fold vibration [15] and is used clinically to evaluate laryngeal function. [50,51] The metric for this task was the MPT, a simple, time-saving, and noninvasive aerodynamic measurement, defined as the maximum time a person can maintain the vowel sound [a:] at a comfortable pitch and loudness on one exhalation. [26,33,52] Previous studies found a correlation between low MPT and laryngeal pathology, [26,28] attributed to increased air leakage through the rima glottidis and a significant decrease in glottal resistance in the case of vocal dysfunction, resulting in inadequate closure of the vocal folds and an eventual shortening of the MPT. [40,52]

Fast Syllable Repetition Task: This task is widely used to evaluate articulation ability, [25] especially the coordination of the laryngeal muscle groups and the supralaryngeal articulators. [31] Previous studies have shown that dysarthria may be caused by insufficient tongue elevation and contraction for stops and fricatives, and many patients with dysarthria exhibit significant deficits in the ability to make rapid articulator movements during the DDK task. [31,53] We calculated four acoustic features from this task: DDK rate, DDK regularity, pause duration, and VOT, as described in Table 1.
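As an illustration of Parselmouth-based extraction, the sketch below approximates MPT as the voiced span of a sustained-[a:] recording. It is a simplification under stated assumptions: the study's exact segmentation used Praat's embedded algorithms, and "sp_task.wav" is a placeholder filename.

```python
# Simplified, hypothetical MPT estimate with Parselmouth; not the paper's
# exact pipeline. "sp_task.wav" is a placeholder sustained-[a:] recording.
import parselmouth

snd = parselmouth.Sound("sp_task.wav")
pitch = snd.to_pitch(time_step=0.01)      # F0 track with 10 ms frames
freq = pitch.selected_array["frequency"]  # 0 Hz marks unvoiced frames
times = pitch.xs()
voiced = freq > 0
# MPT approximated as the span between the first and last voiced frame.
mpt = float(times[voiced].max() - times[voiced].min()) if voiced.any() else 0.0
print(f"MPT ~ {mpt:.2f} s")
```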
S/Z Ratio Task: The S/Z ratio task is a standardized test of vocal function. The SZR, calculated as the ratio of the longest durations for which the subject can sustain the phonemes [s:] and [z:], is a sensitive pathological indicator of laryngeal dysfunction. [28] Because pronouncing [z:] requires vocal fold vibration, any pathology of the vocal folds may cause glottal dysfunction, resulting in insufficient closure of the vocal folds and increased airflow, and ultimately a shortened duration of voiced consonants such as [z:]. [28,54] Thus, the higher the SZR, the greater the risk of phonation problems.
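The ratio itself is trivial once the two durations are measured (e.g., as in the MPT sketch above); the helper below is illustrative.

```python
# SZR sketch: longest sustained [s:] duration over longest sustained [z:]
# duration, each in seconds; function name and interface are illustrative.
def sz_ratio(s_duration_s: float, z_duration_s: float) -> float:
    """Return the S/Z ratio; values well above 1 suggest glottal dysfunction."""
    if z_duration_s <= 0:
        raise ValueError("[z:] duration must be positive")
    return s_duration_s / z_duration_s
```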
Statistical Analysis: To verify the reproducibility of the proposed acoustic analysis system, we performed a test-retest reliability analysis of the acoustic features of 41 healthy controls, using an average-measurement, absolute-agreement, two-way mixed-effects model to calculate the ICC. [32] Kruskal-Wallis tests and post hoc Conover's tests with Benjamini-Hochberg FDR correction were applied for the between-group difference analysis of the healthy, mild, and moderate-severe dysarthric groups. For the correlation analyses of m-FDA scores with each feature and between features, Spearman's rank correlation tests were used, and the resulting two-tailed p-values were adjusted for multiple comparisons using the Holm-Bonferroni method.
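The ICC computation can be reproduced with the pingouin package. The sketch assumes a hypothetical long-format DataFrame long_df with one row per (subject, session) pair and the feature value in an "mpt" column; reading pingouin's ICC2k row as the average-measures, absolute-agreement estimate matching the paper's two-way mixed-effects specification is our assumption.

```python
# Test-retest ICC sketch with pingouin; long_df and column names are assumed.
import pingouin as pg

icc = pg.intraclass_corr(data=long_df, targets="subject",
                         raters="session", ratings="mpt")
# Average-measures, absolute-agreement ICC with its 95% CI:
print(icc.set_index("Type").loc["ICC2k", ["ICC", "CI95%"]])
```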
In addition, a PCA was performed to explore the contributions of the different speech tasks and the corresponding acoustic features to the discrimination of dysarthria severity. [39,55] Each feature was z-score normalized and projected into a new coordinate space consisting of the first two principal components with the largest variance.
AI-Powered Decision-Making Implementation: For the stepwise implementation of clinical decision support for dysarthria severity (see the decision-making steps in Figure 1), we sequentially built three binary classification models and a REG using the random forest algorithm, where the classification models comprised healthy versus dysarthric group (CLF1), mild versus moderate-severe dysarthric group (CLF2), and healthy versus mild dysarthric group (CLF3).
For the classification task, we split the entire dataset into training and test sets at a 7:3 ratio in a stratified fashion, ensuring the same proportion of each group as in the initial dataset. The training set was then normalized, and the test set was transformed with the training-set statistics to prevent data leakage. [58] We calculated the AUC of the ROC curve, sensitivity, and specificity as evaluation metrics for the classification task.
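The protocol for one classifier can be sketched end to end with scikit-learn and imbalanced-learn. This is a sketch under stated assumptions: X and y are placeholders, the hyperparameter grid is illustrative rather than the paper's, and pooling each training sample's held-out LOOCV probability before computing the AUC is our assumption about how the LOOCV AUC was obtained.

```python
# Evaluation sketch for one classifier (e.g., CLF1); X, y, and grid assumed.
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import (GridSearchCV, LeaveOneOut,
                                     cross_val_predict, train_test_split)
from sklearn.preprocessing import StandardScaler

# Stratified 7:3 split keeps group proportions equal to the initial dataset.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=0)

# Scaler and SMOTE live inside the pipeline, so they are refitted on training
# folds only -- no statistics leak into held-out samples.
pipe = Pipeline([("scale", StandardScaler()),
                 ("smote", SMOTE(random_state=0)),
                 ("rf", RandomForestClassifier(random_state=0))])
grid = GridSearchCV(pipe,
                    {"rf__n_estimators": [100, 300],
                     "rf__max_depth": [3, 5, None]},
                    cv=LeaveOneOut(), scoring="accuracy")
grid.fit(X_tr, y_tr)

# LOOCV AUC: pool each training sample's held-out probability, then score.
loo_proba = cross_val_predict(grid.best_estimator_, X_tr, y_tr,
                              cv=LeaveOneOut(), method="predict_proba")[:, 1]
print("LOOCV AUC:", roc_auc_score(y_tr, loo_proba))
print("test  AUC:", roc_auc_score(y_te, grid.predict_proba(X_te)[:, 1]))
```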
Our proposed CDSS provides stepwise clinical decision support for dysarthria severity: interviewees are first classified into healthy versus dysarthric groups using CLF1, and the REG is subsequently applied to the dysarthric group only, yielding a dysarthria score. For the regression task, we divided the dataset of WD patients into training and test sets at a 7:3 ratio, and the MAE was used as the evaluation metric. The optimal hyperparameters of all models were obtained by grid search with LOOCV on the training set, and performance testing was conducted on the test set. Note that since the m-FDA is designed primarily for patients and healthy individuals usually receive full scores, the regression task was performed on WD patients only.
Finally, the SHAP algorithm, which originates from game theory and aims to eliminate the black-box effect of machine learning models, [59,60] was used to illustrate the importance of individual features and their impact on the output of the REG. This analysis was expected to provide a global explanation of the model and potentially help us better understand how to select important acoustic features to improve model performance.

Figure 1 .
Figure 1. An AI-powered acoustic analysis system for dysarthria severity assessment.

Figure 2 .
Figure 2. Test-retest reliability of acoustic features, measured by ICC with 95% CI. According to Koo and Li's guidelines, the three vertical dashed lines from left to right indicate the thresholds for moderate, good, and excellent reliability levels, respectively.

Figure 5 .
Figure 5. Biplot of the PCA for speech tasks and corresponding acoustic features. Different colors distinguish the speech tasks. The values in parentheses are the loadings. The arrow length indicates the loading strength, and the angle describes the contribution of the original features to the principal components.

Figure 4 .
Figure 4. Heatmap of correlations between acoustic features and the m-FDA score. The squares with a gradient color from blue to red and the values in the lower triangular matrix indicate Spearman's rank correlation coefficients. The asterisks in the upper triangular matrix represent adjusted significance levels, where *p < 0.05, **p < 0.01, ***p < 0.001, and ****p < 0.0001 are considered statistically significant, and ns means not significant.

Figure 6 .
Figure 6. Classification and REG performance. a) ROC curves and model performance for each classification task. sen, sensitivity; spe, specificity. b) Accuracy of the m-FDA score prediction model. Blue circles represent the subjects, and the dashed red line represents perfect prediction. Results of the MAE and Spearman's rank correlation analysis are provided.

Figure 7 .
Figure 7. SHAP summary plot of the REG. The abscissa represents the SHAP value reflecting the contribution to the model, and the ordinate lists the acoustic features in decreasing order of importance from top to bottom. Each sample is represented by a single dot on each feature row. The SHAP value determines the position of each dot, and the color indicates the normalized value of the feature.

Table 1 .
Overview of applied acoustic features.

Table 2 .
Demographic and clinical characteristics of each subgroup. a) Values are presented as mean/standard deviation (range) or integers.