Accuracy of history, physical examination, cardiac biomarkers, and biochemical variables in identifying dogs with stage B2 degenerative mitral valve disease

Abstract

Background: Treatment is indicated in dogs with preclinical degenerative mitral valve disease (DMVD) and cardiomegaly (stage B2). This is best diagnosed using echocardiography; however, relying upon this limits access to accurate diagnosis.

Objectives: To evaluate whether cardiac biomarker concentrations can be used alongside other clinical data to identify stage B2 dogs.

Animals: Client‐owned dogs (n = 1887) with preclinical DMVD prospectively sampled in Germany, the United Kingdom, and the United States.

Methods: Dogs that met inclusion criteria and were not receiving pimobendan (n = 1245) were used for model development. Explanatory (multivariable logistic regression) and predictive models were developed using clinical observations, biochemistry, and cardiac biomarker concentrations, with echocardiographically confirmed stage B2 disease as the outcome. Receiver operating characteristic curves assessed the ability to identify stage B2 dogs.

Results: Age, appetite, serum alanine aminotransferase activity, body condition, serum creatinine concentration, murmur intensity, and plasma N‐terminal propeptide of B‐type natriuretic peptide (NT‐proBNP) concentration were independently associated with the likelihood of being stage B2. The discriminatory ability of this explanatory model (area under curve [AUC], 0.84; 95% confidence interval [CI], 0.82‐0.87) was superior to NT‐proBNP (AUC, 0.77; 95% CI, 0.74‐0.80) or the vertebral heart score alone (AUC, 0.76; 95% CI, 0.69‐0.83). A predictive logistic regression model could identify the probability of being stage B2 (AUC test set, 0.86; 95% CI, 0.81‐0.91).

Conclusion and Clinical Importance: Our findings indicate accessible measurements could be used to screen dogs with preclinical DMVD. Encouraging at‐risk dogs to seek further evaluation could result in a greater proportion of cases being appropriately managed.

It is challenging to recognize whether a dog is in stage B2 using information obtained from an external examination. Clinical signs such as a cough, exercise intolerance, inappetence, and weight loss can occur with progression of preclinical disease but might be attributed to diseases of other systems because of their nonspecific nature. [5][6][7][8] Some of these findings, along with findings from physical examination, are associated with the severity of preclinical disease; however, their ability to detect cardiomegaly has not been rigorously assessed.

Instead, severity is most commonly evaluated using imaging studies.
When performed by an experienced practitioner, echocardiography is considered the best method of staging preclinical disease. 3 Factors related to the dog, owner, and primary-care practice can influence the decision to pursue echocardiography, 9,10 so ultimately only a minority of cases might undergo advanced evaluation. For these reasons, there has been increasing interest in cardiac biomarkers, which could be used to evaluate a wider range of cases because of the relative simplicity of blood sampling. Research in dogs has predominantly focused on N-terminal propeptide of B-type natriuretic peptide (NT-proBNP) and cardiac troponin I (cTnI) as their concentrations positively correlate with echocardiographic dimensions. [11][12][13][14][15] There is value in interpreting these biomarkers alongside other risk factors for DMVD, 5,16,17 so it is possible that a similar approach could be used to stage preclinical disease in cases where echocardiography is not available.
This study aimed to determine whether data from readily accessible tests could reliably differentiate between dogs in stages B1 and B2. It was hypothesized that dogs in stage B2 would display differences in these variables, consistent with having more advanced disease. Our objectives were to:
• Identify variables associated with the likelihood of being in stage B2,
• Determine whether analyzing several of these variables in combination was beneficial when making this distinction, and

| Case selection
The study sample consisted of client-owned dogs that received an echocardiographic diagnosis of DMVD after undergoing evaluation by a cardiologist; this was defined as visible prolapse or thickening of the mitral valve and associated apparatus, in combination with mitral regurgitation on color Doppler examination. Dogs were required to be ≥6 years old, weigh ≥2 and ≤25 kg, and have a left apical systolic murmur with a point of maximum intensity over the mitral valve. Dogs were excluded if they had radiographic, historical, or physical examination findings consistent with CHF or if they were already receiving treatment with a loop diuretic. Known comorbidities expected to interfere with echocardiographic measurements or biomarker concentrations were considered additional reasons for exclusion. These included endocrine disorders, renal disease/injury, conditions accompanied by marked systemic inflammation, marked hepatic disease and, with the exception of concurrent tricuspid regurgitation, cardiac disease other than DMVD. Dogs receiving treatment with corticosteroids were not excluded.

| Study samples evaluated
The presence of CHF and administration of loop diuretics excluded dogs from the study sample and any analyses conducted. From amongst the remaining "complete" sample, a "clean" sample was created to remove the influence of potential confounders from analyses. Dogs found to violate selection criteria, such as those with azotaemia, hypercalcaemia, endocrinopathies, or elevations in alanine aminotransferase activity (ALT, >4 × upper reference limit), were excluded from the "clean" sample, 18 as well as dogs whose samples had taken longer than 72 hours to arrive at the reference laboratory. Dogs receiving treatment with pimobendan were excluded from the "clean" sample to eliminate the drug's influence on echocardiographic dimensions. 6 The data from dogs excluded for all reasons other than CHF and the receipt of loop diuretics were retained and used to form a "confounded" sample for use in a subanalysis. This is summarized in

| Clinical evaluation
Data were captured by veterinary cardiologists at the point of examination. It was noted if, in the past 6 months, the dog had developed a cough, exercise intolerance, or a reduced appetite. Heart and respiratory rates were measured and the predominant heart rhythm throughout auscultation was classified as sinus rhythm, sinus arrhythmia, or "other." Murmur intensity was initially attributed a value of I-VI using a modified version of the Levine grading system (Supplementary Materials: Study Protocol); however, this was later reclassified to reduce complexity and account for the low occurrence of murmurs at either extreme of the grading system. 19 Grade I and II murmurs were labeled as soft, grade III as moderate, grade IV as loud, and grades V and VI as thrilling. 20 Body condition (BCS) was scored using the 9-point scale. 21

Echocardiography was used to obtain standard right parasternal views. Left atrial to aortic root ratio (LA:Ao) was recorded from a short-axis, 2D view in early ventricular diastole. 22 The left ventricular internal diameter in diastole (LVIDD) was recorded from a short-axis, M-mode view at the level of the chordae tendineae. Left ventricular internal diameters were normalized to bodyweight (LVIDDN) using the formula: $\mathrm{LVIDDN} = \mathrm{LVIDD\ (cm)} / \mathrm{weight\ (kg)}^{0.294}$. 23 Stage B2 was defined as an LA:Ao ≥1.6 and LVIDDN ≥1.7. 3 Dogs that did not meet both of these criteria were classified as stage B1. Where available, vertebral heart score (VHS) was recorded. 24,25

A venous blood sample was taken from all dogs to obtain serum biochemistry and cardiac biomarker concentrations. Results that fell below detection limits of assays were assigned the value of the lower limit. 5,16 All measurements underwent statistical analysis in Système International units.
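The staging rule and murmur regrouping described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the study's code: the function names (`lviddn`, `stage`) and the dictionary `MURMUR_CATEGORY` are hypothetical, and the numeric inputs in the usage note are invented for demonstration.

```python
# Illustrative sketch of the echocardiographic staging rule described in the text.
# Function and variable names are hypothetical, not taken from the study.

# Regrouping of the six murmur grades (I-VI) into the four categories used for analysis.
MURMUR_CATEGORY = {
    1: "soft", 2: "soft",
    3: "moderate",
    4: "loud",
    5: "thrilling", 6: "thrilling",
}


def lviddn(lvidd_cm: float, weight_kg: float) -> float:
    """Normalize LVIDD to bodyweight: LVIDD (cm) / weight (kg) ** 0.294."""
    if weight_kg <= 0:
        raise ValueError("weight_kg must be positive")
    return lvidd_cm / weight_kg ** 0.294


def stage(la_ao: float, lvidd_cm: float, weight_kg: float) -> str:
    """Return 'B2' only when both criteria are met (LA:Ao >= 1.6 and LVIDDN >= 1.7)."""
    meets_both = la_ao >= 1.6 and lviddn(lvidd_cm, weight_kg) >= 1.7
    return "B2" if meets_both else "B1"
```

For example, a hypothetical 8 kg dog with an LVIDD of 3.4 cm has an LVIDDN of roughly 1.84, so `stage(1.7, 3.4, 8.0)` returns `"B2"`, whereas the same dog with an LA:Ao of 1.5 would be classified as `"B1"` because both criteria must be met.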

| Factors associated with stage B2 DMVD
Within the "clean" sample, binary logistic regression was used to identify factors associated with the likelihood of having stage B2 disease.
Cases were dichotomized according to whether or not they were in stage B2 and clinical data and blood test concentrations were entered as explanatory variables. Laboratory location was tested as a potential confounder. Univariable restricted cubic spline models were used to assess the assumption of linearity with the logit. 27 When this was violated, continuous variables were quartile transformed for analysis.
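As a rough illustration of the quartile transformation mentioned above, the following Python sketch assigns each observation to a quartile of its own sample distribution. It is a sketch under stated assumptions: the function name `quartile_transform` is hypothetical, and the cut-point method (Python's default "exclusive" interpolation, with values equal to a cut point placed in the lower quartile) is an assumption, since the text does not specify how quartile boundaries were computed.

```python
# Illustrative sketch: convert a continuous variable into quartile categories (1-4).
# The cut-point method is an assumption; the study's exact approach is not stated.
from bisect import bisect_left
from statistics import quantiles


def quartile_transform(values):
    """Return the quartile (1 = lowest, 4 = highest) for each value in the sample."""
    cuts = quantiles(values, n=4)  # three cut points: Q1, median, Q3
    # bisect_left places a value equal to a cut point in the lower quartile.
    return [bisect_left(cuts, v) + 1 for v in values]
```

The resulting categories can then be entered into a regression model as a categorical variable, which avoids assuming a linear relationship with the logit.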
Variables that displayed an association at a univariable level (P < .2) were selected for inclusion in a multivariable analysis. Using complete data, backward stepwise elimination was used to select a preliminary main effects model based on likelihood ratio tests (P < .05) and the standardized change-in-estimate criterion (threshold = 20%, R package "abe" v3.0.1). 28,29 Variables excluded by univariable testing were then individually entered into the main effects model and retained if they induced a substantial change in coefficients (>20%) indicative of a confounding effect. 27 Two-way interaction terms were tested for variables remaining in the model as main effects and included in the final model if they displayed a significant association with disease stage. 30

| Discriminatory ability in alternate settings
Model performance was assessed by plotting a receiver operating characteristic curve using predicted probabilities and calculating the area under the curve (AUC) with 95% CIs. In order to evaluate the degree to which comorbidities, sample handling, or pimobendan administration affected discriminatory ability, the coefficients for the explanatory multivariable model were applied to data from the "complete" and "confounded" samples. Respective AUCs were compared to the "clean" sample using a DeLong test. 31 The discriminatory ability of the explanatory multivariable model was additionally compared to other methods that could be used to identify stage B2 DMVD. Disease stage was regressed on NT-proBNP alone and VHS alone, from which AUC was calculated.
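For readers unfamiliar with the AUC, the sketch below computes it directly from its probabilistic definition: the proportion of (stage B2, stage B1) pairs in which the stage B2 dog receives the higher predicted probability, with ties counted as half. This is a minimal illustration with a hypothetical function name (`roc_auc`); it does not reproduce the confidence intervals or the DeLong comparison used in the study.

```python
# Illustrative sketch: AUC as the probability that a randomly chosen positive case
# (label 1, e.g. stage B2) is scored higher than a randomly chosen negative case.
# Confidence intervals and the DeLong test are deliberately not implemented here.

def roc_auc(labels, scores):
    """Compute the area under the ROC curve by pairwise comparison (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

With the invented inputs `roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])`, three of the four positive-negative pairs are ordered correctly, giving an AUC of 0.75. The pairwise loop is quadratic in sample size, which is acceptable for illustration but slower than rank-based implementations in statistical packages.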

| Predicting preclinical disease stage
A series of models was developed to evaluate whether preclinical disease status could be predicted in a subset of known data. Binary logistic regression was tested alongside more complex machine learning (ML) algorithms which do not rely on the same assumptions as regression and offer protection against overfitting. 32 These were ridge regression, 33 support vector machines (SVM), 34 random forest, 35 and the gradient boosting machine (GBM) XGBoost. 36 The clean data were partitioned with 80% used to train models and 20% used to test performance (Figure 1). Rows containing missing data were not included in this split. Data were then preprocessed according to the requirements of each model (Supplementary Methods). In multivariable logistic regression, features underwent univariable screening (P < .2) and then backward stepwise elimination with the residual chi-squared as the stopping criterion (P < .05) to select a model based on parsimony. 37 The hyperparameters of ML models were tuned using a grid search in a 5-fold cross-validation loop of the training set. Fitted models were applied to the test set to generate predicted probabilities of being stage B2 and assess performance. Plots of variable importance were also produced where possible for ML models.
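The partitioning scheme described above (an 80/20 train/test split followed by 5-fold cross-validation within the training set) can be sketched as follows. This is an illustrative sketch, not the study's pipeline: the function names are hypothetical, and stratifying the split by disease stage is an assumption, as the text does not state whether the partition was stratified.

```python
# Illustrative sketch of an 80/20 hold-out split plus 5-fold cross-validation indices.
# Stratification by outcome is an ASSUMPTION; the source does not specify it.
import random


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_indices, test_indices), preserving class proportions."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


def k_fold(indices, k=5, seed=0):
    """Yield (fit_indices, validation_indices) pairs for k-fold cross-validation."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        fit = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield fit, folds[i]
```

In this arrangement, hyperparameters are chosen using only the folds drawn from the training indices, and the test indices are touched once, when the fitted model is evaluated. Keeping the test set out of the tuning loop is what allows test-set AUC to be read as an estimate of performance on unseen cases.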

| Factors associated with stage B2 DMVD
In univariable testing, age, log10(bilirubin), and log10(cTnI) were nonlinearly related to the outcome, so were categorized into quartiles using their nontransformed values. Of the variables tested, 18 were associated with disease stage at a univariable level (Supplementary Table 3). In the multivariable analysis, the following variables were identified as independent risk factors: age, ALT, appetite, BCS, creatinine concentration, murmur intensity, and NT-proBNP concentration (Table 3). A reduction in appetite and lower BCS were associated with greater odds of being in stage B2. Post hoc testing of BCS demonstrated that this was true when underweight scores (BCS ≤3) were compared to almost all other values (Table 4; Figure 2B). Estimated marginal means for murmur intensity showed that the likelihood of being in stage B2 was greater when murmurs were more audible, with the comparison between loud and thrilling murmurs being the only pairwise combination that did not significantly differ (Table 4; Figure 2C). Age was also associated with the outcome, with dogs between 8 and 10 years old at greatest risk. In dogs older than 10, the likelihood of being stage B2 was significantly lower (Table 4; Figure 4A). Increasing serum creatinine concentrations were associated with a reduction in the odds of being in stage B2 (β, −0.02; OR, 0.98; 95% CI, 0.97-0.99; P < .001). In contrast, both log10(NT-proBNP) and log10(ALT) were positively related to the odds of being in stage B2 when modeled as main effects. ALT and NT-proBNP negatively interacted, meaning that the association between log10(NT-proBNP) and the outcome was not as strong at higher values of log10(ALT), and the association for log10

| Predicting preclinical disease stage
When evaluating the AUC for predictions on the test data, multivariable model performance was relatively consistent across the different classification algorithm types with a mean value of 0.87, indicating that all models generalized well to new data. When NT-proBNP was assessed as a sole predictor, the overall accuracy of the model was reduced. Performance metrics are summarized in Table 5. Both NT-proBNP and murmur intensity were consistently found among the most important predictors, with NT-proBNP ranking first in all models tested (Figure 6). These variables were featured in the predictive logistic regression model alongside appetite, creatinine, and BCS (

| DISCUSSION
Our study found that clinical observations and cardiac biomarker con- barriers to uptake such as cost would be reduced. The model was internally validated against a holdout set of 20% of the cohort and, on the basis of this analysis, it is possible to infer that it will perform well if applied to new cases. An important next step is to assess the model's accuracy in the exact set of circumstances in which it is intended for use: primary-care practice. 38 If satisfactory, this model could be used to support clinical decision making in preclinical DMVD.
As the potential impact of this model is influenced by user uptake and engagement, we propose that it is presented in the form of an app.
This digital medium would allow vets to engage with the full model

In addition to NT-proBNP, several other risk factors were identified. Murmur intensity, another important predictive variable, is associated with preclinical disease severity. 20,45 Our study found that the likelihood of being in stage B2 increased with murmur grade, with dogs having loud or thrilling murmurs at the greatest risk. Murmur intensity is one of the more subjective measurements when compared to the other variables included in this analysis. Cardiac auscultation is subject to inter- and intraobserver variability, which is potentially limiting considering the apparent importance of this variable. 45,46 As the use of simpler schemes improves agreement, audibility was graded using a 4-level system that has been previously used to assess DMVD severity. 20 This had the advantage of regrouping grades that did not occur very commonly, reducing the dimensionality of our data for analysis. The predictive models we describe use this system and demonstrate good predictive accuracy.
It is, however, still important to note that all dogs were examined by veterinary cardiologists using a standardized protocol, and further research is required to assess whether sampling in a different setting could impact the accuracy of results. The decision to use this system was informed by our research question and data, and we do not necessarily suggest that this replaces methods currently used in practice.
Having a reduced appetite was found to increase the likelihood of being in stage B2. In DMVD, loss of appetite is considered a negative prognostic indicator and some dogs that go on to develop CHF experience reductions in body weight. 4,6,47 Although weight was not examined in the present analyses, poor BCS was associated with increased risk. As in humans, 48,49 cachexia might develop before the onset of CHF, resulting in changes that can be detected as clinical signs. In this study, a negative association was observed between creatinine and the odds of being stage B2, and creatinine was selected by several multivariable methods in preference to symmetric dimethylarginine (SDMA). As both variables vary with GFR, the selection of creatinine suggests that its association with disease severity could in part relate to cachectic losses in muscle mass. Glomerular filtration rate itself might be expected to display an association with the severity of preclinical disease as increases in circulating fluid volume have been shown to induce a more rapid rate of creatinine clearance. 50 Adjusting for creatinine in a model containing NT-proBNP is potentially advantageous as GFR is a confounder of the biomarker's concentrations. 51

Although age and ALT were associated with the likelihood of being stage B2, neither variable was retained in the predictive model derived from a smaller subset of data. In explanatory analyses, the greatest risk was observed when dogs were between 8 and 10 years old, and after this aging dogs were less likely to be in stage B2. In humans, age is considered when defining diagnostic thresholds for NT-proBNP and research has shown that this explains additional variation in analyses that already account for creatinine. 52 There is evidence that the propensity to remodel is altered in the aging heart; however, this has not been studied in DMVD. 53 It is possible that profibrotic changes in myocardial composition could affect the tendency of the heart to dilate. 54 Alternatively, these findings could reflect differences in the phenotypes contained within each age group. Early onset DMVD, as noted in some breeds, might be accompanied by a more rapid rate of disease progression. 2,5

The explanatory analysis also identified ALT as a risk factor. As the hepatic vasculature is sensitive to changes in central venous pressure, elevations in ALT can occur in cardiovascular disease as a result of congestion or reduced perfusion. 55 Alanine aminotransferase interacted with NT-proBNP, although the exact relevance of this finding in DMVD is unclear. At high ALT concentrations, NT-proBNP could be partially elevated as a consequence of liver disease, producing a weaker association with DMVD severity. 56 In addition, 15 dogs were receiving treatment with corticosteroids, which can affect both variables. 57 This is not the first study to

TABLE 5 The performance of a series of models predicting preclinical disease status

| Strengths and limitations
The study benefited from the large number of cases, which facilitated robust analyses, particularly when developing a predictive model for clinical use. When training any model, it is possible that the algorithm will overfit nonmeaningful noise in the data, reducing its generalisability. 58 In this study, there were enough cases to simulate performance in novel conditions. Several algorithms were compared and there was good agreement in internal validity and the variables of greatest importance. A possible criticism, however, is that the conditions of examination are not those found in primary-care practice.
Data were collected from a referral sample by specialists following a protocol, and blood samples were analyzed at research laboratories. A follow-on external validation study is required to assess whether predictions are reliable when these conditions are changed.
The data were substantial enough to evaluate several ML algorithms and present them in comparison with conventional regression models. Machine learning has potential applications in medicine as algorithms can describe complex, nonlinear relationships among variables. 59 In this study, using ML to distinguish between stages did not produce a marked advantage in performance, so the model derived

Notes: Incremental increases in the predicted probability were evaluated as the threshold used to classify dogs as being in stage B2. The utility of each threshold was assessed using training set data. Confidence intervals were calculated using 2000 stratified bootstrap replicates (R package "pROC" v1.16.2). CI, confidence interval; NPV, negative predictive value; PPV, positive predictive value.
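The threshold evaluation described in these notes can be illustrated with a short Python sketch that classifies a dog as stage B2 when its predicted probability meets or exceeds a chosen threshold and then derives sensitivity, specificity, PPV, and NPV. The function name `threshold_metrics` is hypothetical, the choice of "greater than or equal to" as the decision rule is an assumption, and the bootstrap confidence intervals reported in the study are not reproduced here.

```python
# Illustrative sketch: diagnostic metrics at a given predicted-probability threshold.
# Label 1 = stage B2, label 0 = stage B1. The ">=" decision rule is an assumption.

def threshold_metrics(labels, probs, threshold):
    """Return sensitivity, specificity, PPV, and NPV (None where undefined)."""
    tp = fp = tn = fn = 0
    for y, p in zip(labels, probs):
        predicted_b2 = p >= threshold
        if predicted_b2 and y == 1:
            tp += 1
        elif predicted_b2 and y == 0:
            fp += 1
        elif not predicted_b2 and y == 0:
            tn += 1
        else:
            fn += 1

    def ratio(num, den):
        return num / den if den else None

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }
```

Calling this function over a range of thresholds (for example 0.1, 0.2, and so on) produces the kind of table in which a low threshold favors sensitivity and NPV, which suits a screening test, while a high threshold favors specificity and PPV.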
Dogs receiving treatment with pimobendan were excluded from the clean data and, in doing this, dogs with more severe disease could have been removed (Supplementary Tables 5 and 6). 5

| CONCLUSION
This study found that variables from different aspects of a dog's examination could be used in combination to assess the likelihood of being stage B2. A predictive model that analyses a dog's appetite, BCS, creatinine concentration, murmur intensity, and NT-proBNP concentration could be presented as an app for use in primary-care practice. This has potential as a screening test and might provide an informed way to allocate client and practice resources. The correct application of this prediction model could improve outcomes for dogs with preclinical DMVD.