Machine learning algorithms utilizing blood parameters enable early detection of immunethrombotic dysregulation in COVID‐19

Dear Editor,

The coronavirus disease 2019 (COVID-19) pandemic has stressed and overloaded medical capacity worldwide. From a pragmatic perspective, early detection of patients at risk of rapid clinical deterioration enables prompt intervention and may avert disease progression.1 T cell exhaustion, immunothrombotic dysregulation, and complement-associated microvascular injury are considered hallmarks of disease severity in COVID-19.2–5 It is generally accepted that identifying useful surrogates of the immune response to SARS-CoV-2 infection, for example, IL-6, TNFα, MIP1α, LDH, ferritin, D-dimer, and CK, is crucial.3,4,6 Nevertheless, no individual parameter has so far been predictive of the immunothrombotic dysregulation fueled by a maladaptive host inflammatory response in severe SARS-CoV-2 infection.7–9 We therefore sought to develop approaches for forecasting thrombotic complications before clinicopathological exacerbation.

By incorporating whole blood transcriptome profiling and multi-omics analysis, our study characterized immunological and hematological perturbations across severity categories (i.e., healthy donors vs. mild or moderate vs. severe vs. critical illness). Unsupervised hierarchical clustering of differential expression profiles revealed functional diversity among these groups (Figure 1A, left). Circos plots showed that the upregulated differentially expressed genes (DEGs) were enriched in key processes, namely neutrophil activation, platelet activation, blood coagulation, the complement receptor-mediated signaling pathway, leukocyte activation, and cytokine production. In contrast, the downregulated DEGs were functionally linked with lymphocyte activation, proliferation, differentiation, and migration, and with gamma delta (γδ) and alpha beta (αβ) T cell activation, among others (Figure 1A, right, and B). More specifically, gene signatures of platelet, neutrophil, and coagulation activation were upregulated, and those of lymphocyte activation were downregulated, in severe and critically ill COVID-19 (Figure 1B).
Multi-omics data incorporating plasma cytokines and chemokines, circulating complement components, flow cytometry-derived immune cell counts, clinical laboratory outcomes, and featured gene signatures were subjected to pairwise Pearson correlation analysis (Figure 1C, left). Furthermore, upregulation of both the neutrophil and platelet activation signatures was strongly correlated with downregulation of lymphocyte activation (R = -0.88, p < 0.001) (Figure 1C, middle). Gene subsets for neutrophil, platelet, and coagulation activation correlated positively with blood complements C3b, C4a, C6b, and C7b, whereas lymphocyte-related gene subsets showed inverse correlations (Figure 1C, right, and 1E, left).
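As an illustration of the pairwise correlation step, the following minimal sketch computes Pearson and Spearman coefficients with the Python standard library only. The two score vectors are hypothetical values invented for the example, not study data, and the function names are our own.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical per-patient signature scores (illustrative only).
neutrophil_score = [0.2, 0.9, 1.4, 2.1, 2.8, 3.5]
lymphocyte_score = [3.1, 2.7, 2.0, 1.6, 0.9, 0.4]

r = pearson(neutrophil_score, lymphocyte_score)
rho = spearman(neutrophil_score, lymphocyte_score)
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```

In a full correlation matrix, the same function would be applied to every pair of features, followed by a multiple-testing correction such as FDR.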
FIGURE 1 (caption, continued) ... neutrophil, lymphocyte, and blood coagulation activations. (C) Multi-omics characteristics correlation matrix of 43 features of COVID-19 patients. Linear regression for the correlation of lymphocyte-neutrophil-platelet activity in COVID-19. The square size corresponds to the absolute value of the Spearman rank correlation coefficient, with brown (blue) color indicating a positive (negative) correlation. *FDR < 0.05, **FDR < 0.01, ***FDR < 0.001. (D) Heatmap for gene-signatures of activation, recruitment and interactions for neutrophil, platelet, and the formation of NETs (NETosis). (E) Correlation analysis for plasma complements or D-dimer versus transcriptional levels of specific gene-subsets.

FIGURE 2 Critical blood parameters as well as age associated with disease severity of COVID-19 patients. (A-E) Boxplots depicting lymphocyte, neutrophil, platelet, hemoglobin, and age with respect to disease severity. *p < 0.05, **p < 0.01, ***p < 0.001. (F) Correlation matrix of lymphocyte, neutrophil, platelet, and hemoglobin in the peripheral blood, as well as age in COVID-19 patients. (G) Intuitive three-dimensional plot shows the interplay of lymphocyte, neutrophil, platelet counts, and hemoglobin level in 1219 COVID-19 patients. (H-J) The featured outcomes from routine blood tests and age were examined independently by ROC curves in discriminating disease severity of COVID-19 patients.

(Figure caption fragment) We used AUCs at 15-, 30-, and 45-days to assess prognostic accuracy, and calculated p values using the log-rank test.

The unveiled transcriptional findings were validated in a multicenter cohort of 1219 eligible individuals (Figure S1). A summary of patient characteristics is provided (Table S1). Peripheral lymphocyte, neutrophil, and platelet counts, as well as hemoglobin and age among different severity groups, are shown (Figure 2A-E).
Moreover, female sex was demographically predictive of protection against progression to severe COVID-19, particularly critical illness and death (Figure S2). Consistent with the transcriptional findings, clinical laboratory results showed that lymphopenia, neutrophilia, and thrombocytopenia owing to overconsumption of platelets characterized the late stages of COVID-19. These features were mutually linked and correlated to varying degrees (Figure 2F). A three-dimensional plot further illustrated the interplay of lymphocyte, neutrophil, and platelet counts and hemoglobin level (Figure 2G), providing a solid basis for mathematical modeling. Nonetheless, each individual blood parameter had relatively poor predictive performance for stratifying patients by severity (Figure 2H-J and Table S2).
To improve discrimination accuracy, machine learning-based severity classification was performed. A LASSO regression classifier was trained on the featured blood parameters (Figure 3A). The calibration curve demonstrated good consistency between predicted and observed values, and receiver operating characteristic (ROC) analysis confirmed favorable predictive performance (Figure 3B-D). Discriminative ability was also assessed in the testing and validation cohorts (Figure S3A-H). In parallel, a generalized linear model (GLM) and linear discriminant analysis (LDA) were used to construct and optimize disease discrimination. Both the GLM-based (Figure 3E) and LDA-based (Figure 3F) algorithms achieved strong discriminative capacity. Eventually, the overall cohort of 1219 patients was stratified into different degrees of severity with a robust hierarchical classification capacity (Figure 3G).
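To make the idea of a LASSO-type classifier concrete, the sketch below fits an L1-penalized logistic regression by proximal gradient descent and scores it with a rank-based AUC. It is an illustration under stated assumptions, not the pipeline used in the study: the outcome is simplified to binary (severe vs. non-severe), the cohort is synthetic, and the penalty `lam`, step size `lr`, and all function names are hypothetical choices.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(w, t):
    """Proximal operator of the L1 penalty: shrink w toward zero by t."""
    return math.copysign(max(abs(w) - t, 0.0), w)

def fit_l1_logistic(X, y, lam=0.05, lr=0.5, n_iter=300):
    """L1-penalized logistic regression via proximal gradient descent.

    The intercept is left unpenalized; lam and lr are illustrative values.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        b -= lr * grad_b
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
    return w, b

def auc(scores, labels):
    """Area under the ROC curve as the probability of correct ranking."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def make_synthetic_cohort(n=200, seed=0):
    """Synthetic, roughly standardized features (NOT study data)."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        severe = 1 if rng.random() < 0.5 else 0
        X.append([
            rng.gauss(-1.0 * severe, 1.0),  # lymphocyte count, lower if severe
            rng.gauss(+1.0 * severe, 1.0),  # neutrophil count, higher if severe
            rng.gauss(-0.5 * severe, 1.0),  # platelet count, lower if severe
            rng.gauss(-0.3 * severe, 1.0),  # hemoglobin, slightly lower if severe
            rng.gauss(0.0, 1.0),            # uninformative feature
        ])
        y.append(severe)
    return X, y

X, y = make_synthetic_cohort()
weights, intercept = fit_l1_logistic(X, y)
scores = [intercept + sum(w * v for w, v in zip(weights, row)) for row in X]
train_auc = auc(scores, y)
print("weights:", [round(w, 3) for w in weights])
print("training AUC:", round(train_auc, 3))
```

The AUC here is computed on the training data only; as in the letter, a meaningful assessment requires separate testing and validation cohorts.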
Machine learning-based prognosis prediction was also studied (Figure 4A). The calibration curve generally coincided with the diagonal, indicating relatively high prediction accuracy for 15-, 30-, and 45-day in-hospital mortality risks (Figure 4B). Decision curve analysis (DCA) demonstrated a superior prediction capacity, and the net reduction in interventions was maximized (Figure 4C, D). The derived survival risk score was associated with immunothrombotic dysregulation. Patients in the training cohort could therefore be divided into high- and low-risk groups with significantly stratified fatal risks (Figure 4E). Consistently, high predictive performance was also observed in both the internal testing (Figure 4H-J) and external validation (Figure 4K-M) cohorts.
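For readers unfamiliar with decision curve analysis, the following minimal sketch shows the standard net-benefit calculation at a single threshold probability, compared with a treat-all strategy. The predicted risks and outcomes are hypothetical numbers chosen for the example, and the function names are our own.

```python
def net_benefit(probs, outcomes, threshold):
    """Net benefit of intervening on patients with predicted risk >= threshold.

    NB = TP/n - (FP/n) * threshold / (1 - threshold)
    """
    n = len(outcomes)
    tp = sum(1 for p, y in zip(probs, outcomes) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, outcomes) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)

def net_benefit_treat_all(outcomes, threshold):
    """Net benefit of intervening on every patient regardless of risk."""
    prevalence = sum(outcomes) / len(outcomes)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)

# Hypothetical predicted mortality risks and observed outcomes (1 = died).
probs = [0.05, 0.10, 0.20, 0.30, 0.60, 0.70, 0.80, 0.90]
outcomes = [0, 0, 0, 1, 0, 1, 1, 1]

nb_model = net_benefit(probs, outcomes, threshold=0.5)
nb_all = net_benefit_treat_all(outcomes, threshold=0.5)
print(f"model: {nb_model:.3f}, treat all: {nb_all:.3f}")
```

A decision curve repeats this calculation over a range of thresholds; a model is considered useful where its curve lies above both the treat-all and treat-none (net benefit of zero) strategies.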
In conclusion, genome-wide whole blood profiling was performed to decipher the peripheral immune and hematologic perturbations in COVID-19, revealing uncontrolled neutrophil-complement-coagulation interplay associated with immunothrombosis in severe and critically ill patients. Using machine learning techniques and large-scale multicenter cohorts totaling 1219 patients, we established a prediction algorithm with optimized precision that integrates platelet, neutrophil, and lymphocyte counts and hemoglobin. Taken together, we developed and validated mechanism-driven rather than purely data-driven algorithms to assess the specific risks of immunothrombotic dysregulation in COVID-19. In principle, the algorithm might serve as a surrogate to support decision-making for ICU patients with coagulation abnormalities, enabling more timely interventions such as low molecular weight heparin treatment and/or anti-cytokine therapies. Of note, patients in ICUs are largely incapable of communicating and have very limited access to standard computed tomography (CT) imaging. The algorithm may assist in guiding more individualized clinical management and provide insights for longitudinal surveillance of severe and critically ill individuals.

The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. All authors had full access to all the data in the study and had final responsibility for the decision to submit for publication.

CONFLICTS OF INTEREST
The authors declare no potential conflicts of interest.

ETHICS APPROVAL AND CONSENT TO PARTICIPATE
This study was approved by the Ethics Committee of Nanfang Hospital, Southern Medical University (approval number: NFEC-2020-033) and by the ethics committees of the collaborating centers.