Label‐Free Blood Typing by Raman Spectroscopy and Artificial Intelligence

Label-free blood typing by Raman spectroscopy (RS) is demonstrated by training an artificial intelligence (AI) model on 271 blood-typed donor whole blood samples. A fused silica micro-capillary flow cell enables fast generation of a large dataset of Raman spectra of individual donors. A combination of resampling methods, machine learning, and deep learning is used to classify the ABO blood group, 27 erythrocyte antigens, 4 platelet antigens, regular anti-B titers of blood group A donors, regular anti-A,-B titers of blood group O donors, and ABH-secretor status from a single Raman spectrum. The average area under the curve (AUC) is 0.91 ± 0.03 for the ABO classification and 0.72 ± 0.09 for the remaining traits. The classification performance of all parameters is discussed in the context of dataset balance and antigen concentration. Post-hoc scalability analysis of the models shows the potential of RS and AI for future applications in transfusion medicine and blood banking.


Introduction
The label-free nature of Raman spectroscopy (RS) makes it highly attractive for phenotyping in clinical medicine. RS provides a spectral fingerprint specific to the biological sample of interest. Supported by artificial intelligence (AI), that is, machine learning (ML) and deep learning (DL), it is possible to extract extremely subtle and extensive information about the composition of a sample. [1] In recent years, numerous applications of both ML [2] and DL [3] frameworks have been suggested to translate the complexity of RS to a clinical setting. […,11,12] RS has shown potential to recognize distinct features of the different erythrocyte blood group antigens and thus to distinguish between ABO blood group antigens, but it has so far relied on comprehensive sample preparation steps to isolate specific substances or components of the blood, limiting the throughput and the resulting dataset sizes. A laser tweezers Raman spectroscopy (LTRS) system has been proposed to probe single trapped erythrocytes, [13] and surface-enhanced Raman scattering (SERS) spectroscopy on purified globulins from blood plasma has been used to discriminate ABO blood types. [14,15] Restricting Raman analysis to individual substances has the advantage of reducing the simultaneous scattering and absorption effects that otherwise need to be considered, and it reveals features in the Raman spectrum that are otherwise hidden by the strongly scattering hemoglobin in erythrocytes. [4,16,17] However, in many point-of-care applications, and particularly in transfusion medicine, time is a decisive factor, and RS must be applied directly to whole blood to avoid time- and labor-consuming preparatory steps. Comprehensive work on RS of blood within flow-through tubes has shown the potential to acquire Raman spectra with minimal sample preparation in a fast and noninvasive way. [18,19] Finally, some previous studies of Raman blood analysis suffer from a low number of samples, which limits access to large datasets for ML and DL analysis and makes translation to clinical use difficult. [20]
ABO and RhD blood group determination for donors and patients is pivotal for safe transfusion medicine. [21] Determination of blood groups and subsequent matching is mandatory in transfusion medicine, and only RhD-negative erythrocytes are given to RhD-negative recipients. O RhD-negative erythrocytes are universal in the sense that any recipient can receive them without the risk of reactivity with anti-A and anti-B antibodies and without the risk of inducing irregular antibodies to the RhD antigen. This makes O RhD-negative erythrocytes vital in emergency situations where the recipient's blood group is unknown. However, only ≈4% of the population are O RhD negative. Therefore, blood banks strive to obtain fast determination of patients' blood groups and use donors with identical blood groups. [22] This motivates the development of fast and comprehensive analytical technology.
The immune system always produces antibodies against the ABO antigens that are not present on the surface of the person's own red blood cells (RBCs), hence the term regular anti-A and anti-B antibodies, and hence the interest in avoiding universal type O donors with excessive levels of anti-A and anti-B antibodies. Extended phenotyping of additional blood group systems is done as a supplement to the basic determination of ABO and RhD blood groups. [25] A comparison to state-of-the-art blood typing methods is presented in Table 1. Most other methods are well established and have been validated with clinically relevant accuracies, with both low-cost and fast options available. However, most rely on antibody- and antigen-specific reagents and are sensitive to the expression levels of certain antigens or limited by weak agglutination. Alternatively, PCR-based genotyping has become state-of-the-art in extended blood type matching [26,27] but remains time-consuming and expensive on a larger scale due to the requirement for highly trained technicians and multiple reference laboratories. [28] In this work, we address the analytical challenges of the consumable-heavy serological methods, as well as of the time-consuming and expensive genotyping methods, using a flow cell protocol to acquire Raman spectra from 271 whole blood samples with no pre-analytical preparative steps. Based on a single Raman spectrum for each sample, measured in 1 min, we aim to simultaneously map out 35 blood group traits and determine titers of regular blood group antibodies.
Notably, a trait is not a direct measurement of a low-concentration antigen, but rather a characteristic of the donor that could encompass several distinct components in the blood. The identity and molar concentration of the entire set of informative analytes are not clear, but it is assumed that the entire collection of informative molecular moieties is far more abundant than the single product representing the molecular end-point of a trait. A similar multiparameter association approach has recently been reported for proteomics and the ABO and Rh blood groups. [37] A strict association between the molar concentrations of the blood group antigens and the performance of RS is therefore not to be expected. Consequently, the complex correlation between reference blood traits and RS is analyzed by multivariate statistics, ML, and DL, and the challenge of detecting a rare trait is addressed by resampling methods such as random undersampling, adaptive synthetic (ADASYN) [38] oversampling, and synthetic minority oversampling (SMOTE) [39], as well as by ensemble learning. [40,41] The observed scaling of accuracy with training set size supports the principal feasibility of achieving clinically relevant accuracy in determining blood traits by RS and AI trained on 30 000-60 000 donor samples, with the perspective of radically improving transfusion medicine through fast and easy access to all clinically relevant traits for donors and patients.

Ethical Approval
Blood was collected from voluntary blood donors after obtaining informed written consent for use of blood as normal material.Samples were anonymized and accompanied by reference data (ABO, phenotype predicted from genotype and hematology reference data).According to legislation, this use of anonymized donor samples as normal material does not require approval by an ethical committee.

Microfluidics
Whole blood samples in EDTA tubes were collected from donors at the blood bank at Copenhagen University Hospital and stored at 4 °C for no longer than 60 h before Raman measurements were performed. The main source of spectral change over time can be attributed to the oxygenation of hemoglobin, which was visible in the regions of the oxygenation and deoxygenation bands (1200-1230 cm⁻¹ and 1500-1660 cm⁻¹). [42] Keeping the interval between blood collection and Raman acquisition short limits oxygenation-related spectral changes, although some variance between donors can be expected (see Figure S2, Supporting Information). The blood samples were drawn directly from the EDTA tubes into a fused silica microcapillary with an inner diameter of 250 μm and an outer diameter of 360 μm (TSP250350, Polymicro Technologies LLC) by a syringe pump (Harvard Apparatus 11 Plus), and Raman spectra were measured while the sample was flowing, as shown in Figure 1a-c. This flow cell configuration was chosen to avoid elaborate sample preparation steps and multiple disposable sample holder parts, making the setup suitable for high-throughput applications [43] in comparison with previously reported Raman-based methods, where sample preparation has been the rate-limiting step. In addition, flowing the sample addressed the fact that stationary blood samples are vulnerable to photodegradation and laser-induced denaturation of erythrocytes. [44] Photodegradation is in large part due to the strong absorption of hemoglobin at blue and green wavelengths; however, even at near-infrared excitation and low laser powers, photodamage of hemoglobin can occur. Heme aggregation due to thermally or laser-induced denaturation creates spectral inhomogeneity in the measurements, which ultimately disturbs the blood type analysis. [45]
A constant flow rate of 10 μL min⁻¹ was found to be optimal in terms of limiting the required sample volume per measurement while avoiding photobleaching and degradation of the sample (see Figure S1, Supporting Information). A maximum of 70 μL of whole blood was used per Raman spectrum. Between samples, the capillary was cleaned for ≈5 min with a bleach-based cleaning solution (Cat. no. BD 340345, BD Biosciences), followed by a buffer solution consisting of 1000× 0.5 mM EDTA mixed with phosphate-buffered saline in a 1:1000 ratio to avoid EDTA gradients and clot formation. Sample collection and measurements were carried out over several weeks using consistent sample loading and cleaning protocols.

Raman Spectroscopy
The flow cell was mounted on a motorized translation stage of an inverted microscope (Nikon-Ti) with a 50× air objective (NA 0.8, WD 1 mm). A diagram of the optical setup is shown in Figure 1c. A tunable 765-805 nm diode laser (Toptica DLC DL-pro) was used for Raman excitation at 785 nm, and sidebands were filtered using a tunable band-pass edge filter installed at a fixed angle with respect to the incident light. The laser beam was then coupled into the inverted microscope in a backscattering configuration and focused by the objective to a small spot inside the fused silica capillary. A laser power of 25 mW at the excitation spot was used. A long-pass edge filter (785RS-25, Semrock) and a notch filter were inserted to suppress Rayleigh scattering such that only Raman scattered light entered the spectrometer. The microscope field of view was imaged by a Shamrock 303i imaging spectrometer with an integrated Newton 920 deep-cooled back-illuminated CCD camera (Andor Technology). A 300 μm wide slit was positioned at the entrance of the spectrometer to suppress light away from the focus of the laser spot. A grating of 1200 lines mm⁻¹ dispersed the focused Raman scattered light spectrally onto horizontal positions of the CCD camera, while vertical positions corresponded to scattered light at various positions along the entrance slit of the spectrometer. Positions spatially offset from the focus of the laser excitation spot collect Raman scattered light from sample volume outside the focal plane of the microscope. [9]
The capillary was aligned perpendicular to the optical axis of the microscope objective, with the center of the capillary in the xy-plane. Raman signal from the fused silica capillary walls was suppressed by moving the excitation focus along the z direction until the fused silica Raman peaks disappeared and the signal from the blood sample was maximized. The position of the laser spot with respect to the sample flow was calibrated by a z-scan tracking the fused silica bands (495 and 606 cm⁻¹) and hemoglobin bands, such that a sample volume 10 μm from the inner capillary wall could be probed consistently (see Figure S3, Supporting Information). The short penetration depth of the laser in whole blood forces the measurement position to be relatively close to the capillary wall, where a cell-free layer forms in close proximity to the wall, [46,47] creating a narrow region without fused silica background. The z-scan calibration ensured that no unwanted variability from glass signal, varying cell concentration, or flow velocity was introduced into the spectra. Flowing the sample enabled probing of a large sample volume and a long integration time without damaging the blood cells. An integration time of 60 s was used for each measurement, such that a relatively large sample volume could be analyzed and a sufficient signal-to-noise ratio (SNR) achieved (6.8 ± 1.7). For each measurement, the spectral range 393-1869 cm⁻¹ was used, with the spectral dispersion varying from 0.63 to 0.49 cm⁻¹ per pixel. Wavenumber calibration was performed using the 520 cm⁻¹ vibration of a single-crystalline silicon wafer.
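As a rough check on the statement that flowing the sample lets a large volume be probed, the quoted flow rate, capillary inner diameter, and integration time can be combined into the volume that passes through the capillary during one spectrum and the mean flow velocity. This is a back-of-the-envelope sketch assuming steady flow averaged over the full cross-section; the local velocity 10 μm from the wall, where spectra are taken, is lower than this mean, and the volume illuminated by the focal spot is far smaller than the volume flowing past.

```python
import math

# Values quoted in the text
flow_rate_ul_per_min = 10.0   # constant flow rate, μL/min
inner_diameter_um = 250.0     # capillary inner diameter, μm
integration_time_s = 60.0     # integration time per spectrum, s

# Volume flowing through the capillary during one integration (1 μL = 1 mm^3)
volume_ul = flow_rate_ul_per_min * integration_time_s / 60.0

# Mean velocity = volumetric flow rate / cross-sectional area (averaged over the bore)
radius_mm = inner_diameter_um / 2.0 / 1000.0
area_mm2 = math.pi * radius_mm ** 2
flow_mm3_per_s = flow_rate_ul_per_min / 60.0
mean_velocity_mm_per_s = flow_mm3_per_s / area_mm2

print(f"{volume_ul:.1f} μL per spectrum, mean velocity ≈ {mean_velocity_mm_per_s:.2f} mm/s")
```

Under these assumptions roughly 10 μL flows past the probe per spectrum at a mean speed of about 3.4 mm/s, which is within the quoted maximum of 70 μL of whole blood per spectrum.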

Dataset
The Raman dataset was collected from whole blood samples in EDTA tubes from 69 type A, 69 type B, 70 type O, and 63 type AB donors. Three separate Raman measurements were conducted on each donor sample, resulting in a total of 813 hyperspectral Raman images with 2663 × 256 pixel values. The row with maximum intensity in the spectral region of interest was extracted for further pre-processing. The reference dataset consisted of serologically determined ABO and RhD blood groups for all donors in the dataset and of antibody parameters determined by serology for blood groups A and O. Smaller subsets of the total donor cohort had an additional set of 51 blood group antigens predicted by genotyping. Only 38 of the traits were deemed suitable for classification, in terms of providing sufficient occurrences of both positive and negative donors for principal component analysis (PCA) and support vector machine (SVM) analysis. Of these, 27 were erythrocyte antigens (molecular end-point of the trait), 4 were platelet antigens (molecular end-point of the trait), and 2 were ABO antibody (Ab) titers, anti-B (aB) and anti-A,-B (aAB). The anatomical location of each trait, its blood group system, the frequency of the antigen, and the molar antigen concentration are outlined in Table 2.
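The row-extraction step can be sketched as follows. This is a minimal illustration under two assumptions not stated in the text: "maximum intensity" is interpreted as the largest summed intensity within the spectral region of interest, and the ROI column bounds and the helper name `extract_spectrum` are hypothetical.

```python
import numpy as np


def extract_spectrum(image, roi=None):
    """Return (row index, spectrum) for the detector row that is brightest in the ROI.

    image -- 2D array, rows = positions along the slit, columns = spectral pixels
    roi   -- optional (start, stop) column range of the spectral region of interest
    """
    start, stop = roi if roi is not None else (0, image.shape[1])
    row_intensity = image[:, start:stop].sum(axis=1)  # integrated intensity per row
    best = int(np.argmax(row_intensity))
    return best, image[best]


# Synthetic detector frame: 256 rows x 2663 spectral pixels, laser focus on row 120
rng = np.random.default_rng(0)
frame = rng.normal(100.0, 5.0, size=(256, 2663))
frame[120] += 50.0
row, spectrum = extract_spectrum(frame, roi=(500, 2000))
print(row, spectrum.shape)
```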
All blood group antigens and hematological parameters were provided by the donor database at the Copenhagen University Hospital blood bank. The blood group antigens were determined by PCR methods [27], and hematology parameters were determined by a commercial hematology analyzer (Sysmex).

Spectral Pre-Processing
To maximize the performance of the classification algorithms, a series of pre-processing steps was carried out on the Raman spectra. Fluorescence and measurement noise create a background drift in the spectra that must be removed so that spectral differences due to thermal fluctuations and other environmental factors do not influence the classification. Background baselines for each spectrum were removed using an asymmetrically reweighted penalized least squares smoothing (arPLS) algorithm, which effectively corrects noisy baselines while maintaining accurate peak height estimation. [48] The arPLS algorithm is tuned by a regularization parameter, λ, which was treated as a hyperparameter when training the classifiers. An outline of the importance of λ can be found in Figure S4, Supporting Information.
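The arPLS baseline correction [48] can be sketched in NumPy/SciPy as below. It follows the published iteration (penalized least squares with a second-difference penalty, then logistic reweighting based on the negative residuals), but the stopping tolerance `ratio`, `max_iter`, the default λ, and the synthetic spectrum are illustrative assumptions, not the settings used in this work.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve
from scipy.special import expit


def arpls(y, lam=1e5, ratio=1e-6, max_iter=50):
    """Estimate a smooth baseline under spectrum `y` with arPLS.

    lam   -- smoothness penalty (the regularization parameter λ)
    ratio -- stop when the relative change in the weights falls below this value
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    # Second-order difference operator, shape (n, n - 2)
    D = sparse.diags([1.0, -2.0, 1.0], [0, -1, -2], shape=(n, n - 2))
    H = lam * (D @ D.T)
    w = np.ones(n)
    z = y
    for _ in range(max_iter):
        W = sparse.diags(w, 0)
        z = spsolve(sparse.csc_matrix(W + H), w * y)  # weighted, penalized fit
        d = y - z
        dn = d[d < 0]                                 # residuals below the baseline
        m, s = dn.mean(), dn.std()
        # Logistic reweighting: points far above the fit (Raman peaks) get ~0 weight
        w_new = expit(-2.0 * (d - (2.0 * s - m)) / s)
        if np.linalg.norm(w - w_new) / np.linalg.norm(w) < ratio:
            break
        w = w_new
    return z


# Synthetic stand-in spectrum: curved background + one Raman-like peak + noise
x = np.linspace(0.0, 1.0, 1000)
background = 50.0 + 30.0 * x + 20.0 * x ** 2
peak = 100.0 * np.exp(-(((x - 0.5) / 0.01) ** 2))
rng = np.random.default_rng(0)
spectrum = background + peak + rng.normal(0.0, 0.5, x.size)

corrected = spectrum - arpls(spectrum, lam=1e5)
```

Because λ sets how stiff the baseline is, larger values follow slow fluorescence drift without bending into narrow peaks, while too small a λ lets the baseline absorb part of the Raman bands; this is why treating λ as a hyperparameter is reasonable.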

SVM Baselines and Training Details
SVMs were used as the baseline classification method for all blood traits due to their efficient implementation and high performance on small datasets. [49] The four ABO blood groups were divided into 6 pairwise sub-classification problems, while the remaining parameter determinations were reduced to a binary classification of either positive or negative. The input dataset consisted of 813 vectors, each with 2663 intensity variables. Analyzing data in such a high-dimensional space is computationally costly and often requires large amounts of data to obtain reliable results. PCA was therefore used to reduce the high-dimensional input data (2663 intensity variables) to 15 principal components (PCs) before SVM classification. The number of PCs was treated as a hyperparameter when training the SVM classifiers. The SVM classifier used a radial basis function (RBF) kernel to project the input space into a higher-dimensional space, such that non-linearly separable classes can be distinguished. The kernel coefficient, γ, and the SVM regularization parameter, C, were then optimized along with the other hyperparameters using fivefold cross-validation. The area under the receiver operating characteristic curve (AUC-ROC), balanced accuracy (BA), F1 score, precision, sensitivity/recall, and specificity were used to evaluate the performance of the models (see Section 2.9). BA was used as the optimization criterion when tuning the models due to the imbalanced nature of most of the parameters (see Section 2.7). The open-source Python module Scikit-learn [50] was used for all SVM analysis.
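A minimal Scikit-learn sketch of the PCA + RBF-SVM baseline with fivefold cross-validated hyperparameter search scored by balanced accuracy is shown below. The synthetic data and the grid values are assumptions for illustration; only the overall structure (PCA dimension, γ, and C tuned jointly, BA as the criterion, six pairwise ABO problems) mirrors the text.

```python
from itertools import combinations

from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# The four ABO groups give six one-versus-one binary problems
abo_pairs = list(combinations(["A", "B", "AB", "O"], 2))

# Synthetic stand-in for one pairwise problem: 200 spectra x 500 intensity variables
X, y = make_classification(n_samples=200, n_features=500, n_informative=20, random_state=0)

pipe = Pipeline([
    ("pca", PCA()),                 # refitted on the training part of every fold
    ("svm", SVC(kernel="rbf")),
])
param_grid = {
    "pca__n_components": [10, 15, 20],
    "svm__gamma": ["scale", 1e-3],  # RBF kernel coefficient γ
    "svm__C": [1.0, 10.0],          # SVM regularization parameter C
}
search = GridSearchCV(pipe, param_grid, scoring="balanced_accuracy",
                      cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0))
search.fit(X, y)
print(len(abo_pairs), search.best_params_, round(search.best_score_, 3))
```

Putting PCA inside the pipeline means the principal components are recomputed within each cross-validation fold rather than on the full dataset.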

Dataset Imbalance and Ensemble Learning
The donors were selected to have a balanced ABO blood group dataset; consequently, the majority of the other parameters have a highly imbalanced distribution of positive and negative instances, as they approximately reflect the balance in the population. Most ML models trained on an imbalanced dataset are susceptible to bias toward the majority class. SVMs, especially, have been shown to produce separating hyperplanes that ignore the minority class, generating more false negative predictions. [51] Different resampling methods were therefore applied to balance the datasets before classification. Random undersampling, random oversampling, and the synthetic oversampling methods SMOTE [39] and ADASYN [38] were applied to the training data in each cross-validation evaluation to address class imbalance. Additionally, bagging ensembles of SVM models were used instead of a single estimator to further improve the classification performance. [40] Each individual SVM was trained independently on a subset of the training dataset, and an aggregate prediction was determined by majority voting (see Figure 2c). The subsets were drawn by bootstrapping, such that a spectrum can be used repeatedly in the training of multiple models, and each subset was balanced by resampling before training. [41] The size of the ensemble (number of estimators) was treated as a hyperparameter when optimizing the models. The open-source Python module imbalanced-learn [52] was used for all imbalance and ensemble learning analysis.
Table 2. Overview of the different antigens, the cell types or substances in which they are found, the designation of the corresponding blood group system, number of antigens per cell, and concentration in μM.
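imbalanced-learn offers a ready-made estimator for this (`BalancedBaggingClassifier`); the sketch below instead hand-rolls the same idea with Scikit-learn and NumPy so that each step is visible: bootstrap a subset, randomly undersample it to equal class sizes, fit one PCA + SVM per subset, and aggregate by majority vote. The helper names, ensemble size, and synthetic data are hypothetical.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC


def fit_balanced_bagging(X, y, base, n_estimators=11, seed=0):
    """Fit copies of `base` on bootstrap samples, each randomly undersampled to balance."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_estimators):
        boot = rng.integers(0, len(y), size=len(y))  # bootstrap: draw with replacement
        Xb, yb = X[boot], y[boot]
        classes, counts = np.unique(yb, return_counts=True)
        # Random undersampling: keep as many samples per class as the rarest class has
        keep = np.concatenate([
            rng.choice(np.flatnonzero(yb == c), size=counts.min(), replace=False)
            for c in classes
        ])
        models.append(clone(base).fit(Xb[keep], yb[keep]))
    return models


def predict_majority(models, X):
    """Majority vote over binary (0/1) predictions; an odd ensemble avoids ties."""
    votes = np.stack([m.predict(X) for m in models])  # shape (n_estimators, n_samples)
    return (votes.mean(axis=0) > 0.5).astype(int)


# Synthetic, imbalanced stand-in data (about 15% positives)
X, y = make_classification(n_samples=400, n_features=200, n_informative=10,
                           weights=[0.85, 0.15], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
base = make_pipeline(PCA(n_components=15), SVC(kernel="rbf"))
models = fit_balanced_bagging(X_tr, y_tr, base, n_estimators=11)
y_hat = predict_majority(models, X_te)
print(f"balanced accuracy: {balanced_accuracy_score(y_te, y_hat):.2f}")
```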

CNN Architecture
When the dataset is potentially increased to thousands of donors and all traits are combined in a unified classification task, using a single non-linear SVM or ensembles of them becomes practically infeasible. [53] CNNs are among the most widely used DL methods for pattern recognition and image classification, [54] and various CNN architectures have been proposed as efficient and accurate classifiers of spectroscopic data. [55] The simple 1D CNN model used in this study consisted of three convolutional layers and a pooling layer for feature extraction, followed by two fully connected dense layers for classification. A block diagram of the architecture can be seen in Figure 2d. To avoid overfitting, a dropout rate of 0.2 was used between blocks, meaning that 20% of the units were randomly left out from one layer to the next during training. A single 1D max pooling layer was applied after the last convolutional layer to reduce the dimension of the feature map before classification. The CNN architecture hyperparameters, including the number of filters and kernel size in each convolutional layer and the number of hidden units in the fully connected dense layers, were optimized using the Hyperband tuning algorithm. [56] The open-source Python modules Keras [57] and TensorFlow [58] were used for all DL analysis.
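The architecture described above can be sketched in Keras as follows. The filter counts, kernel sizes, number of dense units, and the exact placement of the last dropout layers are hypothetical placeholders (in the study they were tuned with Hyperband, for example via the KerasTuner `Hyperband` tuner); the sketch only reproduces the stated layout of three Conv1D layers, dropout 0.2 between blocks, one max pooling layer after the last convolution, and two dense layers.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers


def build_cnn(n_points=2663, filters=(16, 32, 64), kernel_sizes=(7, 5, 3),
              dense_units=64, dropout=0.2):
    """Three Conv1D layers, one max pooling layer, and two dense layers."""
    inputs = keras.Input(shape=(n_points, 1))            # one spectrum, one channel
    x = inputs
    for i, (f, k) in enumerate(zip(filters, kernel_sizes)):
        x = layers.Conv1D(f, k, activation="relu")(x)
        if i < len(filters) - 1:
            x = layers.Dropout(dropout)(x)               # dropout between conv blocks
    x = layers.MaxPooling1D(pool_size=2)(x)              # single pooling after the last conv
    x = layers.Dropout(dropout)(x)
    x = layers.Flatten()(x)
    x = layers.Dense(dense_units, activation="relu")(x)  # first dense layer
    x = layers.Dropout(dropout)(x)
    outputs = layers.Dense(1, activation="sigmoid")(x)   # second dense layer: P(trait positive)
    model = keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=[keras.metrics.AUC(name="auc")])
    return model


model = build_cnn()
probs = model.predict(np.zeros((2, 2663, 1), dtype="float32"), verbose=0)
print(probs.shape)
```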

Performance Metrics
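The metrics used to evaluate the classification performance of the traits are defined as follows:
$$\text{Precision} = \frac{TP}{TP + FP}$$
$$\text{Sensitivity (recall)} = \frac{TP}{TP + FN}$$
$$\text{Specificity} = \frac{TN}{TN + FP}$$
$$\text{BA} = \frac{1}{2}\left(\text{Sensitivity} + \text{Specificity}\right)$$
$$F_1 = 2\,\frac{\text{Precision}\cdot\text{Sensitivity}}{\text{Precision} + \text{Sensitivity}}$$
Here TP, TN, FP, and FN are true positive, true negative, false positive, and false negative predictions, respectively. The AUC is the area under the curve traced by the true positive rate (TPR, equal to the sensitivity) against the false positive rate (FPR = FP/(FP + TN) = 1 − specificity) as the decision threshold on the posterior probability is varied between 0 and 1.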
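The definitions above can be checked on a small toy example. The labels and scores below are made up for illustration; the helper `binary_metrics` is hypothetical, while `confusion_matrix` and `roc_auc_score` are the Scikit-learn functions. Note that the threshold-dependent metrics use one fixed threshold (0.5 here), whereas the AUC integrates over all thresholds.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score


def binary_metrics(y_true, y_pred):
    """Threshold-dependent metrics computed from the confusion-matrix counts."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    sensitivity = tp / (tp + fn)  # recall / true positive rate
    specificity = tn / (tn + fp)  # true negative rate; FPR = 1 - specificity
    precision = tp / (tp + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# Toy example: 5 positive and 5 negative donors with classifier scores
y_true = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])
scores = np.array([0.9, 0.8, 0.7, 0.4, 0.3, 0.1, 0.2, 0.3, 0.35, 0.6])
y_pred = (scores >= 0.5).astype(int)  # one particular decision threshold

m = binary_metrics(y_true, y_pred)    # TP=3, FN=2, TN=4, FP=1
auc = roc_auc_score(y_true, scores)   # threshold-free
print(m, auc)
```

With TP = 3, FN = 2, TN = 4, and FP = 1, this gives sensitivity 0.6, specificity 0.8, precision 0.75, F1 ≈ 0.667, BA ≈ 0.7, and AUC = 0.86.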

Overfitting and Data Leakage
To prevent overfitting and data leakage, pipeline frameworks in both Scikit-learn [50] and imbalanced-learn [52] were used to ensure that feature extraction was completely separated from the validation data; that is, the PCA was recomputed using only the training data in each run of the cross-validation scheme, and the validation data were then projected into that subspace. Similarly, the resampling methods were applied only to the training portion of each fold, never across cross-validation folds.
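The fold hygiene described above can be made explicit with a plain cross-validation loop instead of a pipeline object. In this sketch, which uses synthetic data and a hypothetical `undersample` helper, the PCA is fitted on the training fold only, random undersampling is applied to the projected training fold only, and the validation fold is projected but never resampled. Fivefold cross-validation repeated ten times mirrors the evaluation scheme reported in the results.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.svm import SVC


def undersample(X, y, rng):
    """Randomly drop samples so every class has as many as the rarest class."""
    classes, counts = np.unique(y, return_counts=True)
    keep = np.concatenate([rng.choice(np.flatnonzero(y == c), size=counts.min(), replace=False)
                           for c in classes])
    return X[keep], y[keep]


X, y = make_classification(n_samples=300, n_features=300, n_informative=10,
                           weights=[0.8, 0.2], random_state=0)
rng = np.random.default_rng(0)
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=10, random_state=0)

scores = []
for train_idx, val_idx in cv.split(X, y):
    pca = PCA(n_components=15).fit(X[train_idx])       # PCA sees training data only
    Z_train, y_train = undersample(pca.transform(X[train_idx]), y[train_idx], rng)
    clf = SVC(kernel="rbf").fit(Z_train, y_train)      # trained on resampled training fold
    y_hat = clf.predict(pca.transform(X[val_idx]))     # validation: projected, not resampled
    scores.append(balanced_accuracy_score(y[val_idx], y_hat))

print(f"BA = {np.mean(scores):.2f} ± {np.std(scores):.2f} over {len(scores)} folds")
```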
Using BA as the metric for model selection ensured that the models were not biased toward the majority class. Both training scores and cross-validation scores were computed in the scalability analysis to assess whether the performance generalizes. A large discrepancy between the training and cross-validation curves indicated overfitting on the training data; in that case, the SVM regularization parameter, C, was decreased to limit model complexity by preventing the weights of the model from becoming too large.
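Scikit-learn's `learning_curve` returns training and cross-validation scores for increasing training sizes, which is one way to obtain the train-versus-CV comparison described above. The synthetic data, training-size grid, and C value below are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import learning_curve
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=200, n_informative=10, random_state=0)
model = make_pipeline(PCA(n_components=15), SVC(kernel="rbf", C=1.0))

sizes, train_scores, cv_scores = learning_curve(
    model, X, y, train_sizes=np.linspace(0.2, 1.0, 5), cv=5,
    scoring="balanced_accuracy")

# A persistently large train-minus-CV gap signals overfitting; lowering C is one remedy
gap = train_scores.mean(axis=1) - cv_scores.mean(axis=1)
for n, g in zip(sizes, gap):
    print(f"n_train = {n:4d}   train - CV gap = {g:.3f}")
```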

ABO Blood Group Classification
As mentioned, most bands in the whole blood Raman spectra can be attributed to vibrations of the hemoglobin molecule. The peaks are identified and assigned to different vibrational modes in Figure 3a. The PC loadings in Figure 3b suggest that certain Raman shifts contribute significantly more than others to the most descriptive PCs. In Figure 3c, the intensity distributions at 24 of the most informative bands are plotted separately for the spectra of each ABO blood group to expose any differences between the groups. Distinct differences can be seen at the pyrrole (Pyr) breathing modes (ν₁₅, ν₆, ν₄₆), the asymmetric (ν₄₄, ν₃₀) and symmetric (ν₄₁, ν₄, ν₁₂) Pyr half-ring modes, and the Pyr quarter-ring mode (ν₂₀). Similarly, differences can be seen at the Phe skeletal C-C mode (898 cm⁻¹), the deformation modes of amino acid side chains, δ(CH₂/CH₃), and amide I. Other porphyrin modes, such as δ(CmH) (ν₁₃, ν₄₂, ν₂₁) and ν(CαCm)asym (ν₁₀), also appear to be ABO blood group dependent. The differences between blood groups at porphyrin-specific vibrational modes are consistent with the existing study on single trapped erythrocytes. [13] Positive and negative distributions of the intensity values at the same 24 bands for the remaining traits can be found in Figure S5, Supporting Information.
The ABO blood group classification is carried out without any resampling or ensemble methods, as the dataset is balanced by construction. The performance of the SVM (RBF) model is shown in Figure 4 as ROC curves and confusion matrices from fivefold cross-validation repeated ten times, and all performance metrics are summarized in Table 3. The AUC values of the six pairwise discriminations are 0.92 ± 0.03, 0.88 ± 0.04, 0.93 ± 0.02, 0.87 ± 0.04, 0.90 ± 0.03, and 0.95 ± 0.02. […] [15] Types AB, A, and B seem to be more difficult to discriminate, which could be explained by the fact that antigens A and B differ only in their terminal sugar, N-acetylgalactosamine and galactose, respectively.
The ABO Ab titers are given as regular anti-B titers of blood group A donors ⩾10 (positive) or <10 (negative), designated the aB Ab trait, and regular anti-A,-B titers of blood group O donors ⩾50 (positive) or <50 (negative), designated the aAB Ab trait. As outlined in Figure 5 and Table 3, the aB trait is classified with an AUC of 0.80 ± 0.06 on a relatively limited reference dataset of size 243, indicating potential for extending RS and ML to antibody testing. The antibody titer laboratory tests correspond to a specific level of antibodies in the donor plasma and are thus a quantitative determination, in contrast to the remaining antigen traits. Using RS to routinely measure quantitative levels of specific antibodies in donor plasma would be a highly attractive clinical tool for blood banks. Strong homology of anti-B in type A donors has previously been reported, [61] which supports the claim that there is a measurable difference in the Raman signal between donors with anti-B titers below and above 10.
The aAB reference dataset is highly imbalanced (24% positives), resulting in an AUC of 0.72 ± 0.07 but a poor F1 score of 0.52 ± 0.06, suggesting that the applied resampling and ensemble methods have not been able to completely prevent underfitting. The Se (dominant) and se (recessive) traits describe the ABH-secretor status of the donor, that is, the donor's ability to secrete ABO antigens into plasma and other secretions. Classification of Se shows promising results using random undersampling, with an AUC of 0.88 ± 0.07 despite significant class imbalance (see Figure 5).

Erythrocyte and Platelet Antigen Traits
To maintain as large a dataset as possible, each erythrocyte and platelet antigen trait is classified as either negative or positive, rather than discriminating between homozygotes and heterozygotes, for example, JkᵃJkᵃ and JkᵇJkᵇ. Intuitively, homozygote discrimination would be easier due to a more distinct difference. However, the number of spectra available per class would be reduced significantly, and the nature of RS ensures that both forms present in the heterozygote are visible in the spectral data. Assignments where ν is followed by a number subscript are based on a labeling scheme developed for metalloporphyrins, [4,59,60] and constitute various vibrations of the porphyrin ring of heme, the main chromophore of hemoglobin. The distribution of positives and negatives for each trait is given in Table 3 as the fraction of positives in both the population [62] and the actual training dataset. Four classification performance examples of erythrocyte antigens, S, Doᵇ, Fyᵇ, and Jkᵃ, are presented in Figure 5; 50.1%, 87.1%, 78.9%, and 79.1% of donors in the respective datasets are genotyped positive. The highly imbalanced traits Doᵇ, Fyᵇ, and Jkᵃ are randomly undersampled to balance the datasets before classification, resulting in promising AUC values of 0.75 ± 0.1, 0.72 ± 0.05, and 0.74 ± 0.05 and F1 scores of 0.86 ± 0.03, 0.75 ± 0.06, and 0.75 ± 0.05, despite the significant reduction in training dataset size. The S antigen dataset is almost perfectly balanced from the start and is classified with an AUC of 0.73 ± 0.04 without any resampling of the training dataset.

Dataset Balance and Antigen Concentration
Traits with very few positive observations (<10%), Kpᵃ, Luᵃ, Cʷ, K, and Ytᵇ, are resampled during training using ADASYN and have promising AUC values of 0.93 ± 0.08, 0.86 ± 0.07, 0.83 ± 0.07, 0.72 ± 0.15, and 0.67 ± 0.19. However, all these traits suffer from low precision and consequently low F1 scores (see Table 3) due to the very few positives available in the test splits, and the performance varies significantly between iterations of the repeated cross-validation (see Figure 6). This can be attributed to the lack of positive instances when testing the performance in each cross-validation split, and it illustrates the challenge of constructing large validation datasets for rare antigens. Antigen e, which is positive for 98.1% of the donors, performs extremely well on all metrics, and oversampling seems to describe the minority (negative) class sufficiently.
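A minimal ADASYN sketch is given below to show how the oversampling concentrates synthetic minority samples where the majority class dominates the neighborhood. It is a simplified re-implementation for illustration (the study used the imbalanced-learn `ADASYN` class); `k`, `beta`, and the synthetic data are assumptions. It should only ever be applied to the training portion of a fold.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import NearestNeighbors


def adasyn(X, y, minority=1, k=5, beta=1.0, seed=0):
    """Minimal ADASYN: more synthetic minority samples where the majority class dominates."""
    rng = np.random.default_rng(seed)
    X_min = X[y == minority]
    n_min, n_maj = len(X_min), int(np.sum(y != minority))
    G = int(round((n_maj - n_min) * beta))            # total number of synthetic samples
    # Difficulty of each minority point: share of majority samples among its k neighbours
    nn_all = NearestNeighbors(n_neighbors=k + 1).fit(X)
    neigh = nn_all.kneighbors(X_min, return_distance=False)[:, 1:]  # drop the point itself
    r = (y[neigh] != minority).mean(axis=1)
    if r.sum() == 0:                                  # no majority neighbours: spread evenly
        r = np.ones(n_min)
    g = np.round(r / r.sum() * G).astype(int)         # synthetic samples per minority point
    # SMOTE-style interpolation towards random minority-class neighbours
    nn_min = NearestNeighbors(n_neighbors=min(k + 1, n_min)).fit(X_min)
    neigh_min = nn_min.kneighbors(X_min, return_distance=False)[:, 1:]
    synthetic = []
    for i, gi in enumerate(g):
        for _ in range(gi):
            j = rng.choice(neigh_min[i])
            synthetic.append(X_min[i] + rng.random() * (X_min[j] - X_min[i]))
    X_syn = np.array(synthetic).reshape(-1, X.shape[1])
    return np.vstack([X, X_syn]), np.concatenate([y, np.full(len(X_syn), minority)])


X, y = make_classification(n_samples=200, n_features=20, weights=[0.92, 0.08], random_state=0)
X_res, y_res = adasyn(X, y)
print(np.bincount(y), "->", np.bincount(y_res))
```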
As expected, the ABO blood groups are easiest to classify, as the datasets are balanced and the molar antigen concentrations are among the highest (see Table 2). Other high-concentration antigens, RhD, M, and N, do not perform as well, however, with AUC values of 0.70 ± 0.05, 0.61 ± 0.07, and 0.58 ± 0.06, suggesting that a balanced distribution of positives and negatives is more critical for classification than the molar antigen concentration. HPA-15 is the antigen with the lowest density, 1000 molecules per platelet, which with a platelet density of 250 × 10⁹ per L corresponds to 417 pM. The two homozygous forms have comparable frequencies of 0.26 and 0.24, and the amino acid difference between the two antigens is a tyrosine to serine substitution. Despite the antigen's location on the relatively sparse platelets, metrics above the average are observed (HPA-15a AUC = 0.76 ± 0.05 and HPA-15b AUC = 0.72 ± 0.08). Similarly, the more abundant platelet antigens HPA-1b (1.46-2.17 nM) and HPA-5b (1.25-2.1 nM) are classified with above-average AUC values of 0.68 ± 0.05 and 0.78 ± 0.08. For future datasets containing the entire cohort of genotyped donors at Copenhagen University Hospital (≈35 000 and growing by 10 000 each year), only 2% would be expected to be positive for the antigen Kpᵃ, meaning that fewer than 1000 positive spectra could be collected. Even fewer donors would be available for a specific phenotype such as homozygous KpᵃKpᵃ (0.04%), so to create a sufficiently large validation dataset, oversampling and data augmentation of the training data are necessary.
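The concentration and cohort-size figures in this paragraph follow from simple arithmetic, reproduced below using only the numbers quoted in the text. With the exact Avogadro constant the HPA-15 estimate comes out at ≈415 pM; the quoted 417 pM corresponds to rounding Avogadro's constant to 6.0 × 10²³ mol⁻¹.

```python
AVOGADRO = 6.02214076e23         # mol^-1 (exact SI value)

antigens_per_platelet = 1000     # HPA-15 antigen density quoted in the text
platelets_per_litre = 250e9      # platelet count quoted in the text

conc_molar = antigens_per_platelet * platelets_per_litre / AVOGADRO  # mol/L
conc_pM = conc_molar * 1e12
print(f"HPA-15 ≈ {conc_pM:.0f} pM")  # prints "HPA-15 ≈ 415 pM"

# Expected donors in the future genotyped cohort, using the fractions quoted in the text
cohort = 35_000
kpa_positive = 0.02 * cohort     # Kp(a)-positive donors, ≈ 700
kpa_homozygous = 0.0004 * cohort # Kp(a)Kp(a) homozygotes, ≈ 14
print(round(kpa_positive), round(kpa_homozygous))
```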

Scalability and Learning Curves
The sample size of the training dataset remains one of the most significant parameters of model selection and performance evaluation in ML and DL. [63] The selected model must be able to capture the complexity of the problem, and its parameters must be estimable from the available observations. The total number of model parameters depends on various factors: because the model is expected to represent the underlying distribution of the data, it may be simple or highly complex, with correspondingly fewer or more parameters. Generally, a large number of observations improves the estimation quality of the model parameters and the generalization ability of the model. Low-complexity ML algorithms perform well on small datasets but tend to plateau when the training data reach a certain size, whereas the performance of DL algorithms is superior with larger amounts of data. [64] As reported in a study on the scaling of ML and DL models trained on the UK Biobank brain imaging dataset, [65] one of the largest biomedical datasets in the world, higher model complexity does not necessarily improve the performance of a classifier significantly, even when the sample size increases. Whether going from a linear SVM to a shallow non-linear kernel SVM (RBF) or a deep non-linear neural network (CNN) improves the accuracy depends entirely on the presence of learnable non-linearity in the dataset. If such non-linearity is present, an SVM model with a non-linear kernel is expected to surpass a linear SVM model. This appears to be the case for our dataset, as seen in Figure 8b, where both the SVM (RBF) and the CNN model clearly outperform a linear SVM model.
To estimate the potential performance of the models when the dataset is increased substantially (more than 30 000), a post-hoc sample-size analysis of the collected datasets based on cross-validation is conducted.[66] Fivefold cross-validated training and test scores are generated for different training set sizes. Plotting the performance as a function of the number of observations in the training subset then approximates the learning process of the model, and a saturating inverse power law is fitted to each learning curve.[67–69] The learning rate and decay rate of the power law are estimated by nonlinear least squares. To work with reasonably balanced data, the scaling of the SVM (RBF) and CNN models is compared on each of the ABO discrimination datasets, and the results are presented in Figure 7. The learning curves suggest that, beyond a certain training size, the CNN model starts to outperform the SVM (RBF) model and approaches clinically relevant AUC values faster. To reach a combined (three-step) error rate of 1 in 500 000 donor determinations, a single-determination accuracy of 0.99 would be required.[70] The learning potential does not appear to have saturated at the current dataset size, suggesting that a larger dataset could improve the performance. The potential level of improvement should be investigated further by collecting a large dataset.
Both hemoglobin and albumin are high-abundance components of whole blood and dominate the Raman spectra, which could mask signals from other analytes that carry valuable information for determining a trait. To fully exploit the sensitivity and specificity of RS, blood components could be spatially separated in-line so that multiple analytes can be probed independently, without introducing sample preparation steps or compromising the label-free nature of the method.[71,72] Additionally, more efficient collection of Raman-scattered light would improve the SNR without increasing the exposure time. Online suppression of the fluorescence background by modulating the excitation wavelength [73,74] or by shifted-excitation Raman difference spectroscopy (SERDS) [75] could also significantly increase the amount of accessible information in the Raman spectra. Using multiple excitation wavelengths could further lower the detection limit for low-concentration analytes by probing several resonances and pre-resonances with electronic states of the blood components.[76]
Learning curves are computed for 12 additional traits and presented in Figure 8c. All 12 traits show better performance as the training size increases; however, this analysis does not show a clear scaling advantage for the CNN model. For some traits (Doᵃ, Jkᵇ, RhD, aAB, s, and se), the CNN model outperforms the non-linear kernel SVM model, both at the current dataset size and at the projected large training sizes. For other traits (Fyᵃ, Fyᵇ, Jkᵃ, S, Se, and aB), the simple CNN model used in this study does not appear to improve the classification performance.
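The curve-fitting step can be sketched as follows, assuming SciPy is available. The sketch assumes the parameterization score(n) = a - b * n^(-c), where a is the asymptotic score, b the learning rate and c the decay rate. The exact functional form used in this study is not restated here, and the learning-curve points below are synthetic. The helper `size_for_target` is hypothetical: it inverts the fitted curve to project the training size at which a target score would be reached.

```python
import numpy as np
from scipy.optimize import curve_fit


def inverse_power_law(n, a, b, c):
    """Saturating inverse power law (assumed form): the score rises toward
    the asymptote a; b is the learning rate and c the decay rate."""
    return a - b * np.power(n, -c)


def size_for_target(target, a, b, c):
    """Hypothetical helper: training size at which the fitted curve reaches
    `target`, or np.inf if the fitted asymptote never reaches it."""
    if target >= a:
        return np.inf
    return (b / (a - target)) ** (1.0 / c)


# Synthetic learning-curve points (training size, mean CV score); illustrative only.
rng = np.random.default_rng(0)
n_train = np.array([50, 100, 200, 400, 800, 1600], dtype=float)
cv_score = inverse_power_law(n_train, 0.97, 1.2, 0.5) + rng.normal(0.0, 0.003, n_train.size)

# Nonlinear least squares; bounds keep the asymptote in [0, 1],
# the learning rate non-negative and the decay rate strictly positive.
popt, _ = curve_fit(
    inverse_power_law, n_train, cv_score,
    p0=(0.9, 1.0, 0.5),
    bounds=([0.0, 0.0, 1e-3], [1.0, np.inf, 5.0]),
)
a_hat, b_hat, c_hat = popt

print(f"asymptote={a_hat:.3f}, learning rate={b_hat:.2f}, decay rate={c_hat:.2f}")
print(f"projected score at n=30000: {inverse_power_law(30000.0, *popt):.3f}")
print(f"training size needed for 0.95: {size_for_target(0.95, *popt):.0f}")
```

Extrapolating far beyond the largest measured training size relies entirely on the assumed functional form. Such projections are therefore indicative rather than predictive.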
Raman spectra are high-dimensional data, so a model trained on a small dataset risks overfitting, and it can be difficult to estimate the model's learning ability accurately. Dimensionality reduction methods mitigate this problem,[77] but with small training datasets, and consequently small test datasets (<100 samples per class), bias and high uncertainty in the testing strategy can give a false indication of a model's learning rate and overestimate its performance at low training sizes.[78,79] To examine the generalizability of the models, the training and cross-validation scores are plotted in Figure 8a. The training and cross-validation scores approach each other as the training size increases, implying that the SVM (RBF) models do not overfit on the ABO datasets; however, a larger number of donors is needed to examine this further.
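The following is a minimal sketch of how training and cross-validation scores can be tracked against training size, assuming scikit-learn is available. Synthetic high-dimensional data from `make_classification` stands in for preprocessed spectra. The pipeline (standardization, PCA with 20 components, RBF SVM) is an illustrative assumption. Placing PCA inside the pipeline keeps the dimensionality reduction fitted on the training subsets only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import StratifiedKFold, learning_curve
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for preprocessed spectra: 100 features, few informative directions.
X, y = make_classification(
    n_samples=600, n_features=100, n_informative=5, n_redundant=15, random_state=0
)

# PCA sits inside the pipeline, so it is refitted on each training subset only.
model = make_pipeline(StandardScaler(), PCA(n_components=20), SVC(kernel="rbf"))

train_sizes, train_scores, cv_scores = learning_curve(
    model, X, y,
    train_sizes=np.linspace(0.1, 1.0, 6),
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    scoring="roc_auc",
    shuffle=True,
    random_state=0,
)

# A shrinking train/CV gap with growing training size suggests less overfitting.
gap = train_scores.mean(axis=1) - cv_scores.mean(axis=1)
for n, g in zip(train_sizes, gap):
    print(f"n_train={int(n):4d}  train-CV AUC gap={g:.3f}")
```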

Conclusion
We demonstrated that RS on whole blood in a fused silica micro-capillary enables the determination of multiple blood types in a single, fast, label-free measurement with no pre-analytical preparation. A peak analysis of the Raman spectra showed that, as expected, hemoglobin dominates the Raman-scattered signal of whole blood at 785 nm excitation. ABO blood groups were discriminated pairwise with an average AUC of 0.91 ± 0.03.[15] However, the direct use of whole blood significantly increases the throughput of our method and avoids the lack of reproducibility caused by heterogeneity of SERS substrates. Post-hoc scalability analysis indicated that increasing the training size to the entire donor cohort at Copenhagen University Hospital (30 000–60 000 donors) would improve the performance, potentially reaching clinically relevant values. The scaling of a simple three-layer CNN model further supports the potential of increasing the dataset size in the future.
We demonstrated a correlation between the presence of various erythrocyte and platelet antigens and the Raman spectrum of whole blood. Spectra from donors carrying rare-frequency erythrocyte antigens such as Kpᵃ, Luᵃ, Cʷ, K, and Ytᵇ were classified with an average AUC of 0.81 ± 0.09 by oversampling the minority class during training. The class imbalance presented a challenge in the development of the ML and DL models, and a larger validation dataset should be used to improve model stability and minimize the test variance. Highly imbalanced traits such as Doᵇ, Fyᵇ, and Jkᵃ were randomly undersampled, resulting in AUCs of 0.76 ± 0.09, 0.72 ± 0.05, and 0.74 ± 0.05, respectively. The presence of the low-molar-concentration platelet antigens HPA-1b, HPA-5b, HPA-15a, and HPA-15b was correlated with the Raman spectra and classified with an average AUC of 0.73 ± 0.4. Furthermore, anti-B titers of blood group A donors (aB) above and below 10 were classified with an AUC of 0.80 ± 0.06, and the dominant ABH-secretor status (Se) was classified with an average AUC of 0.88 ± 0.7. We emphasize that this is a proof-of-concept study and that significant efforts are needed to achieve clinically relevant performance. The specificity and sensitivity of the method are limited by high-abundance analytes, which dominate the Raman spectra. In addition to increasing the amount of training data, efforts should focus on emphasizing low-concentration analytes to truly enable clinical use. Possible methods include in-line separation of blood components, background fluorescence suppression, and multi-excitation RS.
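Random oversampling and undersampling of a binary dataset can be sketched with NumPy alone. The function `random_resample` is a hypothetical helper, not the implementation used in this study. It should be applied to the training folds only, so that duplicated minority samples never leak into the validation data.

```python
import numpy as np


def random_resample(X, y, strategy="over", seed=0):
    """Balance a binary dataset by random oversampling of the minority class
    (drawing with replacement) or random undersampling of the majority class."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    minority, majority = classes[np.argmin(counts)], classes[np.argmax(counts)]
    idx_min = np.flatnonzero(y == minority)
    idx_maj = np.flatnonzero(y == majority)
    if strategy == "over":
        # Duplicate random minority samples until both classes have equal size
        extra = rng.choice(idx_min, size=idx_maj.size - idx_min.size, replace=True)
        keep = np.concatenate([idx_maj, idx_min, extra])
    elif strategy == "under":
        # Keep a random subset of the majority class matching the minority size
        kept_maj = rng.choice(idx_maj, size=idx_min.size, replace=False)
        keep = np.concatenate([kept_maj, idx_min])
    else:
        raise ValueError("strategy must be 'over' or 'under'")
    keep = rng.permutation(keep)
    return X[keep], y[keep]


# Illustrative imbalanced dataset: 95 negatives, 5 positives (e.g. a rare antigen)
X = np.random.default_rng(1).normal(size=(100, 3))
y = np.array([0] * 95 + [1] * 5)

X_over, y_over = random_resample(X, y, "over")
X_under, y_under = random_resample(X, y, "under")
print(np.bincount(y_over), np.bincount(y_under))  # [95 95] [5 5]
```

Oversampling keeps every majority sample but repeats minority spectra, whereas undersampling discards majority samples. With very few positives, the latter leaves only a handful of training examples.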
Expanding our approach to the entire available donor cohort requires particular handling of donors with rare-frequency antigens. This involves resampling, data augmentation and repeated measurements in the training process, so that the number of validation samples is sufficient. Our study shows the feasibility of developing RS and AI into an accurate and fast clinical tool for determining a range of donor traits that are otherwise time-consuming and expensive to obtain.

Figure 1 .
Figure 1. a) Antigens on the surface of erythrocytes and platelets, antigens in plasma, and antibodies in plasma all determine various blood groups and other traits of a specific donor. b) A sample of whole blood is collected in EDTA with no pre-analytical sample preparation and drawn directly through a fused silica micro-capillary by a syringe pump during acquisition of the Raman spectrum. c) The RS setup is built on an inverted microscope. The 785 nm excitation laser is focused to a spot inside the micro-capillary, and the Raman-scattered light is collected by the microscope objective, filtered by notch and long-pass edge filters, and delivered to a spectrometer.

Figure 2 .
Figure 2. a) Each Raman spectrum is baseline-corrected and normalized. b) The dimension of the input data is reduced to a few principal components (PCs) before the models are trained. c) The input data is then used to train an ensemble of SVM models, where subsets are drawn with replacement from the total dataset and balanced by resampling. The final prediction is computed by majority voting. d) Alternatively, the preprocessed Raman spectra are used directly as input to a 1D convolutional neural network (CNN) model with 3 convolutional layers, a max pooling layer, and two fully connected dense layers (see Section 2.8 for details). The number of filters (F) and the kernel size (K) in each convolutional layer are denoted (F,K).
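The ensemble in panel c) can be sketched as below. The sketch assumes scikit-learn's `SVC` as the base estimator and random undersampling as the balancing step. The helper names `fit_balanced_svm_ensemble` and `predict_majority`, the number of estimators, and the synthetic data are all hypothetical. With an even number of models, this sketch assigns a tied vote to class 0.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.metrics import balanced_accuracy_score
from sklearn.svm import SVC


def fit_balanced_svm_ensemble(X, y, n_estimators=50, seed=0):
    """Fit SVMs on bootstrap subsets, each balanced by random undersampling."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_estimators):
        boot = rng.choice(len(y), size=len(y), replace=True)  # drawn with replacement
        Xb, yb = X[boot], y[boot]
        idx0, idx1 = np.flatnonzero(yb == 0), np.flatnonzero(yb == 1)
        n = min(idx0.size, idx1.size)
        if n == 0:
            continue  # skip a bootstrap subset that contains only one class
        keep = np.concatenate([
            rng.choice(idx0, size=n, replace=False),
            rng.choice(idx1, size=n, replace=False),
        ])
        models.append(SVC(kernel="rbf").fit(Xb[keep], yb[keep]))
    return models


def predict_majority(models, X):
    """Majority vote over the 0/1 predictions of all ensemble members (ties -> 0)."""
    votes = np.stack([m.predict(X) for m in models])
    return (votes.mean(axis=0) > 0.5).astype(int)


# Imbalanced synthetic data (about 80 % / 20 %); the first 200 samples are for training.
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5, weights=[0.8, 0.2], random_state=0
)
pca = PCA(n_components=10).fit(X[:200])  # PCs fitted on training data only
X_train, X_test = pca.transform(X[:200]), pca.transform(X[200:])

models = fit_balanced_svm_ensemble(X_train, y[:200])
y_pred = predict_majority(models, X_test)
print(f"{len(models)} SVMs, balanced accuracy = {balanced_accuracy_score(y[200:], y_pred):.2f}")
```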

Figure 3 .
Figure 3. a) Peak assignments of blood components in a normalized average spectrum of all measurements. Assignments where ν is followed by a number subscript are based on a labeling scheme developed for metalloporphyrins,[4,59,60] and correspond to various vibrations of the porphyrin, the main structure of hemoglobin. ν₁₅, ν₆, ν₄₄, ν₃₀, ν₄₁, ν₄, ν₁₂, ν₂₀, and ν₃₈ are all in-plane stretching modes belonging to Pyr, the outer ring structure of porphyrin. ν₇ is an in-plane deformation mode belonging to Pyr. ν₄₅, ν₅, ν₁₈, ν₁₃, ν₄₂, ν₂₁, ν₂₈, ν₁₁, ν₁₉, ν₃₇, and ν₁₀ are other modes belonging to the heme b structure in hemoglobin. More specifically, these are in-plane stretching and deformation vibrations of C–H and C–C at various α, β, and meso (m) positions in the porphyrin. Protein assignments include phenylalanine (Phe), an amino acid present in both hemoglobin and membrane proteins, deformation modes (CH₂/CH₃) from amino acid side chains, and amide I. Tyr refers to the amino acid tyrosine. b) PC loadings for a few important PCs, describing the importance of each intensity variable. c) Distributions of the intensity values at 24 of the most significant peaks according to (a) and (b) for all Raman spectra.

Figure 4 .
Figure 4. ROC curves and confusion matrices for the ABO classification results. An ROC curve is produced for each of the five validation splits, and the average is calculated. Similarly, the confusion matrices are averaged over the five CV splits.
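One common way to average ROC curves over cross-validation splits is to interpolate each fold's curve onto a shared false-positive-rate grid and then average the true-positive rates. The caption does not state the exact procedure, so the sketch below is an assumption; it uses scikit-learn, synthetic data and an RBF SVM. The same loop also averages row-normalized confusion matrices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import auc, confusion_matrix, roc_curve
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=20, random_state=0)

mean_fpr = np.linspace(0.0, 1.0, 101)  # shared false-positive-rate grid
tprs, aucs, cms = [], [], []
for train, test in StratifiedKFold(n_splits=5, shuffle=True, random_state=0).split(X, y):
    model = SVC(kernel="rbf").fit(X[train], y[train])
    fpr, tpr, _ = roc_curve(y[test], model.decision_function(X[test]))
    aucs.append(auc(fpr, tpr))
    tpr_grid = np.interp(mean_fpr, fpr, tpr)  # this fold's ROC curve on the shared grid
    tpr_grid[0] = 0.0
    tprs.append(tpr_grid)
    # Row-normalized confusion matrix (each row sums to 1)
    cms.append(confusion_matrix(y[test], model.predict(X[test]), normalize="true"))

mean_tpr = np.mean(tprs, axis=0)
mean_tpr[-1] = 1.0
mean_cm = np.mean(cms, axis=0)
print(f"AUC per fold: mean {np.mean(aucs):.2f} ± {np.std(aucs):.2f}")
print(f"AUC of the averaged curve: {auc(mean_fpr, mean_tpr):.2f}")
print(np.round(mean_cm, 2))
```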

Figure 5 .
Figure 5. ROC curves and confusion matrices for classification results of selected traits.

Figure 6 .
Figure 6. ROC curves and confusion matrices for classification results of traits with very few positives (Kpᵃ, Luᵃ, Cʷ, K, and Ytᵇ) or very few negatives (e).

Figure 7 .
Figure 7. Learning curves computed by varying the training size during fivefold cross-validation. Scaling comparison between an SVM model with an RBF kernel and a CNN model.

Figure 8 .
Figure 8. a) ABO learning curve fits for both the training and cross-validation scores. b) Comparison between ABO learning curve fits using a linear SVM model, a non-linear SVM model, and a CNN model. c) Average learning curve fits for 12 selected additional traits. Learning curves of the remaining traits can be found in Figure S6, Supporting Information; they include poor fits due to imbalance, small datasets, or outlier learning rates at low training sizes.

Table 3 .
Bagging ensembles with either a single or 50 estimators and an SVM-based estimator. Each dataset is either not resampled, undersampled, or oversampled. Hyperparameters are optimized by fivefold cross-validation using BA as the loss function. The cross-validation scores are presented in the table.