Power-Law-Based Synthetic Minority Oversampling Technique on Imbalanced Serum Surface-Enhanced Raman Spectroscopy Data for Cancer Screening


Surface-enhanced Raman spectroscopy (SERS) has shown great promise for cancer screening. However, previous "proof-of-concept" studies ignored the natural imbalance of cancer types in the population, leading models to learn predominantly the features of the majority class at the expense of the minority classes. Herein, a power-law-based synthetic minority oversampling technique (PL-SMOTE) is proposed to guide the resampling of multiclass serum SERS data by analyzing the long-tailed (power-law) distribution of cancer prevalence in the population. The proposed PL-SMOTE method balances the number of minority samples to resample against the overlap between classes by introducing a modulating factor. Modeling on resampled datasets synthesized by PL-SMOTE verifies the effectiveness of the proposed method. After further fine-tuning the parameters of the deep neural network model and the PL-SMOTE method, an optimal cancer screening model with a macroaveraged Recall of 97.24% and a macroaveraged F2-score of 97.38% is obtained. A new method for multiclass imbalanced resampling is thus provided, which significantly improves model performance for SERS-based cancer screening. The method may also inspire applications in other multiclass imbalanced scenarios, such as biomedicine, anomaly detection, and disaster prediction.
In recent years, the issue of data imbalance has been encountered in a wide range of domains, such as face recognition, [16] network intrusion detection, [17] telecom fraud identification, [18] and medical diagnostic decision-making. [19-21] Indeed, many empirical distributions encountered in biomedicine and other fields exhibit power-law behavior, in which a few head classes contain the bulk of the examples while numerous tail classes include only a few examples each. [20,22-25] According to the cancer statistics reported by IARC 2022, [1] the estimated 5-year prevalent cases of the top 25 cancers worldwide (Table S1, Supporting Information) also exhibit a power-law distribution.
In general, solutions to such data imbalance fall into three types: data-level methods, algorithm-level methods, and hybrid methods. [26] Among data-level methods, the synthetic minority oversampling technique (SMOTE) [15] resampling algorithm is considered the de facto standard in the framework of learning from imbalanced data. Unlike random oversampling, which merely duplicates minority examples, SMOTE produces synthetic minority examples by interpolating between the minority example under consideration and its K-nearest neighbors (KNN). However, multiclass imbalanced classification is more complicated than its binary counterpart: a class may be a majority with respect to some classes but a minority, or well balanced, with respect to the rest. Previous cancer screening studies have only considered binary classification scenarios; in reality, cancers are diverse and have varying prevalence, and traditional resampling methods offer no explicit criterion for how many samples of each class to synthesize. Simply rebalancing the training set toward the majority or the minority class is not an appropriate approach. [27] Hence, a dedicated method that adjusts the sampling procedure to both the individual properties of the classes and their mutual relations is highly desirable for multiclass problems. Power-law (PL) distributions occur in many situations of scientific interest and have significant consequences for our understanding of natural and man-made phenomena. [28] In the realistic scenario of cancer screening, the PL principle can regulate the amount of resampling according to the actual prevalence of each cancer type, minimizing the bias of imbalanced modeling against minority classes and thereby helping to build more reliable models.
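To make the interpolation step of SMOTE described above concrete, the following minimal sketch synthesizes new minority examples by interpolating between a minority sample and one of its K-nearest neighbors; the function name, the choice of k = 5, and the random-number handling are illustrative assumptions, not the reference implementation.

```python
# Minimal SMOTE-style synthesis for one minority class (sketch, not the
# reference implementation). X_min: (n, d) array of minority-class spectra.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote_like_synthesis(X_min, n_new, k=5, seed=0):
    rng = np.random.default_rng(seed)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)  # +1: each point is its own neighbor
    _, idx = nn.kneighbors(X_min)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))        # pick a minority example at random
        j = rng.choice(idx[i][1:])          # pick one of its k nearest neighbors
        lam = rng.random()                  # interpolation coefficient in [0, 1)
        synthetic.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.vstack(synthetic)
```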
In this work, serum samples from three types of cancer, in proportions reflecting their prevalence, and from a group of normal healthy controls were collected and subjected to SERS measurements. In response to the lack of a minority sampling strategy in traditional SMOTE for multiclass problems, and considering the power-law property of the data distribution, we propose a power-law-based SMOTE (PL-SMOTE) algorithm. PL-SMOTE fits the distribution of the original dataset and then introduces a power-law modulating factor γ as a hyperparameter for generating different minority resampling strategies. Instead of simply raising every class to the size of the majority, the resampling strategies generated by PL-SMOTE preserve the power-law pattern of the original distribution. By tuning the modulating factor γ, the synthesis of minority samples can be guided to an appropriate scale, generating neither too few samples, which would induce biased learning, nor superfluous samples, which would blur the classification boundaries between classes.
To the best of our knowledge, this is the first study combining the power-law distribution pattern of datasets with the SMOTE method in a multiclass imbalance scenario. It is also the most realistic and largest dataset used to study SERS screening so far. The experimental results show that our method achieves encouraging performance. To better illustrate the design of the experiment, Figure 1 shows the schematic diagram of PL-SMOTE on imbalanced serum SERS data for cancer screening. Serum samples were first collected from the population (Figure 1a) and subjected to SERS measurements (Figure 1b). The PL-SMOTE method was then introduced to guide minority synthesis from the original SERS datasets (Figure 1c-e). Finally, the resampled SERS datasets were fed into a learning model to train a cancer screening model (Figure 1f).

Serum Surface-Enhanced Raman Spectroscopy Measurements and Dataset Description
At the beginning, we studied the estimated 5-year prevalent cases of the top 25 cancers (both sexes, all ages) worldwide according to the cancer statistics reported by IARC 2022 [1] (see Table S1, Supporting Information, for the detailed list). The corresponding histogram of prevalent cases per cancer type is shown as blue bars in Figure 2a. Sorted in descending order, the prevalence of different cancer types differs markedly, and the histogram roughly follows a power-law (long-tail) trend: a few cancer types at the head account for the majority of cases, while the many categories in the tail account for only a small portion of the total. To characterize the distribution, we fitted the histogram with a best power-law function, shown as the red dashed curve, with R² = 0.897. The blue curve in the upper-right inset shows the log-log plot of the histogram, along with the best power-law fit as a red dashed line. The fitting results show that the distribution conforms to a power law.
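As a hedged illustration of the fitting procedure, the sketch below fits p(x) = C·x^(−α) to descending-sorted counts and reports R²; the function name and the least-squares formulation are assumptions and may differ in detail from the fit used for Figure 2a.

```python
# Sketch: fit a power law p(x) = C * x**(-alpha) to descending-sorted counts.
import numpy as np
from scipy.optimize import curve_fit

def fit_power_law(counts):
    """Return (C, alpha, R^2) for counts sorted in descending order."""
    counts = np.asarray(counts, dtype=float)
    rank = np.arange(1, len(counts) + 1, dtype=float)
    f = lambda x, C, alpha: C * x ** (-alpha)
    (C, alpha), _ = curve_fit(f, rank, counts, p0=(counts[0], 1.0))
    resid = counts - f(rank, C, alpha)
    r2 = 1.0 - resid @ resid / np.sum((counts - counts.mean()) ** 2)
    return C, alpha, r2
```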
Based on these statistics of prevalent cases in the population, serum samples of three cancer types, breast cancer (BC, n = 299), lung cancer (LC, n = 159), and leukemia (LK, n = 78), as well as an appropriately proportioned normal healthy control group (NM, n = 936), were selected from the population for modeling in this experiment. The histogram of the number of serum samples selected is shown in Figure 2b; a distinctly skewed distribution across the four classes can be observed. Figure 2c shows the serum SERS measurement setup. Serum samples were incubated with silver nanoparticles (Ag NPs) for 2 h in a grooved pure aluminum plate. A Raman spectrometer equipped with a 785 nm laser was used for SERS measurement. All SERS spectra were measured under the same experimental settings, with a laser power of 20 mW and an integration time of 5000 ms.
After SERS measurement and spectral preprocessing, four exclusive serum SERS datasets comprising 1472 normalized serum SERS spectra were constructed via the procedure mentioned above. Each spectrum is characterized by 2801 discrete features in the biofingerprint region of 400-1800 cm⁻¹. To gain more insight into the datasets, we visualized the feature distribution of the normalized SERS spectra using kernel principal component analysis. [29] The dimensionality of the SERS spectra was reduced from 2801 to 50, and the two principal components with the highest explained variance (PC1, PC2) were plotted as a scatter in Figure 2d. Normalized mean SERS spectra of the NM (blue, n = 936), BC (orange, n = 299), LC (green, n = 159), and LK (red, n = 78) serum samples are shown in Figure 2e. SERS peaks at 488, 586, 630, 722, 808, 888, 908, 1012, 1134, 1208, 1288, 1386, 1436, 1581, and 1662 cm⁻¹ were consistently observed in the four serum groups. Intensity disparities at several peaks (488, 630, 808, 888, 1132, 1208, 1436, and 1581 cm⁻¹) are evident when one class is contrasted with the others. Comparing the mean spectra and their standard deviations, differences in profile between the groups are also evident, and the varying standard deviation within each group indicates that samples of the same group diverge to some extent. Several studies [30,31] have shown that various biological constituents in serum, such as lipids, proteins, and nucleic acids, affect the corresponding SERS signal intensities (Table S3, Supporting Information). Thus, intensity changes in the SERS spectra reveal specific and subtle biomolecular variations in the serum samples, laying a biomedical foundation for the subsequent serum SERS data modeling.
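The kernel PCA projection used for Figure 2d can be reproduced along the following lines; the RBF kernel and the helper name are assumptions, since the kernel choice is not specified above.

```python
# Sketch of the Kernel-PCA embedding used for visualization (Figure 2d).
from sklearn.decomposition import KernelPCA

def kpca_embed(X, n_components=50, kernel="rbf"):
    """Reduce spectra (n_samples, n_wavenumbers) to kernel principal components."""
    scores = KernelPCA(n_components=n_components, kernel=kernel).fit_transform(X)
    return scores[:, :2]  # PC1 and PC2 used for the scatter plot
```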
The NM SERS dataset, containing 936 spectra, is taken as the majority class; the BC, LC, and LK SERS datasets, containing 299, 159, and 78 spectra, respectively, are taken as minority classes. The imbalanced ratio (IR), defined as IR = N_dataset / N_minority, [32] is a common metric for assessing the degree of imbalance in a dataset. The IRs of BC, LC, and LK are thus 4.92, 9.26, and 18.9, respectively. For class LC, and especially for class LK, the IR is relatively high. A modeling method that does not account for data imbalance will therefore be biased toward the majority class and neglect the more significant minority classes.
At the same time, the differences among the normalized spectra of the different class clusters are too small to be distinguished, and overlap occurs after dimensionality reduction. From this point of view, traditional linear discriminant methods are not sufficient to construct discriminant equations for classification, and dedicated algorithms need to be introduced for our task.

Power-Law-Based Synthetic Minority Oversampling Technique Method Implementation
To cope with the data imbalance, the PL-SMOTE method was introduced to generate different resampling strategies, and the SMOTE algorithm was used to synthesize the minority classes to the specified scale. Before PL-SMOTE resampling, the NM, BC, LC, and LK serum SERS spectra in the original dataset were labeled as targets 0, 1, 2, and 3 for quaternary classification. The original dataset was split via stratified fivefold cross-validation (see Figure S1, Supporting Information), [33] which divides the original dataset into five equal parts, each containing the corresponding proportion of every target class. In each validation round, the training set and test set comprise 80% and 20% of the original dataset, respectively; the training set is fed into the model for learning and the test set is used to evaluate the obtained model. Before training, the dataset was shuffled randomly so that each batch contained the same class proportions. In the fivefold cross-validation, the numbers of NM, BC, LC, and LK samples in the training set are 80% of the original dataset, i.e., 749, 239, 127, and 62, respectively. As shown in Figure 3, the histogram of the training set consisting of the four serum SERS classes was fitted as a power-law function (red dashed) with R² = 0.997.
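A minimal sketch of the stratified fivefold split described above is given below, using scikit-learn's StratifiedKFold; the variable names and the random seed are illustrative.

```python
# Stratified fivefold split: each fold keeps the class proportions of the
# full dataset. X: spectra matrix, y: labels (0 = NM, 1 = BC, 2 = LC, 3 = LK).
from sklearn.model_selection import StratifiedKFold

def stratified_splits(X, y, n_splits=5, seed=42):
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train_idx, test_idx in skf.split(X, y):
        yield X[train_idx], y[train_idx], X[test_idx], y[test_idx]
```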
In practice, the PL-SMOTE method introduces a modulating factor γ that is tunable in the range [0, 1]. When γ is set to 0.0, the new power-law function degenerates to the original power-law function; the resampling strategy then reproduces the original distribution and no resampling takes place. In the conventional SMOTE multiclass full resampling strategy, all minority classes are oversampled to the size of the majority class; in the other extreme, when γ is set to 1.0, the new power-law function degenerates to exactly this conventional strategy. The proposed method thus unifies the no-resampling and full-resampling strategies through the modulating factor γ, achieving more flexible control over the imbalance ratio and the size of the dataset. Table 1 shows the resampling strategies generated with γ set to 0.0, 0.3, 0.6, 0.85, and 1.0. By means of PL-SMOTE, the training set size is augmented to 1177, 1487, 1921, 2503, and 2996, with an average size per class of 294, 371, 480, 626, and 749, respectively. The SMOTE oversampling algorithm was then employed to synthesize the new training set. When γ is 0.0, the training set equals the original training set; when γ is 1.0, all minority classes are brought up to the size of the majority class.
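The mapping from γ to per-class target sizes can be sketched as follows; the rounding and the rule of never undersampling are assumptions, so the resulting counts approximate rather than exactly reproduce Table 1.

```python
# Sketch: turn gamma into per-class target sizes (a sampling_strategy dict
# usable with SMOTE). alpha0 is the exponent fitted to the original counts.
import numpy as np

def pl_sampling_strategy(class_counts, alpha0, gamma):
    """class_counts: dict {label: count}; classes are ranked by descending count."""
    labels = sorted(class_counts, key=class_counts.get, reverse=True)
    n_major = class_counts[labels[0]]
    strategy = {}
    for rank, label in enumerate(labels, start=1):
        target = int(round(n_major * rank ** (-alpha0 * (1.0 - gamma))))
        strategy[label] = max(target, class_counts[label])  # never undersample
    return strategy
```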
As shown in Figure 4a, the original dataset distribution is fitted by a power-law function plotted as a black dashed line. The power-law function adjusted by γ is plotted as a gray dashed line, and the new resampling strategy is obtained by sampling this γ-adjusted power-law function. The resampling strategies generated by the proposed PL-SMOTE with varying γ (γ = 0.0, 0.3, 0.6, 0.85, 1.0) are plotted as histograms. The SMOTE algorithm was then run on the original dataset with the corresponding resampling strategy. After these steps, we obtained the enhanced dataset for each γ resampling strategy.
To visualize the data synthesis process under varying sampling strategies, kernel PCA was used to reduce the dimensionality of the resampled, enhanced datasets. The resampled training sets were reduced to two principal components and plotted as scatter plots, as shown in Figure 4b. When the training set is reduced to two principal components using kernel PCA, the stacked scatter plots of the four classes overlap substantially; linear discriminant models do not work in this case. In Figure 4c-f, the scatter plots of the resampled datasets are decomposed into classes NM, BC, LC, and LK, giving insight into how minority class samples are synthesized. As γ increases, the newly generated BC samples remain within the same core cluster; for LC, however, the initial samples can be divided into two subconcepts, so two distinct clusters are generated during synthesis. The scatter distribution of LK can initially be regarded as a main cluster annexed by some free noise points, but during synthesis these potential noise points form a new subconcept, which may adversely affect model training.
When γ is too small, the number of synthesized minority samples is insufficient to counter the dominant influence of the majority class. When γ is too large, the excessive number of synthesized minority samples expands too many noise or boundary samples of the minority classes, blurring the classification boundaries and degrading the performance of the whole model. An appropriate γ therefore helps to train an efficient and robust cancer screening model.

Modeling on Power-Law-Based Synthetic Minority Oversampling Technique Resampled Datasets
To verify the modeling performance on the resampled datasets synthesized by the proposed PL-SMOTE method, three traditional machine learning models, K-nearest neighbors (KNN), random forest, and support vector machine (SVM), as well as a deep neural network (DNN) model, were introduced in this article.
KNN [34] is one of the most popular nonparametric classification approaches. The KNN algorithm computes and orders the distances from a new sample to the existing ones, classifying the input according to the most frequent label among its K-nearest neighbors; the distance is usually the Euclidean norm. SVM [35] belongs to a family of generalized linear classifiers. The SVM algorithm finds the optimal classification by maximizing the margin between classes, whose border is defined by the support vectors; when the classes are not linearly separable, the kernel trick is applied to redefine the inner products. Random forest (RF) [36,37] is an ensemble learning method based on bagging. The RF algorithm is composed of many decision trees, each built from a subset of features; each tree votes and the algorithm aggregates the votes to choose the best prediction. The three traditional classification models were implemented with the Scikit-learn machine learning package [38] in a Python environment, and parameter optimization was performed through hyperparameter grid search to achieve the best performance. Although convolutional neural networks (CNNs) currently offer the best solutions to many problems in image data analysis, [39,40] for the classification of 1D serum SERS spectra composed of spectral peaks of different wavenumbers and widths, a DNN model was preferred in this study. In this experiment, a simple DNN model with three hidden layers was constructed and implemented with the PyTorch framework in Python. [41,42]

The evaluation of classification performance plays a critical role in the design of a learning system; therefore, choosing an appropriate measure is as important as selecting a good model for a given problem. Traditionally, the standard performance metric has been classification accuracy. For a binary classification problem, accuracy can be easily derived from a 2 × 2 confusion matrix, as shown in Table 2. Such metrics provide a simple way of describing a classifier's performance on a given dataset. However, consider an imbalanced dataset with 1% minority examples and 99% majority examples: a naive approach that assigns every example to the majority class achieves an accuracy of 99%. The accuracy seems superb at first sight, yet this description fails to reflect the fact that all minority examples are misidentified.
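For concreteness, a possible grid-search setup for the three classical baselines is sketched below; the parameter grids and the scoring choice are assumptions, not the grids actually searched in this work.

```python
# Illustrative hyperparameter grid search for the KNN, SVM, and RF baselines.
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

search_spaces = {
    "knn": (KNeighborsClassifier(), {"n_neighbors": [3, 5, 7, 9]}),
    "svm": (SVC(probability=True), {"C": [0.1, 1, 10], "gamma": ["scale", "auto"]}),
    "rf": (RandomForestClassifier(), {"n_estimators": [100, 300], "max_depth": [None, 20]}),
}

def fit_baselines(X_train, y_train, scoring="recall_macro", cv=5):
    best = {}
    for name, (estimator, grid) in search_spaces.items():
        gs = GridSearchCV(estimator, grid, scoring=scoring, cv=cv)
        best[name] = gs.fit(X_train, y_train).best_estimator_
    return best
```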
In lieu of accuracy, other evaluation metrics are frequently adopted by the research community to provide comprehensive assessments of imbalanced learning problems, namely Precision, Recall, and the F-score. Intuitively, Precision is a measure of exactness, i.e., of the examples labeled as positive, how many are actually labeled correctly; Recall is a measure of completeness, i.e., how many examples of the positive class are labeled correctly. The F-score incorporates both measures to express the trade-off between them.
In the early cancer screening scenario, cancer FN results are more serious than FP ones, because FN results create false reassurance and delay optimal early treatment for patients. FP results can also bring problems of overtreatment, but these are relatively tolerable in early screening for malignant disease. Thus, we assigned Recall more importance than Precision in this case and specialized the F-score to the F2-score. With TP, FP, and FN denoting true positives, false positives, and false negatives, Recall and the F2-score are defined as

Recall = TP / (TP + FN)    (1)

F2 = 5 × Precision × Recall / (4 × Precision + Recall), with Precision = TP / (TP + FP)    (2)

Meanwhile, the multiclass cancer screening confusion matrix was flattened into a sequence so that different γ settings can be compared, as shown in Figure 5a. Figure 5b shows the newly formed comparative matrices with varying γ. In the comparative matrices, [A→B] indicates that class A is wrongly predicted as class B, and the numbers in the matrix denote the number of misclassified cases of each misclassification type.
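The macroaveraged Recall and F2-score used throughout the evaluation can be computed directly with scikit-learn, as in the minimal sketch below (y_true and y_pred are hypothetical label arrays).

```python
# Macroaveraged Recall and F2-score (beta = 2 weights Recall over Precision).
from sklearn.metrics import recall_score, fbeta_score

def screening_metrics(y_true, y_pred):
    macro_recall = recall_score(y_true, y_pred, average="macro")
    macro_f2 = fbeta_score(y_true, y_pred, beta=2, average="macro")
    return macro_recall, macro_f2
```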
Here, γ was varied from 0.0 to 1.0 in steps of 0.1 and applied to the original training set using PL-SMOTE. The resampled training sets were then used to train the four proposed models, and the remaining 20% test set was used for model evaluation. In the evaluation process, the cancer screening confusion matrices were flattened into comparative matrices, as shown in Figure 5c-f. The Recall score and F2-score of the four models with varying γ are plotted in Figure 5g,h. In Figure 5, the most frequent misclassification types for the four models were [BC→LK], [LC→NM], and [LK→BC]; [SUM] indicates the sum of all listed misclassification types. As shown in Figure 5, the number of misclassifications gradually decreases with increasing γ for all four algorithms, which illustrates the enhancement effect of the data synthesis method on the imbalanced dataset. On the other hand, synthesizing minority samples at ever larger scales by increasing γ does not always improve the models: when γ reaches 0.7 or larger, further increments of γ no longer yield the large gains observed at small γ. It is also worth noting that the traditional practice of setting γ to 1.0, i.e., synthesizing as many minority samples as the majority class, can even be detrimental, because none of the four models attain their best performance at γ = 1.0. Searching for an appropriate γ is therefore indispensable for training an optimal model. In Figure 5, the average misclassified cases over three adjacent consecutive settings are 87, 58, 64, and 48 for KNN, RDF, SVM, and DNN, respectively. Moreover, compared with 14, 23, and 22 wrongly predicted cases for the KNN, RDF, and SVM models, only 12 lung cancer patients were wrongly predicted as normal healthy people by the DNN in this experiment. In terms of Recall score and F2-score, the heatmap colors show that the DNN achieves overall better performance than the other algorithms. These results indicate that the DNN has the best capacity for our dataset among all the models.

Fine-Tuning on Power-Law-Based Synthetic Minority Oversampling Technique and Deep Neural Network for Cancer Screening Modeling
From the previous section, γ is an essential factor for guiding the synthesis of minority classes in imbalanced datasets. It is also clear that the simple, not-yet-optimized DNN already shows the best performance compared with the other three commonly used machine learning models. Therefore, parameter fine-tuning of the PL-SMOTE method and the DNN is carried out in this section to train an optimal cancer screening model. Figure 6a shows the DNN architecture for serum SERS multiclass classification. The DNN in this work comprises an input layer, an output layer, and multiple hidden layers. The serum SERS fingerprint region ranging from 400 to 1800 cm⁻¹ consists of 2801 wavenumbers with an interval of 0.5 cm⁻¹; therefore, the input layer contains 2801 nodes corresponding to a 2801 × 1 float input feature array. The hidden layers perform the computations; too few hidden layers or too few neurons per layer leads to insufficient model capacity. Increasing the number of layers and widening each layer can improve the performance of the network, but increasing the complexity without limit makes training difficult and reduces computational efficiency. Choosing an appropriate hidden-layer depth and width is therefore crucial to the overall performance of the model. We balanced simplicity against complexity by setting the hidden-layer depth to four layers and the width of each layer to 512 neurons; at this setting, the network offers large capacity and high computational efficiency. A neuron is the elemental unit of each hidden layer: it computes the weighted sum of its inputs, adds a bias term, and passes the result through a nonlinear activation function, the rectified linear unit (ReLU), to produce a single output. In addition, we added a batch normalization layer before the activation function to improve the speed of model training and convergence. Practice has shown that the batch normalization layer standardizes the features of a given layer in the network, which alleviates numerical instability in the DNN and makes the network easier to train. [43] In the output layer, the output nodes correspond to the target classes and produce the predicted values; the Softmax function is applied just after the output layer to convert them into probabilities and determine the final predicted label. Before model training, the NM, BC, LC, and LK serum SERS spectra in the resampled dataset were labeled as targets 0, 1, 2, and 3 for quaternary classification. The number of training epochs was set to 400, with an early-stop strategy activated when there is no performance improvement for 10 epochs. To balance memory limitations and computing efficiency, the batch size was set to 400. Each training instance is passed through the network and the output of each unit is computed; the output predicted by the DNN is compared with the target label to calculate the cross-entropy loss, which is then back-propagated through the network. The Adam optimizer [44] was used for learning, with a learning rate of 1 × 10⁻⁶.
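A minimal PyTorch sketch consistent with the architecture and hyperparameters described above is given below; the class name, the data loader, and other details are assumptions rather than the authors' exact code.

```python
# DNN sketch: 2801 input features, four hidden layers of 512 units with batch
# normalization before ReLU, four output classes; trained with cross-entropy
# and Adam (lr = 1e-6) as described in the text.
import torch
import torch.nn as nn

class SERSNet(nn.Module):
    def __init__(self, n_features=2801, n_classes=4, width=512, depth=4):
        super().__init__()
        layers, in_dim = [], n_features
        for _ in range(depth):
            layers += [nn.Linear(in_dim, width), nn.BatchNorm1d(width), nn.ReLU()]
            in_dim = width
        layers.append(nn.Linear(in_dim, n_classes))  # logits; Softmax applied at inference
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

model = SERSNet()
criterion = nn.CrossEntropyLoss()  # applies log-softmax internally during training
optimizer = torch.optim.Adam(model.parameters(), lr=1e-6)
# Training loop sketch (loader is a hypothetical DataLoader of (spectra, labels)):
# for epoch in range(400):
#     for xb, yb in loader:
#         optimizer.zero_grad()
#         loss = criterion(model(xb), yb)
#         loss.backward()
#         optimizer.step()
```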
During modeling, the loss drops rapidly at first, and some fluctuations in the test loss occur because of drastic weight updates (Figure 6b). The loss drops below 0.001 at the 75th epoch for the training set and the 100th epoch for the test set, and both losses level off by the 350th epoch. The training and test accuracies are updated in step, and the DNN achieves near-perfect accuracy by the 200th epoch (Figure 6c).
When γ was set to 0.85, the DNN model achieved the best performance (Table S2, Supporting Information), with an optimal macroaveraged Recall of 97.24% and an optimal macroaveraged F2-score of 97.38%. The detailed performance on the individual classes is displayed in the training confusion matrix (Figure 6d) and the test confusion matrix (Figure 6e). The fine-tuned DNN model achieved excellent classification results among the four imbalanced classes, with only a few samples misclassified. Receiver operating characteristic (ROC) curves were generated to further evaluate the DNN model for cancer screening, as shown in Figure 6f, and the area under the curve (AUC) was used to evaluate classifier performance with the one-vs-rest (OvR) scheme.
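The one-vs-rest AUC evaluation can be reproduced with scikit-learn as sketched below, assuming y_score holds the per-class softmax probabilities output by the model.

```python
# Macroaveraged one-vs-rest ROC-AUC from class-probability scores.
from sklearn.metrics import roc_auc_score

def ovr_auc(y_true, y_score):
    return roc_auc_score(y_true, y_score, multi_class="ovr", average="macro")
```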

Conclusion
In this work, an exclusive imbalanced dataset of serum SERS spectra was obtained by sampling 1472 serum samples from the population in line with the power-law characteristic of the estimated prevalent cases of the different cancer types. No resampling, or too little, yields no significant improvement on an imbalanced dataset, while traditional multiclass full resampling synthesizes redundant samples that increase the computational effort and blur the classification boundaries, degrading model performance. To cope with this data imbalance, a multiclass resampling method based on the power-law distribution, called PL-SMOTE, was proposed to address the natural imbalance in prevalence across cancer types. The proposed method compromises between the full and no-resampling strategies by introducing a factor γ to modulate the imbalanced ratios of the different classes, and combines with the SMOTE method to synthesize an appropriate number of minority class samples; it can therefore closely match the natural distribution pattern of the samples. We validated the effectiveness of the PL-SMOTE method using three commonly used machine learning models and a simple DNN model. The model evaluation results show that the DNN combined with the proposed PL-SMOTE method has the greatest optimization potential on the imbalanced serum SERS dataset. After further fine-tuning the parameters of the DNN model and the PL-SMOTE method, we obtained an optimal cancer screening model with a macroaveraged Recall of 97.24% and a macroaveraged F2-score of 97.38%. This article provides a new method for multiclass imbalanced resampling, which significantly improves model performance for SERS cancer screening. The validity of the method for other multiclass imbalanced datasets remains to be evaluated. Finally, the proposed method may offer a new perspective for other multiclass imbalance domains, such as biomedicine, anomaly detection, and disaster prediction.

Experimental Section
Dataset Construction: Synthesis of Silver Nanoparticles: A stable silver nanoparticle (Ag NPs) solution was prepared by the deoxidizing method developed by Leopold and Lendl. [45] According to this method, a total of 4.5 mL of sodium hydroxide (0.1 mol L⁻¹) and 5 mL of hydroxylamine hydrochloride (0.06 mol L⁻¹) were uniformly mixed. Then, 90 mL of silver nitrate aqueous solution (0.0011 mol L⁻¹) was promptly added to the mixture under intense stirring until a homogeneous milky gray mixture was obtained. The final Ag NPs solution was characterized by UV-Vis spectroscopy, showing an absorption peak at 425 nm with a full width at half maximum of ≈100 nm (Figure S2, Supporting Information). A transmission electron microscopy (TEM) image (inset in Figure S2, Supporting Information) of the prepared Ag NPs was also used to characterize the colloidal particles. The particle sizes followed a normal distribution with a mean diameter of 42 nm and a standard deviation of 5 nm. In our experience, the resulting silver colloidal solution remains stable for 1 week. A centrifuge was used to concentrate the silver colloidal solution at 10 000 rpm for 10 min; the supernatant was discarded and the concentrated fraction was used later with the serum samples.
Dataset Construction: Preparation of Serum Samples: In this work, a total of 1472 human blood serum samples were collected from four subject groups in the partner hospitals, and ethical approval was obtained from the local ethics committee. The four subject groups consisted of 299 breast cancer (BC) patients, 159 lung cancer (LC) patients, and 78 leukemia (LK) patients, all clinically confirmed, as well as 936 healthy volunteers confirmed by health examination. Informed signed consent was obtained from the patients (Fujian Medical University Union Hospital). After 12 h of overnight fasting, single peripheral blood samples were obtained from the study subjects between 7:00 and 10:00 A.M. with the use of coagulant. Blood cells were removed by centrifugation at 1000 rpm for 10 min to obtain the blood serum. Prior to SERS measurement, serum samples were stored in a freezer at −80 °C.

Dataset Construction: Surface-Enhanced Raman Spectroscopy Measurements: Large batches of samples raise the problem of measurement standardization, i.e., how to ensure that all samples are measured under consistent conditions. Lin et al. [10] proposed a superhydrophobic substrate with droplet self-localization ability; experiments show that this substrate significantly improves measurement stability because it alleviates the "coffee ring effect". We therefore adopted the same protocol to synthesize the same substrate for SERS measurement. In the experiment, equal volumes of serum sample (2.5 μL) and Ag NPs (2.5 μL) were mixed on the superhydrophobic substrate and incubated at room temperature. A high-throughput Raman spectrometer (ATR8000, OPTOSKY (XIAMEN) OPTICAL LTD.) equipped with a 785 nm laser was used to measure the SERS spectra in the range of 300-3500 cm⁻¹. All SERS spectra were measured with the same integration time of 5000 ms and a laser power of 20 mW.
Dataset Construction: Spectra Preprocessing: In biological applications of SERS, the first preprocessing step usually consists of truncating the spectra to the biofingerprint region of 400-1800 cm⁻¹. It has been reported that model performance in the fingerprint region is better than in the high-wavenumber region owing to less water interference and the presence of more complex chemical features. [46] Nevertheless, the truncated SERS signals still suffer from autofluorescence interference; thus, a fifth-order polynomial fitting algorithm [47] was applied to all measured spectra to eliminate the autofluorescence background. In addition, the absolute intensity variations caused by laser fluctuations and sample concentration inhomogeneity need to be taken into account. To eliminate intensity variations between spectra and allow more accurate analysis of spectral shape, all measured SERS spectra were normalized by the integrated area under the curve in the 400-1800 cm⁻¹ range after removal of the autofluorescence background from the truncated SERS data.
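A hedged sketch of this preprocessing chain is given below: truncation to the fingerprint region, subtraction of a fifth-order polynomial background, and area normalization. The plain polynomial fit stands in for the cited baseline-removal algorithm and is an assumption.

```python
# Preprocessing sketch: truncate to 400-1800 cm^-1, remove a 5th-order
# polynomial background, and normalize by the integrated area.
import numpy as np

def preprocess(spectrum, wavenumbers, low=400.0, high=1800.0, order=5):
    mask = (wavenumbers >= low) & (wavenumbers <= high)
    wn, s = wavenumbers[mask], spectrum[mask]
    baseline = np.polyval(np.polyfit(wn, s, order), wn)  # polynomial background estimate
    corrected = s - baseline
    area = np.trapz(corrected, wn)                       # integrated area under the curve
    return wn, corrected / area
```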
Power-Law-Based Synthetic Minority Oversampling Technique Principle: The pseudocode for PL-SMOTE is presented in Algorithm 1. Consider an original dataset Data_original = [D_class1, D_class2, ..., D_classx, ..., D_classm], x ∈ [1, m], where m is the number of classes and D_classx denotes the dataset of class x; the classes are indexed in descending order of size.
In our proposed PL-SMOTE method, the original dataset distribution Dist_original is first obtained. The original distribution is then fitted to a power-law function

p(x) = C x^(−α₀) = n_class1 · x^(−α₀)    (3)

where α₀ is the scaling exponent of the distribution, estimated with the "powerlaw" toolbox, [48] and C is the scaling constant, which in this experiment is replaced by the majority-class size n_class1. A scaling-exponent modulating factor γ ∈ [0, 1] is then introduced into the fitted power-law function to adjust its shape. The new power-law function is defined as

p_new(x) = n_class1 · x^(−α₀(1−γ)) =
  n_class1 · x^(−α₀),        γ = 0
  n_class1 · x^(−α₀(1−γ)),   0 < γ < 1
  n_class1,                  γ = 1    (4)

In practice, the modulating factor γ is tunable in the range [0, 1] and guides the generation of resampling strategies of different scales. When γ is set to 0.0, the new power-law function degenerates to the original power-law function, and the resampling strategy simply keeps the original distribution Dist_original. In the other extreme, when γ is set to 1.0, the new power-law function degenerates to the scaling constant, i.e., p(x) = C = n_class1; consequently, the new resampling strategy conforms to a uniform distribution, in which every minority class is synthesized up to the size of the majority class. Finally, Data_original, Dist_new, and the parameter K_neighbors are fed to the standard SMOTE algorithm to synthesize the new resampled dataset Data_new, which is used later for model training.
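An end-to-end sketch of this Algorithm 1 flow is given below: the power-law exponent is fitted to the original class counts, the γ-adjusted targets are derived, and the result is handed to the standard SMOTE implementation in imbalanced-learn. The log-log least-squares fit and the rounding are simplifying assumptions, not the authors' exact implementation.

```python
# PL-SMOTE sketch: fit alpha0, build the gamma-adjusted resampling strategy,
# and synthesize minority samples with imbalanced-learn's SMOTE.
import numpy as np
from collections import Counter
from imblearn.over_sampling import SMOTE

def pl_smote(X, y, gamma, k_neighbors=5, seed=0):
    counts = Counter(y)
    labels = sorted(counts, key=counts.get, reverse=True)  # descending class sizes
    n_major = counts[labels[0]]
    ranks = np.arange(1, len(labels) + 1, dtype=float)
    sizes = np.array([counts[c] for c in labels], dtype=float)
    # fit alpha0 in p(x) = n_major * x**(-alpha0) by least squares in log-log space
    alpha0 = -np.polyfit(np.log(ranks), np.log(sizes / n_major), 1)[0]
    strategy = {
        c: max(counts[c], int(round(n_major * r ** (-alpha0 * (1.0 - gamma)))))
        for c, r in zip(labels, ranks)
    }
    sm = SMOTE(sampling_strategy=strategy, k_neighbors=k_neighbors, random_state=seed)
    return sm.fit_resample(X, y)
```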

Supporting Information
Supporting Information is available from the Wiley Online Library or from the author.