Age‐stratified and gender‐specific reference intervals of six tumor markers panel of lung cancer: A geographic‐based multicenter study in China

Abstract
Background: Serum biomarkers have been widely adopted in clinical practice to assist lung cancer diagnosis, therapeutic monitoring, and prognostication. The usefulness of a tumor biomarker depends on a reliable reference interval (RI) that takes into account the study subjects' age, gender, and geographical location. This study aimed to establish an RI for each of 6 lung cancer biomarkers for nationwide use in China on the Mindray platform.
Methods: Serum levels of 6 lung cancer biomarkers, namely progastrin-releasing peptide (ProGRP), neuron-specific enolase (NSE), squamous cell carcinoma antigen (SCC), carcinoembryonic antigen (CEA), cytokeratin-19 fragment (CYFRA21-1), and human epididymis protein 4 (HE4), were measured by chemiluminescence immunoassay on the Mindray CL-6000i platform, following the laboratory standard operating procedures, in a large, multicenter, geographically representative cohort of apparently healthy Chinese individuals. The CLSI EP28-A3C guideline was followed for the enrollment of study subjects.
Results: Age-stratified, gender-specific RIs for ProGRP, NSE, SCC, CEA, CYFRA21-1, and HE4 in the Chinese population have been established, as described in the results and discussion of this work. In addition, differing levels of the six lung cancer biomarkers were observed among the nine geographical locations in China.
Conclusions: The sample size of the study cohort, age, and geographical location should be considered when establishing a reliable biomarker RI. An RI for each of the six lung cancer biomarkers has been established. The results of this study should help clinical laboratories interpret analytical results and help clinicians in patient management.


| INTRODUCTION
Lung cancer is the leading neoplasm in both incidence and mortality worldwide, including in China. 1-4 Globally, the lack of reliable tools for early screening, diagnosis, and treatment monitoring has resulted in late-stage or terminal diagnoses. Low-dose computed tomography (LDCT) and tumor markers are the common tools currently available for lung cancer diagnosis in clinical practice. LDCT, however, has a high false-positive rate, which limits its efficacy in identifying cancer. To date, several tumor markers have been used for lung cancer screening, diagnosis, therapeutic monitoring, and prognostication, as the assays are minimally invasive, convenient, easy to access, and inexpensive. 5,6 Elevation of CEA has been found in many types of disease, including lung cancer, where it is more specific for adenocarcinoma of the lung. 7,8 Elevation of CYFRA21-1 has been associated with worse five-year overall survival and locoregional relapse-free survival in non-small cell lung cancer (NSCLC). 9,10 NSE is considered a marker of small cell lung cancer invasiveness. 11 Squamous cell carcinoma antigen (SCC) has been considered a marker specific for squamous cell carcinoma. 12 HE4 is usually considered a biomarker for ovarian cancer and is used in the diagnosis of that neoplasm. Elevated HE4 in serum and pleural effusion has been found in NSCLC patients, making it a potential new lung cancer biomarker. 13,14 To date, the sensitivity and specificity of lung cancer biomarkers remain a bottleneck in lung cancer screening and diagnosis. There is thus an urgent need to improve lung cancer risk assessment, because current in vitro diagnosis-based screening criteria miss a large number of cases. 15
Although plenty of reports on lung cancer biomarkers and their diagnostic performance are available, 16-23 a well-established reference interval for each lung cancer biomarker is still needed to enhance the performance of these biomarkers. To establish a well-designed and dependable reference interval, characteristics of the study subjects such as age, gender, geographic location, and lifestyle should be considered, because they may affect the biomarker levels of the individuals investigated. The sample size enrolled in the study is another important factor. Although CLSI 24 requires a minimum of 120 samples for establishing a reference interval, a threshold that takes the cost of the study into account, a larger sample will better characterize the distribution and give a "near-true" value for the population, if the budget allows. A reference interval study for a tumor biomarker therefore calls for a standardized evaluation of the marker in a large population of healthy subjects in which age strata, both genders, and geographic locations are well represented. Lastly, establishing reference intervals for a multi-marker panel of lung cancer biomarkers in multiple hospitals simultaneously is a challenge. We report the establishment of reference intervals for six individual lung cancer biomarkers, namely progastrin-releasing peptide (ProGRP), neuron-specific enolase (NSE), squamous cell carcinoma antigen (SCC), carcinoembryonic antigen (CEA), cytokeratin-19 fragment (CYFRA21-1), and human epididymis protein 4 (HE4), as phase I of our recent multicenter clinical study series, with age-stratified, gender-specific, large-cohort, and geographic population considerations, using data from 9 large tier-3 hospitals in China.

| Study design and ethics approval
The design of this study was based on the CLSI EP28-A3C "Defining, Establishing, and Verifying Reference Intervals in the Clinical Laboratory; Approved Guideline-Third Edition". 24 Laboratory parameters from individuals who visited the health examination center for routine health checks in all participating hospitals were collected.
These subjects were provided with a health condition questionnaire before blood collection in order to meet the requirements of CLSI EP28-A3C. The exclusion criteria included smoking, alcoholism, medication, diabetes, any cancer or cancer history, any known infection, hypertension, abnormal kidney function, anxiety, recent hospitalization, family inherited diseases, menstruation period, lactation, pregnancy, and use of vitamin supplements. The enrolled study subjects' name, gender, age, and medical record number were also collected.
The study was approved by the Institutional Review Board (IRB) / Ethics Committee of each participating hospital.

KEYWORDS
carcinoembryonic antigen, human epididymis protein 4, lung neoplasms, progastrin-releasing peptide, tumor biomarkers

| Study site selection
Nine large tier-3 hospitals were selected, representing North, Northwest, Southwest, Central, Central South, and East China.

| Sample collection and storage
Fasting blood was collected from all ostensibly healthy individuals visiting the health examination center of a participating hospital who met the requirements of the study questionnaire. A serum collecting tube, routinely used in each participating hospital, was used to collect the blood, and the samples were transferred to the clinical laboratory for processing by a qualified technician to isolate the serum. The collected serum was then stored at −80 °C for a period of 1-3 months, until required.
Results were deposited in the Laboratory Information System to be further analyzed.

| Determination of outliers by Dixon's test
Following the CLSI C28-A3 guidelines and the statistical principles described by Liu et al., 24,25 outliers in the datasets were identified and removed using Dixon's test. An outlier was determined by the D/R ratio, where D is the absolute difference between an extreme observation (largest or smallest) and the next largest (or smallest) observation, and R is the range of all observations, including the extremes. If D/R ≥ 1/3, the extreme observation was removed.
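The D/R rule can be illustrated with a minimal Python sketch. The function name `dixon_filter`, the choice to re-check the extremes after each removal, and the example values are illustrative assumptions, not the authors' actual procedure or data.

```python
def dixon_filter(values, threshold=1 / 3):
    """Drop extreme observations whose D/R ratio reaches the threshold.

    D is the gap between an extreme value and its nearest neighbour,
    R is the range of all remaining values. Re-checking after each
    removal is an assumption made for this sketch.
    """
    data = sorted(values)
    removed = []
    while len(data) >= 3:
        r = data[-1] - data[0]
        if r == 0:
            break  # all values identical, nothing to test
        d_high = data[-1] - data[-2]  # gap at the upper extreme
        d_low = data[1] - data[0]     # gap at the lower extreme
        if d_high >= d_low and d_high / r >= threshold:
            removed.append(data.pop())
        elif d_low / r >= threshold:
            removed.append(data.pop(0))
        else:
            break
    return data, removed


# Hypothetical marker values with one suspicious high result.
kept, removed = dixon_filter([12, 40, 10, 14, 11, 15, 13])
print(kept, removed)
```

In this hypothetical example only the value 40 is flagged, because its gap to the next value is large relative to the overall range.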

| Normality test of datasets
The distribution of the datasets of the 6 individual lung cancer biomarkers from the 9 participating hospitals was analyzed using the one-sample Kolmogorov-Smirnov test; a p value <0.05 was considered to indicate a significant deviation from the normal distribution. The result determined whether a parametric or non-parametric statistical method would be used in the next step of the analysis, which was performed with SPSS version 18.0 software.
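For readers without SPSS, the idea of the one-sample Kolmogorov-Smirnov check can be sketched in plain Python. This is not the SPSS implementation: the function name `ks_normality` is hypothetical, the normal distribution is fitted with the sample mean and SD (an assumption), and the p value uses a standard asymptotic approximation, so it is only approximate when the parameters are estimated from the same data.

```python
import math
from statistics import NormalDist, mean, stdev


def ks_normality(values):
    """Return (D, approximate p) for a K-S test against a fitted normal."""
    data = sorted(values)
    n = len(data)
    fitted = NormalDist(mean(data), stdev(data))  # assumption: fit from sample
    d = 0.0
    for i, x in enumerate(data, start=1):
        f = fitted.cdf(x)
        # Largest gap between the empirical and the fitted CDF.
        d = max(d, i / n - f, f - (i - 1) / n)
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    if lam < 0.2:
        return d, 1.0  # the series below is numerically 1 in this range
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                  for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)


# Synthetic illustration: normal quantiles versus their exponentials (skewed).
z = [NormalDist().inv_cdf((i - 0.5) / 200) for i in range(1, 201)]
print(ks_normality(z)[1] < 0.05)                          # symmetric data
print(ks_normality([math.exp(v) for v in z])[1] < 0.05)   # right-skewed data
```

A p value below 0.05 would flag the dataset as non-normal, which is the case that leads to transformation or non-parametric handling in the workflow described here.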

| Transformation of skewed data
After the normality test, skewed (non-normally distributed) data were transformed toward a normal distribution using the Box-Cox method.
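A minimal sketch of the Box-Cox transformation follows. The text does not say how the transformation parameter λ was chosen; the grid search over the profile log-likelihood shown here is a common approach and is an assumption, as are the function names and the synthetic data.

```python
import math


def boxcox(values, lam):
    """Box-Cox transform of strictly positive values."""
    if abs(lam) < 1e-12:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def boxcox_loglik(values, lam):
    """Profile log-likelihood of lambda under a normal model."""
    y = boxcox(values, lam)
    n = len(y)
    m = sum(y) / n
    var = sum((v - m) ** 2 for v in y) / n
    return -0.5 * n * math.log(var) + (lam - 1.0) * sum(math.log(v) for v in values)


def fit_lambda(values, grid=None):
    """Pick the lambda on a grid (default -2.00 to 2.00) with the highest likelihood."""
    if grid is None:
        grid = [i / 100 for i in range(-200, 201)]
    return max(grid, key=lambda lam: boxcox_loglik(values, lam))


# Synthetic right-skewed data whose logarithms are evenly spaced.
skewed = [math.exp(i / 10) for i in range(-20, 21)]
lam = fit_lambda(skewed)
transformed = boxcox(skewed, lam)
print(lam)
```

For data like these, whose logarithms are symmetric, the selected λ should be close to 0, which corresponds to a simple log transformation.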

| Sub-classification determination for reference interval establishment
Two common factors are considered when establishing an RI: sub-classification (subgrouping) by gender and by age. In this work, the recommendation of CLSI C28-A3 on establishing reference intervals for clinical laboratory test items was followed, and the Z test was used to determine whether sub-classification was needed for each tumor biomarker. By definition, if Z > Z*, the difference between the two subgroups is statistically significant (p < 0.05), and a separate RI is needed for each group. Conversely, if Z < Z*, the difference is not statistically significant (p > 0.05), and the RIs can be combined. 24,25 However, when Z > Z* between genders for a specific biomarker, sub-classification by age was also performed regardless of the age Z value. In this study, age was divided into only two groups, ≥50 and <50 years, so that each subgroup met the minimum sample size of 120 required by the CLSI guidelines, and because most lung cancers occur in older people.
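The Z versus Z* comparison can be sketched as follows. The text does not give the formulas, so this sketch assumes the form commonly quoted for the CLSI C28-A3 (Harris-Boyd) partitioning test: Z is the difference in subgroup means divided by its standard error, and Z* = 3·√((n1 + n2)/240). The function name and the example groups are hypothetical, and the inputs are assumed to be (transformed) values that are approximately normal.

```python
import math
from statistics import mean, variance


def partition_z_test(group1, group2):
    """Return (Z, Z*, needs_separate_RIs) for two subgroups.

    Assumed formulas (not stated in the text):
      Z  = |mean1 - mean2| / sqrt(s1^2/n1 + s2^2/n2)
      Z* = 3 * sqrt((n1 + n2) / 240)
    """
    n1, n2 = len(group1), len(group2)
    se = math.sqrt(variance(group1) / n1 + variance(group2) / n2)
    z = abs(mean(group1) - mean(group2)) / se
    z_star = 3.0 * math.sqrt((n1 + n2) / 240)
    return z, z_star, z > z_star


# Hypothetical subgroups of 120 subjects each with clearly different means.
younger = [10 + (i % 5) for i in range(120)]
older = [20 + (i % 5) for i in range(120)]
z, z_star, split = partition_z_test(younger, older)
print(z > z_star, split)
```

Under this assumed form the critical value grows with sample size, so with two subgroups of 120 each Z* equals 3, and larger cohorts need a larger Z before separate RIs are indicated.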

| Production of reference intervals for 6 lung cancer biomarkers
The RI for each of the six lung cancer biomarkers was established following the CLSI C28-A3 guidelines and the data processing steps described above. The 95th percentile was reported as the upper limit of the RI, together with its 90% confidence interval (CI).
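As an illustration of an upper reference limit with a 90% CI, the sketch below uses a rank-based (non-parametric) 95th percentile with rank r = 0.95(n + 1) and a CI taken from order statistics via the binomial distribution. Whether the authors used this non-parametric route or a parametric calculation on transformed data is not stated, so the method, the function names, and the synthetic data are all assumptions.

```python
import math


def percentile(values, p=0.95):
    """Rank-based percentile with rank r = p * (n + 1) and linear interpolation."""
    data = sorted(values)
    n = len(data)
    r = p * (n + 1)
    if r < 1 or r > n:
        raise ValueError("sample too small for this percentile")
    lower_rank = math.floor(r)
    if lower_rank == n:
        return data[-1]
    frac = r - lower_rank
    return data[lower_rank - 1] + frac * (data[lower_rank] - data[lower_rank - 1])


def _binom_pmf(n, k, p):
    # Computed in log space so that large n does not overflow.
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.exp(log_pmf)


def percentile_ci(values, p=0.95, conf=0.90):
    """Distribution-free CI for the p-th percentile from order statistics."""
    data = sorted(values)
    n = len(data)
    alpha = (1 - conf) / 2
    pmf = [_binom_pmf(n, k, p) for k in range(n + 1)]
    # Lower rank: largest l with P(K <= l - 1) <= alpha.
    cum, low = 0.0, 0
    for k in range(n):
        cum += pmf[k]
        if cum > alpha:
            break
        low = k + 1
    # Upper rank: smallest u with P(K >= u) <= alpha.
    tail, up = 0.0, None
    for k in range(n, 0, -1):
        if tail + pmf[k] > alpha:
            break
        tail += pmf[k]
        up = k
    if low < 1 or up is None:
        raise ValueError("sample too small for this confidence level")
    return data[low - 1], data[up - 1]


# Synthetic example: 199 hypothetical results with values 1..199.
results = list(range(1, 200))
upper_limit = percentile(results)
ci_low, ci_high = percentile_ci(results)
print(upper_limit, ci_low, ci_high)
```

The order-statistic interval makes no distributional assumption, at the cost of needing a reasonably large sample; with small subgroups the function raises an error instead of returning an interval.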

| RESULTS
The basic information of healthy subjects is listed in Table 1.

| Normality test results of datasets of 6 individual lung cancer biomarkers from 9 participating hospitals
The distributional pattern of the 6 individual lung cancer biomarkers in the 9 participating hospitals and in the pooled dataset of all 9 hospitals was analyzed, as displayed in Supplementary Figures S1-S7, in which Figures S1-S6 represent the datasets of the individual hospitals, while Figure S7 represents the pooled dataset of all 9 hospitals for each biomarker. The datasets showed skewed distributions; thus, all the data were transformed toward a normal distribution using the Box-Cox method.

| Sub-classification determination for reference interval establishment based upon gender and age (Z test)
Tables 3 and 4 show the results of the Z test used, following the CLSI C28-A3 guidelines, to determine whether sub-classification by gender and age is needed for RI establishment. The results indicate that ProGRP and CYFRA21-1 each require 2 RIs, one per age group (<50 years and ≥50 years), because the Z value is greater than the Z* value. For NSE, no sub-classification is needed, since the Z value is smaller than Z*. Because SCC had Z > Z* in the gender sub-classification, it requires 2 RIs for the gender groups and 2 RIs for the age sub-groups regardless of the age Z and Z* values; CEA and HE4 each require 4 RIs to represent the gender and age groups.

| Determination of reference intervals and 90% confidence intervals of 6 lung cancer biomarkers with age-stratified, gender-specific, and geographical consideration
Table 5 shows the RIs generated for the 6 biomarkers based on the CLSI C28-A3 recommendation.
The flow chart of the data analysis described above is shown in Figure 1.

| DISCUSSION
Because the performance of lung cancer biomarkers is still debated in clinical practice, their value for lung cancer diagnosis, therapeutic monitoring, and prognosis prediction remains uncertain, partly because rigorous, standardized reference intervals are lacking. 17-33 A reliable reference interval is therefore critical for evaluating the performance of a biomarker. Apart from following the requirements of the CLSI EP28-A3C guidelines, this study also considered geography, to explore whether physical location influenced the outcome when establishing reference intervals for biomarkers. The information compiled in this way allows us to discuss the merits of a specific reference interval, which should be taken into account in the clinical laboratory.

| Study subjects enrolled in this study
As indicated in Table 1, a total of 2259 ostensibly healthy individuals were enrolled in this study, meeting the requirements of CLSI.
Although the CLSI requires a minimum of 120 samples, a larger sample size yields a better-characterized distribution in statistical analysis and is therefore more representative of the population. For this reason, each participating hospital in this study enrolled more than 120 subjects.

| Normality testing of datasets and data transformation
The purpose of the normality test is to evaluate whether the sample distribution is normal. If it is, parametric statistical methods can be used. Conversely, if the sample distribution is not normal (skewed), non-parametric methods should be used to obtain adequate results. In this study, the datasets of all 6 biomarkers showed skewed distributions. The data were therefore transformed toward a normal distribution using the Box-Cox method before the Z test was applied.

| Application of Z test and the results interpretation
As mentioned above, the Z test is applied only when the data is normally distributed according to the CLSI guidelines. Thus, the datasets which showed skewed distribution were transformed into normally distributed data by using the Box-Cox method prior to

| CONCLUSIONS
An age-stratified, gender-specific, and geographical considered reference interval has been established in Chinese population for

| Limitations of this study
Some limitations of this study are worth pointing out: (1) the sample size could have been larger had the workflow for subject enrollment been more efficient and rigorous; (2) age and gender matching of subjects within and among the participating hospitals could have been better controlled, which would have reduced bias in the statistical results; (3) a multi-platform comparison, which would be ideal, was not performed because of budget constraints and is left for future work; (4) follow-up of the individuals who had elevated serum biomarker(s) would be of interest but was also not performed in this study.

CONFLICT OF INTEREST
Mindray Corporation provided the chemiluminescence immunoassay analyzer and reagents to the participating hospitals. A proportion of the intellectual property rights will be shared by Mindray Corporation and the participating hospitals.

DATA AVAILABILITY STATEMENT
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.