Genome‐Wide Analysis of circular RNAs and validation of hsa_circ_0006719 as a potential novel diagnostic biomarker in congenital scoliosis patients

Abstract Congenital scoliosis (CS) is a form of spinal curvature resulting from anomalous development of vertebrae. Recent studies demonstrated that circRNAs could serve as potential biomarkers of disease diagnosis. Genome‐wide circRNAs expression in seven CS patients and three healthy controls was initially detected. Bioinformatics analysis was conducted to explore the potential pathological pathway of CS. Quantitative PCR (qPCR) was performed to validate the selected circRNAs in the replication cohort with 32 CS patients and 30 healthy controls. Logistic regression controlling for gender was conducted to compare the expression difference. Receiver operating characteristic (ROC) curve analysis was performed to evaluate the diagnostic value. Twenty‐two differentially expressed circRNAs were filtered from genome‐wide circRNA sequencing. Seven circRNAs were validated by qPCR. Only hsa_circ_0006719 was confirmed to have a higher expression level in the CS group than the healthy control group (P = 0.036). Receiver operating characteristic curve also suggested that hsa_circ_0006719 had significant diagnostic value for CS (AUC = 0.739, P = 0.001). We described the first study of circRNAs in CS and validated hsa_circ_0006719 as a potential novel diagnostic biomarker of CS.

factors could lead to the development of CS, such as rearrangement of chromosome 16p11.2, 7 mutation of TBX6 8 or FBN1. 9 There is also evidence that environmental factors could contribute to the development of CS. Li et al 10 found that vitamin A deficiency in pregnancy may induce CS in rats. Some researchers illustrated that hypoxia and high altitude are associated with a higher risk of CS. 1,11 However, there were few types of research focusing on the linkage of these aetiological factors. Thus, it is urgent and necessary to identify novel molecular markers of the mechanism, even for diagnosis and therapeutics.
Circular RNAs (circRNAs) are a class of non-coding RNAs that manifested with stable circular RNA structure. 12 Recent studies have reported that circRNAs play a pivotal role in different kinds of diseases.
CircRNAs could act as efficient microRNA sponges, 13 which function as post-transcriptional regulators. 14 Some researchers also found that some circRNAs could work as protein sponges, 15 which can influence gene regulation. Other studies also found in cellular responses to environmental stress, some circRNAs can be translated. 16,17 Because of the unusually stable circular structure, circRNAs were also reported to be promising diagnostic biomarkers in different diseases. 18,19 There are very few studies focusing on circRNAs and CS. Our previous study has identified significantly different expressed circRNAs in vitamin A deficiency-induced CS rats. 20 However, the function and characteristics of circRNAs in CS patients are still unclear.
In this study, we enrolled seven patients and three healthy controls to identify and compare the expression of genome-wide cir-cRNAs. After filtering, annotation and validation, we compared the differentially expressed circRNAs aiming to find the potential diagnostic and therapeutic biomarkers of CS.

| Patients and materials
Seven patients diagnosed as CS and three healthy controls were recruited from Peking Union Medical College Hospital (PUMCH). Thirtytwo CS patients and 30 healthy controls were enrolled as a replication cohort. The inclusion and exclusion criteria of CS were as follows.

| RNA extraction and Genome-wide circRNAs sequencing
Total RNA was extracted from the peripheral blood using a Qiagen

| Bioinformatics analysis and annotation of circRNAs
The low-quality reads were removed, and the clean reads were aligned to the reference human genome (hg19). Based on the tool of find_circ, 14 find_circ_enhance was used to identify circRNAs with default parameters (circRNA read count ≥2 and unique alignment reads ≥2). Filtered circRNAs were used for further annotation. After comparing to circBase, novel circRNA and known circRNA were separated. Transcripts per million clean tags (TPM) were applied to quantify the expression of circRNAs. 21 Differentially expressed circRNAs were estimated by DESeq2. 22 The expression level was considered significantly different when |log2 fold change| > 1 and P value < 0.05. Functional enrichment analysis of the differentially expressed circRNA-related genes at Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) was performed using clusterProfiler (v3.6.0) Bioconductor package.

| Validation of selected circRNAs
Quantitative PCR (qPCR) was performed to validate the circRNAs identified by the bioinformatics analysis. Divergent primers were

| Statistical analysis
Data of qPCR were analysed using SPSS software (16.0 version, SPSS Inc, Chicago, IL, USA). The expression differences of qPCR were evaluated using logistic regression controlling for gender. Receiver operating characteristic (ROC) curve was performed to estimate the diagnostic application of circRNA.

| Genome-wide circRNA expression profiles
There were 551,278 circRNAs identified by circRNA sequencing in seven CS patients and three healthy controls. After filtering, there were 126 907 candidate circRNAs left for further analysis ( Figure 1A). In order to identify the source of those circRNAs, breakpoint type was also annotated ( Figure 1B). Most of them were derived from exonic or intronic circRNAs. Differentially expressed circRNAs were estimated by the fold changes (FC) of circRNA expression ( Figure 1C). In total, 394 circRNAs with significantly different expression level were analysed by the volcano plot ( Figure 1D, Table S1).
We enriched the differentially expressed circRNA-related genes and then subjected these genes to GO annotation and KEGG pathway analysis. It indicated that the circRNA-related genes were enriched in several pathways. The top outcomes of GO enrichment were organelle organization in biological process (BP), intracellular organelle part in cellular component (CC) and transcription coactivator activity in molecular function (MF) (Figure 2A). In the KEGG pathway analysis, three pathways with the most significant association were the ubiquitin-mediated proteolysis signalling pathway, endocytosis signalling pathway and oocyte meiosis signalling pathway ( Figure 2B), indicating that these pathways could be associated with CS.

| Validation of selected circRNAs
Combing with the P value and the read count of each circRNAs, we selected 22 circRNAs with the most stable and significant different expression for validation (Table 2). Divergent primers were designed (Table S2), and qPCR was performed, and the back-spliced sites of seven circRNAs were confirmed by Sanger sequencing including hsa_circ_0006856, hsa_circ_0006719, hsa_circ_0006208, hsa_circ_0002785, hsa_circ_0002692, hsa_circ_0002372 and hsa_circ_0000225 ( Figure 3). Then, we compared the expression level with ΔΔCt method (Table S3). All of those seven circRNAs had higher expression levels in the CS group than the healthy control (HC) group (Table S4). After compared with the expression level of the discovery cohort, only hsa_circ_0006719 had the same expression tendency in the replication cohort. After logistic regression controlling for gender, we found that the expression level was statistically significant different between the CS group and the HC group (95% CI: 0.59-0.98, P = 0.036; Table 3).

| ROC curve analysis of hsa_circ_0006719 in CS patients
To estimate the diagnostic value of hsa_circ_0006719 as candidate biomarkers of CS, ROC curve analysis was performed. The area under the curve (AUC) was 0.739 (95% CI: 0.611-0.866, P = 0.001) ( Figure 4). Therefore, hsa_circ_0006719 may be a potential diagnostic biomarker for CS.

| D ISCUSS I ON
In this study, we performed genome-wide circRNA sequencing and compared the circRNA expression profiles of peripheral blood from CS patients and healthy controls. Gene Ontology and KEGG pathway analysis were conducted to explore the potential func-  pathways. Previous studies indicated that the ubiquitin-mediated proteolysis signalling pathway is involved in osteogenic differentiation and may play essential roles in disorders manifested with scoliosis. 28,29 The endocytosis signalling pathway has also reported to play an pivotal role in the development of scoliosis. 30,31 We hypothesized that these differentially expressed circRNAs could regulate ubiquitin-mediated proteolysis and endocytosis signalling pathway, which could influence the development of CS.
To explore specific circRNAs involved in CS, we compared the expression level in the replication cohort. We selected 22  Vitamin K is proved to be essential in bone development, including affecting the function of osteoblasts 34 and osteoclasts. 35 The supplementation of vitamin K has an effect of osteoporosis prevention. 36 Several studies also delineate that mutations of VKORC1 are associated with bone mineral density and osteoporosis. 37,38 Thus, we hypothesized that hsa_circ_0006719 could contribute to the aetiology of CS by regulating the expression of VKORC1.
Further functional studies are needed to identify this hypothesis.

| CON CLUS ION
In conclusion, we described the first study of circRNAs in CS patients.
After genome-wide circRNAs analysis and validation in a replication cohort, we found that the differentially expressed circRNA-related genes were enriched in several pathways, and we also found that hsa_circ_0006719 had a higher expression level in CS and could act as a potential novel diagnostic biomarker of CS.

ACK N OWLED G EM ENTS
We give our great thanks to the patients and healthy controls for their active participation.

CO N FLI C T O F I NTE R E S T
The authors declare no conflict of interest.

AUTH O R CO NTR I B UTI O N
GL and JS performed the research, analysed and interpreted the data.

DATA AVA I L A B I L I T Y S TAT E M E N T
Our data are available upon request to the corresponding author.