Genetic analysis of the CITED2 gene promoter in isolated and sporadic congenital ventricular septal defects

Abstract Ventricular septal defect (VSD) is the most common congenital heart defect. Previous studies have reported that genetic variations in the coding region of CITED2 are highly associated with cardiac malformation, but the role of CITED2 gene promoter variations in VSD patients has not yet been explored. We investigated variations of the CITED2 gene promoter and their impact on promoter activity in the DNA of paediatric VSD patients. A total of seven variations were identified by Sanger sequencing in the CITED2 gene promoter region in 400 subjects, including 200 isolated and sporadic VSD patients and 200 healthy controls. Using a dual‐luciferase reporter assay, we found that four of the seven variations significantly decreased the transcriptional activity of the CITED2 gene promoter in HEK‐293 cells (P < .05). Further, a bioinformatic analysis with the JASPAR database indicated that these variations created or disrupted a cluster of putative binding sites for transcription factors, which may lead to low expression of CITED2 protein and the development of VSD. Our study demonstrates for the first time genetic variations in the CITED2 gene promoter in the Han Chinese population and the role of these variations in the development of VSD, providing new insights into the aetiology of CHD.

is substantially influenced by interaction and correlation between genetic and environmental factors, a large body of evidence indicates that genetic factors contribute to the majority of CHD. 6-8 The promoter, a regulatory region of DNA located 5′ upstream of a gene, plays an essential role in transcriptional regulation. DNA sequence variations in a gene promoter region are well documented to be associated with altered gene expression levels, putatively leading to disease. 9,10 Consequently, the analysis of promoter DNA sequence variations is important because it improves the diagnosis of disease-causing promoter variations and expands our understanding of the role of transcriptional regulation in human disease.
The CITED2 gene is a key member of the CITED family and is widely expressed in the embryo. 11 Studies have identified CITED2 as a cardiac transcription factor (TF) that is essential to heart development. Lack of CITED2 in embryos can cause abnormal heart ring formation, as well as various cardiac malformations including atrial septal defect, VSD, transposition of the great arteries, double outlet right ventricle and tetralogy of Fallot. 12 Among these CHDs, VSD is by far the most common, with a birth prevalence of 2.62 per 1000 live births, 13 yet no studies have focused on CITED2 gene promoter region variations in patients with VSD. Thus, we hypothesized that variations in the CITED2 gene promoter may result in abnormal CITED2 gene expression, which may increase susceptibility to the formation of VSD. To test this hypothesis, we designed the present study to genetically analyse the DNA sequence of the CITED2 gene promoter region in VSD patients in comparison with healthy controls and to functionally analyse the variations found in the promoter region.

| Study participants
This study enrolled 400 Han Chinese subjects. The study was approved by the Ethics Committee of TEDA International Cardiovascular Hospital and adhered to the tenets of the Declaration of Helsinki.
Written informed consent was obtained from all subjects or their parents or guardians. From August 2018 to August 2019, 667 children with VSD underwent repair surgery at our hospital. Among those, 243 had isolated and sporadic VSD. Of these, 200 patients were recruited and matched with 200 normal controls of the same ethnicity and gender and of similar age (Figure 1A). The control group was drawn from subjects attending routine health checks or the CHD screening programme at the hospital. All control subjects were confirmed to have no cardiac disease by clinical screening alone or by clinical screening plus echocardiography.

| Sequence analysis
Genomic DNA was extracted from peripheral blood samples using the standard procedure. The sequence of the CITED2 promoter was taken from the GenBank database (https://www.ncbi.nlm.nih.gov/genbank). The accession number for CITED2 is NG_016169.1.
Polymerase chain reaction (PCR) primers were designed to cover the translational start site as well as the potential 5′ promoter region of CITED2 in accordance with the literature (Table 1). 14 The promoter region of the human CITED2 gene (1418 bp, from −1197 bp to +220 bp relative to the transcription start site) was generated by PCR and directly sequenced. The primers for Sanger sequencing are listed in Table 1.
Subsequently, the above DNA sequences were compared with the wild-type CITED2 gene promoter sequence.
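The variations in this study are reported as genomic (g.) positions on NG_016169.1, whereas the amplified region is described relative to the transcription start site (TSS). The sketch below illustrates one way to convert between the two, using the TSS position of 5001 (+1) given in the note to Table 1. The function names are hypothetical, and the convention that there is no position 0 (so the base immediately upstream of +1 is −1) is an assumption, not something stated in the text.

```python
# Minimal sketch: convert between g. coordinates on NG_016169.1 and
# TSS-relative coordinates. Function names are illustrative only.

TSS_GENOMIC_POS = 5001  # genomic position of +1, per the note to Table 1


def genomic_to_tss_relative(g_pos: int, tss: int = TSS_GENOMIC_POS) -> int:
    """Return the TSS-relative position of a g. coordinate.

    Assumes there is no position 0: the TSS is +1 and the base
    immediately upstream of it is -1.
    """
    offset = g_pos - tss
    return offset + 1 if offset >= 0 else offset


def tss_relative_to_genomic(rel_pos: int, tss: int = TSS_GENOMIC_POS) -> int:
    """Return the g. coordinate of a TSS-relative position (inverse mapping)."""
    if rel_pos == 0:
        raise ValueError("position 0 does not exist under this convention")
    return tss + rel_pos - 1 if rel_pos > 0 else tss + rel_pos


if __name__ == "__main__":
    # Genomic positions of the variations named in the text.
    for g in (4078, 4255, 4698, 4778, 4933):
        print(f"g.{g} -> {genomic_to_tss_relative(g):+d}")
```

Under this assumed convention, all five positions named in the text fall upstream of the TSS and inside the −1197 to +220 window that was sequenced.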

| Plasmid constructs, cell culture and transfection
To functionally analyse the variations of the CITED2 gene promoter, wild-type and variant DNA fragments of the CITED2 gene promoter region, carrying terminal KpnI and BglII restriction sites, were generated by PCR. After digestion with the restriction enzymes, the wild-type and variant fragments were subcloned into the KpnI/BglII sites upstream of the firefly luciferase reporter gene in the pGL3-basic plasmid to construct reporter constructs, which were validated by Sanger sequencing. The PCR primers with KpnI and BglII sites are shown in Table 1.

HEK-293 cells were routinely cultured in MEM (Minimum Essential Medium, Gibco), supplemented with 10% FBS (Fetal Bovine Serum, Thermofisher), 100 units/mL penicillin and 100 μg/mL streptomycin at 37°C with 5% CO2. Cells were maintained and passaged when reaching 80% confluence, and then 1 × 10^5 cells were seeded into each well of a 96-well plate 24 hours before transfection. Cells were transfected with each of the above-constructed reporter plasmids plus the renilla luciferase reporter plasmid (pRL-SV40) as an internal control to normalize the transfection efficiency in each well. An empty pGL3-basic vector was used as a negative control.

| Analysis with dual-luciferase reporter assay
Luciferase activity was measured 48 hours after transfection. Cells were lysed, and luciferase activities were measured by adding a luciferase substrate buffer using the dual-luciferase reporter assay system according to the manufacturer's protocol. The activity of the CITED2 gene promoter was normalized as the ratio of firefly luciferase activity to renilla luciferase activity. All experiments were repeated independently three times.

TABLE 1 List of primers used in this study
Columns: Primers name; Sequences 5′-3′; Location; Position. Row groups: PCR primers; Primers containing restriction sites.
Note: PCR primers are designed based on the genomic DNA sequence of the CITED2 gene (NG_016169.1). The transcription start site is at the position of 5001 (+1).
Abbreviations: F, forward; R, reverse.
a Restriction sites are underlined.
b Protective bases are presented in bold.
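The normalization described in this section (firefly signal divided by renilla signal in the same well) can be sketched as follows. This is a minimal illustration only: the function names and the example readings are hypothetical, and expressing each construct relative to the wild-type mean is an assumed, commonly used presentation rather than a step stated in the text.

```python
# Minimal sketch of dual-luciferase normalization.
# All numeric readings below are invented for illustration.
from statistics import mean


def normalized_activity(firefly: float, renilla: float) -> float:
    """Firefly/renilla ratio for one well (corrects for transfection efficiency)."""
    if renilla <= 0:
        raise ValueError("renilla reading must be positive")
    return firefly / renilla


def relative_to_wild_type(variant_wells, wild_type_wells) -> float:
    """Mean normalized activity of a variant divided by that of the wild type.

    Each argument is a list of (firefly, renilla) readings, one pair per
    independent repeat.
    """
    variant = mean(normalized_activity(f, r) for f, r in variant_wells)
    wild_type = mean(normalized_activity(f, r) for f, r in wild_type_wells)
    return variant / wild_type


if __name__ == "__main__":
    wild_type = [(1000.0, 500.0), (1100.0, 520.0), (950.0, 480.0)]  # hypothetical
    variant = [(600.0, 510.0), (640.0, 530.0), (580.0, 490.0)]      # hypothetical
    print(f"relative activity: {relative_to_wild_type(variant, wild_type):.2f}")
```

A value below 1 would indicate lower promoter activity for the variant construct than for the wild type; whether the difference is significant still requires a statistical test across the repeats.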

| Transcription factor binding sites prediction
To evaluate whether the variations would disrupt or create transcription factor binding sites (TFBS), putative TFBS for the CITED2 gene variations were predicted online at JASPAR (http://jaspar.genereg.net/), an open-access database of TF binding profiles. 15,16 The relative profile score threshold was set at 85%.
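To make the 85% threshold concrete, the sketch below scores a candidate site against a position weight matrix and rescales the score between the matrix's minimum and maximum possible scores. It assumes the relative score is defined as (score − min) / (max − min), which is the usual definition for profile scanning. The toy matrix and the two sequences are hypothetical and do not correspond to any real JASPAR profile or to the CITED2 promoter.

```python
# Minimal sketch: relative profile score of a site against a toy
# position weight matrix (PWM). Matrix values and sequences are invented.

# One dict of per-base weights per motif position (hypothetical 4-bp motif).
TOY_PWM = [
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": 1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": 1.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": -1.0, "T": 1.0},
]


def relative_score(pwm, site: str) -> float:
    """Rescale the raw PWM score of `site` to the range 0..1."""
    if len(site) != len(pwm):
        raise ValueError("site length must equal motif length")
    raw = sum(column[base] for column, base in zip(pwm, site.upper()))
    best = sum(max(column.values()) for column in pwm)
    worst = sum(min(column.values()) for column in pwm)
    return (raw - worst) / (best - worst)


def is_predicted_site(pwm, site: str, threshold: float = 0.85) -> bool:
    """True if the site meets the relative score threshold (85% by default)."""
    return relative_score(pwm, site) >= threshold


if __name__ == "__main__":
    reference, variant = "ACGT", "ACGA"  # hypothetical alleles of one site
    for label, seq in (("reference", reference), ("variant", variant)):
        print(label, seq, relative_score(TOY_PWM, seq), is_predicted_site(TOY_PWM, seq))
```

In this toy case the reference allele passes the threshold and the variant allele does not, which is the pattern reported as a "disrupted" site; the reverse pattern would be reported as a "created" site.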

| The DNA sequence variations identified in VSD patients and healthy controls
A total of 400 subjects, including 200 VSD patients (99 males, 101 females, age ranging from 2 months to 14 years) and 200 sex- and age-matched healthy controls (99 males, 101 females, age ranging from 5 months to 14 years), were recruited (Figure 1A). Seven DNA variations were detected by Sanger sequencing.
The genotype distributions were in Hardy-Weinberg equilibrium in both the VSD group and the control group (both P > .05). Table 2 shows the details of these variations. Among those seven variations, four single-nucleotide variations (SNVs) [among them g.4698A>G (rs529363037) and g.4778G>T (rs9385859)] and one novel heterozygous variation (g.4933C>A) were not found in controls. Further, the allele frequency of these five variations is less than 0.0001 in the NCBI dbSNP database and the GnomAD database, and 2 of them [g.4778G>T (rs9385859) and g.4933C>A] have an allele frequency of 0 (Table 2). In addition, 2 of those 5 variations were each discovered in two patients (Table 2). These five variations were further investigated in functional studies.
Two variations [g.4285T>G (rs12333191) and g.4357G>A (rs76757432)] were common in the controls and were excluded from further study.
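The Hardy-Weinberg check mentioned above can be illustrated with a chi-square goodness-of-fit test on genotype counts. The text does not state which test was used, so this is an assumed approach, and the genotype counts in the example are hypothetical. For very rare alleles an exact test is often preferred because the expected count of rare homozygotes is far below 5.

```python
# Minimal sketch: chi-square test for Hardy-Weinberg equilibrium (1 df).
# Function names and example genotype counts are illustrative only.
from math import erfc, sqrt


def hwe_chi_square(n_ref_hom: int, n_het: int, n_alt_hom: int) -> float:
    """Chi-square statistic comparing observed and HWE-expected genotype counts."""
    n = n_ref_hom + n_het + n_alt_hom
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_ref_hom + n_het) / (2 * n)  # reference allele frequency
    q = 1.0 - p                            # alternate allele frequency
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_ref_hom, n_het, n_alt_hom)
    # Skip genotype classes with zero expected count (monomorphic site).
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


def hwe_p_value(n_ref_hom: int, n_het: int, n_alt_hom: int) -> float:
    """P value from the chi-square distribution with one degree of freedom."""
    chi2 = hwe_chi_square(n_ref_hom, n_het, n_alt_hom)
    return erfc(sqrt(chi2 / 2.0))  # survival function of chi-square, 1 df


if __name__ == "__main__":
    # Hypothetical group of 200 subjects with two heterozygous carriers.
    print(f"P = {hwe_p_value(198, 2, 0):.3f}")
```

A P value above .05 means the observed genotype counts do not deviate significantly from Hardy-Weinberg expectations.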

| Functional analysis of the variations by dual-luciferase reporter assay
To further determine whether these variations in the CITED2 gene promoter directly affect the activity of the CITED2 promoter, we gener-
As illustrated in Figure 2B, luciferase activity analysis showed that, among the 5 genetic variations examined, the reporter plasmids carrying 4 of them (4078C, 4698G, 4778T or 4933A) showed significantly decreased luciferase expression compared with the wild type (P < .05, Figure 2B). The remaining variation (4255T) had no significant effect on luciferase expression (P > .05, Figure 2B). The allele frequency was obtained from the NCBI dbSNP database and the GnomAD database.

| Putative binding sites for TFs affected by genetic variations
The potential binding sites for TFs in the CITED2 gene promoter affected by the variations were investigated with the online JASPAR CORE TF database. 15,16 The results indicated that the variations may disrupt or create binding sites for TFs. The variation g.4078C may create binding sites for nuclear factor I-C (NFIC), nuclear factor 1 X-type (NFIX) and RHOXF1, and disrupt the binding sites for HOXB7, HOXC8 and Islet-2. The analysis is summarized in Table 3. Among these TFBSs, ELK1, E2F1, E2F4 and SP1 have been reported as TFs that regulate CITED2 expression. 17-20 Figure 3 is a schema illustrating the role of CITED2 gene promoter region variations, combining the JASPAR analysis from the present study with previous studies. 17-20 The schema includes the following genes and pathways: CITED2, Isl1, Nkx2.5, Gata4, Tbx5, Lefty2, Pitx2, HIF1α, the TFAP2 family, the Nodal pathway and the VEGF pathway (Figure 3).

In particular, it was reported that cardiac-specific deletion of CITED2 in mice caused myocardial compact layer thinning and VSD, as well as abnormal angiogenesis, revealing that CITED2 plays an essential role in the growth and development of the ventricular muscle and ventricular septum. 26 The promoter is well known to be the crucial region in gene transcriptional regulation, 9 and the four variations in the CITED2 gene promoter region discovered in the present study significantly altered the transcriptional activity of CITED2. The experiments therefore support our hypothesis.

| DISCUSSION
Knowing whether genetic variations create a new TFBS or disrupt an existing one is essential for understanding disease-causing gene regulatory mechanisms. 27 In the present study, we used the JASPAR database to perform bioinformatic analyses of the 4 variations in the promoter region of the CITED2 gene that showed functional changes in cells. We found that the variation g.4778G>T may disrupt the potential binding sites for E2F1, a TF that participates in the development and differentiation of several tissues and in the regulation of glucose oxidation, oxidative metabolism, etc. 17 Chromatin immunoprecipitation analysis has demonstrated that E2F1 proteins bind to the CITED2 gene promoter regulatory region and that their expression increases CITED2 promoter activity. 17 Further, g.4778G>T may also disrupt the potential binding sites for ELK-1. A previous study on the CITED2 promoter showed that ELK-1 protein bound to the promoter of CITED2 and cooperatively increased HIF-2α activity in the transcriptional activation of the CITED2 promoter. 18 Another variation, g.4933C>A, may disrupt the binding site for SP1, which is critical for CITED2 expression. In fact, knocking down Sp1 diminished CITED2 promoter activity. 19,20 Given that these variations in the CITED2 promoter region down-regulated promoter activity in the present study (Figure 2B), the above findings suggest that the variations may alter CITED2 gene transcription by disrupting or creating TF binding sites in the CITED2 promoter region, contributing to low expression of CITED2. This, in turn, may be associated with the development of VSD.

FIGURE 3
Schema describing the role of the CITED2 gene promoter region variations found in the present study and analysed with the JASPAR database, in combination with previous studies. The low CITED2 promoter activity caused by the variations contributes to the low expression of CITED2. Consequently, the low expression of CITED2 may be directly involved in the development of VSD. In addition, it may decrease the activity of VSD-relevant genes and pathways, such as Isl1, Nkx2.5, Gata4, Tbx5, Lefty2, Pitx2 and the Nodal pathway. The low expression of CITED2 may also lead to overexpression of certain cardiac-related genes such as HIF1α and the VEGF pathway. Further, the low level of CITED2 protein may weaken the combination of CITED2 and ISL1, leading to VSD, and may cause overexpression of the TFAP2 family, resulting in low expression of Pitx2 that hinders the formation of a normal heart and increases the risk of VSD.

The low expression of CITED2 may be directly involved in the development of VSD. 12,25,28 In addition, the low expression of CITED2 may decrease the activity of VSD-relevant genes and pathways, such as Isl1, Nkx2.5, Gata4, Tbx5, Lefty2, Pitx2 and the Nodal pathway. 7,26,28,29 The low expression of CITED2 may also lead to overexpression of certain cardiac-related genes such as HIF1α and the VEGF pathway. 12,26,28,30,31 Further, the low level of CITED2 protein may weaken the combination of CITED2 and ISL1, leading to VSD, 26,28,29 and may cause overexpression of the TFAP2 family, 12,26 resulting in low expression of Pitx2 that hinders the formation of a normal heart and increases the risk of VSD. 12,26

| Limitations of the study
There are some limitations in this study. The sample size at this stage is relatively small. Further verification of CITED2 promoter variations in larger cohorts or in other populations is required. The validation in other syndromic types of VSD is also required. In addition, the degree of decreased transcriptional activity is rather small, and therefore, in vivo validation in animals needs to be performed in the future in order to confirm the role of these variations in the development of VSD or other CHDs.

| CONCLUSION
In conclusion, the present study for the first time has identified ge-

ACKNOWLEDGEMENTS
We thank the patients and their family members for their collaboration. The assistance of Nursing staff at the Division of Pediatric Cardiac Surgery, Department of Cardiovascular Surgery, is gratefully acknowledged.

CONFLICT OF INTEREST
The authors declare no conflict of interest with respect to the authorship and publication of this article.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available from the corresponding author upon reasonable request.