Up‐regulation of PRKDC was associated with poor renal dysfunction after renal transplantation: A multi‐centre analysis

Abstract Renal transplantation is the only efficacious treatment for end‐stage kidney disease. However, some patients develop renal insufficiency after transplantation, the mechanisms of which have not been well clarified. Previous studies have focused on patient factors, while the effect of gene expression in the donor kidney on post‐transplant renal function has been less studied. Donor kidney clinical data and mRNA expression status were extracted from the GEO database (GSE147451). Weighted gene co‐expression network analysis (WGCNA) and differential gene enrichment analysis were performed. For external validation, we collected data from 122 patients who underwent renal transplantation at several hospitals and measured the level of target genes by qPCR. This study included 192 patients from the GEO data set, and 13 co‐expressed genes were confirmed by WGCNA and differential gene enrichment analysis. The PPI network contained 17 edges and 12 nodes, and four central genes (PRKDC, RFC5, RFC3 and RBM14) were identified. In the 122 validation patients, multivariate logistic regression showed that acute graft‐versus‐host disease, postoperative infection and PRKDC mRNA level [Hazard Ratio (HR) = 4.44; 95% CI = [1.60, 13.68]; p = 0.006] correlated with renal function after transplantation. The prediction model constructed had good predictive accuracy (C‐index = 0.886). Elevated levels of donor kidney PRKDC are associated with renal dysfunction after transplantation. The prediction model of renal function status for post‐transplant recipients based on PRKDC has good predictive accuracy and clinical applicability.


| INTRODUCTION
Renal transplantation is a beneficial therapy for many patients with renal failure as well as some chronic renal diseases. However, in the early stages of transplantation, delayed graft function (DGF), or malfunction of a transplanted kidney, is a severe complication.
DGF is a severe dysfunction that can give rise to a prolonged hospital stay, a high risk of acute and chronic graft-versus-host disease (GVHD), and even graft failure. 1 DGF occurs primarily in the setting of donor shortage and is influenced by a variety of recipient 2 and donor factors, 3 as well as cold ischemic time. 4 The incidence of DGF varies considerably between countries, ranging from 21% to 70%, and may be related to ethnicity. 5 The most widely accepted definition of DGF is the need for at least one dialysis session in the first week after transplantation. 6 Even though recent studies have identified a few risk factors for DGF, its pathogenesis is not fully understood.
The PRKDC gene is located on chromosome 8q11 and encodes the DNA-dependent protein kinase catalytic subunit (DNA-PKcs), a serine/threonine protein kinase in the phosphatidylinositol 3-kinase-related kinase family. 7 The mutation rates of PRKDC and the expression levels of DNA-PKcs vary significantly across tumours. Chen et al. performed a statistical analysis of the mutation rates of PRKDC reported in The Cancer Genome Atlas and Chinese population databases and found that PRKDC had high mutation rates in several tumours, including gastric, colorectal and endometrial cancers, with a high correlation with microsatellite instability. 8 However, the role of PRKDC in the donor kidney and its effect after transplantation have not been reported.
With advances in high-throughput gene technology and proteomics, combined with bioinformatics approaches, access to validated biomarkers for diagnosis and patient survival has been facilitated. Bioinformatics-based multi-gene co-expression analysis is critical. WGCNA is a multigene analysis method first proposed in 2008. 9 Instead of individual genes or isolated biomarkers, WGCNA focuses on gene co-expression modules and associates them with specific features, improving the efficiency of identifying potentially valuable bioregulatory targets. 9 Differential gene expression analysis is an effective way to find the driver genes of a disease. 10 In summary, the simultaneous use of differential gene expression analysis and WGCNA enables the search for potential biomarkers that affect the functional status of specific organs.
In this project, we obtained high-throughput data from the GEO database for 192 kidney transplant donor kidneys. We used differential gene expression analysis and WGCNA to search for potential genes that affect post-transplant kidney function. In addition, we used differential gene enrichment analysis and PPI networks to screen out central genes. We then examined the potential role of PRKDC in influencing post-transplant renal function through comprehensive bioinformatics analysis and validated the results with 122 patients in our medical centre. We performed a logistic regression analysis based on the screened target genes to identify high-risk factors and constructed a predictive model for post-transplant renal functional status based on these genes for clinical application.

| Data filtering
Clinical data and gene expression data for kidney transplant donor kidneys were obtained from the GEO database (https://www.ncbi.nlm.nih.gov/gds). Data filtering was performed using the 'rpkm' function in the edgeR package. 11 After searching the database, we used the R package 'GEOquery' to download the data set GSE147451, 12 which includes clinical data such as renal functional status after kidney transplantation, for a total of 192 patients.

| WGCNA analysis
To increase the accuracy of the joint analysis of multiple genes, the genes used for WGCNA were filtered. We used the R package 'WGCNA' to analyse the GEO gene expression profiles, grouping highly co-expressed genes into modules. 13 'pickSoftThreshold' was used to choose the soft-thresholding power for a scale-free network. After Pearson correlation analysis, we built a similarity matrix. We then correlated the resulting modules with the corresponding clinical features to identify the functional modules of interest.
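The step from a Pearson similarity matrix to a soft-thresholded network can be illustrated with a minimal Python sketch. It is not the WGCNA package itself: the gene names and expression values are hypothetical, and the power beta = 6 is an assumed value for an unsigned network, since the power actually selected is not reported here.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equally long expression vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def soft_threshold_adjacency(expr, beta):
    """Unsigned adjacency a_ij = |cor(x_i, x_j)| ** beta for every gene pair."""
    genes = sorted(expr)
    return {
        gi: {gj: 1.0 if gi == gj else abs(pearson(expr[gi], expr[gj])) ** beta
             for gj in genes}
        for gi in genes
    }

def connectivity(adjacency):
    """Sum of adjacencies to all other genes (the node 'degree' in a weighted network)."""
    return {g: sum(w for h, w in row.items() if h != g)
            for g, row in adjacency.items()}

# Hypothetical toy expression profiles (rows = genes, columns = samples).
expr = {
    "gene_a": [1, 2, 3, 4, 5],
    "gene_b": [2, 4, 6, 8, 10],
    "gene_c": [5, 3, 4, 1, 2],
}
adjacency = soft_threshold_adjacency(expr, beta=6)  # beta = 6 is an assumption
k = connectivity(adjacency)
```

Raising the absolute correlation to a power keeps strong correlations close to 1 and pushes weak ones towards 0, which is what lets the resulting network approximate a scale-free topology.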

| Differentially expressed genes (DEG) analysis
To identify donor DEGs between different renal functional states in patients after renal transplantation, we applied the R package 'limma' to the data set. The criteria for screening DEGs were adj. p < 0.05 and |logFC| ≥ 1.0. The R packages VennDiagram and ggplot2 were used to draw Venn diagrams and volcano plots. 14
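The screening rule can be sketched in a few lines of Python. This is an illustration only, not the limma workflow: the gene names, logFC values and raw p-values are hypothetical, and the Benjamini-Hochberg adjustment is assumed because it is limma's default multiple-testing correction.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def select_degs(results, p_cut=0.05, lfc_cut=1.0):
    """Keep genes with adjusted p < p_cut and |logFC| >= lfc_cut."""
    adjusted = benjamini_hochberg([r["p"] for r in results])
    return [r["gene"] for r, q in zip(results, adjusted)
            if q < p_cut and abs(r["logFC"]) >= lfc_cut]

# Hypothetical per-gene test results.
results = [
    {"gene": "G1", "logFC": 2.1, "p": 0.001},
    {"gene": "G2", "logFC": -1.5, "p": 0.004},
    {"gene": "G3", "logFC": 0.4, "p": 0.002},   # significant but small effect
    {"gene": "G4", "logFC": 1.2, "p": 0.30},    # large effect but not significant
]
degs = select_degs(results)
```

Both conditions must hold, so a gene with a small fold change (G3) or a non-significant adjusted p-value (G4) is excluded.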

| Pathway and function enrichment analysis
We performed enrichment analysis to clarify the functions and signalling pathways associated with the GEO differential genes. Kyoto Encyclopedia of Genes and Genomes (KEGG) and Gene Ontology (GO) analyses were performed with the clusterProfiler R package. 15

| Identification of hub genes and construction of the PPI network
We constructed the PPI network through the STRING database (https://string-db.org/). The network was built and analysed with Cytoscape software (v 3.8.1), using an interaction score ≥0.4 as the cut-off. In the network, edges represent protein interactions, and nodes represent proteins. To identify the hub genes in the network, we used the maximal clique centrality (MCC) algorithm of the CytoHubba plug-in. The top four genes were set as hub genes.
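How a score cut-off and a clique-based centrality combine to rank hub nodes can be sketched in Python. The sketch assumes the commonly cited MCC definition, in which a node's score is the sum of (|C| - 1)! over all maximal cliques C containing it. It is not the CytoHubba implementation, and the node names and interaction scores are hypothetical.

```python
from math import factorial

def build_graph(scored_edges, min_score=0.4):
    """Keep interactions with score >= min_score; return a neighbour-set map."""
    neighbours = {}
    for a, b, score in scored_edges:
        if score >= min_score:
            neighbours.setdefault(a, set()).add(b)
            neighbours.setdefault(b, set()).add(a)
    return neighbours

def maximal_cliques(neighbours):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(clique, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(clique)
            return
        for v in list(candidates):
            expand(clique | {v}, candidates & neighbours[v], excluded & neighbours[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(neighbours), set())
    return cliques

def mcc_scores(neighbours):
    """Assumed MCC: sum of (clique size - 1)! over maximal cliques per node."""
    scores = {v: 0 for v in neighbours}
    for clique in maximal_cliques(neighbours):
        for v in clique:
            scores[v] += factorial(len(clique) - 1)
    return scores

# Hypothetical proteins and interaction scores on a 0-1 scale.
edges = [("A", "B", 0.90), ("A", "C", 0.80), ("B", "C", 0.95),
         ("A", "D", 0.60), ("D", "E", 0.20)]  # D-E falls below the cut-off
graph = build_graph(edges, min_score=0.4)
scores = mcc_scores(graph)
hubs = sorted(scores, key=lambda v: (-scores[v], v))[:1]
```

In this toy network node A belongs to the triangle A-B-C and to the edge A-D, so it receives 2! + 1! = 3 and ranks first.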

| Analysis of the expression levels of hub genes
To further verify the screened genes, we analysed the expression of these critical genes in the different renal functional states of patients after renal transplantation. A bar graph showed the expression of each gene in each renal functional state.
Statistical significance between the two groups was determined by Student's t-test, subject to normality; p < 0.05 was considered significant.

| Patients
This study included data from 122 kidney transplant patients col-

| Statistical analysis
Missing values, accounting for ≤5.0% of the data, were estimated using the 'mice' package 16

| Patient and public involvement
The study was a comprehensive study based on a public database and local hospital cases, and the project evaluated renal reconstruction and function after kidney transplantation. The public (that is, a statutory health insurer, physicians, local medical managers and patient representatives) was involved in the design and implementation of the entire project. In addition, practitioners (physicians, local health care workers, health care administrators) and scientific experts were involved in the discussions of this study. The results of the study and the project as a whole will be disseminated to participants through practice-oriented publications and newsletters.

| RESULTS
The flow chart for this project is shown in Figure 1. We performed WGCNA on the GEO data set (GSE147451) to identify the critical gene modules of donor kidneys that affect renal function after kidney transplantation. Each colour represents a module. Here, 13 modules were confirmed in the data set (Figure 2A). The analysis did not indicate any significant outliers. Figure 2B
We found 2211 DEGs in the GEO data set (Figure 3A), selected via the 'limma' package using adj. p < 0.05 and |logFC| ≥ 1.0 as the cut-off criteria. We selected the top 50 genes based on expression differences and statistical significance to plot the heat map (Figure 3B). Sixty-six genes were obtained from these modules according to WGCNA. Finally, 690 genes were obtained at the intersection of the DEGs and the midnight-blue module genes from the GEO database (Figure 3C).
To fully understand the functions of these genes in the co-expression module, GO (Figure 4A) and KEGG (Figure 4B) analyses were performed on the differential genes in GEO using the R pack-
We analysed the PPI data from the STRING database to investigate the relationship between co-expression modules and DEGs. The resulting PPI network, with 320 edges and 219 nodes, is shown in Figure 4C. In addition, the four hub genes with the highest MCC scores in the PPI network were identified using the CytoHubba plug-in. These genes are PRKDC, RFC5, RFC3 and RBM14 (Figure 4D). All four hub genes were highly expressed in patients with poor post-transplant renal function (Figure 5).
To further clarify the effect of the above genes on renal function after kidney transplantation, we collected clinical data from 122 kidney transplant patients at the medical centre and examined the expression levels of the above genes. The clinical baseline data for these patients are shown in Table 1
can give rise to increased risk of acute and chronic rejection, prolonged hospitalization and graft failure. 19 In this study, we analysed 192 patients in the GEO data set.
Based on a comprehensive bioinformatics analysis, we obtained 690 genes at the overlap between the DEGs and the co-expression module in the GEO data set (GSE147451). GO and KEGG analyses were performed using the R package "clusterProfiler". The four genes with the highest MCC scores in the PPI network were then screened: PRKDC, RFC5, RFC3 and RBM14.
These essential genes impact the renal functional status of patients after kidney transplantation. In particular, high expression of PRKDC was associated with post-transplant renal insufficiency. In the validation phase, we collected clinical data and gene expression assays from 122 kidney transplant patients in our hospital. Four independent risk factors, including aGVHD, infection and PRKDC, were identified by multivariate analysis, and a prediction model based on these factors was established, which showed good prediction accuracy.
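For a binary outcome such as post-transplant renal dysfunction, the C-index used to summarise prediction accuracy is the proportion of patient pairs (one with the outcome, one without) in which the patient with the outcome received the higher predicted risk. The following Python sketch illustrates that calculation on hypothetical predicted risks and outcomes; it is not the model or the data of this study.

```python
def c_index(risks, outcomes):
    """Concordance index for a binary outcome; ties in risk count as 0.5."""
    events = [r for r, y in zip(risks, outcomes) if y == 1]
    non_events = [r for r, y in zip(risks, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one patient in each outcome group")
    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))

# Hypothetical predicted risks and observed outcomes (1 = renal dysfunction).
risks = [0.9, 0.8, 0.7, 0.4, 0.7]
outcomes = [1, 0, 1, 0, 0]
c = c_index(risks, outcomes)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect discrimination.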
Recently, gene networks have been increasingly used for bioinformatics analysis. WGCNA is an effective method for analysing the joint effects of multiple genes in genomic data. 20 WGCNA has been applied in various biological studies to confirm therapeutic targets or candidate biomarkers, especially in tumours. 21-23 However, the application of WGCNA to transplantation-related questions, especially renal functional status after kidney transplantation, has so far been less studied.
In the project, we focused on the hub gene PRKDC and com-

STRENGTHS AND LIMITATIONS OF THIS STUDY
This study found that infection and aGVHD are related to renal dys-

CONFLICT OF INTEREST STATEMENT
The authors declare no competing financial interests.

DATA AVAILABILITY STATEMENT
Data may be obtained from a third party and are not publicly avail-