Prevalence of mixed genotype hepatitis C virus infections in the UK as determined by genotype‐specific PCR and deep sequencing

Summary The prevalence of mixed genotype hepatitis C virus (HCV) infections in the UK is largely unknown. As the efficacy of direct‐acting antivirals varies across genotypes, treatment regimens are tailored to the infecting genotype, which may pose problems for the treatment of underlying genotypes within undiagnosed mixed genotype HCV infections. There is therefore a need to diagnose mixed genotype infections accurately before treatment. PCR‐based diagnostic tools were developed to screen for mixed genotype infections caused by the most common UK genotypes, 1a and 3, in a cohort of 506 individuals diagnosed with either of these genotypes. The overall prevalence of mixed infection was 3.8%; however, this rate was unevenly distributed, with 6.7% of individuals diagnosed with genotype 3 harbouring genotype 1a strains and only 0.8% of samples from genotype 1a patients harbouring genotype 3 (P < .05). Mixed infection samples consisted of a major and a minor genotype, with the latter constituting less than 21% of the total viral load and, in 67% of cases, less than 1% of the viral load. Analysis of a subset of the cohort by Illumina PCR next‐generation sequencing (PCR‐NGS) yielded a much higher prevalence estimate than PCR. This discrepancy may reflect the nonquantitative nature of PCR‐NGS, despite the designation of false‐positive thresholds based on negative controls.

Direct-acting antiviral (DAA) therapies for HCV infection have revolutionised treatment of the disease. These new drugs have been optimised for genotype 1 (gt1), and sustained virological response (SVR) rates can be lower for other genotypes, particularly gt3.
Consequently, genotype-specific regimens may be required for effective treatment. 9 The efficacy of DAAs against mixed genotype infections has yet to be determined, although several studies have hypothesised that mixed infections may be only partially resolved by regimens tailored to the major infecting genotype, leading to genotype switching. 10,11 Given the morbidity and cost implications of repeated DAA treatment, improved diagnostics for mixed genotype infections are needed to optimise treatment regimens. In this study, we sought to determine the prevalence of mixed HCV infection in a cohort of 506 HCV-positive patients from across Scotland. We focused on gt1a and gt3, which together account for more than 90% of HCV infections in the UK. Highly sensitive genotype-specific nested PCR assays were developed and used to screen for gt1a/gt3 mixed infections. The relative proportion of each genotype within the mixed infections was determined by real-time (rt)-PCR. Furthermore, we compared the genotype-specific PCR techniques with next-generation sequencing (NGS) for the diagnosis of mixed HCV infections.

| Control transcripts
Control RNA transcripts derived from synthetic dsDNA based on a gt1a (H77, AF009606) and a gt3 (3a.GB.2005, GQ356206) sequence were used for PCR and rt-PCR optimisation, for relative quantification and as NGS fidelity controls. 3 Primers used to screen for gt1a and gt3 strains (Table 1) … were assessed. These controls contained 25, 50 or 100 copies/μL of the minor strain with 10⁶ copies/μL of the major genotype.
TABLE 1 Primers and probes designed for this study

| Sequence analysis
Sequences were aligned using MUSCLE (v3.8) within SSE. 16,17 Maximum likelihood phylogenetic trees were produced using MEGA

| rt-PCR
rt-PCR was performed with the TaqMan fast 7500 system using TaqMan fast reagents (Thermo Fisher Scientific). An rt-PCR targeting the 5′ UTR was used to quantify the HCV viral load 21 using a dilution series prepared from known concentrations of JFH-1 replicon transcripts. Genotype-specific rt-PCR was performed using newly designed primers and probes targeting the NS5B region (Table 1).
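A dilution series of known concentration, such as the JFH-1 replicon transcripts above, is normally used as a standard curve: threshold cycle (Ct) values are regressed on log10 copy number, and unknown samples are interpolated from the fitted line. The Python sketch below assumes an ordinary least-squares fit; the function names and the example Ct values are hypothetical and are not taken from this study.

```python
def fit_standard_curve(log10_copies, ct_values):
    """Ordinary least-squares fit of Ct = slope * log10(copies) + intercept.

    Returns (slope, intercept, efficiency), where efficiency = 10**(-1/slope) - 1
    (1.0 means the amplicon doubles every cycle).
    """
    n = len(log10_copies)
    mean_x = sum(log10_copies) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_copies)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log10_copies, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    efficiency = 10 ** (-1 / slope) - 1
    return slope, intercept, efficiency


def copies_from_ct(ct, slope, intercept):
    """Interpolate copies/uL for an unknown sample from its Ct value."""
    return 10 ** ((ct - intercept) / slope)


# Hypothetical standards: 10^1..10^6 copies/uL with invented Ct values.
standards_log10 = [1, 2, 3, 4, 5, 6]
standards_ct = [35.1, 31.8, 28.4, 25.1, 21.8, 18.5]
slope, intercept, eff = fit_standard_curve(standards_log10, standards_ct)
print(f"slope={slope:.2f}, efficiency={eff:.0%}, "
      f"Ct 27.0 ~ {copies_from_ct(27.0, slope, intercept):.0f} copies/uL")
```

A slope near -3.32 cycles per log10 step corresponds to an efficiency near 100%, i.e. one doubling of product per cycle.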

| Deep sequencing
In addition to the 19 mixed infection samples, 19 gt1a and 20 gt3 samples that PCR screening had classified as monoinfected were randomly selected for deep sequencing. Mock mixed infections were prepared from gt1a and gt3a transcript control RNA consisting of 10³, 10⁴ or 10⁵ copies/μL of the minor strain combined with the major genotype to give a total of 10⁶ copies/μL. Aliquots of single genotype transcripts (10⁶ copies/μL) were used as fidelity controls, whereas HCV-negative serum and H₂O provided negative controls. To limit contamination risk, samples were sequenced in two runs according to their genotype or major genotype.
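The mock mixed infections follow simple arithmetic: the major genotype makes up the remainder of a fixed 10⁶ copies/μL total, so the three minor-strain inputs correspond to 0.1%, 1% and 10% of the mixture, matching the 0.1% (L), 1% (M) and 10% (H) spike-in levels listed with the sequencing results. The helper name `mock_mixture` is hypothetical.

```python
def mock_mixture(minor_copies, total_copies=10**6):
    """Return (major copies/uL, minor share in %) for a mock mixed infection.

    The total of 10**6 copies/uL follows the text; the function itself is illustrative.
    """
    if not 0 < minor_copies < total_copies:
        raise ValueError("minor copies must lie strictly between 0 and the total")
    major_copies = total_copies - minor_copies
    minor_percent = 100 * minor_copies / total_copies
    return major_copies, minor_percent


for minor in (10**3, 10**4, 10**5):
    major, pct = mock_mixture(minor)
    print(f"minor {minor:>7} + major {major:>7} copies/uL -> minor = {pct:g}%")
```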
Pan-genotypic PCR primers targeting a partial E1-E2 region (Table 1) … Desktop Sequencer (Illumina). As sequence diversity among samples was low, they were run at a relatively low cluster density with a 5%-10% PhiX control spike-in.

| Deep sequencing analysis
Deep sequencing data were analysed using an in-house Unix-based pipeline. Low-quality reads were identified by FastQC, and sequencing adapters were removed using Trim Galore. An in-house bioinformatics programme was developed for genotyping HCV from high-throughput sequences. This comparative genotype assignment method took 37 bp k-mers from sequence reads and compared them against a list of reference genotype-specific k-mers of the same length. Although longer k-mers could improve genotyping, they increase the risk of mismatches and require more computing power. Genotypes were assigned to samples based on the breadth and depth of k-mer coverage.
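The k-mer comparison described above can be sketched as follows. This is not the authors' in-house programme: the rule that a k-mer is "genotype-specific" when it occurs in the references of exactly one genotype, the exact-match forward-strand comparison, and the particular breadth and depth definitions are all assumptions made for illustration. A final genotype call would additionally need breadth and depth cut-offs, which the text does not give.

```python
from collections import Counter

K = 37  # k-mer length stated in the text


def kmers(seq, k=K):
    """Yield every overlapping k-mer of `seq` (uppercased)."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_specific_index(references, k=K):
    """Map k-mer -> genotype, keeping only k-mers found in exactly one genotype.

    references: dict of genotype -> list of reference sequences (hypothetical input).
    """
    owners = {}
    for genotype, seqs in references.items():
        for seq in seqs:
            for km in kmers(seq, k):
                owners.setdefault(km, set()).add(genotype)
    return {km: next(iter(gs)) for km, gs in owners.items() if len(gs) == 1}


def genotype_coverage(reads, references, k=K):
    """Return {genotype: (breadth, depth)} for a set of reads.

    breadth: fraction of the genotype's specific k-mers seen at least once.
    depth:   total number of read k-mers matching that genotype.
    Assumption: exact matches on the forward strand only (no reverse complements).
    """
    index = build_specific_index(references, k)
    specific_total = Counter(index.values())
    depth = Counter()
    seen = {}
    for read in reads:
        for km in kmers(read, k):
            genotype = index.get(km)
            if genotype is not None:
                depth[genotype] += 1
                seen.setdefault(genotype, set()).add(km)
    return {
        g: (len(seen.get(g, ())) / total, depth[g])
        for g, total in specific_total.items()
    }
```

Keeping only k-mers unique to one genotype is what makes the comparison discriminatory; a longer k would make such k-mers rarer to share between genotypes but, as noted above, more sensitive to sequencing errors and natural variation.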
The best reference genomes for read mapping were selected using the output of the genotyping programme, and reads were mapped to these references using Tanoti. 22 Consensus sequences were generated and compared with Sanger-sequenced reads and reference genomes by phylogenetic analysis.
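The text does not state how consensus sequences were called from the Tanoti mappings. As an illustration only, a simple majority-rule consensus from per-position base observations might look like this; the pileup input format and both thresholds (`min_depth`, `min_fraction`) are hypothetical.

```python
from collections import Counter

VALID_BASES = {"A", "C", "G", "T"}


def majority_consensus(pileup, min_depth=10, min_fraction=0.5):
    """Majority-rule consensus from per-position base observations.

    pileup: list in which element i holds the bases observed at reference position i
    (hypothetical input; a real pipeline would derive it from the mapped reads).
    A position becomes 'N' if its depth is below min_depth or if no base exceeds
    min_fraction of the calls. Both thresholds are illustrative assumptions.
    """
    consensus = []
    for bases in pileup:
        counts = Counter(b.upper() for b in bases if b.upper() in VALID_BASES)
        depth = sum(counts.values())
        if depth == 0 or depth < min_depth:
            consensus.append("N")
            continue
        base, n = counts.most_common(1)[0]
        consensus.append(base if n / depth > min_fraction else "N")
    return "".join(consensus)
```

In a mixed infection, a single majority consensus would reflect only the dominant strain at each position, which is one reason mapping against a separate reference for each detected genotype can be useful.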

| Statistical analyses
Differences in the means of continuous data were compared using the independent-samples t test. The significance of differences in the distribution of categorical data was determined using chi-squared tests.
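For categorical comparisons such as the rate of mixed infection in gt3-diagnosed versus gt1a-diagnosed individuals, the chi-squared test reduces to a 2 × 2 contingency table. The sketch below computes Pearson's statistic without a continuity correction; the text does not say whether one was applied (SciPy's `chi2_contingency`, for example, applies Yates' correction to 2 × 2 tables by default). The counts in the usage lines are invented for illustration and are not the study's data.

```python
import math


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for the table
        [[a, b],
         [c, d]]
    Returns (statistic, p_value) with 1 degree of freedom."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    if 0 in rows or 0 in cols:
        raise ValueError("each row and column needs at least one observation")
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # With 1 df, chi2 = Z**2, so P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts for illustration only (not the study's data):
# group A: 15 mixed of 200; group B: 2 mixed of 200.
stat, p = chi_squared_2x2(15, 185, 2, 198)
print(f"chi2 = {stat:.2f}, P = {p:.4f}")
```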

| Patient characteristics
Mixed gt1a/gt3 HCV infections were detected in samples collected from individuals residing throughout Scotland, indicating that no geographical region was particularly associated with mixed infections.
The average age was comparable (P = .84) between individuals with mixed infections (40.6 ± 9.7 years) and mono-infected individuals … Detailed clinical data were available for four individuals with mixed genotype infections. Liver disease was recorded for three of these subjects: two with cirrhosis and one with fibrosis. All four patients had a history of psychiatric disorders. Three of the patients had received treatment for HCV infection (treatment type unknown); of these, one had not yet completed treatment and two had failed to achieve an SVR.

| PCR analysis
PCR assay sensitivity was evaluated using known dilutions of the transcript controls, tested in batches of eight replicates with a negative control.
Results were converted into probit values, 14,23 from which the RNA concentrations giving a 90% detection rate were calculated as 9 copies/μL (gt1a) and 21 copies/μL (gt3). The specificity of the PCRs was confirmed by amplification and sequencing of the minor transcripts within mock mixed infections.
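The probit step above converts the fraction of positive replicates at each dilution into a standard normal quantile and regresses it on log10 concentration; the 90% detection concentration is where the fitted line reaches the probit of 0.9. Classical probit analysis fits this model by maximum likelihood; the sketch below uses a simpler least-squares fit on the transformed hit rates, which is an assumption, and it drops dilutions with 0/8 or 8/8 positives because their probits are infinite. The function name and the example hit counts are hypothetical.

```python
import math
from statistics import NormalDist


def concentration_at_detection_rate(concs, hit_fractions, target=0.9):
    """Estimate the concentration giving `target` detection probability by
    least-squares regression of probit(hit fraction) on log10(concentration).

    Points with hit fractions of exactly 0 or 1 have infinite probits and are
    dropped; at least two remaining points with distinct concentrations are needed.
    """
    z = NormalDist()
    pts = [(math.log10(c), z.inv_cdf(f))
           for c, f in zip(concs, hit_fractions) if 0 < f < 1]
    if len({x for x, _ in pts}) < 2:
        raise ValueError("need two or more concentrations with 0 < hit fraction < 1")
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    slope = (sum((x - mx) * (y - my) for x, y in pts)
             / sum((x - mx) ** 2 for x, _ in pts))
    intercept = my - slope * mx
    # Solve intercept + slope * log10(c) = probit(target) for c.
    return 10 ** ((z.inv_cdf(target) - intercept) / slope)


# Hypothetical 8-replicate results (hits out of 8) at four concentrations (copies/uL).
concs = [2, 5, 10, 25]
hits = [2, 5, 7, 8]
c90 = concentration_at_detection_rate(concs, [h / 8 for h in hits])
print(f"estimated 90% detection concentration ~ {c90:.1f} copies/uL")
```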
A total of 506 patients diagnosed with gt1a or gt3 infection were screened for the presence of mixed gt1a/gt3 infection. Overall

| Quantification by rt-PCR
Genotype-specific rt-PCR assays were developed to quantify the relative proportions of gt1a and gt3 present in samples with mixed genotypes. Consistent detection of fewer than 10 copies/μL of RNA was observed for both gt1a and gt3 transcript controls, whilst sample-free controls were negative.
FIGURE 1 Maximum likelihood phylogenetic trees of (A) gt1a sequences (red) and (B) gt3 sequences (blue) obtained from samples with mixed genotype HCV infections. Reference sequences are shown in black. Sequences obtained from individuals diagnosed with the opposite genotype are shown in boxes. Bootstrap support of >70% after 1000 replicates is shown.
All 20 mixed genotype samples were assayed by genotype-specific rt-PCR to determine individual viral loads, and positive rt-PCR results for both genotypes were obtained for 15 samples. Each of these samples contained a major and a minor genotype, the latter constituting less than 21% of the combined viral load (Table 2) and, in 10 of 15 samples, less than 1% of the combined viral load. In 14 of 15 samples, the major genotype matched the patient's clinical diagnosis. The exception was sample G3-85, which was clinically diagnosed as a gt3 infection but in which the major genotype by rt-PCR was gt1a, constituting 99.6% of the combined viral load. In the remaining gt1a-diagnosed individuals with mixed infections (n = 2), the minor gt3 strains comprised 0.56% and 6.91% of the total viral load.
In gt3-verified samples, the minor gt1a strains comprised 0.01%-21% of the combined viral load. The limit of detection of the PCR assays for samples with mixed genotypes was less than 58 IU/mL.
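The major/minor split reported above is a ratio of the two genotype-specific rt-PCR loads. A minimal sketch, assuming both loads are expressed in the same units; the function name and the example loads are hypothetical, chosen only to mirror a 99.6%/0.4% split.

```python
def genotype_proportions(load_gt1a, load_gt3):
    """Return (major genotype, minor genotype share in % of combined load).

    Loads are genotype-specific rt-PCR results in the same units (e.g. copies/uL).
    Ties are reported with gt1a as the major genotype (an arbitrary choice).
    """
    total = load_gt1a + load_gt3
    if total <= 0:
        raise ValueError("at least one genotype must be quantified")
    if load_gt1a >= load_gt3:
        return "gt1a", 100 * load_gt3 / total
    return "gt3", 100 * load_gt1a / total


# Hypothetical loads chosen to mirror a 99.6% / 0.4% split.
print(genotype_proportions(996, 4))
```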

| Analysis of mixed genotype infections by PCR-NGS
The pan-genotypic primers for the E1/E2 region (Table 1) were validated against an in-house panel of 64 previously typed HCV samples spanning nine subgenotypes from five genotypes (Table S1). These primers were subsequently used for the PCR-NGS amplification.

| DISCUSSION
In this study, a genotype-specific nested PCR targeting the E1-E2 region was developed and used to screen HCV-positive samples for the presence of mixed gt1a/gt3 infections. The mixed genotype infection prevalence rate in a cohort of 506 HCV-positive individuals from Scotland previously diagnosed with either gt1a or gt3 infection was 3.8%. The nested PCR assay proved to be both sensitive and highly specific at the subgenotype level, capable of detecting low-level secondary infecting genotypes against a high background of the major genotype. The E1-E2 region is rarely used for mixed HCV genotype screening except as part of a fragment >1000 bp in length, 25 with the 5′ UTR and core regions being favoured. Although these regions can be used effectively for genotyping, their limited sequence diversity means they are not always suitable for subtyping viral strains. 26 The relatively short E1-E2 region targeted in this study was highly discriminatory for genotyping and subgenotyping, providing more information than current clinical testing protocols.
The rate of mixed HCV genotype infections identified by PCR in our cohort (3.8%) is similar to the low prevalence rates observed in studies with large cohorts. 5,27 Studies involving smaller cohorts 1,28,29 tend to report higher prevalence rates of mixed HCV infection.
The stringent focus on gt1a and gt3 may have contributed to the low prevalence rates observed. Gt1a and gt3 are the most common genotypes within the UK and are estimated to be responsible for 90% of all HCV infections. 30
The genotype-specific rt-PCR assay was less sensitive than the nested PCR for the mixed infection samples, quantifying only 75% (15 of 20) of the minor genotypes. The use of a non-nested protocol may explain the reduced sensitivity of the rt-PCR compared with the screening assay; however, the two assays displayed similar sensitivities with control samples. Alternatively, the specificity of the rt-PCR at the subgenotype level may account for the discrepancies. Notably, however, most of the samples in which the minor genotype was not detected by rt-PCR were older and had undergone several freeze-thaw cycles after PCR screening, potentially affecting RNA yield.
Results from the genotype-specific rt-PCR assay indicated the HCV population structure comprised a major and a minor genotype.
A significantly greater rate of mixed infection was determined in individuals diagnosed with gt3 than in patients diagnosed with gt1a.
The disproportionately high rate of mixed infection among individuals clinically diagnosed with gt3 could suggest a difference in sensitivity between the two genotype-specific rt-PCR assays; however, this was not apparent when quantifying the transcript controls in mock mixed … and, as 28.2% of individuals in our study had been treated previously without achieving an SVR, there may have been partial resolution of coinfecting genotypes 11,43 in some treated individuals, disproportionately resolving gt3 minor strains. The rate of infection among drug users who are already anti-HCV positive is lower than among previously unexposed individuals, suggesting some form of partial immunity, 44,45 and it is possible that some genotypes confer broader cross-protective immunity than others. Minority HCV strains within a superinfection may survive by replicating within extrahepatic sites, 46 and genotypes may differ in their ability to adapt for survival in these sites. The reasons for the disparity in coinfection rates between the genotypes are likely highly complex and involve a combination of genotype-specific host responses 31 and viral competition.
The number of reads for each genotype was normalised to the total number of E1-E2 reads. c Gt1a or Gt3a clonal transcripts with spiked-in controls of the opposite genotype constituting 10% (H), 1% (M) or 0.1% (L) of the total amount (10⁶ copies/μL). d Sera derived from individuals who tested negative for HCV. Normalisation was performed against the total number of reads.

TABLE 3 (Continued)
FIGURE 2 Maximum likelihood phylogenetic tree comparing (A) gt1a sequences (red) and (B) gt3 sequences (blue) of mixed infection samples obtained by Sanger sequencing with consensus sequences from deep sequencing. Labels with the suffix .ds are the NGS consensus sequences. Control strains are underlined, and samples without a corresponding consensus sequence from deep sequencing are shown in red. Sequences from the same sample that cluster are highlighted in yellow, and sequence pairs that do not cluster are highlighted in grey. Circles denote major strains (as determined by rt-PCR). Bootstrap support of >70% after 1000 replicates is shown.
A pan-genotypic primer set developed for PCR-NGS proved highly effective at amplifying and typing gt1, gt2, gt3 and gt4 strains and one gt6a isolate at the subgenotype level. Despite this, no genotypes other than gt1a and gt3 were detected, with the exception of presumed contaminant gt2 strains that were highly similar to a replicon strain used locally. To ascertain the suitability of this assay for clinical diagnostics, it would be necessary to test the primers against gt5 and gt7 as well as more gt3 and gt6 subtypes. Data collected during assay optimisation suggested that the primers could function effectively with at least two known mismatches.
PCR-based deep sequencing was chosen in preference to metagenomic methods because the low ratio of HCV to human RNA reduces the sensitivity of the latter. HCV has been detected in clinical samples by metagenomic methods at levels as low as 2000 IU/mL; 47 however, this is still substantially less sensitive than would be required to detect most of the minor strains we identified … can be distorted by margins of up to 100-fold relative to the true frequency. 48,49 For the sample subset that was deep sequenced, the percentage of mixed infections was similar between the two methods and, when detection by either PCR or NGS was designated as the gold standard, the individual methods had similar sensitivities.
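Designating "positive by either PCR or NGS" as the gold standard is a composite reference standard: each method's sensitivity is the fraction of composite-positive samples it detects. A minimal sketch, with a hypothetical function name and invented per-sample calls. By construction, neither method can register a false positive against this reference, so specificity cannot be assessed this way, and any NGS background calls that survive the threshold are counted as true mixed infections.

```python
def sensitivities_vs_composite(results):
    """results: list of (pcr_positive, ngs_positive) booleans, one pair per sample.

    The composite reference calls a sample mixed if either method does;
    each method's sensitivity is the fraction of composite positives it detects.
    """
    reference_pos = [(p, n) for p, n in results if p or n]
    if not reference_pos:
        raise ValueError("no composite-positive samples")
    total = len(reference_pos)
    pcr_sens = sum(p for p, _ in reference_pos) / total
    ngs_sens = sum(n for _, n in reference_pos) / total
    return pcr_sens, ngs_sens


# Hypothetical per-sample calls, for illustration only.
calls = [(True, True), (True, False), (False, True), (False, True), (False, False)]
print(sensitivities_vs_composite(calls))
```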
However, an estimate of the expected percentage in the original sample set suggested a mixed infection prevalence rate of 24.4% by PCR-NGS, much greater than the rate calculated from PCR analysis (3.8%).
A key issue in interpreting NGS data for viral diagnostics lies in defining background contamination. We applied a false-positive threshold based on background reads obtained in negative controls. However, the reliability of such thresholds is uncertain for PCR-NGS, which has been shown to be poorly quantitative both in this study, as shown by the inaccurate read proportions obtained from mock mixed infections, and elsewhere. 47-49 Reducing the number of PCR cycles may improve the quantitative performance of PCR-NGS.
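The exact threshold rule is not stated in the text. One plausible reading, sketched here purely as an assumption, is to normalise each genotype's read count to the sample's total reads and call a minor genotype present only if its fraction exceeds the highest fraction seen for that genotype in any negative control. The function name and read counts are hypothetical. Because PCR-NGS read fractions can be distorted, as discussed above, a threshold of this kind inherits that uncertainty.

```python
def call_minor_genotype(sample_reads, negative_controls, genotype):
    """Decide whether `genotype` is present in a sample above background.

    sample_reads and each negative control: dict of genotype -> read count.
    Assumed threshold: the highest fraction of reads assigned to `genotype`
    in any negative control. Returns (present, observed_fraction, threshold).
    """
    def fraction(counts):
        total = sum(counts.values())
        return counts.get(genotype, 0) / total if total else 0.0

    threshold = max((fraction(nc) for nc in negative_controls), default=0.0)
    observed = fraction(sample_reads)
    return observed > threshold, observed, threshold


# Hypothetical read counts, for illustration only.
negatives = [{"1a": 3, "3": 0, "other": 997}, {"1a": 0, "3": 5, "other": 995}]
print(call_minor_genotype({"1a": 50, "3": 9950}, negatives, "1a"))
```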
In conclusion, the prevalence of mixed infection determined by PCR in this UK cohort of 506 individuals was 3.8%, with gt3 as the major genotype in most mixed infection samples. The mixed infection rate obtained from PCR-NGS data was much higher; however, its interpretation is hampered by the difficulty of setting false-positive thresholds for a technique that is poorly quantitative.