Review article: the human intestinal virome in health and disease.

Summary
Background: The human virome consists of animal‐cell viruses causing transient infections, bacteriophage (phage) predators of bacteria and archaea, endogenous retroviruses and viruses causing persistent and latent infections. High‐throughput, inexpensive, sensitive sequencing methods and metagenomics now make it possible to study the contribution dsDNA, ssDNA and RNA virus‐like particles make to the human virome, and in particular the intestinal virome.
Aim: To review and evaluate the pioneering studies that have attempted to characterise the human virome and that have generated increased interest in how the intestinal virome might contribute to maintaining health and to the pathogenesis of chronic diseases.
Methods: Relevant virome‐related articles were selected for review following extensive language‐ and date‐unrestricted electronic searches of the literature.
Results: The human intestinal virome is personalised and stable, and dominated by phages. It develops soon after birth in parallel with prokaryotic communities of the microbiota, becoming established during the first few years of life. By infecting specific populations of bacteria, phages can alter microbiota structure by killing host cells or altering their phenotype. This enables phages to contribute to maintaining intestinal homeostasis or to microbial imbalance (dysbiosis), and to the development of chronic infectious and autoimmune diseases, including HIV infection and Crohn's disease, respectively.
Conclusions: Our understanding of the intestinal virome is fragmented. Standardised methods for virus isolation and sequencing are required to provide a more complete picture of the virome, which is key to explaining the basis of virome‐disease associations, how enteric viruses can contribute to disease aetiologies, and how they can be rationalised as targets for interventions.


| THE HUMAN VIROME
Viruses are thought to be the most abundant and diverse entities on Earth, numbering as many as 10^31 virus-like particles (VLPs). 1 This paradigm is likely to change in light of information from large-scale sequencing surveys of marine environments and improved analytical tools. 2 New sequence-based technologies that do not rely on isolating viruses in order to identify them now make it possible to define and characterise viruses in different environmental samples in greater detail than ever before. This has increased interest in the role the viral assemblage of the human gut microbiota plays in health and disease. The following sections review current knowledge of the human intestinal virome.
A virome comprises all the nucleic acids (ssDNA, dsDNA, ssRNA and dsRNA) belonging to the VLPs associated with a particular ecosystem. The human virome is a genetically complex component of the microbiome. The blood, nose, skin, conjunctiva, mouth, vagina, lungs and gastrointestinal (GI) tract each harbour a distinct virus assemblage (Table S1). The human intestinal virome is constituted by the genetic content of the VLPs found along the GI tract. These are predominantly bacteriophages (phages) that infect bacteria and archaea and, to a much lesser extent, viruses that infect humans, plants, amoebae and other animals (Figure 1).

| BACTERIOPHAGES
Phages can be lytic or lysogenic. Lytic (virulent) phages bind to specific host-cell receptors, then penetrate and infect their host cell to hijack its replication and translation machinery to produce virions; once sufficient virions have accumulated (tens to thousands, depending on the phage), the lytic enzymes they produce cause the host cell to lyse, releasing the virions into the surrounding environment where they can infect new host cells. Lytic phages can have narrow or broad host ranges, infecting only one strain of a prokaryote or multiple species of closely related prokaryotes.
These entities have been used in the past to treat infections. They have recently been revisited as alternatives to antibiotic therapies for a range of infections, and as a means of controlling food-contaminating bacteria such as Listeria spp. 3 Their gene products are also of interest in biotechnological and medical applications. 4
Lysogenic (temperate) phages do not kill their host. Instead, they integrate into the host genome as a prophage without interfering with host replication, and the prophage is transmitted to the host's progeny at each cell division. Lysogenic phages can switch to the lytic cycle in response to environmental stressors (eg, antibiotics).
The mammalian intestinal virome comprises viruses that infect eukaryotic and prokaryotic cells. It is established soon after birth and is dominated by viruses that infect bacteria (ie, phages). The virome establishes a mutualistic relationship with eukaryotes/prokaryotes, contributing to intestinal homeostasis by influencing microbial ecology and host immunity. The composition of the virome is influenced by numerous factors that either affect viruses directly (infection) or change host-cell populations (eg, antibiotics, diet). Members of the virome may contribute to the pathogenesis of certain diseases in several ways. These include lysis of microbial hosts leading to dysbiosis, infection of epithelial cells, and/or translocation across a compromised or damaged mucosal barrier to reach underlying tissues and immune cells, leading to immune activation. Dysbiosis can be defined as a microbial imbalance, or as any change in the composition of resident microbial communities relative to the community found in healthy individuals. [101][102][103] The association of the virome with certain disease states is characterised by changes in diversity and the predominance of specific virotypes (eg, members of the order Caudovirales in IBD).
Phage-host interactions influence both host and viral evolution. The ability of phages to transfer genes from one prokaryotic host to another can lead to increased diversification of viral species. In prokaryotes, it can also lead to increased antibiotic resistance and/or the induction of toxins or virulence factors. 5 Some phages alter the antigenicity of their hosts by producing enzymes that modify the O-antigen component of lipopolysaccharides. 6 Such modification of prokaryotic surface structures could affect microbial interactions with the human host and influence niche specialisation within the GI tract. 7 Clustered regularly interspaced short palindromic repeats (CRISPRs) confer resistance to phage infection on prokaryotes and form part of prokaryotic adaptive immunity. Analyses of metagenomic sequence data provide detailed information on phage-host and phage-phage competition within the human faecal microbiome. These analyses suggest that prokaryotes actively and continuously acquire CRISPR spacers in response to the presence of phages in the GI tract. 8 The effects of such phage-host interactions on microbiota composition and function, or on host health, are unknown.
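CRISPR spacers are short sequences acquired from previously encountered phages. A spacer from a prokaryotic genome that occurs within a phage contig therefore links that phage to a likely host, which is the basis of the CRISPR analyses of metagenomic data mentioned above. The Python sketch below is a simplified illustration only. It assumes exact, ungapped matches on either DNA strand, and the host names, contig names and sequences are hypothetical toy data. Published analyses typically tolerate a few mismatches and use dedicated alignment tools.

```python
from typing import Dict, List, Set, Tuple

# Translation table for complementing uppercase DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def match_spacers(
    spacers: Dict[str, List[str]],  # host name -> CRISPR spacer sequences
    phage_contigs: Dict[str, str],  # contig name -> nucleotide sequence
) -> Set[Tuple[str, str]]:
    """Return (host, contig) pairs where a host spacer occurs exactly in a contig.

    Both strands are checked because the matching protospacer may lie on either.
    """
    links: Set[Tuple[str, str]] = set()
    for host, host_spacers in spacers.items():
        for spacer in host_spacers:
            targets = (spacer.upper(), reverse_complement(spacer.upper()))
            for contig_name, contig in phage_contigs.items():
                if any(t in contig.upper() for t in targets):
                    links.add((host, contig_name))
    return links


# Hypothetical toy data for illustration only.
spacers = {"Bacteroides_sp": ["ATGCGTACGTTAGC"], "Prevotella_sp": ["GGGTTTAAACCCGG"]}
contigs = {
    "contig_1": "TTTTATGCGTACGTTAGCAAAA",  # carries the Bacteroides spacer
    "contig_2": "CCCC" + reverse_complement("GGGTTTAAACCCGG") + "CCCC",  # opposite strand
    "contig_3": "ACACACACACACACAC",  # no spacer match
}
print(sorted(match_spacers(spacers, contigs)))
# → [('Bacteroides_sp', 'contig_1'), ('Prevotella_sp', 'contig_2')]
```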

INTESTINAL VIROME
Viruses do not encode universally conserved genes equivalent to the 16S or 18S rRNA genes of prokaryotes and eukaryotes, respectively, and are genetically highly diverse. Consequently, metataxonomic approaches such as 16S rRNA gene sequencing cannot be used to characterise VLPs within ecosystems. Traditionally, researchers have relied on classical approaches, mainly microscopy and cultivation, to characterise VLPs in the human gut.
Based on transmission electron microscopy (TEM), mucosal samples contain ~1.2 × 10^9 VLPs/biopsy. 9 VLPs have been detected in caecal contents at 10^6/mL using TEM, 10 and faeces harbour 10^8-10^9 VLPs/g wet weight. 10,11 In all GI contents examined to date by microscopy, the overwhelming majority of VLPs have been phages of the order Caudovirales, and the human GI tract is estimated to harbour 10^15 phages in total. 3,9,10,12 The order Caudovirales encompasses most known phages (Figure 2), comprising the families […]

INTESTINAL VIROME
The only reliable molecular method currently available for routine surveys of the human virome is metagenomics. Metagenomics is a culture-independent, molecular-based approach that allows functional and sequence-based analyses of the collective microbial genomes contained in an environmental sample, providing a powerful approach for exploring the ecology of complex microbial communities. 14 It has been used to examine prokaryotic communities associated with the human gut microbiome in health and disease, [15][16][17][18] and to examine viromes associated with different regions of the human body (Table S1).
A protocol involving homogenisation of faeces in buffer, centrifugation (to remove cell debris), tangential flow filtration (TFF; to concentrate large-volume samples and isolate VLPs), ultracentrifugation and metagenomic reconstruction was used to characterise the first human faecal virome. 19 However, recent improvements in recovery methods now make it possible to characterise human-associated VLP assemblages at the molecular level in greater detail than ever before […]. Sequences associated with this phage were not found in American gut-associated datasets (metagenome or virome), suggesting geographical variation in the distribution of gut-associated phages. 7 A similar metagenomic mining approach used the major capsid protein VP1, which is specific to the ssDNA phage family Microviridae, and detected members of the subfamilies Gokushovirinae, Alpnavirinae and Pichovirinae in faecal metagenomes. 24 Using phage genome signature-based recovery from metagenomes, several potential gut-specific phages (predominantly temperate) infecting members of the order Bacteroidales were identified. 21 This alignment-free approach resolved phage sequences not readily detected by conventional alignment-driven approaches. Its main limitation is the scarcity of genome sequences from isolated phages. Such "driver sequences" are required to provide the baseline information from which inferences about sequence data and host phylogeny can be made (only four driver sequences were required in the Bacteroidales study). 21 Using co-occurrence profiling, crAssphage was identified in publicly available metagenomes, and CRISPR analysis and co-occurrence profiling predicted that it infects Bacteroides or Prevotella spp. 22
Assembly of sequence data from four deep-sequenced DNA viromes (8.1 Gb of sequence data in total) from two healthy individuals yielded 4301 contigs, representing 72 complete phage genomes plus potentially complete and partial phage genomes. 28 […] mapped to 160 contigs. A consortium of 155 phages made up the "healthy gut phageome" (HGP). Only 23 of these phages were present in >50% of samples ("core" phages, including crAssphage and nine other complete genomes), and 132 were present in 20%-50% of samples ("common" phages); a further 1679 were present in 2%-19% of samples ("low overlap" phages). The HGP represented only 4% of the total phage community and is expected to grow as additional healthy individuals are analysed by deep sequencing. 28 Although it has been proposed that the HGP may play a role in maintaining and possibly restoring a dysbiotic microbiota, 30 […] (Table S1). Later work on the oral and faecal viromes of 20 individuals over a 6-month period confirmed this. It showed that phages persisted in these viromes and were readily shared among related and unrelated members of the same household. 38 Sharing of phages encoding virulence or antibiotic-resistance genes has implications for shaping the microbiomes of people in close contact with one another, and warrants further study. The presence of antibiotic-resistance genes may, however, have been over-estimated in human viromes, 11,20,27,31,35,37 as such genes are thought to be rarely encoded by phages. 39 Care should therefore be taken when analysing data: proper in silico checks with conservative cut-offs are required to refine functional assignments and avoid over-interpretation. 39 […] and geminiviruses were detected in at least one twin of each pair between months 3 and 24 of the study. 46
Anelloviruses, which are associated with host immune status, were most prevalent from 3 months of age and were highly divergent from known anelloviruses; their abundance peaked at 6-12 months of age. The same anelloviruses were detected in faecal samples from the same infant collected 12 months apart, suggesting a persistent or stable source of recurring infection. 46 It was speculated that the increase in anelloviruses between 6 […]. 46 The initially high phage population is unsustainable because of the low numbers of bacteria colonising the GI tract. Consequently, the phage assemblage shrinks in size and diversity, relieving pressure on the bacterial community and allowing it to establish itself and colonise the gut.
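The prevalence classes used to define the healthy gut phageome can be written as a simple rule on the percentage of samples in which each phage is detected: >50% for "core", 20%-50% for "common" and 2%-19% for "low overlap" phages. The 155 HGP phages correspond to the 23 core plus 132 common phages. The following is a minimal Python sketch that assumes binary presence/absence calls per sample, using hypothetical phage names and data. Two further choices are assumptions made for illustration: non-integer percentages such as 19.5% are treated as "low overlap", and phages below 2% are labelled "unclassified".

```python
from typing import Dict, List


def prevalence(presence: List[bool]) -> float:
    """Percentage of samples in which a phage was detected."""
    if not presence:
        raise ValueError("need at least one sample")
    return 100.0 * sum(presence) / len(presence)


def hgp_category(pct: float) -> str:
    """Assign a prevalence class using the thresholds reported for the HGP.

    Handling of non-integer boundary values (e.g. 19.5%) is an assumption.
    """
    if pct > 50:
        return "core"
    if pct >= 20:
        return "common"
    if pct >= 2:
        return "low overlap"
    return "unclassified"  # hypothetical label for phages below 2%


def classify_phages(table: Dict[str, List[bool]]) -> Dict[str, str]:
    """Map each phage to a prevalence class from a presence/absence table."""
    return {phage: hgp_category(prevalence(calls)) for phage, calls in table.items()}


# Hypothetical presence/absence calls across 10 samples.
table = {
    "phage_A": [True] * 8 + [False] * 2,  # 80%
    "phage_B": [True] * 3 + [False] * 7,  # 30%
    "phage_C": [True] * 1 + [False] * 9,  # 10%
    "phage_D": [False] * 10,              # 0%
}
print(classify_phages(table))
# → {'phage_A': 'core', 'phage_B': 'common', 'phage_C': 'low overlap', 'phage_D': 'unclassified'}
```

Note that a phage detected in exactly 50% of samples falls into "common", because the core class requires strictly more than 50%.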

FACTORS ON DEVELOPMENT OF THE INFANT VIROME
Differences in birth mode (Caesarean vs vaginal delivery) and diet (breast- vs formula-feeding; age at which weaning began) were confounders in the study of Lim et al. 46 […] 51,54 Circulating phages have also been detected in blood samples after oral administration of antibiotics to patients with bacterial infections, whereas no phages could be detected in their blood before antibiotic therapy. 51 It is not known whether these phages were of GI origin. The presence of phages in the peripheral blood has been termed "phagemia". 51 However, it is unknown whether the blood contains its own "baseline" VLP population to which the GI virome can contribute in disease. […] in colonic biopsies than healthy individuals (n = 10) (2.9 × 10^9 vs 1.2 × 10^8 VLPs/biopsy). In the CD patients, ulcerated mucosa had significantly fewer VLPs than nonulcerated mucosa (2.1 × 10^9 vs 4.1 × 10^9 VLPs/biopsy). 9 More recent, small-scale studies have characterised the gut virome in IBD. 29,62-64 A pilot metagenomic study compared ileal and colonic contents from six CD patients with ileal samples from six non-IBD controls, and showed that CD DNA-based viromes had higher phage abundance than control samples. 62 However, another study of healthy controls (n = 8) and patients with ileocolic CD (n = 11) found that the way in which virome data were analysed affected the interpretation of results. 63 Prophages accounted for the most hits in the CD samples when unassembled reads were compared, but were more abundant in the control group when assembled data were compared. Analysis based on assembled data showed fewer differences between the CD and control groups.

Human-associated viruses and intact phages and
Nonrarefied data were used to generate estimators of species richness. The viromes of CD patients were less diverse than those of healthy controls, but showed greater heterogeneity across samples. The CD and control samples could not be fully separated into two groups based on VLP composition and abundance.
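The estimators used in the study above are not named here. As an illustration only, the following Python sketch computes two widely used indices from read counts per viral contig in a single sample. The first is the bias-corrected Chao1 richness estimator, which uses the numbers of singletons (F1) and doubletons (F2) to estimate unseen taxa. The second is the Shannon diversity index. The counts are hypothetical. Both indices are sensitive to sequencing depth, which is why whether data are rarefied (subsampled to equal depth) matters when samples are compared.

```python
import math
from typing import Sequence


def chao1(counts: Sequence[int]) -> float:
    """Bias-corrected Chao1 richness estimate from per-taxon read counts.

    S_chao1 = S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)),
    where F1 and F2 are the numbers of singleton and doubleton taxa.
    """
    s_obs = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


def shannon(counts: Sequence[int]) -> float:
    """Shannon diversity H' = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Hypothetical read counts per viral contig for one sample.
sample = [1, 1, 1, 2, 2, 5, 10, 0]
print(chao1(sample))                # 7 observed + 3*2/(2*3) → 8.0
print(round(shannon([10, 10]), 4))  # two equally abundant taxa, ln 2 → 0.6931
```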
A later DNA-based study showed differences in CD patients related to disease status (newly diagnosed, active onset, active presurgery) and to therapy. 65 As in the authors' previous study, individual variability and sample origin had a greater effect on the virome than CD. However, more over-represented viruses were found in CD viromes than in healthy ones, and newly diagnosed patients had higher diversity in faecal and biopsy viromes than those with active disease.
Patients on steroids and/or immunosuppressants had lower diversity than untreated patients, while those on immunosuppressants alone had lower diversity than those on combination therapy or steroids alone. Another study benefitted from recruiting CD and ulcerative colitis (UC) patients and their family members in the UK (Cambridge) and USA (Los Angeles), thereby controlling for household factors that may influence the microbiome. 29 An increase in phage richness (predominantly Caudovirales and Microviridae) was seen in IBD faecal viromes compared with those of controls. These initial findings were confirmed in two independent and geographically distinct US patient cohorts (Boston, Chicago) with matched controls. 29 It was suggested that the decrease in bacterial richness accompanying this increase in phage richness may reflect predator-prey dynamics, but it was unclear how these dynamics contribute to the pathogenesis of IBD.
The authors also proposed the virome as a target for therapeutic modulation. 29
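Two ideas in this review rest on correlating profiles across samples. One is co-occurrence profiling, used earlier to predict crAssphage hosts; the other is the proposed inverse relationship between phage and bacterial richness. The following Python sketch computes a Spearman rank correlation between hypothetical per-sample phage and bacterial richness values. The choice of statistic is made here for illustration, and the cited studies may have used other methods. A strong negative coefficient would be consistent with predator-prey dynamics, but would not demonstrate them.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values 1..n in ascending order; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # convert 0-based positions to 1-based ranks
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-sample richness values (same samples, same order).
phage_richness = [120, 150, 180, 210, 260]
bacterial_richness = [300, 280, 250, 240, 200]
print(spearman(phage_richness, bacterial_richness))  # → -1.0
```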

| Isolation-kit contaminants
Few studies of the human virome have considered the effects of contaminants in reagents and nucleic acid extraction kits on viral metagenomes (Table S1), though such contaminants are increasingly recognised in prokaryote-focused studies. 26,69,70 The nucleases, proteases and polymerases used in these studies are produced in protein expression systems, and sequences associated with these expression vectors have been detected in virome studies. 70 Columns used to purify DNA may introduce parvovirus-like, circovirus/densovirus and iridovirus sequences into samples, and it has been suggested that their use should be avoided in virome studies. 26,70 Care should be taken to prevent cross-contamination when processing samples for nucleic acid extraction and sequencing. 26,70 Low-biomass samples obtained from non-GI sites will rely on PCR-based approaches for the foreseeable future, and these are the most likely to be affected by contaminants in reagents. Appropriate negative controls should therefore be included in sequencing studies, so that sample-derived sequences can be distinguished from those originating from contaminants.
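One simple way to use negative controls is to flag any contig that is also detected in a blank (reagent-only) control, unless it is much more abundant in the biological sample. The Python sketch below assumes that abundances have already been normalised across libraries. The 10-fold threshold, the contig names and the values are all hypothetical, and dedicated statistical methods exist for this task.

```python
from typing import Dict, List, Tuple


def flag_contaminants(
    sample_counts: Dict[str, float],         # contig -> normalised abundance in the sample
    control_counts: List[Dict[str, float]],  # one dict per negative (blank) control
    min_fold: float = 10.0,                  # hypothetical threshold; tune per study
) -> Tuple[Dict[str, float], List[str]]:
    """Split contigs into retained and flagged sets using negative controls.

    A contig is flagged if it appears in any negative control and its abundance
    in the sample is less than `min_fold` times its highest control abundance.
    """
    kept: Dict[str, float] = {}
    flagged: List[str] = []
    for contig, abundance in sample_counts.items():
        max_control = max((c.get(contig, 0.0) for c in control_counts), default=0.0)
        if max_control > 0 and abundance < min_fold * max_control:
            flagged.append(contig)
        else:
            kept[contig] = abundance
    return kept, flagged


# Hypothetical sample and two extraction blanks.
sample = {"crAss_like_1": 500.0, "circo_like_7": 12.0, "micro_like_3": 80.0}
blanks = [{"circo_like_7": 9.0}, {"micro_like_3": 2.0}]
kept, flagged = flag_contaminants(sample, blanks)
print(flagged)  # → ['circo_like_7']
```

In this toy example, the contig seen at 2.0 in a blank is retained because it is 40 times more abundant in the sample. A stricter rule would drop every contig detected in any control, at the cost of discarding genuine but cross-contaminated sequences.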

| Combining virus isolation and metagenomics
Studies targeting only the VLPs of the microbiota will not capture prophage diversity within samples. They will also contain little information on phage-host interactions or host ranges, which limits in-depth analyses of the overall ecology of the gut microbiome. 21 Studying total community metagenomes in conjunction with VLP-derived metagenomes will provide a more complete picture of virus-host interactions in the human gut. 21 The greatest challenge to integrated studies of the whole gut microbiome is […] 71 It has been stated that isolating "just a few phage genomes from novel environments will greatly increase our understanding of viral diversity in these environments". 72 […] (2) there are a number of medical conditions speculated to be linked with viral infections (eg, type I diabetes, chronic fatigue syndrome, obesity) but for which no infectious agent has been found. It has been estimated that over half of human-infecting viruses remain to be discovered. 76 Interactions with pets, insects and wild animals will also influence the composition of the human virome. From a pathogen perspective, bush-workers, abattoir workers and individuals exposed to insect bites have been highlighted as likely sources of new human-infecting viruses. 75 With the advent of high-throughput, inexpensive, sensitive sequencing methods, it has become possible to study the contribution of dsDNA, ssDNA and RNA VLPs to the human virome. Gene transfer agents, which appear to represent defective phages, have not been examined in the context of the human microbiome or virome, but may also contribute to its genetic content. 77,78 Similarly, exosomes derived from cells lining the GI epithelium are likely to contribute to the nucleic acids found in viromes. These are extracellular vesicles 40-100 nm in diameter containing proteins, lipids, miRNA, mRNA and DNA. 79,80 Outer-membrane vesicles are the Gram-negative bacterial equivalent (in size and composition) of eukaryotic exosomes. They may also contribute to what we currently call the virome, as these vesicles have recently been reported to contain DNA as well as RNA. 81

| Virome-specific bioinformatics tools
Whereas 16S rRNA gene sequence data are readily processed and analysed using packages such as QIIME and Mothur, no equivalent standard packages exist for analysing sequence data derived from host-associated viromes. 35 Tools do exist for functional annotation of viromes and for estimating viral diversity (Table 3). However, there is as yet no easy-to-use pipeline that takes raw reads, strips out host DNA, screens for bacterial contaminants, and then assigns taxonomy and function to prokaryotic and eukaryotic viruses within samples, though efforts are being made to develop such tools. These tools will need to take into account human endogenous retroviruses. Such sequences comprise ~8% of the human genome, can still encode retroviral polymerase and envelope proteins, and can be reactivated by exogenous retroviruses such as HIV. 53
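The order of steps described above (strip host reads, remove bacterial sequences, then assign viral taxonomy) can be illustrated with a toy k-mer classifier. This Python sketch is hypothetical throughout. The k-mer size, the 50% match threshold and the short reference sequences are illustrative only, and it is not modelled on any particular existing tool; real pipelines use full reference genomes with dedicated aligners or classifiers. Two caveats from the text apply. First, human endogenous retrovirus sequences need to be in the host reference, or HERV-derived reads may be misassigned as retroviral. Second, reads from integrated prophages will match bacterial references and be discarded at the second step.

```python
from typing import Dict, List, Optional, Set, Tuple

K = 8  # toy k-mer size (hypothetical); real classifiers use longer k-mers


def kmers(seq: str, k: int = K) -> Set[str]:
    """Return all overlapping k-mers of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(refs: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Map each reference label to the union of k-mers across its sequences."""
    return {label: set().union(*(kmers(s) for s in seqs)) for label, seqs in refs.items()}


def best_hit(read: str, index: Dict[str, Set[str]], min_frac: float) -> Optional[str]:
    """Return the label sharing the largest fraction of the read's k-mers,
    or None if no label reaches min_frac."""
    read_kmers = kmers(read)
    if not read_kmers:
        return None
    best_label, best_frac = None, 0.0
    for label, ref_kmers in index.items():
        frac = len(read_kmers & ref_kmers) / len(read_kmers)
        if frac > best_frac:
            best_label, best_frac = label, frac
    return best_label if best_frac >= min_frac else None


def run_pipeline(
    reads: List[str],
    host: Dict[str, Set[str]],
    bacteria: Dict[str, Set[str]],
    viruses: Dict[str, Set[str]],
    min_frac: float = 0.5,  # hypothetical threshold
) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Classify reads in order: host removal, bacterial removal, viral assignment."""
    summary = {"host": 0, "bacterial": 0, "unassigned": 0}
    viral_counts: Dict[str, int] = {}
    for read in reads:
        if best_hit(read, host, min_frac) is not None:        # step 1: strip host (incl. HERVs)
            summary["host"] += 1
        elif best_hit(read, bacteria, min_frac) is not None:  # step 2: remove bacterial reads
            summary["bacterial"] += 1
        else:                                                 # step 3: assign viral taxonomy
            label = best_hit(read, viruses, min_frac)
            if label is None:
                summary["unassigned"] += 1
            else:
                viral_counts[label] = viral_counts.get(label, 0) + 1
    return summary, viral_counts


# Hypothetical toy references and reads.
host = build_index({"human": ["ACGTACGTTGCAACGTTGCA"]})
bacteria = build_index({"Bacteroides": ["TTGACCGGAATTCCGGTTAA"]})
viruses = build_index({
    "crAss-like": ["GGCATCGATCGTAGCTAGCA"],
    "Microviridae": ["CCTAGGTACCATGGTCAGTC"],
})
reads = ["ACGTACGTTGCA", "GACCGGAATTCC", "CATCGATCGTAG", "TAGGTACCATGG", "AAAAAAAAAAAA"]
summary, viral_counts = run_pipeline(reads, host, bacteria, viruses)
print(summary)       # → {'host': 1, 'bacterial': 1, 'unassigned': 1}
print(viral_counts)  # → {'crAss-like': 1, 'Microviridae': 1}
```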

ACKNOWLEDGEMENTS
The authors would like to thank Sam Carding for creating Figure 1.
Declaration of personal interest: None.