“Protein aggregates” contain RNA and DNA, entrapped by misfolded proteins but largely rescued by slowing translational elongation

Abstract All neurodegenerative diseases feature aggregates, which usually contain disease‐specific diagnostic proteins; non‐protein constituents, however, have rarely been explored. Aggregates from SY5Y‐APPSw neuroblastoma, a cell model of familial Alzheimer's disease, were crosslinked and sequences of linked peptides identified. We constructed a normalized “contactome” comprising 11 subnetworks, centered on 24 high‐connectivity hubs. Remarkably, all 24 are nucleic acid‐binding proteins. This led us to isolate and sequence RNA and DNA from Alzheimer's and control aggregates. RNA fragments were mapped to the human genome by RNA‐seq and DNA by ChIP‐seq. Nearly all aggregate RNA sequences mapped to specific genes, whereas DNA fragments were predominantly intergenic. These nucleic acid mappings are all significantly nonrandom, making an artifactual origin extremely unlikely. RNA (mostly cytoplasmic) exceeded DNA (chiefly nuclear) by twofold to fivefold. RNA fragments recovered from AD tissue were ~1.5‐ to 2.5‐fold more abundant than those recovered from control tissue, similar to the increase in protein. Aggregate abundances of specific RNA sequences were strikingly differential between cultured T98G glioblastoma cells expressing APOE3 vs. APOE4, consistent with APOE4 competition for E‐box/CLEAR motifs. We identified many G‐quadruplex and viral sequences within RNA and DNA of aggregates, suggesting that sequestration of viral genomes may have driven the evolution of disordered nucleic acid‐binding proteins. After RNA‐interference knockdown of the translational‐procession factor EEF2 to suppress translation in SY5Y‐APPSw cells, the RNA content of aggregates declined by >90%, while reducing protein content by only 30% and altering DNA content by ≤10%. This implies that cotranslational misfolding of nascent proteins may ensnare polysomes into aggregates, accounting for most of their RNA content.

We recently developed improved click-chemistry crosslinking reagents and analytical software to identify adjacent proteins in aggregates, based on peptide-peptide crosslinking, and we applied them to define the protein-adherence network, or "contactome", of aggregates. We began with total, sarkosyl-insoluble aggregates isolated from SY5Y-APPSw human neuroblastoma cells (Ayyadevara et al., 2017), a model of familial Alzheimer's disease (fAD). This work revealed a complex, non-random structure of aggregates in which megahubs (very-high-connectivity hubs with ≥100 partners) and hub connectors (low-connectivity proteins linking large hubs) contribute functionally to the assembly of large aggregates (Balasubramaniam et al., 2019). We noted marked enrichment among megahubs for large structural proteins such as titin, ankyrins 1-3, nesprins 1-3, MAP1A, and other neurofilament proteins, purely as a consequence of their size. We also observed significant enrichment for a variety of nucleic acid-binding proteins (Balasubramaniam et al., 2019).

| The aggregate interactome
To compensate for protein size variation, we reassessed the aggregate interactome with normalization for protein length. The intraaggregate contactome, based on length-normalized connectivity (interaction number per residue), fell into 11 clusters comprising 24 "central hubs" (Figure 1; hubs with >4 edges, indicated by red circles). Four "hub connectors" of low degree (≤4 edges; green circles) bring together large hubs not otherwise connected. Remarkably, all 24 central hubs and 2 of 4 hub connectors are nucleic acid-binding proteins (Figure 1), revealing a striking enrichment for proteins that bind RNA (N = 16; p < 3E-150), or bind DNA (N = 6; p < 2E-20), or both (N = 2). This supports and extends our earlier observation that nucleic acid-binding proteins are especially susceptible to aggregation (Balasubramaniam et al., 2019) and led us to inquire whether their targets, RNA and DNA, might also be present in the entities we call "protein aggregates".

| Quantitation of aggregate nucleic acids from AD vs. control hippocampus, or human glioma cells
We isolated total sarkosyl-insoluble aggregates from hippocampal tissue of individuals diagnosed with Alzheimer's disease (AD) and confirmed by histopathological markers, and from age-matched controls (AMC) without dementia or AD-diagnostic markers (amyloid deposits or hyperphosphorylated tau). From equal initial weights of hippocampus, quantified recoveries of nucleic acids increased in AD aggregates, over those in controls, by 1.5- to 2-fold for DNA and ~twofold for RNA (Figure 2A,B). These elevations did not differ significantly from the difference in protein content of total aggregates, which was ~60% higher in AD than in controls, in close agreement with previous results (Ayyadevara, Balasubramaniam, Parcon, et al., 2016). Among normal controls, there was fourfold to sixfold more RNA than DNA in total sarkosyl-insoluble aggregates (p < 1E-5), regardless of the methods used for separation and quantitation (see Experimental Procedures). For AD samples, nucleic acid recoveries were higher and more variable, with roughly twice as much RNA as DNA (Figure 2B).
Apolipoprotein E (APOE) gene alleles are the leading genetic risk factors for AD, with at least fourfold increased AD risk for each APOE ε4 allele (abbreviated APOE4, ε4, or E4), and increased severity of aggregate-associated neuropathology for AD carriers of APOE4 alleles (Neu et al., 2017; Parker et al., 2005). We recently reported that the concerted transcription of autophagy genes is disrupted in the human glioblastoma cell line T98G, when overexpressing an APOE4 transgene rather than APOE3 (Parcon et al., 2018). To assess whether the greater nucleic acid content of aggregates in AD vs. AMC hippocampi may reflect the disruption of autophagy in AD, we separately analyzed aggregates from T98G cells that overexpress either an APOE3 or an APOE4 transgene.

KEYWORDS
aggregation, Alzheimer's disease, apolipoprotein E, beta amyloid, cotranslational misfolding, DNA, endogenous viruses, functional annotation, gene ontology, neurodegeneration, nucleic acid sequence, nucleic acids, protein aggregates, proteomics, retrotransposons, RNA

FIGURE 1 The "aggregate contactome" of proteins isolated from SY5Y-APPSw human neuroblastoma cells, an in vitro model of familial AD. The contactome was generated from proteomic data for cross-linked peptide pairs in sarkosyl-insoluble aggregates, using a modified version of X-link Identifier (Balasubramaniam et al., 2019; Du et al., 2011), requiring ≥10 spectral hits per protein observed in at least 2 of 3 replicate crosslinking experiments. Hits were normalized to hub length (amino acids in the most abundant isoform). Red circles highlight central hubs with 5 or more large-hub interactors; green circles show smaller hub-connectors, which join major hubs not otherwise connected. Other proteins of interest are indicated by dashed gray circles.

with the APOE4 allele overexpressed.
Most neuropathic aggregates are cytoplasmic, but may also be nuclear or extracellular. When we separated nuclei from cytoplasm of T98G cells prior to aggregate isolation, similar amounts of aggregate protein were recovered from each fraction. However, nuclear aggregates contained mostly DNA and only 40% as much RNA, while cytoplasmic aggregates contained ~10-fold more RNA than DNA (Figure 2D).

| Sequencing data for aggregate nucleic acids
To assess whether DNA and RNA fragments in aggregates are a random sampling from the genome and transcriptome, respectively, these nucleic acids were separately extracted from pooled aggregate preparations from either AD or age-matched control (AMC) individuals (APOE ε3/ε4 heterozygotes; 3 subjects per group), and their sequences determined (UT Southwestern Genomics Core, Dallas TX). DNA fragments were analyzed using a ChIP-seq protocol, suited to detection of site specificity, and were then mapped to the human genome. RNA fragments underwent a dual screening, comprising a test of peak significance (similar to ChIP-seq) followed by RNA-seq analysis of differential abundance, and mapping to the human genome.
In both the direction and magnitude of the RNA-abundance shift, the influence of Alzheimer's disease was less consistent and so appeared less pronounced on average, than that of the APOE allele. This is almost certainly due to genetic and environmental variance among AD and AMC subjects (Ayyadevara, Balasubramaniam, Parcon, et al., 2016), in contrast to the single transgene that distinguishes T98G/E3 from T98G/E4 cells. Because all human subjects considered in the present comparison were APOE3/E4 heterozygotes, the AD effect could not have arisen from a difference in APOE genotypes. The prevailing reduction in RNA content of E4 aggregates, for the most differentially expressed genes, may reflect transcriptional suppression of TFEB targets by APOE4 (Parcon et al., 2018), rather than an impact of the APOE allele on aggregation per se.
Mapping the RNA transcripts to the human genome revealed a remarkable cluster of at least 20 intergenic loci in a relatively silent segment (21p11.2-21p12) of the chromosome 21 short arm (Supplementary Figure S1). While these loci are not differentially represented for the most part, either between AD and AMC or between APOE3 and APOE4, they include 2 loci with the highest AD/AMC ratios we observed, 9.1 and 17.2 (each chi-squared p < 1E-6, Table 2).

| Annotation enrichment meta-analysis of RNA fragments in aggregates
Although we had expected the RNA fragments embedded in aggregates to comprise a random selection from the transcriptome, gene ontology and pathway term enrichment analysis (functional-  (Table 3B).
The very existence of these clusters, and their marked overlap between meta-analyses derived from gene lists of very different origin (aggregates from cultured glioblastoma cells vs. human hippocampi) despite only 11 common members, suggests that the underlying aggregate-RNA fragments are strikingly nonrandom in nature. The specific annotation terms that were most enriched (Table 3C) are likely to reflect the nature of proteins that coalesce in AD and AD-model aggregates, which include terms (fold enrichment) such as protein refolding (70) (16), and neurodegeneration (9).

| What mechanisms account for RNA and DNA fragments co-aggregating with proteins?
What does the inclusion of DNA and RNA fragments imply about aggregates or the mechanism of aggregation? Clearly, there are proteins in aggregates that evolved to bind both nucleic acids and proteins. RNA assumes many transient structures constrained chiefly by its duplex regions, which form A-helices. DNA, in addition to its repertoire of relatively stable structures (A-, B-, and Z-duplex helices, triplex, and G-quadruplex forms), in the course of replication and transcription can adopt as wide a range of single-stranded structures as RNA. Affinity for nucleic acids, as well as the protein constituents of multimeric RNA- and DNA-binding complexes, may require protein structures that are at least partially disordered (Zhang et al., 2013) and/or are highly polar, which in turn may favor aggregation (Babu, 2016; Kovacech et al., 2010). DNA-binding proteins include histones; high-mobility-group (HMG) proteins; constituents of DNA replication, transcription, and repair complexes (e.g. topoisomerases, helicases and polymerases; transcription factors, co-factors, and repressors); and proteins that stabilize or remodel chromatin (Figure 1) (Li et al., 2006; Mitchell & Tjian, 1989; Stoyanova et al., 2009; Wade, 2001). A key feature shared by many DNA-binding proteins, in addition to structural instability, is an excess of positively charged residues, allowing formation of electrostatic bonds to the negatively charged phosphates that link DNA-backbone sugars.
G-quadruplex binding proteins (G4BPs) could be responsible for the presence of certain DNA and RNA segments in aggregates. Of the 39 DNA loci listed in Table 1, 13 (33%) had predicted G-quadruplex-forming sequences at >100-fold higher likelihood than predicted at random, whereas 18 (46%) were <20-fold above random expectation (Supplementary Table S4 and Figure S2A; note that numbers differ slightly due to binning). This partition into G4-rich and G4-poor regions suggests that a subset of DNA fragments may have been "recruited" into aggregates by G4BPs. A similar but less extreme split was observed for RNA fragments listed in this table: 15 of 51 peaks (30%) had ratios >100, vs. 11 (22%) with ratios <20 (Table S2 and Figure S2B).

As a negative control, the C. elegans genome was screened with identical parameters, yielding zero viral reads.

For 5 of the 10 viruses shown in Table 4, viral RNA fragments were significantly enriched in AD over AMC (at p < 0.01 to p < 0.0001), relative to the 1.6-fold AD enrichment of aggregate proteins (Ayyadevara, Balasubramaniam, Parcon, et al., 2016), for a

Note: All peaks shown were significant at p < 0.01; peak-coincident genes (at zero distance from peaks) are indicated by bold font. Read ratios (AD/AMC) and p values for AD-AMC differences were not corrected for 1.6-fold higher RNA recovery from AD relative to AMC hippocampus, since all normalized RNA-seq libraries used 1 µg RNA. "AD-agg enriched" indicates proteins that were also found to be significantly enriched in AD aggregates relative to controls (Ayyadevara, Balasubramaniam, Parcon, et al., 2016).

| Cotranslational aggregation
As noted above, RNA reads substantially exceeded DNA reads by twofold to fivefold (Figure 2A). The propensity for nucleic acid-binding proteins to be inherently disordered, suggested above as an explanation for entrapment of nucleic acids in aggregates, is not expected to differ greatly between RNA- and DNA-binding proteins. We propose another mechanism, specific to RNA, that would account for the greater abundance of RNA in aggregates: cotranslational misfolding. Among the RNAs identified in AD-model aggregates (  (Turner & Varshavsky, 2000), indicating that chaperone protection is highly fallible. We wondered whether the remarkable abundance in aggregates of diverse RNA fragments, the great majority of which contain coding sequences, might be a clue that cotranslational aggregation occurs when misfolded, nascent proteins are neither prevented from misfolding nor degraded, prior to their coalescence with other misfolded proteins to form insoluble aggregates.
If this is the case, then interventions that arrest or delay translation should sharply reduce the aggregate content of RNA fragments.
We used shRNA knockdown of EEF2 mRNA, reducing its steady-state level by 33% (Figure 3A,B) to attenuate protein translation in SY5Y-APPSw human neuroblastoma cells. Suppression of EEF2 has been shown to extend lifespan, reduce stress response, and improve the balance of protein quality control (Anisimova et al., 2018; David et al., 2010; Tavernarakis, 2008; Turner & Varshavsky, 2000  cells as follows. In SY5Y-APPSw cells, shRNA targeting EEF2 eliminated over 90% of the RNA entrapped in aggregates (p < 0.0001; Figure 3C,D), far exceeding the 33% efficacy of EEF2 knockdown (Figure 3B). At the same time, this RNAi exposure had little or no effect on aggregate DNA content (Figure 3E,F), but reduced aggregate protein by 30% (p < 0.01; Figure 3G,H). In SY5Y-APPSw cells treated for 4 h with MG132, a cell-permeant proteasome inhibitor, aggregates increased 20-30%; however, this rise was not accompanied by any increase in aggregate RNA fragments (Figure S3). This suggests that the reduction in aggregate burden per se cannot account for the decline in aggregate RNA after EEF2 knockdown.

| DISCUSSION
Pathognomonic complexes associated with neurodegenerative diseases, including Alzheimer's, Parkinson's, and Huntington's diseases, are widely termed "protein aggregates" because their diagnostic antigenic markers are proteins. Whether these aggregates also contain other components, however, is a question that has not been adequately addressed. We were aware that some amalgamations of cell debris that accumulate with aging, known as lipofuscin granules, contain a complex mixture of oxidized, glycated and carbonylated proteins, lipids, and possibly other carbohydrates; however, nucleic acids were only rarely noted among their constituents (Cindrova-Davies et al., 2018; Nowotny et al., 2014). Ginsberg et al. (1998, 1999) reported that 80% of neurofibrillary tangles and 55% of senile (amyloid) plaques can be stained with acridine orange, implying the presence of RNA. Numerous studies have implicated nucleic acid binding by mammalian prion-like protein, PrP (Cordeiro et al., 2014; Gomes et al., 2012; Macedo et al., 2012; Silva et al., 2008), and the evidence that this extends to other neurodegenerative-disease seed proteins has been reviewed (Cordeiro et al., 2014).
We were led from the results of proteomic "contactome" studies, intended to define the molecular architecture of aggregates (i.e., which proteins adhere to which other proteins), to investigate the nature, extent, and specificity of nucleic acids incorporated into aggregates. In each of these three respects, the results were unexpected. We observed two- to fivefold more RNA than DNA in aggregates, whether isolated from AD or control hippocampus (Figure 2A).

Note: Values in parentheses failed to meet one or more thresholds, but are included here for purposes of comparison. AD RNA counts differ from AMC by Chi-squared test: **p<0.001; ***p<0.0001.

When we compared aggregates isolated from glioblastoma cells overexpressing an APOE3 vs. APOE4 transgene, sequences with the most differential representation were quite consistently more abundant in APOE3-bearing cells. We believe this very likely reflects the surprising ability of APOE4 to enter nuclei and bind competitively to the CLEAR/E-box motifs recognized by transcription factor EB (TFEB), thereby inhibiting expression of autophagy and lysosomal genes (Parcon et al., 2018). Not surprisingly, >90% of aggregate DNA originated from nuclear aggregates, while RNA in aggregates was predominantly of cytoplasmic origin.
Only a small fraction of aggregate-associated nucleic acids (0.09-0.15% of RNA reads, 0.33% of a smaller set of DNA reads) appears to be of viral origin, although these totals may be underestimated due to as-yet-uncatalogued and mutated viruses or endogenous retroposons (Sanjuan et al., 2010). The striking 2.3-fold enrichment of viral RNA sequences in AD aggregates relative to controls, vs. only 1.15-fold for viral DNA fragments (see Table 4), is consistent with possible roles of viral infection and/or transcriptional activation in the etiology of Alzheimer's disease (Balin & Hudson, 2018; Irish et al., 2009; Kreutz, 2002; Kristensson, 1992; Lin et al., 1997; Romeo et al., 2019; Steel & Eslick, 2015). It is also possible that the observed enrichments reflect secondary effects of Alzheimer's pathology, including chronic low-grade inflammation (Majde, 2010), insofar as
Nevertheless, viral RNA and DNA comprise very small fractions of the nucleic acids recovered from aggregates. From an evolutionary perspective, however, they may ultimately be responsible for the perseverance in our genomes of proteins with high levels of disorder and high probability of aggregation, provided only that disorder contributes to the ability to bind viral nucleic acids and/or to sequester them in aggregates.
The observed data are consistent with a scenario in which en-
Sequences with G-quadruplex-forming potential can be recognized by their binding proteins based on singular structural features; they thus often serve as recognition sites for critical proteins with key surveillance or regulatory functions, such as telomere-binding proteins, viral-replication proteins, and gene promoter regions (Brazda et al., 2014).
The observation of consistent functional-annotation terms and clusters, both within each aggregate type and between the two sources of aggregates, confirms that the particular RNA species found in aggregates are not a random sampling from the transcriptome, but it does not explain the basis for their enrichment. We propose two routes by which nucleic acids can be incorporated into aggregates that form either as a result of aging per se or due to an age-dependent pathology such as Alzheimer's disease: (1) "hitchhiker" or "bystander" entrapment of DNA and RNA, when they are bound by proteins that become misfolded and consequently enmeshed in aggregates; and (2) cotranslational misfolding of proteins in the midst of their translation, which might be expected to also ensnare ribosomes and the mRNAs they are translating. The first mechanism is supported by the remarkably high abundance of DNA- and RNA-binding proteins in the aggregate interactome (Figure 1).
The second mechanism is most compellingly supported by the decimation (>10-fold reduction) of aggregate RNA content following shRNA knockdown of the translational procession factor EEF2.
We suspect that cotranslational aggregation occurs preferentially in pathways or processes that involve enzymes with multiple partners, and/or several nucleic acid-binding proteins-thus accounting for the highly significant enrichments observed in aggregated RNA, for genes annotated with specific clusters of descriptive terms. Note that neither of these explanations attributes a primary or causal role to nucleic acids, through which they would "drive" aggregate accrual.
Rather, they are collateral casualties due to misfolding of their attached proteins.
Why did EEF2 knockdown have a far greater effect on RNA content than protein content of aggregates? This is actually the expected result if cotranslational aggregation accounts for only a minor fraction of the protein deposition in aggregates, but is responsible for 90% of their RNA content. Nascent proteins may misfold transiently during translation, but even mature proteins can misfold over time, as a consequence of post-translational disturbances such as oxidation, phosphorylation, or alkylation, and other temperature- or time-dependent processes that favor misfolding of pre-existing proteins.
Such processes would continue with little prospect of reversal, for all previously-synthesized proteins-unabated by translational arrest. RNA, however, may only appear in aggregates when it is bound by a misfolded (and hence aggregation-prone) protein, or when the RNA is in the process of translation into a nascent protein that has a high probability of transient misfolding and aggregation. Our observations imply that cotranslational aggregation is the predominant route, accounting for at least 90% of aggregate RNA.
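The reasoning in this paragraph can be illustrated with a little arithmetic. The sketch below assumes a simple two-pool model that is not part of the reported analysis: each aggregate component (RNA or protein) has a cotranslational share, and EEF2 knockdown blocks some unknown fraction of cotranslational entry, so that the observed decline equals share × blocked fraction. The function names and the `blocked_fraction` parameter are hypothetical; the 90% and 30% declines are the values reported above.

```python
def cotranslational_share(observed_decline, blocked_fraction=1.0):
    """Cotranslational share implied by an observed decline.

    Hypothetical two-pool model: decline = share * blocked_fraction.
    With the default blocked_fraction of 1.0 (knockdown blocks all
    cotranslational entry), the result is a lower bound on the share,
    because any smaller blocked_fraction implies a larger share.
    """
    if not 0.0 < blocked_fraction <= 1.0:
        raise ValueError("blocked_fraction must be in (0, 1]")
    return observed_decline / blocked_fraction


def fold_reduction(observed_decline):
    """Convert a fractional decline (e.g. 0.90) into a fold reduction."""
    return 1.0 / (1.0 - observed_decline)


# Declines reported after EEF2 knockdown: >90% for RNA, 30% for protein.
rna_share = cotranslational_share(0.90)
protein_share = cotranslational_share(0.30)
rna_fold = fold_reduction(0.90)

print(f"cotranslational share of aggregate RNA:     >= {rna_share:.2f}")
print(f"cotranslational share of aggregate protein: >= {protein_share:.2f}")
print(f"RNA fold reduction: about {rna_fold:.0f}-fold")
```

Under these assumptions, a 90% decline in RNA alongside a 30% decline in protein is what one would expect if most aggregate RNA, but only a minority of aggregate protein, enters by the cotranslational route.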
Our data suggest that proteostasis in SY5Y-APPSw cells, which are subjected to chronic ER stress by continual generation of Aβ1-42, is normally insufficient to prevent cotranslational aggregation. However, even moderate alleviation of that stress appears to shift the balance back to sustainable translational proteostasis.
Translational inhibition has been reported to lower chronic inflammation (Mazumder et al., 2010), which may be a consequence of reduced protein aggregation, augmented by a disproportionate decrease in aggregate RNA.

| CONCLUSIONS
"Protein aggregates" contain nucleic acid constituents that are highly nonrandom in sequence-making it unlikely that they are artifacts, but instead implying that they contain protein-binding features (including G-quadruplexes) that might pull them into aggregates. The number and variety of viral sequences found in aggregates suggests that there may be an evolutionary advantage (i.e., antiviral protection) to the synthesis of nucleic acid binding proteins that readily misfold and thus sequester viral genomes in aggregates. Significant enrichment of viral sequences in AD aggregates, relative to controls, is consistent with roles for integrated viruses in AD susceptibility. The preferential enrichment of RNA over DNA in aggregates may implicate a mechanism specific to transcripts: cotranslational aggregation of polysomes during initial misfolding of nascent polypeptides. This process would likely be quite sensitive to the balance between translation rate and chaperone-mediated refolding capacity. A critical role of cotranslational entrapment is supported by our observation that shRNA knockdown of the translation elongation factor EEF2, although only 25-35% effective, selectively eliminates at least 90% of RNA in aggregates.

| Isolation of sarkosyl-insoluble aggregates
Aggregates were prepared from Alzheimer's Disease (AD) vs.
Supernatant protein was quantified and each sample (0.6-1.0 mg) was centrifuged 15 min at 13,000 × g. Supernatants (soluble protein) were removed, and to each insoluble pellet the same lysis buffer was added plus 1% (v/v) sarkosyl, and mixed well. Samples were centrifuged 20 min at 100,000 × g; supernatants and pellets were recovered as "sarkosyl-soluble aggregates" and "sarkosyl-insoluble aggregates", respectively.

| Immunoprecipitation of amyloid beta and tau aggregates
AD and AMC hippocampal tissue samples were pulverized as described above. After removal of debris (centrifugation for 5 min at 1400 × g), protein was quantified by the Bradford protein assay.

| Aggregate contactome generation
Insoluble aggregates, isolated from SY5Y-APPSw cells as above, were cross-linked following procedures described previously (Balasubramaniam et al., 2019). In brief, purified aggregates were rinsed, cross-linked with modified click reagents, digested with trypsin, and the linked peptide pairs were affinity purified using streptavidin-coated beads to capture the biotin-coupled crosslinking moiety. Cross-linked peptide pairs were identified from high-resolution LC/MS-MS raw data files, using a modified version of X-link Identifier (Balasubramaniam et al., 2019; Du et al., 2011). X-link Identifier outputs were analyzed with the Gephi software package to calculate the degree (number of interacting partners) of each hub. Because high-molecular-weight proteins (e.g., titin) have greater potential to interact with other proteins, spectral hits for each hub were normalized, i.e. divided by the length of that hub protein in amino acids. Identified contactome proteins were categorized by degree, as described previously (Balasubramaniam et al., 2019). Proteins with a high normalized degree (number of interacting partners divided by length in amino acids) or classified as hub-connectors (connecting 2 or more hub proteins that are not otherwise connected) were pursued by further graph modeling; the Cytoscape package (Shannon et al., 2003) was used with default parameters to construct and visualize graphs.
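The length normalization described above can be sketched in a few lines. This is a minimal illustration, not the modified X-link Identifier or Gephi workflow: the function name `degree_table`, the input format (a list of crosslinked protein pairs plus a length lookup), and the toy protein names and lengths are all assumptions made for the example.

```python
from collections import defaultdict


def degree_table(crosslinked_pairs, protein_lengths):
    """Return {protein: (degree, degree per residue)}.

    degree = number of distinct interacting partners;
    normalized degree = degree / protein length in amino acids.
    """
    partners = defaultdict(set)
    for a, b in crosslinked_pairs:
        if a != b:  # ignore crosslinks within the same protein
            partners[a].add(b)
            partners[b].add(a)
    return {
        protein: (len(linked), len(linked) / protein_lengths[protein])
        for protein, linked in partners.items()
    }


# Toy data (hypothetical): one very long protein and one short protein.
pairs = [("LONG", "A"), ("LONG", "B"), ("LONG", "C"), ("LONG", "D"),
         ("SHORT", "A"), ("SHORT", "B")]
lengths = {"LONG": 30000, "SHORT": 300, "A": 500, "B": 500, "C": 500, "D": 500}

table = degree_table(pairs, lengths)
ranked = sorted(table, key=lambda p: table[p][1], reverse=True)
for protein in ranked:
    degree, per_residue = table[protein]
    print(f"{protein:6s} degree={degree} per_residue={per_residue:.5f}")
```

In this toy case the long protein has the highest raw degree but the lowest normalized degree, which is the size effect that normalization is meant to remove.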

| Isolation and quantitation of nucleic acids in aggregates
For sequencing of nucleic acid fragments from isolated aggregates, RNA and DNA were extracted from sarkosyl-insoluble material isolated from cultured cells, or from AD and AMC hippocampus, using the Qiagen AllPrep kit following manufacturer's instructions and a protocol in which this kit was shown to recover even small nucleic acid fragments (Pena-Llopis & Brugarolas, 2013).
To quantify DNA and RNA trapped in sarkosyl-insoluble aggregates, nucleic acids were extracted and assayed by multiple protocols, with consistent results. These consisted of (1)  Ctr., TR118) to isolate RNA, DNA, and protein in a single protocol.
Figure 2 data were obtained by method (3) above.

| RNA-seq and ChIP-seq analyses
All RNA-seq and ChIP-seq sequencing was performed by the UT Southwestern Genomics Core, and the data were analyzed using the CLC Genomics Workbench. We employed ChIP-seq to evaluate DNA-fragment specificity; thus, the primary analytic value is the number of significant peaks, with peak validity assessed by an E value relative to a flat distribution (peak absence). RNA-seq was preceded by peak validation, just as for ChIP-seq. Subsequently, valid-peak reads that map uniquely to exons ("unique exon reads") were summed as our expression metric, and were used to determine differential expression between groups.
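The "unique exon reads" metric can be illustrated as follows. This is only a sketch of the counting logic, not the CLC Genomics Workbench implementation; the record fields (`gene`, `in_valid_peak`, `maps_uniquely`, `in_exon`), the function names, and the toy reads are hypothetical.

```python
from collections import Counter


def unique_exon_reads(read_records):
    """Count, per gene, reads that lie in a valid peak, map uniquely, and hit an exon."""
    totals = Counter()
    for rec in read_records:
        if rec["in_valid_peak"] and rec["maps_uniquely"] and rec["in_exon"]:
            totals[rec["gene"]] += 1
    return totals


def abundance_ratio(totals_a, totals_b, gene):
    """Ratio of counts between two groups; None if the denominator is zero."""
    if totals_b[gene] == 0:
        return None
    return totals_a[gene] / totals_b[gene]


def read(gene, peak=True, unique=True, exon=True):
    """Build one toy read record (hypothetical format)."""
    return {"gene": gene, "in_valid_peak": peak,
            "maps_uniquely": unique, "in_exon": exon}


# Toy reads: GENE_X is a placeholder, not a gene from the study.
group_a = [read("GENE_X")] * 6 + [read("GENE_X", unique=False)] * 3
group_b = [read("GENE_X")] * 2 + [read("GENE_X", exon=False)] * 5

counts_a = unique_exon_reads(group_a)
counts_b = unique_exon_reads(group_b)
ratio = abundance_ratio(counts_a, counts_b, "GENE_X")
print(counts_a["GENE_X"], counts_b["GENE_X"], ratio)
```

Reads that fail any one of the three filters are excluded before the ratio is formed, so multi-mapped or intronic reads do not contribute to the expression metric.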
The nucleic acid contig assemblies were quite consistent in size, 579 ± 34 (SD) base pairs in length for DNA peaks, and 291 ± 31 (SD) for RNA-fragment contigs. The efficiency of ChIP-seq and RNA-seq fragment cloning protocols, employed prior to sequencing, is quite sensitive to fragment size. Under normal ChIP-seq protocols, fragment sizes would be determined by shearing or sonication, size selection by cloning vector, and/or manual size selection. However, in the case of aggregate nucleic acid fragments, other factors may be influential, such as the size, age, and intracellular location of individual aggregates.

| Viral sequence analysis
We employed a modified version of VirTect to scan DNA and RNA fragment sequences from human AD and AMC (age-matched control) hippocampi. VirTect is a pipeline script that calls a sequence of RNA-seq pattern-matching routines (Khan et al., 2019). VirTect retrievals of viral matches to aggregate nucleic acid reads, from 3 AD and 3 AMC brain samples, were filtered using the following parameters (https://github.com/WGLab/VirTect/blob/master/README.

| G-quadruplex analyses
We employed two programs to screen RNA and DNA sequences for G-quadruplex-forming regions: G4CatchAll (Doluca, 2019) and QGRS Mapper (Kikin et al., 2006). Both strands were scanned for each DNA-fragment sequence, but only strands with G4-forming potential were pursued in subsequent analyses. The following parameters were used for G4CatchAll: G3L (loop limit) was set to 1.3; G2L (allowing 2-G loops) was set to 1.3; G4H was enabled (applies the G4Hunter algorithm for final evaluation). The following parameters were used for QGRS Mapper: Max. Length 30; Min G-Group 2; Loop size 0-36.
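To make the idea of scanning both strands for G4-forming potential concrete, the sketch below uses a widely quoted textbook motif (four runs of at least three G's separated by loops of 1-7 bases). It is an assumption-laden illustration only: it is not the G4CatchAll or QGRS Mapper algorithm, it does not use the parameters listed above, and the names `G4_PATTERN`, `reverse_complement`, and `find_g4` are hypothetical.

```python
import re

# Canonical textbook motif: G3+ N1-7 G3+ N1-7 G3+ N1-7 G3+ (an assumption,
# much stricter than the Min G-Group 2 / Loop 0-36 settings described above).
G4_PATTERN = re.compile(r"(?:G{3,}[ACGT]{1,7}){3}G{3,}")

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_g4(seq):
    """Return (strand, start, motif) for each non-overlapping match on either strand.

    Start positions on the '-' strand are relative to the reverse complement.
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for match in G4_PATTERN.finditer(s):
            hits.append((strand, match.start(), match.group()))
    return hits


plus_hits = find_g4("TTGGGAGGGTGGGAGGGTT")   # G-rich on the given strand
minus_hits = find_g4("AACCCTCCCACCCTCCCAA")  # C-rich, so G-rich on the other strand
print(plus_hits)
print(minus_hits)
```

A C-rich fragment is reported on the "-" strand because its reverse complement carries the G-runs, which is why both strands need to be scanned for DNA.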

| Statistical analyses
Inter-group differences were tested for significance by 2-tailed Behrens-Fisher heteroscedastic t tests, unless otherwise indicated.
These conservative tests are appropriate to small-sample comparisons in which the intra-group variance is not well estimated.
Comparisons of ratios generally employed Yates chi-squared nondirectional tests, substituting 2-tailed Fisher Exact tests as required to meet numerical constraints. This conservative replacement is stated in the text but is not made explicit (line by line) in tables to conserve space.
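For readers unfamiliar with these tests, the sketch below shows the standard formulas for a Yates-corrected chi-squared test on a 2×2 table and for the heteroscedastic (Welch) t statistic with Welch-Satterthwaite degrees of freedom. It uses only the Python standard library, so the t test stops at the statistic and degrees of freedom rather than a p value. The function names and all numbers are illustrative assumptions, not data or code from this study.

```python
import math
from statistics import mean, variance


def yates_chi_squared(a, b, c, d):
    """Yates-corrected chi-squared for the 2x2 table [[a, b], [c, d]].

    Returns (chi2, p), with p from the chi-squared distribution with 1 df.
    """
    n = a + b + c + d
    corrected = max(0.0, abs(a * d - b * c) - n / 2.0)
    chi2 = n * corrected ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(chi2 / 2.0))  # survival function for 1 df
    return chi2, p


def welch_t(x, y):
    """Heteroscedastic t statistic and Welch-Satterthwaite degrees of freedom."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


# Illustrative counts only (e.g. reads at a locus vs. all other reads, two groups).
chi2, p = yates_chi_squared(10, 20, 30, 40)
t, df = welch_t([1, 2, 3], [2, 4, 6, 8])
print(f"chi2={chi2:.4f} p={p:.3f}  t={t:.4f} df={df:.2f}")
```

When any expected cell count is small, the chi-squared approximation becomes unreliable, which is the usual reason for substituting a Fisher Exact test as described above.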

CONFLICTS OF INTEREST
The authors declare no competing or conflicting interests.

AUTHOR CONTRIBUTIONS
RJSR and SA designed the study; MB and SA performed aggregate

DATA AVAILABILITY STATEMENT
All data generated or analyzed during this study are included in this published article and its supplementary information files. Any reasonable request for additional data will be honored.