A common classification framework for histone sequence alterations in tumours: an expert consensus proposal

The description of genetic alterations in tumours is of increasing importance. In human genetics, and in pathology reports, sequence alterations are given using the Human Genome Variation Society (HGVS) guidelines for the description of such variants. However, there is less adherence to these guidelines for sequence variations in histone genes. Due to early cleavage of the N-terminal methionine in most histones, the description of histone sequence alterations follows its own nomenclature and differs from the HGVS-compliant numbering by omitting this first amino acid. Next generation sequencing reports, however, follow the HGVS guidelines and, as a result, an unambiguous description of sequence variants in histones cannot be provided. The coexistence of these two nomenclatures leads to confusion for pathologists, oncologists, and researchers. This review provides an overview of tumour entities with sequence alterations of the H3-3A gene (HGNC ID = HGNC:4764), highlights the problems associated with the coexistence of these two nomenclatures, and proposes a standard for the reporting of histone sequence variants that allows an unambiguous description of these variants according to HGVS principles. We hope that scientific journals will adopt the new notation, and that both geneticists and pathologists will include it in their reports. © 2021 The Authors. The Journal of Pathology published by John Wiley & Sons, Ltd. on behalf of The Pathological Society of Great Britain and Ireland.


Introduction
The diagnosis and management of tumours increasingly depends on the demonstration of sequence variants (colloquially known as mutations and other genetic alterations) in genes related to their causation. The discovery of such tumour-associated sequence variants not only facilitates the diagnosis of tumour types that are difficult to classify based on histomorphology and immunohistochemistry alone but can also be of prognostic and therapeutic value. To unify the description of such variants and to make new discoveries available for clinicians and pathologists, the Human Genome Variation Society (HGVS) has developed and maintained a nomenclature for the standardised and unambiguous description of sequence variants [1][2][3][4]. This has become the universally accepted nomenclature for all next generation sequencing (NGS) and whole genome sequencing reports and provides the basis for consensus recommendations for molecular pathology [5] and clinical oncology [6].
Biologists and clinicians around the world have applied these guidelines to report new discoveries in DNA sequence variation, with the apparent exception of those working on histones, which present unique challenges.
Histones are a family of basic proteins that associate with DNA and allow condensation to form chromatin. The four canonical histones H2A, H2B, H3, and H4 form dimers between H2A and H2B (H2A-H2B) and between H3 and H4 (H3-H4). When pairs of these dimers are bound to DNA, they form a histone octamer, which corresponds to a nucleosome. A chain of nucleosomes is wrapped in a spiral called a solenoid, in which the nucleosomes are further stabilised by the linker histone. Histones influence DNA transcription by regulating RNA polymerase II's access to DNA. Consequently, complex cell processes can be influenced by histone modifications and the related alterations in gene expression. There are isoforms for the core histones H3, H2A, and H2B, as well as for the H1 linker histone, which show alterations of either a few amino acids or larger domains [7]. These histone isoforms are further subdivided into replicative or replacement types. Whereas replicative types are only highly expressed during the S-phase, replacement histone types are typically synthesised in a cell-cycle-independent manner and incorporated into chromatin. Since replacement histone types localise to specific chromatin domains, they can lead to differential expression in different tissue compartments. Therefore, a single histone isoform can fulfil different functions. Replacement types H3.3 and H2A.Z are involved in the regulation of transcription, whereas the replicative isoforms H3.1 and H3.2 are incorporated during replication and the replicative H3.5 contributes to DNA condensation during spermatogenesis [9,10].
Well-documented discrepancies in histone sequence notation reflect the historical roots of this scientific field. The amino acid (AA) sequence of histones was first deciphered in the 1960s and 1970s [11][12][13], using chromatographic separation methods, enzymatic and chemical cleavage, followed by Edman degradation to determine the sequences of the functional histone proteins. After histone genes had been sequenced, it became apparent that the AA encoded by the first codon of histone genes, a methionine, is post-translationally cleaved from most proteins and could therefore not be detected with previous protein-based methods. However, the previously established numbering of the AA sequences remained the standard in this field. Therefore, the sites of histone sequence variants are typically reported lacking one AA, relative to the DNA-based AA numbering [11,14]. This discrepancy is particularly problematic at sites where two identical AAs follow each other in the protein sequence. In human H3.3 [15].
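The one-residue offset between the two numbering systems, and the ambiguity that arises where identical residues are adjacent, can be illustrated with a minimal Python sketch. It assumes that the initiating methionine is the only difference between the legacy and the HGVS numbering. The 40-residue N-terminal sequence is an illustrative assumption that should be checked against the RefSeq protein record before any real use, and all function names are hypothetical.

```python
# First 40 residues of the H3.3 precursor, INCLUDING the initiating
# methionine (illustrative assumption; verify against the RefSeq record).
H33_NTERM = "MARTKQTARKSTGGKAPRKQLATKAARKSAPSTGGVKKPH"


def legacy_to_hgvs(position: int) -> int:
    """Legacy numbering omits Met1, so the HGVS position is one higher."""
    return position + 1


def residue_at_hgvs(position: int, sequence: str = H33_NTERM) -> str:
    """Return the one-letter residue at a 1-based HGVS position."""
    return sequence[position - 1]


def label_is_ambiguous(position: int, sequence: str = H33_NTERM) -> bool:
    """True if a bare label (e.g. 'G34') names the same residue whether
    the number is read as an HGVS position or as a legacy position."""
    read_as_hgvs = residue_at_hgvs(position, sequence)
    read_as_legacy = residue_at_hgvs(legacy_to_hgvs(position), sequence)
    return read_as_hgvs == read_as_legacy


for label_position in (27, 34, 37):
    print(label_position, label_is_ambiguous(label_position))
```

Under these assumptions, a label at position 27 can be checked against the sequence (arginine at HGVS position 27, lysine at 28), whereas labels at positions 34 and 37 fit both readings because the neighbouring residues are identical.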
While in some instances the use of legacy AA numbering may have little significance, it is not sufficient for use in tumour classification. For histone sequence variants, the coexistence of two numbering systems (nomenclatures) can lead to confusion for pathologists, researchers, and clinicians alike. The International Agency for Research on Cancer (IARC) noted this issue during production of the 5th edition of the WHO Classification of Tumours, published as the widely used WHO Blue Books (http://whobluebooks.iarc.fr), and decided to explore the matter further in the hope of providing precise and unequivocal description of tumour-associated sequence variants in histone genes to avoid the potential for errors based on the coexistence of two different nomenclatures.
We therefore propose the use of the following notation: in line with existing guidance for reporting genetic alterations, histone sequence variants should be reported using the HGVS recommended format. First, reference sequences must be specified for the transcript (mRNA) and the corresponding encoded protein. Once that has been done, the gene symbol may be used to indicate the gene that harbours the DNA sequence. For example, the transcript and protein reference sequences for the H3-3A gene are RefSeq [16] entries NM_002107.7 and NP_002098.1, respectively. Based on these reference sequences, a variant in H3-3A at the transcript and protein level would be described as H3-3A:c.103G>A p.Gly35Arg. This variant description is standards-compliant and corresponds to the variant commonly known as H3.3 G34R, in which the AA number is based on the legacy protein sequence and the amino acids are given in single-letter code. This legacy designation is also used to label antibodies raised against specific protein isoforms and variants. It should be noted that there is currently no agreed HGVS nomenclature syntax for the reporting of legacy/common variant descriptions. For now, the reporting format from DNA sequencing should be H3-3A:c.103G>A p.Gly35Arg (G34R), noting that there is a space, but no punctuation, between the DNA, protein, and legacy protein variant descriptions; the legacy variant is often the one identified by immunohistochemistry. The parentheses are used to indicate the legacy protein sequence description.
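The reporting format described above can be sketched in a few lines of Python. This is an illustration only, not a validated tool: the function names are hypothetical, the sketch handles simple missense descriptions of the form p.Xxx00Yyy, and it assumes that the legacy position is always one lower than the HGVS position (which does not hold for histones that retain the initiating methionine).

```python
import re

# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

MISSENSE = re.compile(r"p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})")


def legacy_label(protein_change: str) -> str:
    """Derive a legacy label such as 'G34R' from 'p.Gly35Arg'.

    Assumes the initiating methionine is cleaved (offset of one).
    """
    match = MISSENSE.fullmatch(protein_change)
    if match is None:
        raise ValueError(f"unsupported protein description: {protein_change}")
    ref, position, alt = match.groups()
    return f"{THREE_TO_ONE[ref]}{int(position) - 1}{THREE_TO_ONE[alt]}"


def format_report(gene: str, cdna_change: str, protein_change: str) -> str:
    """Join gene, DNA, protein and legacy descriptions: a colon after the
    gene symbol, then single spaces and no other punctuation."""
    legacy = legacy_label(protein_change)
    return f"{gene}:{cdna_change} {protein_change} ({legacy})"


print(format_report("H3-3A", "c.103G>A", "p.Gly35Arg"))
```

Deriving the legacy label from the HGVS protein description, rather than typing it separately, is a design choice that keeps the two numbers from drifting apart in a report.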
The WHO Classification of Tumours will adopt this nomenclature within its publications both online and in print. Molecular pathology reports should also cite the version of the human genome used, e.g. GRCh37 or GRCh38, when a variant is described in terms of a genome position following genome or exome sequencing, as well as the method used to obtain the results [17,18].
We established the extent of the nomenclature problem for sequence variants of histones using these conditions, parameters, and assumptions:
• Nucleotide and AA sequence variants are described in accordance with the current HGVS recommendations (Version 20.05) for the description of sequence variants [1]. In line with these recommendations, the term 'sequence variant' is used instead of 'mutation'. In agreement with current recommendations, the three-letter AA code is used to reduce reporting errors. To simplify the presentation of HGVS-compliant protein sequence variant descriptions, an assumption is made that all such variants have been confirmed by independent methods and that none are purely predictions from variants at the nucleotide level.
• Gene symbols, names, and gene IDs are used in accordance with the recommendations of the HUGO Gene Nomenclature Committee (HGNC) [19].
• Nucleotide and protein reference sequences for the reporting of sequence variants are from the NCBI RefSeq Reference Sequence Database [16].
• Sequence variant descriptions are validated using VariantValidator (https://variantvalidator.org/) [20].
• The WHO Classification of Tumours online database (https://tumourclassification.iarc.who.int/home) and PubMed were searched as a rapid mapping review for references to 'histone', 'K27', 'G34', 'K28', and 'G35' to identify those tumour types whose classification, and therefore potentially diagnosis, is supported by and sometimes depends on histone AA positions.
Table S1. A list of relevant website resources is provided in Table 1.

Tumour entities that harbour histone-related changes
The discovery of histone sequence variants as oncogenic drivers in neoplasms has opened a new chapter in tumour diagnostics. Indeed, there is an increasing number of tumour entities that harbour histone-related changes (Table 2). Such entities include tumours of the central nervous system, bone and soft tissue neoplasms, head and neck squamous cell carcinoma, malignant melanoma, bladder and colorectal cancer, ovarian cancer, and haematological neoplasms. These are described here and summarised in Table 3.

Digestive system tumours
Few H3-3A sequence variants have been reported in COSMIC (H3F3A) for digestive system tumours, and there is as yet little consistency, with fewer than five reports of the same tumour type. Though these are small numbers of reports, the few anatomic tumour sites that carry alterations in more than one case comprise H3-3A sequence variants encoding p.Ala115Gly, p.Ala88Ser, p.Ala88Thr, p.Ile90Val, p.Lys28Arg, p.Arg50His, or p.Arg50Cys missense variants. Such changes have been detected in intestinal adenocarcinoma, tumours involving the oesophagus and pancreas, as well as hepatocellular cholangiocarcinoma [34]. The biological significance of these sequence alterations is uncertain.

Breast tumours
There is a high diversity of H3-3A sequence variants described for breast tumours in the COSMIC database, most being invasive breast carcinomas (IBCs) of no special type (NST). However, COSMIC lists two ductal carcinomas in situ in combination with Paget's disease, which both show a p.Arg73Gln substitution.

Soft tissue and bone tumours
There is an established and consistent pattern of sequence alteration in giant cell tumour of bone [23,26,35], with the H3-3A:c.103G>T p.Gly35Trp sequence variant being reported in the vast majority of cases. Additionally, chondroblastoma, chondrosarcoma, clear cell chondrosarcoma, and osteosarcoma have been reported to harbour H3-3A:c.110A>T p.Lys37Met sequence variants [23,36].

Female genital tumours
There are few reports of H3-3A gene sequence alterations in tumours of the female genital tract, with most reported in endometrial carcinomas (18/24 in COSMIC; supplementary material, Table S1) [37].

Thoracic tumours
There are few reports of sequence variants in thoracic tumours. Two small cell lung carcinomas have been reported to harbour sequence variants leading to a stop at p.Gly34Ter [38]. Squamous cell carcinoma cases of the lung have been described with the introduction of stop codons at p.Gln6Ter or p.Tyr100Ter (CGP study 418).

Central nervous system tumours
The majority of H3-3A sequence variants have been reported in tumours of the CNS, some of which are now named accordingly:

Diffuse hemispheric glioma, H3 G34-mutant
The incidence of diffuse hemispheric glioma, H3 G34-mutant, ranges from 8% to 16% of paediatric high-grade gliomas with hemispheric location [25,29,39]. So far, H3-3A sequence alterations resulting in a p.Gly35Arg (G34R) or a p.Gly35Val (G34V) substitution have been described for this tumour type. The vast majority of tumours show a glycine-to-arginine substitution (p.Gly35Arg; 94%), whereas the glycine-to-valine substitution is rare (p.Gly35Val; 6%).

Pilocytic astrocytoma (with histone H3 gene alteration)
Pilocytic astrocytoma carrying a histone H3 sequence alteration that results in a p.Lys28Met (K27M) substitution is extremely rare. Some of the reported cases showed fast tumour progression and hence are more likely to resemble diffuse midline gliomas, H3 K27-altered. However, there are individual cases with a prolonged overall survival of about 10 years [43], which are biologically more in accordance with pilocytic astrocytomas.

[Figure legend, partially recovered: 'The PRISMA Statement' [21]; for more information visit http://www.prisma-statement.org]

[48,49]. The progression-free survival of posterior fossa ependymoma group A (PFA) cases is worse than that of its counterpart, posterior fossa ependymoma group B, which lacks sequence alterations of one of the histone H3 isoforms [50].

Urinary and male genital tumours
There are relatively few reports of H3-3A sequence variants in prostate, bladder, and renal carcinomas [34], with H3.3 histone A p.Ala48Val, p.Lys5Met, p.Arg50His or p.Arg50Leu protein alterations each being listed at least twice (supplementary material, Table S1).

Head and neck tumours
Squamous cell carcinomas of the upper respiratory tract [34,53] have been reported to carry a recurrent H3-3A:c.344C>G p.Ala115Gly alteration, though numbers remain small. So far, these alterations have only been found in HPV-independent tumours.

Endocrine and neuroendocrine tumours
Histone sequence variants have been studied in relatively small numbers of cases, but there is a consistent H3-3A:c.103G>T p.Gly35Trp variant in four phaeochromocytomas listed in COSMIC, and in four paragangliomas studied [22]. This variant also occurs in thyroid carcinomas (2/4 cases, type not specified), whereas the other two cases of H3-3A-mutant thyroid cancer displayed an H3-3A:c.86G>C p.Ser29Thr variant (CGP study 676).

Haematolymphoid tumours
Histone sequence variants occur in a variety of myeloid and lymphocytic haematogenous malignancies, with T-cell lymphomas showing a p.Ala115Gly, p.Lys28Glu, p.Lys28Met or p.Lys28Asn substitution in H3.3 histone A in more than one case. A single case of a diffuse large B-cell lymphoma with an H3-3A:c.45A>C p.Lys15Asn substitution has been described (CGP 632). Lohr et al [54] reported an accumulation of variants in the linker histone H1 family in diffuse large B-cell lymphomas. The functional significance of these sequence alterations remains to be explored, although the authors report that hotspot variants of H3C2 and potentially other histone core proteins might be related to activation-induced cytidine deaminase (AID)-mediated somatic hypermutation [54].
Additionally, single leukaemia cases including AML [34] and T-ALL [55,56], as well as follicular lymphomas [57], have been reported to show alterations in histone genes.

Skin tumours
There are rare cases of melanomas with H3-3A sequence variants [34].

Advantages of using the proposed reporting system for histone sequence alterations
Our analysis of the published literature and the COSMIC database suggests that H3-3A sequence variants occur in tumours from various sites of the body, though overall relatively rarely, with CNS and bone tumours accounting for the majority of reported cases.
The use of a standardised nomenclature for reporting genetic alterations, as described in the HGVS Sequence Variant Nomenclature, allows the unambiguous description of sequence variants, independent of the analytical method, tumour type or field of research. The HGVS recommendations have thus become the almost universally accepted guidelines for the description of genetic alterations.
The benefit of the recommended HGVS-compliant reporting scheme is that it includes the sequence alteration at the DNA level, as well as the predicted resulting change at the protein level. This is particularly relevant for histones for the following reasons. Histone H3.3 is encoded by two genes, H3-3A and H3-3B, with each  Figure S1). An 'H3.3 K37M' description would thus not be sufficient to unequivocally report the differences at the site of genomic sequence alteration. The report of the specific mutated gene or alteration may also have prognostic impact, as observed in paediatric patients with H3 K27-altered diffuse midline glioma located in the pons, i.e. diffuse intrinsic pontine glioma (DIPG). DIPG patients with an H3-3A:c.83A>T p.Lys28Met (K27M) sequence variant have a median overall survival of 11 months. In contrast, patients with the same AA substitution resulting from sequence variation in the H3C2, H3C3 or H3C14 genes have a median overall survival of 15 months [59][60][61]. Since this information can be useful for clinicians and patients, the recommended HGVS-compliant nomenclature encourages pathologists to correctly report alterations that are associated with different outcomes. Such information is important for future studies to unequivocally report genetic information that is known to be of prognostic relevance. Additionally, it enables better-defined patient stratification that might result in the identification of new prognostically or therapeutically relevant sequence alterations in the future. Despite the clearly defined site of sequence variation and reports of potentially different outcomes in patients, the AA-based description of the site of histone sequence alteration used so far is potentially inaccurate. The legacy description (H3 K27) in 'diffuse midline glioma, H3 K27-altered' is based on the processed histone protein, after cleavage of the initiating methionine, which is assumed to be the tumour driver protein.
However, histones contain several N-terminal cleavage sites. For histone H3 variants, several cathepsin L cleavage sites have been reported [62] in murine and human embryonic stem cells (ESCs), such as after p.Ala22, p.Thr23, p.Lys24, p.Ala25, p.Arg27 and p.Lys28. Mass spectrometry analysis has demonstrated that such post-translational cleavage occurs. Even though the role of such cleavage has not been entirely resolved, Zhou et al [62] reported that it may influence the epigenetic signature upon differentiation and that it could play a role in apoptosis as well as in immune cell recruitment. Alterations of these hypothesised functions are tightly associated with tumour development [63]. In addition, a serine protease activity has been detected in human ESCs, with an additional cleavage site after p.Ala32 in H3.1 and/or H3.2 isoforms [64]. To our knowledge, this serine protease has not been fully characterised and thus we lack further information regarding its biological impact. The use of an AA-based nomenclature, as is the current standard in the histone field, is therefore not only potentially ill-defined but also potentially erroneous regarding the tumour driver protein, with implications for both diagnostics and research.
Another aspect is that there is at least one histone (H2B type W-T) in which the initiating methionine is not cleaved. Therefore, the histone nomenclature description of sequence variants necessitates awareness of whether cleavage of the initiating methionine occurs [7].
It becomes even more complicated in the case of histone 3-like centromeric protein A (CENP-A). Even though this protein shows cleavage of the initiating methionine after nucleosome deposition, Bailey et al found that approximately 10% of pre-nucleosomal CENP-A still retains the initiating methionine [65]. Thus, the notation of the AA-sequence changes for CENP-A depends on its biological state. Since CENP-A H4 heterodimers form a pre-nucleosomal complex with Holliday junction recognition protein (HJURP), which is active in G2 and during mitosis, it is speculated that the variant with the retained initiating methionine might be involved in tumour development [65].
Due to the coexistence of the two reporting systems, namely the legacy AA-based nomenclature of the mature histone protein as well as the DNA-based nomenclature, as used in genetic testing reports, there are many ambiguous reports in the literature. However, combined use of gene symbols, DNA sequence variants (based on reference transcript sequences), and the predicted resulting substitutions of the protein unequivocally defines the site of the sequence alteration. Furthermore, it enables a DNA-based description of histone sequence variants, which is regarded as the gold standard for the reporting of variants according to the HGVS sequence variant recommendations, and allows unambiguous description of these alterations.

Application of the proposed nomenclature on histone gene alterations found in brain tumours
The most common CNS tumour harbouring H3-3A sequence variants is the 'diffuse midline glioma, H3 K27-altered'. In this entity, a c.83A>T sequence variant within H3-3A leads to the replacement of the AA lysine (Lys) with methionine (Met) (p.Lys28Met). Such alterations should be reported as H3-3A:c.83A>T p.Lys28Met (K27M) [15].
The second most frequent glial tumour harbouring an H3-3A sequence variant is the diffuse hemispheric glioma, H3 G34-mutant, which is characterised by the substitution of glycine (Gly) by either arginine (Arg) or valine (Val) (p.Gly35Arg or p.Gly35Val). Following the notation proposed here, the sequence alteration should be reported as H3-3A:c.103G>A p.Gly35Arg (G34R), or H3-3A:c.103G>C p.Gly35Arg (G34R), or H3-3A:c.104G>T p.Gly35Val (G34V) [15]. The same reporting system would apply for the single cases of gangliogliomas, pilocytic astrocytomas, ependymomas [48], and subependymomas [51] that were reported to contain H3-3A sequence variants. It needs to be acknowledged that there is a debate about whether cases with histological features of a ganglioglioma and presence of a p.Gly35 substitution in the H3-3A-encoded protein are part of the spectrum of diffuse hemispheric glioma, previously reported as H3.3 G34-mutant, or whether they resemble real gangliogliomas with rare H3-3A sequence variation and are therefore not listed as such in COSMIC.
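The link between the coding DNA position and the protein position in the descriptions quoted above is simple arithmetic: nucleotides c.103 to c.105 form codon 35. The following Python sketch checks that the three nucleotide substitutions produce the stated amino acids. The reference codon GGG is inferred from the variants quoted in this review (c.103G>T giving Trp implies TGG), not read from the reference transcript; the function names are hypothetical and only the codons needed for this example are listed.

```python
# Excerpt of the standard genetic code; only codons used in this example.
CODON_TO_AA = {
    "GGG": "Gly", "AGG": "Arg", "CGG": "Arg", "GTG": "Val", "TGG": "Trp",
}

# Assumed reference codon 35 of H3-3A (inferred from the quoted variants).
REFERENCE_CODON_35 = "GGG"


def codon_number(c_position: int) -> int:
    """Codon that contains a 1-based coding DNA position."""
    return (c_position + 2) // 3


def substitute(codon: str, c_position: int, ref: str, alt: str) -> str:
    """Apply a single-nucleotide substitution to the codon containing
    c_position, checking that the reference base matches."""
    offset = (c_position - 1) % 3
    if codon[offset] != ref:
        raise ValueError(f"reference base mismatch at c.{c_position}")
    return codon[:offset] + alt + codon[offset + 1:]


for c_position, ref, alt in ((103, "G", "A"), (103, "G", "C"), (104, "G", "T")):
    new_codon = substitute(REFERENCE_CODON_35, c_position, ref, alt)
    print(
        f"c.{c_position}{ref}>{alt}",
        f"p.Gly{codon_number(c_position)}{CODON_TO_AA[new_codon]}",
    )
```

Two different nucleotide substitutions at c.103 lead to the same protein-level description, which is one reason the DNA-level description is reported alongside it.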

Application of the proposed nomenclature on histone gene alterations found in bone and soft tissue tumours
In the bone and soft tissue field, there are two bone tumour types that account for most of the soft tissue and bone tumours associated with recurrent histone sequence alterations: there is an established and consistent pattern of sequence variants in giant cell tumour of bone [23,26,35]

Techniques for the detection of sequence alterations
Various techniques are available for analysing tissue samples for sequence variants using methods based on DNA sequencing or PCR. Some individual methods are capable of determining the number of variant DNA molecules relative to the normal reference sequence and hence are recognised as being quantitative or semi-quantitative. The choice of method and interpretation of results should depend upon the experience of the institution performing the analysis [66,67]. Also, the use of immunohistochemistry has become routine in tumour diagnostics, as it is a fast and inexpensive method to report the presence or absence of tumour-related AA sequence alterations in neoplastic tissue. It also enables pathologists to clearly define the distribution and the cellular components that bear such sequence alterations. This not only allows for a better understanding of tumour types but also defines areas of tumour tissue where molecular analyses would be most promising for revealing conclusive results (by avoiding cellular components that are negative for these alterations).
In addition to the sequence variant-specific antibodies against IDH1 R132H and BRAF VE1 (BRAF V600E) that are commonly used in tumour diagnostics, there are also antibodies available that detect H3 isoform-  Table 4). Even though a positive immunohistochemical result is considered to be sufficient to define the tumour entity, further elucidation of a sequence variant's prognostic value, e.g. in H3 K27-altered diffuse midline glioma located in the pons (DIPG), may depend on the specific gene in which the sequence is changed (see above), and hence requires additional molecular testing [59][60][61] (Table 5). Combined use of molecular analysis and immunohistochemistry also further controls for false-positive or false-negative results, which have been reported for both sequencing and immunohistochemistry [68]. Histone sequence variant-specific antibodies have been developed and named according to legacy histone AA numbering. To avoid confusion with the labelling of these antibodies, we suggest for the time being that these legacy descriptions be added in brackets following the HGVS-compliant standard notation (see above).

Practical handling of the proposed new nomenclature
Knowledge of the subject as summarised in this present paper has several important limitations. Ascertainment bias is a known issue in databases such as COSMIC, leading to a greater number of rare sequence variants being listed which then appear to be more common than they are. Reporting bias in publications is also likely, as many are case reports or small series rather than comprehensive studies, presenting an incomplete picture that requires more research. Sequencing of histone genes on a population basis in multiple geographical areas to account for any ethnic differences would be very helpful. Studies using tumour banks may fulfil this need, at least for the more common tumour types. Nevertheless, we have been able to identify some tumour types in which the reports are frequent enough to suggest that histone sequence variation may be important in their biology.
Since sequencing is usually more time-consuming than histomorphological and immunohistochemical assessment, the proposed nomenclature of entities with a histone sequence variant encourages a two-stage diagnosis. With the help of variant-specific antibodies and histone sequence variant-defining immunohistochemical features, a diagnosis can be given which includes the histomorphology and the suspected histone sequence variant based on the immunohistochemical findings, such as diffuse midline glioma with immunohistochemical detection of a p.Lys28Met (H3 K27M) sequence variant in one of the histone 3 isoforms. In some circumstances, such as in H3 K27-altered diffuse midline glioma located in the pons (DIPG), sequencing may not only be helpful to confirm the diagnosis but may also be of prognostic value, as indicated above. Following the proposed nomenclature, cases of diffuse hemispheric glioma, H3 G34-mutant, with, for example, immunohistochemical detection of H3.3 G34R positivity, might also be genetically evaluated to confirm changes in the nucleotide sequences, which have so far been described to be either H3-3A:c.103G>A p.Gly35Arg (G34R) or H3-3A:c.103G>C p.Gly35Arg (G34R).

Conclusion
This overview of tumour entities with sequence alterations of the H3-3A gene (HGNC ID = HGNC:4764) highlights the problems associated with the coexistence of two distinct nomenclatures and proposes a standard for the reporting of histone sequence variants that allows an unambiguous description of these variants according to HGVS principles. Here, the gene, DNA, derived protein sequence, and legacy (antibody) sequence are given to account for the inconsistency in numbering resulting from the assumed post-translational modification of the protein. We hope that scientific journals will adopt the new notation, and that both geneticists and pathologists will find it helpful to include it in their reports.

Table 5. Advantages and disadvantages of different methods for assessing histone alterations. Columns: Method, Advantage, Disadvantage.