The EAHAD blood coagulation factor VII variant database

Hereditary blood coagulation factor VII (FVII) deficiency is a rare autosomal recessive bleeding disorder resulting from variants in the gene encoding FVII (F7). Integration of genetic variation with functional consequences on protein function is essential for the interpretation of the pathogenicity of novel variants. Here, we describe the integration of previous locus‐specific databases for F7 into a single curated database with enhanced features. The database provides access to in silico analyses that may be useful in the prediction of variant pathogenicity as well as cross‐species sequence alignments, structural information, and functional and clinical severity described for each variant, where appropriate. The variant data is shared with the F7 Leiden Open Variation Database. The updated database now includes 221 unique variants, representing gene variants identified in 728 individuals. Single nucleotide variants are the most common type (88%) with missense representing 74% of these variants. A number of variants are found with relatively high minor allele frequencies that are not pathogenic but contribute significantly to the likely pathogenicity of coinherited variants due to their effect on FVII plasma levels. This comprehensive collection of curated information significantly aids the assessment of pathogenicity.

KEYWORDS
blood coagulation disorders, factor VII deficiency, genetic variation, hemostasis, LSDB

| INTRODUCTION
The initiation of blood coagulation and subsequent wound repair is a fundamental defense mechanism conserved in all vertebrates.
Exposure of blood coagulation factor (F) VII/VIIa to cells expressing its cellular receptor and cofactor tissue factor (TF) is both necessary and sufficient to initiate blood coagulation in vivo, leading to the generation of thrombin and a fibrin clot (Figure 1a).
FVII is a zymogen of a vitamin K-dependent serine protease that is synthesized in the liver and circulates in plasma as a single-chain molecule (406 amino acids) at a concentration of approximately 0.5 μg/ml (10 nmol/L). In common with the other serine proteases of the coagulation network (FIX, FX, prothrombin, and protein C) as well as protein S and protein Z, FVII has an N-terminal domain that contains 10 glutamic acid residues that are posttranslationally modified by the addition of a carboxyl group to the γ-carbon by a vitamin K-dependent carboxylase. This γ-carboxyglutamic acid (GLA) domain confers affinity to negatively charged phospholipid membranes such as those of activated platelets, promoting the assembly of functional multiprotein complexes on these surfaces. The primary translation product of FVII (466 amino acids) contains a prepro-leader sequence that includes a conserved propeptide (pro) sequence, found in other vitamin K-dependent proteins, that directs the γ-carboxylation. The GLA domain is followed by two epidermal growth factor (EGF)-like domains, the connecting or activation peptide, and the serine protease domain (Figure 1b).
FVII is converted to its activated form FVIIa as the result of a single proteolytic cleavage between Arg212 and Ile213 (all residues are numbered according to Goodeve, Reitsma, and McVey (2011) using Human Genome Variation Society (HGVS) nomenclature, numbering the initiation methionine of the reference protein sequence (NP_000122.1) as +1, and may differ from legacy numbering by +60), producing a disulfide-linked two-chain molecule. In blood, 4% of the total circulating FVII is in the form of FVIIa, which has little functional activity in the absence of its cofactor TF. Unlike other members of the trypsin superfamily, the neo-N-terminus generated upon activation of FVII fails to insert into the activation pocket, leading to a nonoptimal alignment of the catalytic machinery and rendering FVIIa "zymogen-like" with significantly reduced catalytic activity. Binding of FVIIa to TF allosterically corrects this defect, transforming FVIIa into a catalytically competent enzyme. In addition, TF ensures optimal orientation and positioning of the FVIIa catalytic domain above the membrane for interaction with its substrates, thereby enhancing the proteolytic activity by 10⁶-fold. The substrates of the TF-FVIIa complex are blood coagulation factors FIX and FX. A schematic of TF-FVIIa-initiated thrombin generation is shown in Figure 1a.

| Database structure
The database was built on a common architecture developed for blood coagulation variant databases, using a MySQL platform and an HTML, CSS, JavaScript, Perl, and PHP interface (McVey et al., 2020).
The first EAHAD database using this architecture was for F9 variants (Rallapalli et al., 2013). The database is available at f7-db.eahad.org.
The variant data in the database are shared with LOVD (databases.lovd.nl/shared/genes/F7), which is a freely available gene-centered collection of DNA variant data and is part of the GEN2PHEN and Human Variome projects (Fokkema et al., 2011).

| Identification of variants
Data was initially imported from the original MRC FVII mutation database (McVey et al., 2001) and the UMD-F7 mutation database (umd.be/F7; Beroud et al., 2005). Subsequently, additional variants were identified in the published literature. All variants incorporated into the new database were verified for accuracy and HGVS nomenclature was generated and checked with Mutalyzer (mutalyzer.nl/). All data referring to individual cases with variants in the F7 database is pseudo-anonymized and no information is provided on the site that identifies individuals.

| Nomenclature
It is particularly important in molecular genetic analysis that there is no confusion resulting from differences in variant nomenclature between laboratories/publications. Many coagulation genes were cloned and initially sequenced during the 1980s, before the introduction of standardized nomenclature. As a result, genes and proteins have their own idiosyncrasies of naming and numbering.
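As an illustration of how such numbering differences can be reconciled, the introduction notes that HGVS protein numbering for FVII may differ from legacy numbering by 60. The following is a minimal sketch of that conversion, assuming three-letter residue labels such as Arg212 and assuming the simple offset applies only to residues of the mature protein (HGVS positions above 60). The function names are hypothetical and are not part of the database.

```python
import re

# Offset between legacy and HGVS protein numbering for FVII, as stated in
# the introduction (HGVS = legacy + 60). Treated here as an assumption that
# holds for mature-protein residues only.
HGVS_OFFSET = 60

# Three-letter amino acid code followed by a position, e.g. "Arg212".
_RESIDUE = re.compile(r"^([A-Z][a-z]{2})(\d+)$")


def _split(label):
    """Split a residue label into its three-letter code and position."""
    match = _RESIDUE.match(label)
    if match is None:
        raise ValueError(f"unrecognized residue label: {label!r}")
    return match.group(1), int(match.group(2))


def legacy_to_hgvs(label):
    """Convert a legacy mature-protein label (e.g. 'Arg152') to HGVS numbering."""
    name, number = _split(label)
    return f"{name}{number + HGVS_OFFSET}"


def hgvs_to_legacy(label):
    """Convert an HGVS label (e.g. 'Arg212') to legacy numbering."""
    name, number = _split(label)
    if number <= HGVS_OFFSET:
        # Residues in the prepro-leader are not covered by this sketch.
        raise ValueError("position lies in the prepro-leader; simple offset not applied")
    return f"{name}{number - HGVS_OFFSET}"


print(hgvs_to_legacy("Arg212"))  # Arg152
print(legacy_to_hgvs("Arg152"))  # Arg212
```

Variant descriptions taken from older publications should still be checked against the reference sequence rather than converted by offset alone.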

| In silico analyses
The database provides access to in silico analyses that may be useful in the prediction of variant pathogenicity. Finally, the evolutionary conservation of the amino acid sequence of the protein at the variant residue in closely related species (chimpanzee, gorilla, gibbon, bushbaby, and marmoset) can be inspected.
Further multiple sequence alignments from more distantly related species and multiple alignments of human vitamin K-dependent coagulation factor protease domains are also available from the AA Alignments tab. (Millar et al., 2000; Pinotti et al., 1998).

| Impact of common variants known to modulate FVII levels
A number of common variants have been identified in F7 (Table 1) (Bernardi et al., 1996) in patients and confirmed by in vitro studies (Hunault, Arbini, Lopaciuk, Carew, & Bauer, 1997; Pollak, Hung, Godin, Overton, & High, 1996). In contrast, the rare c. in trans of the modulating common variants will, therefore, be provided when available for future submissions to this database.

| Individual data
In common with the other EAHAD coagulation factor databases, the

| Impact of thromboplastin source for FVII:C measurement
It is well documented that some FVII protein variants display variations in the FVII:C measurement according to the species of the thromboplastin reagent (TF) that has been used to trigger the in vitro measurement. Historically, bovine and rabbit sources were used before the introduction of recombinant human thromboplastin. The (Matsushita, Kojima, Emi, Takahashi, & Saito, 1994; Mourey et al., 2014; O'Brien et al., 1991; Takamiya & Takeuchi, 1998; Zheng, Shurafa, & James, 1996). Importantly, the impact of the residual FVII:C levels measured in vitro on the potential bleeding phenotype of the individual should only be considered when assayed with human thromboplastin.

| Clinical phenotype and genotype
The database presents statistics and graphics on all the variants in the database by specific type of variant, by protein domain, and by disease severity. These are available from the Variants tab and allow users to analyze relationships between clinical phenotype and genotype.
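The kind of summary described here can be sketched as a simple tally. The example below is a minimal illustration only: the records, field names, and severity labels are hypothetical placeholders, not data or a schema taken from the database.

```python
from collections import Counter

# Hypothetical variant records used purely for illustration.
variants = [
    {"type": "missense", "domain": "serine protease", "severity": "severe"},
    {"type": "missense", "domain": "GLA", "severity": "mild"},
    {"type": "nonsense", "domain": "EGF1", "severity": "severe"},
    {"type": "frameshift", "domain": "serine protease", "severity": "moderate"},
]


def summarize(records, field):
    """Return {value: (count, percentage of all records)} for one field."""
    counts = Counter(record[field] for record in records)
    total = sum(counts.values())
    return {value: (n, 100 * n / total) for value, n in counts.items()}


for field in ("type", "domain", "severity"):
    print(field, summarize(variants, field))
```

Grouping by a single field at a time mirrors the three views named in the text (variant type, protein domain, disease severity). Cross-tabulating two fields would be a natural extension for genotype and phenotype comparisons.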

| Assessing pathogenicity of F7 variants
Variant classification is central to the utility of molecular genetic diagnostics in clinical practice; however, predicting whether gene variants are likely to be pathogenic may not be straightforward.
The F7 variant database currently does not assign pathogenicity to an individual variant but rather provides access to a number of tools to allow assessment of the variant according to published guidelines that establish a framework for variant classification (Nykamp et al., 2017; Richards et al., 2015). In future releases, the database will link directly to ClinVar (ncbi.nlm.nih.gov/clinvar), which is an open-access database that reports curated information and likely pathogenicity on gene variants. EAHAD curators are working as co-chairs or members of relevant curation panels that input into ClinVar. The variants classified as pathogenic are nonsense, canonical splice site, and frameshift variants; missense variants resulting in the substitution of residues critical for FVII function, namely residues at the Arg212-Ile213 bond for proteolytic activation, at the catalytic site (His253, Asp302, Ser404), or involved in the unique disulfide bond between the light and heavy chains of the activated form of FVII (Cys195, Cys322); and variants at transcription factor binding sites within the F7 promoter (HNF4 and Sp1).
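The categories listed above can be expressed as a small lookup, sketched below. This is an illustration of the stated criteria only, not the database's implementation and not a substitute for guideline-based classification. The function name, the variant-type strings, and the argument layout are assumptions made for the example; residue positions use the HGVS numbering given in the text.

```python
# Variant types the text lists as classified pathogenic regardless of position.
PATHOGENIC_TYPES = {"nonsense", "canonical splice site", "frameshift"}

# Critical residues named in the text (HGVS protein numbering).
CRITICAL_RESIDUES = {
    212: "activation cleavage site (Arg212)",
    213: "activation cleavage site (Ile213)",
    253: "catalytic site (His253)",
    302: "catalytic site (Asp302)",
    404: "catalytic site (Ser404)",
    195: "interchain disulfide bond (Cys195)",
    322: "interchain disulfide bond (Cys322)",
}

# Transcription factor binding sites in the F7 promoter named in the text.
PROMOTER_SITES = {"HNF4", "Sp1"}


def listed_pathogenic_reason(variant_type, residue=None, promoter_site=None):
    """Return why a variant matches a listed category, or None if it does not."""
    if variant_type in PATHOGENIC_TYPES:
        return f"{variant_type} variant"
    if variant_type == "missense" and residue in CRITICAL_RESIDUES:
        return f"missense at {CRITICAL_RESIDUES[residue]}"
    if variant_type == "promoter" and promoter_site in PROMOTER_SITES:
        return f"promoter variant at {promoter_site} binding site"
    return None


print(listed_pathogenic_reason("missense", residue=404))
print(listed_pathogenic_reason("missense", residue=100))  # None
```

A return value of None means only that the variant falls outside the listed categories. Such variants still require assessment with the in silico, conservation, and clinical data that the database provides.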