Characterization of the human ABO genotypes and their association to common inflammatory and cardiovascular diseases in the UK Biobank

The ABO gene contains three major alleles that encode different antigens, A, B, and O, which determine an individual's blood group. Previous studies have primarily focused on identifying associations between ABO blood groups and disease risk. Here, we sought to test for association between ABO genotypes (OO, OA, AA, OB, BB, and AB) and a large set of common inflammatory and cardiovascular diseases in the UK Biobank, as well as disease‐related protein biomarkers in the Northern Swedish Population Health Study (NSPHS). We first conducted a likelihood ratio test to determine whether ABO contributed significantly to the risk of 24 diseases and to the levels of 438 plasma proteins. For phenotypes with FDR < 0.05, we tested for pair‐wise differences between genetically determined ABO genotypes using logistic or linear regression. Our study confirmed previous findings of a strong association between ABO and cardiovascular disease, identified associations for both type 1 and type 2 diabetes, and provided additional evidence of significant differences between heterozygous and homozygous allele carriers for pulmonary embolism and deep vein thrombosis, as well as for von Willebrand factor levels. Furthermore, the results indicated an additive effect between genotypes, even between the two most common A subgroups, A1 and A2. Additionally, we found that ABO contributed significantly to the levels of 39 plasma proteins, of which 23 have never been linked to the ABO locus before. These results show the need to incorporate ABO genotype information in the consultation and management of patients at risk, rather than classifying patients into blood groups.


| INTRODUCTION
The ABO blood group antigens were first identified by Karl Landsteiner at the beginning of the twentieth century, and the ABO alleles give rise to the four major blood groups A, B, AB, and O. Since then, numerous associations have been reported between particular ABO blood groups and an increased susceptibility to disease, ranging from cardiovascular diseases to infections. It has been shown that individuals with an O blood group have a lower risk of cardiovascular diseases, including thromboembolism, 1-3 pulmonary embolism (PE), 3,4 and possibly also myocardial infarction (MI), 3 compared to other blood groups. The different blood groups (A, B, AB, and O) have different properties. For example, persons with an O blood group are less prone to coagulation, possibly because they express significantly lower plasma levels of both coagulation factor VIII (FVIII) and von Willebrand factor (vWF). 5 Elevated levels of both vWF and FVIII have been associated with risk of cardiovascular disease, whereas a deficiency of vWF gives rise to the most commonly inherited human bleeding disorder, von Willebrand disease. 6 This agrees with an increased risk of bleeding and a reduced incidence of vascular occlusion for individuals with an O blood group.
Associations have also been found with inflammatory responses such as allergies, albeit with somewhat conflicting results. Generally, individuals with blood group O have been suggested to have greater susceptibility to asthma [7][8][9] and allergic rhinitis, 10,11 although there are some contradictory findings. 12 In contrast, most studies seem to agree on a higher risk of developing atopic dermatitis for non-O individuals. 13,14 In a recent study, a significant difference in prevalence between blood groups was also suggested in rheumatic diseases, 15 where spondyloarthropathy, vasculitis, and rheumatoid arthritis were more common in patients with blood group A, and systemic lupus erythematosus and systemic sclerosis were more common in patients with blood group O.
The blood groups are determined by the ABO gene located on chromosome 9. ABO encodes a glycosyltransferase that converts a precursor H antigen (produced by the fucosyltransferase encoded by FUT1 when expressed on red blood cells, and by FUT2 when found in secretions) 16 to a mature antigen via sugar donation. 17 The proteins responsible for this conversion are encoded by the A and B alleles, synthesizing the A and B antigen, respectively. The O allele encodes an enzymatically inactive protein product, leaving the H antigen as is (referred to as the O antigen). Individuals with blood group AB will therefore have both A and B antigens, while individuals with blood group O will have neither. These three ABO alleles (A, B, and O) were long thought to be the only ones, but since the blood grouping system started to become more systematically investigated, several suballeles and minor alleles have been found. 18 One example is the A1 and A2 sub-alleles, which both encode glycosyltransferase A (GTA), but with a much lower enzyme activity for ABO allele A2 than for A1. 19 Consequently, A2 individuals have a lower expression of the A antigen.
Apart from being expressed on red blood cells, the ABO antigens are also expressed on other types of cells, mostly epithelial and endothelial cells, and are therefore also termed histo-blood group antigens. 20 In approximately 80% of the population, carrying the FUT2-encoded secretor type, the ABO antigens are also secreted in bodily fluids, such as saliva, where they reside on mucins. 16 Consequently, 20% of the population has a non-functional FUT2 gene and does not secrete ABO antigens in other bodily fluids. These blood group antigens play a role in cell-cell recognition, and thus, it is conceivable that these antigens might serve as potential receptors for microorganisms, toxins, or allergens.
However, as ABO is also expressed on vWF, this might be the link to how ABO influences plasma levels of vWF and FVIII. vWF acts as a carrier for FVIII, which localizes FVIII to the site of possible vascular injury. 21 Also, ABO has been associated with levels of inflammatory proteins in coronary artery syndrome, as well as other important factors in the coagulation cascade, such as soluble tissue factor, sTF. 22 Furthermore, we and others [23][24][25] have previously found ABO to be associated with a large set of putative biomarkers for inflammation and cardiovascular diseases in genome-wide association studies (GWAS), including the levels of, for example, vascular endothelial growth factor receptor 2 (VEGFR-2), angiopoietin-1 receptor (TIE2), platelet endothelial cell adhesion molecule (PECAM-1) and E-selectin.
As of yet, the biological function of the blood groups is not fully understood. The vast majority of studies have focused on the blood groups A, B, AB, and O, but very little attention has been paid to the underlying dose effect that might arise between individuals of blood group A and B being homozygous (AA or BB) or heterozygous (AO or BO). Also, the heterogeneity in effects, where different ABO alleles are risk alleles for different diseases, has not been a focus of previous studies. Therefore, there is a need to systematically examine ABO genotype associations in common cardiovascular and inflammatory diseases. Here, we aim to extend current knowledge on ABO blood group associations by a detailed analysis of ABO genotypes. The goal is (1) to evaluate potential differences in susceptibility to common cardiovascular and inflammatory diseases between heterozygous and homozygous individuals and (2) to link ABO associations to biological mechanisms by comparing the ABO effects on the diseases to the ABO effects on a large set of disease-related plasma proteins. As previous associations with inflammatory diseases and reactions have been conflicting, with inconclusive results, we decided to focus on these phenotypes, with the aim of disentangling previous associations, using associations with strong support (such as vWF, DVT and PE) as proof of concept. We have tested for association with 17 inflammatory and seven cardiovascular diseases, as well as over 400 proteins that have previously been highlighted as potential biomarkers for different diseases. As an individual's blood group seems to play a quite substantial role in disease susceptibility, a better characterization can aid clinicians in consulting and managing patients at risk. By analyzing diseases and plasma protein levels jointly, we have the possibility to extend to biological mechanisms beyond disease status.
Participants were interviewed about lifestyle and disease history via touchscreen questionnaires and verbal interviews. Genotyping has been performed in UKB using two different custom-designed microarrays, UK BiLEVE and Axiom. These contain 807 411 and 820 967 SNPs, respectively, and overlap with 95% common content. Imputation of over 90 million SNPs was performed using UK10K and 1000 genomes phase 3 as reference panels. Imputed data from the third UKB release (accessed March 2018) were used in the current study. Before analysis, participants were filtered for genotype call rate (>95%), high heterozygosity and discrepancies between self-reported and genetic sex. This left 487 409 individuals available for analysis. Additionally, as a sensitivity analysis to minimize the effects of population stratification, we used only unrelated participants (excluding pairs with pairwise kinship >0.044) who self-reported as being of white British descent and were classified as Caucasian by principal component analysis. After filtering, data from 361 975 individuals were available for the sensitivity analysis.
We included disease status from both self-reports and various registers. Medical conditions assessed from verbal interviews and touchscreen questionnaires were extracted from the first assessment. A detailed description of all data-fields and coding used for each disease and disease status can be found in Table S1. All participants were assigned their respective ABO genotype (Table S2), based on three ABO gene allele-defining variants: rs8176746 (Leu266Met), 17 rs8176747 (Gly268Ala), 17 and the O blood type causing deletion rs8176719 (261DelG). 26 Furthermore, to examine whether there was a difference in risk between the most common A subtypes (A1 and A2), additional regression analyses were performed in UKB after having subtyped all individuals carrying at least one A allele into A1 and A2, using the variant rs1053878:C:T, 33 where the C allele tags A1 and the T allele A2 (Table S3).

and variant calling has been described previously. 28 Briefly, whole genome sequence (WGS) data were aligned to the GRCh37 (hg19) reference genome using bwa-mem v0.7.12. 29 Raw alignments were processed according to GATK best practice 30 using GATK v3.3. Variants were called with GATK HaplotypeCaller 3.3 followed by variant quality score recalibration (VQSR). Sample quality control (QC) was performed to remove genetic outliers and to identify potentially contaminated samples and samples with sex errors. After QC, 1021 unique samples with WGS data remained. ABO genotype classification was performed as in UKB.
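The genotype assignment described above can be illustrated with a minimal Python sketch. The haplotype encoding, the function names, and the precedence rule (the deletion is checked first; both B-associated alleles must be present to call B) are assumptions made for illustration only and are not taken from the study's pipeline or from Table S2.

```python
from typing import Optional, Tuple

# Hypothetical per-haplotype encoding (an assumption, not the study's data format):
# (has_o_deletion, has_b_266, has_b_268), each True if the haplotype carries the
# O-causing deletion at rs8176719 or the B-associated allele at rs8176746 /
# rs8176747, respectively.
Haplotype = Tuple[bool, bool, bool]


def haplotype_to_allele(hap: Haplotype) -> Optional[str]:
    """Classify one phased haplotype as 'O', 'A' or 'B' (None if ambiguous)."""
    has_o_deletion, has_b_266, has_b_268 = hap
    if has_o_deletion:
        # Assumed precedence: the deletion yields an inactive enzyme.
        return "O"
    if has_b_266 and has_b_268:
        return "B"
    if not has_b_266 and not has_b_268:
        return "A"
    return None  # discordant B markers: leave the allele undefined


def abo_genotype(hap1: Haplotype, hap2: Haplotype) -> Optional[str]:
    """Combine two haplotypes into one of OO, OA, OB, AA, AB, BB."""
    allele1 = haplotype_to_allele(hap1)
    allele2 = haplotype_to_allele(hap2)
    if allele1 is None or allele2 is None:
        return None
    # Order alleles as O, A, B so that labels match the six-level factor.
    return "".join(sorted((allele1, allele2), key="OAB".index))


example = abo_genotype((True, False, False), (False, True, True))
```

Returning `None` for unresolved haplotypes mirrors the idea that individuals lacking critical genotypes are left with an undefined ABO genotype and excluded from further analyses.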

| Northern Swedish Population Health Study
Protein levels for 460 potential biomarkers have been measured in 903 individuals using the Olink Proseek Multiplex panels (CVD II, CVD III, INF I, ONC II and NEU I, www.olink.com), as previously described. 31 These Olink panels consist of putative or already established biomarkers for disease. As many proteins have several functions and can thus fit into several broad categories, such as both inflammatory and cardiovascular, we decided to include all proteins with available measurements in NSPHS. In total, 867 individuals passed both WGS and plasma protein QC, and 438 proteins passed QC and were analyzed in relation to ABO genotype classification. Plasma protein measurements were adjusted for batch effects and rank-transformed to be normally distributed (mean = 0, standard deviation = 1) prior to analysis.
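A rank-based transformation to normality of the kind described above could be sketched as follows. The text does not state which rank offset or tie-handling rule was used, so the offset of 0.5 and the averaging of tied ranks are assumptions, and the function name is hypothetical.

```python
from statistics import NormalDist


def rank_inverse_normal(values, offset=0.5):
    """Map values to standard-normal quantiles based on their ranks.

    Ties receive the average of their 1-based ranks. The offset keeps the
    probabilities strictly between 0 and 1 (0.5 is an assumed choice).
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied values.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    return [standard_normal.inv_cdf((rank - offset) / n) for rank in ranks]


transformed = rank_inverse_normal([3.0, 1.0, 2.0])
```

The transformed values keep the ordering of the raw measurements, are symmetric around zero, and approach unit variance as the sample size grows.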

| Statistical analyses
All statistical analyses were performed using R v3.6.3. 32 All diseases were analyzed with logistic regression, and proteins with linear regression, using the glm function, including age, sex, smoking status, body mass index (BMI), and the first five genetic principal components as covariates. For comparability between the analyses, we decided to use the same regression model for all phenotypes, while at the same time trying to avoid including colliders or adjusting for mediators affected by unmeasured or unknown factors. ABO genotypes were analyzed as a six-level factor (OO, OA, AA, OB, BB, AB) variable. First, to test whether ABO genotypes were associated with disease or plasma protein levels, we assessed their total contribution to the regression model by a likelihood ratio test. This was done by comparing a null model, including only the covariates, with the full model, also including the ABO genotype. The test statistic is given by the difference in residual deviance between the null and the full model, divided by the residual variance. Assuming no overdispersion in the logistic modeling, the residual variance was fixed to one. Only in the linear modeling of the biomarkers was the residual variance allowed to vary. Under H0, this test statistic is chi-squared-distributed with five degrees of freedom. Correction for multiple testing was done using a false discovery rate (FDR) of < 0.05. Then, for all diseases and proteins that passed the likelihood ratio test, that is, still had a significant association after FDR correction, the ABO genotypes were tested pairwise in a logistic or linear regression. In these tests, a P value of < 0.01 was considered significant, adjusting for five independent tests, as the remaining 10 pairwise genotype comparisons are dependent on the first five.
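The testing scheme above can be made concrete with a short Python sketch (the analyses themselves were run in R). It computes a likelihood ratio p-value from two residual deviances, applies an FDR adjustment, and counts the pairwise genotype comparisons. The function names are hypothetical, and the Benjamini-Hochberg procedure is an assumption, since the text does not name the FDR method used.

```python
import math
from itertools import combinations, combinations_with_replacement


def chi2_sf_5df(x):
    """Upper-tail probability of a chi-squared variable with 5 degrees of freedom."""
    if x <= 0:
        return 1.0
    z = x / 2.0
    # Closed form of the regularized upper incomplete gamma function Q(5/2, z).
    return math.erfc(math.sqrt(z)) + math.exp(-z) * (
        math.sqrt(z) / math.gamma(1.5) + z ** 1.5 / math.gamma(2.5)
    )


def likelihood_ratio_p(null_deviance, full_deviance, residual_variance=1.0):
    """Deviance difference scaled by the residual variance, and its p-value.

    residual_variance stays at 1.0 for logistic models (no overdispersion).
    """
    statistic = (null_deviance - full_deviance) / residual_variance
    return statistic, chi2_sf_5df(statistic)


def benjamini_hochberg(p_values):
    """FDR-adjusted p-values (Benjamini-Hochberg; assumed method)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Six genotypes from three alleles give 15 pairwise comparisons,
# of which five are treated as independent in the text.
genotypes = ["".join(pair) for pair in combinations_with_replacement("OAB", 2)]
pairwise_tests = list(combinations(genotypes, 2))
pairwise_threshold = 0.05 / (len(genotypes) - 1)
```

With six genotype levels, the full model has five more parameters than the null model, which is where the five degrees of freedom and the threshold of 0.05/5 = 0.01 come from.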
When the division of A into A1 and A2 was considered, we used a corrected significance threshold of p < 0.006 (0.05/9) to adjust for nine independent tests, as the remaining tests are dependent on the first nine. Graphs and analyses were made using the rpart, 34 ggplot2, 35 cowplot, 36 gridExtra, 37 egg, 38 and broom packages. To construct ABO genotypes, individuals were haplotyped using haplo-stats 39 in R.

in compliance with the Declaration of Helsinki. 40 Informed consent to the study was given by all participants, including the examination for environmental and genetic cause of disease. If a person was not of age (< 18 years), a legal guardian also signed.

| ABO genotypes in UK Biobank and Northern Swedish Population Health Study
For a majority of the UKB participants, 487 269 out of 487 409, we were able to resolve the haplotypes and construct ABO genotypes (Table 1). The most common genotype was OO (43.3%), followed by AO (35.9%), BO (9.0%), AA (7.5%), AB (3.6%) and BB (0.6%). A total of 140 individuals had an undefined ABO genotype. In the sensitivity analysis, including unrelated British Caucasians, we were able to resolve the haplotypes for 361 880 out of 361 975 participants, and the most common genotype was OO (43.5%), followed by AO (37%), BO (8.0%), AA (7.8%), AB (3.3%), and BB (0.4%). Here, a total of 95 individuals had an undefined ABO genotype, due to lacking critical genotypes, and were thus excluded from further analyses (see Table S4). In NSPHS, we were able to resolve the haplotypes and construct ABO genotypes for all 867 participants (Table 1, Table S5). Here, the most common genotype was AO (36.1%), followed by OO (32.6%), AA  (0.5%) being the most uncommon (Table 1). Due to the much smaller sample size, further subclassification into A1 and A2 was not performed in the NSPHS.
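As a quick arithmetic illustration, the genotype percentages reported for the full UKB cohort imply the allele frequencies computed below. This is a simple counting sketch added for orientation, not an analysis from the study; the function name is hypothetical, and the inputs are the rounded percentages given in the text, so the outputs carry the same rounding.

```python
# Genotype percentages reported for the full UKB cohort in the text above.
ukb_genotype_pct = {"OO": 43.3, "AO": 35.9, "BO": 9.0, "AA": 7.5, "AB": 3.6, "BB": 0.6}


def allele_percentages(genotype_pct):
    """Each genotype contributes half of its percentage to each of its two alleles."""
    totals = {"A": 0.0, "B": 0.0, "O": 0.0}
    for genotype, pct in genotype_pct.items():
        for allele in genotype:
            totals[allele] += pct / 2.0
    return totals


ukb_allele_pct = allele_percentages(ukb_genotype_pct)
# O: 65.75, A: 27.25, B: 6.9 (sums to 99.9 because the inputs are rounded)
```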

| Association of ABO genotypes on disease
In the likelihood ratio test, ABO genotypes were associated with DVT, PE, MI, type 2 diabetes (T2D) and type 1 diabetes (T1D) after adjusting for multiple testing (FDR < 0.05, Table S6). Differences (Table S7) Tables S8 and S9. For T2D, homozygous OO was associated with a decreased risk.
However, when restricting the analysis to unrelated British Caucasians, the results did not remain significant. This makes the results uncertain, as the association in the full cohort could stem either from the larger sample size or from the introduction of population stratification (Figure S3A,B  Table S10). This pattern is apparent both for DVT and PE (Figure S4).

| Association of ABO to plasma protein levels
To link identified disease associations to biological mechanisms, we performed similar analyses in relation to the levels of plasma proteins, a majority of which are putative biomarkers for cardiovascular and inflammatory diseases. ABO contributed significantly to the levels of 39 of the 438 proteins tested (Table 2, Table S11). Of these, 23 have never been associated with ABO before (with extended references in Table S12, and protein function descriptions in Table S13).
The effect of all covariates on the respective proteins from the main regressions can be found in Table S14. For the majority of proteins, blood group A tends to be associated with lower levels than both blood groups B and O, and blood group B tends to be associated with higher levels than O, in a relation most similar to AA < AO < OO < AB < BO < BB (Figure S5,

Table footnotes:
a Number of cases/controls used in the full cohort analysis (N = 487 409) after having removed individuals lacking covariate data.
b Participants who had any unspecified diabetes reported in any register were removed unless they were diagnosed with a disease-specific ICD code. Additionally, participants diagnosed with malnutrition-related, other specified and unspecified diabetes from ICD-10 were also removed.
c To check for novelty and previous associations, the GWAS catalogue (accessed 2021-03-25) was searched for all associations with mapped gene as ABO. Associations linked to protein measurements were then manually checked (via the EFO_0007937 blood protein measurement and EFO_0004747 protein measurement ontology labels). All curated trait descriptions in the catalogue are mapped to terms from the Experimental Factor Ontology, EFO.
d Number of individuals with biomarker data.
developing vascular occlusion. Extending previous studies, we were also able to discern a difference between the individual genotypes. Large effects were seen for DVT and PE, which are both linked to high coagulation activity. 42 For these diseases, we could clearly see a large difference between heterozygous (AO and BO) and homozygous (AA and BB) individuals. This effect was also seen between different A subtype carriers, both for homozygous (A1A1, A1A2 and A2A2) and heterozygous (A1O and A2O) A allele carriers.
Here, the effect seems to be additive, with homozygous carriers of A or B having a higher risk than heterozygous carriers with one O allele.
The different genotype effects were all similar for DVT and PE, with risk increasing in the order OO < AO ≈ BO < AB ≈ AA ≈ BB. This suggests that the causal mechanisms for ABO on these diseases are most likely through the same pathway(s). This also agrees with the underlying disease mechanism, where part or all of the deep vein thrombus is dislodged and transported through the right side of the heart before arresting in the pulmonary vasculature, causing PE. 42 A recent study by Goumidi and colleagues assessed the risk of venous thrombosis (VT) in relation to ABO genotypes, and subclassified A into A1 and A2 and O into O1 and O2, respectively. 43 They found a difference in risk between A1 and A2, where only A1 posed an increased risk for VT compared to O, whereas A2 only showed a trend toward a moderately increased risk. They further emphasized the need to use ABO haplotypes to accurately estimate the risk of VT attributable to ABO. We were able to replicate these results for DVT and PE, where A1 seems to be the driving allele in risk attributed to the A allele(s) (Figure S4). Furthermore, B seems to be the driving allele in increased risk for individuals with an A2B genotype.
Several of the proteins that were associated with ABO genotypes have previously been linked to coagulation. As for DVT and PE, we found moderate differences in the levels of vWF between heterozygous and homozygous A allele carriers. The OO genotype was associated with lower levels compared to all other genotypes, which confirms previous findings and is in line with the results for the thromboembolic events. Plasma levels of thrombomodulin (TM) were also found to be associated with ABO genotypes. In contrast to vWF, TM, a membrane protein,

| ABO and inflammatory diseases
We included 17 common inflammatory diseases in this study. These were selected based on sample size rather than previous association results, although several of them have previously been suggested to be associated with ABO, and our aim was to try to unravel previous ambiguous associations. Surprisingly, ABO was only associated with T1D in our study.
There have been some conflicting results in previous literature regarding the possible association between ABO and inflammatory diseases.
The associations with allergic diseases were recently reviewed by Dahalan et al. 47 Of the studies that met their criteria, three studies found an association between the non-secretor phenotype and susceptibility to T1D. 50 However, in the same study, they did not find an association between the ABO blood groups and T1D. Most previous work points toward the association being with how and where the ABO antigens are presented and expressed, rather than with the genotype itself. Why the homozygous BB genotype shows an association with a higher risk of T1D in our study is therefore still unclear. There is only one previously published association, where blood group B (and O) gave rise to significantly higher intestinal alkaline phosphatase in patients with T1D compared to controls. However, the sample size was very small (83 cases and 44 controls), and the finding has, to our knowledge, not been replicated. 51

| Novel protein associations
Of the 39 plasma proteins that we found to be associated to variation at the ABO locus, as many as 23

| Plasma protein association patterns and previous associations
It is important to consider that the proteins measured are soluble proteins measured in plasma, and the levels and associations to the ABO genotypes might differ in the tissue where the proteins are expressed.
It is well established that the correlation between mRNA and protein expression within a tissue is often very weak, but less is known about protein expression correlations across tissues. 53 In a recent study, with samples from the Human Protein Atlas, Wang et al. 53  there is a possibility that the changes in levels of the antigen produced give rise to at least part of the association. As the levels of these seven proteins (CD200, CDH5, ICAM-2, LIF-R, SELE, VEGFR-2, and vWF) were altered by a risk variant for coronary heart disease, this strengthens the proposition that ABO might influence disease risk through changes in protein levels.

| CONCLUSIONS
In this study, we have shown that variation at the ABO locus is associated with common disease risk and the levels of plasma proteins. We have confirmed previous associations as well as tried to disentangle previous findings with low support. By analyzing ABO genotypes rather than blood group phenotypes, we have been able to show differences also at genotype level. We further found an association between ABO B genotypes and T1D that has not been captured in previous genetic studies. However, as only two groups of diseases were included, namely cardiovascular and inflammatory diseases, there is still a gap in assessing the associations with diseases such as infection and cancer, as these have also previously been associated with the ABO blood grouping system. In addition, one of the largest uncertainties is that without tissue measurements in the same cohort, there is no way of establishing the degree of correlation between plasma protein levels and tissue expression levels. The measurements in NSPHS were furthermore sampled only once, and are thus from a single point in time.
Although one expects a correlation between plasma and tissue levels, especially for biomarkers and putative biomarkers, the degree and direction of correlation are still highly uncertain.
As we did not have access to serologically determined blood groups, nor tissue samples, we were not able to confirm the expressed antigens. Furthermore, we did not investigate Rh-status and secretor