Molecular Systems Biology Peer Review Process File

Quantitative Variability of 342 Plasma Proteins in a Human Twin Population

Transaction Report

(Note: With the exception of the correction of typographical or spelling errors that could be a source of ambiguity, letters and reports are not edited. The original formatting of letters and referee reports may not be reflected in this compilation.)

Thank you again for submitting your work to Molecular Systems Biology. We have now heard back from the three referees who agreed to evaluate your manuscript. As you will see from the reports below, the referees find the topic of your study of potential interest. They raise, however, a series of concerns on this work, which should be convincingly addressed in a revision of the study.
The reviewers are in general positive about the overall approach of the study. They feel, however, that several of the conclusions do not seem to be sufficiently supported. Without repeating all the points listed by the reviewers, the major issues refer to the following:
- the conclusions on the impact of aging should be revised and considerably toned down unless corrections of potential confounding factors can be taken into account
- it seems that the statistical model and the interpretation of the effects included in the model should be revised according to the comments of reviewer #2
- potential biases due to a focus on the high concentration range should be addressed
- an analysis of the contribution of PTM or protein structural variability to the observed concentration variability would be important.

--------------------------------------------------------
Reviewer #1:
This is a potentially interesting study in which the authors investigated quantitative variability in plasma proteins in a cohort consisting of 44 DZ and 72 MZ post-menopausal female twins, with blood samples drawn at two different time points starting at an average age of 57.8 years, and with a time interval between the two samplings of 5.2 ± 1.4 years. The authors conclude that the data indicate that inherent variability of protein levels varies significantly for different plasma proteins, and that the regulation of specific protein levels and biological processes is under tight genetic and/or aging control.
First a word of caution is needed about interpreting the findings as applicable to the entire plasma proteome given that the scope of the analysis is rather narrow compared to the vastness of the plasma proteome.
It would seem that the contribution of immunodepletion as a source of variability is not sufficiently well addressed. Also it is rather puzzling that the authors chose to limit their analysis to postmenopausal women with a span between the two time points representing only some 5 years. It is hard to appreciate how such a short span and to begin with concerning post-menopausal women would lead to significant changes in quantitative levels. One can only imagine then that a wider time span say encompassing menopause would lead to vast variations for which there is not good evidence. A comparison with a younger age group would have been informative in this regard even if it did not involve twins.
A major concern is that the study does not seem to account for structural variability in plasma protein as well as PTMs as potential sources of variability. Given the twin nature of the study these contributions to variability would be highly informative. Thus an integration of quantitative variability with genomic sequence information for the proteins investigated is likely to be very informative. We have known for decades about genetic/structural variability in plasma proteins and the significance of this study would be greatly enhanced by including an assessment of genetic variability in coding and non-coding sequences and their contribution to quantitative variability. Short of that the findings may be subject to conflicting interpretations.
Overall this study could be made more informative and more definitive by addressing the points raised in this review.

Reviewer #2:

Summary
The authors set out to estimate the heritability and other contributions of ~340 human plasma proteins using a longitudinal female twin study consisting of 72 MZ pairs / 44 DZ pairs. They identify protein QTL (pQTL) associated with genes coding for some of these proteins. They draw several conclusions regarding the relative contributions of genetics, shared environment, and aging.
The authors found that a large number of plasma proteins had substantial heritability. They identified 13 cis-SNPs associated with protein levels, and attempted to highlight the potential effect of aging on protein levels.

General remarks
Overall, the experimental design is sensible, and this twin study potentially provides a unique resource to assess the contribution of genetic and non-genetic factors to the variability of protein levels. Exploring the heritability of protein levels is important as this complements existing studies of other human molecular traits such as gene expression and metabolite levels. Importantly, due to technological advances in protein quantification, this study examined quite a large number of proteins, rather than being limited to a small number of proteins as in previous studies, such as Raffler et al 2013 (doi:10.1186/gm417). Such studies will be of interest to human genomicists and systems biologists.
However, there are substantial analytical issues for the study which are likely to affect all downstream analyses, especially those quantifying and interpreting any 'aging' effects. The limitations of the study should be very clearly discussed.
Major points
- The treatment of age in this study is a major issue which is likely to affect all analyses, and certainly those concerning aging effects. Here, there are two substantial issues.
The first is the model itself. The time is encoded as follow up, i.e., 0 at first visit and the time at second visit, which makes it hard to interpret whether there is an age effect. It would be far more interpretable to add the actual age as a fixed effect. In addition, in the linear mixed model, the ID "individual" matrix appears to be identical to the W "visit" matrix. For the shared environment matrix C, this essentially captures the fact that two individuals are twins, but it's hard to ascribe this to current shared environment (i.e., twins in their 60s are not likely to be living in the same house, for example), so the simple interpretation of this as "shared environment" needs to be a bit more nuanced. Also, it would make sense to make the "plate" effect a random effect rather than a fixed effect.
The second issue is that it is not possible to conclude from any time-varying effect that it is the result of aging and not other confounding. For example, it may be that older individuals tend to take more medication that in turn affects protein levels, or a myriad of other changes in diet, behavior or environment. This study does not attempt to control for any such potential confounders in the analysis, hence claiming that this is an age effect is premature.
A more realistic interpretation of the findings as well as an in-depth discussion of potential confounders and the limitations of the study is required. However, in the absence of appropriate control variables, this reviewer would prefer that all reference to aging effects as such be discarded.
- This study is rather small, a total of 131 individuals measured at two time points. It is likely that the variance components are estimated with substantial uncertainty, which has not been reported in the study. Reporting the confidence intervals for the estimates would allow assessment of the reliability of such estimates. Further, limiting the study to females based on the argument of excluding the gender variance seems heavy handed. A doubling of the sample size would be well worth adding another fixed effect "sex" to the model, in terms of power and reducing the uncertainty in the estimates.
- Figure 4C, 4D, 4E, what test are the p-values from? If it is a t-test, this would not be appropriate for comparing variances (heritabilities) as they are highly skewed and truncated at zero.
- The authors should use the Introduction to compare and contrast with Kato et al since it had a similar longitudinal proteome study design also using TwinsUK (Kato et al, 2011).
Minor points
- Within DZ twins, it would be informative to note whether there was a relationship between identity-by-descent and proteome similarity.
- Introduction (pg2) "Genomic variation, modulated by lifestyle and environment, orchestrates the extensive phenotypic variability found in human populations." I think what the authors are trying to say is "The effects of genomic variation, modulated by lifestyle and environment, orchestrate the extensive phenotypic variability found in human populations."
- Introduction (pg2) "The quantification of heritability, i.e. the proportion of phenotypic variance attributable to additive genetic effects..." It should be clarified that the authors are referring to "narrow-sense heritability". 'Heritability' sometimes refers to broad-sense heritability, the full genetic component (including non-additive effects as well).
-Results (pg 7) "This discrepancy may be mainly ascribed to the much shorter temporal intervals of sampling used in their study (around 3 months), indicating that the natural aging process during the ~5 year period tested in the present study uncovered a profound impact of aging on plasma proteomic dynamics." This statement seems speculative in the absence of literature support, especially given very few time points (and potentially different samples). Further, if the major issues with regards to aging analyses were addressed, the authors could indeed test this in the data (phenotypic variance as a function of time interval between visits, though the time fixed effect will have to change).
- Figure 5A: The y-axis of the Manhattan plot should indicate that it is a P value. An FDR of 0.1 was used as cutoff; the corresponding p-value should be annotated on the Manhattan plot. How many SNPs were considered?
- Results (pg 11): A more thorough description of the '800 female twins' data is warranted.
-Discussion (pg 14): The paragraph beginning 'Understanding the underlying genetic determinants of biomarkers...' would benefit from less certain language. In particular, the simple logic that a genetic variant associated with a causal biomarker of a disease is itself necessarily a biomarker of the disease is false for the simple reason that there exist components of variance (e.g. the mixed model). Conversely, the discussion about how analysis of a non-causal biomarker like CRP would benefit from identification of pQTLs is unclear; it seems that the authors are trying to convey the concept of Mendelian randomisation. In terms of examples, it would be informative to also present cases where the authors' logic statements have not held.
- Methods (pg 20): What was the QC prior to imputation? Was there individual-level QC (missingness, outlier removal, etc)?
- Methods (pg 21) "e is the residual effect": it's not an "effect", it's just the residuals.
- Methods (pg 22): It is not clear why the authors regress out 10 PCs of the protein levels if they have a linear mixed model already. This is apparently unnecessary as they can just take residuals or run a similar mixed model with protein levels as a function of the SNPs as a fixed effect.
- Please provide a better description of what it means to be an "FDA cleared protein".
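
The Figure 5A point above asks for the p-value that corresponds to the FDR cutoff of 0.1. As a minimal sketch only, assuming the FDR was controlled with the Benjamini-Hochberg procedure (the report does not state which procedure the authors used), the threshold to annotate could be derived as below. The function name `bh_pvalue_cutoff` and the p-values are hypothetical, chosen purely for illustration.

```python
def bh_pvalue_cutoff(pvalues, fdr=0.1):
    """Largest p-value declared significant by Benjamini-Hochberg, or None.

    Assumption: BH step-up procedure; the manuscript's actual FDR method
    is not specified in the review.
    """
    m = len(pvalues)
    cutoff = None
    # Step-up rule: find the largest rank k with p_(k) <= fdr * k / m.
    for rank, p in enumerate(sorted(pvalues), start=1):
        if p <= fdr * rank / m:
            cutoff = p
    return cutoff


# Hypothetical association p-values (not from the study).
pvals = [0.0002, 0.004, 0.019, 0.03, 0.2, 0.45, 0.6, 0.74, 0.9, 0.95]
cutoff = bh_pvalue_cutoff(pvals, fdr=0.1)
print("p-value threshold to draw on the Manhattan plot:", cutoff)
```

Every test with a p-value at or below the returned cutoff is called significant, so this single number is what would be drawn as a horizontal line on the plot. The cutoff depends on the total number of tests, which is one reason the number of SNPs considered needs to be reported.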
Spelling and grammar checks are needed. Some typos would include: P21 "Tipe points", "familiar effect" P11, "know genes" P12 "familiar component" P13 "committee" -> community? P13 "know to be causal" P20 "It is know that variance" ... "to non-normal" P21 "familiar effect"

Reviewer #3:
This is an elegant study concerning a highly multiplexed proteomic analysis of a unique cohort of plasma samples obtained from a monozygotic and dizygotic twin population taken at different times (~5 years apart). Such a cohort provides the powerful opportunity to begin to elucidate the significance of genetic vs aging vs diet/environment, etc. The analysis is the largest, most comprehensive proteomic analysis to date for such a study. The investigators are thought leaders in the field and many of the methods were pioneered by them. While the manuscript is very well written and will be of high interest to the readership, there are several missed opportunities that could have increased the impact of the paper.

Major Issues
1. I'm not sure the CAP oriented analysis is useful or if any of the data is meaningful, and certainly the conclusions would need to be so circumspect that I don't think this adds to the paper: basically, sampling 99 proteins out of thousands is of such low coverage (< 1%) that it's impossible to conclude anything. Compounding this is that the proteins chosen are fairly high in abundance compared to many clinically used cancer markers/markers important in cancer biology today.
2. Following, the investigators really need to include a list of the concentrations of the proteins studied. An examination of the supplemental data makes me concerned that most of the content is high abundance proteins, with few in the microgram/ml range and very few in the ng/ml range. It is estimated that 90% of the plasma proteome exists in the sub fg/ml range. Thus there appears to be a huge percentage of the plasma proteome (especially in the low abundance range, which presumably contains the most interesting component) that was really unstudied here. It would be very helpful for the reader, in judging the data, to have the average estimated concentration of the proteins enumerated in the supplemental table.
3. Following still: it would be helpful if the investigators could provide data that address whether there is a concentration range bias in the study that could influence the final conclusions. For example, are those proteins that have the lowest concentration the most variable? Most associated with environmental changes? The same would go for the other categories. Some understanding of the percent of the proteome in each category that were in the mg/ml range, the microgram/ml range, nanogram/ml, femtogram/ml range etc. would be helpful.
A minor point: there are no FDA cleared proteins. There are FDA cleared biomarkers for diagnostic/prognostic purposes. As the reviewers of course know, the proteins exist in nature, not because the FDA approved them. Please consider renaming this cohort. Something like "Protein biomarkers whose measurement has been approved by the FDA for clinical purposes".

Reviewer #1:
This is a potentially interesting study in which the authors investigated quantitative variability in plasma proteins in a cohort consisting of 44 DZ and 72 MZ post-menopausal female twins, with blood samples drawn at two different time points starting at an average age of 57.8 years, and with a time interval between the two samplings of 5.2 ± 1.4 years. The authors conclude that the data indicate that inherent variability of protein levels varies significantly for different plasma proteins, and that the regulation of specific protein levels and biological processes is under tight genetic and/or aging control.
First a word of caution is needed about interpreting the findings as applicable to the entire plasma proteome given that the scope of the analysis is rather narrow compared to the vastness of the plasma proteome.

Author Reply: To address the concern of the reviewer and to avoid the impression that we claim analysis of the full plasma proteome, in the revised manuscript we first toned down the respective expressions in the article, and we also changed the title from "Quantitative variability of the plasma proteome in a human twin population" to "Quantitative variability of 342 plasma proteins in a human twin population".
It would seem that the contribution of immunodepletion as a source of variability is not sufficiently well addressed.
Author Reply: We share the concern about variability potentially introduced by immunodepletion of high abundance plasma proteins. Precisely for this reason we did NOT deplete the plasma samples for SWATH analyses. This is described in the Methods section "SWATH-MS measurement" (please see Page 21 Para. 2) and in a short, separate paragraph of the Discussion (Page 17 Para. 3).
We used immunodepletion and SAX fractionation only for the shotgun experiments (i.e., the step for the generation of the spectral library). Here, the matched ions in the MS/MS spectra are not affected by immunodepletion, as they are confidently identified as peptides by conventional shotgun proteomics.
We also added some further discussion related to the issue of depletion, and we now mention the recent paper of Dayon et al. published in the Journal of Proteome Research (Dayon et al, 2014), in which the authors assessed the variability resulting from depletion and buffer exchange on a robotic platform (see Page 17 Para. 3: "For future comparative studies, the relative variability derived from this study for certain plasma proteins (e.g., those interacting with Albumin) might need to be adjusted by factoring in the technical variations, e.g., those from immunodepletion (Dayon et al, 2014) or protein isolation steps, if indeed these steps are used").

Also it is rather puzzling that the authors chose to limit their analysis to post-menopausal women with a span between the two time points representing only some 5 years. It is hard to appreciate how such a short span and to begin with concerning post-menopausal women would lead to significant changes in quantitative levels. One can only imagine then that a wider time span say encompassing menopause would lead to vast variations for which there is not good evidence. A comparison with a younger age group would have been informative in this regard even if it did not involve twins.
Author Reply: Thanks for this comment. In the following we address the comment first by discussing the range of age and secondly the five-year age span investigated and then by improving the analysis of ageing/longitudinal sampling.
1. The range of age investigated (~17% of the twins encompassed menopause during the two visits, as they are younger than others).
• Regarding the concern that all the twins are post-menopausal, we would first like to note that the age range of the female twins is actually rather broad. To make this point clearer we added the new Figure S1. Though the majority of these female twins are post-menopausal, or some of their plasma protein concentrations might be affected by menopause, they are anyway the best controls available as an "age-matched" group for the biomarker discovery studies of these diseases. In fact we suggest that, for studies aiming at discovering predictive or retrospective biomarkers, or to monitor the progress of the above-mentioned diseases, it is crucial to focus on this age range. Clearly, understanding the biological variability (as presented in this work) will be very beneficial for understanding the quantified plasma protein profiles in the biomarker studies. The degree of plasma protein variability so far has been essentially unknown.  (Kato et al, 2011) where a time span of 3 months was used. Also, specific, clinically used protein analytes (Figure 4G), such as BTD or THBG, were found to be significantly altered throughout this five-year longitudinal process. Their plasma levels were predominantly affected by a longitudinal effect (>30%) compared to the heritability and environmental components.
• Most importantly, the judgment about whether the longitudinal span in a biomarker discovery study (which our study aims to support) is too short depends strongly on the disease targeted. For a slowly progressing disease such as prostate cancer, 5 years might not be enough to reveal the whole disease process. However, for many other general diseases, such as diabetes, Alzheimer's disease and many cancer types, a span of 5 years is clinically and medically relevant. For example, for ovarian cancer, the serum concentrations of CA125, HE4, and mesothelin were reported to provide evidence of cancer 3 years before clinical diagnosis. Also, for diabetic nephropathy (a progressive kidney disease), the urinary levels of collagen fragments were demonstrated to be prominent biomarkers 3-5 years before onset of macroalbuminuria (Zurbig et al, 2012). Moreover, in clinics the "5 year survival rate" is a routine predictor of a cancer patient's survival after surgery. Therefore, we suggest that a span of ~5 years is clinically relevant and a suitable "predictive window" for many diseases. We added a concise discussion at Page 16 end of Para. 3.
•  (Mitchell et al, 2005). This is consistent with our observation. However, the longer-term stability of many molecules is still poorly understood and Gillio-Meina et al. expressed the caution of extreme long-term sample procurement in a recent review (Gillio-Meina et al, 2013). In addition to pre-analytical issues, there are also biological confounding factors in longer-term analysis (e.g., the number of medications taken along with ageing, as communicated with reviewer2). To address these issues we have added discussion text in Page 16 from the last line.

The improved analysis of ageing/longitudinal sampling in the revised version.
We also chose to thoroughly analyze the effect of the absolute age of the twin individuals in our current data set. Specifically, we added the actual age as a fixed effect in our variance model for each of the two visits. This analysis leads to the new Supplementary Figure S7, which demonstrates that the actual age (38-78) has a minimal effect in the variance dissection, accounting for 0-1% of the quantitative variability for most plasma proteins (Supplementary Figure S7-C, D). Moreover, the heritability estimates of the proteins inferred with and without the actual age component are excellently correlated (R=0.99 for both visits, Figure S7-E, F). Therefore, the relative ageing process (or more accurately, the longitudinal process including all the temporally variable factors) seems to be much more important than the actual age in explaining the total phenotypic and biological variance in the plasma proteome. Please see the corresponding results added at Page 12 middle of Para. 1. With this result, we believe our analyses of ageing/longitudinal changes are significantly improved in the revision.
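
The comparison described in this reply (heritability estimated with and without the actual-age term, summarised by a correlation coefficient) can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis code: the heritability values are invented, and the helper names `pearson_r` and `mean_difference` are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def mean_difference(xs, ys):
    """Average of (y - x); detects a systematic shift that r cannot see."""
    return sum(y - x for x, y in zip(xs, ys)) / len(xs)


# Invented per-protein heritability estimates (not from the study).
h2_without_age = [0.12, 0.35, 0.48, 0.61, 0.77]
h2_with_age = [0.11, 0.36, 0.47, 0.60, 0.78]

r = pearson_r(h2_without_age, h2_with_age)
shift = mean_difference(h2_without_age, h2_with_age)
print(f"r = {r:.3f}, mean shift = {shift:+.3f}")
```

The mean shift is reported alongside the correlation because a high correlation alone does not rule out a constant offset between the two sets of estimates.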
A major concern is that the study does not seem to account for structural variability in plasma protein as well as PTMs as potential sources of variability. Given the twin nature of the study these contributions to variability would be highly informative. Thus an integration of quantitative variability with genomic sequence information for the proteins investigated is likely to be very informative. We have known for decades about genetic/structural variability in plasma proteins and the significance of this study would be greatly enhanced by including an assessment of genetic variability in coding and non-coding sequences and their contribution to quantitative variability. Short of that the findings may be subject to conflicting interpretations.
Author Reply: We understand the reviewer's concern and agree that checking the variability regarding PTM and protein structures will provide useful information. Therefore in the revised paper, we carefully address this comment from technical and biological perspectives.  Figure S6), we found that the proteins with different modification potential in the human proteome seem to have diverse regulation dependency on genetics or longitudinal factors. Those proteins annotated as "Glycoproteins" ( Figure 4D) and with "disulfide bond" are more strongly regulated by genetics and less strongly affected by longitudinal factors, while the proteins with "Phosphoprotein" or "Acetylation" annotations show reverse trend of the regulation (Supplementary Figure S6 A-D, P<0.01 or P<0.05). The shared regulation trends are partially ascribed to the overlapping proteins between lists, i.e., "Glycoproteins" list is highly overlapping with "disulfide bond" proteins whereas the "Phosphoproteins" list is highly overlapping with the list of "acetylated" proteins (Supplementary Figure S6 E). These data are now added in the sections indicated above. Please also see Page 11 Para.2 for the relevant result. Figure S6 F Figure S6F) than other Pfam domains. Considering the fact that V-set domains are mainly found in diverse immunoglobulin light and heavy chains and in several T-cell receptors in human blood, we suggest that this trend might be related to the protein functions (rather than the domain itself), as the functional cluster of "immune response" are revealed to have a high degree of heritability ( Figure 4A).

For Pfam analyses (Supplementary
With the discussions above and the consideration that we did not directly measure protein modification and structure in the current study, to address the issue raised by the reviewer we added a short, conservative paragraph on the biological perspective at Page 11 middle of Para 2. Specifically, the new text reads: "Furthermore, considering the proteins overlapping annotation classes and the fact that we did not directly measure any protein modification and structure in the current study, we suggest that further direct studies are crucial to conclude whether different protein modifications or structures indeed harbor diverse genetic or longitudinal regulation dependency."

3. Genetic variability on plasma protein concentration. The association of genetic variability with plasma protein levels was already analyzed by our pQTL analysis (the association of SNPs in the genome with protein levels, see Supplementary Table S4 and Figure 5). We observed that most of the discovered pQTLs lie in regulatory regions (11 of the pQTL associations are 5'-upstream, 3 are in intronic regions) and only 2 were in the coding region, but synonymous. This result is also briefly mentioned at Page 13 end of Para. 2.
Overall this study could be made more informative and more definitive by addressing the points raised in this review.

Reviewer #2:

Summary
The authors set out to estimate the heritability and other contributions of ~340 human plasma proteins using a longitudinal female twin study consisting of 72 MZ pairs / 44 DZ pairs. They identify protein QTL (pQTL) associated with genes coding for some of these proteins. They draw several conclusions regarding the relative contributions of genetics, shared environment, and aging.
The authors found that a large number of plasma proteins had substantial heritability. They identified 13 cis-SNPs associated with protein levels, and attempted to highlight the potential effect of aging on protein levels.

General remarks
Overall, the experimental design is sensible, and this twin study potentially provides a unique resource to assess the contribution of genetic and non-genetic factors to the variability of protein levels. Exploring the heritability of protein levels is important as this complements existing studies of other human molecular traits such as gene expression and metabolite levels. Importantly, due to technological advances in protein quantification, this study examined quite a large number of proteins, rather than being limited to a small number of proteins as in previous studies, such as Raffler et al 2013 (doi:10.1186/gm417). Such studies will be of interest to human genomicists and systems biologists.
However, there are substantial analytical issues for the study which are likely to affect all downstream analyses, especially those quantifying and interpreting any 'aging' effects. The limitations of the study should be very clearly discussed.
Major points
- The treatment of age in this study is a major issue which is likely to affect all analyses, and certainly those concerning aging effects. Here, there are two substantial issues.
The first is the model itself. The time is encoded as follow up, i.e., 0 at first visit and the time at second visit, which makes it hard to interpret whether there is an age effect. It would be far more interpretable to add the actual age as a fixed effect. In addition, in the linear mixed model, the ID "individual" matrix appears to be identical to the W "visit" matrix. For the shared environment matrix C, this essentially captures the fact that two individuals are twins, but it's hard to ascribe this to current shared environment (i.e., twins in their 60s are not likely to be living in the same house, for example), so the simple interpretation of this as "shared environment" needs to be a bit more nuanced. Also, it would make sense to make the "plate" effect a random effect rather than a fixed effect.
Author Reply: We appreciate the comments from the reviewer and we explain better why we decided to use the model we used and the changes implemented to address the issues raised.

The second issue is that it is not possible to conclude from any time-varying effect that it is the result of aging and not other confounding. For example, it may be that older individuals tend to take more medication that in turn affects protein levels, or a myriad of other changes in diet, behavior or environment. This study does not attempt to control for any such potential confounders in the analysis, hence claiming that this is an age effect is premature. A more realistic interpretation of the findings as well as an in-depth discussion of potential confounders and the limitations of the study is required. However, in the absence of appropriate control variables, this reviewer would prefer that all reference to aging effects as such be discarded.
Author reply: We agree with the reviewer's statement. Aging is a complex process that involves biological and sociological aspects. Since it is not possible to take into account all the factors that change over time, we followed the reviewer's advice: we replaced the term "aging" with "longitudinal change" and generally toned down the conclusions regarding the impact of "aging", summarizing them instead as "longitudinal or temporal effects".
Moreover, we examined the information available for the samples to see whether there were distinctive changes between the two visit time points (TP1 and TP2) that could be considered confounding effects. We found several individuals suffering from type 2 diabetes, but all of them were already diabetic at TP1. We also found several individuals with cancer, but in only two cases did the cancer occur in the interval between TP1 and TP2. The situation regarding the intake of medicines is more complicated, with a higher degree of variability. It is difficult to draw a firm conclusion, but we cannot discard the possibility that the intake of some medicines affected protein levels between the two time points. We added this information as supplementary
-This study is rather small: a total of 131 individuals measured at two time points. It is likely that the variance components are estimated with substantial uncertainty, which has not been reported in the study. Reporting the confidence intervals for the estimates would allow assessment of their reliability.
Author reply: Given the sample size of our study, the estimates of heritability are not very precise. Following the reviewer's comment, we have now added confidence intervals to our heritability estimates (Supplementary Table S3).
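One common way to attach an interval to a twin-based heritability estimate can be sketched as follows. This is an illustration only and not the authors' method: the reply above relies on a linear mixed model, whereas this sketch swaps in Falconer's formula, h2 = 2(r_MZ - r_DZ), with a percentile bootstrap over twin pairs. All function names and the simulated protein levels are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def falconer_h2(mz_pairs, dz_pairs):
    """Falconer's estimate: twice the difference of MZ and DZ correlations."""
    r_mz = pearson(*zip(*mz_pairs))
    r_dz = pearson(*zip(*dz_pairs))
    return 2.0 * (r_mz - r_dz)


def bootstrap_ci(mz_pairs, dz_pairs, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval, resampling whole twin pairs."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        mz = rng.choices(mz_pairs, k=len(mz_pairs))
        dz = rng.choices(dz_pairs, k=len(dz_pairs))
        stats.append(falconer_h2(mz, dz))
    stats.sort()
    lo = stats[int(n_boot * alpha / 2)]
    hi = stats[int(n_boot * (1 - alpha / 2)) - 1]
    return lo, hi


def simulate_pairs(n_pairs, r, rng):
    """Hypothetical data: twin pairs whose levels correlate at about r."""
    pairs = []
    for _ in range(n_pairs):
        shared = rng.gauss(0, 1)
        t1 = math.sqrt(r) * shared + math.sqrt(1 - r) * rng.gauss(0, 1)
        t2 = math.sqrt(r) * shared + math.sqrt(1 - r) * rng.gauss(0, 1)
        pairs.append((t1, t2))
    return pairs


if __name__ == "__main__":
    mz = simulate_pairs(36, 0.8, random.Random(1))  # assumed pair counts
    dz = simulate_pairs(22, 0.4, random.Random(2))
    print(falconer_h2(mz, dz), bootstrap_ci(mz, dz))
```

With only a few dozen pairs per zygosity group, intervals from a sketch like this are typically wide, which is the point the reviewer raises about reporting uncertainty.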
-Further, limiting the study to females based on the argument of excluding the gender variance seems heavy handed. A doubling of the sample size would be well worth adding another fixed effect "sex" to the model, in terms of power and reducing the uncertainty in the estimates.
Author reply: As a general principle, we agree with the reviewer that it would be valuable to double the sample size and to include males to determine gender effects. However, relevant samples are not always available. Quantifying the plasma proteome in more than 200 samples is already a very significant effort that took several months to accomplish and is, at this level of depth and reproducibility, unprecedented in the proteomics literature. We would like to highlight the value of a study of this magnitude for the proteomics field. Proteins predominantly determine the biochemical state of biological specimens. Proteomic variation is therefore thought to be closely associated with phenotypic variation and to add a complementary component to the corresponding nucleic acid-based indicators (Wu et al, 2013); moreover, plasma protein levels cannot be inferred from transcript studies. Extending this comment, it would be ideal to have a longitudinal study with multiple sampling time points across the human life span, as we also discussed. However, we are limited by the current sample throughput of comprehensive large-scale proteomic analysis (considering the steps from reproducible sample preparation to mass spectrometry analysis, and the time consumed). We hope it can be appreciated that our study is, to date, the most comprehensive twin plasma proteomic study, with the highest number of twin individuals investigated, based on our sensitive, reproducible SWATH-MS technique and a unique, well-controlled experimental design. This is a significant achievement in the proteomics field. Using only women in the study can limit the generalization of the conclusions to the general population, but it still allows us to learn some biology.
For example, previous TwinsUK gene expression and methylation studies with higher sample throughput (800-1000 twin individuals analyzed, owing to the high throughput of genomic methods) also exclusively used female individuals of the same age range (Grundberg et al, 2012) (Buil et al, in press) (Glass et al, 2013). We added some of the relevant discussion on Page 16, end of Para. 2. Finally, in a very recent study (Enroth et al, 2014) using ~1000 non-twin individuals, only four plasma proteins (among 92 tested) were identified as sex-associated. This could also indicate that many more samples might be needed to extend our variability results to male twins and to uncover plasma proteins associated with gender.
- Figure 4C, 4D, 4E, what test are the p-values from? If it is a t-test, this would not be appropriate for comparing variances (heritabilities) as they are highly skewed and truncated at zero.
Author reply: p-values in Figure 4 come from a t-test. Since we agree with the reviewer's comment, we recalculated the p-values using the non-parametric Wilcoxon rank sum test and report the new p-values. We clarify it in Methods (Page 27 Para. 3).
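The switch from a t-test to a rank-based test can be illustrated with a minimal sketch of the two-sided Wilcoxon rank-sum (Mann-Whitney U) test using the normal approximation with a tie correction. The function name and the example heritability values are hypothetical, and the authors' actual software is not stated in the reply; an exact test would be preferable for very small groups.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation, tie-corrected).

    Returns (U statistic for x, p-value).
    """
    pooled = sorted((v, g) for g, vals in enumerate((x, y)) for v in vals)
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        mid_rank = (i + j) / 2 + 1  # average 1-based rank of the tied block
        for k in range(i, j + 1):
            ranks[k] = mid_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    r1 = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u1, 1.0
    z = (u1 - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return u1, p


if __name__ == "__main__":
    # Hypothetical heritabilities for two protein groups (bounded at zero).
    group_a = [0.00, 0.02, 0.05, 0.10, 0.12]
    group_b = [0.30, 0.35, 0.41, 0.55, 0.60]
    print(rank_sum_test(group_a, group_b))
```

Because the test uses only ranks, it does not assume normality, which suits variance components that are skewed and truncated at zero.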
-The authors should use the Introduction to compare and contrast with Kato et al since it had a similar longitudinal proteome study design also using TwinsUK (Kato et al, 2011
To investigate whether our failure to find the pQTLs detected by Johansson et al is a matter of power, we looked at the p-values of the association of their pQTLs in our study and checked for an enrichment of small p-values. The expectation is that, if the Johansson pQTLs are not present in our sample, the distribution of p-values would be flat (uniform on (0, 1)). However, if the Johansson pQTLs are present in our sample, even though they may not be statistically significant, we should see an enrichment of small p-values. We estimated the mean p-value of the Johansson pQTL associations in our sample and compared it with the distribution of means expected if there were no signal in our sample. Supplementary Figure S10 shows the expected distribution of means under the null hypothesis (no signal of the Johansson pQTLs in our sample) and the actual observed mean. From this we calculated a p-value (0.0035) that supports the idea that, with a larger sample, we would find some of the pQTLs described by Johansson et al. Finally, the entirely distinct sample cohort may be a factor in this discrepancy between pQTL studies. We added this information and the figure to the manuscript (Page 14, Para. 2).
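The enrichment check described in this reply can be sketched as a small Monte Carlo test: compare the observed mean of the replication p-values with the means of equally sized sets of uniform(0, 1) draws. This is a sketch under assumptions; the number of previously reported pQTLs tested, the number of simulations and the function name are hypothetical and not taken from the manuscript.

```python
import random


def mean_pvalue_enrichment(observed_pvalues, n_sim=10000, seed=0):
    """One-sided Monte Carlo test for an excess of small p-values.

    Under the null (no signal), p-values are uniform on (0, 1), so the
    observed mean is compared with means of simulated uniform samples.
    Returns (observed mean, empirical p-value).
    """
    rng = random.Random(seed)
    k = len(observed_pvalues)
    obs_mean = sum(observed_pvalues) / k
    at_least_as_small = 0
    for _ in range(n_sim):
        null_mean = sum(rng.random() for _ in range(k)) / k
        if null_mean <= obs_mean:
            at_least_as_small += 1
    # Add-one correction keeps the empirical p-value strictly positive.
    return obs_mean, (at_least_as_small + 1) / (n_sim + 1)


if __name__ == "__main__":
    # Hypothetical replication p-values for previously reported pQTLs.
    replication_p = [0.02, 0.40, 0.11, 0.07, 0.65, 0.03, 0.25, 0.18]
    print(mean_pvalue_enrichment(replication_p))
```

A small empirical p-value indicates that the replication p-values are, on average, smaller than expected by chance, even if no single association is significant on its own.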
Minor points
-Within DZ twins, it would be informative to note whether there was a relationship between identity-by-descent and proteome similarity.
Author reply: We considered introducing a new random effect in the model to account for IBD similarity, which would give an estimate of the amount of variance explained by genetic variants in cis. However, we recognize that the model as it stands already has many random effects and that, with our small sample size, such a test would be completely underpowered.
-Introduction (pg2) "Genomic variation, modulated by lifestyle and environment, orchestrates the extensive phenotypic variability found in human populations." I think what the authors are trying to say is "The effects of genomic variation, modulated by lifestyle and environment, orchestrate the extensive phenotypic variability found in human populations."
Author reply: We appreciate the comment and changed the text accordingly.
-Introduction (pg2) "The quantification of heritability, i.e. the proportion of phenotypic variance attributable to additive genetic effects..." It should be clarified that the authors are referring to "narrow-sense heritability". 'Heritability' sometimes refers to broad-sense heritability, the full genetic component (inclusion of non-additive effects as well).
Author reply: We appreciate the comment and changed the text accordingly.
-Results (pg 7) "This discrepancy may be mainly ascribed to the much shorter temporal intervals of sampling used in their study (around 3 months), indicating that the natural aging process during the ~5 year period tested in the present study uncovered a profound impact of aging on plasma proteomic dynamics." This statement seems speculative in the absence of literature support, especially given very few time points (and potentially different samples). Further, if the major issues with regards to aging analyses were addressed, the authors could indeed test this in the data (phenotypic variance as a function of time interval between visits, though the time fixed effect will have to change).
Author reply: We agree with the reviewer that we cannot confirm that the differences observed between the two studies are due to the 'natural aging process'. As discussed in our reply to the second point of reviewer 2, we are measuring 'changes in time' and not necessarily aging effects. It is therefore expected that there will be more changes in the proteome of an individual over 5 years than over only 3 months. However, we cannot be sure that these changes are due only to the natural aging process. We rewrote this sentence in the manuscript to make it more accurate (Page 8 Para 3 Line 5) and mentioned it again in the Discussion (Page 17 Para 1 Line 6).
- Figure 5A: The y-axis of the Manhattan plot should indicate that it is a P value. An FDR of 0.1 was used as cutoff, the corresponding p-value should be annotated on the Manhattan plot. How many SNPs were considered?
Author reply: We changed the y-axis in the plot and added the cutoff p-value (6.166E-3) to the plot caption. We also added to the manuscript that the final number of tested SNPs was 758.
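How an FDR threshold maps to a p-value cutoff can be sketched with the Benjamini-Hochberg step-up procedure. The reply does not state which FDR method was used, so Benjamini-Hochberg is an assumption here, and the function name and example p-values are hypothetical.

```python
def bh_pvalue_cutoff(pvalues, fdr=0.1):
    """Benjamini-Hochberg step-up: largest p-value declared significant.

    Returns None when no test passes at the requested FDR level.
    """
    m = len(pvalues)
    cutoff = None
    for rank, p in enumerate(sorted(pvalues), start=1):
        # Step-up rule: keep the largest p with p <= fdr * rank / m.
        if p <= fdr * rank / m:
            cutoff = p
    return cutoff


if __name__ == "__main__":
    # Hypothetical association p-values; a real scan would supply one per SNP.
    example = [0.001, 0.004, 0.02, 0.5, 0.9]
    print(bh_pvalue_cutoff(example, fdr=0.1))
```

The returned value is the kind of threshold that can be drawn as a horizontal line on a Manhattan plot, since every p-value at or below it is declared significant at the chosen FDR.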
Results (pg 11): A more thorough description of the '800 female twins' data is warranted.
Author reply: Our paper describing these data and the eQTL analysis has recently been accepted and is now in press (Buil et al, in press). We hope the citation will be available by the end of the editorial process for this paper. We added a brief statement about this study in the Results (Page 14, Para. 1) and in the Methods section (Page 27 Para. 2).
-Discussion (pg 14): The paragraph beginning 'Understanding the underlying genetic determinants of biomarkers...' would benefit from less certain language. In particular, the simple logic that a genetic variant associated with a causal biomarker of a disease is itself necessarily a biomarker of the disease is false for the simple reason that there exist components of variance (e.g. the mixed model). Conversely, the discussion about how analysis of a non-causal biomarker like CRP would benefit from identification of pQTLs is unclear; it seems that the authors are trying to convey the concept of Mendelian randomisation. In terms of examples, it would be informative to also present cases where the authors' logic statements have not held.

Author reply: We appreciate the comment and changed them to the ranges of p-values. We made it clear in the manuscript.
Please provide a better description of what it means to be an "FDA cleared protein".
Author reply: We appreciate the comment. In fact, reviewer #3 made a similar suggestion to change this term. According to the original, highly cited paper (Anderson, 2010), which we cited upon usage, these proteins are the biomarker analytes whose measurement has been approved by the US Food and Drug Administration (FDA) for clinical purposes. We now use the new short name "clinically assayed proteins" instead of "FDA cleared proteins" to avoid confusion. We also refined the description at the first usage in Page 5 Line 5 from bottom of Para. 2.

Reviewer #3:
This is an elegant study concerning a highly multiplexed proteomic analysis of a unique cohort of plasma samples obtained from a monozygotic and dizygotic twin population taken at different times (~5 years apart). Such a cohort provides a powerful opportunity to begin to elucidate the significance of genetic vs aging vs diet/environment, etc. The analysis is the largest, most comprehensive proteomic analysis to date for such a study. The investigators are thought leaders in the field and pioneered many of the methods. While the manuscript is very well written and will be of high interest to the readership, there are several missed opportunities that could have increased the impact of the paper.

Major Issues
1. I'm not sure the CAP oriented analysis is useful or if any of the data is meaningful, and certainly the conclusions would need to be so circumspect that I don't think this adds to the paper: basically, sampling 99 proteins out of thousands is of such low coverage (< 1%) that it's impossible to conclude anything. Compounding this is that the proteins chosen are of fairly high abundance compared to many clinically used cancer markers/markers important in cancer biology today.
Author Reply: We believe there is a typo here and that "<10%" was intended (we detected 99 CAPs among ~1000 proteins). Nevertheless, we think the reviewer's comment is very fair, given the low coverage of CAP proteins and the fact that cancer biology is perhaps also difficult to reveal through the "blood window" with limited analytical depth. We therefore chose to delete the variance analysis of CAP proteins and only mention the CAPs as a list illustrating the analytical depth of SWATH-MS.
2. Following, the investigators really need to include a list of the concentrations of the proteins studied. An examination of the supplemental data makes me concerned that most of the content is high abundance proteins, with few in the microgram/ml range and very few in the ng/ml range. It is estimated that 90% of the plasma proteome exists in the sub fg/ml range. Thus there appears to be a huge percentage of the plasma proteome (especially in the low abundance range, which presumably contains the most interesting component) that was really unstudied here. It would be very helpful for the reader, in judging the data, to have the average estimated concentration of the proteins enumerated in the supplemental table.
Author Reply: We agree with this comment.
Firstly, although our method promisingly detects on average 425 plasma proteins per sample there are many other proteins in plasma of which most lie in the lower abundance range. Today, there is Mass Spectrometry (MS) evidence of ~2000 human plasma proteins according to PeptideAtlas where the results of many and very large mass spectrometry studies are combined. Therefore, in the revised manuscript, we changed the title from "Quantitative variability of the plasma proteome in a human twin population" to "Quantitative variability of 342 plasma proteins in a human twin population".
Secondly, following the reviewer's advice, we now provide the estimated concentrations of human plasma proteins reported in PeptideAtlas (Farrah et al, 2011) (Huttenhain et al, 2012) in the final column of Supplementary Table S3. The concentration distribution of proteins is plotted in Supplementary Figure 2A.
3. Following still: it would help if the investigators could provide data that address whether there is a concentration range bias in the study that could influence the final conclusions. For example, are the proteins with the lowest concentration the most variable? The most associated with environmental changes? The same would go for the other categories. Some understanding of the percentage of the proteome in each category that was in the mg/ml range, the microgram/ml range, the nanogram/ml range, the femtogram/ml range, etc. would be helpful.
Author Reply: This is a very good suggestion. In the previous version of the paper we only looked at the possible concentration bias on heritability. Following the advice here, we have now added Supplementary Figure S5 and revised Figure 4B. We find these analyses very informative, and they even provide further biological insights. Specifically, in Supplementary Figure S5, we plot the protein variability and the estimated contributions of heritability, shared environment, individual environment and longitudinal factors to the total variance and to the total biological variance. Figure S5A indicates that most low-abundance proteins are indeed more variable, as the reviewer and we expected; however, a few high-abundance proteins are also highly variable, which renders the correlation non-significant. Even more interestingly, as illustrated in Figure S5B-E, heritability, shared environment and individual environment show no significant correlation with plasma concentration, whereas the longitudinal factors clearly show a difference, suggesting that the concentration variability of more abundant proteins is generally less affected by longitudinal factors such as aging. This is quite an interesting and novel finding. Please note that the relative contributions of the four biological components to the total biological variance are inferred after removing the "unexplained fraction" (that is, the technical and experimental biases associated with specific proteins, e.g., concentration ranges, are expected to be removed). Additionally, Figure 4B simply extracts the heritability and longitudinal contributions to the total biological variance.
As we detected few proteins below the nanogram/mL range, we used a dashed red line to separate the two concentration ranges of microgram/mL and nanogram/mL, according to the reviewer's suggestion (Figure 4B & Figure S5). From this we can also clearly see the trend of a lower contribution of longitudinal effects to plasma proteome variability. We believe the P value of the Pearson correlation is more informative than the boxplot in this case, because we have concentration information for each individual protein.
The corresponding result description can be found in a new paragraph (Page 10 Para. 2).
A minor point: there are no FDA cleared proteins. There are FDA cleared biomarkers for diagnostic/prognostic purposes. As the reviewers of course know, the proteins exist in nature, not because the FDA approved them. Please consider renaming this cohort. Something like "Protein biomarkers whose measurement has been approved by the FDA for clinical purposes".
Author reply: We appreciate the comment. We have now changed this term, following the reviewer's advice, where we first mention it (Page 5 Line 5 from bottom of Para. 2). For the short name used in the figures, we have changed "FDA cleared proteins" to "clinically assayed proteins".
Mitchell BL, Yasui Y, Li CI, Fitzpatrick AL, Lampe PD (2005) Impact of freeze-thaw cycles and storage time on plasma samples used in mass spectrometry based biomarker discovery projects.
Thank you again for submitting your work to Molecular Systems Biology. We have now heard back from the three referees who agreed to evaluate the revised study. As you will see, the referees are now supportive and we are pleased to inform you that your manuscript will be accepted for publication pending the following final amendments:
-The reviewers raise a few minor issues and make suggestions that we would kindly ask you to address with suitable edits.
-Please remove the password protection on the data and update the text of the Data availability section accordingly.
-We would suggest moving/copying the short statement on the accessibility of the genotypic data through TREC to the "Data availability" section.
The authors have largely addressed my concerns; however, I have a few additional comments:
-While the authors have generally toned down conclusions around aging, there are places in the text which are not consistent. In particular: (1) pg9's 2nd paragraph, where there are still multiple references to 'aging' that should be modified, as the authors advise, to longitudinal changes/effects; (2) Figure 4E & 4G.
-The 95% CIs for the heritability estimates need to be included in Table 1 as well as Table S3.
-On pg8, the authors should provide a more informative comparison by making clear to the reader that Kato et al, the only previous similar twin study, also utilized TwinsUK and also only female twins.
Signed: Michael Inouye Gad Abraham