Biomarker development for translational geroscience: Considerations for a strategic framework focusing on early clinical development

The expectation that geroscience will allow humans to age optimally continues to generate excitement among a growing community of professionals and nonprofessionals, including high-profile personalities in business. But the stakes are high: if expectations grow from fanciful ambition rather than data, failure will lead to bewilderment, disillusionment, and dampened public support. The core mission of geroscience is to improve healthspan by targeting mechanisms associated with biological aging; whether this is possible in humans remains, at present, a hypothesis.

Geroscience clinical trials will be the experiments that provide the strongest data to pressure-test the geroscience hypothesis. While these trials can be described simply, adding to their public appeal, those of us involved in conceiving trial designs increasingly recognize that unless geroscience clinical trials are conducted with the same rigor as traditional disease-based clinical trials, the scientific community and, perhaps more importantly, regulatory agencies will lose their enthusiasm and support for geroscience-based interventions that might have merited development. In this editorial, I briefly describe some methodological complexities in geroscience clinical trial design and argue for a strategic approach to de-risk the decisions that must be made during early clinical development.

Among individuals who are, say, 55 to 80 years old, who may be the most likely to enroll in a geroscience clinical trial, hard clinical outcomes such as death and incident cardiovascular disease, cancer, and dementia tend to occur at low rates. Consequently, geroscience clinical trials will, on average, require longer treatment periods than typical disease-based clinical trials to accumulate enough hard events to achieve adequate statistical power.
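To make the event-rate argument concrete, the sketch below uses Schoenfeld's approximation for the number of events needed by a 1:1 randomized trial analyzed with a two-sided log-rank test, then divides by an assumed cumulative event probability to estimate enrollment. The hazard ratio and event probabilities are hypothetical illustrations, not estimates from any geroscience trial.

```python
from math import ceil, log
from statistics import NormalDist


def required_events(hazard_ratio: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Schoenfeld approximation: events needed for a 1:1 two-sided log-rank test."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)  # critical value for two-sided alpha
    z_beta = z(power)           # quantile corresponding to desired power
    return ceil(4 * (z_alpha + z_beta) ** 2 / log(hazard_ratio) ** 2)


def required_participants(
    hazard_ratio: float, event_probability: float, alpha: float = 0.05, power: float = 0.80
) -> int:
    """Participants needed so that the expected number of events reaches the target.

    event_probability is an assumed probability that a participant has an event
    during follow-up; lower rates require more people, longer follow-up, or both.
    """
    return ceil(required_events(hazard_ratio, alpha, power) / event_probability)


if __name__ == "__main__":
    hr = 0.85  # hypothetical, modest effect of a pleiotropic intervention
    for p_event in (0.05, 0.10, 0.20):  # hypothetical cumulative event probabilities
        print(p_event, required_events(hr), required_participants(hr, p_event))
```

Because the required number of events depends only on the effect size, α, and power, halving the event rate roughly doubles the number of participants (or lengthens follow-up) needed to observe those events.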
The theorized pleiotropic effect of geroscience interventions (i.e., common mechanisms driving multiple morbid outcomes) could mean that effect sizes on any specific outcome are modest, making it difficult to predict which age-associated outcomes will benefit from treatment. This contrasts with a traditional disease-based trial, where outcomes typically occur at higher rates and interventions often hinge on narrowly defined targets, making trials more efficient.

Trialists may find it appealing to develop composite endpoints, which regulatory agencies define as the occurrence or realization in a patient of any one of a set of specified components, measured as the time to the first occurrence of any component (FDA, 2022). Composite endpoints lend themselves well to interventions aimed at preventing or delaying morbidity, a key goal of geroscience, and often have face validity when the components are hard clinical outcomes such as incident cardiovascular disease, cancer, dementia, or death. Composite endpoints are useful when each component occurs at a low rate: by increasing the overall event rate, a composite endpoint can reduce trial size and duration. Because the composite is a single endpoint, statistical adjustment for multiplicity (as is necessary with multiple primary endpoints) is unnecessary, and a single statistical test can be performed on the composite. Seemingly, composite endpoints provide the rigor that geroscience trials need while preserving efficiency.

However, composite endpoints can be tricky. We must decide which components to include, a particular problem for geroscience, where there is little agreement on the most significant endpoints (e.g., molecular biomarkers of accelerated aging, worsening tissue pathology, incident diagnosed disease, functional decline, and death). By default, each component is given equal statistical weight, even though components may differ in clinical impact.
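Both the efficiency argument and the equal-weighting caveat can be seen in a minimal sketch, assuming hypothetical per-participant event times for each component. The composite records only the earliest component event, and every component counts the same regardless of severity. The second function shows why composites raise the event rate, under the simplifying (and usually unrealistic) assumption that components occur independently; the class and function names are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ComponentEvents:
    """Hypothetical times (years from randomization) to each component; None = no event."""
    cardiovascular: Optional[float] = None
    cancer: Optional[float] = None
    dementia: Optional[float] = None
    death: Optional[float] = None


def first_event(events: ComponentEvents, follow_up: float) -> Tuple[float, Optional[str]]:
    """Composite endpoint: time to the first component event, or censoring at follow-up.

    Later events are ignored, and every component counts identically.
    """
    observed = {
        name: t for name, t in vars(events).items() if t is not None and t <= follow_up
    }
    if not observed:
        return follow_up, None  # censored: no component event during follow-up
    name = min(observed, key=observed.get)
    return observed[name], name


def composite_probability(component_probs: List[float]) -> float:
    """P(at least one component event), assuming components occur independently."""
    p_none = 1.0
    for p in component_probs:
        p_none *= 1.0 - p
    return 1.0 - p_none


if __name__ == "__main__":
    patient = ComponentEvents(cancer=3.2, death=4.1)
    print(first_event(patient, follow_up=5.0))
    print(composite_probability([0.03, 0.02, 0.02, 0.03]))
```

For the example participant, the composite records cancer at 3.2 years and the later death does not contribute. With four hypothetical components each occurring in 2% to 3% of participants, the composite event probability is about 0.096, roughly three times that of any single component.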
A further concern is that a significant therapeutic effect on the overall composite may mask the effect on each component; if component-level effects must be determined, those analyses should be pre-specified with adjustment for multiplicity, which can reduce trial efficiency. More troubling, a positive treatment effect could be found for the overall composite endpoint even though one or more components are adversely affected, obscuring a potential harm signal. Because geroscience-based interventions are intended to have pleiotropic effects, there may be multiple adverse events of special interest that require detection across organ systems.

An alternative to a composite endpoint is a multicomponent endpoint: a within-subject combination of two or more components, tabulated according to pre-specified rules, such as a summary scale (e.g., frailty index, fatigability, and physiologic index of comorbidity) (FDA, 2022; Glynn et al., 2015; Newman et al., 2008). A treatment effect may reflect changes in anywhere from one to all of the components. If all components trend in the same favorable direction within a patient in response to treatment, this can suggest a positive effect.
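A multicomponent endpoint, by contrast, is summarized within each participant. The sketch below assumes a frailty-index-style rule (the mean of pre-specified deficit scores, each graded from 0 for absent to 1 for fully present) and a hypothetical check of whether all components moved in the same direction after treatment. The component names and values are illustrative, not drawn from any cited scale.

```python
from typing import Mapping


def deficit_index(deficits: Mapping[str, float]) -> float:
    """Frailty-index-style summary: mean deficit score across pre-specified components."""
    if not deficits:
        raise ValueError("at least one pre-specified component is required")
    if any(not 0.0 <= v <= 1.0 for v in deficits.values()):
        raise ValueError("deficit scores must lie between 0 and 1")
    return sum(deficits.values()) / len(deficits)


def change_direction(before: Mapping[str, float], after: Mapping[str, float]) -> str:
    """Classify within-participant change across all components.

    Higher scores mean more deficit, so a decrease is an improvement. "improved"
    means no component worsened and at least one improved ("worsened" is the mirror).
    """
    if before.keys() != after.keys():
        raise ValueError("before and after must use the same pre-specified components")
    deltas = [after[name] - before[name] for name in before]
    if all(d == 0 for d in deltas):
        return "unchanged"
    if all(d <= 0 for d in deltas):
        return "improved"
    if all(d >= 0 for d in deltas):
        return "worsened"
    return "mixed"


if __name__ == "__main__":
    # Hypothetical graded deficits for one participant, before and after treatment.
    before = {"slow_gait": 1.0, "weak_grip": 1.0, "exhaustion": 0.5, "weight_loss": 0.0}
    after = {"slow_gait": 0.5, "weak_grip": 1.0, "exhaustion": 0.0, "weight_loss": 0.0}
    print(deficit_index(before), deficit_index(after), change_direction(before, after))
```

Unlike a time-to-first-event composite, every component contributes to the summary score, and a "mixed" result makes an adverse change in one component visible rather than hidden.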

With these pitfalls, how should we proceed with geroscience trial design? Although no trial design is without risk, we have a say in how much risk we can tolerate before beginning a trial. Pharmaceutical industry approaches may provide guidance here through strategic biomarker development, in which each proposed biomarker is explicitly evaluated with respect to its intended use in early clinical development. This differs from the common practice of, for example, conducting in vitro and epidemiological analyses to observe biomarker patterns and then using those patterns as the rationale that a biomarker has value in a clinical trial. Adhering to a strategic framework forces us to decide a priori whether a biomarker is being studied for patient selection, pharmacodynamic response, clinical efficacy, or safety and tolerability. As I will explain shortly, if we intend to rely on biomarker readouts to de-risk go/no-go decisions about clinical trial design and the interpretation of trial results, the most actionable information will be generated when we plan experiments around each biomarker's intended use (detailed in Table 1).
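As one hypothetical illustration of planning a readout around a biomarker's intended use, the sketch below pre-specifies a go/no-go rule for a pharmacodynamic biomarker: proceed only if the confidence interval for the mean change lies entirely above a minimal effect chosen before the data are seen. The threshold, the normal-approximation interval, and the three-way classification are assumptions for illustration, not a method described in this editorial; a t-based interval would be more appropriate for small samples.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev
from typing import Sequence


def go_no_go(changes: Sequence[float], minimal_effect: float, confidence: float = 0.95) -> str:
    """Classify a pre-specified pharmacodynamic readout as "go", "no-go", or "inconclusive".

    changes: per-participant change in the biomarker (positive = desired direction).
    minimal_effect: hypothetical smallest mean change worth pursuing, fixed in advance.
    Uses a normal-approximation confidence interval for the mean change.
    """
    if len(changes) < 2:
        raise ValueError("need at least two observations to estimate variability")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # e.g. about 1.96 for 95%
    centre = mean(changes)
    half_width = z * stdev(changes) / sqrt(len(changes))
    lower, upper = centre - half_width, centre + half_width
    if lower >= minimal_effect:
        return "go"          # whole interval clears the pre-specified threshold
    if upper < minimal_effect:
        return "no-go"       # whole interval falls short of the threshold
    return "inconclusive"    # interval straddles the threshold


if __name__ == "__main__":
    observed = [2.0, 2.2, 1.8, 2.1, 1.9]  # hypothetical biomarker changes
    for threshold in (1.0, 2.0, 3.0):
        print(threshold, go_no_go(observed, minimal_effect=threshold))
```

Fixing the threshold and the decision rule before the readout is what makes the result actionable: the same data can support "go" for one pre-specified threshold and remain inconclusive for another.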
Evaluating biomarkers within a strategic framework leads us to ask different questions that guide experimental design.

TABLE 1 Considerations for each strategic step of biomarker development.

Biomarker use: considerations for use

Patient selection:
• Does the therapeutic effect depend on patient characteristics?
• Is genetic, molecular, or other stratification part of the standard of care?
• Is a companion diagnostic needed?
• What is the minimum scientific, analytical, and clinical data package required for a companion diagnostic?
• What is the development strategy to deliver this data package for the companion diagnostic?

Pharmacodynamic response:
• What is the drug's mechanism of action?
• How does the mechanism of action affect measurement of the association between therapeutic target and phenotypic effect?
• In what matrix and with what methods should the target and phenotype be measured?
• Are measurement methods commercially available, or do they require development?

Clinical efficacy:
• What are the most important endpoints for mid/late-stage trials?
• How can preclinical data and first-in-human results be used to predict effects in mid/late-stage trials?
• How accurate and precise are predictions of effects in mid/late-stage trials using preclinical and first-in-human data?
• What extent of validation is required for endpoints used in internal vs. external decision-making?

Safety & tolerability:
• Based on mechanism of action, patient population for intended use, and disease characteristics, what are the most likely and concerning safety signals that must be measured?

Currently, there are no widely accepted, specific, sensitive, accurate, reproducible, and clinically validated biomarkers of aging in humans that meet regulatory standards; they are all exploratory (Justice et al., 2018). We are also all aware that some within our field, including notable academics, have lent their credentials to support a new generation of unregulated products and services sold directly to consumers that purport to offer clear, actionable information about one's health and behaviors, in many ways forgetting the disillusionment that communities experienced with similar approaches in nutraceuticals and the nutrition industry. These products are frequently costly biomarker tests backed by a thin veneer of scientific certainty. Furthermore, these measurements are often coupled to "recommended" (and sometimes costly) interventions, such as supplements, off-label use of existing medications, and trials of unproven technologies, meant to allay the fears produced by a "conclusive" biomarker test. Surely, as a scientific community, we can agree that endorsing these products and services does a disservice to the geroscience mission of advancing healthspan. If we do not hold ourselves and each other to the highest standards, we (geroscientists) run the risk of being viewed as peddlers of snake oil, which can erode public confidence in geroscience, and of potentially causing medical harm, either directly to patients or indirectly through opportunities lost to evidence-based trials.
In conclusion, strategically developed and validated biomarkers may help us overcome some of the biggest roadblocks in geroscience clinical development. I urge our community of geroscientists to optimize the probability of successfully validating and applying translational biomarkers by using a strategic biomarker development framework. Let us collectively develop best practices and guidelines to give the world its best shot at realizing the promise of geroscience-based therapies: increased healthspan.

CONFLICT OF INTEREST STATEMENT
JLS is an employee of and holds stock in Vertex Pharmaceuticals Inc.
The views expressed within are solely those of the author and not those of Vertex or Aging Cell.

Jason L. Sanders
Vertex Pharmaceuticals Inc., Boston, Massachusetts, USA