Data sharing to advance gene‐targeted therapies in rare diseases

Recent advancements in gene‐targeted therapies have highlighted the critical role data sharing plays in successful translational drug development for people with rare diseases. To scale these efforts, we need to systematize these sharing principles, creating opportunities for more rapid, efficient, and scalable drug discovery/testing including long‐term and transparent assessment of clinical safety and efficacy. A number of challenges will need to be addressed, including the logistical difficulties of studying rare diseases affecting individuals who may be scattered across the globe, scientific, technical, regulatory, and ethical complexities of data collection, and harmonization and integration across multiple platforms and contexts. The NCATS/NIH Gene‐Targeted Therapies: Early Diagnosis and Equitable Delivery meeting series held during June 2021 included data sharing models that address these issues and framed discussions of areas that require improvement. This article describes these discussions and provides a series of considerations for future data sharing.

available to address the 7,000+ recognized rare diseases, 80% of which are genetic in nature (Kaufmann et al., 2018).
The advancement and expansion of the application of GTTs and their translation into practice have been hampered by siloed approaches to development: individual research programs separately pursuing overlapping (or the same) therapeutic approaches, with limited collaboration on preclinical development, manufacturing, trial design and execution. While this might be traditional, the scarcity of individuals with any single rare disease impels a more collaborative approach. Furthermore, many GTT rare disease programs utilize common platform strategies (AAV9-mediated gene transfer, antisense oligonucleotides, etc.), creating potential for further scientific synergies.
The promotion of pre-competitive collaboration and international standards for data sharing could advance GTTs in several ways: (1) increasing the identification of patients who may have a rare disease, (2) identifying patients and providing them access to ongoing research efforts, (3) improving our understanding of the genotype-phenotype relationships in rare diseases, (4) stimulating the efficient development of new therapies, (5) improving the detection of safety signals, and (6) [...] Journal Editors (ICMJE) implemented a data sharing statement policy for clinical trial data (Taichman et al., 2017a, 2017b, 2017c). Massive investment has been made in both public and private sectors in large-scale data sharing to create diverse datasets (Table 1). Digital, molecular, and imaging technologies used in patient care, academic and pharmaceutical-sponsored research, newborn screening, and direct-to-consumer genetic testing (e.g., 23andMe, Ancestry) all provide data pertaining to human health that can potentially predict disease risk, provide disease-related insights, and inform the personalization of therapies. Individuals themselves increasingly desire (and demand) access to their own data, as well as involvement in personal health care decisions. These kinds of data, coupled with Artificial Intelligence/Machine Learning (AI/ML) tools, have already demonstrated the power to advance our understanding of human health, both for the good of the individual and for society (Baxter et al., 2021; Dabbah et al., 2021; Maglanoc et al., 2020). There will be much to learn from the challenges and successes of these large data-sharing efforts, many of which involve large population-based approaches.
Meeting the needs of rare disease therapeutic development, though, may require even more ambitious sharing. Rare disease GTTs require more targeted approaches to safely and expeditiously advance the development of novel therapies, such as small, focused, yet global patient registries. However, significant challenges in their construction and implementation can limit their broad use, and for many diseases there are increasingly redundant, parallel registry efforts that lack integration. Further, the traditional data privacy practices of pharmaceutical sponsors are restrictive and have hindered the ability to aggregate clinical trial data on patients with rare genetic diseases. For the rare disease community, there is an obligation and an urgent need to collect, aggregate, and share these disparate data. Doing so will optimize the opportunity to learn from a collective experience, develop more effective therapies, and improve patient outcomes. This article describes insights from an NIH-hosted 3-day virtual roundtable held during June 2021 on the topic of "Gene-Targeted Therapies: Early Diagnosis and Equitable Delivery" (https://eventssupport.com/events/Gene-Targeted_Therapies_June_2021/page/2357).
As with the virtual roundtable, the target audience for this article includes patients and patient organizations, clinicians, researchers, industry and government agencies. In addition, the article includes three case studies that highlight successes and challenges of developing focused global patient registries for rare disease, and discusses how patients, health care professionals, academic researchers, governments, and industry can leverage these experiences, with the goal of expanding our understanding of rare diseases and the arsenal of available treatments.

| METHODS
The roundtable convened an expert multi-stakeholder panel of patients and patient organizations, clinicians and professional societies, researchers, industry, and government agencies. The roundtable explored broad landscape questions, including "What novel approaches are needed to enable development of gene-targeted therapies for all genetic rare diseases, now and in the future?" After the roundtable, working group participants convened to identify the key insights and learnings and to share considerations on a framework for addressing the identified challenges, to ensure equitable access to treatments for those born or living with a rare disease.

| CASE STUDIES
Three case studies are presented below. Two describe the successes and challenges in the development and execution of post-therapeutic disease registries for hemophilia and spinal muscular atrophy (SMA). Their challenges are no different from those faced by large registry efforts, including governance, equity, consistency and harmonization of data collection, data housing, data ownership, and security and access. Case 3 describes recent efforts to advance GTT for patients with ultra-rare diseases, with reflection upon the importance of data sharing.
[...] (Miesbach et al., 2022; Ozelo et al., 2022). However, many questions will remain unresolved about the long-term safety, variability and durability of efficacy at the completion of current ongoing clinical trial programs. Lifelong follow-up of patients is therefore crucial to monitor long-term safety and efficacy of gene therapy, and unified [...]
The primary objective of the World Federation of Hemophilia (WFH) Gene Therapy Registry (GTR) is to determine the long-term safety of factor VIII and factor IX gene therapies in people with hemophilia (PWH). The secondary objectives of the WFH GTR are to determine the long-term efficacy and the durability of factor VIII and factor IX gene therapies in PWH, assessed as bleeding rate and plasma factor activity level; and to assess the long-term quality of life after gene therapy infusion, assessed by the EQ-5D-5L and the Patient Reported Outcomes, Burdens, and Experiences (PROBE) instrument. The PROBE is a hemophilia-specific patient-reported outcome measure to assess burden of disease in PWH.

| Protocol and development of the core set of data to be collected
The protocol and the GTR core data set, reflecting the previously agreed core outcomes, were developed by the GTR Steering Committee and informed by both the European Medicines Agency (EMA) (through formal submission of the protocol for Scientific Review by the EMA followed by an in-person meeting with the WFH) and the Food and Drug Administration (FDA) (comments via email). Both agencies have expressed support for the establishment of one global registry to collect long-term data on PWH who receive gene therapy (Konkle et al., 2021).
Data are to be collected at 3, 6, 9, 12, 18, and 24 months post-gene therapy infusion and annually thereafter. Data will be captured in the registry in one of two ways:
1. Directly via participating hemophilia treatment centers (HTC)
2. Through data transfer/linking from existing national hemophilia registries
Linking will be reserved for registries that meet specific criteria and can ensure the following:
• Collection of the same/similar data fields as the GTR core data set
• Harmonization and standardization of terminology, particularly for adverse events
• Patient consent for secondary uses of their data outside of the existing registry
• Strict data quality management practices and high-quality data
• Avoidance of duplication of data represented from a single individual
The American Thrombosis and Hemostasis Network (ATHN) in the United States is a non-profit national coordinating center. It partners with 146 hemophilia treatment centers in the U.S., and leverages a standardized, integrated process to build and maintain a secure national database. ATHN has committed to collaborate with GTR, collecting data on all U.S. PWH who receive gene therapy, using harmonious data fields and robust practices consistent with GTR, and will transfer its data into the registry on a regular basis.
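The collection schedule stated above (3, 6, 9, 12, 18, and 24 months after infusion, then annually) can be written out as a short Python sketch. This is an illustration only, not part of the GTR protocol: the function name `follow_up_months` and the `horizon_months` parameter are hypothetical, and "annually thereafter" is assumed to mean every 12 months counted from the 24-month time point.

```python
def follow_up_months(horizon_months: int) -> list[int]:
    """List the months after infusion at which data are due, up to a horizon.

    Assumption: "annually thereafter" means months 36, 48, 60, ...
    """
    early_visits = [3, 6, 9, 12, 18, 24]  # time points named in the text
    schedule = [m for m in early_visits if m <= horizon_months]
    # Annual visits after month 24, for as long as the horizon allows.
    schedule.extend(range(36, horizon_months + 1, 12))
    return schedule


# Time points expected during the first five years after infusion.
five_year_schedule = follow_up_months(60)
print(five_year_schedule)
```

A linked national registry could use a list like this to check that its own visit windows cover the same time points before transferring data.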
In Europe, a "Hub and Spoke" model for the administration of gene therapy has been proposed. The Hub centers will be invited to participate in the GTR, with follow-up data being sent to the Hub center from the Spoke centers and entered in the GTR (Miesbach, Chowdary, et al., 2021). A similar integrated care infrastructure, built around the integrated comprehensive care model for hemophilia, has been proposed for the United States (Miesbach, Pasi, et al., 2021).
Sponsor manufacturers will have access, for regulatory submissions, to data on their own product only, via a product-specific data dashboard. The sponsor data dashboards will be developed to track, analyze, and visualize product-specific data. A general dashboard presenting aggregate-level data will also be created for the public.
Patient reported outcomes will be collected via a web-based mobile application (myGTR).
The WFH GTR aims to capture historical data from prior clinical trials, as well as pre-licensure data from ongoing studies, to provide the richest possible data set for research and analysis.

| WFH GTR launch
At this writing, the WFH GTR is ready to accept data. Data collection will commence upon licensure of the first gene therapy for hemophilia. The WFH GTR is funded through multi-year commitments from gene therapy sponsoring manufacturers.
Data from the WFH GTR will be critical to answer questions regarding safety and efficacy of gene therapy in hemophilia. Given that hemophilia is a rare disease, it is imperative that a global approach to data collection is adopted to detect low incidence events that may affect the lives of PWH who choose to undergo gene therapy. The registry will also be able to measure and compare the impact of gene therapy on the lives of PWH around the world. Success of this ambitious initiative needs the support of all stakeholders. Only through cohesive efforts by all treating physicians, PWH, regulatory agencies and manufacturers worldwide, will it be possible to ensure that gene therapy is safe and efficacious for PWH now, and in the future.

| Key take-aways
• Given the many knowns and unknowns, there is a critical need for life-long data collection and post-marketing surveillance of gene-targeted therapies.
• To adequately assess the safety and efficacy of gene therapy, harmonized data collected longitudinally across products and countries are necessary. Given that hemophilia is a rare disease, it is imperative that there is a global approach to data collection to detect low incidence safety events.
• The core data set should capture demographic information, vector infusion details, safety, efficacy, quality of life, and burden of disease.
• Support and early engagement of all stakeholders in gene therapy (health care providers, patients, industry, and regulators) augurs successful capture of uniform long-term safety and efficacy data to ensure optimal treatment.
• Integrating patient-reported outcomes directly into the registry will provide support for payers regarding the efficacy and potential safety milestones needed to inform reimbursement strategies.
[...] and drug development efforts, and sponsors the largest annual meeting on SMA for both parents and scientists/clinicians.
• The Muscular Dystrophy Association (MDA, USA) has taken a slightly different approach to collecting data on SMA patients seen in MDA-sponsored clinics. Data are collected on case report forms and shared with the MDA, which then aggregates them in a registry designed to serve multiple neuromuscular diseases.
• TREAT-NMD in Europe is revising its SMA registry to make it more responsive to current needs. A variety of SMA clinics within Europe contribute data, but as yet there is no uniform registry for each clinic to use.

| Pharmaceutical sponsor registries
Pharmaceutical sponsors need to understand the safety and efficacy of their drug when used in a larger and broader population of patients, often with less clear standard-of-care support, and to meet the reporting requirements of regulatory authorities. The company that developed and markets nusinersen has taken the approach of developing data sharing agreements with numerous registries, including the iSMAC as described above. This approach draws on data from a variety of more and less highly curated registries, health utilization records, and gleanings from social media to construct a multidimensional picture of the broader patient experience. The company that develops onasemnogene, a GTT, has taken a different approach: developing its own internal registry ("RESTORE").

| Key take-aways
• Patients can be enrolled in more than one registry. Use of a global unique identifier (GUID) would identify such patients; however, to date, no consistent mechanism has been put into place. Without it, there is limited ability to combine data across registries without inadvertent duplication of data.
• Alignment of data elements across registries has not been adopted as fully as was hoped initially. While all data on patients are of value, curated data can be discordant with non-curated data that are extracted from an electronic medical record or are self-reported. To this end, the process by which data integration or federation can be accomplished across registries must be considered.
• How these registries will be sustained in the long run is unclear. Once the sponsor reporting requirements have been completed, will pharmaceutical companies continue to fund external efforts or maintain their own internal registries?
• As some patients are now switching from one drug to another or using one drug pre- or post-administration of a GTT drug (OA), registry data will need to become more sophisticated to identify safety and efficacy changes in these patients and sort out which safety findings are attributable to one or both drugs.
• With the rapid rollout of newborn screening for SMA in the United States, and evolving quickly in several European countries, existing registries will need to build in sufficient flexibility to capture additional, newly relevant parameters, including genetic data.
• Data sharing in both the pre- and post-competitive space, with pharmaceutical sponsors and other non-pharma researchers (academics, payers), must be conducted in an equitable and pre-agreed way, particularly with the full consent of enrolled patients. Similarly, data security and privacy must be maintained.
[...] Novel data sharing is integral to these efforts, including making public notes from meetings with the FDA, which are typically kept confidential.
[...] health-related data to the individual themselves, giving them control of their data, and the determination to share said data, in order to enhance their personal and family diagnostic and therapeutic journeys within the healthcare system, to gain access to clinical trial opportunities, and to contribute in a safe, equitable, and protected way to our deeper understanding of both rare and common diseases, necessary for the harnessing of gene-targeted therapies. These key gaps highlight areas that may require a coordinated and collaborative action (implementation of a national GUID) and/or industry-specific protection for innovation or incentivization. Indeed, the individual and societal benefits of well-orchestrated data sharing cannot be overstated and present a number of emerging opportunities, summarized in Table 2.
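To make the role of a GUID in avoiding duplicate records concrete, the sketch below derives a pseudonymous identifier from normalized identifying fields and groups registry records that resolve to the same person. It is a minimal illustration under stated assumptions, not the method of any existing GUID system: the field names, the keyed-hash (HMAC-SHA-256) construction, the `make_guid` and `group_by_guid` helpers, and the example records are all hypothetical and invented for demonstration.

```python
import hashlib
import hmac
import unicodedata


def normalize(value: str) -> str:
    """Strip accents, case, and whitespace so trivial variants compare equal."""
    decomposed = unicodedata.normalize("NFKD", value)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return "".join(no_accents.casefold().split())


def make_guid(person: dict, secret_key: bytes) -> str:
    """Derive a pseudonymous identifier from hypothetical identifying fields."""
    fields = ("given_name", "family_name", "birth_date", "birth_city")
    message = "|".join(normalize(person[f]) for f in fields).encode("utf-8")
    # A keyed hash: without the key, the identifier cannot be recomputed.
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()


def group_by_guid(records: list[dict], secret_key: bytes) -> dict[str, list[str]]:
    """Map each GUID to the registries that hold a record for that person."""
    groups: dict[str, list[str]] = {}
    for record in records:
        groups.setdefault(make_guid(record, secret_key), []).append(record["registry"])
    return groups


key = b"demo-key-held-by-a-trusted-party"  # hypothetical shared secret
records = [  # invented example records, not real patients
    {"registry": "A", "given_name": "José", "family_name": "Silva",
     "birth_date": "2015-04-02", "birth_city": "Porto"},
    {"registry": "B", "given_name": "jose ", "family_name": "SILVA",
     "birth_date": "2015-04-02", "birth_city": "porto"},
    {"registry": "B", "given_name": "Ana", "family_name": "Costa",
     "birth_date": "2018-11-30", "birth_city": "Lisboa"},
]
groups = group_by_guid(records, key)
duplicates = {guid: regs for guid, regs in groups.items() if len(regs) > 1}
```

Here the first two records differ only in accents, case, and spacing, so they map to the same identifier and appear in `duplicates`. A production system would also have to handle misspellings, missing fields, key management, and consent, none of which this sketch addresses.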

| RESULTS AND DISCUSSION
A national or international algorithmically searchable database would enable AI/ML programs to identify patients at risk for rare diseases. A number of such algorithms already exist; however, their potential is limited by access to patient data, inconsistency of format or data quality, and by rights or limitations for contacting patients and families. With better curated data, even on disparate platforms, that is patient-controlled and accompanied by informed permission and privacy protections, federated learning approaches may make it possible to identify those at risk for a rare disease in the absence of, or as a complement to, newborn screening or whole-genome/exome sequencing.
Enhancing [...]
Identify and then diagnose hundreds of patients with treatable diseases. These patients would otherwise go untreated, often with what would be inexpensive and straightforward treatments. Additionally, opportunities for early GTT during the pre-symptomatic state may result in transformative "cures." Many thousands of patients could benefit.

Patient controlled data
Patients now have the right to access and control their own medical data; however, current systems make this challenging and minimize data utility. A number of companies have formed to help patients aggregate their own data and share them in the manner that they choose.
This approach may enable each of the other opportunities presented by patient sharing of data, including improvements in the patient's diagnostic and therapeutic journey, and proactive contact by sponsors and other researchers to participate in clinical trials or registries.
Deep characterization of rare disease
Broad assembly of the symptoms (phenotypes) of rare disease and comparison to matched controls. Even more common diseases lack deep understanding of common symptoms.
This approach may enable each of the other opportunities presented by patient sharing of data, including improvements in the patient's diagnostic and therapeutic journey, and proactive contact by sponsors and other researchers to participate in clinical trials or registries.
Measurements of the effects of treatment for rare disease
Following patient data over time generates a large amount of outcome information, enabling assessment of the effects of treatments on rare disorders. This includes efficacy, financial impact, and personal and societal impact.
Demonstrate the effectiveness of a variety of treatments and palliative care for rare disease.
Newborn screening by whole genome sequencing data
A number of efforts are underway to sequence a large percentage of newborns (while protecting patient privacy).
Gene-targeted therapies often treat very specific genetic conditions. Identifying at birth all patients who can benefit from the treatment enables more rapid clinical trials and potentially earlier, pre-symptomatic treatment, and makes many more therapies viable for development for hard-to-treat conditions.

More efficient clinical trials with use of virtual subjects
A virtual subject is a comprehensive, longitudinal clinical record, created using the baseline data collected from a patient before they receive their first treatment, that predicts how that patient would likely evolve over the course of the trial if they were to be given a placebo. That is, a virtual subject is like a simulated control group for a particular patient (https://www.statnews.com/sponsor/2021/09/22/can-digital-twins-make-clinical-trials-more-efficient-without-introducing-bias).
Comprehensive data from large cohorts will enable modeling of likely outcomes for all the participants in an RCT, leading to higher power and lower cohort size. This in turn allows more trials to be run, leading to more therapies for patients.

Deep learning generated artificial records
Train networks on large sets of patient records to generate artificial records that preserve patient confidentiality and anonymity.
More public sharing of valuable descriptions of disease without compromising patient anonymity.
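The statement in the virtual-subjects entry above, that modeled outcomes can lead to higher power and lower cohort size, can be illustrated with a standard sample-size approximation for comparing two group means. The sketch below is not taken from the source. It assumes that the virtual-subject prediction is used as a baseline covariate in the analysis and that adjusting for a covariate whose correlation with the observed outcome is `rho` shrinks the outcome variance by roughly a factor of 1 − rho². The function name and the parameter values are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(effect: float, sd: float, alpha: float = 0.05,
              power: float = 0.80, rho: float = 0.0) -> int:
    """Approximate participants per arm for a two-arm comparison of means.

    rho is the assumed correlation between the predicted (virtual-subject)
    outcome and the observed outcome; rho = 0 means no adjustment.
    """
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # two-sided test
    z_beta = standard_normal.inv_cdf(power)
    unadjusted = 2 * ((z_alpha + z_beta) * sd / effect) ** 2
    # Covariate adjustment reduces residual variance by about (1 - rho^2).
    return ceil(unadjusted * (1 - rho ** 2))


without_model = n_per_arm(effect=0.5, sd=1.0)
with_model = n_per_arm(effect=0.5, sd=1.0, rho=0.6)
print(without_model, with_model)
```

Under these assumptions, a more accurate prognostic model (larger `rho`) lowers the required cohort size for the same power, which matters most where eligible patients are scarce.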
[...] family interviews, ARSA gene sequencing and urinary sulfatide levels (Fumagalli et al., 2021) [...]
In balance to these emerging opportunities, there remain further challenges which must be addressed:
• Equitable access. We recognize that not all individuals can or desire to participate in the current healthcare system; hence the described approach would underrepresent and under-serve those who do not have healthcare data within current systems. For instance, as a technology, universal genome sequencing for newborns has the potential to reduce disparities affecting underserved groups, but as health policy, it will require additional consideration of ways to subsidize poorer countries in order to globalize the impact of these approaches.
• Participation. In trials of more prevalent diseases, it is not uncommon for fewer than 5% of eligible patients to agree to enroll. For successful assessment of treatment outcomes in rare disease communities, a large proportion of those impacted will need to agree to participate.
• Health-related data from outside the traditional system, such as wearables and trackers and direct-to-consumer genetic testing, are also anticipated to be included in an integrated data approach. Other avenues may be further enhanced in the future, again within a framework in which the healthcare system does not control an individual's data. The individual's data are controlled by the individual.
• Privacy and data protections must be robust, yet the risk of reidentification can never be zero, and informed consents should reflect these concerns. Close partnership with patient groups will help evaluate these and other risks of participation in order to best inform potential participants of other options available to them, such as other registries or data sharing opportunities.
• Care and thought must be applied to the creation of registries, the potential sharing of data, and the appropriate informed consent in participation, again leveraging the input of patients and patient groups to consider risks as well as options in terms of types of data collected. Possibilities include, but are not limited to, the sharing of all or only some of their medical data.
• If a national GUID or similar identifying approach is used to link an individual's data across multiple systems and platforms, existing statutory authority may not be sufficient.
• Some components may not yet be fit for purpose; for example, unstructured doctor notes may not lend themselves immediately to an integrated data structure. However, while some modifications may require time for practices to adjust, the value of an integrated data system can still be realized.
• Population-wide genetic sequencing may result in uncertain diagnoses and require genetic counseling. A key component of current population-based data efforts is the availability of genetic counseling and interpretation of genetic results; resources and support would need to be incorporated into the proposed system.
• Governance, data management and data access. The proposed integrated data system would require management of collection and access and data protections. To truly realize the benefits of this effort, unprecedented approaches to access must be undertaken. Traditional approaches of request for proposals, applications and committee reviews would be cumbersome and limited. True innovation and openness should be the North Star.

| CONCLUSION
The next wave of innovation in therapeutics for rare diseases is and will continue to be gene-targeted therapies. Building robust data sharing systems will speed drug discovery, optimize trial design and execution, and enable long-term follow-up of treated patients to assure unbiased assessments by all stakeholders of the relative efficacy and safety of new treatments. To this end, it may be beneficial to start with several well-circumscribed rare disease areas as pilots, operating with a clearly defined consortium of global stakeholders including government, academia, patient groups, pharma, philanthropies and payers. Mechanisms for funding and sustainability must be shared between public and private sectors. Existing approaches to incentivization and protection of innovation should be reviewed, and revisions considered where deemed inadequate. Straightforward access and control of data by individuals must be foremost. Starting with several pilot disease areas and working with those who have solved some of these problems on the larger scale would start us down the path to more rapid and efficient development of GTT for the millions of people living with rare diseases.