Real-world evidence on clinical outcomes of people with type 1 diabetes using open-source and commercial automated insulin dosing systems: A systematic review

Aims: Several commercial and open-source automated insulin dosing (AID) systems have recently been developed and are now used by an increasing number of people with diabetes (PwD). This systematic review explored the current status of real-world evidence on the latest available AID systems in helping to understand their safety and effectiveness. Methods: A systematic review of real-world studies on the effect of commercial and open-source AID system use on clinical outcomes was conducted employing a devised protocol (PROSPERO ID 257354). Results: Of 441 initially identified studies, 21 published 2018–2021 were included: 12 for Medtronic 670G; one for Tandem Control-IQ; one for Diabeloop DBLG1; two for AndroidAPS; one for OpenAPS; one for Loop; three comparing various types of AID systems. These studies found that several types of AID systems improve Time-in-Range and haemoglobin A 1c (HbA 1c) with minimal concerns around severe hypoglycaemia. These improvements were observed in open-source and commercially developed AID systems alike. Conclusions: Commercially developed and open-source AID systems represent effective and safe treatment options for PwD of several age groups and genders.


| INTRODUCTION
Despite recent developments in diabetes management, meeting recommended glycaemic targets remains a major challenge for people with type 1 diabetes (T1D). [1][2][3] Intensive treatment can help achieve improved glycaemic outcomes, thereby reducing the risk of long-term complications. 4 However, the burden of treatment and risks of hypoglycaemia represent considerable challenges.
The development of automated insulin dosing (AID) systems, also called '(hybrid) closed-loop' or 'artificial pancreas' systems, represents an important step towards improving diabetes management. These systems combine continuous glucose monitoring (CGM) and insulin pumps to automatically adjust insulin dosing depending on glycaemic levels via a control algorithm. While CGM and pump therapy have already led to a significant improvement in glycaemic outcomes compared to multiple daily injections, 5,6 AID systems promise to optimize diabetes management even further. 7 Since the Food and Drug Administration approved the first commercial AID system in 2016, several others have been developed, approved and introduced to the market. 8
Prior to the developments in industry and academia, a community of people with diabetes (PwD) and their families developed their own diabetes technology solutions behind the hashtag #WeAreNotWaiting. With the source code and documentation freely available online, PwD can build open-source AID systems based on available CGM sensors and insulin pumps, and use them at their own risk. An estimated ten thousand or more individuals are currently using open-source AID, including children and adolescents whose caregivers build these systems on their behalf. [9][10][11][12] None of the open-source AID systems have so far received regulatory approval, and liability does not apply as it does for commercially developed medical devices. However, rich community support is available from volunteers. Data and experience are deliberately being shared between peers for individual support and with researchers and open-source developers for continuous improvements. 13 The #WeAreNotWaiting movement is a prime example of how open sharing of data, algorithm transparency and experience-based evidence from real-world settings have helped make AID technology more accessible and allowed for further developments to the system algorithms and features.
Observational studies and analyses of self-reported data point to improvements in glycaemic outcomes and quality of life in open-source AID users. [9][10][11][12][14][15][16] To date, no data from randomized clinical trials (RCTs) are available on open-source AID, although one study is currently in progress. 17
In general, real-world evidence refers to findings based on data collected from multiple sources outside the context of RCTs. Data sources in real-world studies include electronic health records, patient registries, self-reported data, as well as data from medical devices, wearables and health applications. 18 Alongside evidence from randomized clinical trials, real-world studies on AID systems and their effects on glycaemic outcomes are a helpful method for evaluating their safety and effectiveness. Real-world evidence has several advantages over evidence from conventional RCTs. Data collection in real-world settings may take less time and fewer resources; obtaining outcomes faster with real-world data is particularly beneficial, as the time taken to complete RCTs can delay the pace of developments. Real-world evidence may also reflect the true user experience and diversity of the population more accurately, while RCTs only include select populations. 18,19 Moreover, the Hawthorne effect (the changed behaviour of research participants when they are aware of being observed) 20 may unduly influence the measured benefit of an intervention in clinical trials. Follow-up of participants in RCTs may be more rigorous than in clinical practice, resulting in different adherence levels to several aspects of the trialled intervention, which may produce misleading results. 21 Despite the benefits that real-world evidence adds, it may also pose challenges such as selection bias, missing data, completeness and quality of data and variations in study design. 22
Despite these potential limitations, real-world studies may provide a more realistic estimate of the treatment effect of an intervention. Therefore, real-world data can help augment evidence derived from clinical trial settings and pave the way to tailor healthcare to the needs of a wider population. 18,19 For expensive technology studies where improvements may occur iteratively, real-world evidence may offer significant advantages. 23 Up until recently, real-world data were mainly used for post-market surveillance of medical devices or as part of investigator-initiated trials, although the interest of regulatory bodies in real-world evidence is increasing. 24 To date, the majority of available evidence on AID has been generated through RCTs, although the number of real-world studies is growing. In contrast, most of the evidence on open-source AID so far is derived from real-world studies, including observational studies and user-, caregiver- and physician-reported data.

Our primary aim is to undertake a systematic review summarizing real-world evidence on commercial and open-source AID systems, which has not been reported previously. Secondary aims include obtaining additional insights on safety and effectiveness that real-world evidence can offer compared to evidence derived from RCTs.

| METHODS
This review is based on a prespecified protocol and is reported according to the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) statement. 25

| Search strategy and selection criteria
We searched the electronic databases PubMed, MEDLINE, Embase, CINAHL, Cochrane Database of Systematic […]
The following inclusion criteria were used: original research articles; a focus on single-hormone AID systems (both commercially developed and open-source); participants with T1D; end points related to glycaemic outcomes; self-reported data or observational studies.
The following exclusion criteria were used: RCTs; studies covering ≤4 weeks' worth of data; studies published in peer-reviewed journals after 7th June 2021; studies on dual-hormone systems, non-hybrid closed-loop systems or systems with predictive low-glucose suspension only. We did not set any restrictions on the sample size of the study, or on the age, gender or pregnancy status of participants.
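The screening rules above can be sketched as a simple filter. This is only an illustration of how the stated criteria combine, not the procedure the reviewers actually used: the `Study` record, its field names and the category labels are hypothetical, and only the thresholds (more than 4 weeks of data, publication on or before 7th June 2021) are taken from the text.

```python
from dataclasses import dataclass
from datetime import date

SEARCH_CUTOFF = date(2021, 6, 7)  # studies published after this date were excluded
MAX_EXCLUDED_WEEKS = 4            # studies covering <= 4 weeks of data were excluded


@dataclass
class Study:
    """Hypothetical screening record; field names and labels are illustrative."""
    design: str          # "rct", "observational" or "self_reported"
    system: str          # "single_hormone_aid", "dual_hormone", "non_hybrid" or "plgs_only"
    population: str      # e.g. "T1D"
    weeks_of_data: float
    published: date
    glycaemic_end_points: bool
    original_research: bool = True


def is_eligible(s: Study) -> bool:
    """Return True if the record meets all inclusion and no exclusion criteria."""
    included = (
        s.original_research
        and s.system == "single_hormone_aid"
        and s.population == "T1D"
        and s.glycaemic_end_points
        and s.design in {"observational", "self_reported"}
    )
    excluded = (
        s.design == "rct"
        or s.weeks_of_data <= MAX_EXCLUDED_WEEKS
        or s.published > SEARCH_CUTOFF
    )
    return included and not excluded


candidate = Study("observational", "single_hormone_aid", "T1D", 26, date(2020, 3, 1), True)
print(is_eligible(candidate))
```

Keeping inclusion and exclusion as two separate expressions mirrors the way the criteria are stated, which makes it easier to check each rule against the protocol.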

| Data extraction
Titles, abstracts and full-text articles were screened by three independent reviewers (CK, SP, MW). Supplementary material was reviewed, if necessary. Disagreements were resolved by consensus or deferral to three further independent investigators (KR, SH, KB) and joint review. After removing duplicates and papers that did not meet the inclusion criteria, identified references were imported into the reference management software Mendeley (Elsevier). Data were then extracted into a predefined template in a Microsoft Excel (Microsoft Corporation) sheet for further analysis.

| End points
Focussing on the effectiveness and safety of AID systems, the primary end points for this review were: (i) percentage Time-in-Range (TIR; 70-180 mg/dl, 3.9-10.0 mmol/L); (ii) change in TIR; and (iii) HbA 1c . Secondary outcome measures included incidence of hypoglycaemia, defined as Time-Below-Range (TBR; <70 mg/dl, <3.9 mmol/L), as well as the reported occurrence of severe hypoglycaemia, diabetic ketoacidosis (DKA), or other serious adverse events that occurred while using AID.
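As an illustration of how TIR and TBR follow from these thresholds, the sketch below computes both from a list of CGM readings in mg/dl. It assumes evenly spaced readings, so that the share of readings approximates the share of time; the function names and the sample values are hypothetical, and the included studies may have derived their metrics differently.

```python
MGDL_PER_MMOLL = 18.016  # conversion factor based on the molar mass of glucose (180.16 g/mol)

TIR_LOW_MGDL = 70    # lower bound of the target range (3.9 mmol/L)
TIR_HIGH_MGDL = 180  # upper bound of the target range (10.0 mmol/L)


def mgdl_to_mmoll(value_mgdl: float) -> float:
    """Convert a glucose value from mg/dl to mmol/L."""
    return value_mgdl / MGDL_PER_MMOLL


def time_in_ranges(readings_mgdl: list[float]) -> dict[str, float]:
    """Return percentage TIR (70-180 mg/dl inclusive) and TBR (<70 mg/dl).

    Assumes evenly spaced CGM readings, so counts stand in for time.
    """
    if not readings_mgdl:
        raise ValueError("at least one glucose reading is required")
    n = len(readings_mgdl)
    in_range = sum(1 for g in readings_mgdl if TIR_LOW_MGDL <= g <= TIR_HIGH_MGDL)
    below = sum(1 for g in readings_mgdl if g < TIR_LOW_MGDL)
    return {"TIR": 100 * in_range / n, "TBR": 100 * below / n}


# Made-up sample readings, for illustration only
sample = [60, 70, 100, 180, 181, 250, 65, 120, 150, 90]
print(time_in_ranges(sample))
```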
For each of the included publications, the following data were extracted: study duration; number of participants; eligibility criteria; study type; the countries the study was conducted in; primary and secondary end points; methods; results; funding; conflicts of interests and limitations.

| RESULTS
The PRISMA flowcharts (Figures 1 and 2) illustrate the selection process of eligible studies. We initially identified 441 publications, of which 88 potentially eligible publications were retrieved in full-text, resulting in 21 publications that met our inclusion criteria and comprised a total of N = 7083 participants eligible for further analysis: 12 for 670G; one for Control-IQ; one for DBLG1; two for AndroidAPS; one for OpenAPS; one for Loop; three including various AID types. The majority were observational studies; 11 evaluated data retrospectively and 10 prospectively. Device data were evaluated in 17 studies; two evaluated self-reported data and one study evaluated device data and self-reported outcomes combined. Of the studies, 10 were conducted in the United States, five in Europe, one in the United Kingdom, three internationally, one in Qatar and one in China. The duration of the studies ranged from ≥1 month to 1 year, whereas five described parameters before and after commencing AID without further specifying the exact time point of measurements. Public or independent funding was received for 12 of the studies, one study was industry-funded 26 and three studies were conducted with no specific funding. [27][28][29] Funding sources were not indicated in five of the selected studies. 14,[30][31][32][33] An overview of the characteristics of the included studies is presented in Table 1. The heterogeneity in study designs did not allow for quantitative data synthesis (Figure 3).

FIGURE 1 PRISMA flowchart showing the selection process of real-world studies on commercial AID systems

| Commercial AID systems
Results from studies covering a total of 4054 participants based in five countries and using one of three commercial AID systems (Medtronic 670G, Tandem Control-IQ, Diabeloop DBLG1) were analysed. No eligible real-world studies were found for the OmniPod 5, CamAPS FX and Medtronic 780G systems.

| Medtronic 670G
The Medtronic 670G was the first AID system to receive regulatory approval from US authorities, in 2016. The proportional-integral-derivative (PID) controller algorithm runs on the insulin pump, which is compatible with Guardian 3 sensors. When used in auto-mode (AM), the algorithm adjusts basal rates, aiming for a target glucose of 6.7 mmol/L (120 mg/dl), which can be temporarily adjusted by the user up to 8 mmol/L (150 mg/dl). The system is currently approved for PwD ≥7 years and is available in North America, Australia, select countries in Europe and other regions.
All but two of the 12 studies evaluating the real-world use of the Medtronic 670G found significant improvements in TIR, and five reported significant improvements in HbA 1c .
Lal et al. reported on a population of 79 children, adolescents and adults in a 1-year prospective observational study of participants based in the United States. There was a significant correlation between change in HbA 1c and AM use at all visits (p = 0.036). 34 A similar association between time in AM and HbA 1c reduction was observed by further studies of the 670G system. [35][36][37] However, Lal et al. also reported high discontinuation of AM use (33%), mainly related to sensor issues, dissatisfaction with the AID system and access to supplies, and a lack of evaluable data from another 29% of the participants, leading to a discontinuation rate of 46% among those who provided data. The study was limited by the inclusion of PwD with insurance coverage only. 34
[…] 35 These findings may be limited by the high rate of AM discontinuation, with an additional bias that the clinical centre had experience with the system in clinical trials prior to commercial release.
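The 46% discontinuation rate attributed to Lal et al. is consistent with re-expressing the 33% who discontinued AM as a share of the participants who provided evaluable data. The short sketch below reproduces that arithmetic under the assumption that the 29% without evaluable data are removed from the denominator and that none of the participants who discontinued belong to that group; the function name is illustrative and the source does not state the exact calculation.

```python
def rate_among_evaluable(discontinued_pct: float, missing_pct: float) -> float:
    """Re-express a discontinuation percentage relative to participants with data.

    Assumes the participants without evaluable data are excluded from the
    denominator and that none of those who discontinued are among them.
    """
    evaluable_pct = 100.0 - missing_pct
    if evaluable_pct <= 0:
        raise ValueError("no participants with evaluable data")
    return 100.0 * discontinued_pct / evaluable_pct


# 33% discontinued AM; another 29% lacked evaluable data
print(round(rate_among_evaluable(33, 29)))
```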
A retrospective analysis by Faulds et al. 33 of 34 US-based adults observed HbA 1c and TIR improvements but without statistical significance. PwD with lower baseline HbA 1c levels spent more TIR than those with a higher HbA 1c despite spending less time in AM. Prior to their participation, the majority of the participants were not regular CGM users. Therefore, the reduction of HbA 1c might not solely be attributed to AID but also to the initiation of CGM.
Beato-Vibora et al. 32 […]
A case-control study from Italy by Lepore et al. 27 compared users of predictive low glucose suspend systems (640G) with 670G users over a period of 6 months and reported improvements of HbA 1c (57 ± 13 mmol/mol vs. 53 ± 17 mmol/mol [7.4 ± 1.0% vs. 7.0 ± 0.6%, p < 0.05]) and TIR (59.0 ± 16.0% vs. 71.4 ± 9.8%, p < 0.005). No changes in TBR were observed. 27
A retrospective analysis by Duffus et al. 37 examined the relationship of time spent in AM with HbA 1c and TIR in 96 adolescents and young adults. They found a significant correlation between the improvement of both parameters and time spent in AM.
The largest sample size of all AID studies so far was presented by Stone et al., who retrospectively analysed CareLink™ data of 3141 PwD. Participants aged ≥7 years who completed at least 3 months of continuous AM use were included. The average TIR observed across different age groups was 66.0% in manual mode (MM) compared to 73.3% during AM (p < 0.001). 31

| Control-IQ
Tandem Control-IQ is an advanced hybrid closed-loop system operated by a predictive control algorithm that runs on a t:slim X2 pump with Dexcom sensors. Target glucose is set to 110 mg/dl (6.1 mmol/L) and can be temporarily adjusted. The system is currently available in North America and select European countries for PwD aged ≥6 years.
Real-world data from 1435 US-based PwD aged ≥14 years using the Tandem Control-IQ system were analysed over 7 weeks. TIR improved significantly after 3 weeks and at the end of the study, from 78.2% (70.2%-85.1%) to 79.2% (70.3%-86.2%), p < 0.001, without increasing TBR. 26 Compared to the general T1D population, the study participants had a relatively high TIR prior to using AID. 26

| DBLG1
The DBLG1 system of the French company Diabeloop is operated by a model predictive control algorithm running on a handheld device. The device is compatible with the Kaleido and Accu-Chek Insight pumps and Dexcom sensors. Target levels are customizable between 100 and 130 mg/dl (5.5-7.2 mmol/L). The system is available in select European countries for PwD ≥18 years with a total daily insulin dose of ≤90 units.

| Open-source AID systems
Results from studies covering a total of 1664 participants from 36 countries and using any of the three open-source AID systems (OpenAPS, AndroidAPS, Loop) were analysed. No studies were found for FreeAPS specifically, which is a separate fork of the Loop system. Of the seven studies, all found significantly decreased HbA 1c levels and increased TIR. In the four studies that assessed TBR, no increase in hypoglycaemia was observed. Some studies reported continuous improvements over time, but not to a statistically significant extent.

| OpenAPS
OpenAPS runs a heuristic algorithm on a microcontroller and may be used as a hybrid or full closed-loop system with announced or unannounced meals. Older models of Medtronic insulin pumps and additional hardware ('rig') are required to operate OpenAPS. A variety of sensors are compatible. Several parameters (e.g. target glucose, duration of insulin action) are customizable, and the system can regularly perform automatic adjustments of therapy parameters and basal profiles ('autosense').
In a 6-month study, Melmer et al. evaluated device data of OpenAPS users, which were donated to the 'Open Humans' portal. 14,41 The average TIR of the entire cohort of N = 80 was 77.5 ± 10.5% during the first 180 days with no further significant changes between days 1-60, 61-120 and 121-180. A subcohort of N = 34 was evaluated before and after changing from sensor-augmented pump therapy to OpenAPS and showed a significant reduction in estimated HbA 1c (eA1c) from 49 ± 14 to 44 ± 17 mmol/mol (6.6 ± 0.9% to 6.2 ± 0.6%) (p < 0.0001) and an increased TIR from 71.1 ± 13.5% to 80.4 ± 8.3% (p < 0.0001) with no significant change in TBR and a small decrease in hypoglycaemic events.

| Loop
The Loop algorithm is operated by a mobile application on Apple iPhones and smartwatches and is compatible with older Medtronic pumps and Eros OmniPods via a communication bridge device (e.g. 'RileyLink') as well as with various sensors. Therapy parameters are adjustable individually, and users can enable 'manual overrides' for certain situations to change several parameters at once.
The 'Loop Observational Study' by Lum et al. 10 was a real-world prospective study and registered clinical trial that investigated glycaemic measures of 558 Loop users based in the United States, ranging from 1 to 71 years of age. TIR significantly increased from 67.0 ± 16.0% to 73.0 ± 13.0% at 6 months, and HbA 1c decreased from 51 ± 11 mmol/mol (6.8 ± 1.0%) to 48 ± 9 mmol/mol (6.5 ± 0.8%) at 6 months (p < 0.001). Improvements were greater in those with higher baseline HbA 1c and lower baseline TIR. The median time of CGM use was 96% and the median time in AM was 83%. The median TBR decreased over the course of the study, with TBR <70 mg/dl changing from 2.9% to 2.8% (p = 0.002) and TBR <54 mg/dl changing from 0.40% to 0.36% (p < 0.001). No cases of confirmed DKA were reported. In the 3 months prior to the study, 18% (N = 97) of participants reported at least one severe hypoglycaemic event. During the 6-month study, only 6% (N = 35) of the participants experienced severe hypoglycaemic events, indicating improved safety for this vulnerable group.
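Because HbA 1c is reported throughout in both mmol/mol and per cent, a short conversion sketch may help when comparing values across studies. It uses the widely published IFCC-NGSP master equation, mmol/mol = 10.929 × (% − 2.15); the function names are illustrative, and the individual studies may have rounded their values differently. A standard deviation is a difference between values, so it scales by the factor alone, without the offset.

```python
IFCC_SLOPE = 10.929  # mmol/mol per percentage point (IFCC-NGSP master equation)
NGSP_OFFSET = 2.15   # percentage points


def hba1c_percent_to_mmol_mol(percent: float) -> float:
    """Convert HbA1c from NGSP units (%) to IFCC units (mmol/mol)."""
    return IFCC_SLOPE * (percent - NGSP_OFFSET)


def hba1c_mmol_mol_to_percent(mmol_mol: float) -> float:
    """Convert HbA1c from IFCC units (mmol/mol) to NGSP units (%)."""
    return mmol_mol / IFCC_SLOPE + NGSP_OFFSET


def hba1c_sd_percent_to_mmol_mol(sd_percent: float) -> float:
    """Convert a standard deviation; only the scale factor applies."""
    return IFCC_SLOPE * sd_percent


# Values reported for the Loop Observational Study: 6.8 ± 1.0% and 6.5 ± 0.8%
for mean, sd in [(6.8, 1.0), (6.5, 0.8)]:
    print(round(hba1c_percent_to_mmol_mol(mean)), round(hba1c_sd_percent_to_mmol_mol(sd)))
```

Rounded to whole numbers, this gives 51 ± 11 and 48 ± 9 mmol/mol, matching the values reported above.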

| Multi-system and comparative studies
Three studies investigated multiple AID systems simultaneously, of which two included various types of open-source AID 9,11 and one study compared the Medtronic 670G with open-source AID. 29 The most extensive study on open-source AID was conducted as part of the OPEN project 13 and evaluated self-reported clinical outcome data of 897 users from 35 countries, of whom 722 were adults and 175 were children and adolescents with their caregivers responding on their behalf. 11 There was a significant decrease in HbA 1c from 55 ± 12 mmol/mol (7.1 ± 1.1%) to 45 ± 7 mmol/mol (6.2 ± 0.6%) and increased TIR from 63.0 ± 16.2% to 80.3 ± 9.4% (p < 0.001), independent of gender and age.
A previous study of OPEN evaluated caregiver-reported outcomes from 209 children and adolescents from 21 countries. 9 HbA 1c decreased from 52 mmol/mol (6.9 ± 0.9%) to 45 mmol/mol (6.3 ± 0.7%) (p < 0.001) and continuously improved over time. TIR increased from 64.2 ± 15.9% to 80.7 ± 9.3% (p < 0.001), with no significant differences between children and adolescents and between the three system types. A limitation of these two studies is that data were self-reported by PwD or caregivers and not calculated from device data or supported by clinical records.

| DISCUSSION
To the best of our knowledge, this is the first systematic review of studies on several AID systems to analyse real-world studies, thereby providing an overview on the safety and effectiveness of various AID types, including both commercial and open-source systems. The number of real-world studies on commercial AID systems, where evidence has so far been derived mainly from RCTs, is increasing, 43,44 thereby acknowledging the additional insights and advantages that real-world studies offer. In open-source AID, evidence is mainly derived from real-world settings. Involving several study types makes a direct comparison of different AID systems challenging. In addition, a significant proportion of real-world studies of AID systems has come from citizen science-based approaches such as the open-source community and researchers working in collaboration with them. 45
In summary, improved glycaemic outcomes were found across all of the investigated AID systems, with commercial and open-source AID alike, despite variable clinical and technical characteristics. Time in range increased in all studies. Many of them found a greater improvement in TIR in those with a lower baseline TIR, but those with a high percentage TIR at baseline also improved their time in glycaemic target ranges. This shows the effectiveness of AID for a wider population with different baseline characteristics. HbA 1c levels were reported in eight of the studies on commercial AID, of which six showed significant improvements. In comparison, all seven open-source AID studies reported significantly improved TIR and HbA 1c . Several studies observed an association between time spent in AM and reduction of HbA 1c , 28,30,[34][35][36][37] although AM discontinuation was reported for some commercial systems.
Reasons for AM discontinuation were multifaceted, ranging from sensor issues to access to supplies of commercial AID, and those who discontinued were more likely to be younger, male, of lower education status, belong to ethnic groups other than White, and have a higher HbA 1c . 34 Further research should address usability and human factors as well as difficulties in access to AID in real-world settings and how potential barriers to AM use could be resolved to enable PwD to stay in treatment.
When comparing real-world studies on commercial and open-source AID, some of the demographic characteristics, such as geographical location and participant age, differed considerably. For commercial AID, studies with participants based in five countries (United States, Italy, Spain, France, Finland and Qatar) were included, while studies on open-source AID cover up to 36 different countries in several regions of the world. Among other reasons, this may be explained by regulatory approval of and access to AID technology, which are currently restricted to select and, to a large part, high-income countries, and by local differences in insurance coverage or reimbursement policies for AID technology. It also highlights the potential for citizen science-based approaches to offer real-world data at an international level and cover regions with different healthcare systems.
Only two studies report on the off-label use of commercial AID in children ≤7 years, 30,36 while studies on open-source AID cover a larger cohort of children in that age group, including individuals aged 1-71 years. The extent of the changes in glycaemic outcomes varied between studies that included children ≤7 years of age. To date, the CamAPS FX system is the only AID system that obtained regulatory approval for very young children. There were no studies on CamAPS FX that fulfilled the inclusion criteria of this review, limiting our ability to evaluate commercial AID options for this age group. In the study of Varimo et al. on the off-label use of the Medtronic 670G system in children aged 3-7 years, the average HbA 1c improvement failed to reach significance, and the study of Salehi et al. reported a significant increase in time in hypoglycaemia in children ≤7 years.
Evidence derived from RCTs has proven commercial AID systems to be safe and effective in reducing hyper- and hypoglycaemia, 46 leading to regulatory approval of several of them. The costs of clinical trials for multi-system comparisons can be high, and such trials are currently not necessarily required for regulatory approval, which might contribute to why industry funding of multi-system trials is lacking. On the other hand, real-world studies are likely to provide more effective means to compare AID systems. In general, the findings of this review on real-world evidence of both commercial and open-source AID systems have been in line with RCTs of commercial AID. Therefore, real-world evidence may represent a useful and increasingly popular method of reviewing safety and effectiveness for regulatory approval. The 'Loop Observational Study' will be the first of its kind to be considered supporting evidence for regulatory approval of 'Tidepool Loop', a commercial AID system based on the open-source Loop algorithm, paving the way for future use of real-world evidence in regulatory decision making. 47 The generation of real-world evidence from independent sources, ideally comparing multiple AID systems in similar clinical settings, is likely to reflect real-world conditions most accurately and should therefore be encouraged and supported. In order to avoid selection bias that may also apply to RCTs when real-world studies are being conducted by the same clinical centres and geographical regions where RCTs have previously been conducted, more real-world evidence from publicly funded healthcare systems, including individuals with higher HbA 1c levels or frequent hypoglycaemia, is needed. Furthermore, the impact of socio-economic status and ethnicity, as well as the impact on individuals with further physical and mental health implications, should be considered. A recent report highlights the importance of such studies. 29 It further contrasts the outcomes and participant characteristics, which has important implications for the application of technologies in T1D.
There are several limitations to the studies included in this review, depending on their design. Only two studies included a control group: Lepore […]
Some of the studies reporting on open-source AID rely on self- or caregiver-reported data which have not been validated or supported by device data or clinical records. Although it is reasonable to expect that study participants generally report valid data, there is potential for inaccuracies. For example, Nightscout, a popular open-source platform for reviewing glucose data that is widely used by open-source AID users, uses a default TIR of 80-180 mg/dl which can be manually adjusted by the user. Other platforms, such as Dexcom Clarity, use variable lower and higher levels to define time spent in an individual target range. Although the studies explicitly asked the participants to provide values for TIR between 70 and 180 mg/dl, there might have been inaccurate entries based on different settings of individual users. As a further limitation, we did not perform a quality assessment of the studies, and the validity of this review's findings might thereby be limited. Our review of funding sources and conflicts of interests of the study authors further indicates a potential bias in some of the studies and their authors' connection to industry. If unmitigated, this could limit the credibility and progress in the field.
As stated before, real-world studies may provide a more realistic estimate of the treatment effect of an intervention. Participants are not always aware of the intervention, thereby potentially reducing the Hawthorne effect on study outcomes. 48 However, the community behind open-source AID systems represents mostly highly empowered and often tech-savvy people contributing, but possibly also reviewing, their own data, which might have an observation effect itself. Sources of education and support for PwD or caregivers are also different between open-source and commercial systems. Studies on commercial AID systems usually include professional system education for staff and PwD and their families, whereas open-source AID users mainly use peer-support and educational resources provided by the #WeAreNotWaiting community throughout the implementation process and system use. This may affect user experience, the use of AM and other AID-specific features, and consequently may impact clinical outcomes.
This systematic review highlights important contributions that real-world studies have made to the hybrid closed-loop literature and data on safety and effectiveness. Real-world data from commercial AID systems reinforce findings from RCTs, although these data were collected in a limited number of centres as single-system studies. Data from open-source AID systems provide support for the safety and effectiveness of these systems from a wide body of international users. However, potential for selection bias exists and is being addressed by recent and ongoing efforts to provide clinically validated metrics. Further efforts are needed to continue generating clinically validated, multi-AID system real-world data to help make comparisons between different AID systems and continue to provide outcomes reflective of clinical practice. Given the pace of technology development and the heavy resource burden for undertaking AID system research, the use of real-world data needs to be given a more prominent position in regulatory and health-economic evaluations of technologies.
Care, Dexcom, Medtronic Diabetes, Diabeloop, Sanofi Diabetes, Abbott and BCG Digital Ventures; outside the submitted work. All other co-authors have no conflict of interest to declare.