Contrast‐agent‐based perfusion MRI code repository and testing framework: ISMRM Open Science Initiative for Perfusion Imaging (OSIPI)

Software has a substantial impact on quantitative perfusion MRI values. The lack of generally accepted implementations, code sharing and transparent testing reduces reproducibility, hindering the use of perfusion MRI in clinical trials. To address these issues, the ISMRM Open Science Initiative for Perfusion Imaging (OSIPI) aimed to establish a community‐led, centralized repository for sharing open‐source code for processing contrast‐based perfusion imaging, incorporating an open‐source testing framework.


INTRODUCTION
Contrast-agent-based perfusion MRI techniques typically involve advanced or bespoke acquisition protocols, image analysis pipelines and software to generate quantitative or semi-quantitative perfusion parameters. A consequence is that perfusion and permeability parameters can vary widely between research groups and software implementations 3 and, while multisite reproducibility studies are unfortunately rare, 6 there is evidence to suggest that reproducibility is poor. 7 This impedes translation from single-site tools to harmonized quantitative imaging biomarkers for general use in clinical studies, multicenter trials, and for clinical diagnosis and monitoring. 6 Several factors contribute to the variability, including differences in scanner hardware, pulse sequence, acquisition parameters and data processing pipelines. Initiatives such as the Quantitative Imaging Biomarkers Alliance (QIBA), 8 and disease-specific initiatives including in neurodegenerative disease 4 and brain tumor imaging, 9,10 have proposed recommendations for increased harmonization of acquisition and analysis in perfusion imaging. Researchers commonly develop or re-use in-house code due to the specialized nature of the processing or because existing available solutions are limited, unvalidated or difficult to use. However, most scientists are self-taught programmers 18,19 untrained in state-of-the-art software development practices such as version control and unit testing, despite growing recognition that software "should be built, checked, and used as carefully as any physical apparatus". 20,21 Furthermore, sharing code in public repositories is not yet standard practice, but is important to improve the reproducibility of results. 22 Proprietary commercial software may reach a higher standard in software engineering terms, but the implementation details are hidden. Together, these factors lead to site-dependent results, increase the likelihood of errors, reduce transparency and impede replication.
Improvement in coding practices is essential but not sufficient to improve reliability and reproducibility of quantitative imaging biomarkers. Community-driven initiatives are also needed to ensure that code can be re-used in practice, to facilitate open testing using trusted reference data and comparison between implementations, and to develop community-led software libraries that become accepted standards or benchmarks within their fields. To address these challenges in the context of perfusion imaging, Taskforce 2.3 of the ISMRM Open Science Initiative for Perfusion Imaging (ISMRM OSIPI, referred to hereafter as "OSIPI") 23 was established with the following aims (Figure 1): (i) To initiate a centralized repository for hosting open-source code that is maintained by the perfusion imaging community to promote code sharing, reduce the need for time-consuming duplicate development, facilitate reproducible analysis and make perfusion image processing more widely accessible. The intended users are perfusion imaging researchers, clinical researchers, and software developers. (ii) To integrate a testing framework within the repository, so that all code contributions can be easily tested and compared using the same publicly available test data sets. This will enable researchers and software developers to validate their code, reduce the need to create their own test data and testing frameworks, and enable other researchers to re-use the shared code with greater confidence. (iii) To leverage the code and testing framework resulting from aims (i) and (ii) to develop a community-led open-source perfusion imaging software package permitting full perfusion processing pipelines to be coded.

F I G U R E 1
Overview of the main aims of OSIPI Taskforce 2.3 and its interactions with the perfusion imaging community. The third aim ("Perfusion package") was not addressed during the initial 2-y cycle.
The purpose of this paper is to report on progress made by the taskforce during the first 2-y cycle of OSIPI. We describe a new framework for collecting, sharing and testing open-source code contributions, summarize the code already shared via the repository, and report (with examples) on the tests implemented across different aspects of functionality. We intend that readers will be encouraged to make use of, contribute to, and join this open-science initiative, and that our approach will inspire initiatives in other fields of medical image processing.

Taskforce structure and operation
The OSIPI Taskforce 2.3 conducted its work through regular meetings and via the GitHub website. The activity of the taskforce followed a roadmap that can be summarized as three phases corresponding to the aims described above: (i) establishment of a code repository, (ii) implementation of a testing framework and (iii) development of a community-led perfusion library (Figure 1). Tasks during the first 2-y cycle of OSIPI (2020-2022) were focused on the first two phases, as described below.

Scope
The first aim of the taskforce was to lay the foundations for a robust workflow and a sustainable repository that could be extended in the future. Therefore, the initial focus was on code for processing signal-time curves to obtain biophysical parameters. Peripheral steps such as data input/output and obtaining region-of-interest statistics were considered out-of-scope at this stage, since they are not specific to perfusion imaging and accepted software solutions already exist. As open science is one of the key principles of OSIPI, Python was initially targeted as the most popular and well-supported open-source language for scientific computing.

Open-source code repository
A GitHub repository* was established with the Apache Software License (Version 2.0). Figure 2 gives an overview of the repository structure. Calls for code contributions were made to OSIPI members, the ISMRM Perfusion Study Group, and publicly via OSIPI websites. In addition, individual researchers were approached based on references to Python code in their publications and conference presentations. Contributors were asked to add their code to the repository by creating a feature branch, which was merged with the primary develop branch after review. Code contributions were added as a new subfolder within the src directory of the repository, labeled according to the originating author and institution. There were no restrictions on the organization, style, or formatting of code within this subfolder. Detailed guidelines on code contribution can be found on the project wiki.†

Testing framework
The goals of testing were to identify substantive coding errors and to evaluate and compare the outputs of contributions implementing specific functionalities, using trusted test input data and reference output values. A unit testing framework was implemented using the pytest package and GitHub Actions. Test files were created in the test directory of the repository and grouped according to function, for example, T 1 mapping and DCE-MRI pharmacokinetic models (Figure 2). Each code contribution was tested using the same input data, reference values and tolerances. The original contributions were not modified for testing, except where essential (for example, adding __init__.py files to facilitate import by the test modules). Test data were converted to match the required input format and units before executing the code, and outputs were converted to match the units of the reference values. In some cases, further steps were required in order for the tests to pass. For example, the accuracy of some pharmacokinetic model implementations was increased by interpolating the input time series. For code contributions that implemented a pharmacokinetic model but did not include a fitting routine, the curve_fit method from the scipy package was used to evaluate the contributed code. 24 In such cases, any additional steps were documented in the test files. A detailed description of the procedure for developing and implementing tests is given in the project wiki.
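A minimal sketch of how such a pytest test can be structured is shown below. The contribution (a linear variable flip angle T 1 fit), the synthetic test data and the tolerances are illustrative stand-ins written for this example, not actual repository code; in the real framework the parametrized list would contain the contributed implementations and the data would come from the shared test data sets.

```python
# Hypothetical OSIPI-style unit test: every implementation in the parametrize
# list is run on the same input data and compared against the same reference
# values using combined absolute/relative tolerances.
import numpy as np
import pytest

def vfa_t1_linear(signal, fa_rad, tr):
    """Stand-in linear VFA T1 fit (illustrative, not contributed code).

    Uses the linearization S/sin(a) = E1 * S/tan(a) + M0*(1 - E1),
    where E1 = exp(-TR/T1), so the slope of the line gives E1.
    """
    y = signal / np.sin(fa_rad)
    x = signal / np.tan(fa_rad)
    slope, _intercept = np.polyfit(x, y, 1)
    return -tr / np.log(slope)  # T1 recovered from E1 = slope

# Synthetic SPGR test case with a known ground-truth T1 (illustrative values)
TR, T1_REF, M0 = 0.005, 1.0, 100.0          # s, s, arbitrary units
FA = np.deg2rad([2.0, 5.0, 10.0, 15.0])     # flip angles in radians
E1 = np.exp(-TR / T1_REF)
SIGNAL = M0 * np.sin(FA) * (1 - E1) / (1 - E1 * np.cos(FA))

@pytest.mark.parametrize("implementation", [vfa_t1_linear])
def test_t1_mapping(implementation):
    t1 = implementation(SIGNAL, FA, TR)
    # Wide tolerances: the aim is to catch substantive errors, not rank accuracy
    np.testing.assert_allclose(t1, T1_REF, rtol=0.05, atol=0.005)
```

Unit conversion (e.g., milliseconds to seconds) and any interpolation steps would be applied to `SIGNAL` and the outputs before the assertion, as described above.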
For each category, at least one set of test data and reference values was included, either simulated (e.g., a digital reference object [DRO]) or based on human in-vivo scans. The aim was to base all tests on publicly available data, software and DROs. Where the dataset itself was not citable, references describing the protocol, patient cohort and the method used to obtain the reference values were cited within the test code. Where necessary, image data were condensed to a limited number of voxels or regions to reduce execution time and computing requirements.
Testing was performed in two stages. First, all tests were automatically executed on a GitHub remote runner, triggered by changes within the online repository. For each test case, the code output was compared with the reference output; if the difference between these exceeded the combined absolute and relative tolerances then the test failed and a red "badge" was displayed on the repository home page. The purpose of this testing step was to detect substantial errors in the contributed code or in the test files themselves (e.g., incorrect units for input variables). Therefore, wide tolerance levels were set for these tests in order to detect such errors; they are not intended to indicate an acceptable level of accuracy. Furthermore, we aimed to test the validity of the code and not that of the image acquisition and analysis techniques themselves. Therefore, test cases were avoided for which valid code could not be expected to return accurate results, for example those with low SNR, inadequate temporal resolution and degenerate cases.
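The combined absolute and relative tolerance rule described above can be sketched as follows; it mirrors the convention used by `numpy.isclose`. The numerical values are illustrative, not the tolerances actually set in the repository.

```python
# Pass/fail rule for a test case: fail when |output - reference| exceeds
# atol + rtol * |reference|. Tolerances are deliberately wide, intended to
# flag substantive errors rather than certify accuracy.
import numpy as np

def within_tolerance(output, reference, rtol, atol):
    output, reference = np.asarray(output), np.asarray(reference)
    return bool(np.all(np.abs(output - reference) <= atol + rtol * np.abs(reference)))

# Illustrative example: a Ktrans estimate of 0.105 /min against a reference
# of 0.100 /min passes, since |0.005| <= 0.005 + 0.10 * 0.100 = 0.015
print(within_tolerance(0.105, 0.100, rtol=0.10, atol=0.005))
```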
Second, a test results website ‡ was created to provide end users of the code collection with visual, quantitative representations of the test results. For this purpose, the output values for each of the above test cases and code contributions were exported to comma-separated values files. An automated workflow was established to read and plot these data using Jupyter notebooks and the Jupyter Book package 25 (Figure 2). The notebooks were exported as HTML pages and can be viewed publicly at a test results website, hosted in a separate repository. § Results were presented by plotting the deviations of the output values with respect to the reference values (e.g., Bland-Altman plots).
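The plotting workflow can be sketched as below: read exported (output, reference) pairs from a CSV file and compute the Bland-Altman statistics (mean bias and 95% limits of agreement) that underlie the plots. The column names and data values are illustrative, not the repository's actual CSV schema.

```python
# Sketch of the results-website workflow: parse a CSV of test outputs and
# compute Bland-Altman statistics for the output-minus-reference differences.
import csv, io, math

CSV_DATA = """voxel,reference,output
1,0.95,0.97
2,1.20,1.18
3,0.60,0.63
"""

rows = list(csv.DictReader(io.StringIO(CSV_DATA)))
diffs = [float(r["output"]) - float(r["reference"]) for r in rows]
bias = sum(diffs) / len(diffs)                       # mean difference
sd = math.sqrt(sum((d - bias) ** 2 for d in diffs) / (len(diffs) - 1))
loa = (bias - 1.96 * sd, bias + 1.96 * sd)           # 95% limits of agreement
print(f"bias={bias:.4f}, limits of agreement={loa[0]:.4f}..{loa[1]:.4f}")
# In the actual workflow these quantities are plotted (e.g., with matplotlib)
# inside Jupyter notebooks and published as HTML pages via Jupyter Book.
```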

RESULTS
At the time of writing, the taskforce had received Python source code contributions to the repository (release 1.0.0; https://doi.org/10.5281/zenodo.7729136) comprising 86 implementations of individual perfusion processing steps, contributed by 12 individuals or teams. These include implementations of all core aspects of DCE- and DSC-MRI processing (Table 1). For DCE-MRI, functionality is available for T 1 mapping, bolus arrival time estimation, conversion from signal to concentration, arterial input functions (AIF), pharmacokinetic models and semi-quantitative parameter derivation; for DSC-MRI, functionality is available for conversion from signal to concentration, automatic AIF selection, leakage correction and perfusion parameter derivation. For most categories, multiple contributions implementing the same functionality are available. In many cases, the implementations were mathematically distinct. For example, both linear and non-linear implementations of variable flip angle T 1 mapping are available. Pharmacokinetic models were also implemented using different approaches, including different convolution methods and the use of linear and non-linear least squares fitting routines for parameter estimation. Code contributions also differed according to options and features available: for example, some pharmacokinetic model implementations accepted an artery-capillary delay parameter, while others assumed no delay. Up-to-date descriptions of all collected code are provided as a table in the repository.** Implementation of the tests for each category of functionality is an ongoing and open-ended process (Table 1). At the time of writing, tests have been implemented for linear and non-linear implementations of variable flip angle T 1 mapping, conversion from signal to concentration for DCE-MRI, population AIFs (Parker, Georgiou, and McGrath), pharmacokinetic models (Patlak, Tofts, extended Tofts-Kety, two-compartment uptake and two-compartment exchange) and DSC-MRI parameter derivation (CBF and CBV). Table 2 gives an overview of the tests developed for each category of functionality and the tolerances that were used. On the test results website, detailed results of the tests were visualized including relevant background information, and a description of the test data and the tolerances. Example test results are shown in Figure 3, where graphs display the difference between the output and reference values.

F I G U R E 2
Overview of the OSIPI DCE-DSC code repository. (A) The repository directory structure with a description of the content of each directory. The notebooks directory contains all files required to publish the results on the test-results website. The test results (csv format) are automatically pushed to a second repository (DCE-DSC-MRI_TestResults) linked to a website displaying the results. (B) A snapshot of this website.

T A B L E 1
Implementations of core perfusion processing functionality collected and tested in release 1.0.0 of the repository

Processing steps | Implemented methods | Collected | Tested
DCE-MRI
T 1 mapping | Variable flip angle (linear, non-linear, NOVIFAST 26 ), DESPOT1-HIFI 27 | 11 | 9
Bolus arrival time estimation | Piecewise linear quadratic function, 28 estimate delay by fitting Tofts model to first third of the curve | 2 | 0
Signal to concentration | Conversion from signal to concentration for spoiled gradient echo sequences a | 8 | 7
Arterial input functions | Population functions (Georgiou, 29 Heye, 30 Manning, 31 McGrath, 32 Parker, 33 … | |
General | SNR, enhancement ratio estimation | 3 | 0

Note: Some of the collected code contributions were linked to a publication: Bell et al., 37 Berks et al., 38 Johansen O, 39 Mouridsen et al., 40 Orton et al., 41 Rata et al. 42 For more detailed descriptions of the implemented methods, readers are referred to the citations provided in the table.
Abbreviations: 2CUM, two-compartment uptake model; 2CXM, two-compartment exchange model; AATH, adiabatic approximation to the tissue homogeneity; C, concentration; CBF, cerebral blood flow; CBV, cerebral blood volume; DESPOT1-HIFI, driven equilibrium single pulse observation of T 1 with high-speed incorporation of RF field inhomogeneities; iAUC, initial area under the curve; MTT, mean transit time; NOVIFAST, non-linear variable flip angle data based T 1 estimator.
a Reverse implementations for concentration to signal are also available (not counted separately).
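As an illustration of the kind of pharmacokinetic model code that was tested, the sketch below implements a Tofts model by discrete convolution of the AIF with the impulse response function and fits it with scipy's `curve_fit`, as was done for contributions that provided a model but no fitting routine. This is not one of the contributed implementations; the function names, the toy AIF and all parameter values are illustrative.

```python
# Hypothetical Tofts model implementation: tissue concentration as the
# discrete convolution of the AIF with Ktrans * exp(-Ktrans * t / ve),
# fitted with scipy.optimize.curve_fit.
import numpy as np
from scipy.optimize import curve_fit

t = np.arange(0.0, 300.0, 1.0)                 # time axis in seconds
aif = 5.0 * (t / 60.0) * np.exp(-t / 80.0)     # toy AIF shape (mM), illustrative

def tofts(t, ktrans, ve, aif=aif):
    """Tissue concentration via rectangle-rule discrete convolution."""
    dt = t[1] - t[0]
    irf = ktrans * np.exp(-ktrans * t / ve)    # Tofts impulse response
    return np.convolve(aif, irf)[: len(t)] * dt

# Synthesize a noise-free curve with known parameters, then refit it
ct = tofts(t, ktrans=0.25 / 60.0, ve=0.2)      # Ktrans given in 1/s
popt, _ = curve_fit(tofts, t, ct, p0=[0.1 / 60.0, 0.1], bounds=(0, [1.0, 1.0]))
```

Because the test data are generated and refitted with the same forward model at high temporal resolution, the fit recovers the ground-truth parameters; differences between real contributions arise from their distinct convolution and fitting strategies, as reported above.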
For some of the tested functionality categories, multiple implementations were available. For T 1 mapping, the four non-linear implementations of variable flip angle T 1 estimation yielded near-identical outputs when processing voxels from the QIBA T1 DRO 43 (Figure 3B) and from two sets of in-vivo data. Similar results were obtained for three linear implementations of variable flip angle T 1 mapping. Seven implementations for the conversion from signal intensity to concentration also resulted in identical output values (Figure 3C), although one implementation showed small deviations (<5e −7 mM). For code implementing the Parker population AIF, all implementations resulted in the same concentration-time curve as the published AIF function. However, for code contributions applying an optional time delay to the AIF, some differences were observed when the delay was not a multiple of the temporal resolution. For all five pharmacokinetic models, differences in the fitted parameters were observed between the implementations. For example, six implementations of the Tofts model were tested using data from the QIBA DRO, 43 revealing variation in the estimated K Trans and v e values (Figure 3D).

T A B L E 2
Overview of the tests developed for each category of functionality and the tolerances used (recoverable entries)

T 1 mapping | In-vivo brain data 44 : voxel data from one patient with mild stroke based on ROIs drawn in the white matter, deep gray matter and cerebrospinal fluid (n = 76) | R 1 : 0.14-0.99 s −1
T 1 mapping | In-vivo prostate data 45 : cases correspond to randomly selected voxels in the prostate (10 voxels from each of 5 patients with prostate cancer; n = 50) | R 1 : 0.40-2.78 s −1
Signal to concentration | In-vivo uterus data 46,47 : cases correspond to randomly selected signal intensity curves from voxels in uterus or aorta from one volunteer (n = 5) |
Parker AIF | The concentration time curve was computed from the AIF parameters given in Table 1 of the original publication 33 using a range of time resolutions, acquisition times and bolus arrival times (n = 20) | n.a.
Georgiou AIF | The concentration values from the supplementary material of the original publication were used. 29 A range of temporal resolutions was tested by interpolating the original time series (n = 7) | n.a.
McGrath AIF | The concentration time curve was computed from the AIF parameters given in Table 1 (model B) and equation 5 of the original publication 32 |

DISCUSSION AND CONCLUSIONS
In this paper, we described the aims, processes and current status of the OSIPI open-source code repository for DCE-and DSC-MRI processing.The repository constitutes both a resource for the perfusion community to use, and a platform for testing and developing new and existing code.

Open-source code collection
The code collection currently includes implementations of the most common steps in DCE- and DSC-MRI analysis pipelines. Most of this code was not publicly available before being contributed. For most categories, multiple implementations of the same functionality are present, which provides opportunities to investigate the impact of differences in software on the reproducibility of quantitative perfusion parameters. Potential DSC-MRI-relevant extensions include multi-echo acquisitions, 56,57 simultaneous spin- and gradient-echo acquisitions to perform vessel size imaging, 58,59 correction for AIF dispersion 60 and partial volume effects, 61 and model-based parameter estimation. 62 For DCE-MRI, possible extensions include analyses incorporating finite water exchange rates, 63 other T 1 measurement approaches (for example, saturation-recovery spoiled gradient echo 64 ) and patient-specific AIF measurement based on phase or complex signal. 45,65 A limiting factor was the availability of code in the Python language. Other programming languages, such as Matlab, Julia and C++, are also used in the perfusion community. In future, we may include code written in other languages by translating contributions to Python or by using function wrappers. Future calls for code may target specific areas of perfusion functionality, and the taskforce may approach specific individuals and groups following literature searches.

Testing framework
Unit tests were designed to compare the outputs of different implementations and to provide quality assurance with respect to scientific performance. An automated framework was used to ensure that testing reflects future updates to the code and any dependencies. This will allow researchers to re-use code written by others with greater confidence. Furthermore, it provides developers with a framework that can be used to validate new software and to compare its scientific performance with that of other implementations. While this work is primarily focused on open-source software, our testing framework can also be run locally by developers of closed-source or commercial software.
There are some limitations of the testing framework. First, ground-truth reference values are often difficult to define. For DROs, the ability of software to match the reference values depends on the algorithms used to generate the data, the simulated imaging protocol (for example, the temporal resolution) and other factors. For in-vivo data, the reference values depend on the code used to process the data and will be influenced by noise, artifacts, and the processing strategy. Thus, the testing framework, while designed to assure an acceptable level of quality and to compare the quantitative outputs of different code contributions, is not intended as a means to rank or recommend specific contributions. Indeed, there may not be a single implementation suitable for all applications and use cases. For example, in the case of pharmacokinetic model implementations, there were substantial differences in the contributed algorithms: some estimated tracer concentration via a simple discrete convolution of the AIF and impulse response function, while others used more exact approximations to the convolution integral; in one set of implementations, the AIF was parameterized so that concentrations could be calculated analytically. It is expected that the nature of the underlying algorithm will affect both the accuracy and the computational efficiency, dependent on the use case: for example, the simple convolution approach may be accurate and fast only if the impulse response function and the AIF are sampled with sufficient temporal resolution. Therefore, it is left to the user to select an implementation appropriate to their needs and to review the underlying methodology. The current testing framework should nevertheless aid future initiatives to standardize and harmonize perfusion processing.
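The dependence of the simple discrete-convolution approach on temporal resolution can be demonstrated directly: for an exponential AIF, its convolution with the Tofts impulse response has a closed form, so the rectangle-rule error at coarse and fine sampling can be measured. All AIF shapes and parameter values below are illustrative, chosen for this sketch only.

```python
# Measure the rectangle-rule convolution error against an analytic solution.
# For AIF(t) = A*exp(-a*t) and IRF(t) = Ktrans*exp(-b*t), the convolution is
# A*Ktrans*(exp(-a*t) - exp(-b*t)) / (b - a).
import numpy as np

A, tau = 5.0, 80.0              # toy exponential AIF: A * exp(-t/tau), mM and s
ktrans, ve = 0.25 / 60.0, 0.2   # Ktrans in 1/s (illustrative)
a, b = 1.0 / tau, ktrans / ve   # decay rates of AIF and impulse response

def ct_discrete(dt, t_end=300.0):
    """Tissue curve via rectangle-rule discrete convolution at sampling dt."""
    t = np.arange(0.0, t_end, dt)
    aif = A * np.exp(-a * t)
    irf = ktrans * np.exp(-b * t)
    return t, np.convolve(aif, irf)[: len(t)] * dt

def ct_exact(t):
    """Closed-form convolution for the exponential AIF."""
    return A * ktrans * (np.exp(-a * t) - np.exp(-b * t)) / (b - a)

for dt in (4.0, 0.25):  # coarse sampling vs interpolated-style fine sampling
    t, approx = ct_discrete(dt)
    err = np.max(np.abs(approx - ct_exact(t)))
    print(f"dt={dt:>5} s  max abs error = {err:.5f} mM")
```

The error shrinks as the sampling interval decreases, which is consistent with the interpolation step that was needed to improve the accuracy of some contributions during testing.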
Secondly, adding new code contributions to the current testing framework requires some manual intervention by taskforce members or the users themselves. In the future it would be beneficial to automate this process. Thirdly, speed and robustness (for example, to invalid inputs) were not assessed at this stage. Nor did we review the code line-by-line to detect errors (although we note that our testing framework did detect a small number of coding errors, which were corrected with permission from the contributing authors). Fourthly, the current test data do not cover parameter ranges relevant to all applications. For example, our extended Tofts model data have a maximum K trans value of 0.08 min −1 , which is below the values seen in many tumors. However, as a community-led initiative, OSIPI welcomes contributions of additional test data from members of the perfusion community. Finally, we focused on test cases where there is a reasonable expectation of obtaining a valid result. For example, when testing pharmacokinetic model implementations, we used test data that were generated and tested using the same model, that had a high temporal resolution and that yielded a single well-defined set of model parameters; it was not our aim to replicate the extensive literature on the validity and interpretation of such models as a function of tissue biology and experimental parameters.

Outlook
The OSIPI DCE-DSC code repository is an ongoing project that welcomes new members, code contributions and test data from the perfusion community. In future, we will extend the range of functionality within the repository and broaden the testing framework to cover additional categories and use cases.
A longer-term objective of the taskforce is to harmonize and integrate code contributions into a coherent code library that is validated, user-friendly and based on community-consensus methodology. The library should also dovetail with the OSIPI contrast-agent-based perfusion MRI lexicon (Taskforce 4.2), such that quantities, units, models and processes referenced in the library correspond precisely to consensus definitions. This will support standardized processing, transparent reporting and ease of replication. The development of such a library depends on the availability of funding, research software engineering expertise and perfusion community engagement. However, the repository and the code therein provide a foundation for future collaborative software development.
In conclusion, we have presented a community-led model for code sharing and testing. By facilitating the re-use of tested code and the benchmarking of new code, we expect that the OSIPI DCE-DSC code repository will be a valuable resource for researchers and developers. The repository should be of particular benefit to new researchers in the field, who will not need to begin coding from scratch. We hope that this will improve reproducibility, reduce duplicate development, and support the wider use of perfusion imaging endpoints in clinical trials.

F I G U R E 3
Example results from the testing framework. These are snapshots of figures presented on the test-results website: (A) Bland-Altman plot for CBF estimation showing the difference between output and reference CBF values. At the time of writing, one implementation was available. The gray dashed lines indicate the tolerances used for testing. (B) Bland-Altman plots for variable flip angle T 1 mapping tests using the QIBA T1 DRO data. These show the difference between output and reference R 1 values for four different non-linear implementations of T 1 estimation. The gray dashed lines indicate the tolerances used for testing. Each color represents a different implementation, indicated by a number in the legend. (C) Bland-Altman plot for the conversion of signal intensity to concentration, showing the difference between output and reference concentration values. The tolerances are not shown as they were outside the scale of the plot. (D) Categorical plot for the estimation of K Trans with the Tofts model. These show the difference between output K Trans values and the corresponding reference values for the three test cases with high SNR. Each test case corresponded to a different combination of K Trans and v e .