KomaMRI.jl: An open‐source framework for general MRI simulations with GPU acceleration

Purpose To develop an open‐source, high‐performance, easy‐to‐use, extensible, cross‐platform, and general MRI simulation framework (Koma). Methods Koma was developed using the Julia programming language. Like other MRI simulators, it solves the Bloch equations with CPU and GPU parallelization. The inputs are the scanner parameters, the phantom, and the pulse sequence that is Pulseq‐compatible. The raw data is stored in the ISMRMRD format. For the reconstruction, MRIReco.jl is used. A graphical user interface utilizing web technologies was also designed. Two types of experiments were performed: one to compare the quality of the results and the execution speed, and the second to compare its usability. Finally, the use of Koma in quantitative imaging was demonstrated by simulating Magnetic Resonance Fingerprinting (MRF) acquisitions. Results Koma was compared to two well‐known open‐source MRI simulators, JEMRIS and MRiLab. Highly accurate results (with mean absolute differences below 0.1% compared to JEMRIS) and better GPU performance than MRiLab were demonstrated. In an experiment with students, Koma was proved to be easy to use, eight times faster on personal computers than JEMRIS, and 65% of test subjects recommended it. The potential for designing acquisition and reconstruction techniques was also shown through the simulation of MRF acquisitions, with conclusions that agree with the literature. Conclusions Koma's speed and flexibility have the potential to make simulations more accessible for education and research. Koma is expected to be used for designing and testing novel pulse sequences before implementing them in the scanner with Pulseq files, and for creating synthetic data to train machine learning models.


INTRODUCTION
Numerical simulations are an important tool for analyzing and developing new acquisition and reconstruction methods in ✓ ✓ ✓ Mac OS coreMRI 15,18 ✓ ✓ ✓ Magnetic Resonance Imaging (MRI).Simulations allow us to isolate and study phenomena by removing unwanted effects, such as hardware imperfections, off-resonance, and others.Additionally, with the increasing use of Machine Learning models, simulation becomes even more relevant, because it can be used to generate synthetic data for training, 1,2 or to construct signal dictionaries to infer quantitative measurements from the acquired data. 3,4Moreover, simulations are an excellent tool for education and training, as hands-on experience is a great way to assimilate the theoretical and practical components of MRI. 5,6,7RI simulators can be application-specific or general.Application-specific simulators are efficient computationally and only consider a few relevant effects (e.g.POSSUM, 8,9 CAMINO, 10 and others 11,12,13 ).A common simplification is that the acquired signal is equal to the 2D/3D Fourier transform of the image, not taking into account relaxation during the acquisition, and other elements of MRI physics.On the other hand, general simulators solve the Bloch equations making them computationally intensive but usable in a wider range of applications.
Currently one of the most used general MRI simulators is JEMRIS. 14This open-source simulator considers many properties of interest in MRI, such as 0 , 1 , 2 , * 2 , Δ 0 , movement, etc. JEMRIS only uses CPU multi-threading.Other alternatives are two closed-source simulators: MRISIMUL 15 and BlochSolver. 16,17Both are accelerated through the use of GPUs.Originally, MRISIMUL did not have a Graphical User Interface (GUI), but recently a cloud-based implementation called coreMRI has become available. 18In 2017 a new open-source and GPU-accelerated simulator was introduced: MRiLab. 19The main drawbacks of MRiLab are that it does not use self-contained sequence files like JEMRIS, and that it was not designed to be extensible.
All of the previously mentioned simulators are written in C++ for speed.In practice, this may raise the bar for researchers in order to implement or modify these simulators.
The current open-source alternatives use MATLAB-based GUIs, resulting in non-intuitive interfaces.Furthermore, they do not support all Operating Systems (Table 1).
We believe that an ideal simulator should be general, fast, easy to use, extensible, open-source, and cross-platform.In this work, we developed an MRI simulation framework built from the ground up to satisfy these requirements.
To achieve these goals we made four important design decisions: programming language, compatibility with accepted standards, interface, and simulation method.
We chose the Julia programming language 20 because its syntax is similar to MATLAB (widely used by the MRI community), its excellent GPU support, 21,22 and its speed is comparable to C/C++ (Julia is a compiled language).This has been shown to be the case in other MRI applications such as image reconstruction with MRIReco.jl, 23where the authors achieved speeds on par with state-of-the-art toolboxes. 24In contrast to many other languages Julia can select the definition of a function to call at runtime via multiple dispatch.This is perhaps its most powerful feature of Julia.This allowed us to use syntax that more closely follows mathematical notation.
The inputs to our simulation framework are the scanner parameters, the phantom, and the pulse sequence.For the latter we offer the possibility to program it directly in the code or alternatively to read it from a file in the standard Pulseq format. 25,26,27The output raw data is stored in the standard ISMRMRD format. 28For reconstruction, our framework offers the possibility to use MRIReco.jl, 23any other reconstruction application that can read ISMRMRD, or direct programming of the code.
Using web technologies, we designed a GUI to improve accessibility for non-programmers and to facilitate the exploration of data and parameter tuning in a clear manner.This GUI also allows reading or writing the intermediate results.
We chose not to use an Ordinary Differential (ODE) solver, like DifferentialEquations.jl, but to handcraft an MRIspecific solver.This enabled us to use efficient solutions to the Bloch equations, and to implement an adaptive time-stepping based on information already available in the sequence.Both contributing to the simulation speed and accuracy.
We called our simulator "Koma", inspired by the Japanese word for spinning top, as its physics resemble MRI's.

METHODS
In this section we start by describing in detail the simulation framework and its implementation (2.1), we then describe the experiments we did for comparison (2.2), and finally, we showcase an application in quantitative imaging (2.3).

Overview
KomaMRI.jl has three input objects that describe the scanner, the phantom, and the sequence (Figure 1): FIGURE 1 The simulation pipeline is divided in two steps: simulation, and reconstruction.Each arrow represents a file, and the pipeline can be initiated from any stage of the data workflow.Note that we are able to both read and write ISMRMRD files from the GUI.
• Scanner: Description of the hardware specifications such as 0 , maximum gradient max and slew rate max .
Other properties, such as inhomogeneity of the field Δ 0 ( ) and coil sensitivity should also be defined here (these are not available yet in the public version).
We used the JLD2.jlpackage to implement our own HDF5compatible file formats, .scanner,.phantomand .seqk, to load these objects.For the sequence, the simulator also accepts the newest versions of the Pulseq format (versions 1.2-1.4).This format gives our simulator more compatibility and convenience since it is also read by some real scanners.
The output of the simulator is written in the ISMRM Raw Data format 28 .This allows to test the reconstruction with different sources of data, simulated or real, and also allows to link the simulator with an external reconstruction.

Physical and Mathematical Background
Koma simulates the magnetization of each spin by solving the Bloch equations in the rotating frame with the gyromagnetic ratio, = , , T the magnetization vector, and the effective magnetic field.0 is the proton density, 1 and 2 are the relaxation times, and Δ is the off-resonance, for each position.
The solution of Equation ( 1) for a single spin is independent of the state of the other spins in the system, a key feature that enables parallelization. 15ur simulator uses a first-order splitting method 29 to simplify the solution of Equation ( 1).This reflects mathematically the intuition of separating the Bloch equations in a two-step process, rotation and relaxation, for each time step Δ = +1 − (Figure 2).The rotation is described by with initial condition (1) ( ) = ( ), and the relaxation is described by with (2) ( ) = (1) ( +1 ).Then, the magnetization at the end of the time step is ( +1 ) = (2) ( +1 ).Furthermore, we define two regimes in the pulse sequence: excitation and precession.During the latter, the excitation fields are nulled: = = 0 in Equation (3).In the precession regime, the operator splitting method gives an exact solution, whereas during the excitation regime the method has Δ 3 convergence. 30rom this point forward, we will drop the vectorial notation for and 1 , and we will use = + i and 1 = 1, + i 1, to describe the simplifications made in each regime.
The rotations during the excitation regime are stored in their spin-domain or SU(2) representation: characterized by the Cayley-Klein complex parameters or Spinors for short ( , ). 31 Spinors can represent any 3D rotation as To solve Equation (3) the parameters for the Spinors are = 1 ∕ ‖ ‖, = ∕ ‖ ‖, and is the phase accumulated due to , the effective magnetic field.Then, the application of a Spinor rotation to a magnetization element is described by the operation For the precession regime, all the rotations are with respect to , and therefore they can be described with a complex exponential applied to the transverse magnetization where is defined in Equation ( 8).Finally, to solve the relaxation step described in Equation ( 4) the magnetization is updated by The presented model solves the Bloch equations for a single isochromat, and by itself cannot simulate all the physical properties of interest in MRI.Other simulators implement more general equations like Bloch-Torrey, 32,14 to simulate diffusion, and Bloch-McConnell, 33,19,34 to simulate chemical exchange 35,36 , magnetization transfer 37,38 and spin-lock effects. 39,40Besides non-rigid motion, we assume constant magnetic properties over time, so dynamic contrastenhanced 41 would require an extension of the model.
Nevertheless, incorporating other realistic effects like * 2 and diffusion could be easily added by increasing the number of spins.On the other hand, the Bloch-McConnell equations could also be implemented accurately with the operator splitting method. 29While Koma is not as feature-complete as other well-established simulators, we focused on improving its speed, extensibility, and ease of use, and will keep adding more features in the future.

Simulation blocks, Regime Switching, and Sequence-Aware Time Stepping
To reduce the memory usage of our simulator, we subdivided time into Nblocks (Figure 3).Koma classifies each block in either the excitation regime or the precession regime before the simulation.
For precession blocks, we can improve the accuracy of the simulations by using the integral representation of Equation ( 10), obtained by applying the limit as Δ → 0 of iterated applications of Equation ( 10), giving a phase of Assuming that during the -th simulation block ( ∈ , +1 ) the gradients ( ) are piece-wise linear functions, and ( ) and Δ ( ) are approximately constant, then, if we use the trapezoidal rule to obtain the value of this integral, we will obtain an exact result by sampling just the vertices of ( ), greatly reducing the number of points required by the simulation.We will only need intermediate points in the case of motion and for recording the sampling points as required by the Analog to Digital Converter (ADC).The user can control the time between intermediate gradient samples with the parameter Δt (Figure 3).
We can do something similar with 1 ( ) in the excitation regime.If we assume 1 ( ) is a piece-wise constant function (or concatenation of hard pulses), then Equation (9) will give an exact solution to Equation (3). 42The parameter Δt_rf manages the time between RF samples (Figure 3).Thus, Koma uses the rationale mentioned above to: (1) call different methods based on the regime of each block, while also (2) obtaining a variable time stepping schedule that adapts to the sequence needs.We named the latter sequence-aware time stepping (Figure 3).While this concept is not new per se, 15,19,43 in Koma we directly calculate the required simulation points from Pulseq files or a designed Sequence and then provide a convenient DiscreteSequence type to the user to simulate with our  default Bloch<:SimulationMethod or to use in their custom simulation method.We comment further into this in subsection 2.1.7.

GPU/CPU Parallelization
One key advantage of using Julia is its support for CPU parallelization using macros like Threads.@threads before a for loop, or the package ThreadsX.jl.Using these resources, we increased the simulation speed by separating the Bloch calculations into Nthreads.This separation is possible as all magnetization vectors are independent of one another.To ensure thread safety, we stored the acquired signals per thread in different matrices to add them later into a signal matrix sig[ ] (Figure 3).Julia also has native GPU support using the package CUDA.jl.This package supports operations using CuArray types which run as GPU kernels, but direct GPU kernel programming is also possible.To transfer variables between CPU and GPU memory, we used the packages Adapt.jl and Functors.jl.These packages let us transparently transfer our data types from CPU to GPU without losing the type abstractions.Then, the transfer looks like obj = obj |> gpu.Our data types, Phantom, DiscreteSequence, and Mag<:SpinStateRepresentation were used in this way, to then perform the simulation inside the functions run_spin_excitation! and run_spin_precession!.
It was important to ensure the type stability of our simulation functions to give enough information to the compiler to infer the concrete type of every variable, enabling high performance.Moreover, we had special care to perform in-place operations and not generate unnecessary variable copies using the @view macro in the functions run_spin_excitation_parallel!, run_spin_precession_parallel!, and run_sim_iter!.Finally, we used NVIDIA Nsight Systems to profile GPU performance with the NVTX.@range and CUDA.@profile macros.

Reconstruction
For the image reconstruction, we used MRIReco.jl, 23a reconstruction framework written in Julia with comparable performance to BART, 24 a state-of-the-art reconstructor.Coincidentally, as of version 2.9, JEMRIS uses BART as reconstructor.The obtained image can be saved in multiple formats such as .mat,.dcm,and others, as shown in Figure 1.
The reconstructor can also load directly the raw data from an ISMRMRD file, skipping the simulation.

Graphical User Interface
For the GUI we used Blink.jl, a framework to develop applications using web technologies.This package is a wrapper of Electron, and can serve HTML content in a local window.The .The menu at the left of the GUI tries to mimic the pipeline of Figure 1, where the blue button calls Koma and the green button calls MRIReco.jl.

Extensibility
As we mentioned in the Introduction, in Julia, functions use different methods based on the input types via multiple dispatch.We used this to specialize the simulation functions for a given sim_method<:SimulationMethod specified in simParams.For a given simulation method, the function initialize_spin_state outputs a variable Xt<:SpinStateRepresentation that is passed through the simulation (Figure 3).For the default simulation method Bloch, the spin state is of type Mag, but can be extended to a custom representation, like for example EPGs 44 or others.Then, the functions run_spin_excitation! and run_spin_precession! can be described externally for custom types sim_method and Xt, extending Koma's functionalities without the need of modifying the source code and taking advantage of all of Koma's features.

Experiments
To test Koma, we performed two kinds of experiments: one to compare the quality of the results and the execution speed, and the second one to compare its usability.

Simulation Accuracy and Speed
To test the accuracy of our simulator, we compared Koma with the latest version of JEMRIS (v2.9), which has been compared with real MRI acquisitions. 45.
We did 2D experiments with different number of spins to look at the scalability of the simulations.The experiments were as follows: (d) EPI acquisition of the same brain of (c) without offresonance but with motion.We applied a displacement field in the -direction of ( , ) = with = 0.1 m∕s.
(e) Spiral acquisition of the same brain of (c) without offresonance.
The EPI was a single-shot sequence with TE = 100 ms.The spiral acquisition was a single-shot sequence with TE = 0.1 ms.Both were Gradient Echo sequences with the same FOV = 230 × 230 mm and spatial resolution/voxel size Δ = 2.3 mm, while the resolution of the phantoms were Δ obj = 1 mm.All the images were reconstructed in a 100×100 matrix with FOV = 230 × 230 mm.Both sequences used hard RF pulses.
For the reconstruction of the spiral data for both simulators we used MRIReco.jl, and not BART for JEMRIS since it uses a different implementation of the NUFFT algorithm, and we wanted to keep the image comparison as fair as possible.
To compare the simulation accuracy, for each experiment the signals were normalized by JEMRIS' signal maximum, and then we calculated the Mean Absolute Error (MAE) between them, MAE ( The errors are shown as a percentage of JEMRIS' signal maximum ( -space center).
All these examples were run in a computer with an 11th Gen Intel Core i7-1165G7 @ 2.80GHz×8, with four physical cores, 16 GB RAM, a GPU GTX 1650 Ti (4 GiB of memory), and an eGPU RTX 2080 Ti (11 GiB of memory).For these examples, we only reported times for the faster GPU RTX 2080 Ti.
On the other hand, we compared the speed of our simulations against MRiLab, an open-source GPU-accelerated MRI simulator.For this, we replicated MRiLab's gradient echo multi-shot spiral sequence "PSD_GRE3DSpiral" (TE = 50 ms, TR = 10 s, and Δ = 2.5 mm), which contains a slice-selective sinc RF pulse with a slice thickness of 6 mm, in conjunction with an 8-shot spiral acquisition.We selected this sequence to stress test both simulators, as it has arbitrary waveforms for both RF and gradients pulses.We used the standard resolution (Δ obj = 2 mm) 3D brain phantom present in MRiLab.We followed their simulation procedure and only simulated in a slab of spins = 20, 630 contained within the slice selection.We paid special attention to matching the number of time points and the spiral waveforms 47 to get comparable results.We ran this simulation in the CPU, and both GPUs, for both simulators 20 times for each device and calculated the mean and standard deviation.
We used the same computer as in the accuracy experiments, but we ran the test with both GPUs.We also simulated a similar sequence in JEMRIS to have as a reference.

User Experience
To compare the ease of use for first-time users, we designed a pilot experience with students of an Imaging course in Engineering, where they learned some fundamentals of MRI.The experience consisted in identifying the artifacts generated by the presence of different degrees of off-resonance and motion (like examples (c) and (d) of Figure 5).They were to compare the artifacts of Gradient Echo and Spin Echo EPI acquisitions.
There were 19 students in the class.We divided them in two groups.The first half of the students programmed the Gradient Echo sequence and the other half the Spin Echo sequence.Each student performed 6 experiments per simulator with brain phantoms ( spins = 5, 890) with different levels of off-resonance and motion.
For the first part of the assignment, which tested from the installation to the first simulation, they used JEMRIS v2.8.3, MRiLab 1.3, and Koma v0.3.8.For the second part, they used JEMRIS and Koma to generate the EPI sequences and different phantoms separately.We made tutorials to help them install the simulators and gave them examples of how to set up a simulation in all software packages.They used the same phantoms and sequences.For the sequence, they had to calculate the timings and gradients strengths with the information taught during the course.
To gather information about how long it took them to perform each task and their perceived level of difficulty, they filled out a form (available in Koma's GitHub) and returned it with their reports.The difficulty level of each task was rated on a Likert scale from 1 to 5, with 1 being hard and 5 being easy.
The students ran the simulations on their personal computers.

Magnetic Resonance Fingerprinting
We used our simulator to showcase its potential, simulating a quantitative MRF acquisition. 3he sequence was a radial bSSFP sequence ( spokes = 158) to acquire the signal fingerprints for each pixel.The MRF sequence had an initial inversion pulse with TI of 50 ms.For the first 500 TRs of the sequence, we used a Perlin-noise 48 flip angle pattern, and for the last 500 TRs, we used a noisy sinusoid flip angle pattern between 0 deg and 80 deg similar to Ma. 3 The TRs were randomly distributed between 14.5 and 18.0 ms, and a constant TE of 5 ms was used.A dictionary was generated to do the fingerprint matching with the following ranges of 1 and 2 : 1 (300 to 2, 500 ms, every 10 ms) and 2 (40 to 350 ms, every 4 ms).
The Phantom object was a 2D axial brain constructed using the BrainWeb database 46 with 6, 506 spins.Two variations of the sequence were tested, by rotating the spokes is probably explained by CPU-to-GPU memory transfers as the external GPU could be bottle-necked by the Thunderbolt bandwidth capacity.We put most of our attention on the GPU performance, specifically to reduce the number of memory transfers to the GPU by profiling with NVIDIA Nsight tools, which are easily accessed within Julia.
Regarding the time taken to install each simulator, most of the students were able to install Koma (mean 13.2 min), JEMRIS (mean 33.8 min), and MRiLab (mean 16.9 min) in less than 40 minutes.Their first simulation took them more time in JEMRIS (mean 19 min) and MRiLab (mean 13.9 min) than in Koma (mean 5.7 min).31% of the students could not simulate on MRiLab (6 students using Mac OS), so we decided to only use Koma and JEMRIS for the rest of the activities.
To program the pulse sequence, students found that JEM-RIS's GUI was slightly better (mean 3.85/5) than Koma's code-based pulse programming (mean 3.69/5).This makes sense since the students' self-reported computational expertise was less than expected (Q1/median/Q3=1.6/2.2/2.6,where 3 meant "I can implement my ideas easily in one programming language").This feedback helped us improve the pulse sequence programming by implementing our Pulseq file reader, which enables programming the sequence in JEM-RIS's GUI.
Nevertheless, the students also commented that it was unintuitive that the gradients' strengths in the JEMRIS's GUI were not in mT∕m but scaled by , so JEMRIS = rad∕us∕mT ⋅ mT∕m ≈ 0.267538 ⋅ mT∕m .This caused many failed simulations, which prompted us to do an additional tutorial session.
They also readily modify the phantoms with different levels of off-resonance and motion with JEMRIS (mean 4.27/5) and with Koma (mean 4.24/5).
Finally, their reported median simulation speeds were 8.4 times faster with Koma than with JEMRIS (Figure 6), and 65% ended up recommending Koma over JEMRIS.

Magnetic Resonance Fingerprinting
The MRF simulation took approximately 20 seconds to finish.Results of using different sequences and reconstruction algorithms for our MRF simulated acquisition are shown in Figure 7.The best overall pipeline was to rotate the spokes by the tiny golden angle and then reconstruct using LRTV, which had the lowest MAE's of [WM-T1, GM-T1, WM-T2, GM-T2]= [55, 26, 15, 13] ms.Thus, we reproduced results from the state-of-the-art in MRF acquisition just by using simulations. 49,51his showed the flexibility of our simulator to test novel sequences.Ideas proven in simulation could potentially be directly executed on a Pulseq-compatible scanner. 25

DISCUSSION
Efficiency.Similarly to other MRI simulators, 43,15,19 Koma's variable time-stepping accelerates the simulations, nevertheless, this is not the only contributing factor to efficiency.Another factor is that Koma chooses a different simulation method depending on the sequence regime (excitation or precession).For RF blocks, the method used rotates the magnetization assuming a constant effective field for each time step.For the rest of the sequence (gradient-only or free precession blocks) we assumed a linear effective field for each time step and used trapezoidal integration to estimate the accumulated phase.These methods offer accurate solutions given that we chose the time points carefully, and also that we approximately satisfy the assumptions described in Section 2.1.3,which is generally the case in MRI (even with motion, see Figure 5).Thus, prior knowledge of the MRI physics and the expected characteristics of 1 and ( ) provides insight to choose a less computationally intensive method needed for an accurate simulation.This was seen in the students experience (3.2) where they used a previous version of the software that had uniform time steps.Even then, Koma's MRI-specific solver outperformed the CVODE general ODE solver.This result is not explained by the use of GPUs, as most of the students did not have one in their personal computers.
In summary, the speed acceleration in our simulator comes from the time-stepping procedure, from the switching

Reconstructions
Full-Dict LRTV-5 Golden angle between regimes, and from the CPU parallelization and GPU implementation.But there is still room for growth, we can still improve our CPU performance (as shown in Table 2), and we can do the same for the GPU implementation.Currently, we are just performing the operations with CuArray types.A proper implementation of some of the functions as GPU kernels, which is also possible in Julia, will potentially further accelerate our simulations.For example, the method to simulate RF-blocks with long soft RF pulses (like adiabatic pulses) is not as efficient as one would expect.The code has already been set up for implementing GPU kernels by using functions that perform in-place operations without returning a result.Open Source and Community Standards.Our simulator is open-source, and it is already available on GitHub (github.com/cncastillo/KomaMRI.jl).Furthermore, we seek contributions from the community.For this, we are currently developing documentation with examples using and extending Koma's functionalities.We made and continue to make an effort to make Koma as modular as possible to facilitate its modifications.
We used community-driven and public file formats to increase reproducibility.Koma writes and reads ISMRM raw data, making it compatible with other reconstruction software.Koma's GUI can display and reconstruct raw data acquired on an actual scanner using MRIReco.jl.
We can also read the newest versions of the Pulseq standard, enabling the generation of the sequence directly in Koma, in MATLAB's Pulseq toolbox, or by using JEMRIS' GUI.If these standards change, the file readers could easily be updated.Our simulator is one of the first to receive Pulseq files as the sequence definition.This will allow us to customize pulse sequences and then test them in real scanners. 53he use of MRIReco.jlbrings state-of-the-art reconstructions to Koma.This reconstructor can do direct or regularized iterative reconstructions (FISTA, 54 ADMM, 55 and others).This software also brings flexible and easily customizable reconstructions (see MRIReco.jl documentation).
Similarly to MRIReco.jl, we used a dictionary (Dict) to store all the simulation parameters, which can be easily updated to add new parameters.All these parameters are saved to the raw data for later inspection.
Maintainability and Reproducibility.While performing the experiments for this work, we experienced problems running JEMRIS and MRiLab in modern systems.For JEMRIS, we could not perform multi-threaded simulations out of the box, as some library versions are not supported anymore.We had to compile the package and change the source code to fix some problems with up-to-date versions of CVODE.On the other hand, for MRiLab, the GPU simulations had a similar problem, as it assumed that the system had an old version of CUDA.We had to fix the source code and makefiles to compile for modern versions of CUDA and the MEX libraries.Despite this, we were not able to recover all the functionalities, like the ability to export the signal as ISMRMRD, as it uses a deprecated version of the library.
We believe this experience perfectly represents a common problem for the MR community: maintainability and reproducibility of the software we produce.While not perfect, we believe that Julia helps in minimizing many of these problems.Its modular approach incentives the separation of packages with minimal functionality, which are easier to maintain.Furthermore, all Julia packages are associated with a GitHub page, and to be registered, each new package version is required to maintain or improve the Code Coverage of the tests, and pass the Continuous Integration which assesses the package in multiple versions of Julia and operating systems: Windows, Linux, and Mac.Moreover, even if a package is no longer maintained by the creator, if shared, the Manifest.toml of a package contains all the specific versions of each module, and the environment can be replicated by using the command Pkg.activate("."),enablingreproducibility.We shared not only the Manifest.toml,but all the code used to replicate the simulations presented in this work in Koma's GitHub.
Julia not only brings high-performance and easy-to-read code but also forces package developers to produce professional software.Decreasing the technical debt passed to new researchers.
Limitations.The current implementation of Koma suffers from the same limitations as other Bloch simulators, 19 which means that some intra-voxel effects, like * 2 and diffusion, require many spins per voxel, which in turn affects the simulation speed.
An important issue with Bloch simulators is the potential aliasing or spurious echoes when simulating gradient spoilers.They arise due to the finite separation between spins (discrete delta functions) which produces overlapping or aliasing in the Fourier domain.The simplest solution, as before, is to increase the number of spins at the cost of extra computational load.To solve this problem, alternative intermediate solutions should be explored.One of them is to use a different model to describe the spins' state as the one used by the hybrid Bloch-EPG. 56hile it is straightforward to implement, we do not yet have multiple coils in our simulator.The lack of coils precludes its ability to simulate conditions at ultra-high fields where coil combination is an issue, 57 or in highly accelerated sequences where the coil noise characteristics are essential, like in wave-CAIPI. 58Also, our simulator is not currently considering some effects, including eddy currents, concomitant gradients, temperature changes, or the drift on the k-space center produced by long readouts. 59e designed our simulator to run reasonably fast on a student notebook.More testing is required for more complex scenarios in more powerful servers.Our software has not yet been tested in a multi-GPU system like in Xanthis et al., 60 and more work is needed to take advantage of multiple GPUs.

CONCLUSIONS
In this work, we presented a new general MRI simulator programmed in Julia.This simulator is fast, easy to use, extensible, open-source, and cross-platform.These characteristics were achieved by choosing the appropriate technologies to write easy-to-understand and fast code with a flexible GUI.Furthermore, our simulation method exploits MRI physics and information about the sequence to reduce the simulation times.
We compared the accuracy of our simulations against JEMRIS, in which we showed high accuracy with MAEs below 0.1%.We also compared the performance against MRi-Lab, showing slower CPU times but GPU performance as much as 6 times faster using an RTX 2080 Ti eGPU, and 2.6 times faster using a GTX 1650 Ti.
We also tested the ease of use of Koma with students without previous knowledge of MRI.Their feedback helped us improve Koma by adding compatibility with communitydriven standards like Pulseq and the ability to load JEMRIS phantoms.Thus, Koma can use the same sequences and phantoms utilized in JEMRIS.We can also receive simulations from JEMRIS, or scanner-generated raw data, and reconstruct them in our GUI using the exported ISMRMRD file.Moreover, we can export our raw data to the same format and reconstruct the images externally.
Finally, we showcase the potential to quickly test novel pulse sequences for quantitative MRI before implementing them in the scanner by simulating different MRF acquisitions.

FIGURE 2
FIGURE 2 Solution of the Bloch equations for one time step can be described by (1) a rotation and (2) a relaxation step.

*
These functions can be extended by changing the type of sim_method and/or Xt.The default types are Bloch<:SimulationMethod and Mag<:SpinStateRepresentation respectively.

FIGURE 3
FIGURE 3The sequence seq is discretized after calculating the required time points in the wrapper function simulate.The time points are then divided into Nblocks to reduce the amount of memory used.The phantom obj is divided into Nthreads, and Koma will use either run_spin_excitation! or run_spin_precession! depending on the regime.If an ADC object is present, the simulator will add the signal contributions of each thread to construct the acquired signal sig [ ] .All the parameters: Nthreads, Nblocks, Δt_rf, and Δt, are passed through a dictionary called simParams as an optional parameter of the simulate function.

FIGURE 4
FIGURE 4 Koma's GUI: (a) Sequence (interleaved spiral), (b) -space, (c) Phantom, (d) Raw Data, (e) Image, and (f) -space data (Fourier transform of (e)).The menu at the left of the GUI tries to mimic the pipeline of Figure1, where the blue button calls Koma and the green button calls MRIReco.jl.

5 FIGURE 6
FIGURE 6The students' simulation times with their PCs.Each dot represents one of 6 experiments they needed to make.We only show the results for the 11 students that successfully simulated all the examples.

FIGURE 7
FIGURE 7Comparison of different MRF acquisitions and reconstruction (GT) Ground-truth, (Full-Dict) reconstruction using the full dictionary, and (LRTV-5) Low-rank with Total Variation reconstruction with rank of 5 for the dictionary.

TABLE 1
Overview of general MRI simulators.