Efficient Qubit Routing for a Globally Connected Trapped Ion Quantum Computer

The cost of enabling connectivity in Noisy-Intermediate-Scale-Quantum devices is an important factor in determining computational power. We have created a qubit routing algorithm which enables efficient global connectivity in a particular trapped ion quantum computing architecture. The routing algorithm was characterized by comparison against both a strict lower bound, and a positional swap based routing algorithm. We propose an error model which can be used to estimate the achievable circuit depth and quantum volume of the device as a function of experimental parameters. We use a new metric based on quantum volume, but with native two qubit gates, to assess the cost of connectivity relative to the upper bound of free, all to all connectivity. The metric was also used to assess a square grid superconducting device. We compare these two architectures and find that for the shuttling parameters used, the trapped ion design has a substantially lower cost associated with connectivity.


Introduction
Quantum computers are expected to solve classically intractable problems, such as accurately simulating the dynamics of large molecules [1,2], which would greatly impact both material science and the pharmaceutical industry. In the finance industry even minor advantages can lead to significant returns [3]. Phase estimation [1] (for quantum chemistry) and Shor's algorithm [4] (for breaking RSA encryption) are two algorithms which promise an exponential speed up [5,6], but they both require a fault tolerant device for useful applications. Error correction techniques which facilitate a fault tolerant device, such as the surface code [7,8], have a very large physical to logical qubit overhead, requiring physical qubit numbers in the range of $10^5$ to $10^8$. In recent years there has been growing interest and algorithmic development for Noisy Intermediate Scale Quantum (NISQ) computers [9] which do not require fault tolerance. The realization of quantum supremacy [10] represents a major milestone for such systems. Hybrid quantum algorithms such as the Variational Quantum Eigensolver [11,12] may provide an exponential speed up as compared to their classical counterparts. Assessing the capability of NISQ devices to run quantum algorithms is quite distinct from that of full-scale error-corrected quantum computers. For NISQ devices, this typically involves quantifying the achievable circuit depth of the device, which represents the number of sequential gate operations that can be executed within the available coherence time. The analysis can be done in reverse to instead tailor near term algorithms to a specific device.
Superconducting circuits [13] and trapped ion devices [14] are two of the leading quantum computing platforms. In particular, one architecture which offers a scalable approach to trapped ion quantum computing, involves a large connected ion trap array across which individual ions are physically shuttled [15,16]. The architecture provides a solution to scale to very large qubit numbers, which will be a requirement to run many important algorithms.
To utilize this architecture it is necessary to have a routing algorithm which can move large numbers of ions across the square grid array in parallel, and in an efficient manner. In this manuscript, we provide an ion routing algorithm which can enable arbitrary global connectivity, and we quantify its efficiency relative to a lower bound. When considering the achievable circuit depth of a NISQ device, one must include factors such as connectivity, gate fidelity, and the coherence time of the qubit. We provide an error model which can be used to estimate the achievable circuit depth of this quantum computing design as a function of experimental parameters.
Quantum computing architectures vary greatly, from the underlying system which represents the qubit and the available quantum gate set to the means by which qubit connectivity is enabled. For superconducting architectures, qubits are stationary and connectivity is enabled through sequences of swap gates via nearest neighbor interactions, which will incur a high gate overhead for globally connected algorithms. For square grids with nearest neighbor connectivity, the best known method for globally connected algorithms on N qubits scales with an overhead of $\Theta(N^{0.5})$ [17], although it is only logarithmic if non-planar architectures are considered [18,19] and optimisation of this swapping procedure is necessary to maximize performance [20,21]. The characteristics of the desired algorithm will dictate the degree to which a device with inherent all to all connectivity outperforms a device which has a cost associated with enabling connectivity [22]. The way in which connectivity is enabled varies greatly even within the trapped ion architectures. Architectures with stationary ions confined to a linear string benefit from global connectivity and multi-qubit gates [23,24], however as the number of ions co-existing in a single trap increases, it becomes progressively challenging to maintain key device specifications, such as gate fidelity. Furthermore, as the ion number N increases, gate times increase as $\sqrt{N}$, and the increasing requirement on the number of motional modes will eventually lead to frequency crowding [25]. There are two main approaches to enable connectivity between modules: one involves the use of photonic interconnects [26,27], while the other, as described in the design analysed in this manuscript, utilises electric fields to connect adjacent modules. The connected modules form a continuous 2D plane, giving rise to connection speeds between modules orders of magnitude higher as compared to photonic interconnects.
Figure 1: (A) A depiction of a single X-Junction, which is repeated to form a grid to which the ions are restricted, with zones dedicated to specific tasks. (B) A 3D representation generated from the simulation tool, where the green grid represents the X-Junctions, to which the ions (red spheres) are restricted, and the blue squares represent gate zones. Arrows represent the lane priority of the routing algorithm. (C) A 3D representation generated from the simulation tool to demonstrate the decongestion of X-Junction centres. Ions assigned to interior gate zones (blue square labelled D) have the closest X-Junction centre (labelled B) as their destination (one space off the centre because it is an area of lower trap stability (labelled A and C)). The ion in square A has been assigned to the local gate zone and it will travel back and forth between positions A and C directly, by ignoring the lane system priority, to decongest for ions still travelling to their destination.
The quantum computing architecture investigated in this manuscript consists of an ion trap array on a microchip, giving rise to a 2D grid to which all ions are restricted. The ions (where each ion represents a physical qubit) do not have to be stationary and are instead able to traverse the grid via shuttling operations. Entangling operations are performed by bringing the two (or more) desired ions to the same region of space (a gate zone). The smallest repeated unit of the architecture is the X-Junction (see Figure 1A). Logic gates may be performed by applying static voltages to a microchip in the presence of globally applied microwave fields and a local magnetic field gradient [28]. An alternative approach instead makes use of pairs of laser beams to execute quantum gates [29], but this may be more challenging to implement for large numbers of qubits. This electronic microwave-based architecture has a clear path towards scaling to large qubit numbers [16], and constitutes a practical blueprint for a quantum computer capable of solving some of the hardest problems, such as breaking RSA encryption. Furthermore, arbitrary two qubit connectivity can be enabled in near term devices relying only on ion shuttling operations (which can have a state fidelity comparable to stationary trapped ions [30]), without sequences of swap gates, as may be required in other architectures. To run an algorithm on a quantum computer based on this design, one first needs a routing algorithm which efficiently enables arbitrary connectivity between the ions in the square grid device, which is the main challenge addressed in this manuscript. Finding the optimum instruction set for each individual ion in real time is intractable and so we have solved the problem in a heuristic manner. The solution is motivated by one-way traffic flow with additional rule sets to deal with junction centres more efficiently. 
We quantify the efficiency of our approach relative to an unattainable lower bound and investigate its flexibility with regards to device shape, and ion density. We use these results, in combination with an error model we propose, to investigate the achievable depth and quantum volume for this design as a function of experimental parameters. Quantum volume (QV) is a conceived metric for quantum computational power designed to enable sincere comparison between architectures [31,32], and we will discuss it in more detail in section 3.2.
The remainder of this manuscript is organized as follows: in section 2 we discuss the problem specification and introduce an efficient qubit routing algorithm. In section 3 we quantify the efficiency and versatility of our routing algorithm and present results on the achievable depth and quantum volume as a function of experimental parameters.

Problem specification and routing algorithm
In the design being investigated here, ions (qubits) are restricted to a square grid (see Figure 1B), which consists of an array of repeated X-Junctions (see Figure 1A), each containing a single gate zone. Ions must first be shuttled (physically moved) into the gate zones for gates to be performed. The X-Junctions have a defined spatial resolution, which arises from the fixed number of electrodes on each arm, but ions may be moved continuously. The gate zones enable both single and two qubit gates. To perform a quantum algorithm on this device it must first be decomposed into the native gate set, which can be optimized [33]. A decomposed quantum algorithm is defined by multiple rounds of gates. Ideally all the required gates of an individual round will be applied in parallel, however the qubit number, gate density of the algorithm, and the number of gate zones will dictate the gate round overhead. In this architecture, each gate round is further broken into two parts: a routing sequence, where ions are shuttled into gate zones, followed by the application of gates. We use the terminology "shuttling" to refer to the act of moving ions in the device, and "routing" to refer to the higher order logic of the shuttling. In this design, gates cannot be applied concurrently with shuttling. When the required number of gates in an individual round exceeds the number of available gate zones it is necessary to have multiple rounds of shuttling and gates, e.g. a gate round overhead of 2 would imply the need for: shuttle, apply gates, shuttle, apply gates. The shuttling round, which enables the connectivity, is the focus of this manuscript. When designing the routing algorithm, we optimized for the total time taken to enable global connectivity. Naive routing algorithms would not converge on a solution, as ions with opposite travelling directions meet and cause permanent blockages.
Positional swaps between ions have been demonstrated experimentally [34] and they would greatly simplify the required routing procedure, however they have been shown to incur a relatively large time and fidelity cost and for this reason we sought a solution that does not use swaps. We compare routing with and without swap operations in section 3.
To solve the problem of enabling arbitrary connectivity for this quantum computer design, we have created a simulation tool which represents the device as a digitized square grid where each position may either contain an ion or not. The ions are distributed across the grid and a quantum circuit (a list of required two-qubit gates, i.e., ions that must be connected) is provided as input. When benchmarking the device, a randomly generated, globally connected circuit was used. In order to assign ion (qubit) pairs to gate zones, we employ a greedy approach, assigning each pair to the nearest available gate zone (i.e., minimum combined distance of travel for the two ions), and addressing the pairs in an arbitrary order. This greedy approach is sufficient for a proof of principle using this prototype ion-routing algorithm, however we note that it may not yield the optimum gate-zone designations overall. To this end, a more sophisticated optimisation may be considered in future work, but we note that such combinatorial optimisations are generally hard problems themselves. At each time step in the simulation, each ion is evaluated and moved sequentially according to the routing algorithm, which involves assessing its location, local environment and destination.
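The greedy gate zone assignment described above can be sketched as follows. This is a minimal illustration and not the simulation tool itself: the function and variable names are hypothetical, grid positions are assumed to be integer (row, column) pairs, and the Manhattan distance is assumed as a stand-in for the travel distance, which in the full routing algorithm also depends on the lane priorities.

```python
from typing import Dict, List, Tuple

Pos = Tuple[int, int]  # assumed (row, column) position on the digitized grid


def manhattan(a: Pos, b: Pos) -> int:
    """Grid distance between two positions (assumed proxy for travel distance)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def assign_gate_zones(
    ions: Dict[int, Pos],
    pairs: List[Tuple[int, int]],
    zones: List[Pos],
) -> Dict[Tuple[int, int], Pos]:
    """Greedily give each ion pair the free gate zone with the smallest
    combined travel distance, handling the pairs in the order supplied."""
    free = list(zones)
    assignment: Dict[Tuple[int, int], Pos] = {}
    for a, b in pairs:
        if not free:
            break  # remaining pairs must wait for a later shuttling round
        best = min(free, key=lambda z: manhattan(ions[a], z) + manhattan(ions[b], z))
        assignment[(a, b)] = best
        free.remove(best)
    return assignment


# Hypothetical example: four ions, two required gates, three gate zones.
ions = {0: (0, 0), 1: (0, 2), 2: (4, 4), 3: (4, 2)}
assignment = assign_gate_zones(ions, [(0, 1), (2, 3)], [(0, 1), (4, 3), (2, 2)])
```

Because the pairs are processed in an arbitrary order and each choice is final, an early pair can take a zone that a later pair needed more, which is why the result is not guaranteed to be globally optimal.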
The routing algorithm assigns alternating direction priorities to each lane of the square grid. The top-most horizontal lane is a right only lane, the lane below it is left only, and so on, and this also applies to vertical lanes (see Figure 1B). We ensure that the outer perimeter of the device is a clockwise loop regardless of the number of lanes, so that all gate zones can be reached, which means that odd size devices, e.g. one which consists of 3 by 3 X-Junctions, will not have fully alternating lane directions and instead will have right, left, left, and up, down, down. We define a square grid device formed from M by M X-Junctions to be of device size M. We preferentially position gate zones on the exterior of the device where possible (on the outer arms of the perimeter X-Junctions). Exterior gate zones are more favourable for routing as waiting ions do not block the movement of other ions. For square devices the number of interior gate zones scales with device size as $(M - 2)^2$ and the number of exterior gate zones scales as $4M - 4$, which results in a cross over point at device size 7 (98 qubits at 2 per X-Junction).
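The lane priorities and the gate zone counts above can be illustrated with a short sketch. The function names are hypothetical. For odd devices larger than 3 by 3 the text only fixes the clockwise perimeter, so forcing the final lane alone to break the alternation is an assumption made here for illustration.

```python
from typing import List, Tuple


def lane_directions(m: int) -> Tuple[List[str], List[str]]:
    """Alternating lane priorities for a device of size m (m >= 2), listed
    top to bottom for horizontal lanes and left to right for vertical lanes.
    The last lane in each list is forced so the perimeter is a clockwise loop."""
    horizontal = ["right" if i % 2 == 0 else "left" for i in range(m)]
    vertical = ["up" if i % 2 == 0 else "down" for i in range(m)]
    horizontal[-1] = "left"  # bottom edge runs right to left
    vertical[-1] = "down"    # right edge runs top to bottom
    return horizontal, vertical


def gate_zone_counts(m: int) -> Tuple[int, int]:
    """(interior, exterior) gate zone counts for a square device of size m."""
    return (m - 2) ** 2, 4 * m - 4


# Smallest device size at which interior gate zones outnumber exterior ones.
crossover = next(m for m in range(2, 100) if gate_zone_counts(m)[0] > gate_zone_counts(m)[1])
qubits_at_crossover = 2 * crossover ** 2  # two qubits per X-Junction
```

The two counts always sum to $M^2$, i.e. one gate zone per X-Junction, and the cross over evaluates to device size 7 with 98 qubits, in agreement with the text.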
The centres of X-Junctions are decision points, where an ion will follow the lane priority towards its destination. Ions can enter the outer arms into the exterior gate zones only when it allows them to reach their assigned destination. Ions which are not destined to a gate zone during a given shuttling round have their destination set to their current location, and therefore only move to decongest. During development of the routing algorithm, a major bottleneck identified was congestion at interior gate zones. Devices larger than 2 by 2 have interior gate zones, and the ions waiting there can cause permanent blockages or unnecessary movement depending on how they are handled. To remedy this problem an additional feature was included, in which ions assigned to interior gate zones wait at the closest available X-Junction centre, where they are able to decongest efficiently by temporarily ignoring the lane priority (see Figure 1C).

Results
In section 3.1 we assess the efficiency and versatility of our routing algorithm. In section 3.2 we present an error model which utilizes the routing algorithm and includes experimental parameters such as gate fidelities, coherence time, ion loss and shuttling speed. We then use this error model to estimate the achievable depth and quantum volume of quantum computers based on this architecture.

Assessing the Routing Algorithm
In this section we characterize the performance and flexibility of our routing algorithm, which we refer to as lane priority routing. Randomly generated depth 1 circuits on N qubits consisting of N/2 two qubit gates were iterated sufficiently to represent the requirement of global connectivity. After each iteration we count the total number of time steps which were required (τ), which can be converted into a total time in seconds by considering the estimated speed at which one can shuttle between adjacent X-Junctions. At each iteration, a lower bound is calculated for that particular set of pairings, which is equal to the minimum number of time steps that will enable connectivity. To calculate the lower bound it is assumed that qubits (ions) take the shortest path towards their destination and swap with no time penalty (i.e. the time required for an ion to move one discrete step is independent of whether a swap is performed or not). For a particular iteration, the ion with the greatest distance to travel is identified, and the number of spatial steps between its starting location and its destination is equal to the lower bound.
Figure 3 (B): The relative frequency distribution of passes through X-Junction centres for four different device sizes, 4x4 (N=32), 6x6 (N=72), 8x8 (N=128), 10x10 (N=200). Red bars: Qubits assigned to exterior gate zones. Green bars: Qubits assigned to interior gate zones. 300 iterations of the globally connected depth one algorithm were used to generate a representative sample, and the frequency is scaled accordingly.
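The lower bound for a single iteration can be sketched as below. This is an illustrative assumption rather than the simulation code: positions are taken to be integer (row, column) pairs, the shortest path length is approximated by the Manhattan distance, and the function name is hypothetical. Ions that are not assigned to a gate zone have their destination set to their current location and therefore contribute zero steps.

```python
from typing import Dict, Tuple

Pos = Tuple[int, int]  # assumed (row, column) position on the digitized grid


def lower_bound_steps(starts: Dict[int, Pos], destinations: Dict[int, Pos]) -> int:
    """Number of steps needed by the ion with the furthest to travel,
    assuming shortest paths and swaps with no time penalty."""
    return max(
        (
            abs(starts[i][0] - destinations[i][0]) + abs(starts[i][1] - destinations[i][1])
            for i in starts
        ),
        default=0,
    )


# Hypothetical iteration: ion 2 stays in place, ion 1 has the longest journey.
starts = {0: (0, 0), 1: (3, 1), 2: (2, 2)}
destinations = {0: (1, 0), 1: (0, 4), 2: (2, 2)}
bound = lower_bound_steps(starts, destinations)
```

The bound is unattainable in general because it ignores congestion entirely: every ion is treated as if it were alone on the grid.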
The average shuttling time required to enable the global connectivity can then be compared to the lower bound as shown in figure 2 A. These results are for devices with perfect two qubit gate parallelizability, i.e. there are two qubits initialized per X-Junction. We would hope for the total shuttling time to scale linearly with device size, M, because the distance between randomly selected points in a square scales linearly with the side length of the square. Both our routing procedure and the lower bound scale with device size with a gradient of 1.82 and there is a constant overhead which becomes less significant the larger the device is. The scaling for total shuttling time, τ, as a function of qubits, N, where $N = 2M^2$, is $\tau = 1.3(3)\sqrt{N} + 2(5)$; the fit and standard error were calculated using linear regression. An oscillating pattern on the lane priority routing results is noticeable with its relative magnitude decreasing with device size, which results from even sized devices outperforming odd sized devices. Odd sized devices (for example a device of 3 by 3 X-Junctions) cannot fully realize the alternating lane priority because we ensure that the outer perimeter lane is always a clockwise path.
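As a worked example of converting the fitted number of time steps into a real time, the sketch below evaluates the fit using its central values only (the quoted uncertainties are ignored). The function names are hypothetical, and the shuttling time of 114µs between adjacent X-Junctions is the value assumed later in this manuscript for the Figure 5 estimates.

```python
import math

T_SHUTTLE_S = 114e-6  # assumed time to shuttle between adjacent X-Junctions


def qubits_for_device(m: int) -> int:
    """Qubit number for a device of size m at two qubits per X-Junction."""
    return 2 * m * m


def shuttling_steps(n_qubits: int) -> float:
    """Central value of the fitted number of time steps, tau = 1.3 sqrt(N) + 2."""
    return 1.3 * math.sqrt(n_qubits) + 2.0


def shuttling_time_s(n_qubits: int, t_shuttle_s: float = T_SHUTTLE_S) -> float:
    """Total shuttling time in seconds for one depth one circuit."""
    return shuttling_steps(n_qubits) * t_shuttle_s


n = qubits_for_device(10)            # 10 by 10 device
tau = shuttling_steps(n)             # roughly 20 time steps
total_time_ms = 1e3 * shuttling_time_s(n)  # roughly 2.3 ms

# Expressed per unit device size, the fit has gradient 1.3 * sqrt(2).
gradient_per_device_size = 1.3 * math.sqrt(2)
```

Since $N = 2M^2$, the coefficient of 1.3 per $\sqrt{N}$ corresponds to about 1.84 per unit of device size, which is consistent with the stated gradient of 1.82 within the quoted uncertainty.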
The routing algorithm is flexible and works well for a wide range of qubit numbers for a given device size. Figure 2 B shows the shuttling dependence on qubit number for qubit densities less than or equal to 2 per X-Junction, i.e. with full gate parallelizability. The oscillating pattern resulting from odd and even device sizes is more notable. Shuttling time increases for both the lane priority routing and lower bound as more qubits are added to a device of static size, and peaks at a density of two qubits per X-Junction. The main criterion we optimized for when creating the routing algorithm was the total time. When calculating the achievable circuit depth of a device, however, the total error is not just a function of the total time but also includes factors such as gate fidelity and ion loss.
Traversing an X-Junction will have a corresponding ion loss rate which may be higher than the loss associated with linear shuttling. In order to quantify the associated error we have used our simulation to count the number of times qubits are expected to move through an X-Junction centre. The implications of these results for achievable depth will be explored in the following section. In figure 3 A the mean number of passes through an X-Junction, $X_{count}$, is plotted as a function of qubit number with vertical lines corresponding to a single standard deviation, and the dependence is well described by a fit that is linear in $\sqrt{N}$ with a constant offset of 2(2). The distribution of passes is investigated in figure 3 B for four different device sizes, 4, 6, 8 and 10. The qubits are separated into two data sets, according to whether they are assigned to an interior or exterior gate zone. Across all device sizes investigated the maximum number of passes did not exceed four times the stated mean. For the device with 72 qubits investigated in figure 3 B, the probability of an individual ion passing through an X-Junction centre 14 times is low, at approximately 0.2%.
It may be desirable to increase the qubit density beyond 2 per X-Junction despite the potential loss of gate parallelizability as additional X-Junctions are experimentally costly to implement. Figure 4 A shows the efficiency of the routing protocol for three different qubit densities. The increase in shuttling time is predominantly attributed to the multiple rounds of shuttling (and gates) which are required for the 100% gate density (where gate density is the percentage of qubits involved in gates per time step) algorithm which we are assessing against. With a density of four qubits per X-Junction, a 100% two qubit gate density algorithm would be completed by two full rounds of shuttling and gate applications. The oscillating pattern attributed to odd and even devices becomes more apparent with increasing qubit density. This analysis only includes the additional time associated with the multiple rounds of shuttling and does not include the gate time. The overall cost of increasing qubit density will depend on the gate density of the desired algorithm.
We created a second routing algorithm which relies on positional swaps, where qubits take the shortest available route (ignoring the previously mentioned lane priority routing) and swap to decongest. We have compared the total shuttling time of the swap routing against the lane priority routing for two different swap time penalties, shown in figure 4 B. The time penalties were chosen based on early experimental results: H. Kaufmann et al. demonstrated fast ion swapping in 42µs at a state fidelity of 99.5% [34], and Van Mourik et al. demonstrated positional ion swapping with an associated coherence loss of 0.2(2)% [35]. For ion shuttling speed, Walther et al. demonstrated fast shuttling of cold ions over a distance of 280µm in 3.6µs [36] and P. Kaufmann et al. demonstrated a state fidelity of 99.9994% for shuttling over a distance of 280µm in 12.8µs [30]. For a wide range of device sizes the lane priority routing outperforms the swap based routing for the swap time penalties used here. The total error is not considered in this comparison, but it would further favour the lane priority routing because of the associated infidelity of swapping, which is of a comparable order to experimental two qubit gate fidelities. This analysis suggests that for efficient routing in this design, it will not be necessary to perform positional swap operations. Of course, improvements on the achievable swap fidelity and time cost may impact this conclusion. We characterized the average number of swaps, $n_{swap}$, per qubit for each connectivity run and found that for 18 qubits the average was 1 swap, and for 50 qubits the average was 1.7 swaps. The dependence was well described by the equation $n_{swap} = 0.23(2) N^{0.5} + 0.1(2)$, where the fit and standard error were calculated using linear regression.
The average number of swaps per ion which was required to enable connectivity was found to be only weakly dependent on the swap time cost penalty, therefore doubling the time penalty results in minimal change to the number of swaps.
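The fitted dependence of the swap count on qubit number can be checked against the two quoted data points with a few lines of arithmetic. The function name is hypothetical and only the central values of the fit coefficients are used.

```python
import math


def mean_swaps_per_qubit(n_qubits: int) -> float:
    """Central value of the fit n_swap = 0.23 * N^0.5 + 0.1."""
    return 0.23 * math.sqrt(n_qubits) + 0.1


swaps_18 = mean_swaps_per_qubit(18)  # about 1.1, quoted average: 1 swap
swaps_50 = mean_swaps_per_qubit(50)  # about 1.7, quoted average: 1.7 swaps
```

Both values agree with the quoted averages to within the stated uncertainty of the fit.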

Achievable depth and quantum volume estimations
For comparison between near term quantum computers, one must consider more than just the number of qubits. Quantum volume (QV) is a conceived metric for quantum computational power designed to enable sincere comparison between architectures [31,32]. QV includes factors such as gate fidelity, qubit number, connectivity and the available gate set, and is given by $\log_2 QV = \min\left[N, 1/(N \epsilon_{eff}(N))\right]$ (equation 1), for the number of qubits within the device N, and effective error rate $\epsilon_{eff}$, which typically depends on N. QV reflects the limiting factor of the device, which is either the qubit number or the achievable depth D, where $D = 1/(N \times \epsilon_{eff}(N))$. To compute QV, a randomly generated depth one circuit on N qubits with N/2 arbitrary (SU(4)) two qubit gates is used. The achievable depth represents the circuit depth at which the device can run before coherence is lost, specifically, the depth at which at least one qubit error is statistically likely. The achievable depth is a useful metric which can be used separately from QV to estimate the feasibility of running an algorithm on a NISQ device.
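The interplay between qubit number and achievable depth can be illustrated for the simplest case of free, all to all connectivity, where the effective error is just the two qubit gate error. The sketch below is a hedged illustration with hypothetical function names: it assumes that $\log_2 QV$ is the smaller of N and D, maximised over the number of qubits used, and that only even qubit numbers are considered since the circuit contains N/2 two qubit gates. These assumptions reproduce the value of 31.25 quoted later in this section for a gate fidelity of 99.9%.

```python
from typing import Tuple


def achievable_depth(n_qubits: int, eff_error: float) -> float:
    """Achievable depth D = 1 / (N * eps_eff)."""
    return 1.0 / (n_qubits * eff_error)


def log2_qv_all_to_all(gate_error: float, max_qubits: int = 200) -> Tuple[float, int]:
    """Largest min(N, D) over even qubit numbers, with eps_eff = gate_error.
    Returns the value and the qubit number at which it is reached."""
    best_value, best_n = 0.0, 0
    for n in range(2, max_qubits + 1, 2):
        value = min(float(n), achievable_depth(n, gate_error))
        if value > best_value:
            best_value, best_n = value, n
    return best_value, best_n


# Two qubit gate fidelity of 99.9%, i.e. a gate error of 1e-3.
value, n_used = log2_qv_all_to_all(1e-3)
```

Below the optimum the qubit number is the limiting factor, and above it the achievable depth is, which is why adding qubits beyond this point does not increase the metric.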
The effective error $\epsilon_{eff}$ for each depth one circuit includes gate error, and errors associated with gate decomposition, connectivity and parallelizability. The effective error can be used to calculate the achievable depth. Many iterations of the randomly generated circuit should be used to best capture the properties of the device. In this section we present an error model for the quantum computing design analysed in this manuscript and present results for a range of experimental parameters that may be achievable. In the following analysis we assume linear propagation of errors, which represents a worst-case outcome, as it does not account for the possibility of a new error reducing a previous error. We combine the errors associated with connectivity and gates, as opposed to a full simulation of the quantum states and associated noise model. The advantage of this methodology is that we are able to make estimations on effective error (and therefore achievable depth) for a wide range of qubit numbers and device sizes. The effective error $\epsilon_{eff}$ for this design and circuit requirement is $\epsilon_{eff} = \epsilon_{gate} + \epsilon_{conn}$, where $\epsilon_{gate}$ is the two qubit gate error and $\epsilon_{conn}$ is the error associated with enabling the required global connectivity. We decompose $\epsilon_{conn}$ into two components, $\epsilon_{conn} = \epsilon_{deco} + \epsilon_{loss}$, where $\epsilon_{deco}$ is the quantum decoherence associated with the total time taken to enable connectivity, with $\epsilon_{deco} = 1 - e^{-t/c}$ for time t and coherence time c. Recent work by Kaufmann et al. [30] demonstrated high state fidelity shuttling, where the coherence time associated with shuttling was extrapolated to be 2.13s. A coherence time of 50s has been demonstrated for stationary ions in the atomic clock states of calcium [39]. In figure 2 A we quantify the average time required to enable connectivity as a function of device size (qubit number). The stated dimensionless time τ can be converted to a real time by multiplying it with the expected time to shuttle an ion between two adjacent X-Junctions.
Figure 5: Quantum volume with a native two qubit gate requirement as a function of inverse gate error, $1/\epsilon$, for different architectures. Here, the number of qubits utilised to achieve a given value of $QV_{native}$ is equal to $\log_2 QV_{native}$ rounded up to the nearest even integer. Red: An architecture with all to all connectivity where $QV_{native}$ is solely defined by the native two qubit gate fidelity using equation 1 and represents the upper bound. Blue: The trapped ion architecture investigated in this manuscript using our proposed error model and the routing results of the previous section. The coherence time and the time taken to shuttle between adjacent X-Junctions are extrapolated from work by Kaufmann et al. [30]. We assume a distance between adjacent X-Junctions of 2500µm [16] which implies a shuttling time, t, of 114µs, and we use the demonstrated state fidelity of shuttling [30] to infer a coherence time, c, of 2.13s, and so $t/c \approx 5 \times 10^{-5}$. We assume an ion loss rate of $10^{-5}$ per X-Junction pass. We assume each iteration of the depth one circuit requires one combination and one separation operation, each of which has a duration of 80µs [37,38], and we assume the state fidelity of the operation can be inferred from the coherence time. Green: All the assumptions are identical to the above except for the coherence time, which has been increased by a factor of 10 [39]. Yellow: A square grid superconducting architecture where connectivity is enabled through sequences of nearest neighbour swap operations which require 3 native two qubit gates (the CNOT). The depth overhead was found to scale as a function of qubit number N as $2.77\sqrt{N} - 4.53$ using the publicly available quantum compiling software, CQC's t|ket⟩; improvements to the connectivity compiler would reduce this overhead.
For ion shuttling speed, Walther et al. demonstrated fast shuttling of cold ions over a distance of 280µm in 3.6µs [36] and Kaufmann et al. demonstrated high state fidelity shuttling (99.9994%) over a distance of 280µm in 12.8µs [30]. There will be an additional time cost associated with performing a single combination and a separation of ions per iteration of the depth one circuit, each of which has been performed in 80µs [37,38]. $\epsilon_{loss}$ represents the likelihood for an ion to be lost to the vacuum per iteration of the depth one circuit. Investigations of ion loss for routing across X-Junction centres [40] found that continuously Doppler cooled ions survive more than $10^5$ round trips whereas uncooled ions survive at least 65 round trips. Ion loss occurs when the motional energy of an ion exceeds the trap depth, which can be remedied by increasing the trapping potentials and by cooling techniques. Significant work is being carried out to allow the application of large trapping voltages in order to increase the effective trapping potential; recently trapping voltages as large as 1000V have been demonstrated [41]. In figure 3 A we quantify the average number of X-Junction crosses, $X_{count}$, as a function of device size (qubit number), which can be combined with an ion loss per shuttle rate, $X_{loss}$, for $\epsilon_{loss}$. This can all be combined into a single equation defining the effective error in this design, $\epsilon_{eff} = \epsilon_{gate} + \left(1 - e^{-t/c}\right) + X_{count} X_{loss}$, with t the total time taken to enable connectivity. This error model can be used to estimate the achievable depth for a wide variety of device sizes and experimental parameters for devices following this design. The gate error will depend on the requirement of the algorithm we are assessing against, which in the case of QV is the arbitrary two qubit gate. The focus of this manuscript is the cost of enabling connectivity, therefore we have chosen to utilise the concept behind QV but alter its algorithm requirement to instead be the native two qubit gate of the architecture being assessed. We will refer to this new metric as $QV_{native}$ going forward.
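The error model described in this section can be sketched in a few lines. This is an illustrative reading of the model rather than the authors' open access implementation: the function and parameter names are hypothetical, the total time is assumed to be the fitted number of shuttling steps multiplied by the shuttling time plus one combination and one separation, and the ion loss term is assumed to be the product of the mean number of X-Junction passes and the loss rate per pass. The mean number of passes is left as an input, and the value used in the example is a placeholder rather than a result from the manuscript.

```python
import math


def effective_error(
    n_qubits: int,
    gate_error: float,
    x_count: float,                 # mean X-Junction passes per depth one circuit (input)
    x_loss: float = 1e-5,           # assumed ion loss rate per X-Junction pass
    t_shuttle_s: float = 114e-6,    # assumed shuttle time between adjacent X-Junctions
    t_comb_sep_s: float = 160e-6,   # one combination plus one separation at 80 us each
    coherence_s: float = 2.13,      # assumed coherence time during shuttling
) -> float:
    """eps_eff = eps_gate + eps_deco + eps_loss under linear error propagation."""
    tau = 1.3 * math.sqrt(n_qubits) + 2.0         # fitted shuttling steps
    t_total = tau * t_shuttle_s + t_comb_sep_s    # time to enable connectivity
    eps_deco = 1.0 - math.exp(-t_total / coherence_s)
    eps_loss = x_count * x_loss
    return gate_error + eps_deco + eps_loss


def achievable_depth(n_qubits: int, eff_error: float) -> float:
    """Achievable depth D = 1 / (N * eps_eff)."""
    return 1.0 / (n_qubits * eff_error)


# Hypothetical example: 32 qubits, 99.9% gate fidelity, placeholder x_count.
eps = effective_error(32, gate_error=1e-3, x_count=8.0)
depth = achievable_depth(32, eps)
```

Setting the shuttling, combination and loss terms to zero recovers the all to all case, so the gap between the two depths is a direct measure of the cost of connectivity in this model.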
The costs associated with arbitrary two qubit gate decompositions will be discussed later.
We use our error model to quantify $QV_{native}$ as a function of two qubit gate fidelity for this architecture with two different assumptions on experimental shuttling parameters, shown in figure 5. These can be compared to the upper bound of this metric, which corresponds to a hypothetical architecture with free, all to all connectivity. As an example, a device with free all to all connectivity and a two qubit gate fidelity of 99.9% has a $\log_2 QV_{native}$ of 31.25. This implies that one could effectively run a globally connected native two qubit gate algorithm with approximately 30 qubits at depth 30. We investigate up to a two qubit gate fidelity of 99.99%; this analysis indicates that without error correction techniques, chasing high qubit numbers will be futile even with considerable improvement to the current state of the art two qubit gate fidelities. The trapped ion plots of figure 5 have an ion loss rate of $10^{-5}$; we found that increasing this rate substantially decreases the $QV_{native}$, which emphasises the importance of achieving an ion loss rate of this order. The ion loss rate can be improved by deeper trapping potentials and by techniques such as sympathetic cooling.
We also quantify this metric for a model of a superconducting architecture, which is a square grid with only nearest neighbour interactions. In superconducting square grid systems, connectivity is enabled by sequences of swap operations, and the best known method has an overhead of $\Theta(N^{0.5})$ [42] for the random complete graph (global connectivity). IBM provide an equation to estimate the depth overhead, of the form $(a\sqrt{N} + b)$, for a square grid but it includes their gate decomposition costs of arbitrary two qubit gates [32]. Cowtan et al. developed a compiler to map quantum circuits to devices with restricted qubit connectivity and provide results on the depth overhead for nearest neighbour square devices [20]. Using the publicly available software, CQC's t|ket⟩ and its recently improved connectivity compiler, the depth overhead was found to scale with qubit number N as $2.77\sqrt{N} - 4.53$. This overhead corresponds to a depth one, 100% gate density, native two qubit gate (CNOT) algorithm with 10N iterations. A SWAP gate is implemented with three CNOTs and no advantageous initial qubit mapping was utilised.
The native two qubit gate of this trapped ion design is the Mølmer-Sørensen gate [43]. Although it does not directly depend on the motional state, it is affected by the heating rate and experimental offsets, so it is favourable to begin in a low motional state. Therefore, to reach the high two qubit gate fidelities used in figure 5, it will be necessary to use cooling techniques. Techniques such as Doppler and sideband cooling are only suitable for the beginning of a quantum algorithm as they do not preserve quantum information. Sympathetic cooling is a way of actively cooling throughout a quantum algorithm, whereby the qubit is sympathetically cooled via a different laser cooled ion species. It is likely to be a critical technique for the use of trapped ion devices, particularly in the fault tolerant regime. Shuttling based designs may benefit from multi-species shuttling. The relative difference between the upper bound of free all to all connectivity and the plots for trapped ions increases with the two qubit gate fidelity due to the independent cost associated with shuttling. We find a notable difference in QV_native between the superconducting plot and the all to all plot, particularly at higher two qubit gate fidelities. Superconducting square grids have a slower growth rate with two qubit gate fidelity because the associated depth overhead of swaps increases with the number of qubits (the size of the grid). In this model, the trapped ion design outperforms the superconducting square grid for this set of experimental shuttling parameters. The number of shuttling operations, τ, required to enable connectivity in the trapped ion design analysed here scales as τ = 1.3(3)√N + 2(5), which is comparable to the depth overhead for swapping on the superconducting square grid.
Extrapolating from the high state fidelity shuttling of Kaufmann et al [30] implies a fidelity per shuttling operation (2500 µm) of 99.995%, which is significantly higher than the two qubit gate fidelities achieved so far by superconducting systems. In order to facilitate further work with our error model by others, we have made it open access [44].
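To give a sense of scale for these shuttling numbers, the following sketch evaluates the central fit for τ and the fidelity that results if each of the τ shuttling operations independently contributes the quoted 99.995%. The quoted uncertainties on the fit are dropped, the multiplication of fidelities is an assumption rather than the error model of this work, and the function names are hypothetical.

```python
import math

SHUTTLE_FIDELITY = 0.99995  # per shuttling operation, as quoted in the text


def shuttle_count(n):
    """Central value of the fit tau = 1.3*sqrt(N) + 2 (uncertainties dropped)."""
    return 1.3 * math.sqrt(n) + 2


def shuttling_fidelity(n):
    """ASSUMPTION: independent shuttling errors, so fidelities multiply."""
    return SHUTTLE_FIDELITY ** shuttle_count(n)


for n in (16, 64, 256):
    tau = shuttle_count(n)
    print(f"N={n:4d}  tau={tau:5.1f}  shuttling fidelity={shuttling_fidelity(n):.5f}")
```

Under these assumptions the accumulated shuttling infidelity for N = 16 is a few parts in 10^4, which is why the cost of connectivity stays small relative to a two qubit gate error of 10^−3.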
The QV metric requires application of arbitrary, randomly generated SU(4) two qubit gates, as opposed to the native two qubit gate investigated above. The purpose of this requirement is to capture the power of the architecture's native gate set. There is an upper bound circuit which can express any arbitrary U(4) using 3 CNOTs and 15 elementary single qubit gates [45], with a native gate set consisting of R_x(θ), R_z(θ), and the CNOT. We have translated this upper bound circuit into the native gate set of the architecture analysed here, which is R_x(θ), R_y(θ) and the Mølmer-Sørensen (MS) two qubit gate [43] (see Appendix). The gate count of the new upper bound circuit is 3 MS gates and 18 elementary single qubit gates. We reduced the initial single qubit gate count from 29 to 18 by utilising basic commutation relations and the degrees of freedom which are available [33]. The upper bound circuit represents a worst case, and optimal circuits can be found for particular SU(4)s using analytical techniques [46], but most exact decompositions of arbitrary two qubit gates will require the three native two qubit gates of the upper bound. A new technique demonstrated by IBM can considerably improve the fidelity of decomposing these gates [32]; Cross et al instead start with an allowable error on the decomposition, which allows one to identify cases which require fewer than the upper bound of three two qubit gates. This can result in a considerable improvement to the final fidelity, particularly when working with lower two qubit gate fidelities. The quantum volume with native two qubit gates we have used in this section is a clear tool of comparison for the cost of connectivity in these two architectures. To extend this comparison to architectures with drastically different gate sets, such as those in some trapped ion designs which enable multiple (> 2) qubit gates, the original QV metric is more suitable.
Once more research characterising quantum volume for various quantum computing designs becomes available, a more detailed comparison would be warranted.

Conclusion
The quantum computing architecture analysed in this manuscript has a clear path towards scaling to large qubit numbers. Arbitrary connectivity between qubits can be enabled in this design on near term devices, relying only on shuttling across a square grid, but prior to this work there were no proposed routing algorithms. We have created a routing algorithm which efficiently enables connectivity in this design. A simulation tool was created which allowed us to characterize the routing algorithm and compare it against a strict lower bound, to which it scales with an equal gradient. The routing algorithm compares favourably against positional-swap based routing for the experimental values used. We propose an error model which can be combined with the results from the simulation tool to estimate the circuit depth of a device as a function of experimental parameters. We use a metric, QV_native, based on quantum volume but with native two qubit gates, to focus on and assess the cost of connectivity in this trapped ion design. Ion loss was found to be an important parameter of the model: it needs to be low, at 10^−4 to 10^−5, to reach appreciable circuit depths, and it can be improved experimentally with deeper trapping potentials. It is necessary to maintain a sufficiently low motional state energy of the ions to reach high two qubit gate fidelities, which highlights the importance of developing techniques such as sympathetic cooling, and therefore multi-species shuttling. We use QV_native to assess a model of a square grid superconducting device, and find that for the shuttling parameters used, this trapped ion design has a substantially lower cost associated with connectivity. The simulation tool and this analysis can be used to inform the development of devices following this design, by metering experimental priorities, and by solidifying the requirements on shuttling.
This work has implications for error correction schemes, especially those which rely on non-nearest neighbor interactions.

Acknowledgement

Appendix

Decomposing arbitrary two qubit gates
An upper bound circuit for expressing arbitrary two qubit gates in terms of R_x(θ), R_z(θ), and the CNOT was found by Vatan et al [45]. An arbitrary single qubit gate U_1 can be expressed in the form

$$U_1 = e^{i\alpha} R_{\hat{n}}(\beta) R_{\hat{m}}(\gamma) R_{\hat{n}}(\sigma),$$

for appropriate choices of α, β, γ, σ, where n̂ and m̂ are non-parallel real unit vectors in three dimensions [5].
We have converted the circuit of figure 6 into the native gate set of the architecture investigated here, which is R_x(θ), R_y(θ) and the Mølmer-Sørensen gate U_MS(χ) [43], where χ can be set between −π/4 and π/4. The new converted circuit is shown in figure 7, and has a gate count of 3 MS gates and 18 single qubit gates. The single qubit gate count was reduced by combining superfluous sequences of single qubit gates, utilising commutation relations, and the available degrees of freedom. The MS gate commutes with any R_x(θ). When decomposing the CNOT and R_z(θ) gate, there is an available degree of freedom, where one may choose the direction of rotation on certain R_y gates, which can then be used to eliminate some R_y gates from the circuit [33].

Figure 7: A circuit for implementing any transform in U(4) with a gate set consisting of R_x(θ), R_y(θ), and the Mølmer-Sørensen gate [45], for a total gate count of 18 elementary single qubit gates and 3 MS gates.
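The commutation relation used in this reduction can be checked numerically. The sketch below assumes one common convention for the MS gate, U_MS(χ) = exp(−iχ σ_x ⊗ σ_x) = cos(χ) I − i sin(χ) σ_x ⊗ σ_x, together with R_x(θ) = exp(−iθσ_x/2) and R_y(θ) = exp(−iθσ_y/2). This convention is an assumption and may differ from the exact form used in this work by signs or phases. The helper names are hypothetical, and plain Python lists are used so that no external library is needed.

```python
import math

I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]


def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def kron(a, b):
    """Kronecker (tensor) product of two matrices."""
    return [[a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]


def rotation(pauli, theta):
    """Single qubit rotation exp(-i*theta*pauli/2)."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c * I2[r][q] - 1j * s * pauli[r][q] for q in range(2)] for r in range(2)]


def ms_gate(chi):
    """ASSUMED convention: U_MS(chi) = cos(chi) I - i sin(chi) X (x) X."""
    xx, i4 = kron(X, X), kron(I2, I2)
    c, s = math.cos(chi), math.sin(chi)
    return [[c * i4[r][q] - 1j * s * xx[r][q] for q in range(4)] for r in range(4)]


def commute(a, b, tol=1e-12):
    """True if a*b and b*a agree elementwise within tol."""
    ab, ba = matmul(a, b), matmul(b, a)
    return all(abs(ab[r][q] - ba[r][q]) < tol for r in range(4) for q in range(4))


ms = ms_gate(math.pi / 4)
rx_first = kron(rotation(X, 0.7), I2)   # R_x on the first qubit
rx_second = kron(I2, rotation(X, 1.3))  # R_x on the second qubit
ry_first = kron(rotation(Y, 0.7), I2)   # R_y on the first qubit

print(commute(ms, rx_first), commute(ms, rx_second), commute(ms, ry_first))
```

Under the assumed convention the MS gate commutes with R_x on either qubit but not with R_y, which is why R_x gates can be moved through MS gates and merged, while R_y gates have to be eliminated by other means such as the choice of rotation direction.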