Solving Large-Scale Linear Systems of Equations by a Quantum Hybrid Algorithm

Today’s intermediate-scale quantum computers, although imperfect, already perform computational tasks that are manifestly beyond the capabilities of modern classical supercomputers. However, so far, quantum-enabled large-scale solutions have been realized only for limited set of problems. Here a hybrid algorithm based on phase estimation and classical optimization of the circuit width and depth is employed for solving a speciﬁc class of large linear systems of equations ubiquitous to many areas of science and engineering. A classiﬁcation of linear systems based on the entanglement properties of the associated phase-estimation unitary operation is introduced, enabling a highly eﬃcient search for solutions that is facilitated by a straightforward matrix-to-circuit map. A 2 17 -dimensional problem is implemented on several IBM quantum computer superconducting quantum processors, a record-breaking result for a linear system solved by a quantum computer. Demonstrated realisation sets a clear benchmark in the quest for the future quantum speedup in the linear systems of equations solution.


INTRODUCTION
Since Peter Shor's discovery of the factorization algorithm in 1994 1 , finding quantum algorithms for solving other classically intractable problems has become one of the mainstreams of computational science.The subsequent developments brought in the idea of an overwhelming advantage of quantum computing versus classical computing, often referred to as quantum supremacy 2 .The manifestations of the latter sparked expectations for upcoming breakthrough provided quantum solutions would be implemented on noiseless quantum processor units (QPUs).Yet the progress in quantum computing has been hindered by the inherent errors in logical gates, imperfections in readout procedures, and decoherence affecting qubits.Despite these obstacles, state-of-the-art intermediatescale quantum devices 3 , although suffering these noise-related problems, enable computations far beyond the capabilities of modern classical supercomputers.
While quantum advantage has recently been demonstrated on QPUs programmed to execute random instructions 4 , the possibility of implementing specific transformations that can be adopted into a framework of practical quantum algorithms remains an open question.An exemplary core protocol for quantum computing is the quantum phase estimation 5 (QPE) protocol, which has been extensively studied and applied as a subroutine in a variety of quantum algorithms such as the Shor algorithm, quantum counting, and the calculation of the eigenvalues of unitary matrices.By virtue of the QPE, a quantum computer gains an exponential speedup in the context of the matrix inversion problem 6 .The latter is a significant computational challenge in many areas, including artificial intelligence 7 , partial differential equations 8 , and data analysis 9 .
In our work, in order to exploit the QPE protocol for the fast ma-trix inversion, we analyze and implement experimentally a hybrid version of the quantum Harrow-Hassidim-Lloyd algorithm 6 (H-HHL) for solving linear systems of equations.One of the main advantages of the HHL approach is the exponential compression of the data, which provides a notable improvement of the computational resource allocation.
Here, in order to most profoundly demonstrate such an advantage, we consider certain classes of linear systems that can be effectively solved with a quantum algorithm demonstrating an exponential speedup.Our experiments, carried out on the high-end simulator and on IBMQ superconducting processors 10 , demonstrate the superiority of quantum algorithms over the classical ones in solution of linear systems.Besides, a thorough experimental analysis enables us to spot the shortcomings of the modern QPUs and outline solutions for surpassing them in order to boost a large-scale quantum data processing 11 .

Hybrid quantum algorithm
Solving a general system of linear equations A x = b means finding the N-dimensional vector x for a given N × N matrix A and a vector b.In this work we solve the linear system using the HHL quantum algorithm 6 .This algorithm employs matrix exponentiation Û = e iA , or Hamiltonian simulation, as a preparation step.In general, such an exponentiation procedure is a major challenge, however for certain types of matrices this problem has been solved 12,13 , with general approaches being presented in Ref. 14. Here, to focus on the phase estimation part of the quantum algorithm, we consider only particular matrices A for which the Hamiltonian simulation can be efficiently performed.
Let us briefly discuss the structure of the quantum algorithm for linear systems.The HHL algorithm that inverts a N × N matrix exploits three groups of qubits: the vector register that consists of [log N] qubits, the p-qubit phase register, and a single qubit ancilla register, n = [log N] + p + 1 qubits in total.The whole computation, in turn, consists of the QPE algorithm, the ancilla quantum encoding (AQE) part, in which the single ancillary qubit conditionally operates on the state of the phase registers, and the inverse QPE.Here, we omit the quantum amplitude amplification step 6 that improves the complexity in terms of the conditional number, which is fixed.The phase estimation protocol exploits controlled unitary operations C Û, C Û2 , . . ., C Û2 p−1 over phase and vector qubits and the Quantum Fourier Transform afterward in order to process eigenvalues and eigenvectors of the matrix.For instance, the quantum circuit of the HHL algorithm that utilizes 3 phase registers is depicted on Fig. 1a.
In order to implement the HHL algorithm we use the spectral decomposition of the N × N matrix and encode the vector b into the qubit's amplitudes |b = N i=0 β i |i .Once we run the algorithm, the solution is encoded into the quantum 1 Terra Quantum AG, St. Gallerstrasse 16A, 9400 Rorschach, Switzerland. 2 Moscow Institute of Physics and Technology, 141700, Institutskii Per. 9, Dolgoprudny, Moscow Distr., Russian Federation. 3 1 | a.Quantum scheme of the original HHL algorithm with 3 phase qubits that involves the quantum phase estimation protocol (QPE) and the ancilla quantum encoding step (AQE) in order to solve Eq. 3 with a unitary matrix Û.While the QPE part exploits controlled unitary operations C Û, C Û2 , C Û4 and Quantum Fourier Transform in order to process eigenvalues and eigenvectors of Û, the AQE algorithm assists the matrix inversion.b.Quantum scheme of the simplified hybrid HHL algorithm, which provides the solution for the same matrix as the circuit from a.The algorithm involves only 2 phase registers, and the QPE part is significantly reduced.c.The circuit depth as function of problem size for TP 1 (circles), TP 2 (triangles) and NTP (crosses) matrix types processing by the original HHL and by the simplified H-HHL.d. e. f.Quantum circuits that generate 3 types of 2 5 × 2 5 Û: ÛTP1 that involves only single-qubit gates Ûs ; ÛTP2 that localizes two-qubit clusters with a single layer of CNOTs; ÛNTP that entangles all qubits expect the first.All types also contains Û3 operator, which is used for the setting of the matrix spectrum.g.The scheme of a controlled-unitary gate C Ûs of the single-qubit operation Ûs that involves single two-qubit interaction.
where N x is the normalization coefficient.Using projective measurement, we obtain the expectation value x| M |x for some operator M within exponentially shorter time than that allowed by the classical algorithms.
The HHL algorithm is improved by using some prior knowledge about the matrix A: this classical information allows us to refine the quantum algorithm in order to downsize the noise-sensitive quantum part.We implement a hybrid HHL algorithm (H-HHL) 15 that takes advantage of the fact that some bits of A −1 eigenvalues could be the same for any eigenvector.Since the QPE part encodes eigenvalues into the phase register qubits, we determine the identical bits of eigenvalues and, as a consequence, qubits corresponding to those bits.We treat such phase qubits as classical bits and exclude them from the computational scheme yielding a reduction in the width and in the depth of a circuit.In general, one can apply an iterative quantum phase estimation algorithm 16 in order to determine the identical bits of the eigenvalues.
In contract to previous works, here we focus on the large-scale implementation.For this purpose, we theoretically introduce and experimentally realize special classes of linear systems that can be solved using the H-HHL algorithm without limitation on the size of the system.

Benchmarking problem
Here we discuss the class of matrices that we consider in this work.As an exemplary task, we choose an efficient solution of the following system of linear equations where Û and b are a given matrix and vector, respectively, and (log Û) is the spectral radius of log Û.Since the logarithmic function is ambiguous, we fix the resulting matrix spectrum such that the largest absolute value of eigenvalues is less than 1.
, the fidelity level is similar.However, Johannesburg posses slightly higher V Q = 16, which is reflected in the fidelity.
To construct the quantum circuit that realize the operator Û = e iA , we consider the matrix Û comprising quantum gates.Furthermore, we choose the computational basis |b = |0 .The entire family of the gatebased matrices is expressed as a tensor product of M local operators Ûi that act only on the i-dimension subset of the computational circuit: To illustrate purposes the quantum algorithm capabilities, we consider three types of such U operators.First of all, we choose the block structure that consists of 2x2 blocks, 4x4 blocks and dim(U) blocks.We divided the physical operators Û that correspond to this structure and are commonly implemented in quantum circuits into three groups: TP 1 : ÛTP 1 is the Tensor Product of single-qubit gates Ûs resulting in dim(U i ) = 2: this operator does not entangle qubits.See Fig. 1d.
that leads to the emergence of two-qubit clusters within which qubits are entangled.See Fig. 1e.
NTP: ÛNTP is Not a Tensor Product of single-or two-qubit gates that leads to dim(U i ) = dim(A): this operator entangles all qubits.See Fig. 1f.
The QPE protocol, which is a vital subroutine of the studied algorithm, involves controlled unitary operations and, therefore, is the most complicated part.Indeed, the realization of an arbitrary control singlequbit gate in Û requires two CNOTs 5 , which leads to a highly complex quantum circuit, e.g.[log N]-qubit ÛNTP comprises up to 2 × [log N] single-qubit gates.Thus, we consider the continuous subset of singlequbit gates Ûs in a way that the control-Ûs is implemented via a single CNOT resulting in a dramatic simplification of the algorithm circuit.The circuit for the control-Ûs is depicted in Fig. 1g.
In order to demonstrate the classical part of the quantum algorithm, we manually control the spectrum {λ} of A. For that purpose, we introduce a correcting single-qubit gate Ûc in the quantum circuit; this assumption significantly simplifies the analysis of a quantum solution and, at the same time, does not lead to loss of generality.Leveraging the full control over the matrix spectrum via the correcting gate Ûc , we immediately eliminate the unnecessary phase qubits without the iterative phase estimation algorithm (for more details see Methods).
As an illustrative example, we adjust the correcting gate in such a way that λ ∈ 1 8 , 3 8 , 5 8 , 7 8 .As a result p = 3 phase qubits are required to encode the whole spectrum of N × N ÛTP 1 , ÛTP 2 or ÛNTP matrices.The hybrid part allows us to utilize only p = 2 instead of 3 phase qubits in order to solve the linear system resulting in n = [log N] + 3 qubits in total.The scheme of the H-HHL algorithm is shown in Fig. 1b.Such a technique allows us to significantly reduce the depth of the circuit, by an order of magnitude for the 50-qubit scheme.The circuit depth of the simplified H-HHL (red) and the original HHL (blue) as function of a problem size is shown in Fig. 1c for discussed matrix types.

Small-scale quantum algorithm implementation
Here, we demonstrate the performance of the H-HHL algorithm, which exploits 2 phase and a single ancillary qubit, using real QPUs provided by IBMQ 10 .
Primarily, we consider a low number of qubits case where n ∈ [4, 7], three qubits that are assigned to the phase and ancillary registers.Under these conditions we employ the full state tomography analysis, which requires 3 n−3 experiments with one circuit; each experiment consists of 8912 runs.We investigate all matrix types TP 1 , TP 2 , and NTP by analyzing 140 transpiled random quantum circuits (RQCs) for every type for every matrix size.Let us recall that the randomness is defined by the random single-qubit gates Ûs .We run the low-width circuits on the IBMQ Burlington (5 qubits), Yorktown (5 qubits), Melbourne (15 qubits) and Johannesburg (20 qubits) QPUs.First, as an example, let us solve the following system of linear equations where the matrix is determined by the operator A = log Û3 (π/2, 0, 0) (see Methods).The normalized solution, in turn, would be encoded into a pure single-qubit quantum state |x = 4 5 |0 + 3 5 e −iπ/2 |1 .The corresponding density matrix ρ is depicted on Fig. 2a: dashed squares represent the noiseless solution, solid squares show the solution obtained on the real IBMQ QPUs, blue and red color indicates the positive and negative values, respectively.High final fidelity ensures that the eigenvector of ρ, which corresponds to the maximum eigenvalue, approximates the solution with high precision.For instance, we find that the density matrix measured on the Johannesburg QPU provides the eigenvector | x = 0.81 |0 + 0.58e −iπ/2.43|1 that corresponds to the maximum eigenvalue, which is the solution with a | x| x | 2 = 97.5% precision.Next, we focus on the algorithm performance with only NTP matrices, since the fully entangled state is the most interesting in terms of complexity analysis.The fidelity of the full state tomography averaged over 70 random NTP matrices is shown in Fig. 2b for different H-HHL circuit widths.Since the measured quantum volumes 17 V Q of IBMQ Burlington, Yorktown and Melbourne processors are equal, the fidelity behavior is similar.However, we find that the V Q of the Johannesburg QPU is twice as large, resulting in a slightly higher fidelity level.

Large-scale quantum algorithm implementation
Here we consider large circuits performance, but, at first, let us elaborate on a suitable performance metric.Since the full state tomography requires 3 n experiments for n-qubit circuit, it is not possible to obtain tomographical fidelity in a reasonable time.Thus, we employ a similar cross entropy benchmarking approach as in Ref. 4, which allows us to estimate the algorithm fidelity with only one experiment and single Z-projective measurement that gives us the probability distribution of outcomes.Let us define the fidelity as follows: where (•, •) is a scalar product.The p t j and p e j are probability distribution of outcomes, which correspond to the noiseless implementation and experimental run of jth circuit, respectively; p u is the uniform probability distribution.In the above formula, we average the probability distribution over M different jth RQCs.It is clear that the introduced metric shows averaged proximity of the obtained vector projection to the ideal solution rather than to a chaotic state -such a definition matches the cross entropy fidelity 4 .More details on the algorithm characterization can be found in the Supplementary Information (SI).
The fidelity F XEB of the H-HHL algorithm as a function of the circuit width is presented in Fig. 3 (a,b) for the high-performance simulator with an embedded noise model based on the IBMQ Melbourne device and for the real experiment conducted on Melbourne QPU, respectively; the same data, which was measured on the Johannesburg device, is presented in Fig. 3 (c,d).The noise model includes the gate and readout errors that were measured beforehand.Each point is the fidelity that is averaged over 140 RQCs, which were topologically optimized in order to get the minimal depth, and each RQC experiment consists of 10 5 runs on the simulator as well as on the QPU.
Besides, we consider the digital error model 4 (DEM), which takes into account gates and readout errors and characterizes circuit performance by a set of localized Pauli errors; measured errors are presented in SI.For the estimation of the algorithm performance on a large QPUs via DEM, we assume that only the fidelity of readout F r , single-qubit gates F 1QG and two-qubit gates F 2QG contribute to the final fidelity resulting in The digital error approach can be used to predict the fidelity behavior when there is no space or time correlation between gate errors, which is the case in the simulator with an embedded depolarizing noise model, see Fig. 3(a,c).However, for the experiment conducted on IBMQ QPUs, we find that the DEM fails in predictions of the algorithm performance.It becomes clear from Fig. 3(b,d) that the measured fidelity is almost independent on the matrix type despite the significant changes in corre-sponding circuits.Indeed, it was confirmed earlier that gate errors are correlated in some of the IBM superconducting processors 18 .Nonetheless, we expect that an advanced equipment combined with the necessary precautions against the correlated errors, e.g.Purcell filters, will provide the fidelity behaviour according to the digital error model as was shown in Ref. 4.
As a result, in this work we experimentally find the quantum state projection that corresponds to the solution of a 2 17 × 2 17 system of linear equations, which appears to be a record in matrix inversion on a gate-based QPU.Previously, the 8 × 8 problem was solved with an adiabatic-inspired quantum algorithm 19 on a four-qubit nuclear magnetic resonance device 20 , and the 32 × 32 problem was tackled with variational quantum linear solver 21 using Rigetti's 16-qubit superconducting chip.

Classical solution
Here we discuss classical algorithms for solving the linear system A x = b, where A has a fixed spectrum and is sparse, which is the case of Eq. 3.
Our analysis is carried out on NTP-type of system, as the most interesting in terms of complexity.
The classical computational cost of solving the system with N × N matrix using advanced numerical methods is no less than O(N 2 ) regardless of mentioned assumptions.The fixed spectrum can be exploited in the iteration eigenvalue algorithms (IEA) 22 .However, such an algorithm involves multiplications of matrices that lead to polynomial scaling in time and memory, since the product of sparse matrices is not guaranteed to be sparse 23 .Then the IEA cannot be performed in less than O(N 2 ) operations.On the other hand, in order to compute log Û one can utilize matrix diagonalization 24 or the inverse scaling and squaring method 25 .Such techniques require at least an exponential number of operations or addresses to matrix elements.If the sparse log Û is obtained, one may use iterative algorithms to solve the linear system afterward, e.g. by applying Kaczmars method 26 , which has also the O(N 2 ) scaling.

Quantum algorithmic advantage
As we have shown before, while the runtime of the classical algorithms that solves NxN linear system is O(N 2 ), the runtime of the hybrid quantum algorithm has O(log N) scaling.We expect that future improvements in classical algorithms and hardware will provide a considerable reduction in runtime and computational resources, however, persistent enhancement of quantum software allows hybrid quantum approaches to consistently outperform classical solutions.Besides, we find that the high-performance circuit emulation 4 is considerably faster than the direct solution: O(N 2 ) vs O(N log N), indicating that simulation of a quantum circuit is essentially an efficient tool for the solution and a proper reference for the quantum hardware to compare.Here, we consider the straightforward Schrödinger-Feynman simulation presented in Ref. 4.
For algorithmic benchmarking, let us compare the efficiency of the 2 17 × 2 17 NTP problem solution implemented on a QPU and a Schrödinger-Feynman simulation implemented on the most significant supercomputers in history.In order to properly compare classical approaches and quantum devices, we must take into account the final fidelity, since the equal-fidelity computation time scales linearly with F XEB as was shown in 4. While the net calculation time on a quantum device with 0.04% fidelity does not exceed 1 ms (assuming several runs to ensure the ancillary qubit is measured in |1 ), the equal-fidelity classical solution would require 7 orders of magnitude higher time for the supercomputers from 1960, 5 orders for devices from 1970 and few times higher for computers from 1990; the detailed comparison is shown in Fig. 4. Note that the estimations for the simulation performance are over evaluated for supercomputers that do not have enough memory to process the whole circuit.We thus omit the final vector recreation runtime since the estimation of such an operation is a highly complicated task.4 | Runtime of the equal-fidelity classical solution of a 2 17 ×2 17 linear system with NTP matrix on historical supercomputers.The fidelity level of a IBMQ Johannesburg processor, which solves the problem in 200 µs (dashed green line), is 0.04%.The runtime was calculated by estimating the whole 20-qubit circuit simulation runtime and multiplying it on fidelity.In case when the supercomputer's memory is not enough we divide the circuit in two and simulate all paths using Schr ödinger-Feynman approach (see Methods).The Atlas supercomputer (1962) would solve the system with the same fidelity in 17 minutes, CDC 7600 (1969) in 30 seconds, Cray-1 (1976) and Cray-2 (1985) in 6.3 seconds and 0.9 seconds, respectively, Intel ASCI Red (1997) in 1.2 ms.While XXth century devices are inferior to the 20-qubit quantum computer, present supercomputers Tiahne-1 (2010) and Summit (2020) surpass the IBMQ QPU.Nonetheless, regular modern laptops (10 GFLOPS) are less efficient than the quantum processor.Runtime fit (dotted red line) indicates the Moore's law.The embedded table shows RAM in the number of qubits, which can be fitted into memory, and performance in FLOPS.
Such a dramatic increase in speed is an experimental evidence that the hybrid quantum computing will surpass the classical computing industry that reigned in the XXth century.
We want to emphasize that our comparison was made with the most straightforward simulations that include matrix multiplications.However, we expect that more advanced quantum simulation approaches, such as tensor networks 27,28 , may allow for a solution in poly-logarithmic time.Nevertheless, this kind of quantum-inspired solution would show the power of quantum algorithms even more vastly.The realization of these methods is a subject of the forthcoming works.

Near-term applications
Finally, we discuss the possible practical applications of the H-HHL algorithm in the framework of the posed problem.Despite the fact that the fidelity level is low, the quantum algorithm still provides the solution with a tremendously higher probability than a random guess.
The exact realization of our algorithm allows for addressing some tangible problems, for instance, in the context of Markov processes, one could obtain the generator of a known stochastic P matrix -a transition rate matrix Q = log P 29 .Then, we can solve the linear equation Q f • ∆t = ∆ f at some point of the process.Here, f is the distribution of state probabilities and ∆ f /∆t is a probability current, which is supposed to be known.Upon obtaining | f and measuring f | M | f we check that the probability of a specific state is non-zero.
An analogous problem can be addressed in the control theory.Let us suppose that a discrete-time process with time step ∆t: x n+1 = A x n is modelled by a continuous one: ˙ x = B x. Since A = exp(B∆t), one could obtain the vector x by solving ˙ x = B x at some point of the process if ˙ x is provided.By averaging proper Hermitian operators x| M |x we determine whether some components of x are non-zero.
The algorithm can be modified by applying matrix simulation techniques for A in order to solve A x = b as it was originally proposed in Ref. 6.In that case, a wider range of problems can be approached.For instance, it was proposed to use the SWAP test in order to determine whether solutions of different systems coincide.Such linear systems are also used to solve partial differential equations, e.g. one can find the electromagnetic field energy in some region using the Poisson equation.One of the promising applications related to deep neural network training was discussed in Ref. 30: since the extension of the Bayesian approach to deep architectures is a serious challenge, one can exploit the hybrid quantum HHL algorithm developed for Gaussian processes in order calculate a model's predictor.

CONCLUSION
We implemented the quantum hybrid HHL algorithm solving a system of linear equation by fast matrix inversion.The matrix, in turn, is approximated by a unitary transformation, which was dictated by the sequences of single-qubit rotations and CNOT gates.The size of the linear system grows exponentially with the increasing number of qubits.Using the state-of-art 50 qubits processor, one can invert the 10 14 × 10 14 matrix, which is far beyond the capabilities of the present state-of-the-art supercomputers.
We probed the algorithm on the simulator with an embedded noise model, which is devoid of correlations in gate errors, and on the real IBMQ QPUs with 5, 15 and 20 qubits.The implemented quantum solution of a 2 17 -dimensional problem, which is a record for a linear system solution on quantum computers, demonstrates algorithmic quantum advantage over classical solutions.
Our experimental results and theoretical estimates hold high potential for solving large linear systems utilizing state-of-the-art NISQ computers and quantum algorithms in general.We intent to probe the H-HHL algorithm both on the cutting edge low-noise QPUs and on highperformance classical simulator machines.We envision that the planned experiments will stimulate major players in the quantum computing industry to demonstrate the hardware and software achieving quantum advantage.

Linear system construction
Here, we consider the implementation of TP 1 , TP 2 and NTP matrix types in more detail.In order to construct quantum circuits we use Qiskit open-source framework 31 that allows us to generate Û from single-qubit and two-qubit CNOT gates.An arbitrary single-qubit gate is decomposed by Qiskit into physical gates: the Û3 operator, which is the most general gate, defines as follows: Since the QPE protocol, which is a vital part of the H-HHL algorithm, involves controlled unitary operations, we need to extend single-qubit gates Ûs and CNOTs in Û to a control ĈU s and Toffoli gates, respectively 5 .However, realization of an arbitrary control single-qubit gate ĈU s requires two CNOTs that leads to highly complex quantum circuit.Thus, we consider subclass of a single-qubit Ûs that the ĈU s requires only one CNOT resulting in dramatic simplification of the H-HHL circuit.
Let us define operator Ûs = Û3 { X, Ŷ, Ẑ} Û † 3 , where any of the Pauli matrices { X, Ŷ, Ẑ} can be chosen.Such a form leads to the following C Ûs representation: where the Pauli gate is applied only when the control qubit is in the excited state, see Fig. 1g.It is clear that Ûs gate is essentially an arbitrary Hermitian Û3 gate, which can be build according to definition Eq. 8 with following condition: Such a definition ensures that all controlled single-qubit gates contain only one twoqubit operation and, since Ûs is a Hermitian gate, it has only two eigenvalues {1, −1}.
Unfortunately, it leads to another problem: studied log( ÛTP1 ), log( ÛTP2 ), log( ÛNTP ) matrices have zero eigenvalues, which is unacceptable for the HHL algorithm 6 .In order to avoid such an issue we add the following correcting gate to the first register of Û: The introduction of a such correcting gate leads to two important and fruitful features: (i) the spectrum of all matrix types is exp[2πi • { 1 8 , 3 8 , 5 8 , 7 8 }]; (ii) Û2 TP1 = Û2 TP2 = Û2 NTP = Û3 (π, 0, 0) ⊗ Î that greatly simplifies the QPE part.
We exploit the knowledge about matrix spectrum in the classical part of the H-HHL algorithm.Since the binary form of 1 2πi log( Û) eigenvalues contain three bits, one needs three phase qubits to store whole spectrum.It is easy to show that the last bit of all eigenvalue is 1 if we write down four eigenvalues in binary representation: λ = {0,1} 2 + {0,1} 4 + 1 8 , where the different combinations of {0, 1} corresponds to different eigenvalues.Therefore, we immediately eliminate one phase qubit without conducting the iterative phase estimation procedure resulting in p = 2. Hence, we can utilize n = [log N] + 3 qubits overall for each matrix type.The 5-qubit examples of ÛTP1 , ÛTP2 and ÛNTP operators are depicted on Fig. 1d, Fig. 1e and Fig. 1f, respectively.

Circuit emulation
Since the classical computational cost of solving Eq. 3 with N × N matrix is no less than O(N 2 ), an alternative method, the high-performance simulation of the H-HHL quantum circuit, should be considered.The simulation that consists of Schrödinger algorithm (SA) 4 , on one hand, is straightforward and provides a great speed in processing low-width circuits in comparison to the other simulations.On the other hand, the Schrödinger algorithm requires a significant amount of RAM when processing manyqubits circuits.Such a requirement is difficult to meet since the memory of modern supercomputers is substantially limited.However, the Schrödinger-Feynman algorithm 4 (SFA) solves the memory issues inherent in a SA, which presumably makes it the fastest simulation of high-deep high-width quantum circuits.
Primarily, in order to estimate the classical simulation runtime, we perform Schrödinger simulations, which are essential building blocks of SFA; the runtime scaling of a single n-qubit SA is T S A = C • n • 2 n .We exploit a POWER8 processor with 160 cores and 512 Gb of RAM 31 in order to estimate the time constant C -evaluating the runtime for each circuit width we fit the dependence and find that C TP1 = 5 • 10 −9 s, C TP2 ≈ 14 • 10 −9 s and C NTP ≈ 17 • 10 −9 s, which we assume are scaled linearly in number of cores.Since modern supercomputers have roughly 100K cores, we expect the constants to be C TP1 ≈ 10 −12 s, C TP2 ≈ 2.8 • 10 −12 s and C NTP ≈ 3.4 • 10 −12 s in the best scenario.
For memory estimates, while the state-of-the-art supercomputers have 3 PB of RAM, we suppose that one can store a 2 47 -dimensional vector using 8 bytes to encode single complex number.Hence, SFA allows us to split the quantum circuit in the ratio (n − n) : n, where n < n − n ≤ 47.The execution time of the Schrödinger-Feynman method is T S FA = C (n − n) • 2 n− n • N p , where N p is the number of paths in final state vector calculation 4 .The scaling constant C is the same as in the Schrödinger algorithm, since we are forced to use almost all RAM and algorithm parallelization becomes an elusive task.
In order to compare 20-qubit QPUs with classical supercomputers, we recalculate constants C in seconds per Flops; for the NTP circuit we found that T S A = 1.5 • 10 5 • n • 2 n /Flops s.For the characterization of the equal-fidelity calculation, we assume that the simulation runtime scales linear with fidelity, therefore the equal-fidelity runtime is F XEB T S A .However, when it is impossible to store the whole vector in the memory we divide our circuit into 16 : 4 ratio.In that case, we obtain a single Toffoli gate on the cut for the single QPE step, which have two control qubits above the cut and controlled qubit bellow the cut.This gate should be decomposed in order to simulate two parts independently and combined afterwards to get a solution.There are 2 2 paths for such Toffoli gate that gives us N p = 2 4 in total (due to QPE and inverse QPE).Using the calculated runtime for 2 4 16-qubit circuit emulations, we estimate the equal-fidelity runtime for the supercomputers and plot the data in Fig. 4.
QTF Centre of Excellence, Department of Applied Physics, Aalto University School of Science, P.O.Box 15100, FI-00076 AALTO, Finland. 4 Materials Science Division, Argonne National Laboratory, 9700 S. Cass Avenue, Argonne, Illinois 60637, USA. 5 Department of Physics, Northern Illinois University, DeKalb IL, 60115, USA. 6Physics Department, City College of the City University of New York, 160 Convent Ave, New York, NY 10031, USA.† e-mail: vv@terraquantum.swiss. 1 arXiv:2003.12770v3[quant-ph] 30 Jun 2021 Fig.1| a.Quantum scheme of the original HHL algorithm with 3 phase qubits that involves the quantum phase estimation protocol (QPE) and the ancilla quantum encoding step (AQE) in order to solve Eq. 3 with a unitary matrix Û.While the QPE part exploits controlled unitary operations C Û, C Û2 , C Û4 and Quantum Fourier Transform in order to process eigenvalues and eigenvectors of Û, the AQE algorithm assists the matrix inversion.b.Quantum scheme of the simplified hybrid HHL algorithm, which provides the solution for the same matrix as the circuit from a.The algorithm involves only 2 phase registers, and the QPE part is significantly reduced.c.The circuit depth as function of problem size for TP 1 (circles), TP 2 (triangles) and NTP (crosses) matrix types processing by the original HHL and by the simplified H-HHL.d. e. f.Quantum circuits that generate 3 types of 2 5 × 2 5 Û: ÛTP1 that involves only single-qubit gates Ûs ; ÛTP2 that localizes two-qubit clusters with a single layer of CNOTs; ÛNTP that entangles all qubits expect the first.All types also contains Û3 operator, which is used for the setting of the matrix spectrum.g.The scheme of a controlled-unitary gate C Ûs of the single-qubit operation Ûs that involves single two-qubit interaction.

Fig. 2 |
Fig.2| Results of the full state tomography at the end of the H-HHL algorithm for different QPUs: IBMQ Burlington, Yorktown (both 5 qubits), Melbourne (15 qubits) and Johannesburg (20 qubits).a.The Hinton plot of an example of a density matrix, which corresponds to the vector register state -ancilla and phase registers were filtered out -that contains the solution of Eq. 5: dashed squares depict the density matrix that corresponds to the ideal solution, solid squares depict the measured density matrix.While blue color indicates positive values, red color indicates negative values; the size of a square shows the absolute value.b.The fidelity F T OM of the algorithm circuit that was averaged over 70 NTP RQCs.Since the quantum volume of Burlington, Yorktown and Melbourne processor is the same V Q = 8, the fidelity level is similar.However, Johannesburg posses slightly higher V Q = 16, which is reflected in the fidelity.

Fig. 3 |
Fig. 3 | Cross entropy fidelity F XEB of the H-HHL algorithm as a function of a total number of qubits characterized for the IBMQ 15-qubit Melbourne and 20-qubit Johannesburg processors.a.The fidelity obtained on a simulator with an embedded depolarizing noise model based on the gate and readout errors measured for the IBMQ Melbourne device (filled crosses) for TP 1 (blue), TP 2 (orange) and NTP (red) matrices.The solid lines indicate the digital error model.It is clear that the simulator, which is devoid of any spatial or temporal correlations of gate errors, is described by the DEM.b.The experimental results measured on the Melbourne QPU (empty crosses); colormap matches the matrix types.The same type of measurements were performed for the 20-qubit Johannesburg processor c. on the simulator with a noise model and d. on the real device.The experimental results are not fitted by DEM by reason of correlations of control errors, which are specific to IBMQ devices.