Quantum circuit design methodology for multiple linear regression

.


Introduction
Quantum algorithms running on quantum computers aim at quickly and efficiently solving several important computational problems faster than classical algorithms running on classical computers [1][2][3][4][5][6][7][8][9][10][11].One key way in which quantum algorithms differ from classical algorithms is that they utilise quantum mechanical phenomena such as superposition and entanglement, that allows us to work in exponentially large Hilbert spaces with only polynomial overheads.This in turn, in some cases, allows for exponential speed-ups in terms of algorithmic complexity [1].
In today's world, machine learning is primarily concerned with the development of low-error models in order to make accurate predictions possible by learning and inferring from training data [12,13].It borrows heavily from the field of statistics in which linear regression is one of the flagship tools.The theory of multiple linear regression or more generally multivariate linear regression was largely developed in the field of statistics in the pre-computer era.It is one of the most well understood, versatile and straightforward techniques in any statistician's toolbox.It is also an important and practical supervised learning algorithm.Supervised learning is where one has some labelled input data samples {x i , y i } i = 1 N (where x i 's are the feature vectors and y i 's are the corresponding labels) and then based on some criteria (which might depend on the context) chooses a mapping from the input set X to the output set Y. That mapping can help to predict the probable output corresponding to an input lying outside of the training data sets.Multiple linear regression is similar in the sense that given some training samples one identifies a closely fitting hyperplane depending on the specific choice of a loss function (the most common one being a quadratic loss function based on the 'least squares' method).Interestingly, any multiple regression problem can be converted into an equivalent system of linear equations problem or more specifically, a quantum linear systems problem (QLSP) [14].The process has been outlined using an example in Section 3.
Suppose that we are given a system of N linear equations with N unknowns, which can be expressed as Ax = b.Now, what we are interested in, is: given a matrix A ∈ ℂ N × N with a vector b ∈ ℂ N , find the solution x ∈ ℂ N satisfying Ax = b (which is A −1 b if A is invertible), or else return a flag if no solution exists.This is known as the linear systems problem (LSP).However, we will consider only a special case of this general problem, in the form of the QLSP [14,15].
The quantum version of the LSP problem, the QLSP, can be expressed as follows.
Let A be a N × N Hermitian matrix with a spectral norm bounded by unity and a known condition number κ.The quantum state on ⌈log N⌉ qubits b⟩ can be given by b⟩ and x⟩ by where b i , x i are, respectively, the ith component of vectors b and x.Given the matrix A (whose elements are accessed by an oracle) and the state b⟩, an output state x ~⟩ is such that ∥ x ~⟩ − x⟩ ∥ 2 ≤ ϵ, with some probability Ω(1) (practically at least 1/2) along with a binary flag indicating 'success' or 'failure' [14].
The restrictions on Hermiticity and spectral norm can be relaxed by noting that, even for a non-Hermitian matrix A, the corresponding This is an open access article published by the IET under the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/)matrix is Hermitian.This implies that we can instead solve the linear system given by which has the unique solution y = 0 x when A is invertible [4].Also, any non-singular matrix can be scaled appropriately to adhere to the given conditions on the eigenspectrum.Note that the case when A is non-invertible has already been excluded by the fact that a known finite condition number exists for the matrix A.
In 2009, Harrow, Hassidim and Lloyd [4] put forward a quantum algorithm (popularly known as the 'HHL algorithm') to obtain information about the solution x of certain classes of linear systems Ax = b.As we know, algorithms for finding the solutions to linear systems of equations play an important role in engineering, physics, chemistry, computer science and economics apart from other areas.The HHL algorithm is highly celebrated in the world of quantum algorithms and quantum machine learning, but there is still a distinct lack of literature demonstrating practical uses of the algorithm and their experimental implementations.Experimentally implementing the HHL algorithm for solving an arbitrary system of linear equations to a satisfactory degree of accuracy remains an infeasible task even today, due to several physical and theoretical restrictions imposed by the algorithm and the currently accessible hardware.Our fundamental aim in this paper is to put forth a neat technique to demonstrate how HHL can be used for one of the most useful tools in statistical modellingmultiple regression analysis.We show that there is a canonical reduction of any multiple linear regression problem into a QLSP that is thereafter amenable to a solution using HHL.It turns out that, after slight modifications and classical pre-processing, the circuit used for HHL can be used for multiple linear regression as well.Here we should point out, in Scott Aaronson's words [16], that the HHL algorithm is mostly a template for other quantum algorithms and it has some caveats that must be kept in mind during the circuit design and physical implementation step, to preserve the exponential speedup offered by the algorithm.Nonetheless, these issues are not always detrimental and can often be overcome by appropriately preparing and justifying the circuit design methodology for specific purposes, as we will do here.There is another brief discussion related to these issues in the conclusions section of the paper.
In Section 4, we present an application (in the context of multiple regression) of a modified version of the earlier circuit design by Cao et al. [17] which was meant for implementing the HHL algorithm for a 4 × 4 linear system on real quantum computers.This circuit requires only 7 qubits and it should be simple enough to experimentally verify it if one gets access to a quantum processor having logic gates with sufficiently low error rates.Previously, Pan et al. demonstrated HHL on a 4-qubit NMR quantum computer [18], so we believe that it will be easily possible to experimentally implement the circuit we discuss, given the rapid rise in the number of qubits in quantum computer chips, in the past few years.
We also note that although HHL solves the QLSP for all matrices A or it can be efficiently implemented only when they are sparse and well-conditioned (the sparsity condition may be slightly relaxed) [14].In this context, 'efficient' means 'at most polylogarithmic in system size'.A N × N matrix is called s-sparse if it has at most s non-zero entries in any row or column.We call it simply sparse if it has at most poly(log N) entries per row [15].We generally call a matrix well-conditioned when its singular values lie between the reciprocal of its condition number (1/κ) and 1 [4].Condition number κ of a matrix is the ratio of largest to smallest singular value and is undefined when the smallest singular value of A is 0. For Hermitian matrices, the eigenvalue magnitudes are equal to the magnitudes of the respective singular values.At this point, it is important to reiterate that unlike the output A −1 b of a classical linear system solver, the output copy of x ~⟩ does not provide access to the coordinates of A −1 b.Nevertheless, it allows for sampling from the solution vectors like ⟨x ~M x ~⟩, where M is a quantum-mechanical operator.This is one main difference between solutions of the LSP and solutions of the QLSP.We should also keep in mind that reading out the elements of x ~⟩ in itself takes O(N) time.Thus, a solution to QLSP might be useful only in applications where just samples from the vector x ~⟩ are needed [4,14,15].The best existing classical matrix inversion algorithm involves the Gaussian elimination technique which takes O(N 3 ) time.For ssparse and positive semi-definite A, the conjugate gradient algorithm [19] can be used to find the solution vector x in O(Nsκlog(1/ϵ)) time by minimising the quadratic error function Ax − b 2 , where s is the matrix sparsity, κ is the condition number and ϵ is the desired precision parameter.On the other hand, the HHL algorithm scales as O(log(N)s 2 κ 2 /ϵ), and is exponentially faster in N but polynomially slower in s and κ.In 2010, Andris Ambainis further improved the runtime of HHL to O(κlog 3 κlog N /ϵ) [20].The exponentially worse slowdown in ε was also eliminated by Childs et al. in 2017 [14] and it got improved to O(sκpolylog(sκ/ϵ)) [15].Since HHL has logarithmic scaling only for sparse or low-rank matrices, in 2018, Wossnig et al. [21] extended the HHL algorithm with quantum singular value estimation and provided a quantum linear system algorithm for dense matrices which achieves a polynomial improvement in time complexity, that is, O( Npolylog(N)κ 2 /ϵ) (HHL retains its logarithmic scaling only for sparse or low-rank matrices).Furthermore, an exponential improvement is achievable with this algorithm if the rank of A is polylogarithmic in the matrix dimension.
Last but not least, we note that it is assumed that the state b⟩ can be efficiently constructed, i.e. prepared in 'polylogarithmic time'.In reality, however, efficient preparation of arbitrary quantum states is hard and is subject to several constraints.

Harrow Hassidim Lloyd algorithm
The HHL algorithm consists of three major steps which we will briefly discuss one by one.Initially, we begin with a Hermitian matrix A and an input state b⟩ corresponding to our specific system of linear equations.The assumption that A is Hermitian may be dropped without loss of generality since we can instead solve the linear system of equations given by which has the unique solution y = 0 x when A is invertible.This transformation does not alter the condition number (ratio of the magnitudes of the largest and smallest eigenvalues) of A [4,14].However, in the case our original matrix A is not Hermitian, the transformed system with the new matrix needs oracle access to the non-zero entries of the rows and columns of A [14].Since A is assumed to be Hermitian, it follows e iAt e −iAt = e iAt − iAt = e 0 = I.Moreover, e iAt shares all its eigenvectors with A, while its eigenvalues are e iλ j t if the eigenvalues of A are taken to be λ j .Suppose that u j ⟩ are the eigenvectors of A and λ j are the corresponding eigenvalues.We recall that we assumed all the eigenvalues to be of magnitude <1 (spectral norm is bounded by unity).As the eigenvalues λ j are of the form 0.a 1 a 1 a 3 … in binary [1, p. 222], we will use λ j ⟩ to refer to a 1 a 2 a 3 …⟩.We know from the spectral theorem that every Hermitian matrix has an orthonormal basis of eigenvectors.So, in this context, A can be re-written as ∑ j λ j u j ⟩⟨u j (via eigendecomposition of A) and b⟩ as ∑ j β j u⟩ j .

Phase estimation
The quantum phase estimation algorithm performs the mapping where u⟩ is an eigenvector of a unitary operator U with an unknown eigenvalue e i2πφ [1].φ ~ is a tbit approximation of φ, where t is the number of qubits in the clock register.The superscripts on the kets indicate the names of the registers which store the corresponding states.In the HHL algorithm, the input register begins with a superposition of eigenvectors instead, i.e. b⟩ = ∑ j β j u j ⟩ instead of a specific eigenvector u⟩, and for us the unitary operator is e iAt .So, the phase estimation circuit performs the mapping C where λ ~j′s are binary representations of the eigenvalues of A to a tolerated precision.To be more explicit, here λ ~j is represented as b 1 b 2 b 3 …b t (t being number of qubits in the clock register) if the actual binary equivalent of λ j is of the form λ = 0.b 1 b 2 b 3 ….To avoid the factor of 2π in the denominator, the 'evolution time' t 0 is generally chosen to be 2π.However, t 0 may also be used to 'normalise' A (by re-scaling t 0 ) in case the spectral norm of A exceeds 1 [Ideally, we should know both the upper bound and the lower bound of the eigenvalues, for effective re-scaling.Furthermore, to get accurate estimates, we should attempt to spread the possible values of λt over the whole 2π range.].Additionally, an important factor in the performance of the algorithm is the condition number κ.As κ grows, A tends more and more towards a non-invertible matrix, and the solutions become less and less stable [4].Matrices with large condition numbers are said to be 'illconditioned'.The HHL algorithm generally assumes that the singular values of A lie between 1/κ and 1, which ensures that the matrices we have to deal with are 'well-conditioned'.Nonetheless, there are methods to tackle ill-conditioned matrices and those have been thoroughly discussed in the paper by Lloyd and co-workers [4].It is worth mentioning that in this step the 'clock register'controlled Hamiltonian simulation gate U can be expressed as C ⊗ e iAτt 0 /T , where T = 2 t and evolution time t 0 = O(κ/ϵ).Interestingly choosing t 0 = O(κ/ϵ) can at worse cause an error of magnitude ϵ in the final state [4].

R λ ~−1 rotation
A 'clock register' controlled σ y -rotation of the 'ancilla' qubit produces a normalised state of the form S These rotations, conditioned on respective λ ~j, can be achieved by the application of the exp( − iθσ y ) = cos θ −sin θ sin θ cos θ operators where θ = cos −1 C/λ ~j .C is a scaling factor to prevent the controlled rotation from being unphysical [15].That is, practically C < λ min is a safe choice, which may be more formally stated as C = O(1/κ) [4] (Fig. 1).

Uncomputation
In the final step, the inverse quantum phase estimation algorithm sets back the clock register to ( 0⟩ ⊗ n ) C and leaves the remaining state as Postselecting on the ancilla 1⟩ S gives the final state . The inverse of the Hermitian matrix A can be written as ∑ j 1/λ j u j ⟩⟨u j , and hence A −1 b⟩ matches ∑ j = 1 N β j /λ ~j u j ⟩ I .This outcome state, in the standard basis, is component-wise proportional to the exact solution x of the system Ax = b [17].

Linear regression utilising HHL
Linear regression models a linear relationship between a scalar 'response' variable and one or more 'feature' variables.Given a nunit data set {y i , x i1 , …, n , a linear regression model assumes that the relationship between the dependent variable y and a set of p attributes, i.e. x = {x 1 , …, x p } is linear [12].Essentially, the model takes the form where ϵ i is the noise or error term.Here i ranges from 1 to n. x i T denotes the transpose of the column matrix x i .x i T β is the inner product between vectors x i and β.These n equations may be more compactly represented in the matrix notation, as y = X β + ϵ.Now, we will consider a simple example with three feature variables and a (3) (4) (5) To estimate β we will use the popular 'least squares' method, which minimises the residual sum of squares If X is positive definite (and in turn has full rank), we can obtain a unique solution for the best fit β ^, which is (X T X) −1 X T y.It can happen that all the columns of X are not linearly independent and by extension X is not full rank.This of situation might occur if two or more of the feature variables are perfectly correlated.Then X T X would be singular and β ^ would not be uniquely defined.Nevertheless, there exist techniques like 'filtering' to resolve the non-unique representations by reducing the redundant features.Rank deficiencies might also occur if the number of features p exceeds the number of data sets N. If we estimate such models using 'regularisation', then redundant columns should not be left out.The regulation takes care of the singularities.More importantly, the final prediction might depend on which columns are left out [22].
Equations ( 3)-( 6) may be expressed in the matrix notation as Note that unlike common convention, our representation of X does not contain a column full of 1's corresponding the bias term.This representation is used simply because of the convenient form that we obtain for X T X.The final result remains unaffected as long as y = X β represents the same linear system.Now Thus, we need to solve for β ^ from X T X β ^= X T y [12].

Quantum circuit
Having discussed the general idea behind the HHL algorithm in Section 2 and its possible application in drastically speeding up multiple regression in Section 3, we now move on to the quantum circuit design meant to solve the 4 × 4 linear system which we encountered in Section 3, i.e.X T X β ^= X T y.For sake of convenience, we will now denote X T X with A, β ^ with x and X T y with b.The circuit requires only 7 qubits, with 4 qubits in the 'clock register', 2 qubits in the 'input register' and the remaining 1 as an 'ancilla' qubit.At this point, it is imperative to mention that we specifically chose the form of the regression data points in the previous section such that A turns out to be Hermitian, has four distinct eigenvalues of the form λ i = 2 i − 1 and b has a convenient form which can be efficiently prepared by simply using two Hadamard gates A is a Hermitian matrix with eigenvalues λ 1 = 1, λ 2 = 2, λ 3 = 4 and λ 4 = 8.The corresponding eigenvectors encoded in quantum states u j ⟩ may be expressed as T can be written as ∑ j = 1 j = 4 β j u j ⟩ where each β j = (1/2).We will now trace through the quantum circuit in Fig. 2. q 0 ⟩ and q 1 ⟩ are the input register qubits which are initialised to a combined quantum state which is basically the state-encoded format of b.This is followed by the quantum phase estimation step which involves a Walsh-Hadamard transform on the clock register qubits j 0 ⟩, j 1 ⟩, j 2 ⟩, j 3 ⟩, a clock register controlled unitary gates U 2 0 , U 2 1 , U 2 2 and U 2 3 , where U = exp(iAt/16) and an inverse quantum Fourier transform on the clock register.As discussed in Section 2, this step would produce the state , which is essentially the same as ∑ j = 1 N β j u j ⟩ I λ ~jt 0 /2π C , assuming t 0 = 2π.Also, in this specific example λ ~j⟩ = λ j ⟩, since the 4 qubits in the clock register are sufficient to accurately and precisely represent the 4 eigenvalues in binary.As far as the endianness of the combined quantum states is concerned, we must keep in mind that in our circuit q 0 ⟩ is the most significant qubit and q 3 ⟩ is the least significant qubit.Next is the R(λ ~−1 ) rotation step.We make use of an ancilla qubit s⟩ (initialised in the state 0⟩), which gets phase shifted depending upon the clock register's state.Let us take an example clock register state (binary representation of the eigenvalue corresponding to u 3 ⟩, that is 4).In this combined state, q 1 ⟩ is in the state 1⟩ while q 0 ⟩, q 2 ⟩ and q 3 ⟩ are all the in state 0⟩.This state will only trigger the R y (2π/2 r ) rotation gate, and none of the other phase shift gates.Thus, we may say that the smallest eigenvalue states in C cause the largest ancilla rotations.Using linearity arguments, it is clear that if the clock register state had instead been b⟩, as in our original example, the final state generated by this rotation step would be For this step, an a priori knowledge of the eigenvalues of A was necessary to design the gates.For more general cases of eigenvalues, one may refer to [17].

Simulation
We simulated the quantum circuit in Fig. 2 using the Qiskit Aer QasmSimulator backend [23], which is a noisy quantum circuit simulator backend.One of the main hurdles while implementing the quantum program was dealing with the Hamiltonian simulation step, i.e. implementing the controlled unitary U = e iAt .Taking a cue from the Cao et al. paper [17] which employed the group leaders optimisation algorithm (GLOA) [24], we approximately decomposed the U gate into elementary quantum gates, as shown in Fig. 3.It is however important to keep in mind that using GLOA to decompose the U is useful only when the matrix exponential e iAt is readily available.Let us call the resulting approximated unitary U.The parameters of the R x and R zz gates given in [17] were refined using the scipy.optimize.minimizefunction [25], to minimise the Hilbert-Schmidt norm of U − U (which is a measure of the relative error between U and U).The scipy.optimize.minimizefunction makes use of the quasi-Newton algorithm of Broyden, Fletcher, Goldfarb and Shanno [26] by default.Also, we noticed that it is necessary to use a controlled-Z gate instead of a single qubit Z gate (as in [17]).
A sample output of the Qiskit code has been shown in Fig. 4. It is clear from the low error value that the gate decomposition we used for U helps to approximately replicate the ideal circuit involving U. Also, the difference between the predicted and simulated results (with shots = 10 5 ) has been shown in Fig. 5.
One should remember that using genetic algorithms like GLOA for the Hamiltonian simulation step is a viable technique only when cost of computing the matrix exponential e iAt is substantially lower than the cost of solving the corresponding linear system classically.A popular classical algorithm for calculating matrix exponentials is the 'the scaling and squaring algorithm' by Al-Mohy and Higham [27][28][29], whose cost is O(n 3 ), and which is generally used along with the Padé approximation by Matlab [30] and SciPy [25].However, this algorithm is mostly used for small dense matrices.For large sparse matrices, better approaches exist.For instance, in the Krylov space approach [31,32], an Arnoldi algorithm is used whose cost is in the ballpark of O(mn), where n is the matrix size and m is the number of Krylov vectors which need to be computed.In general, for large sparse matrices, GLOA may be useful but that decision needs to be made on a case-by-case basis depending on the properties of the matrix A.

Discussion and conclusions
We have noticed that any multiple linear regression problem can be reduced to an equivalent QLSP.This allows us to utilise the general quantum speed-up techniques like the HHL algorithm.However, for HHL we need low-error low-cost circuits for the Hamiltonian simulation and controlled rotation steps to get accurate results.There already exist well-defined deterministic approaches like the celebrated Solovay-Kitaev and Trotter-Suzuki algorithms [33][34][35][36] that can help to decompose the Hamiltonian simulation unitary U up to arbitrary accuracy.However in most cases, they provide neither minimum cost nor efficiently generable gate sequences for engineering purposes.The Solovay-Kitaev algorithm, which scales as O(log(1/ϵ)), takes advantage of the fact that quantum gates are elements of the unitary group U(d), which is a Lie group and a smooth differentiable manifold, in which one can do precise calculations regarding the geometry and distance between points.Similarly, the Trotter-Suzuki algorithm uses a product formula that can precisely approximate the Hamiltonian exponential, and which exploits the structure of commutation relations in the underlying Lie algebras.In practical engineering scenarios, however, one would prefer a low-cost gate sequence that approximates the unitary just well enough, rather than a high-cost and exact or almost exact decomposition.This is where stochastic genetic algorithms like GLOA come into play.These are heuristic algorithms that do not have welldefined time complexities and error bounds.Given a particular matrix A, if the time taken to find a circuit approximation for the Hamiltonian evolution is (say) polynomial-time O(n c ) for some constant c, the exponential speedup disappears.This remains an issue with GLOA because we still need to classically compute the 2 n × 2 n unitary matrix of the evolution we are trying to approximate.The real advantage of GLOA shows up when dealing with large data sets, as the standard GLOA [17,24] restricts the number of gates to a maximum of 20 (a detailed explanation of GLOA is available in the supplementary material).So, while the number of linear equations and the size of the data sets can be arbitrarily large, it will only ever use a limited number of gates for approximating the Hamiltonian simulation.This property of GLOA ensures that time taken for Hamiltonian simulation remains nearly independent of the size of the data sets as they grow large.Our circuit design technique will be useful for engineering purposes, particularly when the regression analysis is to be done on extensive amounts of data.Using GLOA, it is also often possible to incorporate other desired aspects of circuit design into the optimisation, like specific choices of gate sets.The maximum size of data set on which we can use this approach is only limited by the number of qubits available in the quantum processors.
It is noteworthy that regression analysis is widely used in various scientific and business applications, as it enables one to understand the relationship between variable parameters and make predictions about unknowns.Every day in the actual world, the handling of larger and larger data sets craves for faster and more efficient approaches to such analyses.It is natural to ask whether it might be possible to leverage quantum algorithms for this purpose.The answer seems to be a yes, although there remain several technical and engineering challenges; the dearth of sufficient experimental realisations is testimony to this.Our work on quantum multiple linear regression, which is a first of its kind, shows a practical circuit design method that we believe will pave the path for more viable and economic experimental implementations of quantum machine learning protocols, addressing problems in business analytics, machine learning, and artificial intelligence.In particular, this work opens a new door of possibilities for time-and cost-efficient experimental realisations of various quantum machine learning protocols, exploiting heuristic approaches like GLOA, which will make incrementally large amounts of data tractable.

Fig. 1
Fig. 1 HHL algorithm schematic [The HHL algorithm schematic (Fig. 1) was generated using the TikZ code provided by Dr. Niel de Beaudrap (Department of Computer Science, Oxford University).It was inspired by figure 5 of the Dervovic et al. paper [15].]

Fig. 2 Fig. 3
Fig. 2 Quantum circuit for solving a 4 × 4 linear system Ax = b.Here f = exp(iAt/16) and τ = π/2 r .The top qubit ( s⟩) is the ancilla qubit.The four qubits in the middle ( j 0 ⟩, j 1 ⟩, j 2 ⟩ and j 3 ⟩) stand for the clock register C. The two qubits at the bottom ( q 0 ⟩ and q 1 ⟩) form the input register I and two Hadamard gates are applied on them to initialise the state b⟩

Fig. 4 Fig. 5
Fig. 4 Sample output of the Qiskit simulation 2, pp.55-61 This is an open access article published by the IET under the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/)financial support of IISER Kolkata.The authors acknowledge the support of the Qiskit SDK in enabling them to simulate the quantum circuits.They appreciate the assistance received from Charles Moussa (Oak Ridge National Lab, Tennessee) in comprehending the intricacies of the GLOA algorithm.They also acknowledge the help received from members of the Quantum Computing Stack Exchange and the Qiskit team in various stages of the research project.