Boosted Ensembles of Qubit and Continuous Variable Quantum Support Vector Machines for B Meson Flavour Tagging

The recent physical realisation of quantum computers with dozens to hundreds of noisy qubits has given birth to an intense search for useful applications of their unique capabilities. One area that has received particular attention is quantum machine learning (QML), the study of machine learning algorithms running natively on quantum computers. Such algorithms have begun to be applied to data intensive problems in particle physics, driven by the expected increased capacity for pattern recognition of quantum computers. In this work we develop and apply QML methods to B meson flavour tagging, an important component of experiments in particle physics which probe heavy quark mixing and CP violation in order to obtain a better understanding of the matter-antimatter asymmetry observed in the universe. We simulate boosted ensembles of quantum support vector machines (QSVMs) based on both conventional qubit-based and continuous variable architectures, attaining effective tagging efficiencies of 28.0% and 29.2% respectively, comparable with the leading published result of 30.0% using classical machine learning algorithms. The ensemble nature of our classifier is of particular importance, doubling the effective tagging efficiency of a single QSVM, which we find to be highly prone to overfitting. These results are obtained despite the strong constraint of working with QSVM architectures that are classically simulable, and we find evidence that continuous variable QSVMs beyond the classically simulable regime may be able to realise even higher performance, surpassing the reported classical results, when sufficiently powerful quantum hardware is developed to execute them.

In this work we introduce ensembles of boosted quantum support vector machines (QSVMs) as a technique for performing B meson flavour tagging near the level of state-of-the-art classical algorithms. QSVMs are powerful classifiers which both have the potential to exponentially outperform their classical counterparts [42] and are capable of being implemented on near-term NISQ devices [34], making them attractive candidates to study in the search for practically useful applications of QML. Indeed, QSVMs have already been employed in the analysis of high energy physics data, for example to distinguish signal from background in neutral B meson decays [34] and ttH production [37]. Here we apply them for the first time to full-scale B factory simulated data, with no simplifying assumptions about the number or type of decay products resulting from the collision, and including background derived from real experimental conditions, in order to perform direct comparisons with the latest classical techniques. This necessitates using QSVMs which accept 130-dimensional datapoints as input, the simulation of which involves extensive classical computing resources. Performing such large-scale simulations is an important milestone on the journey to QML algorithms which genuinely outperform their classical counterparts in real-world scenarios.
Our investigation of QSVMs for B meson flavour tagging is performed by benchmarking two implementations: conventional qubit based QSVMs (see Figure 1(a,b)), and QSVMs based on the promising continuous variable (CV) quantum computing model [21, 35, 43-45] (Figure 1(c,d)). In contrast to quantum computers built from qubits (as realised for example via superconducting circuits [46], donor implantation in semiconductors [47] and trapped ions [48]), CV quantum computers operate on bosonic modes, realised in quantum optical systems [44, 45]. Our decision to employ CV quantum computers is inspired by the observation that they can naturally emulate the highly successful radial basis function kernel [49] popularly employed in classical SVMs, which can be used as a starting point for developing more interesting quantum kernels. Indeed, we find that QSVMs constructed from Gaussian operations in the CV picture (CV-QSVMs) can outperform QSVMs constructed in the usual qubit model out of single qubit rotations and entangling 2-qubit controlled rotations, achieving tagging efficiencies of 29.2% and 28.0% respectively, approaching the peak tagging efficiency published using classical machine learning techniques (30.0% using fast boosted decision trees [5]). Crucial to the success of the QSVMs are the twin applications of ensemble learning and boosting. By constructing ensembles of 200 QSVMs and averaging the results, while also boosting each individual QSVM using the AdaBoost algorithm [50], we achieve drastic increases in performance over a single, non-boosted QSVM (14.6% and 14.0% for qubit and continuous variable QSVMs respectively). Thus, boosted ensembles transform a weak quantum classifier into one which is commensurate with powerful classical ML techniques. Moreover, our results are achieved while working solely with classically simulable QSVMs, severely restricting the available choices for the design of the CV-QSVMs. By analysing the performance of the CV-QSVMs in a simplified
setting resulting from a dimensionality reducing PCA transformation [51] we argue that CV-QSVMs beyond the classically simulable regime may be able to outperform the reported classical results, when quantum computing hardware becomes capable enough to run them.

II. METHODS
A. Quantum Support Vector Machines

Support vector machines (SVMs) are linear classifiers on data which has typically been non-linearly mapped into a high dimensional feature space [52]. Given two classes of data, an SVM attempts to find a hyperplane in the feature space which maximally separates them. In a quantum support vector machine (QSVM) the mapping is into a quantum Hilbert space, x → |ψ(x)⟩⟨ψ(x)|. QSVMs were one of the first QML algorithms to be introduced [18], and are thought to offer the potential for quantum advantage due to their ability to implement feature maps which make use of the exponentially large Hilbert spaces available to a quantum computer, hopefully making the embedding of the data into linearly separable subsets possible. Indeed, it has been demonstrated that in principle QSVMs can offer an exponential speed-up over their classical counterparts by efficiently detecting patterns equivalent to solving problems which are not thought to be classically solvable in polynomial time [42], although it remains unclear how often such dramatic benefits will be seen in practice on real-world data. Despite the absence of robust theoretical guarantees, however, early work applying QSVMs to problems in high energy physics has shown that they can achieve results competitive with classical methods, at least on small-scale data [34]. Importantly, one never has to explicitly read out any (exponentially large) quantum states, as an SVM does not explicitly utilise the embedding of individual datapoints into the embedding space, but rather only the pair-wise inner products between the embedded datapoints, information which is readily available to quantum computers. Specifically, given an embedding ψ, a QSVM is a function only of the kernel matrix K_ij = |⟨ψ(x_i)|ψ(x_j)⟩|² over all pairs (x_i, x_j) of training events. Having used a quantum computer (or in our case, a simulation of a quantum computer) to calculate K, the rest of the procedure is an entirely classical algorithm implemented within many standard ML frameworks. In this work we utilise the implementation of scikit-learn [53].
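This division of labour can be sketched with scikit-learn's precomputed-kernel interface. In the sketch below a classical placeholder function stands in for the quantum kernel, purely for illustration; the data and the `mock_quantum_kernel` helper are hypothetical, not part of this work:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_train = rng.normal(size=(40, 5))
y_train = (X_train[:, 0] > 0).astype(int)

def mock_quantum_kernel(A, B):
    """Placeholder for K_ij = |<psi(a_i)|psi(b_j)>|^2 from a quantum device."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2)

# The "quantum" step: compute the kernel matrix over all training pairs.
K_train = mock_quantum_kernel(X_train, X_train)

# The classical step: an ordinary SVM fit on the precomputed kernel.
svm = SVC(kernel="precomputed").fit(K_train, y_train)

# At test time only test-vs-train kernel entries are needed.
X_test = rng.normal(size=(10, 5))
K_test = mock_quantum_kernel(X_test, X_train)  # rows: test, cols: train
preds = svm.predict(K_test)
```

Swapping `mock_quantum_kernel` for a routine that estimates state overlaps on quantum hardware (or a simulator) leaves the classical half of the pipeline unchanged.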
The key component of a QSVM, then, is the map which embeds a datapoint x into the Hilbert space of the quantum computer. Here we implement this map in two different ways, considering both circuits constructed from single qubit rotations and two qubit entangling gates in the standard qubit picture (see Figure 1(a,b)), and circuits constructed from alternating layers of displacement and squeezing operations in the continuous variable (CV) picture of quantum computing [43] (see Figure 1(c,d)). Explicitly, in the qubit case we have (for a circuit with d layers) where H is the Hadamard gate, R_z^j a z rotation on the jth qubit (similarly for R_y^j) and CR_x^{a,b} a controlled x rotation with control qubit a and target qubit b. We work with n = 10 qubits. While our qubit based QSVMs are quite standard and similar to designs which have previously appeared in the literature [34], CV based QSVM architectures are comparatively understudied, and so we now describe those models in detail.
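The fidelity-kernel idea behind such circuits can be illustrated with a deliberately simplified toy encoding (a one-layer, entanglement-free sketch, not the 52-layer architecture used in this work): encode each feature on its own qubit as R_z(x_j)H|0⟩ and take the squared overlap of the resulting product states.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)  # Hadamard gate

def rz(theta):
    """Single-qubit z rotation."""
    return np.array([[np.exp(-1j * theta / 2), 0],
                     [0, np.exp(1j * theta / 2)]])

def embed(x):
    """Map a feature vector to a product state, one qubit per feature."""
    state = np.array([1.0 + 0j])
    for xj in x:
        qubit = rz(xj) @ H @ np.array([1.0 + 0j, 0.0 + 0j])
        state = np.kron(state, qubit)
    return state

def kernel(x1, x2):
    """Fidelity kernel |<psi(x1)|psi(x2)>|^2."""
    return abs(np.vdot(embed(x1), embed(x2))) ** 2

x = np.array([0.3, 1.2, -0.7])
assert np.isclose(kernel(x, x), 1.0)  # identical inputs give perfect overlap
```

For this toy map the kernel factorises as a product of cos²((x_j − x'_j)/2) terms; the deep, entangling layers of the actual architecture are precisely what take the kernel beyond such simple forms.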

B. Continuous Variable Quantum Support Vector Machines

Continuous variable quantum computers [21, 43-45] have emerged as a candidate architecture for large-scale quantum computers, developing in parallel to the mainstream qubit based systems, and are defined by their use of continuous quantum variables: quantum systems with continuous degrees of freedom. Concretely, the fundamental units of a CV quantum computer are qumodes (as opposed to qubits), which are states of the countably infinite dimensional bosonic Fock space F_+(C), with the total Hilbert space of an n qumode CV quantum computer then being H = (F_+(C))^⊗n. Experimentally, such systems are most readily constructed within the framework of quantum optics [54], with each qumode corresponding to a different mode of quantised light, representable by a quantum harmonic oscillator. The state of a single such qumode |ψ⟩_qumode may be written in the photon number basis, with transitions between states of definite photon number implemented by the usual harmonic oscillator annihilation and creation operators a and a†. Our continuous variable quantum support vector machines (CV-QSVMs) involve embedding the event data x into states in H via parametrised operators built from a and a†. In particular, we employ multi-mode displacement and squeezing operations, respectively defined in Equations 5 and 6, where a_i (a†_i) is the annihilation (creation) operator acting on the ith qumode, and β and γ are hyperparameters which we set to be β = γ = 0.1. The action of the squeezing and displacement operators may be visualised by considering their effect on the Wigner functions W(x, p) of the states |ψ⟩, with |x⟩ and |p⟩ being eigenstates of the position and momentum operators x = (â† + â)/√2 and p = i(â† − â)/√2 respectively.
The Wigner function is a quasi-probability distribution on phase space; the Wigner functions of a single mode (initially in the vacuum state) subjected to displacement and squeezing are shown in Figure 1(d), explaining the nomenclature "displacement" and "squeezing".
As previously discussed, the contribution of the quantum computer in the QSVM algorithm is to carry out the mapping x → |ψ(x)⟩ which embeds a datapoint x into the Hilbert space of the quantum computer, and to take inner products of embedded datapoints. In the CV setting we implement this mapping with circuits constructed from alternating layers of displacement and squeezing operations, for various values of l (see Figure 1(c)). We simulate the circuits within the framework of bosonic-qiskit [55], which allows us to represent a qumode as a set of qubits. Although each qumode is formally a state of an infinite dimensional Hilbert space (Equation 3), we can in practice truncate these spaces at some dimension D and represent each qumode with log_2 D qubits. Further details may be found in Ref. [55]. In this work we take D = 8, thus assigning three physical qubits to each logical qumode.
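The truncation can be made concrete in a few lines of NumPy: with D = 8 Fock levels the annihilation operator becomes an 8 × 8 matrix, and a small displacement of the vacuum produces an approximate coherent state whose mean photon number matches |α|² to high accuracy. This is a sketch of the truncation idea only, not of the bosonic-qiskit internals:

```python
import numpy as np
from scipy.linalg import expm

D = 8                               # Fock-space truncation (three qubits)
n = np.arange(1, D)
a = np.diag(np.sqrt(n), k=1)        # truncated annihilation operator
adag = a.conj().T                   # truncated creation operator

def displace(alpha):
    """Truncated displacement operator D(alpha) = exp(alpha a^dag - alpha* a)."""
    return expm(alpha * adag - np.conj(alpha) * a)

vacuum = np.zeros(D)
vacuum[0] = 1.0
psi = displace(0.1) @ vacuum        # displaced vacuum (coherent state)

# For |alpha| << sqrt(D) the truncation error is negligible and the
# mean photon number is close to |alpha|^2 = 0.01.
mean_photons = np.real(psi.conj() @ adag @ a @ psi)
```

Since the truncated generator is still anti-Hermitian, the truncated displacement remains exactly unitary, so the state stays normalised even at finite D.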

C. B Meson Flavour Tagging
B0–B̄0 pairs are routinely created by asymmetric electron-positron collisions at the Υ(4S) resonance at the Belle-II experiment (see Figure 2) [9] in order to investigate some of the less well understood aspects of the Standard Model, including CP violating effects [56-58] and the possibility of lepton flavour non-universality [2, 4, 59]. Flavour tagging is the process of determining the quark flavour content of each of the B mesons, and is a critical step in the analysis of many experiments which probe CP violation and B0–B̄0 mixing [5]. It utilises the maximally entangled form of the coherent state |Ψ⟩ prepared by the e+e− → Υ(4S) → B0B̄0 transition, and the resulting perfect anticorrelation of the flavours of the two mesons, to determine the flavour of a B meson which is involved in otherwise ambiguous (neutral) decays. For example, to investigate the asymmetry in the decay rates Γ of B0 and B̄0 mesons to the CP eigenstate π0π0, one can infer the flavour of the B meson which underwent the decay to the neutral final state π0π0 (the signal side meson) by determining the flavour of the other B meson (the tag side meson) and then invoking the anticorrelation of the flavours of the B mesons at the point of their creation (see Figure 1 and Ref. [9]). The accuracy with which one can carry out the flavour tagging process directly impacts upon the precision of measurements of CP asymmetries such as in Equation 10 [9]. Improved flavour tagging algorithms therefore allow for increasingly sharp tests of SM predictions, and may lead to the discovery of New Physics beyond the SM [56, 57].
As an example of flavour tagging we consider the semileptonic decay B0 → X l+ νl for some hadron X (see Figure 3). In this case the positive charge of the primary lepton l+ unambiguously determines the flavour of the parent B meson (c.f. the charge conjugate process B̄0 → X l− ν̄l, also shown in Figure 3). Unfortunately the charge of a resulting lepton alone is in general insufficient to determine the flavour, as for example the hadron X may itself decay, emitting a secondary lepton of the opposite charge to the primary lepton. Such a secondary decay product will however have a different momentum distribution to that of the primary lepton, and by combining the momenta and particle types of all of the decay products we can hope to infer the flavour with high probability. In practice this has been most successfully accomplished by feeding all of the available information from the event into a classical ML model, with both fast boosted decision trees and deep neural networks having been employed successfully [5, 9, 60].
Given an event E, a flavour tagger customarily outputs a prediction qr ∈ [−1, 1], with q ∈ {−1, 1} denoting the predicted flavour, and r = |qr| the confidence of the prediction. The performance of a flavour tagger is typically measured by its effective tagging efficiency ϵ_eff on a set of test events, defined as ϵ_eff = Σ_i ϵ_i (1 − 2w_i)², where i indexes mutually orthogonal bins of events corresponding to predictions of various levels of confidence, ϵ_i is the fraction of all events falling in the ith bin, and w_i is the fraction of events in the ith bin which are misclassified by the flavour tagger. In this work we employ seven bins, consistent with the Belle-II convention [5, 9]. The definition of tagging efficiency given in Equation 11 is employed as the figure of merit of flavour tagging algorithms (rather than, say, the raw accuracy) due to the observation [9] that the statistical uncertainty σ of CP asymmetry measurements scales approximately as σ ∝ ϵ_eff^{−1/2}. As a major goal of flavour tagging is to serve as a step in the analysis of CP violating decays, maximising the tagging efficiency becomes the primary goal when training a classifier. Previously, (classical) fast boosted decision trees and deep neural networks have been employed to achieve effective tagging efficiencies of 30.0% and 28.8% respectively [5].
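This figure of merit is straightforward to compute from binned predictions. The sketch below assumes the standard definition ϵ_eff = Σ_i ϵ_i (1 − 2w_i)²; the bin edges shown are illustrative stand-ins for the Belle-II convention rather than the exact intervals used in this work:

```python
import numpy as np

def effective_tagging_efficiency(qr_pred, flavour_true, edges):
    """eps_eff = sum_i eps_i * (1 - 2*w_i)^2 over confidence bins.

    qr_pred: tagger outputs in [-1, 1]; flavour_true: labels in {-1, +1}.
    """
    r = np.abs(qr_pred)          # confidence of each prediction
    q = np.sign(qr_pred)         # predicted flavour
    eps_eff = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        # half-open bins, except the last which includes r = 1
        in_bin = (r >= lo) & (r < hi) if hi < 1 else (r >= lo) & (r <= hi)
        if not in_bin.any():
            continue
        eps_i = in_bin.mean()                             # fraction of events in bin
        w_i = (q[in_bin] != flavour_true[in_bin]).mean()  # wrong-tag fraction
        eps_eff += eps_i * (1 - 2 * w_i) ** 2
    return eps_eff

# Perfect, fully confident tagger -> eps_eff = 1
edges = [0.0, 0.1, 0.25, 0.5, 0.625, 0.75, 0.875, 1.0]
y = np.array([1, -1, 1, -1])
assert np.isclose(effective_tagging_efficiency(1.0 * y, y, edges), 1.0)
```

A tagger that is confidently wrong half the time scores ϵ_eff = 0, matching the intuition that a random guesser carries no tagging power however confident it claims to be.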

III. RESULTS AND DISCUSSION
The tagging efficiencies obtained on the training and test data for a generic 10 qubit 52 layer QSVM as depicted in Figure 1(a) are shown in Figure 4 as a function of the number of events used in the training process. We observe severe overfitting, with the tagging efficiency on the unseen test data failing to rise above 12% (c.f. the classical state of the art value of ∼ 30%), despite extremely high performance on the training data. Remarkably, the high expressibility of QSVMs [61] allows them to exploit the 2^n = 1024 dimensional Hilbert space to almost perfectly fit entire training sets of 100,000 examples without successfully generalising. We find that varying the strength of the QSVM regularisation is of limited help in this regard (see Figure 8, Appendix A). This failure to generalise beyond the training set can also be seen as a reflection of the difficulty of B meson flavour tagging as a classification task, with the high-performance classifiers developed to date employing large training datasets of ∼ 10^7 events [5, 9, 60]. Unfortunately, SVMs scale poorly with the size of the training dataset (at least quadratically [52]), which makes training an SVM, either classical or quantum, on a comparable number of events infeasible.
In order to mitigate this issue we train large ensembles of N QSVMs, which are individually trained on a manageable number (∼ 10^4) of events (see Figure 2). At testing time each QSVM outputs its predicted qr value for the test events, and the final output of the global classifier is taken to be the average value over the ensemble. By taking N = 200 we ensure that the total number of events seen by the ensemble classifier is comparable to that of the classical algorithms (see Figure 6 in Appendix A for the effect of varying N). Additionally, we employ the method of AdaBoost [50] to boost each QSVM in the ensemble, for a total number of boosting iterations G. Boosting is a common technique in which a sequence of classifiers is trained, with each classifier in the sequence having enhanced focus on the training examples which were misclassified in the previous generation. We therefore train a collection of NG total QSVMs (see Figure 2). The result of using this boosted ensemble scheme with the standard QSVM architecture of Figure 1(a) is shown in Figure 5(a) as a function of both the number of events used to train each QSVM and the AdaBoost generation number, for 50,000 test events. We obtain substantial gains over using a single large QSVM, more than doubling the tagging efficiency on the test data. In fact, even at the first generation of the AdaBoost procedure (before any boosting has taken place) we observe a considerable improvement over the single QSVM. We attribute this gain to an interference effect in which the various QSVMs in the ensemble "overfit in different ways", leading on average to a stronger classifier. Through these improvements we achieve a peak tagging efficiency of 28.0% on the test set, approaching the classical results.
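The ensemble-plus-boosting scheme can be sketched as follows, with an RBF SVC standing in for each QSVM (an illustrative stand-in, not the quantum kernels used in this work). Each member is boosted with a minimal discrete AdaBoost loop, and the members' signed outputs are averaged into a final qr value:

```python
import numpy as np
from sklearn.svm import SVC

def adaboost(X, y, G):
    """Minimal discrete AdaBoost for labels y in {-1, +1}."""
    w = np.ones(len(X))
    learners, alphas = [], []
    for _ in range(G):
        clf = SVC(kernel="rbf").fit(X, y, sample_weight=w)
        pred = clf.predict(X)
        err = np.clip(w[pred != y].sum() / w.sum(), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)  # weight of this generation
        w = w * np.exp(-alpha * y * pred)      # re-focus on misclassified events
        w = w / w.mean()                       # keep weights on an O(1) scale
        learners.append(clf)
        alphas.append(alpha)
    return learners, alphas

def qr_output(learners, alphas, X):
    """Signed, confidence-weighted prediction squashed into [-1, 1]."""
    score = sum(a * clf.predict(X) for clf, a in zip(learners, alphas))
    return np.tanh(score)

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = np.where(X[:, 0] + 0.3 * rng.normal(size=300) > 0, 1, -1)  # noisy labels

N, G = 5, 3  # small stand-ins for the N = 200 members used in the text
ensemble = []
for _ in range(N):
    idx = rng.choice(len(X), size=150, replace=False)  # each member's own events
    ensemble.append(adaboost(X[idx], y[idx], G))

X_test = rng.normal(size=(50, 4))
qr = np.mean([qr_output(l, a, X_test) for l, a in ensemble], axis=0)
flavour = np.sign(qr)
```

Replacing the RBF kernel in each member with a (pre)computed quantum kernel recovers the boosted QSVM ensemble described above, with the averaged qr feeding directly into the binned tagging-efficiency calculation.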
Next we investigate the performance of the continuous variable based QSVMs depicted in Figure 1(c) and defined in Equation 8. As our CV data encoding scheme requires a qumode for each element of the input data vector (Equation 5), and each event consists of a 130 dimensional vector [5], we are required to run simulations of 3 × 130 = 390 qubits (recall that we simulate each qumode via three qubits [55]). Unfortunately, as the cost of simulating generic circuits grows exponentially with the number of qubits, simulating arbitrary circuits of this size is computationally intractable. Because of this computational restriction, in order to test CV-QSVMs of this form using all 130 features available in an event we are forced to restrict to circuits with l = 1 (see Equation 8). Such circuits possess entanglement only between the (sets of three) qubits which make up a given qumode, with no inter-mode interactions (see Figure 1(c)), and so are easy to simulate classically. The resulting tagging efficiencies for boosted ensembles of 200 CV-QSVMs are shown in Figure 5(b) for varying amounts of training data. Although the restriction to circuits with depth l = 1 significantly reduces their power, we find that QSVMs constructed in this way achieve results competitive with those reported previously via classical ML methods (30.0% via fast boosted decision trees and 28.8% via deep neural networks [5]), reaching 29.2% tagging efficiency on a test set of 50,000 events using an ensemble of 200 QSVMs each trained on 50,000 events. A breakdown of the wrong tag and total event fractions for each bin of this classifier is given in Table I. Due to the Gaussian nature of the displacement embedding of the event data into the qumodes (see Figure 1(d)), this classifier is (approximately) an ensemble of radial basis function SVMs, and is therefore essentially a classical model. In order to investigate the performance of deeper CV-QSVMs with nontrivial interactions between the various qumodes
(i.e. l > 1 in Equation 8) we perform a PCA transformation on the event data to reduce its dimensionality from 130 to 5, resulting in circuits with a manageable 3 × 5 = 15 qubits and a level of complexity at which it is feasible to execute deep, highly entangling feature maps.
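The reduction step itself is standard. A sketch with scikit-learn, where the event array is random stand-in data rather than the Belle-II dataset:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
events = rng.normal(size=(1000, 130))  # stand-in for 130-dimensional event vectors

# Keep only the top 5 principal components, so the deeper (l > 1)
# CV circuits need just 5 qumodes = 15 simulated qubits.
pca = PCA(n_components=5).fit(events)
reduced = pca.transform(events)
```

In practice the PCA would be fitted on the training events and the same projection applied to the test events, so that no test information leaks into the reduction.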
The performance of the CV-QSVMs on the PCA reduced data is shown in Figure 5(d). We find that deeper, more expressive feature maps significantly outperform the simple l = 1 map on the PCA reduced data. We also evaluate the qubit-based QSVMs on the reduced dataset, with the results shown in Figure 5(c). As with the full 130 dimensional event data, we find that the CV models are capable of outperforming their generic qubit-based counterparts. Although the raw tagging efficiency on the five component PCA data (unsurprisingly) suffers greatly from the reduction of dimensionality, the relative increase in efficiency gained by moving to deeper circuits, displayed in Figure 5(d), is encouraging, with the highly entangled, more difficult to classically simulate CV-QSVMs recording considerable improvements in performance over the separable l = 1 case. This hints at the prospect of achieving stronger results by increasing the depth of the l = 1 maps of Figure 5(b) which classify the full data. Such circuits, however, are beyond both our ability to simulate classically and the capabilities of the noisy, small scale quantum computers available today.

IV. CONCLUSION
Machine learning has come to play an important role in the analysis of high energy physics data, and is only expected to increase in usefulness as the amount of data created by particle accelerators increases in the future, as for example in the planned high luminosity upgrade to the LHC [62]. The prospect of using QML methods to augment and, hopefully, improve upon these classical techniques has been widely recognised in the particle physics community, with forward-looking studies having already been undertaken despite quantum computing hardware remaining in its infancy [31-34]. In this work we have performed large scale simulations of qubit and continuous variable based QSVMs, finding that they can achieve results competitive with those from the classical ML algorithms which are currently employed in practice. This is achieved despite our inability to simulate a large quantum computer in full generality, and the consequent restriction to the small class of quantum feature maps that are efficiently classically simulable. A full investigation of the performance of arbitrary quantum feature maps on high dimensional data such as that produced at Belle-II must wait for the emergence of physical quantum computers of sufficiently high quality to manipulate several hundred qubits (or qumodes) with high fidelity. Excitingly, according to the published roadmaps of major quantum hardware developers [63-65], such capability may be only a few years away.

FIG. 1. Quantum Support Vector Machine Architectures. We implement quantum support vector machines (QSVMs) based on both qubit and continuous variable based quantum computing hardware. (a) The architecture of our qubit-based QSVMs. The embedding consists of repeated layers of Hadamard gates, data encoding z rotations, parameterised y rotations and entangling controlled x rotations. These layers are repeated until the entire event (a vector with 130 components) has been encoded. We utilise 10-qubit QSVMs with 52 such layers (so the input data is encoded 10 × 52/130 = 4 times). (b) In the initial stages of the embedding, each qubit is mapped from the state |0⟩ by a data dependent rotation, which can be represented on the Bloch sphere. (c) As described in Equation 8, our embedding procedure in the continuous variable case consists of a variable number l of alternating layers of displacement and nearest neighbour two mode squeezing operations (see Equations 5 and 6 respectively). When l = 1 only single mode displacement operations are employed, and the modes remain unentangled. The initial state |0⟩^⊗n denotes the vacuum state of the system. (d) The effects of the squeezing and displacement operations on the vacuum state can be visualised by examining the resulting Wigner functions (the Wigner function of the vacuum is a Gaussian centred at the origin of phase space). Left: a displaced vacuum state; Right: a squeezed vacuum state. The magnitude of the displacement and squeezing is determined by the input event, producing a data dependent encoded state.

FIG. 2. Flavour tagging with boosted ensembles of quantum support vector machines. The collision of e−e+ pairs at the Υ(4S) resonance can lead to the production of entangled B0B̄0 pairs, one of which subsequently decays into a flavour agnostic CP eigenstate (the signal side meson) and the other to a possibly flavour-specific state (the tag side meson). The tag side decay products are inputted into a flavour tagging algorithm which attempts to determine the flavour of the parent (tag-side) B meson. Due to the entangled state in which the pair of B mesons were originally created (see Equation 9), the flavour of the signal side meson can then be inferred from the tag side flavour. For our flavour tagging algorithm we employ ensembles of N quantum support vector machines (QSVMs) which are boosted with the AdaBoost algorithm [50] for G generations. At testing time, the flavour is predicted by a majority vote of the QSVMs, with the confidence |qr| determined by the margin of the vote. We consider both QSVMs constructed from standard parameterised gates acting on qubits (see Figure 1(a)) and QSVMs constructed from parametrised Gaussian operations acting on qumodes (Figure 1(c)).

FIG. 3. Primary leptons. Top: in the semileptonic decay B0 → X l+ νl (for some hadron X, here a D− meson) the positive charge of the so-called primary lepton l+ unambiguously identifies the flavour of the original B meson. Bottom: similarly, in the charge conjugate process B̄0 → X l− ν̄l the charge of the primary lepton again determines the flavour of the parent B meson.

FIG. 4. Overfitting in QSVMs. The tagging efficiencies on training and test data for a 10 qubit, 13 layer QSVM as in Figure 1(a) as a function of the training set size. We observe massive overfitting, with the QSVM able to almost perfectly classify a training set of 100,000 events, while barely generalising at all to the test data. Modifying the regularisation hyperparameter of the QSVM has only a minor effect on the test performance (see Figure 8, Appendix A). Due to the expensive scaling of SVMs with the size of the training set, it is infeasible to combat this by increasing the size of the training set far beyond 100,000 events.

FIG. 5. Flavour tagging efficiencies. (a) The tagging efficiencies for ensembles of 200 10-qubit QSVMs with depth d = 52, as depicted in Figure 1(a), indexed by the number n_train of training events seen by each QSVM and as a function of the number of boosting iterations. By switching to this ensemble technique we observe a massive increase in performance relative to the case of a single QSVM (c.f. Figure 4 and Figure 6 in Appendix A). Moreover, boosting with AdaBoost [50] provides a further significant improvement over the vanilla QSVM ensemble. (b) Similarly, we evaluate the performance of an ensemble of 200 boosted CV-QSVMs (with l = 1, see Equation 8) throughout the boosting process. With a peak tagging efficiency of 29.2%, the CV-QSVMs are competitive with state-of-the-art classical ML techniques [5, 9]. Classically simulating an ensemble of CV-QSVMs with l > 1 on the full 130 dimensional data is computationally intractable. Additional plots showing the effect of changing the number of CV-QSVMs in the ensemble and the binning strategy used for the flavour tagging (see Equation 11) may be found in Figures 6 and 7 in Appendix A. (c, d) In order to investigate the performance of deeper QSVMs we consider a reduced dataset consisting of the top 5 PCA components of the 130 dimensional input data. In both the qubit and CV cases we find increasing the depth of the circuits highly beneficial. Similar plots calculated using QSVMs with access to various numbers of PCA components are shown in Figure 9. In all cases we test on 50,000 events.

FIG. 8. Regularisation. Although modifying the constant C_reg which controls the strength of the regularisation [53] can reduce the overfitting of a single qubit-based QSVM to the training data, we find that it is incapable of significantly increasing the performance on the test data, motivating our consideration of ensemble classifiers. The QSVM used here is as described in Figure 4.

FIG. 9. Number of PCA Components. We find steady improvements in peak tagging efficiency when making more PCA components (out of a total of 130) available to the QSVM ensembles. With each qumode in the CV-QSVMs being simulated with three qubits, we are able to simulate CV-QSVMs accepting up to five PCA components, corresponding to 15 qubits in the backend implementation. The architectures of the QSVMs employed in this Figure are the same as the highest performing architectures in Figure 5(c,d) (i.e. d = 50 in the qubit case and l = 6 in the CV case).

TABLE I. The wrong tag fraction w_i, fraction of total events ϵ_i, and r-interval of the ith bin [9]. For low values of r the classifier is essentially randomly guessing, with a wrong tag fraction w_0 ≈ 0.5. More generally, the relationship ⟨r_i⟩ ≈ 1 − 2w_i is observed as expected [9] (see Figure 7, Appendix A).