QuDiet: A Classical Simulation Platform for Qubit-Qudit Hybrid Quantum Systems

Turbasu Chatterjee, Arnav Das, Subhayu Kumar Bala, Amit Saha, Anupam Chattopadhyay, and Amlan Chakrabarti

In recent years, numerous research advances have extended the limits of classical simulation of quantum algorithms. However, most state-of-the-art classical simulators are limited to binary quantum systems, which restricts the classical simulation of higher-dimensional quantum computing systems. Recent developments in higher-dimensional quantum computing have shown that using qudits improves the overall performance of a quantum algorithm by increasing memory space and reducing the asymptotic complexity of a quantum circuit. Hence, in this article, we introduce \textbf{QuDiet}, a state-of-the-art, user-friendly, Python-based higher-dimensional quantum computing simulator. \textbf{QuDiet} offers multi-valued logic operations through generalized quantum gates, with an abstraction that lets even a naive user simulate qudit systems more easily than with existing simulators. We simulate various benchmark quantum circuits in \textbf{QuDiet} and show a considerable speedup in simulation time compared to other simulators, without loss of precision. Finally, \textbf{QuDiet} provides a full qubit-qudit hybrid quantum simulator package with quantum circuit templates of well-known quantum algorithms for fast prototyping and simulation. The complete code and packages of \textbf{QuDiet} are available at https://github.com/LegacYFTw/QuDiet so that other platforms can incorporate it as a classical simulation option for qubit-qudit hybrid systems.


INTRODUCTION
Significant progress has been made in quantum computing recently due to its asymptotic advantage over classical computing [15,27,28]. Moreover, with the introduction of the mathematical notion of a qudit in [26], the boundaries of quantum computing have been extended beyond the binary state space. Qudits are states in a $d$-dimensional Hilbert space where $d > 2$, allowing a much larger state space to store and process information, as well as simultaneous control operations [6,7,12,14,21,30,34]. It has been shown that the use of qudits leads to circuit complexity reduction as well as enhanced efficiency of some quantum algorithms. For practical demonstrations, qudits have been realized on several different hardware platforms, including photonic quantum systems [17], ion-trap systems [22], topological quantum systems [5,9,10], superconducting systems [23], nuclear magnetic resonance systems [13,18], continuous spin systems [1,3] and molecular magnets [25].
To continue innovating in the multi-valued logic space for quantum computing, an efficient, easy-to-use quantum computing simulator with support for qudits is the need of the hour. Given the size of the unitaries in the qudit space, one also needs to consider the computational cost of simulating such large, complex systems, which is a significant challenge even for binary state-space simulators.
This simulator was conceived in response to the lack of accessible simulators capable of simulating multi-valued logic in a user-friendly manner. As a result, a great deal of research time and effort was previously spent constructing matrices and manually checking their compatibility, dimensions and Kronecker products before the output could be deciphered from a huge 1D array [4,11,19]. As an example, suppose a generalized gate is applied to five qudits in a 4-dimensional quantum system. For simulation purposes, the matrix of that generalized gate, of size $4^5 \times 4^5$, i.e., $1024 \times 1024$, needs to be prepared manually, which makes the simulation time-consuming and error-prone.
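To make this manual bookkeeping concrete, the NumPy sketch below builds by hand the full matrix for a single-qudit gate applied to one of five ququads ($d = 4$). The choice of gate (a generalized NOT, $|x\rangle \to |(x+1) \bmod 4\rangle$) and of target wire are illustrative assumptions; the point is only that every wire needs its own factor in the Kronecker product.

```python
from functools import reduce

import numpy as np

d, n = 4, 5  # five ququads

# Generalized NOT X_4^{+1}: |x> -> |(x + 1) mod 4>, obtained by cyclically shifting the identity's rows.
x_plus_one = np.roll(np.eye(d), 1, axis=0)

# Manual approach: an identity on every wire except the target, then a Kronecker product of all five.
target = 2
factors = [x_plus_one if k == target else np.eye(d) for k in range(n)]
full_unitary = reduce(np.kron, factors)

print(full_unitary.shape)  # (1024, 1024), i.e. 4**5 x 4**5
```

Misplacing a single factor or using the wrong dimension silently produces a different operator, which is the error-prone step the text describes.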
Our proposed simulator, named QuDiet, aims to solve all of this by providing suitable abstractions that shield the user from these behemoth calculations, so that they can focus purely on logic building and quantum phenomenology in the higher-dimensional space.
QuDiet does this thanks to its simple yet effective lean architecture, which can be used to debug implementations quickly and to provide outputs without unnecessary computational or memory overhead. It also gives users the flexibility to add gates to a quantum circuit with only a few commands.
The main contributions of this work are as follows:
• The first-of-its-kind proposal for a simulator based on higher-dimensional state space and multi-valued logic, utilizing generalized quantum gates.
• Using sparse matrices and related algorithms at the core of all quantum operations to unlock potential speedup.
• Using GPU acceleration and efficient memory maps to process large matrices with considerable speedup.
• Benchmarking multiple quantum circuits in qudit systems and reporting the overall simulation time for the different backends, for the first time to the best of our knowledge.
• A full package with quantum circuit templates for fast prototyping and simulation.

The structure of this article is as follows. Section 2 describes higher-dimensional quantum circuits and their classical simulation. Section 3 proposes the higher-dimensional quantum simulator, QuDiet. Section 4 analyzes the efficiency of the proposed simulator with the help of benchmark circuits. The future scope of the proposed simulator is outlined in Section 5. Section 6 captures our conclusions.

PRELIMINARIES
In this section, we first discuss qudits and generalized quantum gates. Later, we shed some light on the classical simulation of a higher-dimensional quantum circuit.

Higher-dimensional quantum circuits
Any quantum algorithm can be expressed or visualized in the form of a quantum circuit. Commonly, for binary quantum systems, these quantum circuits comprise logical qubits and quantum gates [2].
The number of gates present in a circuit is called the gate count, and the number of qubits present in a circuit is known as the qubit cost. In this work, we mainly deal with qudits and generalized quantum gates, since our simulator is based on higher-dimensional quantum computing.

2.1.1 Qudits.
A logical qudit that encodes a quantum algorithm's input/output in $d$-ary, or multi-valued, quantum systems is often termed a data qudit. Another kind of qudit, used to store temporary results, is the ancilla qudit. The qudit is the unit of quantum information in $d$-dimensional quantum systems. In the $d$-dimensional Hilbert space $\mathcal{H}_d$, a qudit state can be represented as a vector.
The vector space is defined by the span of the orthonormal basis vectors $\{|0\rangle, |1\rangle, |2\rangle, \dots, |d-1\rangle\}$. In qudit systems, the general form of a quantum state can be stated as

$$|\psi\rangle = \alpha_0|0\rangle + \alpha_1|1\rangle + \alpha_2|2\rangle + \dots + \alpha_{d-1}|d-1\rangle,$$

where $\alpha_i \in \mathbb{C}$ and $|\alpha_0|^2 + |\alpha_1|^2 + \dots + |\alpha_{d-1}|^2 = 1$.

2.1.2 Generalized Quantum Gates. In this section, a brief discussion of generalized qudit gates is presented. This generalization describes discrete quantum states of any arity. Unitary qudit gates are applied to the qudits to evolve the quantum states in a quantum algorithm. For logic synthesis of quantum algorithms in $d$-dimensional quantum systems, it is necessary to consider the one-qudit generalized gates, namely the NOT gate, the phase-shift gate and the Hadamard gate, as well as the two-qudit generalized CNOT gate and the generalized multi-controlled Toffoli gate. For better understanding, these gates are described in detail:

Generalized NOT Gate: $X_d^{+a}$, the generalized NOT, can be defined as $X_d^{+a}|x\rangle = |(x + a) \bmod d\rangle$, where $1 \le a \le d - 1$ … [20,29]. It has also been shown that this decomposition of the Toffoli gate can be logarithmic in depth, compared to the linear depth of the conventional approach using generalized gates. This depth reduction is also useful for implementing different algorithms in quantum computing. A generalized Toffoli decomposition in a $d$-ary system using the $|d\rangle$ state is shown in Figure 1.

2.2 Classical simulation of a higher-dimensional quantum circuit
This section highlights how a program running on a classical computer can mimic the evolution of a quantum computer. Before doing so, let us discuss the challenges that current qubit-only classical simulators face in simulating qudit systems, or higher-dimensional quantum circuits.
• State-of-the-art qubit-only simulators [4,11,31,32,35] need to act on an $n$-qubit state with a $2^n \times 2^n$ matrix. The dimensions of the unitary gate matrices are also quite straightforward, as they are based only on qubit systems [24]. Because of the engineering challenge of maintaining the matrix dimensions automatically for qubit-qudit hybrid systems, current simulators are unable to simulate higher-dimensional quantum circuits efficiently; this challenge is addressed in this paper.

• The other challenge is that a significant amount of memory is required to store higher-dimensional quantum state vectors and to perform the matrix multiplications that simulate generalized quantum gates. Hence, current simulators quickly run into long simulation times and memory limitations due to their conventional memory management. In this paper, we also address this issue by designing a unitary matrix simulator with various backends to simulate qudit systems effectively.

Before explaining our proposed simulator in more detail, we discuss the technicalities of the classical simulation of a higher-dimensional quantum circuit.
To simulate a higher-dimensional quantum circuit, we first need to specify the dimension of each qudit so that generalized quantum gates or quantum operations can act on a sequence of qudits effectively [11].
This can be done through a method that returns a tuple of integers giving the required dimension of each qudit it operates on; for instance, (2, 3, 4) denotes an object that acts on a qubit, a qutrit, and a ququad. To apply a generalized gate to some qudits, the dimensions of the qudits must match the dimensions the gate works on. For example, the unitary of a single-qubit gate is a 2 × 2 matrix, whereas that of a single-qutrit gate is a 3 × 3 matrix. A two-qutrit gate has a unitary that is a 9 × 9 matrix (3 × 3 = 9), and a qubit-ququad gate has a unitary that is an 8 × 8 matrix (2 × 4 = 8). The sizes of the matrices involved in defining mixtures and channels follow the same pattern.
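The dimension rule above can be checked mechanically. The sketch below is a minimal illustration, not QuDiet's code; the helper names expected_unitary_side and check_gate_fits are hypothetical.

```python
import math

import numpy as np


def expected_unitary_side(qudit_dims):
    # A gate on qudits of dimensions (d1, ..., dk) acts on a d1*...*dk-dimensional space.
    return math.prod(qudit_dims)


def check_gate_fits(unitary, qudit_dims):
    # Reject a gate whose matrix does not match the dimensions of the qudits it acts on.
    side = expected_unitary_side(qudit_dims)
    if unitary.shape != (side, side):
        raise ValueError(f"gate of shape {unitary.shape} cannot act on qudits {qudit_dims}")


check_gate_fits(np.eye(2), (2,))                        # single-qubit gate: 2 x 2
check_gate_fits(np.eye(3), (3,))                        # single-qutrit gate: 3 x 3
check_gate_fits(np.eye(9), (3, 3))                      # two-qutrit gate: 9 x 9
check_gate_fits(np.kron(np.eye(2), np.eye(4)), (2, 4))  # qubit-ququad gate: 8 x 8
```

A Kronecker product of per-qudit operators automatically has the right side length, which is why the qubit-ququad check passes.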
After a higher-dimensional quantum circuit has been simulated with the dimensions of the qudits and generalized gates taken into account, the size of the resultant state is determined by the product of the dimensions of the qudits being simulated. For example, the state vector output after simulating a circuit on a qubit, a qutrit, and a ququad has 2 × 3 × 4 = 24 elements. Circuits on qudits are always assumed to start in the computational basis state $|0\rangle$, and the computational basis states of a qudit are assumed to be $|0\rangle, |1\rangle, \dots, |d-1\rangle$. Measurements of qudits are assumed to be in the computational basis and, for each qudit, return an integer corresponding to one of these basis states. Thus, measurement results for each qudit are assumed to run from $|0\rangle$ to $|d-1\rangle$, just as they run from $|0\rangle$ to $|1\rangle$ for qubit systems.
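The state-vector size and the mapping between a flat index and per-qudit measurement results can be sketched as follows. This assumes, as is usual for Kronecker-product ordering, that the first qudit is the most significant digit of a mixed-radix number; basis_index and basis_digits are hypothetical helper names.

```python
import math


def basis_index(digits, dims):
    # Mixed-radix encoding: the first qudit is the most significant digit, which matches
    # the ordering of the Kronecker product |a> (x) |b> (x) |c>.
    index = 0
    for digit, d in zip(digits, dims):
        if not 0 <= digit < d:
            raise ValueError(f"level {digit} is invalid for a qudit of dimension {d}")
        index = index * d + digit
    return index


def basis_digits(index, dims):
    # Inverse mapping: recover each qudit's measured level (0 .. d-1) from a flat index.
    digits = []
    for d in reversed(dims):
        index, digit = divmod(index, d)
        digits.append(digit)
    return tuple(reversed(digits))


dims = (2, 3, 4)                              # a qubit, a qutrit and a ququad
size = math.prod(dims)                        # 2 * 3 * 4 = 24 amplitudes
state = [0.0] * size
state[basis_index((0, 0, 0), dims)] = 1.0     # every circuit starts in |000>
print(size, basis_index((1, 2, 3), dims), basis_digits(23, dims))  # 24 23 (1, 2, 3)
```

The last basis state $|123\rangle$ lands at index 23, the final slot of the 24-element vector, and each recovered digit stays within $0, \dots, d-1$ for its qudit.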
3 QUDIET: A QUBIT-QUDIT HYBRID QUANTUM SIMULATOR
The lean architecture of QuDiet is laid out briefly in the following subsections. Before that, the flow on the user end is summarized in Figure 2. This figure gives a high-level description of QuDiet to convey the general features of the proposed simulator. First, the quantum algorithm is expressed as a quantum circuit with the help of either QuDiet's QASM specification or a simple Python console. Next, the circuit is compiled to QuDiet's specific quantum gate set. Finally, the quantum circuit is simulated with QuDiet to obtain the final outcome of the given quantum algorithm.

Fig. 2. A user-level compilation flow of QuDiet, where the input is a quantum circuit and the output is the probable quantum states of the input circuit.

Now that we have a high-level understanding of the compiler, let us take a deep dive into its internals in the following subsections.

High-level architecture
At its core, QuDiet is managed by two integral parts: the Moment and the OperatorFlow objects. These are simple vector objects that behave like stacks during the compiler's operation, as portrayed in Figure 3. A circuit, in essence, is an OperatorFlow object running on a Backend. Once a QuantumCircuit object has been instantiated, an OperatorFlow object is also created in the background, consisting of a single Moment object.
This Moment object can carry two types of objects: an InitState, which represents the initial state of a quantum circuit, or a QuantumGate, an abstract class inherited by all quantum gates that the simulator can implement.
Once the user has implemented a quantum circuit using the available commands, the measure_all() function is invoked, pushing a Moment that carries measurement gates on all quantum registers. The measurement gate is a symbolic gate that tells the compiler that a program routine has ended and can be executed. Execution occurs when the QuantumCircuit.run() method is invoked, using the circuit's specified Backend object, which has several interfaces for plug-and-play operability.

The Quantum Circuit
The quantum circuit is represented by the QuantumCircuit class in the simulator. Whenever a new quantum circuit is created, a QuantumCircuit object is instantiated. This QuantumCircuit object takes the following arguments for instantiation:
• qregs: The dimensions of the quantum registers, represented as a heterogeneous list of integer dimensions; in other words, the dimension of each quantum 'wire', in order.
• init_states: The initial states of the quantum registers, represented by an array of the same length as qregs. If none is provided, the registers are automatically initialized to $|0\rangle$.
• backend: The Backend on which the quantum circuit is to be executed. There are four Backend objects to choose from, elaborated in subsection 3.7. The default backend is the SparseBackend.
• debug: A flag argument that forms the basis of a debugger engine; it is implemented in a simple manner in this release cycle and will be expanded upon in future versions for easy debugging and callback functions.
For example, to make a circuit with three qudits of dimensions 4, 5 and 3, respectively, with initial states $|0\rangle$, $|3\rangle$ and $|2\rangle$, computed using the CUDASparseBackend, only a few lines of python3 code are needed. This shows how easily qudits of different dimensions can be simulated with this simulator. Let us now take a closer look at these initial states, or more generally, the quantum states they represent, and how QuDiet handles them.

Representation of quantum states
Any quantum simulator is incomplete without its interpretation of quantum states. This preliminary version of QuDiet represents quantum states as state vectors. This comes with the caveat that QuDiet can only deal with states represented by an array or a vector list. This will, of course, be improved upon in future releases, where we hope to incorporate density matrices, tensor-network and ZX/ZH-calculus-based representations. For now, we stick to the notion that a basis state $|i\rangle$ of a $d$-dimensional qudit is represented by the column vector

$$|i\rangle = (0, \dots, 0, 1, 0, \dots, 0)^T \in \mathbb{C}^d,$$

with the single 1 at position $i$ (counting from 0). Therefore, a register of qudits $|\psi_1 \psi_2 \dots \psi_n\rangle$ needs to be represented as

$$|\psi_1 \psi_2 \dots \psi_n\rangle = |\psi_1\rangle \otimes |\psi_2\rangle \otimes \dots \otimes |\psi_n\rangle.$$

Unlike in qubit-only quantum simulators, this presents an engineering challenge: an efficient method of storing these matrices and their computations in memory. On observation, we note that these quantum states are, in essence, very sparse matrices with non-zero elements scattered around them. Therefore, it seemed natural to use the sparse array format for storing these state vectors.
To achieve this, QuDiet makes use of SciPy's Compressed Sparse Column (CSC) array implementation. We plan to improve on this format to reduce the latency of Sparse Matrix-Vector multiplication (SpMV) and General Matrix-Vector multiplication (GeMV); this is elaborated in Section 5. However, we have also added support for NumPy matrices for small circuits.
This technique is also used when representing quantum operators, or quantum gates, as we shall see in the following subsection.
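As a sketch of this representation, the snippet below builds the register state $|0\rangle|3\rangle|2\rangle$ on qudits of dimensions 4, 5 and 3 (the example from the previous subsection) as a Kronecker product of basis vectors and stores it with scipy.sparse.csc_matrix. It is illustrative only and does not reproduce QuDiet's internal classes; basis_ket is a hypothetical helper.

```python
from functools import reduce

import numpy as np
from scipy import sparse


def basis_ket(level, dim):
    # |level> of a dim-dimensional qudit: a column vector with a single 1.
    ket = np.zeros((dim, 1))
    ket[level, 0] = 1.0
    return ket


dims = [4, 5, 3]          # qudit dimensions from the earlier example
init_states = [0, 3, 2]   # initial levels |0>, |3>, |2>

# Register state |0>|3>|2> = |0> (x) |3> (x) |2>, a 4 * 5 * 3 = 60-dimensional vector.
dense_state = reduce(np.kron, [basis_ket(lvl, d) for lvl, d in zip(init_states, dims)])

# Compressed Sparse Column storage keeps only the single non-zero amplitude.
sparse_state = sparse.csc_matrix(dense_state)
print(dense_state.shape, sparse_state.nnz, sparse_state.indices)  # (60, 1) 1 [11]
```

The one stored entry sits at index 0·15 + 3·3 + 2 = 11, so 59 of the 60 amplitudes cost nothing in the sparse format.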

One of the key features of QuDiet is its ability to construct generalized gates and operators for quantum computing using multi-valued logic automatically. This is a standout feature: to the best of our knowledge, no simulator in the literature allows single- and multi-qudit quantum gates and operators to be constructed as easily as in QuDiet. QuDiet contains a limited gate set, listed below; descriptions are available in Section 2:
(1) NOT Gate (XGate)
(2) Phase-Shift Gate (ZGate)
(3) Hadamard Gate (HGate)
(4) CNOT Gate (CXGate)
(5) User Defined Gate (QuantumGate)
(6) Measurement Gate
(7) Identity Gate (IGate)
Each of these quantum gates takes two things into account: the qudit register it acts on, and the dimension of that register. Given a user-defined acting register, these quantum gates account for the dimension of the acting register and dynamically construct a gate unitary at runtime. The XGate and the CXGate have an added functionality: an arbitrary shift can be induced, as long as the shift is less than the dimension of the qudit register. The Measurement Gate in QuDiet is special in that it has no unitary associated with it. As of this version, the Measurement Gate merely acts as a flag that signifies the end of a quantum circuit. Any gate after measurement is ignored by the simulator.
Vol. 1, No. 1, Article 1. Publication date: January 2016.
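The following NumPy sketch shows how a gate unitary can be generated from the dimension of the acting register alone, including the shift for the generalized NOT. It uses the standard textbook definitions (a cyclic shift, phases $\omega^x$ with $\omega = e^{2\pi i/d}$, and the discrete Fourier transform as the generalized Hadamard); these are assumptions about the matrices QuDiet builds rather than a copy of its implementation, and the function names are hypothetical.

```python
import numpy as np


def x_gate(d, plus=1):
    # Generalized NOT X_d^{+plus}: |x> -> |(x + plus) mod d>; the shift must be below d.
    if not 0 <= plus < d:
        raise ValueError(f"shift {plus} must be smaller than the dimension {d}")
    return np.roll(np.eye(d), plus, axis=0)


def z_gate(d):
    # Generalized phase-shift gate: |x> -> w**x |x>, with w = exp(2*pi*i/d).
    w = np.exp(2j * np.pi / d)
    return np.diag(w ** np.arange(d))


def h_gate(d):
    # Generalized Hadamard (discrete Fourier transform): H[j, k] = w**(j*k) / sqrt(d).
    w = np.exp(2j * np.pi / d)
    return w ** np.outer(np.arange(d), np.arange(d)) / np.sqrt(d)


# The same code yields correctly sized unitaries for a qubit, a qutrit and a ququint.
for d in (2, 3, 5):
    for gate in (x_gate(d, plus=d - 1), z_gate(d), h_gate(d)):
        assert gate.shape == (d, d)
        assert np.allclose(gate @ gate.conj().T, np.eye(d))
```

For $d = 2$ these reduce to the familiar Pauli-X, Pauli-Z and Hadamard matrices, so qubit wires need no special handling.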
Once the unitary of a generic quantum gate has been created, the gate uses the same sparse matrix format to store its data. Unlike quantum states, however, quantum gates use SciPy's Compressed Sparse Row (CSR) matrix implementation. This format will, of course, be improved upon to facilitate not only SpMV and GeMV but also SpGeMM (Sparse Matrix-Matrix multiplication) and GeMM (General Matrix-Matrix multiplication) [16]. As with states, NumPy matrices are also supported for small circuits.
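A minimal sketch of the CSR-gate and CSC-state pairing with scipy.sparse, assuming the cyclic-shift form of the generalized NOT, $|x\rangle \to |(x+a) \bmod d\rangle$. It only illustrates the storage formats and the SpMV step; it is not QuDiet's code.

```python
import numpy as np
from scipy import sparse

d = 3
# Generalized NOT X_3^{+2} (|x> -> |(x + 2) mod 3>) stored as a Compressed Sparse Row matrix.
x_plus_two = sparse.csr_matrix(np.roll(np.eye(d), 2, axis=0))

# Qutrit basis state |0> stored as a Compressed Sparse Column vector.
ket0 = sparse.csc_matrix(np.array([[1.0], [0.0], [0.0]]))

# Sparse matrix-vector product (SpMV): X_3^{+2}|0> = |2>.
result = x_plus_two @ ket0
print(result.toarray().ravel())  # [0. 0. 1.]

# A Kronecker product with an identity stays sparse: 9 non-zeros out of 81 entries.
two_qutrit = sparse.kron(x_plus_two, sparse.identity(d), format="csr")
print(two_qutrit.shape, two_qutrit.nnz)  # (9, 9) 9
```

CSR is a natural fit for gates because a matrix-vector product walks the operator row by row, while CSC stores a single column vector compactly.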
The backends have been engineered to provide minimal speedup using naive algorithms, and include CUDA support for GPU execution. These backends are elaborated in subsection 3.7.
Therefore, for demonstration, suppose we use the same quantum circuit as before. To add a Hadamard gate acting on the first qudit and a CNOT gate with a shift of plus 2 acting on the first and the third qudit, we simply invoke the following lines of python3 code:

    qc.h(0)
    qc.cx((0, 2), plus=2)

QuDiet also decomposes Toffoli gates (say, qc.Toffoli((0,1,2), plus=1)) into suitably mapped higher-order CXGate objects, as follows:

    qc.cx((0, 1), plus=1)
    qc.cx((1, 2), plus=1)
    qc.cx((0, 1), plus=2)

As stated before, these form the basis of a Moment object, which we elaborate on next.
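The Toffoli decomposition can be checked numerically. The sketch below assumes, consistently with Fig. 1 and with the $|120\rangle$ output of the example workflow, that a generalized CNOT fires when its control is in its highest level $|d_c - 1\rangle$ and then shifts the target by plus modulo $d_t$. Wire 1 is taken to be a qutrit and wires 0 and 2 qubits; cx_unitary is a hypothetical helper, not QuDiet's API.

```python
import itertools
import math

import numpy as np


def cx_unitary(dims, control, target, plus):
    # Hypothetical helper: full-register permutation matrix for a generalized CNOT.
    # Assumption: the control fires on its highest level |d_c - 1>, and the target is
    # then shifted |t> -> |(t + plus) mod d_t>.
    states = list(itertools.product(*(range(d) for d in dims)))  # same order as kron
    index = {s: i for i, s in enumerate(states)}
    size = math.prod(dims)
    u = np.zeros((size, size))
    for s in states:
        out = list(s)
        if s[control] == dims[control] - 1:
            out[target] = (s[target] + plus) % dims[target]
        u[index[tuple(out)], index[s]] = 1.0
    return u


# Toffoli with wire 1 as an intermediate qutrit: qubit 0, qutrit 1, qubit 2.
dims = (2, 3, 2)
gates = [cx_unitary(dims, 0, 1, 1),   # qc.cx((0,1), plus=1)
         cx_unitary(dims, 1, 2, 1),   # qc.cx((1,2), plus=1)
         cx_unitary(dims, 0, 1, 2)]   # qc.cx((0,1), plus=2)
toffoli = gates[2] @ gates[1] @ gates[0]  # later gates multiply from the left

# On every binary input, qubit 2 flips exactly when qubits 0 and 1 are both |1>,
# and the qutrit returns to its original level.
states = list(itertools.product(*(range(d) for d in dims)))
for a, b, c in itertools.product((0, 1), repeat=3):
    column = toffoli[:, states.index((a, b, c))]
    assert states[int(np.argmax(column))] == (a, b, c ^ (a & b))
```

The middle wire only reaches $|2\rangle$ when both controls are $|1\rangle$, and the final plus=2 shift undoes the first plus=1 shift (1 + 2 = 3 ≡ 0 mod 3).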

The Moment
The Moment object forms an abstraction between the QuantumGate and the OperatorFlow objects: it maintains the placement of the quantum gates acting on their respective qudits, while the OperatorFlow maintains the sequence in which the quantum operations are executed. The Moment object is inherently an array whose length equals the breadth (number of qudits) of the quantum circuit. Its primary job is to keep each acting register on its intended qudit, so that errors in evaluating Kronecker products are avoided when the quantum circuit is executed.
There is a single Moment object containing a list of InitState objects spanning all the qudits in the quantum register; it is initialized and pushed onto the OperatorFlow object's stack whenever a quantum circuit is initialized. The OperatorFlow object can hold an arbitrary number of Moment objects containing quantum gates across the breadth of the quantum register.
In QuDiet, whenever the user applies a quantum gate to a quantum circuit, the QuantumGate is pushed into the slot of the Moment object whose index corresponds to the acting register. All other slots in the Moment object contain the IdentityGate object.
Note that the OperatorFlow and the Moment data structures only store data as it is pushed. No Kronecker product or matrix multiplication is performed until the Measurement gates have been pushed and the user invokes the run() method of the quantum circuit object.
QuDiet also performs preliminary optimizations at the logic level whenever a gate is pushed. When a new gate is pushed onto the OperatorFlow stack, the gate is enclosed in a Moment object such that its index in the Moment array corresponds to the acting qudit specified by the user; all other registers have an Identity gate acting on them. When pushing a Moment object containing a quantum gate onto the OperatorFlow stack, the simulator checks whether the immediate predecessor Moment has an Identity gate at the same position; if so, that Identity gate is swapped out for the incoming quantum gate. This is repeated until the simulator reaches an InitState object or finds a QuantumGate object at that position in a predecessor Moment.
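The push-time optimization can be sketched with plain Python lists. In this simplified, hypothetical model, gates are string labels rather than QuantumGate objects, and a multi-qudit gate simply occupies each of its slots.

```python
IDENTITY = "I"
INIT = "InitState"


class Moment:
    """One time slice: a slot per qudit, each holding a gate label."""

    def __init__(self, width, fill=IDENTITY):
        self.slots = [fill] * width


class OperatorFlow:
    """Stack of Moments; the bottom Moment holds the InitState objects."""

    def __init__(self, width):
        self.width = width
        self.stack = [Moment(width, fill=INIT)]

    def push(self, gate, qudits):
        # Slide the gate back over predecessor Moments that are idle (identity) on all of
        # its qudits, stopping at the InitState Moment or at a Moment with a gate there.
        slot = None
        for moment in reversed(self.stack[1:]):
            if all(moment.slots[q] == IDENTITY for q in qudits):
                slot = moment
            else:
                break
        if slot is None:
            # No idle predecessor: open a new Moment padded with identities.
            slot = Moment(self.width)
            self.stack.append(slot)
        for q in qudits:
            slot.slots[q] = gate


flow = OperatorFlow(3)
flow.push("H", (0,))
flow.push("X", (1,))     # shares the first Moment with H
flow.push("CX", (0, 2))  # qudit 0 is busy, so a new Moment is opened
flow.push("Z", (1,))     # slides back into the CX Moment, but not past X
print([m.slots for m in flow.stack])
# [['InitState', 'InitState', 'InitState'], ['H', 'X', 'I'], ['CX', 'Z', 'CX']]
```

Packing gates into earlier Moments reduces the number of Kronecker products and matrix multiplications that run() later has to evaluate.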

The Operator Flow Stack
The OperatorFlow object is at the heart of QuDiet: it maintains the order of execution of the QuantumGate objects nestled inside the Moment objects. The OperatorFlow internally maintains a vector list that acts like a stack during execution.
To run a quantum circuit, measure_all() is called. This places MeasurementGate objects across all quantum registers, thereby raising flag variables inside a Moment object. Any Moment containing quantum gates pushed after this Moment, i.e., after the measure_all() command, is discarded. To execute the circuit, run() is invoked, which outputs the state vectors of the quantum states that have non-zero probabilities.
Under the hood, all Moments prior to the Moment containing the MeasurementGate objects are executed in reverse order. This means that the OperatorFlow object first 'pops' the last Moment, evaluates the Kronecker product inside it according to the type of Backend, and stores the result in a variable. It then advances to the second-to-last Moment, evaluates its Kronecker product, and stores it in a separate variable. Once these two Kronecker products have been evaluated, QuDiet performs an SpGEMM or a GEMM operation, depending on the Backend selected for the circuit. When the matrix product has been evaluated, the two variables storing the Kronecker products are freed from memory. This continues until the Moment containing the InitState objects is reached, where the last operation is an SpMV or a GEMV, depending on the Backend chosen.
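The evaluation order can be sketched with NumPy for the example-workflow circuit (a Hadamard on the qubit followed by a generalized CNOT with plus=2 from the qubit to the first qutrit, on registers of dimensions 2, 3, 3). The CX matrix is hand-built under the assumption that the control fires on $|1\rangle$, and the Moment lists are hypothetical stand-ins for QuDiet's objects.

```python
from functools import reduce

import numpy as np

dims = [2, 3, 3]
init_state = reduce(np.kron, [np.eye(d)[:, [0]] for d in dims])  # InitState Moment: |000>

h2 = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)            # Hadamard on the qubit

# Generalized CNOT on (qubit 0, qutrit 1) with plus=2: control |1> maps |t> to |(t + 2) mod 3>.
cx = np.zeros((6, 6))
for c in range(2):
    for t in range(3):
        t_out = (t + 2) % 3 if c == 1 else t
        cx[c * 3 + t_out, c * 3 + t] = 1.0

# OperatorFlow stack (bottom to top); each Moment lists its gates, identities elsewhere.
stack = [
    [h2, np.eye(3), np.eye(3)],  # Moment 1: H on qudit 0
    [cx, np.eye(3)],             # Moment 2: CX spanning qudits 0-1, identity on qudit 2
]

# Pop from the top, take each Moment's Kronecker product, and accumulate U = M2 @ M1 (GEMM).
unitary = reduce(np.kron, stack.pop())
while stack:
    unitary = unitary @ reduce(np.kron, stack.pop())

# Final step: apply the accumulated operator to the initial state (GEMV).
final_state = unitary @ init_state
nonzero = {int(i): float(final_state[i, 0])
           for i in np.flatnonzero(np.abs(final_state[:, 0]) > 1e-12)}
print(nonzero)  # amplitude 1/sqrt(2) at index 0 (|000>) and index 15 (|120>)
```

The two surviving amplitudes correspond to $|000\rangle$ and $|120\rangle$, matching the output reported for this circuit in the Example Workflow subsection.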

In QuDiet, acceleration can be achieved in two different ways: through a GPU, i.e., hardware acceleration, or by using sparse matrices, i.e., algorithmic or software acceleration.
These accelerations are delivered through different Backends, such as CudaBackend. To access the GPU as a host for hardware acceleration, we use CuPy, a GPU equivalent of NumPy and SciPy. For software acceleration, SciPy is used instead of NumPy because of its useful interface for sparse matrices. The very basis of QuDiet lies in the availability and choice of Backends. For example, nearly dense matrices showed no acceleration when run on the SparseBackend; for such matrices, it is a better choice to use the CudaBackend and avoid the sparse Backends. A more detailed discussion of Backends follows.
QuDiet is a model-level library, providing high-level building blocks for developing quantum circuit algorithms. Low-level operations, such as the dot product and the Kronecker product, are interfaced through the Backend class. This enables one to pick the best backend based on the type of the circuit and the hardware accessible.
Currently, the backends available for use are NumpyBackend, SparseBackend, CudaBackend and CudaSparseBackend.
3.7.1 NumpyBackend. NumpyBackend is the default backend used when no backend type is explicitly defined. It interfaces with basic NumPy operations, without any external optimization.
3.7.2 SparseBackend. SparseBackend interfaces with SciPy's sparse module instead of NumPy's ndarray. It stores only the non-zero elements of a matrix and reduces computation time by eliminating operations on zero elements. In some cases, this can reduce the stored matrix size exponentially, yielding an overall speedup in execution time as well as memory compression.
This speedup depends directly on the nature of the circuit, the worst case being mostly dense arrays, in which case the SparseBackend performs nearly like the NumpyBackend. The CUDASparseBackend optimizes the operations using sparse matrices, followed by hardware (GPU) optimization.
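The trade-off between dense and sparse backends can be illustrated with a tiny backend abstraction. DenseBackend and SparseBackendSketch below are hypothetical classes, not QuDiet's NumpyBackend and SparseBackend; they only show how one Moment's Kronecker product can be delegated to NumPy or to scipy.sparse, and how the payoff depends on gate density.

```python
from functools import reduce

import numpy as np
from scipy import sparse


class DenseBackend:
    """Hypothetical dense backend: plain NumPy arrays, every entry stored."""

    @staticmethod
    def asarray(m):
        return np.asarray(m)

    @staticmethod
    def kron(a, b):
        return np.kron(a, b)


class SparseBackendSketch:
    """Hypothetical sparse backend: SciPy CSR matrices, only non-zeros stored."""

    @staticmethod
    def asarray(m):
        return sparse.csr_matrix(m)

    @staticmethod
    def kron(a, b):
        return sparse.kron(a, b, format="csr")


def moment_operator(backend, gates):
    # Kronecker product of one Moment's gates, delegated to the chosen backend.
    return reduce(backend.kron, [backend.asarray(g) for g in gates])


# Sparse-friendly case: a generalized NOT on each of six qutrits.
x3 = np.roll(np.eye(3), 1, axis=0)
dense = moment_operator(DenseBackend, [x3] * 6)
sparse_op = moment_operator(SparseBackendSketch, [x3] * 6)
print(dense.size, sparse_op.nnz)  # 531441 729

# Worst case: the generalized Hadamard has no zero entries, so nothing is saved.
w = np.exp(2j * np.pi / 3)
h3 = w ** np.outer(np.arange(3), np.arange(3)) / np.sqrt(3)
dense_moment = moment_operator(SparseBackendSketch, [h3] * 6)
print(dense_moment.nnz)  # 531441
```

The shift-gate Moment stores 729 of 531,441 entries, while the Hadamard Moment stores all of them, which mirrors the observation that mostly dense arrays give the sparse backend no advantage.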

Output and Interpretability of results
The output representation of a quantum circuit is still a less explored domain, especially for large circuits.
QuDiet additionally provides some small contributions in terms of output representation and interpretability, for the sake of a better research experience.
QuDiet comes with two types of output representation, $O_T$, and two types of output method, $O_M$. $O_T$ determines how the output is represented and has two types: print and state. One $O_T$ type provides the raw output state as a binary array, whereas the other provides the output state as a ket string. $O_M$, on the other hand, dictates whether the output gives the probability distribution or the amplitudes of the quantum states. By default, a quantum circuit returns the final ket representation along with its probability, which looks like {|1110100000000⟩ : 1.0}, for instance.
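A sketch of how a state vector can be turned into this kind of ket-labelled dictionary, in amplitude or probability mode. The function name format_output and its probabilities flag are hypothetical stand-ins for the $O_T$/$O_M$ options, and the label format mirrors the output shown in the Example Workflow subsection.

```python
import itertools


def format_output(state, dims, probabilities=False, tol=1e-12):
    # Label every basis state as a ket string such as '|120>' (single-digit levels only;
    # qudits with more than ten levels would need a separator between levels).
    labels = ("|" + "".join(map(str, levels)) + ">"
              for levels in itertools.product(*(range(d) for d in dims)))
    result = {}
    for label, amp in zip(labels, state):
        if abs(amp) > tol:
            # Amplitude mode keeps the raw amplitude; probability mode squares its modulus.
            result[label] = abs(amp) ** 2 if probabilities else amp
    return result


amp = 2 ** -0.5
state = [0.0] * 18
state[0] = state[15] = amp  # (|000> + |120>)/sqrt(2) on dimensions (2, 3, 3)
print(format_output(state, (2, 3, 3)))
print(format_output(state, (2, 3, 3), probabilities=True))
```

Filtering out negligible amplitudes keeps the printed result readable even when the underlying vector is large.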

Example Workflow
Let us now take an example to understand the workflow in QuDiet. The circuit to be executed is shown in Figure 4. To do this, we must first create the quantum circuit by specifying the dimensions of its circuit lines, which is done with the following lines of python code:

    qreg_dims = [2, 3, 3]
    init_states = [0, 0, 0]
    backend = SparseBackend
    qc = QuantumCircuit(qregs=qreg_dims, init_states=init_states, backend=backend)

These lines of code do the following:
(1) The InitState objects are created for each of the circuit lines. For the initial state $|0\rangle$ on the qubit and on the two qutrits, these InitState objects hold the column vectors $(1, 0)^T$, $(1, 0, 0)^T$ and $(1, 0, 0)^T$, respectively.
Now that the circuit has been initialized with the desired states, we can add the required gates. These gates are invoked by calling the respective methods of the circuit, which create the corresponding quantum gate objects. At creation, these quantum gates take into account the dimensions of the acting register, among other factors, to construct the correct unitary automatically and push it into the Moment object, which is then pushed into the OperatorFlow object. This is done by the following lines of code:

    qc.h(0)
    qc.cx((0, 1), plus=2)

These lines of code do the following:
(1) The first line detects the dimension of the acting register. Since the dimension of the acting register is 2, it constructs a 2 × 2 unitary as follows:

$$H_2 = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}.$$

Once done, it assigns this unitary to the instance variable of the HGate object.
(2) Before pushing the HGate object to the Moment list, QuDiet creates Identity gates tailored to the dimensions of the qudit registers and pushes them into the Moment object.
(3) Next, QuDiet creates the CXGate object using a generated unitary and then places the Identity gates before pushing it into the Moment object. The state of the OperatorFlow object and its internals are shown in Figure 5. Note that the tensor and inner products are not evaluated at the time of creation. To begin execution, we invoke the following lines of code:

    qc.measure_all()
    qc.run()

These add measurement operators, interpreted here as flag variables, to signify the end of all operations within the circuit. The circuit is then executed with the run() command, which begins calculating the tensor and inner products using the specified backend. The final output is as follows:

    Build elapsed: 0.0001919269561767578s
    Execution elapsed: 0.0010845661163330078s
    [ {'|000>': 0.7071067811865475}, {'|120>': 0.7071067811865475} ]

The final simulation result reports the loading time and the execution time of the given circuit, as shown in the above example. We also obtain the final output quantum states $|000\rangle$ and $|120\rangle$, each with amplitude 0.7071067811865475, for the example circuit.
With the increasing use of quantum circuit descriptions, QASM (Quantum Assembly Language) [33] was introduced. Through QuDiet's QASM, one can declare qubits or qudits and describe the operations (gates) on them to be run on QuDiet. For ease of understanding, a sample QASM program for QuDiet is presented as follows:

end
In this QASM program, we declare three qudits: a qubit (x0) and two qutrits (x1 and x2). NOT, Hadamard and phase gates are applied on qubit x0. Then a generalized NOT gate with +1 is applied on qutrit x1, followed by a generalized NOT gate with +2 on qutrit x2. A generalized CNOT with +1 is applied on x0 and x1, and a generalized CNOT with +2 on x1 and x2. The most widely used QASM, i.e., OpenQASM [8], can be converted to QuDiet's QASM form with the help of an inbuilt lexer available in QuDiet, making it more user-friendly.
Benchmarking with state-of-the-art simulators. As an initial benchmark, we have taken 21 circuits, ranging from 3 qubit-qutrit to 7 qubit-qutrit, in the form of QASM from [36] to verify our proposed simulator. To simulate all 21 circuits, the Toffoli gates are decomposed with intermediate qutrits, as discussed earlier, to obtain the algorithmic advantage. The simulation results are shown in Table 1.
e complete simulation time is based on three di erent parameters, (i)

1:14
preprocessing-time; (ii) loading-time; and (iii) execution-time. We run these circuits with two backends, Numpy and Sparse. The maximum run-time (loading-time + execution-time) of these circuits is 0.3 seconds, which is comparable to [11]. However, the total simulation-time is much lower, because the preprocessing-time is much higher for [11], where gates need to be defined manually based on their dimensions. The exact preprocessing-time of [11] cannot be determined, since it is manual and very complicated to define mathematically. In our case, QuDiet is fully automatic, so the preprocessing-time is negligible. We further simulate larger circuits on our proposed simulator; the results are shown in Table 2. These 17 medium-sized circuits are also taken from [36]. Numerical simulation shows that, owing to their dense nature, these circuits execute well on the Sparse-cuda backend compared with the other backends. It can also be noted that if the 38 qubit-only circuits from Tables 1 and 2 are simulated on QuDiet without decomposition, the performance is the same as that of [4,11].

We have employed the multiplication 3 × 2 as an example to illustrate our simulator's efficiency in designing a quantum multiplier with an intermediate qutrit. Figure 6(a) shows a multiplier circuit for this example, constructed in accordance with [36], in which all the qubits are initialized to |0⟩. In this circuit, the first four qubits (q0 to q3) are the input qubits: the first two qubits (q0 and q1) represent the number 3 by applying NOT gates to both of them, and the other two qubits (q2 and q3) represent the number 2 by applying a NOT gate to qubit q2. Subsequently, using Toffoli gates, we perform the multiplication on these qubits and store the result in the ancilla qubits (q4 to q7). Next, we perform the addition using CNOT gates on the ancilla qubit q8. Finally, to obtain the output of 3 × 2, we measure the qubits q4, q7 and q8. Additionally, each of the Toffoli gates shown in Fig. 6(a) is realized with the intermediate qutrit method, as shown in Fig. 6(b), to achieve an asymptotic improvement of the circuit. Our numerical simulation on QuDiet correctly yields 3 × 2 = 6. With the Numpy backend, the loading-time is 34.631 ms and the execution-time is 136.532 ms. With the Sparse backend, the loading-time is 17.157 ms and the execution-time is 1.865 s for the 3 × 2 multiplier circuit, whose width is 9 and depth is 15.
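To make the intermediate-qutrit idea behind Fig. 6(b) concrete, the following NumPy sketch builds a qubit Toffoli from three steps, each written as a classical permutation matrix: if the first control a is |1⟩, the second control b is incremented by +1 (mod 3), so b reaches the qutrit level |2⟩ only when a = b = 1; the target t is flipped when b is |2⟩; and the first step is undone with a +2 (mod 3) increment. The register layout `DIMS` and all helper names are hypothetical and not part of QuDiet's API, and this is a generic illustration of the technique rather than the exact gate sequence drawn in Fig. 6(b).

```python
import itertools

import numpy as np

# Hypothetical register layout: control a (qubit), control b (a qubit that may
# temporarily occupy the qutrit level |2>), target t (qubit).
DIMS = (2, 3, 2)


def index(state):
    """Mixed-radix index of the basis state (a, b, t) for the dimensions in DIMS."""
    a, b, t = state
    return (a * DIMS[1] + b) * DIMS[2] + t


def permutation_unitary(rule):
    """Unitary of a reversible classical map given as a function on basis tuples."""
    size = int(np.prod(DIMS))
    U = np.zeros((size, size))
    for state in itertools.product(*(range(d) for d in DIMS)):
        U[index(rule(state)), index(state)] = 1.0
    return U


# Step 1: if a == 1, increment b by +1 (mod 3); b reaches |2> only if a = b = 1.
step1 = permutation_unitary(
    lambda s: (s[0], (s[1] + 1) % 3 if s[0] == 1 else s[1], s[2]))
# Step 2: if b == 2, flip the qubit target t.
step2 = permutation_unitary(
    lambda s: (s[0], s[1], 1 - s[2] if s[1] == 2 else s[2]))
# Step 3: if a == 1, increment b by +2 (mod 3), which undoes step 1.
step3 = permutation_unitary(
    lambda s: (s[0], (s[1] + 2) % 3 if s[0] == 1 else s[1], s[2]))

circuit = step3 @ step2 @ step1  # step1 is applied first


def acts_as_toffoli():
    """Check |a, b, t> -> |a, b, t XOR (a AND b)> on every qubit-only input."""
    for a, b, t in itertools.product((0, 1), repeat=3):
        vec = np.zeros(int(np.prod(DIMS)))
        vec[index((a, b, t))] = 1.0
        out = circuit @ vec
        if not np.isclose(out[index((a, b, t ^ (a & b)))], 1.0):
            return False
    return True


print(acts_as_toffoli())  # True
```

Because b leaves the qubit subspace only transiently and is always returned to |0⟩ or |1⟩, the composite acts exactly as a Toffoli on qubit inputs while using no extra ancilla.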

Fig. 1. Generalized Toffoli in d-ary quantum systems: the control qudits in red circles activate on |d − 1⟩ and those in the blue dotted circles activate on |d⟩.

Fig. 3. High-level overview of the QuDiet quantum simulator. This figure shows the different parts of the quantum simulator, namely the InitState objects, with subscripts representing the dimensions of the respective qudits, and the QuantumGate objects, where the $X_3^{+2}$ gate is a generalized version of the CNOT gate whose target acts on a qutrit while its control is a qubit. The QuantumGate objects are enclosed in Moment objects, and the Moment objects are enclosed within the OperatorFlow object.
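The hybrid qubit-qutrit gate in this figure can be illustrated with a small NumPy sketch. The joint state of a qubit and a qutrit is the Kronecker product of a 2-dimensional and a 3-dimensional basis vector, and a controlled gate between them is block diagonal: it adds +2 (mod 3) to the qutrit target only when the qubit control is in its highest level |1⟩. The function names and the control-first wire ordering are assumptions of this sketch, not QuDiet's implementation.

```python
import numpy as np


def increment(d, a):
    """Generalized NOT on one qudit of dimension d: |i> -> |(i + a) mod d>."""
    X = np.zeros((d, d))
    for i in range(d):
        X[(i + a) % d, i] = 1.0
    return X


def controlled_increment(d_control, d_target, a):
    """Two-wire gate (control first): add a (mod d_target) to the target only when
    the control is in its highest level |d_control - 1>; identity otherwise."""
    size = d_control * d_target
    U = np.eye(size)
    U[size - d_target:, size - d_target:] = increment(d_target, a)
    return U


def basis(d, level):
    """Computational basis vector |level> of a d-dimensional qudit."""
    v = np.zeros(d)
    v[level] = 1.0
    return v


# A qubit control in |1> and a qutrit target in |0>: joint dimension 2 * 3 = 6.
state = np.kron(basis(2, 1), basis(3, 0))
gate = controlled_increment(d_control=2, d_target=3, a=2)
out = gate @ state
print(int(np.argmax(out)))  # 5, i.e. |1>|2> (index 1 * 3 + 2)
```

The same block-diagonal construction works for any pair of dimensions, which is what allows qubits and qudits to be mixed freely in one register.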

3.7.3 CUDABackend. CudaBackend interfaces with CuPy's NumPy routines, using NumPy-like dense matrices and accessing the GPUs solely to speed up execution.
3.7.4 CUDASparseBackend. CUDASparseBackend interfaces with CuPy's SciPy routines, using a sparse representation of the matrices and accessing the GPUs to speed up execution.
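The dense-versus-sparse distinction behind these backends can be sketched on the CPU with NumPy and SciPy: each circuit layer becomes the Kronecker product of its per-wire operators, stored either as a dense array or as a CSR sparse matrix, and both representations yield the same state. The function names are hypothetical and are not QuDiet's backend API; a CUDA variant would run the same logic on CuPy arrays and `cupyx.scipy.sparse` matrices, which this sketch does not exercise.

```python
import numpy as np
import scipy.sparse as sp


def increment(d, a=1):
    """Generalized NOT on one qudit: |i> -> |(i + a) mod d>."""
    X = np.zeros((d, d))
    for i in range(d):
        X[(i + a) % d, i] = 1.0
    return X


def layer_operator(ops, backend):
    """Kronecker product of one operator per wire, dense (NumPy) or sparse (SciPy CSR)."""
    if backend == "numpy":
        out = np.eye(1)
        for op in ops:
            out = np.kron(out, op)
        return out
    if backend == "sparse":
        out = sp.identity(1, format="csr")
        for op in ops:
            out = sp.kron(out, sp.csr_matrix(op), format="csr")
        return out
    raise ValueError(f"unknown backend: {backend!r}")


def run(dims, layers, backend):
    """Start every wire in |0> and apply each layer's operator in order."""
    state = np.zeros(int(np.prod(dims)))
    state[0] = 1.0
    for ops in layers:
        state = layer_operator(ops, backend) @ state
    return np.asarray(state).ravel()


# A qubit and a qutrit: +1 on the qubit, then +2 on the qutrit.
dims = (2, 3)
layers = [[increment(2), np.eye(3)], [np.eye(2), increment(3, 2)]]
dense_state = run(dims, layers, "numpy")
sparse_state = run(dims, layers, "sparse")
print(np.allclose(dense_state, sparse_state), int(np.argmax(dense_state)))  # True 5
```

Gate matrices such as generalized NOTs and controlled increments are permutation-like and mostly zero, so the sparse path stores far fewer entries as the register grows, while the dense path avoids sparse-format overhead on small circuits.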

Fig. 4. A quantum circuit to be simulated using QuDiet. A Moment object is initialized, and then the InitState objects are pushed into the Moment. This object is then pushed onto the OperatorFlow object's stack.
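The construction order in this figure can be mirrored with a few simplified stand-in classes. They reuse the names InitState, Moment and OperatorFlow only for readability: their fields and methods (`vector`, `push`, `simulate`) are assumptions of this sketch and not QuDiet's real interface. The first Moment holds the initial states, whose tensor product gives the starting vector; every later Moment is turned into one Kronecker-product operator and applied in push order.

```python
from dataclasses import dataclass, field

import numpy as np


@dataclass
class InitState:
    """Stand-in: a qudit of dimension `dim` prepared in the basis state |level>."""
    dim: int
    level: int = 0

    def vector(self):
        v = np.zeros(self.dim)
        v[self.level] = 1.0
        return v


@dataclass
class Moment:
    """Stand-in: one column of the circuit, holding one item per wire."""
    items: list = field(default_factory=list)

    def push(self, item):
        self.items.append(item)


@dataclass
class OperatorFlow:
    """Stand-in: a stack of Moments; the first Moment holds the InitState objects."""
    moments: list = field(default_factory=list)

    def push(self, moment):
        self.moments.append(moment)

    def simulate(self):
        init, *layers = self.moments
        state = np.ones(1)
        for qudit in init.items:  # tensor product of the initial states
            state = np.kron(state, qudit.vector())
        for layer in layers:  # each later Moment becomes one Kronecker-product operator
            op = np.ones((1, 1))
            for gate in layer.items:
                op = np.kron(op, gate)
            state = op @ state
        return state


def increment(d, a=1):
    """Generalized NOT on one qudit: |i> -> |(i + a) mod d>."""
    X = np.zeros((d, d))
    for i in range(d):
        X[(i + a) % d, i] = 1.0
    return X


# Mirror the Fig. 4 steps: build a Moment of InitStates, then push it onto the flow.
init = Moment()
init.push(InitState(dim=2))  # qubit in |0>
init.push(InitState(dim=3))  # qutrit in |0>
flow = OperatorFlow()
flow.push(init)

gates = Moment()
gates.push(increment(2))  # qubit |0> -> |1>
gates.push(increment(3))  # qutrit |0> -> |1>
flow.push(gates)

final = flow.simulate()
print(int(np.argmax(final)))  # 4, i.e. |1>|1> (index 1 * 3 + 1)
```

Keeping each Moment as a list of per-wire items defers all matrix construction to `simulate`, which is one plausible way to separate circuit description from compilation.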

Fig. 5. The OperatorFlow and its internals before the circuit is compiled.

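Generalized NOT Gate: The generalized NOT gate $X_d^{+a}$ acts on a single qudit as
\[
X_d^{+a}\,|i\rangle = |(i+a) \bmod d\rangle, \qquad 1 \le a \le d-1.
\]
For visualization of the $X_d^{+a}$ gate, we use a 'rectangle' box; '$+a$' in the box represents the generalized NOT.

Generalized Phase-Shift Gate: $Z_d$ is the generalized phase-shift gate, represented by the $(d \times d)$ matrix
\[
Z_d = \operatorname{diag}\left(1, \omega, \omega^{2}, \ldots, \omega^{d-1}\right), \qquad \omega = e^{2\pi i/d},
\]
so that $Z_d|j\rangle = \omega^{j}|j\rangle$. We use $Z_d$ in a 'rectangle' box to represent the generalized phase-shift gate.

Generalized Hadamard Gate: The generalized quantum Fourier transform, also known as the generalized Hadamard gate $H_d$, produces the superposition of the input basis states. We use $H_d$ in a 'rectangle' box to represent the generalized Hadamard gate. Its $(d \times d)$ matrix representation is
\[
H_d = \frac{1}{\sqrt{d}}
\begin{pmatrix}
1 & 1 & 1 & \cdots & 1 \\
1 & \omega & \omega^{2} & \cdots & \omega^{d-1} \\
1 & \omega^{2} & \omega^{4} & \cdots & \omega^{2(d-1)} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & \omega^{d-1} & \omega^{2(d-1)} & \cdots & \omega^{(d-1)(d-1)}
\end{pmatrix}.
\]

Generalized CNOT Gate: The generalized CNOT gate $C_{X,d}^{+a}$ acts on a control qudit $|x\rangle$ and a target qudit $|y\rangle$ as
\[
C_{X,d}^{+a}\,|x\rangle|y\rangle =
\begin{cases}
|x\rangle\,|(y+a) \bmod d\rangle, & \text{if } x = d-1,\\
|x\rangle\,|y\rangle, & \text{otherwise,}
\end{cases}
\]
where $1 \le a \le d-1$. In the schematic design of the generalized CNOT gate $C_{X,d}^{+a}$, we use a 'black dot' (•) to represent the control and a 'rectangle' box to represent the target; '$+a$' in the target box represents the increment operator. The $(d^2 \times d^2)$ matrix representation of the generalized CNOT gate $C_{X,d}^{+a}$ is
\[
C_{X,d}^{+a} =
\begin{pmatrix}
I_d & 0_d & \cdots & 0_d \\
0_d & I_d & \cdots & 0_d \\
\vdots & \vdots & \ddots & \vdots \\
0_d & 0_d & \cdots & X_d^{+a}
\end{pmatrix},
\]
where $I_d$ and $0_d$ denote the $(d \times d)$ identity and zero matrices.

Generalized Multi-controlled Toffoli Gate: We extend the generalized CNOT, or INCREMENT, gate to work over $n$ qudits as the generalized Multi-controlled Toffoli (MCT) gate, or $n$-qudit Toffoli gate, $C_{X,d}^{n,+a}$. For $C_{X,d}^{n,+a}$, the target qudit is increased by $a \pmod d$ only when all $n-1$ control qudits have the value $d-1$, where $1 \le a \le d-1$. In the schematic design of the generalized MCT gate $C_{X,d}^{n,+a}$, we use 'black dots' (•) to represent all the control qudits and a 'rectangle' box to represent the target; '$+a$' in the target box represents the increment operator. The $(d^n \times d^n)$ matrix representation of the generalized MCT gate is block diagonal with $d^{n-1}$ diagonal blocks of size $(d \times d)$:
\[
C_{X,d}^{n,+a} =
\begin{pmatrix}
I_d & 0_d & \cdots & 0_d \\
0_d & I_d & \cdots & 0_d \\
\vdots & \vdots & \ddots & \vdots \\
0_d & 0_d & \cdots & X_d^{+a}
\end{pmatrix},
\]
where only the last block, corresponding to all controls in $|d-1\rangle$, is $X_d^{+a}$; for $n = 2$ this reduces to the generalized CNOT gate.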
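As a numerical check on the generalized gates, the NumPy sketch below builds the generalized NOT $X_d^{+a}$, the phase-shift gate $Z_d = \operatorname{diag}(1, \omega, \ldots, \omega^{d-1})$ with $\omega = e^{2\pi i/d}$, the generalized Hadamard (Fourier) gate $H_d$, the generalized CNOT and the $n$-qudit Toffoli as dense matrices. The function names are illustrative and are not QuDiet's API, and the wire ordering (controls as the most significant digits of the basis index) is an assumption of the sketch; under that ordering the MCT matrix is the identity except for its last $(d \times d)$ block.

```python
import numpy as np


def gen_not(d, a=1):
    """X_d^{+a}: |i> -> |(i + a) mod d>."""
    X = np.zeros((d, d), dtype=complex)
    for i in range(d):
        X[(i + a) % d, i] = 1.0
    return X


def phase_shift(d):
    """Z_d = diag(1, w, ..., w^(d-1)) with w = exp(2*pi*i/d)."""
    w = np.exp(2j * np.pi / d)
    return np.diag(w ** np.arange(d))


def hadamard(d):
    """H_d with entries w^(j*k) / sqrt(d): the d-dimensional Fourier matrix."""
    w = np.exp(2j * np.pi / d)
    j, k = np.meshgrid(np.arange(d), np.arange(d), indexing="ij")
    return w ** (j * k) / np.sqrt(d)


def mct(d, n, a=1):
    """n-qudit Toffoli: identity except the last d x d block, i.e. the one where
    all n - 1 controls (most significant digits) equal d - 1."""
    U = np.eye(d ** n, dtype=complex)
    U[-d:, -d:] = gen_not(d, a)
    return U


def cnot(d, a=1):
    """Generalized CNOT: the n = 2 case of mct."""
    return mct(d, 2, a)


def ket(d, *levels):
    """Basis state |levels[0]>|levels[1]>... of qudits of dimension d."""
    v = np.ones(1, dtype=complex)
    for level in levels:
        e = np.zeros(d, dtype=complex)
        e[level] = 1.0
        v = np.kron(v, e)
    return v


d = 3
print(np.allclose(cnot(d) @ ket(d, 2, 1), ket(d, 2, 2)))  # True: control is d - 1
print(np.allclose(cnot(d) @ ket(d, 1, 1), ket(d, 1, 1)))  # True: control is not d - 1
H = hadamard(d)
print(np.allclose(H @ gen_not(d) @ H.conj().T, phase_shift(d)))  # True
```

The last line checks the identity $H_d X_d^{+1} H_d^{\dagger} = Z_d$, which holds because $X_d^{+1}$ shifts $|k\rangle$ to $|k+1\rangle$ and the Fourier matrix $H_d$ diagonalizes that shift with eigenvalues $\omega^{j}$.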

Table 1. Loading and execution times (in milliseconds) of benchmark circuits on the Numpy and Sparse backends.

Table 2. Loading and execution times (in milliseconds) of some benchmark circuits on the Sparse-cuda backend.