Optical Implementation of 2 × 2 Universal Unitary Matrix Transformations

Unitary operations are a specific class of linear transformations that have become an essential ingredient for the realization of classical and quantum information processing. The ability to implement any n‐dimensional unitary signal transformation by using reconfigurable optical hardware has recently led to the pioneering concept of the programmable linear optical processor, whose basic building block (BB) must be correctly designed to guarantee that the whole system is able to perform n×n universal (i.e., arbitrary) unitary matrix transformations. Here, it is demonstrated that the present architectures of the BB do not fulfil the universal unitary property (at least) in 2×2 optical processors, limiting the number of unitary matrix transformations that may be generated. To overcome this fundamental constraint, the theoretical tools required to analyze and design 2×2 universal unitary optical circuits and their corresponding BBs are presented. The consequences of this mathematical framework are explored, obtaining a simple route to implement different BB architectures, all of them guaranteeing a true universal unitary functionality in the resulting 2×2 optical processors. These findings may pave the way to revisit the design of high‐dimensional unitary optical processors, unleashing the potential of programmable integrated photonics technology.


DOI: 10.1002/lpor.202000473

Introduction
Unitary operators are linear transformations between complex vector spaces that preserve the norm of the input vector. [1–4] Such transformations have become a mathematical tool of paramount importance in fundamental and applied research: they enable the implementation of any linear mapping in a simple and elegant fashion, [1,5] explain underlying aspects of quantum mechanics (such as the unitary nature of the time evolution of a particle in a closed Hermitian quantum system [6,7]) and offer an energy-efficient scheme to perform signal processing applications. [8–11] In particular, in recent years, unitary signal processors have attracted increasing interest within the scope of programmable integrated photonics (PIP), since basic optical devices such as directional couplers (DCs), multimode interferometers (MMIs) and phase shifters (PSs) inherently induce analog unitary transformations in the electromagnetic waves (regardless of their classical or quantum nature), provided that the insertion loss of such devices is negligible. [12–14] In this vein, PIP has emerged as an entirely new field of research in classical and quantum information processing, bringing the promise of alleviating the saturation of Moore's law in electronics and leading to a suitable platform to implement neuromorphic and quantum computation using current technology. [11,14] In this context, a myriad of works have been reported, motivated by the idea of proposing a single optical hardware platform, the so-called programmable linear optical processor, which must be able to generate any n-dimensional unitary transformation (an indispensable piece to implement any multidimensional linear mapping via the singular value decomposition). [5,11,14–16] This requires, as a necessary and sufficient condition, that the transfer function of the system $T_n$ is an n × n arbitrary (or universal) unitary matrix, that is, the U(n) Lie group (an algebraic group composed of all the n × n unitary matrices along with the matrix multiplication operation [2,3]) must be completely described by $T_n$ when varying the value of its entries (encoded by parameters) in the complex plane. [2–4,17] Along this line, since the proposed architectures of $T_n$ are constructed by combining basic building blocks (BBs), which perform 2 × 2 unitary transformations, [15,16,18–21] a correct design and implementation of the BB (also termed in the literature unit cell, [19,22,23] programmable photonic analog block [14,24] or reconfigurable beamsplitter [25]) is of capital importance to guarantee the universal unitary property in $T_n$ for any n ≥ 2 (however, note that the BB does not have to perform universal unitary transformations by itself [20,21,25]).
In this work, it is shown that the existing architectures of the BB [16,19–23,25–34] lead to optical processors that do not satisfy the unitary universality (at least) for the case n = 2. To overcome this fundamental restriction, we dig into the foundations of the U(2) group to analyze and design compact 2 × 2 universal unitary optical circuits and their corresponding BBs. To further unfold the potential of our results, we also uncover the minimal circuit architecture of the BB, defined as the BB implementation requiring the lowest number of basic devices.

Figure 1. 2 × 2 universal unitary optical processor. Two (classical or quantum) optical wave packets $A_1$ and $A_2$ at the input waveguides may be linearly transformed into any pair of optical wave packets $B_1$ and $B_2$ at the output waveguides with the same global energy as that of the input via the relation $(B_1, B_2)^T = T_2 (A_1, A_2)^T$, with the system transfer function $T_2$ being an arbitrary matrix of U(2) (the yellow lines represent the matrix nature of the transformation). Since the system input $e_A = A_1\hat{e}_{1,\perp} + A_2\hat{e}_{2,\perp}$ and the system output $e_B = B_1\hat{e}_{1,\perp} + B_2\hat{e}_{2,\perp}$ can always be illustrated as two different points on the surface of the Bloch sphere, $T_2$ must be able to induce any (nonobservable) global phase shift $\delta$ along with a rotation of an angle $\theta$ around an arbitrary unit vector $\hat{n}$ that moves $e_A$ to $e_B$, i.e., $T_2 \equiv e^{j\delta}\hat{R}_{\hat{n}}(\theta)$. Given that the arbitrary nature of $\hat{n}$ hampers the direct implementation of $\hat{R}_{\hat{n}}(\theta)$ (red arrow) using mainstream integrated optical devices (PSs, DCs and MMIs), we take advantage of the Euler rotation theorem (Equation (2)) to describe $\hat{R}_{\hat{n}}(\theta)$ as a concatenation of three rotations around two specific unit vectors $\hat{r}_1, \hat{r}_2 \in \mathbb{R}^3$ (a particular example is depicted taking $\hat{r}_1 = \hat{z}$ and $\hat{r}_2 = \hat{y}$, see blue arrows).

Preliminary Concepts: 2 × 2 Universal Unitary Matrix Transformations
As illustrated in Figure 1, a 2 × 2 universal unitary optical processor is a system which linearly transforms two (classical or quantum) optical wave packets $A_1$ and $A_2$ at the system input into any pair of optical wave packets $B_1$ and $B_2$ at the system output with the same global energy as that of the input via the matrix relation $(B_1, B_2)^T = T_2 (A_1, A_2)^T$, with the system transfer function $T_2$ being an arbitrary matrix of U(2) (the superscript T denotes matrix transposition).
In complete analogy with a path-encoded quantum bit, where the standard states |0⟩ and |1⟩ may be respectively identified with the unit vectors $\hat{e}_{1,\perp}$ and $\hat{e}_{2,\perp}$ of Figure 1 (which account for the transverse direction of the electric field strength propagated by the upper and lower waveguides), the system input $e_A = A_1\hat{e}_{1,\perp} + A_2\hat{e}_{2,\perp}$ and the system output $e_B = B_1\hat{e}_{1,\perp} + B_2\hat{e}_{2,\perp}$ can always be illustrated as two different points on the surface of the Bloch sphere (also known as the Poincaré sphere in classical polarization optics, although here there is no connection between the Cartesian axes of the Bloch sphere and the state of polarization of light: the wave packets are always located in $\hat{e}_{1,\perp}$ and $\hat{e}_{2,\perp}$).
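Hence, $T_2$ must be able to induce any (nonobservable) global phase shift $\delta$ along with a rotation of an angle $\theta$ around an arbitrary unit vector $\hat{n}$ that moves $e_A$ to $e_B$ on the sphere:

$$T_2 \equiv e^{j\delta}\hat{R}_{\hat{n}}(\theta), \tag{1}$$

where $\hat{R}_{\hat{n}}(\theta)$ is a rotation matrix of the Bloch sphere belonging to SU(2). Accordingly, $T_2$ has 4 independent real parameters (degrees of freedom): $\delta$, $\theta$ and two components of $\hat{n}$ (the three components cannot be independent parameters since $\hat{n}$ must satisfy the normalization condition $|\hat{n}|^2 = n_x^2 + n_y^2 + n_z^2 = 1$). Unfortunately, the arbitrary nature of $\hat{n}$ hampers the direct implementation of $T_2$ using mainstream integrated optical devices (PSs, DCs, and MMIs). This technological difficulty can be circumvented by taking advantage of the Euler rotation theorem, which states that $\hat{R}_{\hat{n}}(\theta)$ can be factorized as a concatenation of three rotations around two specific unit vectors $\hat{r}_1, \hat{r}_2 \in \mathbb{R}^3$, which must be linearly independent or, equivalently, nonparallel (note that orthogonality is a particular case of this condition) [7,35,36]

$$\hat{R}_{\hat{n}}(\theta) = \hat{R}_{\hat{r}_1}(\theta_3)\,\hat{R}_{\hat{r}_2}(\theta_2)\,\hat{R}_{\hat{r}_1}(\theta_1). \tag{2}$$

Now, the four independent real parameters of $T_2$ are encoded in $\delta$ and the angles $\theta_{1,2,3} \in [0, 2\pi]$. As shown below, selecting adequate vectors $\hat{r}_1$ and $\hat{r}_2$, we will be able to implement $T_2$ using PSs, DCs and MMIs (the implementation of each degree of freedom may require more than one basic device).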
On the other hand, bearing in mind that the optical circuits associated with two different SU(2) systems (e.g., $\hat{R}_{\hat{r}_1}(\theta_3)\hat{R}_{\hat{r}_2}(\theta_2)\hat{R}_{\hat{r}_1}(\theta_1)$ followed by $\hat{R}_{\hat{r}_1}(\theta_6)\hat{R}_{\hat{r}_2}(\theta_5)\hat{R}_{\hat{r}_1}(\theta_4)$) can be cascaded by describing the transformations $\hat{R}_{\hat{r}_1}(\theta_4)$ and $\hat{R}_{\hat{r}_1}(\theta_3)$ using a single rotation matrix $\hat{R}_{\hat{r}_1}(\theta_3 + \theta_4)$ (while the other four rotations remain invariant), the BB of a universal SU(2) processor can be defined via the following transfer function

$$T_B = \hat{R}_{\hat{r}_2}(\theta_2)\,\hat{R}_{\hat{r}_1}(\theta_1). \tag{3}$$

In this way, the final $\hat{R}_{\hat{r}_1}$ transformation of Equation (2) may be employed to connect the BBs of adjacent SU(2) systems, providing simultaneously the input and output rotations required by their corresponding $T_2$ matrices (the same definition of $T_B$ can also be extrapolated to a more general scenario where the BBs are connected in a mesh configuration to construct n × n optical processors with n > 2, as discussed in Section 5).
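As a numerical illustration of the relation between a single rotation around an arbitrary axis and a concatenation of three rotations around two fixed axes, the following minimal Python sketch composes rotations around ẑ, ŷ and ẑ (the example of Figure 1) and recovers the axis and angle of the equivalent single rotation. It assumes the common convention R_n(θ) = cos(θ/2) I − j sin(θ/2)(n·σ), with σ the Pauli matrices; this convention, the angle names theta1, theta2, theta3 and the helper names are choices of this sketch, not taken from the original.

```python
import cmath
import math

def matmul(a, b):
    """Product of two 2x2 complex matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def rz(t):
    """Rotation of the Bloch sphere by t around z (assumed convention)."""
    return [[cmath.exp(-0.5j * t), 0], [0, cmath.exp(0.5j * t)]]

def ry(t):
    """Rotation of the Bloch sphere by t around y (assumed convention)."""
    c, s = math.cos(t / 2), math.sin(t / 2)
    return [[c, -s], [s, c]]

def rn(theta, n):
    """R_n(theta) = cos(theta/2) I - j sin(theta/2) (n . sigma)."""
    nx, ny, nz = n
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c - 1j * nz * s, (-1j * nx - ny) * s],
            [(-1j * nx + ny) * s, c + 1j * nz * s]]

def axis_angle(m):
    """Recover (theta, n) from an SU(2) matrix m = R_n(theta)."""
    c = m[0][0].real
    v = (-m[1][0].imag, m[1][0].real, -m[0][0].imag)  # sin(theta/2) * n
    s = math.sqrt(sum(x * x for x in v))
    if s < 1e-12:  # m = +I or -I: the axis is arbitrary, pick z
        return 2 * math.atan2(0.0, c), (0.0, 0.0, 1.0)
    return 2 * math.atan2(s, c), tuple(x / s for x in v)

def close(a, b, tol=1e-9):
    """Entry-wise comparison of two 2x2 matrices."""
    return all(abs(a[i][j] - b[i][j]) < tol
               for i in range(2) for j in range(2))

# Three Euler angles (arbitrary example values) around z, y, z.
theta1, theta2, theta3 = -0.4, 1.1, 0.7
m = matmul(rz(theta3), matmul(ry(theta2), rz(theta1)))
theta, n = axis_angle(m)
print("equivalent angle:", theta)
print("equivalent axis:", n)
print("single rotation reproduces the product:", close(rn(theta, n), m))
```

The three angles plus a global phase give the four real parameters counted in the text; the sketch only checks the SU(2) part.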

Existing Proposals of the Building Block
Basically, two popular optical architectures of the BB have been proposed in the literature, sketched in Figure 2. [16,19–23,25–34] Both structures are composed of fixed couplers (MMIs or symmetric DCs) and PSs. Each coupler implements a perfect 50:50 beamsplitter and each PS encodes a tunable real parameter of the BB transfer function. Considering only forward propagation (from left to right) and assuming negligible wave reflection, the transfer functions $T_{Ba}$ and $T_{Bb}$ of the BBs shown in Figure 2a,b, respectively, are unitary matrices of two tunable parameters $\alpha, \beta \in [0, 2\pi]$ (see Section S2 in the Supporting Information). The simplest unitary optical processor that may be built from each BB with the aim of performing universal SU(2) transformations is also depicted in Figure 2. In the former case (Figure 2a), the SU(2) processor is implemented by the BB itself without any additional elements, as reported in ref. [30]. In the latter case (Figure 2b), one PS must be added at the input of the BB to implement universal SU(2) operations, as explicitly indicated in refs. [25,34] (although ref. [16] suggests that the additional PS must be included at the output of the BB, the same SU(2) processor is finally obtained because its BB is the reversed version of the BB illustrated in Figure 2b). Accordingly, the corresponding transfer functions $T_{2a(b)}$ of the above SU(2) systems are found to be $T_{2a} = T_{Ba}$ and $T_{2b} = e^{j\gamma/2}\, T_{Bb}\, \hat{R}_{\hat{z}}(\pm\gamma)$, with $\gamma \in [0, 2\pi]$ and the sign $-$ ($+$) when the additional PS encoding the parameter $\gamma$ is integrated in the upper (lower) path, see Figure 2b and Section S2 in the Supporting Information. In the following, let us analyze whether $T_{2a}$ and $T_{2b}$ are arbitrary matrices of the SU(2) group.
It is straightforward to demonstrate that $T_{2a}$ is not an arbitrary matrix of SU(2) because the number of real parameters in $T_{2a}$ is 2, lower than the 3 independent real parameters required by the dimension of SU(2). This directly precludes the universal unitary property in the optical circuit shown in Figure 2a. Using the Euler rotation theorem, we can find specific rotations of the Bloch sphere that cannot be generated with $T_{2a}$. For instance, performing the factorization $T_{2a} = e^{j(\alpha+\beta)/2}\,\hat{R}_{\hat{x}}(-\pi/2)\,\hat{R}_{\hat{z}}(\alpha-\beta)\,\hat{R}_{\hat{x}}(-\pi/2)$, we observe that the rotations around the x-axis of the Bloch sphere are fixed to a specific angle ($-\pi/2$ rad) and, consequently, there exist $\hat{R}_{\hat{x}}$ transformations that cannot be generated, e.g., $\hat{R}_{\hat{x}}(2\pi) = -I$, where $I$ is the identity matrix.
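The restriction can be checked numerically. The minimal Python sketch below evaluates a matrix of the factorized form quoted above, a global phase times R_x(−π/2) R_z(·) R_x(−π/2), over a grid of the two tunable phases (called alpha and beta here, names chosen for this sketch) and measures how close it ever gets to R_x(2π) = −I. The rotation-matrix convention R_n(θ) = exp(−jθ n·σ/2) is an assumption of the sketch.

```python
import cmath
import math

def matmul(a, b):
    """Product of two 2x2 complex matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def rx(t):
    """Rotation of the Bloch sphere by t around x (assumed convention)."""
    c, s = math.cos(t / 2), math.sin(t / 2)
    return [[c, -1j * s], [-1j * s, c]]

def rz(t):
    """Rotation of the Bloch sphere by t around z (assumed convention)."""
    return [[cmath.exp(-0.5j * t), 0], [0, cmath.exp(0.5j * t)]]

def t2a(alpha, beta):
    """e^{j(alpha+beta)/2} Rx(-pi/2) Rz(alpha-beta) Rx(-pi/2)."""
    bs = rx(-math.pi / 2)  # fixed 50:50 coupler
    core = matmul(bs, matmul(rz(alpha - beta), bs))
    g = cmath.exp(0.5j * (alpha + beta))
    return [[g * x for x in row] for row in core]

def dist_to_minus_identity(m):
    """Frobenius distance between m and -I = Rx(2*pi)."""
    return math.sqrt(abs(m[0][0] + 1) ** 2 + abs(m[1][1] + 1) ** 2
                     + abs(m[0][1]) ** 2 + abs(m[1][0]) ** 2)

steps = 60
grid = [2 * math.pi * k / steps for k in range(steps)]
samples = [t2a(a, b) for a in grid for b in grid]

# The two diagonal entries always cancel, so the trace is zero and no
# nonzero multiple of the identity can be reached.
worst = max(abs(m[0][0] + m[1][1]) for m in samples)
closest = min(dist_to_minus_identity(m) for m in samples)
print("max |T11 + T22| over the grid:", worst)
print("smallest distance to -I over the grid:", closest)
```

A grid scan is not a proof, but the vanishing trace it exposes explains why the distance to −I never shrinks: the fixed R_x(−π/2) factors cannot be tuned away.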
Contrariwise, $T_{2b}$ is parametrized with the number of degrees of freedom ($\{\alpha, \beta, \gamma\}$) required by the dimension of SU(2). However, this is a necessary but not a sufficient condition to guarantee the unitary universality, because such degrees of freedom must independently account for three arbitrary rotations of the Bloch sphere obeying the relation given by Equation (2). Along this line, the following noteworthy features of $T_{2b}$ should be taken into consideration when the PS encoding the parameter $\gamma$ is added at the upper input of the BB:

1) Since $\det(T_{2b}) = e^{j(\alpha+\beta+\gamma)} \neq 1$ in general, $T_{2b} \notin$ SU(2). Therefore, we must impose the additional condition $\alpha + \beta + \gamma = 2\pi m$ ($m \in \mathbb{Z}$) on the set of parameters $\{\alpha, \beta, \gamma\}$ to generate SU(2) transformations. As a consequence, the number of degrees of freedom in $T_{2b}$ is reduced from 3 to 2, lower than the number required by the group dimension. This is the first indication that $T_{2b}$ is not an arbitrary matrix of SU(2). Likewise, it should be noted that an external synchronization is required to guarantee that the PSs associated with $\alpha$, $\beta$ and $\gamma$ fulfill the aforementioned condition.

2) Taking $\gamma = 2\pi m - \alpha - \beta$, it follows that $T_{2b} = \hat{R}_{\hat{z}}(-\beta-\pi)\,\hat{R}_{\hat{y}}(\alpha-\pi)\,\hat{R}_{\hat{z}}(\alpha+\beta)$, with $\theta_3 \equiv -\beta-\pi$, $\theta_2 \equiv \alpha-\pi$ and $\theta_1 \equiv \alpha+\beta$ being the rotation angles (see Section S2 in the Supporting Information). Since $\theta_1 = \theta_2 - \theta_3$, $T_{2b}$ does not encode three independent rotation angles, as required by the Euler rotation theorem. This implies that $T_{2b}$ cannot completely describe the SU(2) group. Specifically, $T_{2b}$ cannot generate an arbitrary rotation around the z-axis of the Bloch sphere. In a universal SU(2) matrix of the form $\hat{R}_{\hat{z}}(\theta_3)\hat{R}_{\hat{y}}(\theta_2)\hat{R}_{\hat{z}}(\theta_1)$, with three independent arbitrary rotation angles, we can generate any rotation around the z-axis by taking $\theta_2 = 0$, given that $\hat{R}_{\hat{z}}(\theta_3)\hat{R}_{\hat{y}}(0)\hat{R}_{\hat{z}}(\theta_1) = \hat{R}_{\hat{z}}(\theta_3)\hat{R}_{\hat{z}}(\theta_1) = \hat{R}_{\hat{z}}(\theta_3 + \theta_1)$. In this way, varying the values of $\theta_1$ and $\theta_3$, we generate infinitely many different $\hat{R}_{\hat{z}}$ transformations. Nonetheless, setting $\theta_2 = 0$ ($\alpha = \pi$) in $T_{2b}$, we obtain a single $\hat{R}_{\hat{z}}$ transformation: $T_{2b} = \hat{R}_{\hat{z}}(-\beta-\pi)\hat{R}_{\hat{z}}(\pi+\beta) = \hat{R}_{\hat{z}}(0) = I$. Hence, there exist rotations around the z-axis that cannot be described by $T_{2b}$, e.g., $\hat{R}_{\hat{z}}(2\pi) = -I$.

The same results can equivalently be validated in two different ways: i) verifying that $T_{2b}$ cannot be restated as a rotation matrix of the Bloch sphere $\hat{R}_{\hat{n}}(\theta)$ around an arbitrary unit vector $\hat{n} \in \mathbb{R}^3$, ii) demonstrating that the corresponding Lie algebra $\mathfrak{su}(2)$ is not completely generated when applying the logarithmic mapping to $T_{2b}$; see Section S2 (Supporting Information) for more details.
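The loss of one rotation angle can be reproduced with a small device-level model. The Python sketch below is only a sketch under stated assumptions: each coupler is modeled as R_x(−π/2), each PS as a phase e^{jx} applied to the upper waveguide, and the circuit is ordered as added PS, coupler, internal PS, coupler, external PS. This layout and the phase names alpha, beta, gamma are hypothetical choices of the sketch (Figure 2b is not reproduced here); they were selected because they lead to a z-y-z factorization with dependent angles of the kind discussed in point 2.

```python
import cmath
import math

def matmul(a, b):
    """Product of two 2x2 complex matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def chain(*ms):
    """Multiply matrices in the written order: chain(a, b, c) = a b c."""
    out = ms[0]
    for m in ms[1:]:
        out = matmul(out, m)
    return out

def rx(t):
    c, s = math.cos(t / 2), math.sin(t / 2)
    return [[c, -1j * s], [-1j * s, c]]

def ry(t):
    c, s = math.cos(t / 2), math.sin(t / 2)
    return [[c, -s], [s, c]]

def rz(t):
    return [[cmath.exp(-0.5j * t), 0], [0, cmath.exp(0.5j * t)]]

def ps_upper(x):
    """Phase shifter of phase x in the upper waveguide (assumed model)."""
    return [[cmath.exp(1j * x), 0], [0, 1]]

def close(a, b, tol=1e-9):
    return all(abs(a[i][j] - b[i][j]) < tol
               for i in range(2) for j in range(2))

def t2b(alpha, beta, gamma):
    """Device model; light crosses the right-most factor first."""
    bs = rx(-math.pi / 2)
    return chain(ps_upper(beta), bs, ps_upper(alpha), bs, ps_upper(gamma))

def t2b_euler(alpha, beta):
    """z-y-z form whose three angles depend on only two parameters."""
    return chain(rz(-beta - math.pi), ry(alpha - math.pi), rz(alpha + beta))

alpha, beta = 0.9, 2.3
gamma = -alpha - beta  # unit-determinant condition with m = 0
print("device model matches the z-y-z form:",
      close(t2b(alpha, beta, gamma), t2b_euler(alpha, beta)))

# Middle angle set to zero (alpha = pi): every beta gives the identity.
identity = [[1, 0], [0, 1]]
print("only the identity is produced:",
      all(close(t2b(math.pi, b, -math.pi - b), identity)
          for b in (0.0, 0.5, 1.7, 3.0)))
```

In this model the outer z-rotations cancel each other whenever the middle rotation vanishes, which is the numerical counterpart of the statement that rotations such as −I around the z-axis are unreachable.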
Alternatively, as commented above, the PS associated with $\gamma$ may be integrated in the lower input of the BB. This scenario leads to conclusions similar to those of the above system. We must also impose the same mathematical condition on the set of parameters of $T_{2b}$ to induce SU(2) transformations ($\alpha + \beta + \gamma = 2\pi m$, $m \in \mathbb{Z}$), which reduces the number of degrees of freedom from 3 to 2 and therefore makes it impossible to generate specific SU(2) transformations, in particular, arbitrary rotations around the y-axis (see Section S2 in the Supporting Information).
So far, we have extensively discussed the main architectures of the BB reported in previous works, demonstrating that they lead to SU(2) processors where the unitary universality does not hold. In the next subsection, we will provide a simple route to overcome this fundamental constraint.

Universal SU(2) Optical Processor and Building Block
Taking a closer look at Equations (2) and (3), two interesting remarks can be inferred. First, note that an infinite number of different 2 × 2 universal unitary processors and BBs may be constructed by taking different rotation vectors $\hat{r}_1, \hat{r}_2$. Each pair $(\hat{r}_1, \hat{r}_2)$ will lead to an explicit architecture of the universal unitary processor and the BB, which will differ from other possible schemes in the number and the kind of integrated basic devices. Second, observe that there is a one-to-one correspondence between $T_2$ and $T_B$ once the rotation vectors are selected. Hence, the design of the BB suffices to specify the optical circuit of the universal U(2) (or SU(2)) processor based on the same rotation vectors.
In order to uncover a compact implementation of the BB, we will use the following design criteria: i) we must select rotation vectors which give rise to an optical circuit of the rotation matrices $\hat{R}_{\hat{r}_1}(\theta_1)$ and $\hat{R}_{\hat{r}_2}(\theta_2)$ encompassing the lowest number of basic devices ($N$), ii) these basic devices must perform unitary signal operations and must be highly compact and scalable in integrated photonics using current technology. The resulting scheme of the BB fulfilling the above criteria will be referred to as the minimal circuit architecture (MCA). Basic optical devices satisfying the second criterion are PSs, DCs and MMIs, whose transfer matrices may induce rotations around the Cartesian axes of the Bloch sphere, as detailed below. Accordingly, the natural option to design the BB-MCA is $\hat{r}_1, \hat{r}_2 \in \{\hat{x}, \hat{y}, \hat{z}\}$.
The matrix $\hat{R}_{\hat{x}}(\theta)$ can be implemented by using a tunable synchronous DC, with PSs integrated in its arms that allow us to change the mode-coupling coefficient $\kappa$ of the DC by inducing the same effective index modulation in the fundamental mode of each waveguide. The rotation angle may be controlled via $\kappa$ ($\theta = 2\kappa L$, where $L$ is the length of the PSs, see Chapter 2 of ref. [14] for a more technical discussion about the tunable DC). Hence, the number of basic devices required to implement $\hat{R}_{\hat{x}}(\theta)$ is $N_{\hat{x}} = 3$ (1 DC and 2 PSs, see Figure 3a). Furthermore, the matrix $\hat{R}_{\hat{z}}(\theta)$ is the transfer function of a system composed of two parallel uncoupled waveguides, each of which integrates a PS inducing a phase shift of $\pm\theta/2$ rad in the optical signals. Thus, $\hat{R}_{\hat{z}}(\theta)$ requires $N_{\hat{z}} = 2$ basic devices (2 PSs, see Figure 3b). Finally, the matrix $\hat{R}_{\hat{y}}(\theta)$ can be implemented via two different factorizations (Figure 3c):

$$\hat{R}_{\hat{y}}(\theta) = \hat{R}_{\hat{x}}(\pi/2)\,\hat{R}_{\hat{z}}(-\theta)\,\hat{R}_{\hat{x}}(-\pi/2) = \hat{R}_{\hat{z}}(\pi/2)\,\hat{R}_{\hat{x}}(\theta)\,\hat{R}_{\hat{z}}(-\pi/2). \tag{6}$$

The former can be implemented by means of two 50:50 beamsplitters to induce the $\hat{R}_{\hat{x}}(\pm\pi/2)$ transformations (using MMIs or symmetric DCs) and 2 PSs to generate the $\hat{R}_{\hat{z}}(-\theta)$ matrix ($N_{\hat{y}} = 2 + 2 = 4$). The latter requires 4 PSs to perform the $\hat{R}_{\hat{z}}(\pm\pi/2)$ transformations along with the optical circuit shown in Figure 3a to produce the $\hat{R}_{\hat{x}}(\theta)$ matrix ($N_{\hat{y}} = 4 + N_{\hat{x}} = 7$). As seen in Figure 3c, the former factorization encompasses a lower $N_{\hat{y}}$, but the latter leads to the most compact architecture of $\hat{R}_{\hat{y}}(\theta)$ because only a single coupler is required.
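The two ways of building a rotation around the y-axis, either from 50:50 couplers around a z-rotation or from fixed z-rotations around a tunable x-rotation, can be verified numerically. The Python sketch below assumes the convention R_n(θ) = exp(−jθ n·σ/2); under that assumption the sign assignments R_x(π/2) R_z(−θ) R_x(−π/2) and R_z(π/2) R_x(θ) R_z(−π/2) are the ones that reproduce R_y(θ). Function names are illustrative.

```python
import cmath
import math

def matmul(a, b):
    """Product of two 2x2 complex matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def chain(*ms):
    """Multiply matrices in the written order: chain(a, b, c) = a b c."""
    out = ms[0]
    for m in ms[1:]:
        out = matmul(out, m)
    return out

def rx(t):
    c, s = math.cos(t / 2), math.sin(t / 2)
    return [[c, -1j * s], [-1j * s, c]]

def ry(t):
    c, s = math.cos(t / 2), math.sin(t / 2)
    return [[c, -s], [s, c]]

def rz(t):
    return [[cmath.exp(-0.5j * t), 0], [0, cmath.exp(0.5j * t)]]

def close(a, b, tol=1e-9):
    return all(abs(a[i][j] - b[i][j]) < tol
               for i in range(2) for j in range(2))

def ry_from_couplers(theta):
    """Two fixed 50:50 couplers around a tunable z-rotation (4 devices)."""
    return chain(rx(math.pi / 2), rz(-theta), rx(-math.pi / 2))

def ry_from_tunable_dc(theta):
    """Fixed z-rotations around one tunable coupler (7 devices)."""
    return chain(rz(math.pi / 2), rx(theta), rz(-math.pi / 2))

theta = 0.83  # arbitrary example angle
print("coupler-based form equals Ry:", close(ry_from_couplers(theta), ry(theta)))
print("tunable-DC form equals Ry:", close(ry_from_tunable_dc(theta), ry(theta)))
```

Swapping the signs of the outer rotations in either form yields R_y(−θ) instead, so the order of the fixed elements matters when laying out the circuit.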
Having discussed the optical implementation of the basic rotation matrices, let us return to the initial goal of this subsection: the design of the BB-MCA. In line with the first design criterion indicated above, we must select rotation vectors $\hat{r}_1, \hat{r}_2 \in \{\hat{x}, \hat{y}, \hat{z}\}$ which give rise to a BB architecture integrating the lowest number of basic devices ($N = N_{\hat{r}_1} + N_{\hat{r}_2}$). Therefore, it is straightforward to infer that the most compact design of the BB requires selecting $\hat{r}_1 = \hat{z}$ and $\hat{r}_2 = \hat{x}$, with $N = N_{\hat{z}} + N_{\hat{x}} = 5$. Figure 4a depicts the optical circuit of the BB-MCA, whose transfer function is given by the matrix:

$$T_B = \hat{R}_{\hat{x}}(\theta_2)\,\hat{R}_{\hat{z}}(\theta_1). \tag{7}$$

The architecture of the universal SU(2) processor is completed by adding the optical circuit associated with the rotation matrix $\hat{R}_{\hat{z}}(\theta_3)$ at the output of the BB (Figure 4a). As a result, we obtain a 2 × 2 system with transfer function $T_2 = \hat{R}_{\hat{z}}(\theta_3)\, T_B = \hat{R}_{\hat{z}}(\theta_3)\hat{R}_{\hat{x}}(\theta_2)\hat{R}_{\hat{z}}(\theta_1)$ that is able to perform any SU(2) transformation. Here, in contrast to the scheme of Figure 2b, no external synchronization between the PSs is required to perform SU(2) transformations, given that $\det(T_2) = 1$, $\forall\, \theta_{1,2,3} \in [0, 2\pi]$. The same optical circuit can be employed to perform universal U(2) transformations by including two additional PSs at the outputs (or inputs) in order to implement a global phase shift $e^{j\delta}$. Along this line, it should be noted that $\delta$ can optionally be encoded along with $\theta_3$ (or $\theta_1$) in the corresponding PSs, see Figure 4b. However, in such a case, a mutual synchronization (e.g., via software [37,38]) between these PSs is essential to be able to tune $\theta_3$ (or $\theta_1$) and $\delta$ independently.
As commented above, additional BB candidates may be explored by selecting different rotation vectors in Equation (3). As an illustrative example, Figure 4c shows a BB based on fixed couplers (50:50 beamsplitters implemented via MMIs or symmetric DCs), whose transfer function is directly obtained by replacing $\hat{R}_{\hat{x}}(\theta_2)$ with $\hat{R}_{\hat{y}}(\theta_2)$ in Equation (7). Nevertheless, this BB will require approximately twice the footprint of the BB depicted in Figure 4a because twice the number of couplers must be integrated in this BB architecture.
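To illustrate how a processor built as a z-rotation, an x-rotation and a z-rotation (plus a global phase) could be programmed, the following Python sketch solves for the angles that reproduce a given 2 × 2 unitary target. It is a minimal sketch, assuming the convention R_n(θ) = exp(−jθ n·σ/2); the function names program_mca and mca are hypothetical, and the returned angles are not wrapped into [0, 2π].

```python
import cmath
import math

def matmul(a, b):
    """Product of two 2x2 complex matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def rx(t):
    c, s = math.cos(t / 2), math.sin(t / 2)
    return [[c, -1j * s], [-1j * s, c]]

def rz(t):
    return [[cmath.exp(-0.5j * t), 0], [0, cmath.exp(0.5j * t)]]

def close(a, b, tol=1e-9):
    return all(abs(a[i][j] - b[i][j]) < tol
               for i in range(2) for j in range(2))

def mca(delta, theta1, theta2, theta3):
    """e^{j delta} Rz(theta3) Rx(theta2) Rz(theta1)."""
    core = matmul(rz(theta3), matmul(rx(theta2), rz(theta1)))
    g = cmath.exp(1j * delta)
    return [[g * x for x in row] for row in core]

def program_mca(u):
    """Return (delta, theta1, theta2, theta3) reproducing the unitary u."""
    det = u[0][0] * u[1][1] - u[0][1] * u[1][0]
    delta = cmath.phase(det) / 2           # global phase
    g = cmath.exp(-1j * delta)
    v = [[g * x for x in row] for row in u]  # unit-determinant part of u
    theta2 = 2 * math.atan2(abs(v[1][0]), abs(v[0][0]))
    half_sum = cmath.phase(v[1][1])        # (theta3 + theta1) / 2
    half_diff = cmath.phase(1j * v[1][0])  # (theta3 - theta1) / 2
    return delta, half_sum - half_diff, theta2, half_sum + half_diff

# Example targets: -I and a Hadamard-like matrix (determinant -1).
minus_identity = [[-1, 0], [0, -1]]
r = 1 / math.sqrt(2)
hadamard = [[r, r], [r, -r]]
for name, target in (("-I", minus_identity), ("Hadamard", hadamard)):
    params = program_mca(target)
    print(name, "parameters:", params)
    print(name, "reproduced:", close(mca(*params), target))
```

Because the three angles enter independently, no constraint between phase shifters is needed for the unit-determinant part; only the optional global phase adds a fourth setting.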

Conclusion
Overall, these results demonstrate that the programmable linear optical processors built from the existing architectures of the BB do not furnish a 2 × 2 universal unitary functionality and, in order to circumvent this fundamental limitation, we blaze a trail for designing BB schemes that guarantee the unitary universality in 2 × 2 optical processors. The theoretical framework presented here is a first research step to: i) establish the fundamentals of analog optical gates [11,14] and ii) build a systematic approach for the design of integrated photonic signal processing devices and subsystems that can be directly translated into commercial development kits. With the obvious differences, we expect that the future availability of such an approach can render benefits similar to those reaped by the science and technology of electronic integrated circuit design through the exploitation of the principles of Boolean algebra.
By virtue of the Euler rotation theorem, it is shown that the optical architectures of the BB proposed in previous works (Figure 2) lead to 2 × 2 optical processors that cannot describe three independent rotations around two nonparallel axes of the Bloch sphere and, as a result, such schemes cannot induce arbitrary SU(2) transformations. In contrast, in our proposal (Figure 4), three independent rotations around two Cartesian axes of the Bloch sphere are correctly encoded by the transfer function of the optical circuits, an arbitrary matrix of SU(2), guaranteeing the complete description of this Lie group while leveraging the existing mainstream integrated optical components. Moreover, it is worth mentioning that the kind and the number of basic devices required to implement a specific architecture of the BB can be adjusted by using different rotation vectors in Equation (3). Table 1 summarizes the most suitable and compact BB schemes that can be implemented by using PSs and couplers (MMIs, symmetric DCs or tunable synchronous DCs). Along the same line, other basic devices such as tunable asynchronous DCs (inducing rotations around a reconfigurable arbitrary unit vector $\hat{n}$ of the Bloch sphere [14]) and optical resonators (which may perform the same functionality as couplers and PSs [39]) might be explored to construct novel structures of the BB. In any case, a BB architecture must be built from basic devices with negligible insertion losses; otherwise, the unitary nature of the transformations will not be preserved, leading to energy-inefficient optical processors requiring amplification stages.
On the other hand, it is natural to wonder about the possibility of designing a more compact BB than the architecture proposed in Figure 4a by using a unitary factorization different from the Euler rotation theorem. Diverse 2 × 2 unitary factorization techniques can be found in the mathematical literature, [40–44] but all of them lead to optical schemes of the BB integrating a higher number of basic devices than in our proposal (see Section S3 in the Supporting Information).
Finally, note that these results may establish a starting point to revisit the n × n unitary universality for the case n > 2 in current programmable linear optical processors (based on well-known optical architectures of high-dimensional unitary matrix transformations, where the BBs are connected in a mesh configuration [15,16]) by combining the concept of multidimensional rotation matrices [4] with different factorization algorithms of the U(n) Lie group. [18,40–44] In such a scenario, our definition of the BB (Equation (3)) also guarantees the unitary universality of the U(2) processors integrated in the whole system, provided that the first rotation in $T_B$ is performed around the z-axis of the Bloch sphere (note that the PSs that implement this rotation are integrated in uncoupled waveguides (Figure 3b) and, consequently, such a scheme can set the connection of a given BB with two independent BBs of the mesh and may simultaneously generate an $\hat{R}_{\hat{z}}$ transformation along with a global phase shift).

Table 1. Suitable BB architectures proposed in this work and number of integrated basic devices. The design of the basic rotation matrices (Equation (6)) is also included for clarity. A 2 × 2 universal unitary optical system may be built from these BBs by adding an $\hat{R}_{\hat{z}}(\theta_3)$ rotation.

Supporting Information
Supporting Information is available from the Wiley Online Library or from the author.