Evaluation of a stacked-FET cell for high-frequency applications (invited paper)

This paper presents the design, electromagnetic simulation strategies and experimental characterisation of a two-stage stacked-FET cell in a 100 nm GaN on Si technology around 18.8 GHz, suited for Ka-band satellite downlink applications. Good agreement is found between the electromagnetic simulations and the measured performance of the manufactured prototype, demonstrating that a successful voltage-combining architecture can be obtained in the frequency range of interest with the selected topology, which is based on a symmetric fork-like connection between the transistors. This proves the effectiveness of an appropriate electromagnetic simulation set-up in correctly predicting the crosstalk that typically affects this structure, leading to correct stacking operation.


| INTRODUCTION
The currently available technologies that allow electronic circuits to operate at higher frequencies, such as Silicon (Si), Gallium Arsenide (GaAs) and Gallium Nitride (GaN), do not provide enough power at device level to comply with the communication system requirements for power amplifiers (PAs).[7] There are two ways of combining power at MMIC level: the parallel (current summing) and the series (voltage summing) connection of several devices.[4] Current combining is widely adopted for its simplicity, but has some clear disadvantages from the design standpoint, especially in terms of impedance level, which theoretically decreases by a factor N with respect to the optimum impedance of the individual device when N devices are combined. Additionally, the combining efficiency (and thus the overall PA gain) tends to decrease when a large number of devices is combined, due to suboptimal current phase alignment along the chain. Voltage combining, often referred to as stacking, has the advantage of increasing the impedance level (theoretically by a factor N with respect to the optimum impedance of the individual device, when N devices are stacked) and the power gain (which in linear units is also N times higher than that of the individual transistor cell). Compared to parallel combining, stacking devices has the additional advantage of compactness, because the devices can be connected to one another by minimum-size structures, or even directly with no additional passive structure, but this comes at the price of increased design complexity and more critical stability issues.

The principle scheme of an N-stage stacked amplifier is shown in Figure 1.[8] The first device at the bottom (input) of the stack is in common source (CS) configuration, whereas all the following (N − 1) stages are made up of identical transistors in a pseudo-common gate (CG) configuration, with the main RF signal path from the source to the drain terminal, but with an additional signal path from the source to the gate terminal, which is connected to a finite capacitance. This distinguishes a stacked PA from a cascode, whose gate capacitors are instead short circuits at RF, and makes it more robust to breakdown-related failures, since the gate voltage can follow the RF variations. The dc current is the same along the whole stack, while the dc drain voltage is N times the individual supply voltage of each transistor. The RF current is instead only approximately constant along the stack, due to the intentional RF conduction through the gates. However, this contribution is typically lower than the drain-source one, and the theoretical optimum load resistance of the i-th stage can be expressed as $i \times R_{opt}$, where $R_{opt}$ is the optimum load of the CS stage alone. When the operating frequency is much lower than the cut-off frequency of the devices, reactive parasitic effects are negligible and the optimum loading condition can be ensured by properly choosing the value of the gate capacitors $C_{gi}$. At high frequency, however, reactive compensation is also required, which can be implemented in different ways [9] and is generically indicated in the principle scheme of Figure 1 by the MN block between adjacent stages.

The stacked topology was initially explored in Si technology, and then exported to compound semiconductor technologies with low breakdown voltage, such as GaAs. Recently, it has become of interest in GaN as well, since the technology scaling needed to reach increasingly higher frequencies has brought about a reduction of the operating voltage and, consequently, of the optimum load impedance. Stacked-FET cells can be employed in combined, Doherty and distributed PAs.[10,11]

Stacked-FET structures at high frequency are easily affected by signal misalignment along the structure due to dispersive parasitic effects, causing significant power and gain reduction. The simulation of stacked structures is complicated, and often affected by crosstalk, which is not accurately predicted by circuit-level simulations alone. Electromagnetic (EM) simulations are mandatory, and finding a reliable set-up is not trivial. For this reason, it is convenient to design, realise and characterise the stacked cell and perform load pull on it, to check the predictive capabilities of the model and, when possible, to tune the load that will then be realised by the matching networks in the final PA.[12]

In this paper, we present the simulation and experimental characterisation of a two-stage stacked cell, which has been designed at 18.8 GHz, targeting the downlink of the Ka satellite band (17.3-20.3 GHz). Initially, the individual active device is selected and characterised through load pull, to assess its non-linear foundry model in different bias conditions. The differences observed between these measurements and simulations are accounted for in the design of the stacked cell, whose optimum loads are selected as a trade-off between the simulated and measured ones for the individual device. For the connection of the transistors, a symmetric and compact topology is selected. This topology is known to be affected by some degree of crosstalk,[13,14] which at higher frequency may completely hamper the design. Here, we compare the results from different EM simulation set-ups with measurements, and we conclude that EM simulations can be set up in a relatively simple way to accurately predict the measured performance, and that a stacked cell can be designed at this frequency with the expected performance, correctly predicting crosstalk and other parasitic effects.

As a case study, a 100 nm gate length GaN on Si process is adopted. The typical power level per MMIC achievable in GaN is around 5-10 W, thanks to a power density of around 2-4 W/mm, which allows such power levels to be reached by combining a reasonably low number of devices.[15] Given the relatively high operating voltage, FET stacking in GaN tends to be attractive for two devices, not more. Foreseeing its application to a PA whose final stage is made up of two stacked cells (either combined in parallel, or in a Doherty configuration), with a target output power of around 5 W, the individual stacked cell must output around 3 W, also accounting for the losses in the power combiner. Therefore, one of the largest peripheries for which the non-linear device model is validated, namely the 8 × 100 μm one, has been selected as the individual cell to design a two-stage stacked structure. The design is carried out at 18.8 GHz, targeting operation from 17.3 GHz to 20.3 GHz, corresponding to the downlink of the Ka satellite band. The bare device, shown in Figure 2, has been characterised in a load pull system at 18.8 GHz with the selected bias point: $V_D$ = 11 V, $I_D$ = 50 mA.
The resulting output power and efficiency contours at 18.8 GHz are compared to the simulated ones in Figure 3. The position of the measured efficiency contours is very well predicted by the non-linear device model, with only a slight discrepancy (around 3%) in the value, whereas the output power contours show an appreciable difference. The optimum load at around 1 dB gain compression is shifted towards higher impedances according to measurements, with a maximum output power of 33.6 dBm, that is, 1.1 dB lower than the value predicted by simulations (34.7 dBm). The associated efficiency is 55.3% in simulation and 57.6% in measurements. The simulated and measured optimum loads for power are (8.5 + j4.3) Ω and (12 + j6) Ω, respectively.
Figure 3(C) shows the comparison between simulated and measured performance at 18.8 GHz on the optimum load for power determined through measured load pull, (12 + j6) Ω. The saturated efficiency predicted by simulations (61.6%) is 4% higher than the measured one in this condition, while the simulated saturated power (34.4 dBm) is still higher than the measured one, though slightly sub-optimum.
Finally, in order to predict how much the optimum load moves with frequency in the bandwidth targeted for this design, load pull simulations have also been performed at the band edges (17.3 GHz and 20.3 GHz). Figure 4 shows the power and efficiency contours at the three frequencies of interest, and the optimum load trajectory is summarised in Figure 5. It emerges that the optimum load for power is almost constant with frequency over the target band, whereas the one for efficiency varies more. For this design, the stacked cell will be optimised for power, which allows a constant load to be adopted during the design phase. This value is a trade-off between the simulated and measured predictions, that is, (10 + j5.5) Ω.
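As a quick cross-check of the power figures quoted above, the following minimal Python sketch converts the reported dBm values to watts and works out the power budget implied by the text. The helper names are illustrative, and the 0.8 mm total periphery is an assumption based on reading "8 × 100 μm" as eight 100 μm gate fingers; nothing here comes from the foundry model.

```python
import math

def dbm_to_watt(p_dbm):
    """Convert a power level from dBm to watts."""
    return 10 ** (p_dbm / 10) / 1000

# Assumption: "8 x 100 um" is read as a total gate periphery of 0.8 mm.
periphery_mm = 8 * 0.1
density_w_per_mm = (2.0, 4.0)  # power density range quoted in the text
device_power_w = tuple(d * periphery_mm for d in density_w_per_mm)

# Maximum output power of the bare device around 1 dB compression.
p_meas_w = dbm_to_watt(33.6)  # measured
p_sim_w = dbm_to_watt(34.7)   # simulated

# Two stacked cells of about 3 W each for a 5 W target: the difference
# is the loss that the output combiner is allowed to introduce.
n_cells, cell_power_w, target_w = 2, 3.0, 5.0
combiner_loss_db = 10 * math.log10(n_cells * cell_power_w / target_w)

# Design load versus the midpoint of simulated and measured optimum loads.
z_sim, z_meas = complex(8.5, 4.3), complex(12, 6)
z_mid = (z_sim + z_meas) / 2
z_design = complex(10, 5.5)

print(f"device power range: {device_power_w[0]:.1f}-{device_power_w[1]:.1f} W")
print(f"measured {p_meas_w:.2f} W, simulated {p_sim_w:.2f} W")
print(f"combiner loss budget: {combiner_loss_db:.2f} dB")
print(f"midpoint {z_mid}, distance to design load {abs(z_design - z_mid):.2f} ohm")
```

Under these assumptions the measured and simulated maxima correspond to roughly 2.3 W and 3.0 W, the combiner loss budget is about 0.8 dB, and the chosen trade-off load lies within about half an ohm of the midpoint between the simulated and measured optima.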

| Topology choice
The structure adopted to connect the drain of the CS stage to the source of the CG stage at layout level is a minimum-size fork, with short and relatively thin arms, that exits from the CS drain and reaches both ends of the source pad of the CG, in a symmetrical fashion. This solution, shown in Figure 6, offers both symmetry and compactness, which are attractive features, especially for high-frequency MMIC implementations, but is also known to be affected by some degree of crosstalk between the drain-to-source connection (main signal path) and the gate-to-ground path through the gate capacitor $C_{g2}$ (refer to the schematic of Figure 1). An analogous connection topology has already been adopted in Reference 11 and in Reference 14 to assess the feasibility of FET stacking at 36 GHz in the same technology adopted here. It was proved that, at such high frequency, this fork structure with underlying gate line is affected by crosstalk so heavy that it completely hampers the stacking principle of operation. Alternative connection strategies do exist, such as the ones in References 13, 16 and 17. The former is claimed to be less affected by crosstalk, as it relies on an asymmetric and unilateral drain-to-source connection, which requires no overlapping with the gate line of the CG stage. However, this topology is less compact and introduces a usually bulkier passive network between the active devices, which brings along more significant parasitic effects to be compensated to avoid signal-phase misalignment along the stack. The latter is somewhat similar to the one adopted here, in that it also requires the drain-to-source line to pass over the gate line with air bridges, and also requires modification of the standard device layout provided by the foundry (mainly the removal of the gate pad) to accommodate straight air bridges rather than resorting to the fork structure. For these reasons, it was chosen to assess the feasibility of the fork structure around 18.8 GHz.

| Design strategy
For the chosen interconnection between the devices, shown in Figure 6, size constraints are dictated by the maximum current it has to withstand, while trying to maintain compactness. The common gate capacitor $C_{g2}$ and the stabilisation resistor $R_{g2}$ (see Figure 11 for the complete schematic) are split symmetrically at the top and bottom of the devices. They are reached from the gate by lines passing under bridges on the fork that connects the CS drain to the CG sources. The fork itself, together with the $C_{g2}$-$R_{g2}$ series connection, provides the required interstage matching and ensures that each transistor stage works on its optimum load. The optimum load for the individual transistor cell, $Z_{opt,CS}$, has been chosen based on the load pull results previously presented, leading to an overall optimum load of the stacked cell equal to (10 + j10) Ω. Note that this deviates from the value $2Z_{opt,CS}$ predicted by the simplified theory,[9] and it has been determined through optimisation. The bottom side of the gate line is then connected to the dc feed path for $V_{G2}$, containing series resistors that contribute to enforcing low-frequency stability, whereas $V_{G1}$ and $V_{DD}$ will be provided through the RF probes by means of suitable bias tees. Small-signal stability is also ensured by means of a series RC-shunt RL stabilisation network at the input of the CS stage.
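To make the deviation from the simplified theory concrete, the short Python sketch below applies the ideal scaling rule (the i-th stage loaded by i times the CS optimum) to the trade-off load selected for the bare device and compares the result with the optimised cell load. It assumes that $Z_{opt,CS}$ is the (10 + j5.5) Ω value chosen from the bare-device load pull; the function name is illustrative.

```python
def ideal_stack_loads(z_opt_cs, n_stages):
    """Ideal load seen by each stage of a stack: i * Z_opt,CS for i = 1..N."""
    return [i * z_opt_cs for i in range(1, n_stages + 1)]

# Assumption: Z_opt,CS is the trade-off load chosen for the bare device.
z_opt_cs = complex(10, 5.5)

loads = ideal_stack_loads(z_opt_cs, n_stages=2)
z_ideal = loads[-1]          # load of the whole two-stage cell, 2 * Z_opt,CS
z_design = complex(10, 10)   # value obtained through optimisation in the text
deviation = z_design - z_ideal

print(f"ideal cell load:    {z_ideal} ohm")
print(f"designed cell load: {z_design} ohm")
print(f"deviation:          {deviation} ohm")
```

In this reading, the simplified theory would call for (20 + j11) Ω, so the optimised load keeps a similar reactance but roughly halves the resistive part, which gives a sense of how much the high-frequency parasitics and the interstage network shift the optimum.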

| Simulation set-ups
A critical issue when designing stacked PAs at high frequency is the accurate simulation of the very closely spaced passive structures, to account for their mutual interactions and coupling. This requires extensive use of EM simulations, but it also raises the question of which set-up is the most truthful, in terms of port calibration as well as of how many circuit elements are grouped together. For instance, in the case of the fork interconnection adopted here, the calibration of the ports connecting to the source of the CG stage on the two sides is an issue.[14] If one wants to simulate the whole block made up of the fork, the gate line and the gate RC network together, the two ports S1_CG and S2_CG face each other, thus preventing the adoption of TML calibration, since the transmission line connected to each would intersect the other, as shown in Figure 7(A). For cases such as this, the Keysight ADS software provides a calibration type called zero-length TML (TML0), which assumes that a concentrated (zero-dimensional) element is connected to the port rather than a distributed one. In this case, given that an active device is going to be connected to S1_CG and S2_CG, it is not trivial to determine whether TML or TML0 would be most suitable. Therefore, to calibrate ports S1_CG and S2_CG, either TML0 is adopted or the passive structure must be split into several blocks to be simulated separately, in such a way that appropriate line de-embedding and TML calibration can be performed at all ports. The latter, however, has the drawback of failing to account for some degree of coupling between the different elements, which is known to be quite relevant in these architectures.
FIGURE 6 Schematic and layout of the device interconnection in the stacked cell

In this design, during the optimisation of the passive structures, two simulation set-ups have been compared (see Figure 7). The set-up of Figure 7(B) is based on the designers' experience, since it seems to provide a reasonable trade-off between accounting for coupling and allowing for TML port calibration. The adopted strategy consists in grouping as many elements together as possible, but leaving out the 90° bends close to ports S1_CG and S2_CG. This is compared to the simulation set-up in Figure 7(C), which groups all the interstage passive elements in a single block and uses TML0 calibration for all the ports connected to an active device node. The foundry provides floating-source three-terminal non-linear models for the selected transistor. Therefore, it is not necessary to resort to any source via de-embedding to extract the three-terminal model from the CS one.[18] Figure 8 compares the simulated and measured scattering parameters on 50 Ω with the two set-ups. No significant difference is visible in $S_{11}$, $S_{12}$ and $S_{21}$, while a discrepancy exists mainly in $S_{22}$ beyond 5 GHz. This mainly affects the stability of the cell, which turns out to be more critical according to the TML0 set-up than to the TML one. Since stability is known to be a critical issue for stacked PAs, it is chosen to enforce unconditional stability in the worst case, that is, to adopt the TML0 set-up.
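The sensitivity of stability to $S_{22}$ can be illustrated with the standard Rollett test for a two-port, which requires K > 1 and |Δ| < 1 for unconditional stability. The Python sketch below is only an illustration: the paper does not state which stability criterion was used, and the S-parameter values are hypothetical numbers chosen to show the trend, not data from the cell.

```python
def rollett_stability(s11, s12, s21, s22):
    """Return the Rollett factor K and |Delta| of a two-port at one frequency."""
    delta = s11 * s22 - s12 * s21
    k = (1 - abs(s11) ** 2 - abs(s22) ** 2 + abs(delta) ** 2) / (
        2 * abs(s12 * s21)
    )
    return k, abs(delta)

def is_unconditionally_stable(s11, s12, s21, s22):
    """Unconditional stability requires K > 1 and |Delta| < 1."""
    k, delta_mag = rollett_stability(s11, s12, s21, s22)
    return k > 1 and delta_mag < 1

# Hypothetical S-parameters (not measured data): the two cases differ
# only in S22, mimicking two EM set-ups that disagree on the output match.
common = dict(s11=0.3 + 0j, s12=0.05 + 0j, s21=2.0 + 0j)
case_a = dict(common, s22=0.4 + 0j)
case_b = dict(common, s22=0.9 + 0j)

for name, case in (("case A", case_a), ("case B", case_b)):
    k, delta_mag = rollett_stability(**case)
    stable = is_unconditionally_stable(**case)
    print(f"{name}: K = {k:.3f}, |Delta| = {delta_mag:.3f}, stable = {stable}")
```

With these made-up values, raising |S22| alone moves the two-port from unconditionally stable to potentially unstable, which is why designing against the more pessimistic set-up is the conservative choice.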
Load pull simulations are then performed on the stacked cell at 17.3, 18.8 and 20.3 GHz to determine the optimum load, which inevitably deviates from the theoretical one. They are shown in Figure 9, and the optimum loads versus frequency are summarised in Figure 10. It can be noticed that the optimum load of the stacked cell is more heavily frequency dependent than the optimum load of the individual transistor (see Figure 4), especially when the optimum for power is considered. Furthermore, the optimum load at 18.8 GHz determined through load pull, (9.5 + j7) Ω, is slightly different from the one initially considered during the design phase.

| MEASUREMENTS
The designed cell has been manufactured and characterised in small- and large-signal conditions. Its schematic and microscope photograph are shown in Figure 11. The scattering parameters have been measured with a Keysight E8361A PNA Network Analyser, on 50 Ω, from 500 MHz to 30 GHz in different bias conditions. Figure 12 shows the comparison between simulated and measured scattering parameters for a class AB bias point: $I_D$ = 50 mA, $V_{G2}$ = 9 V, $V_D$ = 22 V. A very good agreement can be observed up to the working frequency band in terms of $S_{11}$, $S_{21}$ and $S_{12}$, whereas the simulated $S_{22}$ correctly captures the measured low-frequency behaviour and deviates more visibly with increasing frequency.
The large-signal characterisation has been performed on a non-linear bench with load pull capabilities up to 18 GHz using an active loop, and up to 26.5 GHz with a passive tuner. As an initial assessment, a power sweep at 18.8 GHz is performed on 50 Ω, without driving the cell into strong compression to avoid damage to the devices, since this loading condition is very far from the optimum one and will therefore result in a strong phase misalignment of the current and voltage signals, as well as in a different load line slope. The adopted class AB bias point is $I_D$ = 50 mA, $V_{G2}$ = 9 V and $V_D$ = 22 V. The simulated and measured results are compared in Figure 13. The behaviour at low drive is well predicted. However, while there is good agreement in the dc drain current, the measured power and efficiency are lower than the simulated ones at medium drive power, as expected from the preliminary evaluation of the individual transistor, where the model was shown to overestimate the performance near saturation.
The present limitations of the experimental set-up do not allow active load pull to be performed on the cell at the desired frequency, thus preventing the characterisation on the optimum load predicted by the simulations. Using a passive load pull system, due to losses, the closest impedance to the expected optimum load ((9.5 + j7) Ω at 18.8 GHz, as determined through simulated load pull on the complete cell) that can be synthesised at 18.8 GHz is (22 + j9) Ω. The CW characterisation is therefore performed on this load, and compared to the simulations on the same load in three working points (class AB, B and C). They differ only in the gate bias voltage of the common source stage, whereas the gate bias voltage of the common gate stage is fixed to 9 V in all cases, because it is optimised for operation near saturation. The comparison of the simulated and measured performance is reported in Figure 14. Similarly to what was observed on 50 Ω, the dc drain current is very well captured by simulations, as are the output power, gain and efficiency at low and medium drive power. The saturated power is overestimated by simulations by around 0.8 dB in class AB and B, and correspondingly the efficiency is 8%-10% higher. This is in line with the performance of the individual device, whose non-linear model estimated a saturated power and efficiency 1 dB and 4% higher, respectively, than the measured ones on the same load (see Figure 3(C)). Furthermore, an additional factor contributing to this discrepancy to some extent might be the heating of the devices due to their very close proximity, which cannot be predicted by a standard electrical or even electromagnetic simulation. This is also suggested by the smaller difference in power and efficiency visible in class C, where the simulated and measured saturated power is basically the same and the corresponding efficiency differs by less than 5%.
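One way to see why a lossy passive tuner falls short of the simulated optimum is to compare the reflection coefficient magnitudes of the two loads. The Python sketch below assumes a 50 Ω reference impedance for the bench (the text states 50 Ω only for the scattering parameters), and the function name is illustrative.

```python
def reflection_coefficient(z_load, z_ref=50.0):
    """Reflection coefficient of z_load with respect to a real reference z_ref."""
    return (z_load - z_ref) / (z_load + z_ref)

z_optimum = complex(9.5, 7)    # simulated optimum load of the cell at 18.8 GHz
z_synthesised = complex(22, 9)  # closest load reachable with the passive tuner

gamma_optimum = abs(reflection_coefficient(z_optimum))
gamma_synthesised = abs(reflection_coefficient(z_synthesised))

print(f"|Gamma| of optimum load:     {gamma_optimum:.3f}")
print(f"|Gamma| of synthesised load: {gamma_synthesised:.3f}")
```

Under this assumption the optimum load requires |Γ| of about 0.69, whereas the synthesised load corresponds to about 0.41, so the optimum sits considerably closer to the edge of the Smith chart, where losses between the tuner and the device reference plane limit what a passive system can present.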
Finally, an a posteriori verification has been made of the predictive capabilities of the two EM set-ups initially considered. To investigate their effect on the simulated large-signal performance, the power sweeps on the loading condition used in measurements at 18.8 GHz have been compared in the two cases (see Figure 15). The achieved power level is comparable, while the efficiency is lower in the selected case (TML0) than with the TML set-up. This can be explained by observing the loading conditions of the two devices in the stack: in the two set-ups, both the common source and the common gate devices see loading impedances that ensure roughly the same power but lie on different efficiency circles (about 5% apart), as shown in Figure 16. This suggests that the discrepancy between simulated and measured performance at saturation is not significantly affected by the choice of EM set-up.

| CONCLUSION
The paper presents the design and EM simulation strategies, and the experimental validation, for a compact stacked cell working at high frequency. It is demonstrated that a successful design can be obtained around 18.8 GHz with the selected topology, based on a symmetric fork-like connection. Despite its known critical issues in terms of crosstalk and stability, correct stacking operation can be ensured provided that an appropriate EM simulation set-up is employed. The resulting manufactured prototype achieves performance reasonably close to the best achievable with the selected technology in the selected frequency range.

[Correction added on 31 March 2021, after first online publication: the article title has been corrected in this version.]

FIGURE 1 Scheme of an N-stage stacked PA

FIGURE 2 Microscope photograph of an 8 × 100 μm bare device

FIGURE 3 Comparison of simulated and measured load pull on an 8 × 100 μm bare device at 18.8 GHz, 25 °C: (A) output power and (B) efficiency contours. (C) Simulated and measured power sweeps on the measured optimum load for power, (12 + j6) Ω

FIGURE 4 Simulated load pull on an 8 × 100 μm bare device at 17.3, 18.8 and 20.3 GHz, 25 °C: (A) output power and (B) efficiency contours at 1 dB gain compression

FIGURE 5 Optimum loads for output power and efficiency versus frequency determined through simulated load pull on the 8 × 100 μm bare device at 25 °C

FIGURE 7 Unfeasible TML calibration set-up for the fork structure (A), and adopted EM simulation set-ups: (B) TML and (C) TML0

FIGURE 8 Simulated scattering parameters of the stacked cell from 100 MHz to 40 GHz adopting the two different EM simulation set-ups

FIGURE 9 Simulated load pull on the stacked cell at 17.3, 18.8 and 20.3 GHz, 25 °C: (A) output power and (B) efficiency contours at 1 dB gain compression

FIGURE 10 Simulated load pull on the stacked cell at 25 °C, optimum loads for output power and efficiency versus frequency

FIGURE 11 Scheme and microscope photograph of the parallel stacked cell

FIGURE 12 Simulated and measured scattering parameters of the stacked cell from 500 MHz to 30 GHz in class AB ($I_D$ = 50 mA, $V_{G2}$ = 9 V, $V_D$ = 22 V)

FIGURE 13 Comparison of simulated and measured CW performance on the stacked cell in class AB ($I_D$ = 50 mA) at 18.8 GHz, 25 °C on 50 Ω

FIGURE 15 Simulated power sweeps of the stacked cell terminated on (22 + j9) Ω according to the two simulation set-ups

FIGURE 16 Loading conditions of the two transistors according to the two simulation set-ups on (22 + j9) Ω compared to the simulated (A) power and (B) efficiency load pull contours at 18.8 GHz, 25 °C