Anomalous Loss Reduction Below Two-Level System Saturation in Aluminum Superconducting Resonators

Superconducting resonators are widely used in many applications such as qubit readout for quantum computing, and kinetic inductance detectors. These resonators are susceptible to numerous loss and noise mechanisms, especially the dissipation due to two-level systems (TLS) which become the dominant source of loss in the few-photon and low temperature regime. In this study, capacitively-coupled aluminum half-wavelength coplanar waveguide resonators are investigated. Surprisingly, the loss of the resonators was observed to decrease with a lowering temperature at low excitation powers and temperatures below the TLS saturation. This behavior is attributed to the reduction of the TLS resonant response bandwidth with decreasing temperature and power to below the detuning between the TLS and the resonant photon frequency in a discrete ensemble of TLS. When response bandwidths of TLS are smaller than their detunings from the resonance, the resonant response and thus the loss is reduced. At higher excitation powers, the loss follows a logarithmic power dependence, consistent with predictions from the generalized tunneling model (GTM). A model combining the discrete TLS ensemble with the GTM is proposed and matches the temperature and power dependence of the measured internal loss of the resonator with reasonable parameters.


Introduction
Two-dimensional (2D) planar high internal quality factor (Q i ) superconducting resonators have been widely fabricated and investigated in recent times for applications such as single photon detectors, [1] kinetic inductance detectors, [2] and quantum buses in quantum computing technology [3].Tremendous progress has been made in terms of design, fabrication and measurement techniques, which has led to orders of magnitude increase in coherence time and improved quantum fidelity of the quantum gates [3][4][5].In microwave measurements, although all qubits are operated at an excitation frequency well below the superconducting gap energy, microwave photons can be absorbed by quasiparticles, which in turn interact with the phonon bath, creating non-equilibrium distributions of both quasiparticles and phonons [6][7][8][9].This process affects the population of quasiparticles, in addition to pair-breaking process induced by cosmic rays, [10] higher order microwave harmonics, and stray infrared radiation [11][12][13].These nonequilibrium quasiparticles are one limiting factor on superconducting resonator Q i and qubit coherence, which can reduce both the qubit relaxation time (T Qubit

1
) and the coherence time (T Qubit 2 ) [14].Another comparable loss mechanism due to two-level systems (TLS) is also ubiquitous in 2D superconducting resonators [15][16][17][18][19][20][21][22][23][24][25][26][27][28][29][30].Despite the elusive microscopic origin of the TLS (some recent works suggesting hydrogen impurities in alumina as one candidate for TLS [31,32]), TLS can be simply modeled as electric dipoles that couple to the microwave electric field.In general, TLS are abundant in amorphous solids and can also exist in the local defects of crystalline materials.They are found in three kinds of interfaces in the superconducting resonators: the metal-vacuum interface due to surface oxide or contaminants; the metal-dielectric substrate interface due to residual resist chemicals and buried adsorbates; and the dielectric substrate-vacuum interface with hydroxide dangling bonds, processing residuals, and adsorbates [33].To address these issues, different kinds of geometry of coplanar waveguide (CPW) structure have been proposed and fabricated, with more care given to the surface treatment to alleviate the TLS losses [34].For example, a trenched structure in the CPW helps to mitigate the metal-dielectric TLS interaction with the resonator fields [35,36].These efforts have improved the 2D resonator intrinsic quality factor to more than 1 million in recent realizations of high-Q i resonators [35][36][37][38][39]. Nevertheless, TLS still exist even in extremely high Q i 3D superconducting radio frequency cavities used in particle accelerator applications [40].Recently, other sources of TLS loss have been proposed based on quasiparticles trapped near the surface of a superconductor [41].Clearly TLS loss is a universal issue in superconducting resonators.However, at microwave frequencies, this loss was long thought to be constant under low microwave power and low temperature below TLS saturation [16-18, 28, 42, 43].Measurements in this regime were limited due to the constraints of noise levels in both electronic equipment and the thermal environment.Therefore, experimental investigation of TLS at low temperatures and microwave excitation are important, and would assist the superconducting quantum information community to understand its effect on operating quantum devices.
We have designed a 2D half wavelength resonator with a tapering geometry that gradually shrinks the signal line width w from 50 µm down to 1 µm at the center where many three-junction flux qubits could be hosted and strongly coupled for the study of the collective behavior of quantum meta-materials.Analogous to cavity quantum electrodynamics, qubits serve as artificial meta-atoms with mutual coupling [44][45][46][47][48] and can be read out through the dispersive frequency shift of the cavity [49][50][51].Theoretical publications discussing the physics of qubit arrays coupled to the harmonic cavities predict a number of novel collective behaviors of these meta-atoms [52][53][54].In this paper, we report our finding on the TLS loss in the low power and low temperature limit of this particular design of capacitively-coupled half-wavelength resonator, without the qubits.The technique of very low power microwave measurement with low noise to enhance the signal-to-noise ratio (SNR) is critical for measuring this TLS behavior.

Experimental Methods
Aluminum (Al) half-wavelength (λ/2) CPW resonators on sapphire substrates were designed with a center line width w = 50 µm and spacing s = 30 µm (the distance between center conductor line and ground plane as illustrated in Fig. 1(b) to maintain the characteristic impedance near 50 Ω in the meander part.At the center of the resonator a tapering structure narrows the center line width down to w = 1 µm and spacing to s = 12 µm, which gradually increases the characteristic impedance to 100 Ω

Experimental Data
The measured transmitted signal (S 21 (f )) has a fundamental (λ/2) resonance peak around f = 3.644 GHz at the fridge base temperature when sweeping the frequency, f .The complex S 21 (f ) signal is fitted to an equivalent circuit model of a two-port resonator capacitively coupled to external microwave excitation [9,55].
where |S 21,in | and |S 21,out | are the net loss or gain in the transmission of the input and output line, respectively.Q L is the loaded quality factor.Q c is the coupling quality factor representing the dissipation to the external circuit, i = √ −1, f 0 is the resonance frequency of the half-wavelength (λ/2) CPW resonator, ϕ is the phase and C 0 is an offset in the complex S 21 plane due to background contributions.[55] The internal quality factor, Q i , inversely proportional to the the internal loss, δ = Q −1 i , is extracted from the identity 1/Q L ≡ 1/Q i + 1/Q c .The absorbed power P ab of the resonator is characterized by the average number of circulating microwave photons in the cavity on resonance, which can be estimated using the approximation [9,56] for a two-port device, where ℏ is the reduced Planck constant, and ω 0 = 2πf 0 is the angular frequency of the resonance.i ) at its first harmonic frequency of a Aluminum co-planar waveguide resonator on sapphire substrate measured at different circulating photon numbers ⟨n⟩.Some of the error bars are smaller than the data point such as those for the high power and temperature measurements.
Figure 2(a) illustrates the temperature dependence of the fractional resonant frequency shift from the resonance frequency at lowest temperature, (f 0 (T ) − f 0 (6mK))/f 0 (6mK), for different circulating microwave photon numbers inside the CPW resonator, where 6 mK is the measured fridge base temperature.The resonance frequencies start at their maxima at the fridge base temperature and then show local minima around 60 mK.This phenomenon seems to be independent of the average circulating photon number and can be explained by the standard tunneling model of TLS [43].Upon further increasing the temperature above 150 mK, the resonance frequencies quickly decrease due to the thermal quasiparticles, which increases the real and imaginary parts of the surface impedance of the superconducting resonator.The inset focuses on the low temperature regime and shows a very small power dependence that is qualitatively similar to the strong field correction to the frequency shift in STM proposed by Gao, which predicts smaller frequency shifts for higher power [57].
The temperature dependence of the measured internal loss is shown in Fig. 2(b).For high power measurements (⟨n⟩ > 10 6 ), the loss is constant at low temperatures (below 150 mK) which is expected for the typical non-interacting TLS.At higher temperatures, the loss increases due to thermal quasiparticles.For low power measurements (⟨n⟩ < 10 6 ), starting from the minimum temperature, the loss has an unusual increase at low temperatures, from the base temperature to a peak at 40 mK.The loss then drops with increasing temperature following the equilibrium value of the population difference in TLS [20,58].Similar to the high power measurements, the loss rises again above 150 mK due to thermal quasiparticles.The observed loss decrease with decreasing temperature from 40 mK to 10 mK has not been explicitly acknowledged and discussed in prior work of microwave superconducting resonators until recently [59].Indications of an upturn in Q i (T ) has been attributed to poor SNR and therefore treated as not statistically significant [58,60].

Frequency shifts
The power and temperature dependent frequency shifts have primarily been attributed to TLS and the dynamics of quasiparticles.These two mechanisms could overlap and become difficult to distinguish in the operation of many superconducting devices, including resonators and qubits, [14].A simple model that combines both quasiparticles and TLS contribution in one equation describes the resonance frequency ∆f data in Fig. 2, [25,34,61] where ζ = hf 0 2k B T , f 0 is the resonance frequency as a function of the temperature, δ 0 is the zero temperature and zero power loss tangent from the TLS, Ψ(•) is the digamma function, α = L kinetic /L total is the kinetic inductance fraction of the CPW resonator, N 0 is the single spin density of states, ∆ S0 is the aluminum superconducting gap at zero temperature, and I 0 (•) is the 0th order modified Bessel function of the first kind.The first term in Eq. 2 represents the frequency shift caused by the TLS mechanism [20,23] and the second term is the frequency shift due to quasiparticles using the Bardeen-Cooper-Schrieffer (BCS) model for k B T, hf 0 ≪ ∆ S0 , and written explicitly in terms of quasiparticle number density n qp [61], including both thermal and non-equilibrium quasiparticles.However, the model with only thermal quasiparticle (valid for T ≪ T c ) seems to match the measurement sufficiently well, where N 0 = 10 47 J −1 m −3 ≈ 1.74 × 10 4 µeV −1 µm −3 is the single spin density of states at the Fermi level [7,13].
The fit to the frequency shift data is shown in Fig. 3, and the extracted fitting parameters indicate that the aluminum superconducting gap at zero temperature is ∆ S0 ∼170 µeV , a value close to the BCS gap approximation which is 1.76k B T c with transition temperature T c = 1.12 K.The values of the other fitting parameters are α ≈ 0.014, and δ 0 = 9.6 × 10 −6 .The values of α and δ 0 are consistent with other results on a variety of similar superconducting resonators [23,27,34,62].

Internal loss
Since the temperature dependent internal loss is dominated by the well-known thermal quasiparticles above 150 mK, this analysis focuses only on the low temperature data.The power dependence of the loss Q −1 i (T ) is shown in Fig. 4 at different temperatures below the onset of thermal quasiparticle effects.Clearly, the loss shows a gradual power dependence above the low-power saturation, similar to previous experimental observations [27,37,63,64], and is not consistent with STM shown as the dashed curves.
To account for the slower power dependence, many improvements on the STM have been proposed, such as introducing more than one species of TLS in the dielectrics, [65][66][67][68][69] and accounting for the nonuniform field distribution in the resonator [24].In addition, there is another approach that generalizes the STM to include a random telegraph noise on the TLS energy level due to strong interactions between a few TLS [70][71][72], resulting in the generalized tunneling model (GTM) that can produce the logarithmic power dependence shown as the black dotted line in Fig. 4.However, none of the existing models predicts a strong temperature dependence of loss below the TLS saturation.To interpret this unusual loss reduction in our aluminum resonators at low power and low temperature, we go beyond the assumption of a uniform distribution of TLS and invoke the discrete TLS contribution to the loss at low temperatures.A simple modification that sums over the discrete and detuned TLS responses near the resonance as in Eq. ( 14) is proposed.When combined with GTM, this model reproduces the full power and temperature dependence of the loss data: the gradual power dependence at high power as well as the observed anomalous temperature dependence of loss for ⟨n⟩ < 10 2 and T < 50 mK.It should be emphasized that the discrete TLS assumption is independent of GTM.Attempts to apply the discrete and detuned TLS formalism to the modified versions of STM are summarized in Supplementary Material Section S4.To lay the foundations of the proposed model, the following sections introduce key concepts of STM and GTM and derive several expressions used in the final model.

Conventional model for TLS loss
The TLS formalism is based on a simple model for a single TLS that can be described by the Hamilto- where ∆ is the asymmetry of the double well potential and ∆ 0 is the tunneling barrier energy between the potential wells [20].A typical resonator hosts an ensemble of TLS with different values of ∆ and ∆ 0 with their (assumed continuous) distribution function given as P (∆, ∆ 0 ) = P 0 /∆ 0 , where P 0 ≈ 10 44 J −1 m −3 is the density of states for TLS.The distribution function is uniform in ∆ in the conventional TLS model, but could take on a very weak dependence ∝ ∆ µ with µ ∼ 0.3 for a system of very strongly interacting TLS, [72][73][74] such as the case assumed in GTM.For simplicity and generality, the following model uses the conventional distribution function, which is constant in ∆.The fit with non-zero µ can be found in the Supplementary Material Section.S4.3.The dynamics of a single TLS can be described by the linearized Bloch equations of the pseudospin ⃗ S(t) (see Supplementary Material Section S2 for details) that is characterized by the following four rates: Rabi frequency Ω ∝ | ⃗ E|, the frequency of the driving field ω, the splitting between the two eigenenergies of TLS ε = ∆ 2 + ∆ 2 0 , and the longitudinal and transverse relaxation rates of the TLS Γ 1,2 which are defined as  as a function of power (measured by photon number ⟨n⟩ on lower axis, and Rabi frequency Ω on upper axis) at different temperatures for an Aluminum resonator on a sapphire substrate.The scatter plots are experimental data points, and the dashed lines are the fitting curves from the STM given in Eq. (8).There is a large deviation from STM power dependence at high power above TLS saturation power.The power dependence is more gradual than the STM prediction, and the loss has very weak temperature dependence, which resemble the logarithmic power dependence predicted by GTM.The black dotted line is the power dependence at high excitation power from GTM.A constant background loss is assumed for all the fits.
Eq.( 3) describes the longitudinal relaxation rate dominated by the phonon process where γ L and γ T are the longitudinal and transverse deformation potentials, respectively, v L and v T are the longitudinal and transverse sound velocities, ρ is the mass density, and Γ max 1 is the maximum Γ 1 for the TLS with energy splitting ε, when ∆ 0 = ε.Eq.( 4) defines the transverse relaxation rate where Γ ds is the dephasing rate of the resonant TLS energy level ε, caused by its interactions with thermally activated TLS whose ε ≲ k B T , and is valid for low temperature measurement (T < 1 K) [75].We note that µ = 0 for the conventional TLS model used here, and Γ ds ∼ 10 6 Hz dominates over Γ 1 ∼ 10 3 Hz in the typical cryogenic measurement of amorphous dielectrics [72].Therefore, Γ 2 is often approximated as Γ ds and is proportional to T .
In STM, the resonant dielectric response of a single TLS is expressed as [15,17,71]: where m = tanh(ε/(2k B T ))/2 is the equilibrium value of ⟨S 0 z ⟩.The single TLS loss corresponds to the imaginary part of the response function in Eq.( 5) which is in the form of a Lorentzian in ε/ℏ centered at ω with a width: where For a typical TLS with ε/h ≈ 5 GHz and at reasonably low temperatures and powers, the width of its response w ≈ Γ 2 ∼ 1 MHz ≪ ω.Due to this sharp Lorentzian response function, the total loss is dominated by the resonant TLS whose energies ε ∼ ℏω.
The total dielectric loss is simply the integral of the single TLS contribution Eq.( 5) over the distribution of the TLS [17,20,57].
where ϵ r ϵ 0 is the permittivity of the host dielectric material, P (ε, ∆ 0 ) is the distribution function of coherent TLS, obtained from P (∆, ∆ 0 ) with a change of variable from ∆ to ε, d 0 is the maximum transition electric dipole moment of the TLS with energy splitting ε, ⃗ E is the applied microwave electric field on the TLS dipole, and θ is the angle between the applied electric field and the TLS dipole moment.
Evaluating this integral leads to the famous STM prediction of TLS loss [43] where Ω c ∝ Γ max 1 Γ 2 is the critical Rabi frequency that characterizes the saturation of TLS.The loss is expected to have an inverse square root dependence on power after the TLS saturation, δ TLS ∼ Ω −1 ∝ P −0.5 ab for Ω ≫ Ω c , which is much faster than observed in the data in Fig. 4. In fact, if one fits the data with a general power law [27] where the square root in the denominator of Eq.( 8) is replaced by a fitting parameter, the resulting exponent is around -0.15, indeed a slower power dependence than predicted in STM.

Effect of fluctuators on TLS loss
The dephasing rate Γ ds introduced in Eq.( 4) describes the spectral diffusion resulting from an average of weak interactions among TLS [20,72], which cannot incorporate stochastic and discrete strong interactions following a Poisson process, such as those from fluctuators [70][71][72][76][77][78].Fluctuators can be modeled as incoherent TLS whose Γ ph 2 ≥ ε, as opposed to the coherent TLS introduced above in STM [72].If strongly coupled with the coherent resonant TLS, the fluctuators can move the latter in and out of resonance with a jump rate γ and effectively create a random telegraphic noise on the energy level ε → ε + ξ(t).The fluctuators can be modeled as following a thermally activated tunneling process with rate γ , where E a is the activation energy.For a uniform distribution of E a ∈ [E a,min , E a,max ], the distribution of the fluctuator rates is thus P (γ) ∼ 1/γ in an exponentially wide range [γ min , γ max ], with γ min = γ| Ea=Ea,max ∼ constant in T and [71,72].The random telegraphic noise with a slow jump rate γ happens infrequently during the measurement time, and thus cannot be averaged over to contribute to the spectral diffusion as in Eq. ( 4).The exact solution to the Bloch equation will depend on the relationship between γ, Ω, Γ max 1 , and Γ 2 .Γ max 1 is abbreviated to Γ 1 for clarity in the following discussion, which mainly focuses on the interaction between fluctuators and one resonant TLS.Thus, the distribution of values of Γ 1 for an ensemble of TLS is not invoked until the last step of integration to calculate the loss, and is not relevant to the fluctuatorsinduced effect.When the jump rate γ is slow compared to the dynamics of the resonant TLS characterized by the rates Ω, Γ 1 , Γ 2 , the stationary solution similar in form to Eq.( 5) can still be used with the substitution ε → ε + ξ.After averaging over the distribution of the fluctuator jumps ξ, the response of a single TLS weakly coupled to low-γ fluctuators is obtained (See Supplementary Material Section S3.1): which has the same form as Eq.( 5) but with the width of the Lorentzian widened by Γ f ∝ T ≲ Γ 2 due to the weakly-coupled low-γ fluctuators [72].For a continuous distribution of TLS such as P (ε, ∆ 0 ), the total internal loss is calculated by integrating Eq.( 9) over the distribution function P (ε, ∆ 0 ) and P (γ) in the range γ ∈ [γ min , Γ 1 ] .
Clearly, the last fraction in the integral is a Lorentzian which evaluates to a constant after integration over ε, resulting in the same prediction for internal loss as the STM [72].
On the other hand, when the dynamics of the resonant TLS is dominated by a fast jump rate, γ ≳ Ω, Γ 1 , Γ 2 , a probabilistic description of the resonant TLS must be adopted.Instead of directly solving the linearized Bloch equations Eq.(S2,S3), the master equation or the evolution equation of the probability distribution ρ( ⃗ S) of the Bloch vector ⃗ S = (⟨S where d ⃗ S/dt is given in Eq.(S2, S3) with a time independent ε where the random jumps ξ(t) are dropped since the fast jumps are averaged out over a long time.(See Supplementary Material Section S3.2).The TLS loss is then extracted by solving for the average y component of ⃗ S, ⟨S 1 y ⟩ = ρ⟨S 1 y ⟩d ⃗ S, and integrating the solution over the probability distribution of fast fluctuator jump rates P (γ) ∝ 1/γ where γ ∈ [max(Ω, Γ 2 ), γ max ], and the distribution of resonant TLS energies P (ε, ∆ 0 ).The loss has a logarithmic dependence on power : This expression explains the high power limit of the data in Fig. 4 where the losses from different temperatures converge to a linear trend in the linear-log plot.This high γ fluctuator loss will saturate to a constant value ∼ m ln(γ max /Γ 2 ) once Ω ≲ Γ 2 (See Supplementary Material Section S3.2).Thus, it will not affect the low power behavior of the TLS loss.More complicated is the case of intermediate jump rates where γ ∼ Ω, Γ 1 , Γ 2 .A similar master equation as in Eq.( 11) needs to be solved with ε → ε + ξ k for each different fluctuator state k.The jumps are no longer ignored since their rates are close to the other dynamics (Ω, Γ 2 , Γ 1 ) in the system.After obtaining the average solution ⟨S 1 y ⟩ for the TLS with energy levels ε k , the same recipe for the loss calculation can be applied, namely, integrating the average solution over P (γ) and then integrating over the distribution of the coherent TLS P (ε, ∆ 0 ) .The loss is then ( See Supplementary Material Section S3.3) where and n is the probability that a given TLS is resonant, and is typically small for a system of many (∼ 10) fluctuators (see supplementary material for [71]), and thus ignored in the final model.γ h,l are the upper and lower bounds of the jump rates and are defined such that These limits translate to a range for power Ω where the power dependence of the loss is dominated by this model: 2 )/(2Ω 2 )], a faster logarithmic power dependence than Eq. ( 12).At higher powers, the loss becomes constant ∼ m ln(2).At lower power, the loss saturates to another constant m ln(Γ 2 /Γ 1 ).In summary, the three different fluctuation rates correspond to three different power ranges for the power dependence of the loss.In the high power limit Ω ≳ Γ 2 , the effect of fluctuators that induce large γ dominates and leads to a logarithmic power dependence; in the intermediate power regime Γ 2 ≳ Ω ≳ √ Γ 1 Γ 2 , the fluctuators with intermediate γ give rise to a faster logarithmic power dependence, but meanwhile the saturation of TLS just as in STM has a comparable or even stronger power dependence and overlap in the same power regime; and finally in the low power limit Ω < √ Γ 1 Γ 2 , the typical TLS saturation in STM is recovered as the contributions from all three different types of fluctuators become constant in power.The above description qualitatively matches our experimental observation in Fig. 4.

Fit to the internal loss measurements
Although the power dependence of our data as in Fig. 4 agrees with the effect of fluctuators in the GTM, the original model does not reproduce the observed temperature dependence.The GTM predicts the same temperature dependence of the TLS loss in the low power limit as in STM [72] shown as the orange dashed curve in Fig. 5 (b), which clearly deviates from the extracted low power loss of TLS.To reconcile this difference, we propose a simple modification to the TLS model to account for the discrete coherent TLS near the resonance.Consider the discrete form of the integral in the TLS loss for low γ fluctuators, Eq. ( 10), where the index n denotes the coherent TLS near the resonance and ∆ε is the average energy spacing in the TLS spectrum.We believe that Eq.( 14) is justified since the number of coherent TLS inside the resonator bandwidth is ∼ 1 for a TLS volume around 100µm 3 [79], and many previous works have observed individual TLS in microwave resonators [80][81][82][83].For the TLS exactly on resonance, ε = ℏω, its loss δ TLS ∼ Γ −1 2 ∝ T −1 at low power, and is the classic result for the single TLS model in STM [84].However, this stands in clear contrast to the observed reduction in loss at low temperature in Fig. 2 (b) and Fig. 5.
It is thus required that the TLS is not always on resonance (ν = ε 0 /ℏ − ω ̸ = 0 where ε 0 stands for the energy level of the coherent TLS closest to resonance), a reasonable assumption given the sparse TLS distribution in the frequency spectrum for a small volume of TLS-inhabiting dielectrics.Mathematically, the width of the Lorentzian in the summation w = Γ 2 √ 1 + κ + Γ f dictates the transition from the low temperature reduced loss to the high temperature equilibrium result.For small w, a discrete sum will deviate from the integral since the Lorentzian is under-sampled.While for a Lorentzian with large w, a discrete sum with the same sampling rate will approximate the integral better.Specifically, at low powers (κ ≪ 1), w = Γ 2 + Γ f increases with the temperature and w = Γ 2 + Γ f ∼ ν marks the transition temperature between the two regimes.For low temperatures (w ≪ ν), the Lorentzian term becomes roughly proportional to w = Γ 2 + Γ f which gives the almost linear temperature dependence of loss.For higher power, w increases with κ, which pushes the transition temperature lower and suppresses the low temperature reduction in loss.And eventually at high powers (κ ≫ 1) such that w > ν for all temperatures, the equilibrium temperature dependence m = (1/2) tanh (ℏω/(2k B T )) in STM is recovered in the entire temperature range.The same discrete summation can be applied to Eq. ( 13) for intermediate γ fluctuators.On the other hand, Eq. ( 12) for high γ fluctuators is only modified with the substitution Γ 2 → Γ 2 2 + ν 2 due to the sparse TLS assumption (See Supplementary Material Section S3.2).The final model that combines all three contributions is able to reproduce the full temperature (T = 8 − 110 mK) and power (⟨n⟩ = 10 −1 − 10 8 ) dependence of the loss shown as the solid curves in Fig. 5 (a).
The fit shows reasonable agreement with the data, with root mean squared error RMSE = 0.0124.There are in total 10 fitting parameters, fewer degrees of freedom compared to fitting the data from different temperatures individually.The different contributions to the loss below TLS saturation power are plotted in Fig. 5 (b) illustrating that the discrete TLS coupled to low and intermediate γ fluctuators are responsible for the loss reduction.The different rates in the model determined from the fit are summarized in Fig. 5 (c).The numerical values for Γ 1,2 and γ max,min are typical for TLS in amorphous materials [72].The rates also satisfy the following assumptions in the model: Γ 2 ≳ Γ f , and γ max ≫ Γ 2 .In addition, the low temperature loss reduction occurs around 40 mK as expected, when Γ 2 + Γ f < ν, the width of the response is smaller than the detuning between TLS and the resonance.The other quantities extracted from the fit are listed below: the volume of TLS-inhabiting dielectrics, 10 µm 3 , the intrinsic TLS loss, δ T LS 0 = 3.85 × 10 −6 , the other loss, δ other = 1.29 × 10 −5 , and the minimum fluctuator rate γ min = 4.5 × 10 −2 Hz.

Discussion
The discrete and detuned TLS formalism will not affect the high γ-fluctuator contribution to internal loss, since the width of Lorentzian in the calculation of loss of high γ fluctuators w is widened by γ such that w ∼ γ > ∆ε/ℏ (See Supplementary Material Section S3.2), which is indicated by the almost flat region in the green dotted curve at low temperature in Fig. 5 (b).However, the loss from intermediate γ fluctuators could be subject to the low coherent TLS density but to a lesser degree than that from the low γ fluctuators, since although the bandwidth of their response ∼ Γ 2 is the same (See Supplementary Material Section S3.3), there are many intermediate-γ-fluctuator-induced sublevels for one TLS in one Rabi cycle which effectively increases the density of available TLS energy levels.In order to avoid over fitting, this effect was not included in the model where the same density of states for TLS are assumed for those coupled to intermediate γ fluctuators and the low γ fluctuators.Thus, the same ∆ε value is shared for the two different contributions.This simplification could lead to an underestimation of the loss in the intermediate power region, as illustrated by the deviation between the fit and data from ⟨n⟩ = 10 2 to 10 6 .
The discrete TLS formalism only approximates the effect of a sparse TLS spectral density where despite the spectral diffusion with a width Γ 2 , and the random telegraph noise characterized by the rate γ, the coherent TLS spends most of its time detuned from the resonance.The assumptions of even energy spacing between TLS, ∆ε, and constant energy levels, are convenient for numerical evaluation of the model, but are not necessary to reproduce the loss reduction at low temperature.Two other estimations of the probability of the TLS being on resonance, as well as the number of strongly coupled fluctuators that can bring a detuned TLS into resonance, are given in Supplementary Material Section S3.1.Both calculations show that for any TLS with a spectral width Γ 2 and a detuning to the resonance ν, the TLS becomes less likely to be on resonance once Γ 2 (T ) < ν with decreasing temperature, qualitatively agreeing with the experimentally observed loss reduction at low temperature.
The treatment above is largely classical where the TLS are treated as dipoles under classical field.A quantum mechanical approach that studies the Jaynes-Cummings model of a single TLS strongly coupled to a photon predicts a linear temperature dependence of the loss similar to our observation [84].However, it should be noted that the photon frequency in our measurement (3.64 GHz) corresponds to weak photon-TLS coupling, since the Rabi frequency from the effective field of a single photon is much weaker than the relaxation rates Γ 1,2 .Additionally, the loss from strongly coupled TLS is predicted to show saturation in power at ⟨n⟩ ∼ 1, clearly lower than the observed saturation in the data at ⟨n⟩ ∼ 10 which corresponds to the weak coupling regime and reproduces the classical result [84].
Although the fluctuators significantly affect the TLS internal loss, they should have limited effect on the frequency shifts [72].The proposed discrete and detuned TLS formalism would not modify the STM frequency shift prediction either, because unlike the Lorentzian response function that governs the internal loss, the response function for frequency shift does not have a resonant shape and is not sensitive to the reduced sampling from the discrete TLS.
Ever since the importance of TLS interactions in amorphous solids was recognized by Yu and Leggett [85], there have been numerous experimental works demonstrating evidence of TLS interactions, [70,76,86] and theoretical works treating the interacting TLS beyond STM [87][88][89], with a recent example by Burin and Maksymov where they used a similar Master equation formalism [28].However, the fluctuations in the energy levels are averaged over to form the spectral diffusion, unlike the fluctuators introduced by Faoro and Ioffe [71,72], and the loss is predicted to have a power dependence faster than STM by a logarithmic factor, contrary to our observation.
At higher temperatures (above 150 mK), the quasiparticle effects become important, which corresponds to the upturn in loss in Fig. 2. The quasiparticles loss is related to its density n qp as, where where n noneq is the non-equilibrium quasiparticle density.Similar to the fit for frequency shift, the model with only n th matches our data with the same set of fitting parameters ∆ 0 = 170 µeV and α = 0.014.A calculation of the increased quasiparticle density including both thermal and non-equilibrium quasiparticles at high photon numbers in the half wavelength resonator based on Mattis-Bardeen equations [90,91] can be found in Supplementary Material Section S5.However, the results lack any strong temperature or power dependence below 100 mK.Note that this calculation includes the dynamics of the non-equilibrium quasiparticle finite lifetime due to recombination and trapping, with and without photon illumination [92][93][94][95].

Conclusion
We have designed and fabricated capacitively-coupled half wavelength superconducting aluminum microwave resonators with minimum critical dimension of 1 µm in the center conducting line of the CPW.The temperature and power dependence of the resonator Q i deviate from the classical standard tunneling model results.At high applied powers, the internal loss shows logarithmic power dependence, a signature of the generalized tunneling model with fluctuators.At powers below TLS saturation, the internal loss decreases from 50 mK down to the fridge base temperature.We attribute this behavior to the detuning between TLS and the resonance frequency in a discrete TLS ensemble.Upon cooling, the single TLS response bandwidth, proportional to Γ 2 ∝ T 1.3 , decreases.When the bandwidth drops below the detuning between TLS and the resonance frequency defined by the CPW resonator, the resonant TLS response decreases and contributes less to the internal loss.The generalized tunneling model is revisited and modified with the discrete TLS formalism resulting in a comprehensive fit to the measured loss in the entire low temperature and low power range, with a reasonable set of parameters.

Supplemental Material for Anomalous Loss Reduction Below Two-Level System Saturation in Aluminum Superconducting Resonators
The Supplemental Material discusses the experimental details and measurement setup, reviews the standard and generalized tunneling models of two-level systems, reviews the master equation formalism for the fluctuators, presents an overview of the nonequilibrium quasiparticle and phonon model used to describe the nonequilibrium quasiparticle loss in the resonator, and includes the finite element simulation of the resonator to determine the electric field coupled to the TLS dipoles.Fig. S1: Schematic of the microwave measurement setup for the study of the aluminum λ/2 resonator.The VNA at room temperature sends a signal from port 1 to the cryostat.The signal is attenuated at each stage of the cryostat before passing through the low-pass filter (LPF) and entering the device under test (DUT).The DUT is surrounded by a Cryoperm magnetic shield.The output signal also passes through a low-pass filter before going through 0-dB attenuators that thermalize the coaxial cable center conductor.The signal passes through an isolator and is amplified at the 4 K stage and at room temperature, before entering the VNA in port 2.

S2 Conventional model for TLS loss
The TLS formalism is based on a simple model for a single TLS that can be described by the Hamiltonian, H TLS = 1 2 −∆ ∆ 0 ∆ 0 ∆ , where ∆ is the asymmetry of the double well potential and ∆ 0 is the tunneling barrier energy between the potential wells [9,20].Here ε = ∆ 2 + ∆ 2 0 is the energy splitting of the two eigenvalues of H TLS .A typical resonator hosts an ensemble of TLS with different values of ∆ and ∆ 0 .In the standard tunneling model (STM), a uniform distribution of the ensemble in ∆ and log uniform distribution in ∆ 0 is assumed [20].Whereas in the generalized tunneling model (GTM), the distribution takes on an additional weak dependence of ∆ due to a logarithmic correction to the density of states for a system of very strongly interacting TLS [72][73][74].The distribution function can be mathematically expressed as where P 0 ≈ 10 44 J −1 m −3 is the density of states for TLS, ∆ max ∼ ε max ∼ 100K [72], and the exponent µ approximately characterizes the logarithmic correction to the density of states of TLS due to strong TLS interactions [73].
The dynamics of a single TLS can be described by the linearized Bloch equations of the pseudospin ⃗ S(t) = ⃗ S 0 (t) + ⃗ S 1 (t), where ⃗ S 0 is the solution to the homogeneous system without the external field, and ⃗ S 1 is the linear solution with frequency ω of the driving field.Under the rotating wave approximation, the Bloch equations become: [20,57,71] with the Rabi frequency Ω characterizing the absorbed power, Ω ∝ √ P ab , and is defined as where S + = S 1 x + iS 1 y , Γ 1 and Γ 2 are the two phenomenological rates that describe the longitudinal (S z ) and transverse (S x,y ) relaxations, and m = tanh(ε/(2k B T ))/2 is the equilibrium value of the ⟨S 0 z ⟩.In STM, the dielectric response of a single TLS can be obtained from the stationary solution to ⟨S + ⟩ [15,17,71]: The single TLS loss corresponds to the imaginary part of the response function in Eq.(S5) which is in the form of a Lorentzian in ε/ℏ centered at ω with a width: The total dielectric loss is simply the integral of the single TLS contribution Eq.(S5) over the distribution of the TLS Eq.(S1) [17,20,57].
where ϵ r ϵ 0 is the permittivity of the host dielectric material, P (ε, ∆ 0 ) is obtained from Eq.(S1) with a change of variable from ∆ to ε, d 0 is the maximum transition electric dipole moment of the TLS with energy splitting ε, ⃗ E is the applied microwave electric field on the TLS dipole, and θ is the angle between the applied electric field and the TLS dipole moment.
For a typical TLS with ε/h ≈ 5 GHz and at reasonably low temperatures and powers, the width of its response w ≈ Γ 2 ∼ 1 MHz ≪ ω.Due to this sharp Lorentzian response function, the total loss is dominated by the resonant TLS whose energies ε ∼ ℏω.Before analytically evaluating the integral in Eq.(S7), the expressions for Γ 1,2 need to be introduced.
The longitudinal relaxation rate (Γ 1 ) of a single TLS is dominated by the phonon process: [18,42,43,57] where γ L and γ T are the longitudinal and transverse deformation potentials, respectively, v L and v T are the longitudinal and transverse sound velocities, ρ is the mass density, and Γ max 1 is the maximum Γ 1 for the TLS with energy splitting ε, when ∆ 0 = ε.
The transverse relaxation rate (Γ 2 ) is defined as Γ ds is the dephasing rate of the resonant TLS energy level ε, caused by its interactions with thermally activated TLS whose ε ≲ k B T , and is valid for low temperature measurement (T < 1 K) [75].
After substituting the expressions for Γ 1 , Γ 2 and Ω from Eqs. (S8, S9, S4) into the integral for the loss Eq.(S7) and using µ = 0, the famous STM prediction of TLS loss is obtained, [43] where Ω c ∝ Γ max 1 Γ 2 is the critical Rabi frequency that characterizes the saturation of TLS.The loss is expected to have an inverse square root dependence on power after the TLS saturation, δ TLS ∼ Ω ∝ P −0.5 ab for Ω ≫ Ω c , which is much faster than observed in the data in Fig. 4.This leads us to look at the generalized tunneling model with the inclusion of fluctuators.

S3 Master equation formalism for fluctuators
The theoretical work on master equation of the fluctuator coupling to TLS is based on [71,72].The following discussion summarized their findings in a coherent manner and derives several key results based on the model.Much like the STM, the dynamics of a single TLS is described in the similar Bloch equations with an additional time dependent fluctuation of the TLS energy level ξ(t) due to the fluctuators' interaction with the coherent TLS [20,57,71]: The solution to the Bloch equations depends on the relation between the transition rates induced by fluctuators γ, whose distribution follows P (γ) ∼ 1/γ in an exponentially large range [γ min , γ max ], and the other inherent dynamics: Ω, Γ 1,2 , ω − ε/ℏ.To systematically present the predictions from the different solutions to the mater equations under different limits, this section is structured according to the three regimes of the rate γ, namely high γ > Ω, Γ 2 , intermediate γ ≈ Ω or Γ 2 , and low γ < Γ 1 .

S3.1 Low γ fluctuators
For low γ fluctuators (γ min < γ < Γ 1 ), the single TLS response is given by the stationary solution to Eq.(S2,S3), since the TLS dynamics is much faster than the jump rates γ.We can further separate the analysis for two different types: the common weakly-coupled fluctuators which shift the energy level of TLS by a small energy ξ and result in a widening of the spectral width of the TLS, and the rare strongly-coupled fluctuators whose ξ is large enough and produce large stochastic jumps on the TLS energy level ε.The weakly-coupled fluctuators induce detunings that follow a Lorentzian distribution with width Γ f ∝ (k B T /ℏ)(k B T /ε max ) µ ≲ Γ 2 similar to the spectral diffusion [72]: The single TLS imaginary response under a single weak fluctuator can be obtained from Eq.( S5) by replacing ε with ε + ξ(t) [72]: where κ = Ω 2 /(Γ 1 Γ 2 ).Integrating Eq.(S12) over the distribution of detuning P (ξ(t)) and the distribution of the fluctuator rate P (γ), one can obtain the single TLS loss: The second fraction is in the form of a Lorentzian with width Γ 2 √ 1 + κ + Γ f , and the first term is the same as the STM formula.If a continuous distribution function for TLS P (ε, ∆ 0 ) is assumed, one can obtain the total internal loss by integrating Eq.(S13) over P (ε, ∆ 0 ), resulting in the same internal loss as in STM.
The distribution P (ξ(t)) may not apply to the strongly-coupled fluctuators which completely shift the TLS in and out of resonance, since there are typically only a few strong fluctuators in surface dielectrics [72].They contribute to the imaginary part of a single TLS response as a random telegraph noise (i.e.ξ(t) in Eq.(S12) ).The internal loss is still in the form of the STM loss Eq.(S5), but could be reduced by the strong fluctuators occasionally moving the coherent TLS in and out resonance.
Three methods were used to estimate the average effect of the low γ fluctuators on the internal loss.Intuitively, we could estimate the total loss by summing up the contribution from all the TLS near the resonance.The key assumption is the low density of the resonant TLS which justifies the discrete treatment.An estimation based on [79] yields ∼ 1 TLS in the bandwidth of the resonance for a volume of TLS-inhabiting dielectrics around 30 µm 3 , which supports our assumption.The first method is thus a summation of all the single TLS loss in the form of Eq.(S13) near the resonance with ε = ...ℏ(ω + ν) − 2∆ε, ℏ(ω + ν) − ∆ε, ℏ(ω + ν), ℏ(ω + ν) + ∆ε, ℏ(ω + ν) + 2∆ε, ... , and is given in Eq. (14).A finite detuning less than the average energy spacing in the TLS spectrum, ℏν < ∆ε, is included such that the TLS closest to resonance will not give a diverging response as T → 0, and instead contributes to the loss as mΓ 2 ∆ε/(ℏν 2 ) ∝ T 1+µ at low temperature.This detuning is a consequence of a sparse TLS distribution where ∆ε/ℏ ≳ Γ 2 .Thus, any given TLS is rarely on resonance due to the low density of states.We should emphasize that this treatment is independent from the master equation formalism for the fluctuators, and can be applied to STM with low TLS density where the use of distribution function P (ε, ∆ 0 ) is inappropriate.
The second method assumes that the detuning ν is not much larger than Γ 2 so that the distribution P (ξ(t)) for the weakly-coupled fluctuators can still be applied to the case ξ/ℏ ∼ ν ≲ Γ 2 .The probability of a TLS with a detuning ν to be on resonance under the influence of the low γ fluctuators can be calculated by integrating P (ξ(t)): which also leads to a monotonic increasing temperature dependence below 50 mK as in the data.
The third method estimates the number of strongly-coupled fluctuators capable of moving the coherent TLS in and out of resonance.Since the energy drift caused by fluctuators is directly related to the interaction energy, which is dipoledipole like, U (r) ∼ r −3 , we could find a volume around the near-resonant TLS which hosts fluctuators that interact strongly enough.For Γ 2 < ν, the TLS is detuned, and the criterion for being a strong fluctuator is that the induced energy shift can bring the TLS in resonance such that ℏ(ν − Γ 2 ) < U (r) < ℏ(ν + Γ 2 ).This volume is a spherical shell with inner and outer radius defined by the bounds on the interaction energy U (r).For Γ 2 > ν, the TLS is already in resonant, and the strong fluctuators are those that move the TLS out of resonance, with U (r) > ℏ(Γ 2 − ν).This volume is a sphere with radius defined by U (r) = ℏ(Γ 2 − ν).The total number of strong fluctuators can then be estimated as: where the density of states for TLS from Eq.(S1) is used, and the energy range is set to the thermal energy k B T .The two expressions Eq.(S14,S15) are plotted below as a function of Γ 2 ∝ T 1+µ .
The x axis of Fig. S2, Γ 2 is normalized by the detuning ν.When Γ 2 increases to ν, the probability for TLS to be resonant under the weak fluctuators (Blue curve in Fig. S2) approaches a constant 2/π arctan(Γ 2 /Γ f ), which is denoted by the horizontal line where Γ f = 0.8Γ 2 .The interpretation of the number of strong fluctuators (Yellow curve in Fig. S2) is more nuanced.When Γ 2 < ν, the coherent TLS itself is detuned, the strong fluctuators shift the TLS in resonance, and therefore their number is positively correlated with the TLS response.However, for Γ 2 > ν, the strong fluctuators shift the already resonant TLS out of resonance, and thus their number is negatively correlated with the TLS response.Consequently, as Γ 2 or temperature increases, the resonant response becomes stronger.The divergence near Γ 2 ∼ ν represents the failure of this estimation in the range where any arbitrarily small shift can move the TLS in and out of resonance, and the strong fluctuator volume goes to infinity.In reality, this volume is bounded by the host material for TLS.For Γ 2 ≫ ν, the case of TLS exactly on resonance is recovered, and the number of strong fluctuators is clearly independent of temperature.A numerical estimate of N strong flucutators (Γ 2 /ν → ∞) gives ≲ 1 for typical weakly interacting TLS [72].In both estimates, the detuned TLS becomes resonant more often as Γ 2 or temperature increases, which qualitatively explains the increase in loss in temperature at low temperatures in the measurement.
It should be noted that if applied to TLS exactly on resonance (Γ 2 /ν → ∞), the above arguments will lead to the conclusion that low γ fluctuators present no temperature dependence for both the strong and weak low γ fluctuators, which forms the basis of the claim that no anomalous temperature dependence on the loss is expected in the original GTM work [72].
and high γ fluctuators.The dynamical effect is still described by a similar master equation Eq.(S16) for high γ fluctuators with ρ replaced by ρ k for each different state k of the fluctuators.Moreover, the fast random telegraph noise is not averaged out, and the dynamics of the transverse component can be treated as stationary in a rotating wave approximation (d⟨S + ⟩/dt = 0).The resulting master equation to be solved is then [71] where ⟨S 0 z ⟩, ⟨S 1 y ⟩ are abbreviated as z and y, and , and γ kn is the transition rate between the fluctuator state k and n.Γ 1 and m are not subscripted with k since their ε k dependence contains a slowly varying function in the form tanh[ε k /(2k B T )] that should not vary much near the resonance.z k can be solved from Eq.(S20) and substituted in Eq.(S21) to obtain y k where n k is the probability for a TLS to have energy ε k , γ = n (γ kn + γ nk ) is the total effective jump rate for fluctuator state k.The single TLS y-component response with the energy level ε k can then be calculated from integrating Eq.(S22) over the probability distribution P (γ) ∝ 1/γ where when the TLS is on resonance (ε k = ℏω).The upper and lower bounds for the intermediate fluctuators are obtained from the TLS dynamics as in Sec.S3.2 such that The total loss can then be obtained from summing up the contribution from all the different TLS energy levels just as in Eq.( 14).
If one assumes that the energy drift induced by the individual fluctuator follows the same Lorentzian distribution P (ξ) for the weak fluctuators in Sec.S3.1, the total drift will follow a Lorentzian distribution with a wider width P (ξ) ∝ N Γ f /((N Γ f ) 2 + ξ 2 ) since there are N fluctuators affecting the TLS in the case of intermediate fluctuators.If we assume that this width N Γ f is comparable to the detuning ν , then the broadening of the TLS energy levels will ensure the existence of resonant TLS.Therefore, one can apply the continuous distribution of TLS P (ε, ∆ 0 ).The final result of the internal loss after the integration is then The loss takes on a logarithmic power dependence in the intermediate power range 2 )/(2Ω 2 ) .This loss saturates to a constant ∼ m ln Γ 2 2 + ν 2 /Γ 1 for lower powers, and to a small constant ∼ m ln(2) for higher powers.
In summary, the power dependence of the loss can be separated into three different regimes where in the high power limit Ω ≳ Γ 2 , the effect of fluctuators that induce large γ dominates and leads to a logarithmic power dependence; in the intermediate power regime Γ 2 ≳ Ω ≳ √ Γ 1 Γ 2 , the fluctuators with intermediate γ give rise to a faster logarithmic power dependence, but meanwhile the saturation of TLS just as in STM has a comparable if not stronger power dependence and overlap in the same power regime; and finally in the low power limit Ω < √ Γ 1 Γ 2 , the typical TLS saturation in STM is recovered as the fluctuator contributions from all three different regimes become constant in power.The above description matches our experimental observation in Fig. 4.
The final fit to the data can be obtained from summing up all three contributions from different regimes of fluctuators: the discrete summation for the low and intermediate γ fluctuators, Eq.( 14) and Eq.(S24) respectively, and the logarithmic power dependent loss from the high γ fluctuators Eq.(S19), and is shown in Fig. 5.

S4 Alternative models for fitting TLS loss
None of the existing TLS models shows a strong temperature dependence of loss below TLS saturation.Hence alternative models need to be combined with the discrete TLS formalism to account for the observed temperature dependence of loss.Here we examine two models commonly adopted to explain the slow power dependence above TLS saturation power, as well as an alternative version of the discrete GTM fit used in the main text in which an energy dependent density of states is assumed with µ ̸ = 0.

S4.1 Two TLS model
The two TLS or, by extension, multiple TLS model is a popular alternative in explaining the slower power dependence than STM.The multiple species of TLS could be attributed to nonuniform field distribution in the resonator [64], different dipole moments of the TLS [69], and different microscopic origins of the TLS [67].The exact model for TLS loss is simply a sum of two different STM loss contributions: where m n = tanh(ε n /(2k B T )) and its subscript explicitly denotes its dependence on ε n .The difference between the two species of the TLS is encoded in their intrinsic losses δ T LS,1 0 , δ T LS,2 0 which accounts for different participation ratios, and the factor η which could be due to different field strength experienced by the TLS dipoles, different dipole moments, or different relaxation rates of the two species of TLS.The resulting fit in Fig. S3 shows a systematic deviation between the data and the fit, with RMSE = 0.021.There are 9 fitting parameters in this model.The resulting relaxation rates from 10 2 10 3 10 4 10 5 10 6 10 7 10 8 [Hz]  the fit are Γ 2 = 4.7 × 10 5 to 2 × 10 7 Hz and Γ 1 = 6.8 × 10 3 to 10 4 Hz.The losses from the different contributions are determined as δ other = 1.29 × 10 −5 , δ T LS,1 0 = 6.9 × 10 −6 , and δ T LS,2 0 = 6.8 × 10 −6 .The other fitting parameters are volume of the TLS-inhabiting dielectrics 2µm 3 , the asymmetry ratio η = 18, and the exponent of the power dependent density of states of TLS µ = 0.2.We should note that this fit might be improved by including more species of TLS.However, this multiple-TLS model lacks a physical motivation and is not considered in this work.

S4.2 Power law fit
Another well-known phenomenological fitting method introduces a free power law dependence to the saturation power term 1 + κ [27].The model when combined with the discrete formalism becomes where β is the exponent for the free power law dependence, and β = 1 corresponds to the STM prediction.The discrete free power law fit has a large deviation from the data as shown in Fig. S4, with RMSE = 0.036.The 8 fitting parameters are the power law exponent β = 0.38, the exponent in the energy dependent density of states of TLS µ = 0.2, the volume of TLS-inhabiting dielectrics 2.6µm 3 , the loss from other mechanisms δ other = 1.29 × 10 −5 , the intrinsic TLS loss δ 0 = 8 × 10 −6 , and the two relaxation rates are Γ 1 = 700 to 10 3 Hz and Γ 2 = 3.3 × 10 4 to 1.4 × 10 6 Hz.Moreover, neither of the above alternative models provide a clear interpretation of the physics in the TLS loss, since the phenomenological fitting parameters could have different microscopic origins, i.e. nonuniform field distribution, or different dipole moments or relaxation rates among several distinct ensembles of TLS.Not only does the discrete form of GTM in this work generate better fits to the measured loss, but it also provides a clear physical explanation for the observed loss: the low spectral density of coherent TLS and their interactions with the fluctuators are responsible for the observed temperature and power dependence.

S4.3 Fit with energy dependent density of states
In Faoro and Ioffe's original work on GTM, an energy dependent density of states (DOS) for the TLS is assumed [72], where µ ∼ 0.3 and ∆ max ∼ ε max ∼ k B (100K).The small exponent µ approximately characterizes the logarithmic reduction of the density of states of TLS at low energy in a system of interacting dipoles [73].The energy dependent DOS will lead to a small correction in the linear temperature dependence of the dephasing rate, Γ ds ∼ 10 −3 (k B T /ε max ) µ k B T /ℏ.The model with an energy dependent density of states where µ is a free fitting parameter is used to fit to our data and is shown in Fig. S5.
The fit shows reasonable agreement with the data, with RMSE = 0.014, slightly larger than that of the µ = 0 fit in the main text.There are in total 11 fitting parameters.The different contributions to the loss below TLS saturation power are plotted in Fig. S5 (b).Similar to the fit in the main text, the discrete TLS coupled to low and intermediate γ fluctuators are responsible for the loss reduction.Different rates in the fit are summarized in Fig. S5 (c), with typical numerical values for TLS in amorphous materials [72].The rates also satisfy Γ 2 ≳ Γ f , γ max ≫ Γ 2 .In addition, the low temperature loss reduction occurs around 40 mK as expected, when Γ 2 + Γ f < ν, the width of the response is smaller than the detuning between TLS and the resonance.The other quantities extracted from the fit are listed below: the volume of TLS-inhabiting dielectrics, 18.5 µm 3 , the intrinsic TLS loss, δ T LS 0 = 5.5 × 10 −6 , the other loss, δ other = 1.29 × 10 −5 , and the minimum fluctuator rate γ min = 2.9 × 10 −4 Hz, and the exponent for the energy dependent density of states µ = 0.2.

S5 Non-Equilibrium Quasiparticle Treatment and n qp and Q qp Estimates
The aim of this section is to examine the non-equilibrium quasiparticle density by simulating the physics using the nonequilibrium energy-dependent distribution of quasiparticles.This model (scattering model) was established by considering the coupled quasiparticle and phonon systems.[6] The details of this model are described in references [7] [8] and [9] and employ numerical methods to discretize the distribution of quasiparticles f (E) at energy E and the phonon distribution n(Ω) at energy Ω by solving the kinetic equations in steady state for a given flux of microwave photons and higher frequency radiation.

Figure 1 :
Figure 1: An SEM image of the aluminum CPW resonator on a sapphire substrate.(b) (c) Zoom-in SEM images of the left and right capacitive couplers.(d) AFM image highlighting the tapered center conductor with a 1 µm wide center trace near the center of the resonator, and (e) AFM topography image highlighting the 5 µm wide capacitive coupler from (b) or (c).Note that the AFM probe scanning direction is 45 degrees with respect to the center-line direction to reduce AFM scanning artefacts.(f) Line scan profile of AFM image to show thickness of the center line in (d).(g) Line scan profile of the capacitive coupler.Both line scans show an Al film thickness of 70 nm.

Figure 2 :
Figure 2: (a) Temperature dependent first harmonic resonant frequency shift ∆f /f 0 (6mK), with ∆f = f 0 − f 0 (6mK) of the λ/2 Aluminum co-planar waveguide resonator on sapphire substrate measured at different excitation powers (average photon numbers).Here f 0 (6mK) is the resonance frequency measured at the base temperature for each excitation power .(b) Temperature dependent loss (inverse of intrinsic quality factor, Q −1i ) at its first harmonic frequency of a Aluminum co-planar waveguide resonator on sapphire substrate measured at different circulating photon numbers ⟨n⟩.Some of the error bars are smaller than the data point such as those for the high power and temperature measurements.

Figure 3 :
Figure 3: Temperature dependent fundamental (λ/2) mode resonant frequency f 0 (T ) of the Al CPW resonator on sapphire substrates at an external microwave excitation creating around one circulating photon.The inset highlights the low temperature regime where the frequency shift is dominated by the TLS mechanism.The dots are experimental data and solid line is the model fit to Eq. (2).

1 Figure 5 :
Figure5: a) The least squares fit of the discrete GTM, together with a constant background loss, to the full power and temperature dependence of the measured internal loss below 150 mK b) Plot of δ 0 TLS (T ) extracted from the average of the low power loss below TLS saturation in Fig.4.The orange dashed curve is the temperature dependence of STM loss below saturation power ∝ tanh (ε/(2k B T )).The purple dash-dotted (light blue densely dotted) curve is from the discrete summation of individual TLS contributions for low (intermediate)-γ fluctuators at zero applied power.The green dotted curve is the temperature dependent low power limit of the TLS loss induced by high γ fluctuators.The blue solid curve is the sum of contributions from the low, intermediate, and high γ fluctuators.c) Comparison of the temperature dependent rates determined from the least squares fit.

Fig
Fig. S1 schematically shows the setup of the cryogenic transmission measurement with the VNA to characterize the resonator response.A variety of attenuators (produced by XMA) are used on each cryogenic stage to thermalize the center conductors of the coaxial cables.The total attenuation in the input line is -66 dB.Both the input line and output line on either side of the device are filtered by commercial microwave low-pass filters.The input line has a Marki Microwave low pass filter (FLP-1460) with 3-dB cutoff frequency at 14.6 GHz and the output line has another Marki Microwave low pass filter (FLP-1250) with 3-dB cutoff frequency at 12.5 GHz.The output line goes through the cryogenic isolator (QUIN-STAR Technology QCI-G0301201AM) with working frequency band 3-12 GHz, and then the signal is amplified by 36 dB using a commercial high-electron mobility transistor amplifier (Low Noise Factory LNF-LNC0.314A with typical noise temperature 4.2 K) at the 4K stage, and then the transmitted signal is further amplified by 37 dB using another room temperature amplifier (HEMT) (Low Noise Factory LNF-LNC2 6A with typical noise temperature 50 K at ambient temperature).

Fig. S3 :
Fig. S3: The discrete two TLS model fit to the loss data.

Fig. S4 :
Fig. S4: The discrete power law fit to the loss data

1 Fig
Fig.S5: a) The least squares fit of the discrete GTM, together with a constant background loss, to the full power and temperature dependence of the measured internal loss below 150 mK.b) Plot of δ 0 TLS (T ) extracted from the average of the low power loss below TLS saturation in Fig.4.The orange dashed curve is the temperature dependence of STM loss below saturation power ∝ tanh (ε/(2k B T )).The purple dash-dotted (light blue densely dotted) curve is from the discrete summation of individual TLS contributions for low (intermediate)-γ fluctuators at zero applied power.The green dashed curve is the temperature dependent low power limit of the TLS loss induced by high γ fluctuators.The blue solid curve is the sum of contributions from the low, intermediate, and high γ fluctuators.c) Comparison of the temperature dependent rates determined from the least squares fit.
Fig.S6(a) and Fig.S6 (b)show the calculated quasiparticle distribution and phonon distribution, respectively, for three selected circulated numbers of photons in the half wavelength resonator.This calculation is performed by assuming the fridge base temperature T b = 10 mK, the resonator drive frequency 3.6442 GHz (ℏω = 23 µeV ), superconducting energy gap ∆ S0 = 188 µeV , Q i = 10 5 , Q c = 1.5 × 10 6 which we obtain from fitting the resonance, resonator center conductor volume 8.6 × 10 −14 m 3 and effective temperature T eff = 189 mK due to stray light illumination and radiation which creates enhanced phonon generation, as described by the Parker model[93].Table SI lists all of the parameter values used in this model.The black dashed line in Fig.S6(a) indicates the thermal distribution of quasiparticles without any microwave excitation at T b = 10 mK.At low circulating photon numbers (⟨n⟩ = 1 or 325 ), the quasiparticle distribution is enhanced significantly above the T b thermal distribution, and the phonon distribution is also enhanced.At high circulating photon numbers, jumps appear in the electron and phonon distributions every ℏω because microwave drive at high power significantly affects the distributions.Big step jumps at E = 3∆ S0 and Ω = 2∆ S0 are due to pair breaking and recombination