MARVEL analysis of high-resolution rovibrational spectra of ¹³C¹⁶O₂

A set of empirical rovibrational energy levels, obtained through the MARVEL (measured active rotational-vibrational energy levels) procedure, is presented for the ¹³C¹⁶O₂ isotopologue of carbon dioxide. The procedure begins with the collection and analysis of experimental rovibrational transitions from the literature, so a comprehensive review of the literature on the high-resolution spectroscopy of ¹³C¹⁶O₂ is also presented. Of the more than 750 sources checked, 60 provided 14,101 uniquely measured and assigned rovibrational transitions in the wavenumber range 579–13,735 cm⁻¹. A weighted least-squares refinement of these transitions then yields the energies of the states involved. Altogether, 6318 empirical rovibrational energies have been determined for ¹³C¹⁶O₂. Uncertainties of the empirical energies are estimated from the experimental uncertainties of the transitions. The detailed analysis of the lines and of the spectroscopic network built from them, together with the uncertainty estimates, helps pinpoint possible errors in the experimental data, such as typos, misassigned quantum numbers, and misidentifications. Errors found in the literature data were corrected before the data were included in the final MARVEL dataset and analysis.


| INTRODUCTION
Carbon dioxide is a well-known trace species in the Earth's atmosphere; the recent increase in its concentration is associated with human activity, and it is involved in climate change [1]. Most of the spectral regions corresponding to the main isotopologue, ¹²C¹⁶O₂, are optically thick, so an increase in atmospheric concentration leads only to a logarithmic increase in the radiative forcing associated with the so-called greenhouse effect (see, e.g., Reference 2). In the Earth's atmosphere, about 1.1% of CO₂ is in the form of the ¹³C¹⁶O₂ isotopologue [3]. The corresponding spectral lines are not optically thick, which increases the importance of this isotopologue as a greenhouse gas.
While ¹³C is only a minor constituent of the Earth's atmosphere, the ¹³C to ¹²C abundance ratio is known to vary significantly in the Universe. Some observations suggest that in places the ratio may be as high as one third [4]. The detections of CO₂ in the atmospheres of hot-Jupiter exoplanets [5], including a recent one on WASP-39b by the James Webb Space Telescope [6], have all been made using low- to medium-resolution transit spectroscopy, which cannot distinguish between isotopologues. Thus, these observations cannot provide information on isotopic ratios. However, high-resolution cross-correlation studies performed from the ground have been shown to be capable of distinguishing carbon isotopologues; for example, a pioneering study of CO in the atmosphere of the exoplanet TYC 8998-760-1 b suggested that the abundance of ¹³C was more than 30% of that of ¹²C [7]. High-resolution studies of exoplanet spectra require especially accurate line positions. Line lists are available for hot CO₂ [8–11], but they are generally not accurate enough for cross-correlation studies of high-temperature objects. The most practical means of improving the accuracy of theoretical line lists is to introduce experimental/empirical energy-level data, which leads to improved line positions. Supplying empirical rovibrational energy levels for ¹³C¹⁶O₂ was one of the motivations of the present project.
The MARVEL (measured active rotational-vibrational energy levels) procedure [12–15], based on the theory of spectroscopic networks (SN) [16–18], is able to provide such highly accurate empirical energy levels. For this, a dataset of experimental line positions is needed, which we created for the carbon dioxide isotopologue ¹⁶O¹³C¹⁶O (636 in HITRAN parlance). MARVEL determines the SN representing all interconnected rotational-vibrational energy levels and, through an inversion process, yields empirical energy levels with appropriate uncertainties. Besides providing these data, a MARVEL analysis is able to identify incorrect quantum-number assignments, overly optimistic uncertainty values, mistaken attributions, and many other types of errors. The empirical rovibrational energies can be used to check and improve existing theoretical models, as well as line lists, for example those generated within the ExoMol project [19,20]. The joint utilization of the best empirical and theoretical data provides both completeness and the most accurate predictions of transition frequencies; see References 21 and 22 for examples. Our critical evaluation of the existing empirical line positions also helps to identify spectral regions where more high-resolution experiments are needed.
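The inversion step can be illustrated with a minimal weighted least-squares sketch. It assumes that each transition is a tuple (upper label, lower label, wavenumber in cm⁻¹, 1σ uncertainty), that every level is connected to one reference level whose energy is fixed at zero, and that lines are weighted by 1/σ². The function name `invert_transitions` is hypothetical; MARVEL itself includes further steps (for example, the treatment of inconsistent lines and its own uncertainty estimation) that are not reproduced here.

```python
import numpy as np


def invert_transitions(transitions, ground):
    """Weighted least-squares energies (cm^-1) and 1-sigma uncertainties.

    transitions: list of (upper, lower, wavenumber, sigma) tuples.
    ground: label of the reference level, whose energy is fixed at zero.
    Assumes every level is connected to `ground` through the transitions.
    """
    labels = {t[0] for t in transitions} | {t[1] for t in transitions}
    levels = sorted(labels - {ground})
    index = {lab: i for i, lab in enumerate(levels)}

    # Design matrix: each row encodes nu = E(upper) - E(lower)
    A = np.zeros((len(transitions), len(levels)))
    b = np.zeros(len(transitions))
    w = np.zeros(len(transitions))
    for row, (up, low, nu, sigma) in enumerate(transitions):
        if up != ground:
            A[row, index[up]] += 1.0
        if low != ground:
            A[row, index[low]] -= 1.0
        b[row] = nu
        w[row] = 1.0 / sigma**2

    # Normal equations of the weighted problem: (A^T W A) E = A^T W b
    normal = A.T @ (w[:, None] * A)
    energies = np.linalg.solve(normal, A.T @ (w * b))
    covariance = np.linalg.inv(normal)

    E = {ground: 0.0}
    U = {ground: 0.0}
    for lab, i in index.items():
        E[lab] = float(energies[i])
        U[lab] = float(np.sqrt(covariance[i, i]))
    return E, U


# Toy network: two direct lines from the ground level and one redundant line
lines = [("A", "G", 100.0, 0.001), ("B", "G", 250.0, 0.001), ("B", "A", 150.002, 0.002)]
E, U = invert_transitions(lines, "G")
print(E, U)
```

In this toy case the redundant third line pulls both energies slightly (E_A ≈ 99.99967 cm⁻¹, E_B ≈ 250.00033 cm⁻¹) and makes both uncertainties smaller than the 0.001 cm⁻¹ of the direct lines, which is the benefit of a well-connected network.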

| The MARVEL procedure
A MARVEL project begins by gathering, analyzing, and validating assigned lines of high-resolution spectra. Besides its position, each line carries a unique label for the upper and lower states, a measurement uncertainty, and a unique tag. The lines are then used to build a spectroscopic network, in which each state is a node and the nodes are linked by the observed transitions [16]. This representation of the spectroscopic measurements allows the determination of empirical energy-level values, together with their uncertainty estimates. The transitions form a well-connected network, with most transitions linked to the ground state via various paths. However, such a connection cannot always be established using experimental data alone: missing lines may fragment the network, leaving floating components that are not linked to the principal component. The consistency of the lines in floating components with the rest of the data therefore cannot be established, which is why such lines remain unvalidated at the end of a MARVEL analysis.
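The split into a principal component and floating components can be sketched with a union-find pass over the transitions. The function names and the (upper, lower) label pairs are illustrative assumptions; the sketch only determines which levels are connected and says nothing about whether the lines within a component are mutually consistent.

```python
from collections import defaultdict


def network_components(transitions):
    """Connected components of a spectroscopic network, largest first.

    transitions: iterable of (upper_label, lower_label) pairs.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for up, low in transitions:
        root_up, root_low = find(up), find(low)
        if root_up != root_low:
            parent[root_up] = root_low  # merge the two components

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return sorted(groups.values(), key=len, reverse=True)


def split_principal(transitions, ground):
    """Principal component (the one holding `ground`) and the floating components."""
    components = network_components(transitions)
    principal = next((c for c in components if ground in c), set())
    floating = [c for c in components if ground not in c]
    return principal, floating


# Toy network: X and Y are linked to each other but not to the ground level G
lines = [("A", "G"), ("B", "A"), ("C", "G"), ("Y", "X")]
principal, floating = split_principal(lines, "G")
print(sorted(principal), [sorted(c) for c in floating])
```

Lines between X and Y here would stay unvalidated: nothing connects them to the ground level, so their absolute energies cannot be fixed.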
Since MARVEL is not based on a particular quantum-chemical model, it will "validate" forbidden or incorrect transitions when they are not in conflict with the rest of the data.For this reason, it is important to check for such transitions while building up the MARVEL input dataset.

| Quantum numbers and selection rules
Two conventions are in general use for assigning quantum numbers to the vibrational states of the linear CO₂ molecule. The standard, so-called Herzberg notation is based on the harmonic-oscillator (HO) picture and uses four vibrational quantum numbers, $(\tilde{v}_1, \tilde{v}_2^{\,l_2}, \tilde{v}_3)$, where $\tilde{v}_1$, $\tilde{v}_2$, and $\tilde{v}_3$ describe the symmetric stretch, bend, and antisymmetric stretch of the molecule, respectively, while $l_2$ denotes the vibrational angular momentum associated with the bending mode and can take the values $\tilde{v}_2, \tilde{v}_2 - 2, \tilde{v}_2 - 4, \ldots, 1$ or $0$. Complications induced by the well-known Fermi resonance between the $\nu_1$ and $2\nu_2$ states of CO₂ led to the introduction of the so-called AFGL (Air Force Geophysics Laboratory) notation [3,23,24], which is adopted here.
The AFGL notation groups the vibrational states into Fermi polyads and uses five vibrational quantum numbers, $(v_1 v_2 \ell_2 v_3 r)$, where $r$ is the Fermi-resonance ranking index. In this notation, the Fermi polyads are determined by $v_1$, $\ell_2$, and $v_3$ [25]; $v_2$ is always equal to $\ell_2$, and $r$ takes values from 1 to $v_1 + 1$. To the best of our knowledge, there is no unambiguous conversion between the Herzberg and the AFGL conventions; hence, we used data from multiple datasets to match Herzberg labels to AFGL ones. Since the first release of the HITRAN database [23], it has been emphasized that, when older notations are used, the order of some energy levels can change from one CO₂ isotopologue to another, as shown by the work of Amat and Pimbert [26].
In addition to the vibrational quantum numbers, two further quantum numbers are required to label the rovibrational states of CO₂: the quantum number $J$, describing the overall rotation of the molecule, which takes values $J \ge \ell_2$, and the parity, $p$, for which we use the rotationless parity denoted by e and f [27]. Unlike the vibrational quantum numbers, $J$ and $p$ are rigorously conserved (they are exact quantum numbers). For our MARVEL procedure we employ the AFGL notation, and each rovibrational state has the label $(J\, v_1 v_2 \ell_2 v_3 r\, p)$.
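The part of the Herzberg-to-AFGL mapping that follows directly from the definitions can be sketched as below: members of a Fermi polyad share $\ell_2$ and $v_3$ and have the same $v_1 = \tilde{v}_1 + (\tilde{v}_2 - \ell_2)/2$, which gives the $v_1 + 1$ members quoted above. The ranking index is then assigned by energy, assuming the usual AFGL convention that $r = 1$ labels the highest-energy member of the polyad. Because this step needs reliable level energies, and the ordering can differ between isotopologues, the sketch does not remove the ambiguity noted above. Function names are hypothetical, and the energies in the usage line are illustrative values, not measurements.

```python
def afgl_polyad(v1_h, v2_h, l2, v3):
    """AFGL (v1, v2, l2, v3) of the Fermi polyad containing Herzberg (v1~, v2~^l2, v3~)."""
    if l2 > v2_h or (v2_h - l2) % 2:
        raise ValueError("l2 must be one of v2~, v2~ - 2, ..., 1 or 0")
    return (v1_h + (v2_h - l2) // 2, l2, l2, v3)


def assign_ranks(levels):
    """Map Herzberg labels to AFGL labels (v1, v2, l2, v3, r).

    levels: list of ((v1~, v2~, l2, v3~), energy) pairs.
    Within each polyad, r = 1, 2, ... is assigned by decreasing energy.
    """
    polyads = {}
    for label, energy in levels:
        polyads.setdefault(afgl_polyad(*label), []).append((energy, label))
    result = {}
    for polyad, members in polyads.items():
        ranked = sorted(members, reverse=True)  # highest energy first
        for r, (_, label) in enumerate(ranked, start=1):
            result[label] = (*polyad, r)
    return result


# Illustrative energies (cm^-1), not measured values
print(assign_ranks([((1, 0, 0, 0), 1400.0), ((0, 2, 0, 0), 1300.0)]))
# {(1, 0, 0, 0): (1, 0, 0, 0, 1), (0, 2, 0, 0): (1, 0, 0, 0, 2)}
```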
States with $\ell_2 = 0$ all have parity e. In principle, states with $\ell_2 > 0$ can be both e and f, but due to the Pauli principle half of the rotational levels are missing; for a level to be present, $(J + v_3 + \ell_2 + p)$, where $p = 0$ for e and $p = 1$ for f states, must be even.
It is also standard to use the labels of the point group $D_{\infty h}$ to denote levels.
In this notation Σ, Π, Δ, … represent $\ell_2 = 0, 1, 2, \ldots$, and the other symmetry labels are included as:
When building up the MARVEL input file, we tested for incorrectly labeled transitions in the dataset to ensure the correctness of all the lines. As part of this procedure, compliance with the dipole selection rules was checked. For the ¹³C¹⁶O₂ isotopologue, these include vibrational, rovibrational, and rotational selection rules [25]. These selection rules, together with the Pauli-principle constraint, were used to verify the labels of all experimental transitions obtained from the literature.
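A simplified version of such label checks can be sketched as follows. It encodes the Pauli-principle condition and the "$\ell_2 = 0$ implies e" rule stated above, together with the standard rotational rules for rotationless parity ($\Delta J = 0, \pm 1$, no $J = 0 \leftrightarrow 0$; e↔e and f↔f for P and R branches, e↔f for Q branches). The $|\Delta\ell_2| \le 1$ test is an assumption of this sketch standing in for the vibrational selection rules; it is not the full set of rules used in this work, and the dictionary-based level format is hypothetical.

```python
def level_allowed(J, l2, v3, parity):
    """Label and Pauli-principle checks for a 13C16O2 level (parity 'e' or 'f')."""
    if parity not in ("e", "f") or J < l2:
        return False
    if l2 == 0 and parity != "e":  # Sigma states only have e levels
        return False
    p = 0 if parity == "e" else 1
    return (J + v3 + l2 + p) % 2 == 0  # half of the levels are missing


def transition_allowed(upper, lower):
    """Basic electric-dipole checks between two labeled levels.

    upper, lower: dicts with keys J, l2, v3, parity (AFGL quantum numbers).
    """
    for lvl in (upper, lower):
        if not level_allowed(lvl["J"], lvl["l2"], lvl["v3"], lvl["parity"]):
            return False
    dJ = upper["J"] - lower["J"]
    if abs(dJ) > 1 or (upper["J"] == 0 and lower["J"] == 0):
        return False
    if abs(upper["l2"] - lower["l2"]) > 1:  # assumption: only Delta l2 = 0, +/-1 bands
        return False
    same_parity = upper["parity"] == lower["parity"]
    # P and R branches connect e<->e and f<->f; Q branches connect e<->f
    return same_parity if dJ != 0 else not same_parity


ground_J2 = {"J": 2, "l2": 0, "v3": 0, "parity": "e"}
print(transition_allowed({"J": 3, "l2": 0, "v3": 1, "parity": "e"}, ground_J2))  # R(2) of 00011-00001
```

Combined with the Pauli condition, the parity rules imply that $\ell_2 + v_3$ changes by an odd number in every accepted transition, which is the g↔u requirement of the dipole operator.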

| Resonances in CO₂
Various types of resonances, such as Fermi, Coriolis, and ℓ-type resonances, affect the infrared spectra of carbon dioxide [25,28]. These resonances complicate the labeling of energy levels and, through overlapping transitions, add to the complexity of the spectral patterns. Figure 1 shows the considerable effect of Fermi-type resonances on the experimental spectrum.
The MARVEL procedure can detect discrepancies in energy-level labels, including transitions misassigned because of various types of resonances. In addition, we compared our assignments with those of the NASA Ames-2021 variational line list [29] and of the effective-Hamiltonian Carbon Dioxide Spectroscopic Databank (CDSD) line list [30]. Our study confirms many of the reassignments made in recent studies that analyzed some of these effects [31,32].
We also make further reassignments, not suggested previously (vide infra).

| Beat frequencies
Beat frequencies have been measured in several studies [34–44]. A beat frequency results from mixing two frequencies. Such measurements are often made using the heterodyne technique, which gives very accurate results. However, unlike absolute frequencies, which form the input of the standard MARVEL procedure, beat frequencies do not correspond to a specific line position and a single pair of lower and upper states. A beat-frequency measurement connects four energy levels through one frequency, which represents the difference between two transition frequencies. In an absolute frequency measurement, the frequency $\nu$ is proportional to the difference between two energy levels, $\nu \propto E_2 - E_1$. For a beat-frequency measurement, the frequency $\nu_b$ corresponds to a combination of four energy levels, $\nu_b \propto (E_4 - E_3) - (E_2 - E_1)$ [15,45]. Initial tests, which included beat frequencies as extra data in the MARVEL process, led to ill-conditioned matrices, even when the four levels involved in the measured beats were well determined by the standard SN. Ill-conditioning occurs when the beat-frequency measurements have a lower uncertainty than the standard measurements determining the energy levels involved. Of course, this is precisely the case in which beat-frequency data are of real interest.
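Both the comparison of a measured beat with MARVEL energies and the origin of the ill-conditioning can be sketched in a few lines. The first function evaluates $\nu_b = (E_4 - E_3) - (E_2 - E_1)$ from level energies in cm⁻¹ and converts it to MHz (1 cm⁻¹ = 29 979.2458 MHz); its uncertainty assumes four distinct, uncorrelated energies, which is a simplification, since energies derived from a shared network are correlated. The second part compares the condition number of the weighted normal matrix of a toy network with and without one beat row whose uncertainty is 10⁴ times smaller than that of the ordinary lines. All names and numbers are illustrative assumptions.

```python
import numpy as np

CM1_TO_MHZ = 29979.2458  # 1 cm^-1 corresponds to c = 2.99792458e10 Hz = 29979.2458 MHz


def predicted_beat_mhz(energies, sigmas, seq, ref):
    """Beat (sequence minus reference transition) and its 1-sigma uncertainty, in MHz.

    energies, sigmas: level energies and uncertainties in cm^-1.
    seq, ref: (upper, lower) label pairs of the two transitions.
    Assumes four distinct levels with uncorrelated uncertainties.
    """
    (su, sl), (ru, rl) = seq, ref
    beat = (energies[su] - energies[sl]) - (energies[ru] - energies[rl])
    var = sum(sigmas[k] ** 2 for k in (su, sl, ru, rl))
    return beat * CM1_TO_MHZ, np.sqrt(var) * CM1_TO_MHZ


def normal_matrix(rows, sigmas):
    """Weighted normal matrix A^T W A with weights 1/sigma^2."""
    A = np.asarray(rows, dtype=float)
    w = 1.0 / np.asarray(sigmas, dtype=float) ** 2
    return A.T @ (w[:, None] * A)


# Toy network with unknowns (E_A, E_B, E_C); the ground level G is fixed at zero.
line_rows = [[1, 0, 0], [0, 1, 0], [-1, 0, 1], [0, -1, 1]]  # A-G, B-G, C-A, C-B
beat_row = [-1, -1, 1]  # (C - B) - (A - G): four levels, one number
cond_lines = np.linalg.cond(normal_matrix(line_rows, [1e-3] * 4))
cond_beat = np.linalg.cond(normal_matrix(line_rows + [beat_row], [1e-3] * 4 + [1e-7]))
print(f"without beat: {cond_lines:.1f}, with beat: {cond_beat:.1e}")
```

In this toy case the condition number grows from about 5.8 to the order of 10⁸, roughly tracking the 10⁸ ratio of the weights, which mirrors the behavior described above: the more precise the beat relative to the ordinary lines, the worse conditioned the combined problem.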

| Data collection
The present study started by collecting and analyzing literature sources that discuss high-resolution rovibrational spectra of carbon dioxide. More than 750 sources were analyzed and given a tag, which combines the last two digits of the publication year with abbreviations of the authors' surnames (e.g., 67Hahn, 78BaLiDeRa).
FIGURE 1 Distribution of the Fermi ranking index, r, across the experimental spectral region covered, illustrating the overlapping of transitions.
TABLE 1 Experimental sources of ¹³C¹⁶O₂ rovibrational transitions, the wavenumber range they span, numbers of lines, uncertainty information, and the labeling scheme used in each source. Data from four sources [47–50] were excluded from our final dataset; the reasons for their exclusion are given in Section 3.4.

| Dataset construction
First, we constructed a dataset comprising the most reliable data, that is, self-consistent data which also have low experimental uncertainty.
This master dataset included nearly 70% of our gathered data; the remaining 30% also included conflicting data, which had to be analyzed carefully, line by line. For this purpose, we developed a code that automates the preparation of the MARVEL input and detects lines that conflict with the master dataset. Such lines are referred to as "bad lines". A bad line is not necessarily incorrect; it simply signals a lack of self-consistency in the assembled SN, and the problem could be due to errors in other lines.
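The code itself is not described in detail here; the following is only a minimal sketch of one way to flag bad lines, assuming level energies and uncertainties have already been obtained from the master dataset. A candidate line is flagged when its deviation from the energy difference predicted by the master fit exceeds k times the combined uncertainty; the threshold k = 3, the function name, and the tuple layout are assumptions. As stated above, a flag marks a conflict, not proof that this particular line is wrong.

```python
import math


def flag_bad_lines(candidates, energies, uncert, k=3.0):
    """Candidate lines that conflict with energies fixed by a master dataset.

    candidates: list of (tag, upper, lower, wavenumber, sigma) tuples (cm^-1).
    energies, uncert: level energies and 1-sigma uncertainties from the master fit.
    Lines touching levels absent from the master fit cannot be tested and are skipped.
    """
    bad = []
    for tag, up, low, nu, sigma in candidates:
        if up not in energies or low not in energies:
            continue
        predicted = energies[up] - energies[low]
        combined = math.sqrt(sigma**2 + uncert[up] ** 2 + uncert[low] ** 2)
        deviation = nu - predicted
        if abs(deviation) > k * combined:
            bad.append((tag, up, low, deviation))
    return bad


energies = {"G": 0.0, "A": 100.0}
uncert = {"G": 0.0, "A": 0.001}
candidates = [("x", "A", "G", 100.0005, 0.001), ("y", "A", "G", 100.02, 0.001), ("z", "B", "G", 5.0, 0.001)]
print(flag_bad_lines(candidates, energies, uncert))
```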
Our first attempts to use this code produced over a thousand bad lines. These lines were then carefully analyzed in order to minimize their number. A large portion of the bad lines were due to misassignments; the rest were due to typos, illegal transitions, and possible misidentification of the isotopologue or molecule.
After reducing the number of bad lines to only 36, which were excluded from the final calculations, we began analyzing the uncertainties suggested by MARVEL. At that point, the floating components of our SN contained around 300 transitions in total. Forty-four lines from CDSD-2019 [30] were used to link the floating components to the main component, reducing the number of lines in floating components to fewer than 100. The remaining floating components are too fragmentary to be linked to the main network: this would require including many more semiempirical lines than we deemed reasonable. The lines taken from CDSD-2019 were given an uncertainty of 0.0005 cm⁻¹.
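Our final dataset contains rovibrational transitions collated from the sources given in Table 1. Of the 20,754 experimental transitions gathered, only 14,101 are unique. In fact, 10,665 transitions are measured only once, while 5 and 32 transitions are measured 10 and 9 times, respectively. The principal component of our final SN contains 20,641 transitions; the other transitions form floating components. The experimentally measured transitions involve 6520 states; we were able to determine absolute energies for 6318 of them.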
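Statistics of this kind (total versus unique transitions, and how many times each transition was measured) can be obtained with a simple count. The sketch assumes that a transition is identified by its full upper and lower state labels; the function name and toy records are illustrative.

```python
from collections import Counter


def measurement_statistics(records):
    """Total lines, unique transitions, and a multiplicity histogram.

    records: iterable of (upper_label, lower_label) pairs, one per measured line;
    the same transition may appear several times (e.g., from different sources).
    The histogram maps "times measured" -> "number of transitions measured that often".
    """
    counts = Counter(records)
    histogram = Counter(counts.values())
    return sum(counts.values()), len(counts), dict(histogram)


records = [("a", "g"), ("a", "g"), ("b", "g"), ("c", "g"), ("c", "g"), ("c", "g")]
total, unique, histogram = measurement_statistics(records)
print(total, unique, histogram)  # 6 3 {2: 1, 1: 1, 3: 1}
```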
The ¹³C¹⁶O₂ lines we have gathered cover the region from 579 to 13,735 cm⁻¹. Figure 2 illustrates the distribution of the collected data, using two vertical axes to help compare the amount of experimental data acquired with the HITRAN2020 spectrum [51].

| Comments on literature sources used
67Hahn [52]: This source provides the same bands twice, in two sets of tables, with the second set of tables switching the assigned branch. Our analysis shows that the first table (Table 1) provides the correct branch. No data from other tables were taken.
68ObRaHaMc [53]: Provides two sets of transitions, recorded at two different laboratories. The two datasets were included with distinct tags.
78BaLiDeRa [56]: This source uses Herzberg's notation. While updating the notation to AFGL, we had to split the 0311e-0310e band into 11112e-11102e and 30001e-11102e. This could be an assignment issue or a result of the difference in notation.
82BaRiSmRa [57]: Uses Herzberg's notation. While updating the notation to AFGL, we had to split the 0311e-0110e band into 11112e-01101e and 30001e-01101e, and the 3000-0110 band into 11112e-01101e and 30001e-01101e. This could be an assignment issue or a result of the difference in notation.
00TaPeTeLe [58]: Contains two transitions connected to an energy level, (J = 25, 20052 e), not present in Ames-2021 [29]. As this level is present in HITRAN2020 [51] …
08PePeCa [32]: Contains two lines that we suggest reassigning. It also correctly assigns 45 lines misassigned in 04DiMaRoPe [59], 9 of which had already been corrected in 08PeDeLiKa [31], as well as 34 lines misassigned in 08PeDeLiKa [31].
…BeDeSuBr [60]: Contains 15 lines for which we suggest reassignments.
FIGURE 2 Coverage of the transition data obtained from literature sources (see Table 1 for more details). The blue columns follow the left vertical axis; each column covers a region of 25 cm⁻¹. In the background, the spectrum from HITRAN2020 [51] is shown in orange, with the right vertical axis giving the line intensity.

| Comments on literature sources not used
45NiYa [47]: The claimed uncertainty of this source is 0.07 cm⁻¹. It covers the region 2243–2329 cm⁻¹, which is well covered by other, higher-resolution studies.
66GoMc [48]: The lines provided are identical to those in a previous publication from the same group [61].
82EsHuVa [49]: The lines provided are identical to those in another publication [24].
…ZhQuReHu [50]: The data show large discrepancies with other sources. Although the source covers the region 912–937 cm⁻¹, which is not covered by any other source, it does not determine any new energy levels.

| Relabeling of states
To unify the notation of the energy levels across the entire dataset, we had to update the labels of 5149 lines collected from 21 sources (see Table 1). During the update, we found many lines whose assignments disagreed with the rest of the dataset. To check these assignments, the lines were compared with those in the Ames-2021 line list [29]. We propose corrections for 148 misassigned lines; for 18 of them, the updated assignment is reported within experimental accuracy for the first time. These lines are listed in Table 2.

| Dataset of empirical energy levels
The 14,101 unique rovibrational transitions gathered yielded 6318 empirical rovibrational energy levels for 13 C 16 O 2 .Table A1 in the Appendix summarizes the vibrational bands which could be determined based on the set of measured rovibrational transitions.
Figure 3 shows the number of transitions used to determine each energy level. As usual, this degree distribution is heavy-tailed. Figure 4 shows the uncertainty distribution of our empirical energy levels. While the overall uncertainty is satisfactory, more high-resolution measurements are needed to eliminate the outliers and to expand the data coverage.
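The quantities behind Figures 3 and 4 can be computed as sketched below, assuming transitions given as (upper, lower) label pairs and a dictionary of level uncertainties in cm⁻¹. The 0.01 cm⁻¹ outlier threshold is the one quoted in the caption of Figure 4; function names are hypothetical. Passing the list with repeated measurements instead of unique transitions counts measurements per level rather than distinct transitions.

```python
from collections import Counter


def degree_distribution(transitions):
    """Transitions attached to each level, and the histogram of these degrees.

    transitions: iterable of (upper_label, lower_label) pairs.
    """
    degree = Counter()
    for up, low in transitions:
        degree[up] += 1
        degree[low] += 1
    return degree, Counter(degree.values())


def uncertainty_outliers(uncertainties, threshold=0.01):
    """Levels whose uncertainty (cm^-1) exceeds `threshold`, and the mean uncertainty."""
    values = list(uncertainties.values())
    mean = sum(values) / len(values)
    outliers = sorted(k for k, u in uncertainties.items() if u > threshold)
    return outliers, mean


degree, histogram = degree_distribution([("a", "g"), ("b", "g"), ("b", "a")])
print(dict(degree), dict(histogram))
print(uncertainty_outliers({"a": 0.001, "b": 0.02, "c": 0.003}))
```

A heavy-tailed distribution appears in such a histogram as many levels with only a few transitions and a few levels with very many.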
TABLE 2 Reassignments of lines suggested in this study. The J value and parity are given for the lower state. † These are duplicate lines.

| Beat frequency comparisons
While we were unable to extend our SN with the available beat-frequency measurements, we were able to compare the originally measured frequencies with frequencies computed from our empirical (MARVEL) energy levels. Table 3 summarizes the results obtained and contains a comparison with the beat frequencies of Reference 33. As seen there, (a) in most cases MARVEL reproduces the measured frequencies within the MARVEL uncertainties, computed from the uncertainties of the empirical energy levels, and (b) the MARVEL uncertainties are significantly larger than the uncertainties of the beat-frequency measurements. These observations show the importance and utility of including beat-frequency data in a spectroscopic network.
TABLE 3 Comparison of beat-frequency measurements from Reference 33, in MHz, with results of the present study.
a The beat frequency is given as the frequency of the sequence transition − the frequency of the reference transition.
b MP = MARVEL-predicted frequency.
c MU = uncertainty of the MARVEL-predicted frequency.
d Difference between the measured and the MARVEL-predicted frequency.

| Comparison with line lists
A comparison of our data with available line lists shows good overall agreement with both the CDSD-2019 [30] and the Ames-2021 [29] data, with better agreement with CDSD-2019 than with Ames-2021. Figure 5 shows systematic differences between the Ames-2021 and the CDSD-2019 data and illustrates the good agreement between our dataset and CDSD-2019. The outliers highlighted in the figure were each determined from only a single transition; we searched for more experimental data in these regions but found none. Figure 6 compares the energy-level coverage, as a function of J, of our dataset and of Ames-2021. Evidently, considerably more experimental data are needed above 10,000 cm⁻¹.
FIGURE 3 Number of transitions used for determining the energies of each state.
FIGURE 4 Uncertainty distribution of the empirical rovibrational energy levels of this study. Our average uncertainty is 0.0024 cm⁻¹, with 179 outliers having an uncertainty above 0.01 cm⁻¹.

| SUMMARY
This paper describes a comprehensive analysis of the high-resolution rovibrational spectroscopy literature available for the second most abundant isotopologue of carbon dioxide, ¹³C¹⁶O₂. All the assigned transitions, altogether from 60 literature sources, have been extracted and verified using appropriate selection rules, the MARVEL algorithm, and a comparative analysis against available line lists. These extensive comparisons were performed to ensure the validity of the labeling of the states involved in the measured transitions. The conventions in use for labeling CO₂ lines were briefly reviewed, as in several cases a conversion had to be performed. The validated data cover the wavenumber range 579–13,735 cm⁻¹.
Our detailed analysis reveals (a) areas in the spectrum where data are lacking, (b) numerous inconsistencies in the vibrational assignment of some of the measured transitions (we report 18 possible errors for the first time), and (c) conflicting labels of higher-energy levels between experimental data [54,58] and theoretical line lists.
A comparison between our energy levels and those of Ames-2021 [29] and CDSD-2019 [30] shows significantly better agreement with CDSD-2019, highlighting the importance of fitting theoretical models to available experimental data. Further research is being carried out in our groups to analyze more isotopologues of CO₂. Work is also underway to explore methods of including beat-frequency data in the MARVEL analysis procedure; initial attempts show numerical difficulties with ill-conditioned matrices, which will need to be overcome before this can be done usefully.

APPENDIX
TABLE A1 Vibrational bands of ¹³C¹⁶O₂.