Corresponding author: S. Kamata, Department of Earth and Planetary Science, The University of Tokyo, Hongo, Bunkyo-ku, Tokyo 113-0033, Japan. (email@example.com)
Diverse geological characteristics found for the three major lunar provinces (i.e., the Feldspathic Highlands Terrane (FHT), the South Pole-Aitken Terrane (SPAT), and the Procellarum KREEP Terrane (PKT)) strongly suggest distinctly different thermal histories. Quantitative differences among these provinces in their early thermal histories and crustal radioactive element concentrations, however, are poorly known. One of the few observables that retain a record of the ancient lunar thermal structure is the viscoelastic state of impact basins. This study investigates the long-term evolution of basin structures using global lunar gravity field data obtained by Kaguya tracking and derives constraints for (1) the paleo-thermal state of impact basins and (2) crustal column-averaged radioactive element concentrations for each province. Our calculation results indicate that impact basins in the central anorthositic region of the FHT (i.e., the FHT-An) require a very cold interior (dT/dr ≤ 20 K km⁻¹ at the surface). This result strongly suggests that the deep portion of the thick farside highlands crust is highly depleted in radioactive elements (Th ≤ 0.5 ppm), indicating that the Th-rich SPA basin floor crust is clearly different from the lower crust underneath the FHT-An and cannot be accounted for by simple exposure of the lower crust. Our analysis also indicates that the observed basin structure allows a column-averaged Th concentration as high as ∼6 ppm in the crust inside the PKT. These results indicate that radioactive element concentrations deep in the crust probably vary greatly from region to region, as do those observed on the surface.
Lunar global remote-sensing observations have revealed that the geology and geochemistry of the lunar surface vary greatly from region to region. Based on FeO and thorium (Th) concentrations on the surface, Jolliff et al. [2000] proposed three major geological provinces: the Procellarum KREEP Terrane (PKT), the Feldspathic Highlands Terrane (FHT), and the South Pole-Aitken Terrane (SPAT). The PKT is enriched in KREEP (potassium, rare earth elements, and phosphorus) material and has large FeO and Th concentrations (i.e., FeO > 10 wt% and Th > 5 ppm). Large FeO concentrations correspond to abundant mare volcanic deposits inside the PKT. In contrast, the FHT is characterized by low FeO and Th concentrations. The FHT is further classified into the FHT-An (the central anorthositic region) and the FHT-O (the outer region), and the former region is highly depleted in FeO and Th (i.e., FeO < 10 wt% and Th < 1 ppm). The SPAT corresponds to the South Pole-Aitken basin and has levels of FeO and Th concentrations between those of the other two regions [e.g., Prettyman et al., 2006]. Since concentrations of thorium, uranium, and potassium on the lunar surface and in lunar samples are correlated very well with each other [e.g., Korotev, 1998; Prettyman et al., 2006], the concentrations of these radioactive elements are often represented by Th concentration alone.
The large regional variation in radioactive element concentrations on the lunar surface has been discussed in relation to large-scale impacts and crustal thickness variation [e.g., Wieczorek et al., 2006; Kobayashi et al., 2012]. The causes of this large regional variation, however, are still controversial. For example, Haskin [1998] suggests that the horizontal distribution of Th on the FHT and on the SPAT may be controlled mainly by the distribution of Th-rich ejecta from the Imbrium basin. Alternatively, the SPAT may simply be an exposure of the lower crust; in that case, the variation in Th concentration on the farside surface may reflect the vertical stratification of Th in the lunar farside crust [e.g., Wieczorek et al., 2006]. These two contrasting models for Th distribution would lead to significantly different total crustal Th concentrations. Because the major heat source for long-term planetary evolution is the heat production due to the decay of long-lived radioactive elements [e.g., Turcotte and Schubert, 2002; Breuer and Moore, 2007], large differences in the crustal radioactive element concentrations may lead to significantly different lunar thermal history models. Further discussion of the nature of Th regional heterogeneity requires more information on subsurface Th concentration, particularly deep in the crust.
An important observational constraint on the concentrations of heat-producing elements in the deep interior can be obtained from heat flux measurements. Heat flux measurements, however, were conducted at only two Apollo landing sites [Langseth et al., 1976]. Both of these Apollo landing sites are inside or close to the PKT, and no heat flux measurements were conducted on or near the FHT-An or on the SPAT.
The geologic records that directly reflect the thermal state of the upper part of the early Moon are large-scale topography and gravity anomalies [e.g., Solomon et al., 1982]. Most large-scale topographic features and major sources of free-air and Bouguer anomalies on the Moon are impact basins. Impact basins are meteoritic craters larger than 300 km in diameter and are estimated to have been formed earlier than 3.7 Gyr ago [e.g., Wilhelms, 1987; Stöffler and Ryder, 2001]. Such large-scale topography deforms viscoelastically over geologically long timescales. The degree of this long-term deformation is determined by the effective viscosity of the upper crust and the upper mantle [e.g., Solomon et al., 1982], and viscosities of silicates depend chiefly on temperature [e.g., Karato, 2007]. Thus, the deformation state of impact basins is an important key for understanding the early thermal state of the upper part of the Moon.
Previous studies have suggested that viscous relaxation had a major impact on the deformation of lunar impact basins and have estimated the lunar viscosity or the temperature at the base of the crust [e.g., Solomon et al., 1982; Arkani-Hamed, 1998; Mohit and Phillips, 2006]. Solomon et al. [1982] calculated the viscous deformation of impact basins assuming a steady state, uniform viscosity (i.e., 10²⁴ Pa s) crust overlying an inviscid (i.e., 0 Pa s) mantle and showed that the viscously relaxed topography of the Orientale basin on the western limb of the Moon is similar to the observed degraded topography of the mare Tranquillitatis on the nearside. Arkani-Hamed [1998] analyzed the crustal structures for “mascon” basins on the nearside, such as Imbrium and Serenitatis, using a steady state, uniform Maxwell viscoelastic model, and estimated that the lunar mantle viscosity had been higher than 6 × 10²⁴ Pa s between 3.6 and 3.0 Gyr ago. Mohit and Phillips [2006] calculated the viscoelastic deformation of impact basins using a steady state, six-layered Maxwell viscoelastic model and showed that the viscously relaxed topography of the Mendel-Rydberg basin on the western limb of the Moon is similar to the observed degraded topography of the Lomonosov-Fleming basin on the eastern limb of the Moon when a Moho (i.e., the boundary between the crust and mantle) temperature of ∼1350 K is assumed. Since the viscous and viscoelastic deformations depend strongly on crustal thickness [e.g., Solomon et al., 1982; Mohit and Phillips, 2006], high-resolution crustal thickness modeling is required for obtaining quantitative thermal constraints based on analyses of the basin deformation state. Estimates of lunar farside crustal structures, however, remained largely uncertain due to the lack of direct observation of the farside gravity field until just a few years ago. Consequently, previous studies mainly analyzed the crustal structures for impact basins on the nearside and on the limb of the Moon; farside basin structures have been largely unexplored.
 The accuracy of lunar farside gravity field data has been improved greatly by the Kaguya gravity field measurements using relay sub-satellites [e.g., Namiki et al., 2009]. Many farside basins have been found to have a narrow, positive free-air gravity anomaly in their centers surrounded by a broad negative anomaly. The surrounding negative free-air anomaly suggests a flat Moho inside the basin or an annulus of thickened crust. Either model indicates that crustal structures around these basins are not in isostasy and that the lunar farside had been much colder than the nearside. Using a steady state viscous fluid model by Solomon et al. [ 1982], Namiki et al. [ 2009] estimated that the farside Moho temperature was very low (e.g., 700–800 K) at basin formation ages.
 Although lunar interior models used in previous studies provide a good first-order approximation, many details of the deformation processes of impact basins are not fully incorporated in their calculations. More specifically, the elastic properties of the lithosphere, the vertical variation in the viscosity, and temporal change in the viscoelastic properties of the Moon due to the thermal evolution during basin deformation are not considered directly in the calculations. These factors may have played an important role in basin deformation [e.g., Turcotte et al., 1981; Zhong and Zuber, 2000; Kamata et al., 2012]. When these effects are considered, previously obtained constraints may change substantially, but such changes have not been quantitatively investigated.
 The purpose of this study is to obtain quantitatively accurate constraints for the early thermal state and subsurface radioactive element concentrations based on detailed viscoelastic deformation calculations and recent Kaguya geodetic data. The next section describes the procedures to achieve this aim. In section 3, we show the calculation results for viscoelastic deformation on a thermally evolving Moon model. Then, in section 4, we constrain the thermal state around the basin formation age. Finally, in section 5, we derive upper limits for column-averaged radioactive element concentrations in the lunar crust.
2 The Viscoelastic Deformation of Impact Basins and the Early Thermal State of the Moon
In this study, we investigate the thermal state around the time of basin formation that can reproduce present-day crustal structures around impact basins in calculations of viscoelastic deformation. Large mantle uplifts underneath impact basins are estimated from positive free-air and Bouguer anomalies [e.g., Bratt et al., 1985; Neumann et al., 1996; Namiki et al., 2009; Ishihara et al., 2009]. These mantle uplifts relax viscously, and their heights decrease with time. The degree of this viscous relaxation is mainly controlled by temperature, as discussed above. Thus, if we can accurately estimate the crustal structure immediately after the impact, we can extract important information on the interior thermal state of the early Moon from the difference between the initial (i.e., immediately after the impact) and terminal (i.e., present-day) crustal structure.
 The “true” initial crustal structure immediately after a basin-forming impact is, however, very difficult to estimate accurately. The ratio of excavation depth to diameter for large impact basins may depend on many factors, such as the velocity and angle of impact and projectile density [e.g., Schultz and Anderson, 1996]. Consequently, we have to consider a variety of cases between the two end-members of deformation states for an impact basin with a small present-day mantle uplift. One end-member is that the basin had a large initial mantle uplift and had experienced substantial deformation. The other is that the basin had a small initial mantle uplift and had not experienced substantial deformation. The former and the latter correspond to hot and cold thermal states, respectively. Thus, quantitative estimation of the thermal structure at the time of basin formation is difficult.
Even if we do not know the exact initial crustal structure, we can still derive a significant constraint on the thermal state for an impact basin with a large mantle uplift. This is because an impact basin with a large present-day mantle uplift requires a small degree of deformation. If we assume a very large degree of deformation (i.e., a very hot interior), a very large initial mantle uplift would be required to reproduce the present-day crustal structure. This may lead to an unrealistic initial crustal structure in which the Moho would lie above the surface. Figure 1 is a schematic diagram of the initial minimum crustal thickness as a function of the initial thermal state. Initial thermal structures that would require initially “negative” crustal thicknesses need to be ruled out. Thermal conditions that can reproduce the present-day crustal structure with non-negative initial crustal thickness are deemed acceptable. We take the hottest of the accepted conditions as the upper limit for the lunar interior around the impact basin at about the time of its formation. For each basin, we estimate upper limits for both the initial surface temperature gradient and the initial Moho temperature. These two parameters are related to each other and are fundamental for describing the thermal structure of the upper part of planetary bodies. Based on the obtained thermal constraints, we further estimate the upper limit for column-averaged radioactive element concentration for each geological province (i.e., the FHT, SPAT, and PKT).
Note that a lower bound on the thermal gradient around the basin formation age, and on column-averaged crustal radioactive element concentrations, cannot be obtained with our analysis. Since a colder interior model gives a smaller degree of long-term viscoelastic deformation, an extremely cold interior condition requires an initial crustal structure that is almost the same as the present-day crustal structure. In such a case, the initial minimum crustal thickness is always positive, and we cannot rule out such a condition. If we assumed an initial basin structure before viscoelastic deformation, we could estimate a “most likely” temperature structure and/or history. In this study, however, we attempt to obtain conservative observational constraints derived directly from the presence of the positive free-air anomalies without using major model assumptions on basin-forming events. In order to obtain conservative constraints for the early thermal state and for crustal radioactive element concentrations, we use a relatively “stiff” Moon model. See section A for the effects of different model assumptions and parameter values on the constraints for the early thermal state and for crustal radioactive element concentrations.
 In the following section, we describe the crustal thickness model and the impact basins analyzed in this study. We then describe the procedure for deriving the upper limit for column-averaged crustal radioactive element concentrations in detail.
2.1 Present-day Crustal Thickness Model
We use a crustal thickness model based on the STM-359_grid-03 topography model (an updated version of Araki et al. [2009]) and the SGM150j gravity field model (an updated version of Goossens et al. [2011]). We assume crustal and mantle densities of 2820 and 3320 kg m⁻³, respectively, which are consistent with the density models used in the viscoelastic deformation calculations. We apply a minimum-amplitude-type downward continuation filter, which is constrained to be 0.5 at degree 46. Other parameters are the same as those used by Ishihara et al. [2009].
2.2 Analyzed impact basins
Table 1 lists the lunar impact basins analyzed in this study. Namiki et al. [2009] classified major impact basins into three different types (i.e., Type I, Type II, and primary mascon basins) based on the characteristics of gravity anomalies using the SGM90d, which is expanded up to degree and order 90. Type I basins exhibit sharp central peaks of free-air anomaly with height approximately equivalent to that of the Bouguer anomaly. In contrast, the height of the central peak in free-air anomaly for Type II and primary mascon basins is significantly smaller than that in Bouguer anomaly. The difference between Type II and primary mascon basins is the sharpness of the central gravity high; the former has a sharp (i.e., narrow) central peak. Several basins, such as Coulomb-Sarton, were not classified by Namiki et al. [2009] because higher spatial resolution is necessary. Using the SGM100h, which is expanded up to degree and order 100, Matsumoto et al. [2010] reexamined this classification for impact basins on the farside and on the limb, and further proposed an additional type, a “nonmascon basin”, which has a small central Bouguer high. As discussed above, a large present-day mantle uplift underneath a basin is necessary in our analysis. Therefore, nonmascon basins are excluded from our analysis.
1 Superscripts indicate references for the center location and the main rim radius. (a) Wilhelms (1987), (b) Pike and Spudis (1987), and (c) Neumann et al. [ 1996], respectively.
2 The PKT, the FHT, and the SPAT indicate the Procellarum KREEP Terrane, the Feldspathic Highlands Terrane (An, the central anorthositic region; O, the outer region), and the South Pole-Aitken Terrane, respectively. The “N”s and “F”s in parentheses indicate the nearside and farside, respectively.
3 I, II, and PM indicate Type I, Type II, and primary mascon basins, respectively.
4 Azimuthally averaged crustal thickness at 1.5–2.5 times the basin main rim radius.
† Thermal conditions, which always give initial negative crustal thicknesses, are not obtained in our analysis. See text for details.
 In addition to impact basins previously classified as Type I, Type II, or primary mascon basins, we analyze two nearside basins, Schiller-Zucchius (reported as “unclassified” by Namiki et al. [ 2009]) and Grimaldi (not classified). We classified these two basins as Type II basins (i.e., the central peak of the Bouguer anomaly is significantly higher than that of the free-air anomaly, suggesting a large mantle uplift) based on the SGM150j gravity field model.
Although the Moscoviense basin has a large central Bouguer high (∼800 mGal) and is classified as Type II [Matsumoto et al., 2010], we did not analyze this basin because crustal thickness at its center is assumed to be zero in our crustal thickness estimation as the reference [cf. Ishihara et al., 2009]. Also, the South Pole-Aitken (SPA) basin is not analyzed in this study. Since a typical timescale for thermal diffusion over 1000 km of distance is ∼10¹⁰ yr, lateral variation of temperature inside the basin, whose diameter is ∼2500 km [e.g., Wilhelms, 1987], may be large and have large effects on long-term basin deformation. Our thermal evolution calculations, however, assume a 1-D thermal structure, and the horizontal variation in the thermal structure is not considered (section 3). Thus, further investigations using 3-D thermal models are required for detailed understanding of the long-term deformation of SPA.
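The diffusion timescale quoted for SPA can be checked with the usual scaling (length squared over thermal diffusivity). The following is a minimal sketch; the diffusivity of 10⁻⁶ m² s⁻¹ is an assumed, typical value for silicate rock and is not taken from this paper.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def diffusion_time_yr(length_m, diffusivity_m2_s=1.0e-6):
    """Conductive diffusion timescale L**2 / kappa, converted to years.

    diffusivity_m2_s is an assumed typical value for silicates.
    """
    return length_m ** 2 / diffusivity_m2_s / SECONDS_PER_YEAR


tau_1000km = diffusion_time_yr(1.0e6)  # 1000 km expressed in metres
print(f"diffusion time over 1000 km: {tau_1000km:.1e} yr")
```

With this assumed diffusivity the result is of order 10¹⁰ yr, longer than the age of the Moon, which is why lateral temperature contrasts inside a basin of this size would not be smoothed out by conduction.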
2.3 Approach for Deriving the Upper Limit for Column-Averaged Crustal Radioactive Element Concentration
To derive upper limits for column-averaged crustal radioactive element concentration, we compare thermal evolution calculation results (i.e., the time evolution of the surface temperature gradient) with thermal constraints (i.e., the upper limit for the surface temperature gradient) around the basin formation age. The absolute formation age has been estimated for only a few impact basins. In particular, the formation age of Orientale, the youngest lunar impact basin, is estimated to be older than 3.72 Gyr [e.g., Stöffler et al., 2006]. Thus, most impact basins would have been formed between this time and the time of the formation of the crust. The age of the anorthositic crustal formation (i.e., the solidification of the lunar magma ocean (LMO)) is estimated to be 4.54–4.42 Gyr [Elkins-Tanton et al., 2011]. These ages lead to an upper estimate of 0.82 Gyr for the time between anorthositic crust formation and basin formation. Figure 2 is a schematic diagram showing the time evolution of the interior thermal state. Since the lunar interior cools with time, the surface temperature gradient decreases with time (section A). Consequently, the surface temperature gradient at 0.82 Gyr (after the solidification of the LMO) needs to be smaller than the initial surface temperature gradient for an impact basin. Since the surface temperature gradient increases with increasing radioactive element concentration in the crust, we can obtain the upper limit for column-averaged crustal radioactive element concentrations based on thermal constraints.
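The 0.82 Gyr figure follows from pairing the oldest LMO solidification age with the youngest basin age quoted above. A short check, using only those two numbers (the variable names are illustrative):

```python
# Ages in Gyr before present, as quoted in the text.
LMO_SOLIDIFICATION_OLDEST_GYR = 4.54  # oldest estimate of crust formation
ORIENTALE_MIN_AGE_GYR = 3.72          # youngest basin, lower bound on its age

# Longest possible interval between crust formation and basin formation.
max_elapsed_gyr = round(LMO_SOLIDIFICATION_OLDEST_GYR - ORIENTALE_MIN_AGE_GYR, 2)
print(max_elapsed_gyr)
```

Using the longest interval gives the coolest model state against which a basin's upper limit is compared, which keeps the resulting bound on radioactive element concentration conservative.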
3 Numerical Calculations
First, we calculate the thermal evolution of the upper part of the Moon under different crustal conditions by varying parameters such as the pre-impact crustal thickness, radioactive element concentrations, and thermal conductivity. Second, using the thermal evolution calculation results and experimentally determined flow laws for silicates, we obtain time-dependent viscosity profiles. Third, for each time-dependent viscosity model, we calculate viscoelastic deformation over billions of years induced by a wide variety of loading conditions.
3.1 Calculation Conditions
 We use the spherical polar coordinate system in order to take the curvature effect into account. We assume that the Moon consists of an anorthositic crust and a peridotite mantle underneath. We use continuously varying radial (i.e., vertical) profiles for density, temperature, and viscosity. For other parameters, such as elastic moduli, thermal conductivity, and heat capacity, we assume a uniform value for each layer (i.e., the crust and mantle), as listed in Table 2.
We assume a 1740 km lunar radius and a 500 kg m⁻³ density jump at the lunar Moho. Different crustal thickness Hcrust values between 30 and 90 km are used in the calculations. We use density profiles that satisfy the Adams-Williamson condition:
$$\frac{d\rho}{dr} = -\frac{\rho^{2} g}{\kappa}$$
Here ρ is density, r is radius, g is gravitational acceleration, and κ is bulk modulus. We assume g ∼ 1.62 m s⁻² on the lunar surface. Figure 3 shows the density profile for the model with Hcrust = 50 km. Crustal and upper mantle densities are ∼2820 and ∼3320 kg m⁻³, respectively. These values are similar to those used in recent lunar crustal thickness models [e.g., Zuber et al., 1994; Ishihara et al., 2009].
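As a rough illustration of how little self-compression changes the crustal density, the Adams-Williamson condition can be integrated downward through a 50 km crust. This is a sketch under simplifying assumptions that are not from the paper: g is held constant at its surface value, and the bulk modulus of 50 GPa is a hypothetical placeholder (the paper's Table 2 values are not reproduced here).

```python
def density_at_depth(rho_top, depth_m, g=1.62, kappa=5.0e10, n_steps=1000):
    """Integrate d(rho)/dz = rho**2 * g / kappa downward (z = depth).

    Uses the explicit midpoint method. g (m s^-2) is held constant and
    kappa (Pa) is a hypothetical placeholder value.
    """
    dz = depth_m / n_steps
    rho = rho_top
    for _ in range(n_steps):
        slope_start = rho ** 2 * g / kappa
        rho_mid = rho + 0.5 * dz * slope_start
        rho += dz * rho_mid ** 2 * g / kappa
    return rho


def density_at_depth_exact(rho_top, depth_m, g=1.62, kappa=5.0e10):
    """Closed-form solution for constant g and kappa."""
    return rho_top / (1.0 - rho_top * g * depth_m / kappa)


rho_moho_numeric = density_at_depth(2820.0, 50.0e3)
rho_moho_exact = density_at_depth_exact(2820.0, 50.0e3)
print(rho_moho_numeric, rho_moho_exact)
```

Under these assumptions the density increases by well under 1% across the crust, so the 500 kg m⁻³ jump at the Moho dominates the density structure.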
3.1.1 Thermal Evolution
Thermal conduction is the dominant heat transport mechanism in the uppermost layer of planetary bodies [e.g., Turcotte and Schubert, 2002]. In order to obtain a first-order estimate of the temperature structure of the upper part of the Moon, we solve the one-dimensional equation of heat conduction:
$$\rho c_p \frac{\partial T}{\partial t} = \frac{1}{r^{2}} \frac{\partial}{\partial r}\left( r^{2} k \frac{\partial T}{\partial r} \right) + \rho Q$$
where cp is specific heat, T is temperature, t is time, k is thermal conductivity, and Q is heat production rate per unit mass.
The boundary condition at the surface is T = 250 K, a typical temperature immediately below the surface regolith layer [Langseth et al., 1976]. The boundary condition at the lunar center is given by q = 0 W m⁻². Here q is heat flux and is given by
$$q = -k \frac{\partial T}{\partial r}$$
We take the time about 4.5 Gyr ago, when major crystallization of the lunar magma ocean occurred [e.g., Shearer et al., 2006], as the initial time (i.e., t = 0). The temperature profile at t = 0 for the model with Hcrust = 50 km is shown in Figure 3. Here, we use the pressure-dependent (i.e., depth-dependent) solidus of peridotite [Vlaar et al., 1994] for the mantle. For the crust, we assume an initially isothermal crust in order to reduce the number of calculation parameters. More specifically, the crustal temperature from immediately below the surface to the Moho is assumed to be the solidus temperature of peridotite at the Moho. Different thermal conditions at t = 0 do not change our conclusion significantly (section A1).
The melting curves in the crust and mantle are assumed to be given by the pressure-dependent solidi of anorthite [Goldsmith, 1980] and peridotite [Vlaar et al., 1994], respectively. When the calculated temperature exceeds the melting curve, the temperature is reset to the solidus, and the excess heat is added to the latent heat [Reynolds et al., 1966].
The effective thermal conductivity k in the lunar crust is one of the most important parameters for determining the temperature structure of the upper part of the Moon. The thermal conductivity of anorthosite is about 1.5–2.0 W m⁻¹ K⁻¹ [e.g., Turcotte and Schubert, 2002], and these values are consistent with lunar crustal thermal conductivities estimated from Apollo measurements [e.g., Keihm and Langseth, 1977]. In this study, we calculate thermal evolution models using k = 1.5 and 2.0 W m⁻¹ K⁻¹ for crustal thermal conductivities (Table 2). For the mantle, we use k = 3.0 W m⁻¹ K⁻¹. In our calculations, different k values for the crust are used while only a single k value for the mantle is used. Consequently, when we discuss calculation conditions in the following, k indicates thermal conductivity in the crust unless otherwise noted.
Table 2. Model Parameters
We consider radiogenic heating due to the decay of long-lived radioisotopes, such as ²³²Th, ²³⁸U, ²³⁵U, and ⁴⁰K, as the heat source. Thus, the heat production rate Q is given by
$$Q = \sum_{X} C_X Q_X \exp\left( \frac{\left(4.5 \times 10^{9} - t\right) \ln 2}{\tau_X^{\mathrm{half}}} \right)$$
where CX is the terminal (i.e., present-day) concentration of radioisotope X, QX is the heat production rate per unit mass, and τXhalf is the half-life. Here the unit of t and τXhalf is year. Note that t is not time before present but time after 4.5 Gyr ago. We assume present-day thorium concentrations CTh = 0.1–5.0 ppm for the crust and CTh = 25 ppb for the mantle [Warren, 2005]. For each value of CTh, we calculate the concentrations CU of uranium and CK of potassium using linear relations to CTh. In this study, we use linear relations between concentrations of Th, U, and K that are determined from Kaguya data [Kobayashi et al., 2010; Yamashita et al., 2010] for crustal values. For the mantle, we use those based on analyses of lunar samples (including mare basalts) [Korotev, 1998]. In our calculations, different CTh values for the crust are used while only a single CTh value for the mantle is used. Therefore, when we discuss calculation conditions in the following, CTh indicates that in the crust unless otherwise noted. The values of unit-mass heat production rates, half-lives, and isotopic ratios are taken from Turcotte and Schubert [2002].
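The time dependence of radiogenic heating described here can be sketched as follows. The isotope constants are values recalled from the Turcotte and Schubert [2002] table and should be checked against that source. The Th/U and K/Th ratios are hypothetical placeholders passed as parameters; they are not the Kaguya-based linear relations used in this study.

```python
import math

T_PRESENT_YR = 4.5e9  # model time corresponding to the present day

# isotope: (heat production in W per kg of isotope, half-life in yr,
#           mass fraction of the isotope in its element)
ISOTOPES = {
    "Th232": (2.64e-5, 1.40e10, 1.0),
    "U238": (9.46e-5, 4.47e9, 0.9928),
    "U235": (5.69e-4, 7.04e8, 0.0071),
    "K40": (2.92e-5, 1.25e9, 1.19e-4),
}


def heat_production(t_yr, c_th_ppm, th_over_u=3.7, k_over_th=400.0):
    """Heat production per unit mass of rock (W/kg) at model time t_yr.

    t_yr is time after 4.5 Gyr ago. th_over_u and k_over_th are
    placeholder element ratios, not the relations used in the paper.
    """
    c_th = c_th_ppm * 1.0e-6
    element_conc = {"Th": c_th, "U": c_th / th_over_u, "K": c_th * k_over_th}
    total = 0.0
    for name, (heat, half_life, fraction) in ISOTOPES.items():
        element = name.rstrip("0123456789")
        present = element_conc[element] * fraction * heat
        # Scale the present-day rate back to the earlier, higher abundance.
        total += present * math.exp((T_PRESENT_YR - t_yr) * math.log(2.0) / half_life)
    return total


q_initial = heat_production(0.0, 1.0)
q_present = heat_production(T_PRESENT_YR, 1.0)
print(q_initial, q_present, q_initial / q_present)
```

Because ²³⁵U and ⁴⁰K have short half-lives, heating at t = 0 is several times the present-day value for the same present-day Th concentration, which is why the early surface temperature gradient is sensitive to CTh.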
 The rheological parameters used in this study are listed in Table 3. In this study, we assume a “dry” Moon. A flow law of dry silicates is written as
where ε̇ is strain rate, A is the pre-exponential factor, σ is stress, d is grain size, E∗ is activation energy, Rg is the gas constant, η is effective viscosity, and m1 and m2 are constants [e.g., Karato, 2007]. Depending on temperature, grain size, and stress, silicates deform through different deformation mechanisms: dislocation creep becomes dominant under high-stress conditions, while diffusion creep becomes dominant under low-stress conditions [e.g., Karato, 2007]. Similar to the model used by Nimmo and Watters [2004], we use the rheology of dry anorthite in the dislocation creep regime for the crust and the rheology of dry olivine in the diffusion creep regime for the mantle. In order to calculate crustal viscosity, we use stress σ = 20 MPa, which is a typical stress in the crust [e.g., Wieczorek and Phillips, 1999; Mohit and Phillips, 2006]. To calculate mantle viscosity, we assume the grain size d = 1 mm. The use of different crustal stresses and mantle grain sizes does not change our conclusion significantly (section A2).
The viscosity in our numerical calculations is limited to between 10¹⁹ and 10²⁹ Pa s in order to keep time steps between 1 and 100 yr, which enables us to finish calculations within a practical computation time. This restriction of viscosity does not affect the terminal amplitude of topography in our calculations since deformation timescales considered in this study (10⁶–10⁹ yr) are much shorter than the Maxwell time for the maximum viscosity (10¹⁰ yr) and are much longer than that for the minimum viscosity (10⁰ yr) [e.g., Zhong and Zuber, 2000; Mohit and Phillips, 2006].
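The Maxwell time is the ratio of viscosity to shear modulus, so the two viscosity bounds map directly onto two limiting relaxation times. A minimal sketch follows; the shear modulus of 50 GPa is an assumed placeholder and not a value from the paper's parameter table, so only the orders of magnitude are meaningful.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def maxwell_time_yr(viscosity_pa_s, shear_modulus_pa=5.0e10):
    """Maxwell relaxation time eta / mu, converted to years.

    shear_modulus_pa is an assumed placeholder value.
    """
    return viscosity_pa_s / shear_modulus_pa / SECONDS_PER_YEAR


tau_min = maxwell_time_yr(1.0e19)  # lower viscosity bound
tau_max = maxwell_time_yr(1.0e29)  # upper viscosity bound
print(f"{tau_min:.1e} yr to {tau_max:.1e} yr")
```

Whatever shear modulus is chosen, the two Maxwell times differ by the same factor of 10¹⁰ as the viscosity bounds, so the deformation timescales of interest sit well inside the bracket.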
Note that our thermal evolution model is purely conductive. During the early lunar evolution, the thermal state may be controlled by solid-state mantle convection, and efficient heat transport due to convection may significantly reduce the temperature of the deep mantle [e.g., Cassen et al., 1979]. The thermal state of the upper part of the Moon, however, would be mainly controlled by heat conduction, and basin deformation is chiefly controlled by the viscosity structure of the upper part of the Moon. Consequently, our conductive model would give a good first-order approximation of the thermal state and would be sufficient for our analysis. In order to obtain a first-order assessment of the effect of thermal convection on the temperature structure, we calculated the thermal evolution assuming that the effective mantle thermal conductivity becomes 10 times the nominal conductivity when the temperature exceeds 1273 K [Toksöz and Johnston, 1974]. We found that although the deep mantle temperature is significantly lower than that obtained with our reference model, the temperature structure near the surface is almost the same. Consequently, the use of thermal convection models would not change our conclusions significantly.
3.1.2 Viscoelastic Deformation
When the viscosities of the crust and mantle do not depend on time and can be described by using a small number of layers, viscoelastic deformation can be described as a superposition of deformation modes, which decay exponentially with time [e.g., Peltier, 1974]. Hence, the normal-mode method is commonly used for analyses of “intermediate-timescale” (i.e., ∼10⁴ yr) deformation on the Earth, such as postglacial rebound [e.g., Peltier and Andrews, 1976]. However, since we consider geologically long timescales (i.e., ∼10⁹ yr), the effect of thermal evolution (i.e., planetary cooling) during deformation needs to be incorporated into the calculation. In addition, in order to calculate the surface temperature gradient, we use interior models with a large number of layers. In such cases, the normal-mode method cannot be used, and the initial-value method is required [Kamata et al., 2012]. Kamata et al. [2012] developed a computationally efficient spectral scheme with second-order precision in time for a spherically symmetric Maxwell body with a time-dependent viscosity. In this study, we use this scheme and calculate long-term deformation under a wide variety of parameter conditions. The governing equations are as follows [e.g., Takeuchi and Saito, 1972; Peltier, 1974]:
where ∇ i is spatial differentiation in direction of i, σ is stress tensor, e is strain tensor, ϕ is gravitational potential, P is hydrostatic pressure, ρ is density, κ is bulk modulus, μ is shear modulus, η is viscosity, δ is Kronecker delta, and G is the gravitational constant, respectively. Here we use the summation convention. The equation system is linearized and expanded into spherical harmonics. See Kamata et al. [ 2012] for details of the formulation.
 We consider a surface load and a Moho load of harmonic degree n = 2–70. The corresponding wavelengths are about 4500–150 km. We consider loading ages (i.e., basin formation ages) tform of 100–800 Myr after the solidification of the magma ocean (i.e., tform = 4.4–3.7 Gyr ago). Since lunar impact basins are estimated to be older than 3.7 Gyr (Stöffler and Ryder, 2001), our calculation range covers the formation ages of most major impact basins.
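The correspondence between harmonic degree and wavelength quoted above can be reproduced with the Jeans relation, λ = 2πR/√(n(n+1)). This is a sketch; the text does not state which conversion was used, so the choice of the Jeans relation is an assumption. The 1740 km radius is the value adopted earlier in the text.

```python
import math

LUNAR_RADIUS_KM = 1740.0  # radius adopted in the text


def wavelength_km(degree, radius_km=LUNAR_RADIUS_KM):
    """Equivalent wavelength of spherical harmonic degree n (Jeans relation)."""
    return 2.0 * math.pi * radius_km / math.sqrt(degree * (degree + 1))


lambda_n2 = wavelength_km(2)
lambda_n70 = wavelength_km(70)
print(f"n = 2: {lambda_n2:.0f} km, n = 70: {lambda_n70:.0f} km")
```

Both values round to the "about 4500–150 km" range given in the text, so degrees 2–70 span features from near-global scale down to the size of the smallest basins considered.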
3.2 The Time Evolution of Topographies
 Figure 4 shows typical examples of the time evolution of surface and Moho topographies. Here we show results for a surface loading case and for a Moho loading case with a fixed harmonic degree (n = 20), crustal thickness (Hcrust), and set of thermal properties (CTh, k, and tform). Note that the surface topography is not necessarily normalized by the initial surface topography, nor the Moho topography by the initial Moho topography. Rather, both topographies are normalized by the initial topographic amplitude of the boundary that carries the load. For example, for surface loading cases, both surface and Moho topographies are normalized by the initial surface topographic amplitude. For Moho loading cases, in contrast, both topographies are normalized by the initial Moho topographic amplitude. When the crust is in isostasy, (the surface topography)/(the Moho topography) is about −0.177. This figure illustrates that there are two major deformation stages, around 10³–10⁴ year and 10⁶–10⁷ year. Throughout the entire period after the loading, the undulation amplitude of the boundary with a load decreases monotonically with time. In contrast, the undulation amplitude of the boundary without a load (i.e., the surface for Moho loading cases and the Moho for surface loading cases), which is initially flat, increases during the first stage. Then the boundary without a load moves back toward its initial flat state during the latter stage. Consequently, these boundaries deform in the same direction in the former stage and in opposite directions in the latter stage. These results are quantitatively consistent with previous studies using models with two density boundaries [e.g., Zhong, 1997; Mohit and Phillips, 2006; Kamata et al., 2012].
 In order to quantify the effect of thermal evolution during deformation, we calculate the time evolution of surface and Moho topographies using a time-independent viscosity model. In the time-independent model, the viscosity profile is fixed at its initial state throughout the calculation. Results are shown in Figure 4, illustrating that the thermal evolution affects deformation only for ≥ 10⁸ year after the loading. The terminal topographic amplitudes normalized by initial loading amplitudes are the most important values in this study since these values are used for recovering the initial topographies (section 4). Differences in the normalized terminal amplitudes between the time-dependent and time-independent models are up to ∼10%, indicating that the thermal evolution during deformation has a non-negligible effect on the long-term deformation of lunar impact basins.
 This result, however, is significantly different from previous viscoelastic deformation calculation results assuming a time-dependent thermal structure under Q = 0 (i.e., no heat production) [Kamata et al., 2012]. Without any heat production, no deformation occurs after 10⁷ year, and the difference between the terminal topographic amplitude for a time-dependent model and that for a time-independent model can be up to ∼50%. The large differences between the calculation results obtained in this study and those by Kamata et al. [2012] come from radiogenic heating in the crust. As discussed in section A1, the upper thermal structure assumed in this study does not change rapidly and is sustained for a long time due to radiogenic heat production in the crust. Because of this, the terminal topographic amplitude for the time-independent (i.e., steady state) thermal model does not differ significantly (< 10%) from that for the time-dependent (i.e., thermally evolving) thermal model. Thus, the calculation results obtained in this study indicate that a steady state thermal model would give a good first-order estimate of the terminal topographic amplitudes when crustal radiogenic heating is significant. These contrasting results between models with and without radiogenic heating further demonstrate the importance of long-term crustal heating for the deformation of lunar impact basins.
 Figure 5 summarizes the normalized terminal Moho amplitude for Moho loading cases (i.e., the terminal/initial ratio for the height of mantle uplift). Note that the terminal Moho amplitude depends not only on the initial thermal state but also on the cooling rate. The effect of cooling rate, however, is only minor, as noted above. As a result, although the terminal Moho amplitude does not decrease monotonically with increasing initial temperature, the initial thermal state strongly controls the terminal Moho amplitude. As clearly illustrated in Figure 5, a very hot interior causes substantial deformation and cannot maintain mantle uplifts for billions of years. For example, a 70 km crust requires the initial surface temperature gradient to be less than ∼20 K km⁻¹ to maintain mantle uplifts. Similar results are found for different harmonic degrees (n). As shown in section 4.2, this value is the upper limit for the surface temperature gradient around impact basins with a large central mantle uplift on the thick farside crust, such as Freundlich-Sharonov and Hertzsprung. Furthermore, Figure 5 illustrates that initial Moho temperatures of less than ∼1200 K are required for maintaining mantle uplifts under all crustal thickness conditions. Consequently, the initial Moho temperature of ∼1200 K is the upper limit around impact basins with a large central mantle uplift both on the nearside and the farside.
4 Thermal Constraints Approximate to the Basin Formation Age
 Here, we estimate surface and Moho topographies approximate to the basin formation age using Kaguya data and viscoelastic deformation calculation results. In other words, we “recover” initial crustal structures for major lunar impact basins that can reproduce their present state under different thermal conditions. Then we measure the minimum crustal thickness for these recovered initial crustal structures. Note that the term “initial” indicates the timing of the basin formation (i.e., tform = 100–800 Myr) and not the starting time of the thermal evolution calculation for the Moon (i.e., t = 0; 4.5 Gyr ago). In the following, the term “initial” indicates the timing of the basin formation unless otherwise noted. Based on a non-negative crustal thickness condition, we constrain the surface temperature gradient and the Moho temperature approximate to the basin formation ages. Since we only consider “long-term” deformation, our “initial topographies” are those created after rapid motions (i.e., < 10⁴ year from the basin-forming impact), such as dynamic rebound and brittle deformation.
4.1 Estimating the Initial Crustal Structure
 The procedure for estimating the initial surface and Moho topographies of a basin consists of the following six steps. (1) First, we create the azimuthally averaged cross section of the present-day crustal structure within 2.5 times the basin main rim radius for each basin. The basin main rim radii are listed in Table 1. Figure 6 shows the cross section of Hertzsprung. (2) We define “a reference horizontal distance,” ℓ = ℓref, which is 1.5–2.5 times the basin main rim radius. Surface and Moho positions at this ℓ = ℓref (denoted as r = rS and rM, respectively) are assumed to be their “unperturbed” positions. Here r is the radial distance from the lunar center. (3) We calculate the reference crustal thickness from these reference values: Hcrust = rS − rM. Table 1 lists the mean value and the standard deviation of Hcrust determined from different values of ℓref. (4) We expand surface topography and Moho topography within ℓ ≤ ℓref into spherical harmonics of degrees 2–70. Since we consider azimuthally averaged topographies, only zonal components of spherical harmonics (i.e., harmonic order of zero) are used. The obtained coefficients give the terminal amplitudes of surface and Moho topographies for each harmonic degree. (5) Using the viscoelastic calculation results (i.e., ratios of initial to terminal amplitudes), we calculate the initial surface and Moho coefficients for each harmonic degree. (6) The initial surface and Moho topographies are obtained from the superposition of spherical harmonics with initial amplitudes. The above procedure is repeated for values of ℓref spanning the range of 1.5–2.5 times the basin main rim radius, and the mean and standard deviation of the recovered surface and Moho topographies are calculated.
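Steps (4)–(6) can be sketched as follows. This is a minimal illustration, not the implementation used in this study. It assumes that the zonal coefficients are obtained by a least-squares fit with Legendre polynomials, and that the viscoelastic results for each degree can be arranged as a 2 × 2 matrix mapping initial (surface, Moho) amplitudes to terminal ones. The names `zonal_expand`, `zonal_sum`, `recover_initial`, and `response` are hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre


def zonal_expand(theta, profile, degrees):
    """Step (4): least-squares zonal (order-zero) coefficients of a profile."""
    degrees = np.asarray(degrees)
    basis = legendre.legvander(np.cos(theta), int(degrees.max()))[:, degrees]
    coeffs, *_ = np.linalg.lstsq(basis, profile, rcond=None)
    return coeffs


def zonal_sum(theta, coeffs, degrees):
    """Step (6): superpose zonal harmonics back into a radial profile."""
    degrees = np.asarray(degrees)
    basis = legendre.legvander(np.cos(theta), int(degrees.max()))[:, degrees]
    return basis @ coeffs


def recover_initial(theta, surf_now, moho_now, degrees, response):
    """Steps (4)-(6) for one basin and one thermal model.

    response[i] is a hypothetical 2x2 matrix for degree degrees[i] such that
    [surf_now, moho_now] = response[i] @ [surf_init, moho_init].
    """
    s_now = zonal_expand(theta, surf_now, degrees)
    m_now = zonal_expand(theta, moho_now, degrees)
    terminal = np.stack([s_now, m_now], axis=1)[..., None]  # (ndeg, 2, 1)
    # Step (5): invert the terminal/initial relation degree by degree.
    initial = np.linalg.solve(response, terminal)[..., 0]   # (ndeg, 2)
    surf_init = zonal_sum(theta, initial[:, 0], degrees)
    moho_init = zonal_sum(theta, initial[:, 1], degrees)
    return surf_init, moho_init
```

Here `theta` is the angular distance from the basin center in radians. The initial minimum crustal thickness would then follow from the reference thickness plus the minimum of the recovered surface minus Moho topography, repeated for each value of ℓref.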
4.2 Analysis Results
 Figure 7 shows two examples of the recovered initial crustal structures for Hertzsprung, calculated under two different values for CTh. Other calculation conditions are the same. The errors in recovered topographies result from the error in the reference (i.e., surrounding) crustal thickness Hcrust. For both of the calculation conditions, the mantle uplift at the initial state is higher than that at the terminal (i.e., present-day) state (compare with Figure 6). While the initial Moho for Figure 7a is below the initial surface, that for Figure 7b near the basin center goes above the initial surface. In other words, the initial minimum crustal thickness for Figure 7b is negative. Such a high mantle uplift is unrealistic, as discussed in section 2; an overly soft (i.e., overly hot) interior is assumed in Figure 7b. Consequently, the initial thermal structure of Hertzsprung needs to be colder than that assumed for Figure 7b. This provides an upper limit for the thermal gradient or CTh around Hertzsprung.
 Figure 8 shows initial minimum crustal thickness for Hertzsprung as a function of (a) initial surface temperature gradient (dT ∕ dr)S and (b) initial Moho temperature TM, respectively. The errors derive from those in the surrounding crustal thickness, Hcrust (Table 1). A hotter interior leads to an initially larger mantle uplift and a smaller initial minimum crustal thickness. When (dT ∕ dr)S > 24 K km⁻¹, initial minimum crustal thicknesses are always negative. Thus, (dT ∕ dr)S = 24 K km⁻¹ is the upper limit for the initial surface temperature gradient for Hertzsprung. However, (dT ∕ dr)S < 24 K km⁻¹ does not always give a positive crustal thickness within the error bounds. A positive initial minimum crustal thickness is guaranteed only when (dT ∕ dr)S < 20 K km⁻¹. Thus, we take 20–24 K km⁻¹ as the upper limit for (dT ∕ dr)S around Hertzsprung. Similarly, TM < 1250–1350 K is necessary for Hertzsprung.
 Obtained thermal constraints are summarized in Table 1 and Figure 9. Here thermal constraints for surface temperature gradient and those for Moho temperature are determined with a resolution of 1 K km⁻¹ and 50 K, respectively. The effect of mare basalt loading is considered for Imbrium, Serenitatis, Crisium, Smythii, Humorum, Nectaris, Grimaldi, and Orientale. As discussed in section A3, mare basalt does not change thermal constraints significantly. Figure 9 shows a clear regional dependence for the upper limit of the initial surface temperature gradient, and the dominant cause for this dependence is the regional variation in crustal thickness. As discussed in section 3.2, no clear regional dependence for the upper limit of the initial Moho temperature is found. Also, we found no clear correlation between the thermal constraint and the (relative) basin formation age estimated based on crater chronology [e.g., Stöffler et al., 2006].
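The way a pair of limits such as 20–24 K km⁻¹ follows from the error bars can be sketched as below. The function name and the numbers in the example are made up for illustration and are not values from this study. The treatment of the inequalities at the sampled resolution is also an assumption.

```python
import numpy as np


def gradient_upper_limit(gradients, h_min, h_err):
    """Bracket the upper limit of the initial surface temperature gradient.

    g_safe: largest sampled gradient whose whole error bar is positive.
    g_max:  largest sampled gradient whose error bar still reaches zero.
    Either value is None when no sampled gradient qualifies.
    """
    gradients = np.asarray(gradients, dtype=float)
    h_min = np.asarray(h_min, dtype=float)
    h_err = np.asarray(h_err, dtype=float)
    safe = gradients[h_min - h_err > 0.0]
    possible = gradients[h_min + h_err >= 0.0]
    g_safe = float(safe.max()) if safe.size else None
    g_max = float(possible.max()) if possible.size else None
    return g_safe, g_max


# Made-up example: thickness falls linearly with gradient, fixed 4 km error.
g = np.arange(15, 31)            # K/km, sampled at 1 K/km resolution
h = 44.0 - 2.0 * g               # km, hypothetical minimum crustal thickness
e = np.full(g.shape, 4.0)        # km, hypothetical error
print(gradient_upper_limit(g, h, e))
```

If `g_max` equals the hottest sampled gradient, no sampled model is excluded, which corresponds to the case in which no significant thermal constraint is obtained. The sketch also assumes that thickness decreases with gradient over the sampled range.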
 We obtained large values of (dT ∕ dr)S and TM for Type I basins. This is because their present-day mantle uplifts are smaller than those of other types of basins, and larger deformation is necessary for achieving negative crustal thickness. Moreover, even the hottest condition in our calculation does not result in negative crustal thickness for three Type I basins; thus, significant thermal constraints cannot be obtained for these basins. As discussed in section 2, this result does not necessarily indicate that crustal temperatures around these basins were extremely high at their formation ages, but it does indicate that Type I basins are not appropriate for constraining thermal structure in our analysis. Consequently, in the next section, we use thermal constraints only for primary mascon basins and Type II basins. Note that a typical diameter for Type I basins is smaller than that for primary mascon basins and Type II basins (Table 1). When we estimate crustal thickness, we reduce amplitudes for high-degree coefficients of gravity field data, using a downward continuation filter. This could lead to underestimation of the current height of mantle uplift for small impact basins. If we use more accurate, high-resolution gravity field data, we can use a much weaker downward continuation filter than that used in this study [e.g., Wieczorek et al., 2006]. Accordingly, more accurate and detailed gravity field data obtained by the GRAIL mission [e.g., Zuber et al., 2012] will enable us to conduct similar analyses for Type I basins.
5 Radioactive Element Concentrations in the Crust
 In this section, we calculate upper limits for column-averaged crustal radioactive element concentrations based on thermal constraints obtained in the previous section. The concentrations of radioactive elements in the crust are important for investigating not only the thermal state of the solidified crust but also the nature of the lunar magma ocean (LMO) solidification; they can serve as a tracer for the residual liquid of the LMO because these elements are incompatible elements and concentrate in melt [e.g., Warren, 1985].
5.1 The Feldspathic Highlands Terrane
 Up to now, the lunar lower crust underneath the FHT has been considered to be rich in radioactive elements compared to the surface [e.g., Jolliff et al., 2000; Wieczorek et al., 2006]. This model is based on the observed high surface Th concentrations (2–3 ppm) on the SPAT, which may be an exposure of the lower crust [e.g., Prettyman et al., 2006]. Our result, however, does not support a model with the Th-rich lower crust underneath the FHT-An.
 Obtained upper limits for the initial surface temperature gradient for impact basins on the FHT-An, such as Hertzsprung and Freundlich-Sharonov, are 20–24 K km⁻¹. Figure 10 shows the time evolution of surface temperature gradient for Hcrust = 80 km (i.e., a typical crustal thickness for the FHT-An) and k = 1.5 W m⁻¹ K⁻¹, illustrating that a surface temperature gradient lower than 24 K km⁻¹ within the first 820 Myr requires column-averaged crustal Th concentrations lower than 0.5 ppm. Since a larger k results in a lower surface temperature gradient for a given Q, a larger k permits a higher Th concentration; column-averaged crustal Th concentrations need to be lower than 0.9 ppm for k = 2.0 W m⁻¹ K⁻¹. These upper limits are similar to the mean surface Th concentration observed by gamma-ray spectrometers on the FHT-An (i.e., < 1 ppm) [Gillis et al., 2004]. Consequently, in contrast to previous estimates, our result indicates that such a low Th concentration observed on the surface continues to the deep portion of the crust for the FHT-An.
 Since our calculation assumes a one-layer plagioclase crust, the rheological effects of stiffer mafic minerals, such as pyroxene [e.g., Mackwell et al., 1998], are not considered. The abundances of mafic minerals in the lower crust may be much higher than those in the upper crust [e.g., Wieczorek et al., 2006]. As discussed in section A4, however, the mafic-rich crust would probably not be stiff enough to maintain large mantle uplifts underneath an 80 km thick crust with Th ∼ 2 ppm. Consequently, the deep FHT crust must be depleted in radioactive elements at least near impact basins, and the Th-rich lower crust cannot extend throughout the farside crust.
 Results for the FHT-O are summarized in Table 4. Upper limits for Th concentrations both on the nearside and the farside are similar to or even higher than surface Th concentrations [e.g., Gillis et al., 2004; Prettyman et al., 2006]. Thus, in contrast to the FHT-An, the lower crust underneath the FHT-O may be richer in radioactive elements than the surface. Note that the Serenitatis and Humorum basins are located near the boundary between the FHT-O and the PKT. The upper limits for the initial surface temperature gradient for these basins are very high (i.e., ∼40 K km⁻¹) compared to those for basins on the farside FHT-O (i.e., ∼25 K km⁻¹), suggesting that the column-averaged crustal radioactive element concentration for the central nearside FHT-O and that for the farside FHT-O may be significantly different.
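The dependence on k can be illustrated with a steady state, planar estimate in which the surface gradient equals the surface heat flux divided by k. This is only a sketch: the volumetric heat production and Moho heat flux below are hypothetical inputs, and the estimate is not a substitute for the time-dependent calculation behind Figure 10.

```python
def surface_gradient_K_per_km(heat_production, k, h_crust_km=80.0, q_moho=0.0):
    """Steady state surface temperature gradient (magnitude) in K/km.

    heat_production: volumetric crustal heat production in W m^-3 (hypothetical)
    k:               thermal conductivity in W m^-1 K^-1
    h_crust_km:      crustal thickness in km
    q_moho:          heat flux entering the crust from below in W m^-2 (hypothetical)
    """
    q_surface = q_moho + heat_production * h_crust_km * 1.0e3  # W m^-2
    return q_surface / k * 1.0e3  # convert K/m to K/km


if __name__ == "__main__":
    a = 2.8e-7  # W m^-3, made-up value for illustration only
    for k in (1.5, 2.0):
        print(k, round(surface_gradient_K_per_km(a, k), 1))
```

For a fixed heat production, raising k from 1.5 to 2.0 W m⁻¹ K⁻¹ lowers the estimated gradient by a factor of 0.75, which is why a larger k permits a higher Th concentration for the same gradient limit.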
5.2 The South Pole-Aitken Terrane
 We obtained thermal constraints for the Apollo basin, located in a peripheral region within the SPAT. Assuming a crustal thickness of 50 km, we found that the column-averaged Th concentrations > 2 ppm are allowed (Table 4). These values are higher than surface Th concentrations in the peripheral region of the SPAT. Therefore, the lower crust underneath the SPAT may be richer in radioactive elements than the surface.
 Unfortunately, significant thermal constraints are not obtained for impact basins near the center of the SPAT. Consequently, the column-averaged Th concentration for the central part of the SPAT crust cannot be obtained. Because of this, we cannot exclude either of the following two contrasting models for vertical Th distributions in the SPAT crust: (1) The enrichment of Th on the SPAT is highly restricted to near the surface and the Th concentrations deep in the SPAT crust are as low as those in the FHT-An crust, and (2) the enrichment of Th on the SPAT continues down to the deep portion of the crust. To judge which model is more probable, further detailed analyses to determine Th concentrations on the floor of impact craters, which would expose an area below the layer of ejecta from nearside basins, on the central part of the SPAT using a high-spatial-resolution Th concentration map are necessary. For example, if a Th-depleted floor is observed for many impact craters, the former Th distribution model is more likely.
5.3 The Procellarum KREEP Terrane
 Imbrium is the only analyzed impact basin whose entire basin floor is inside the PKT (Figure 9). For Imbrium, the thermal constraint that guarantees initial positive crustal thickness (i.e., dT ∕ dr < 39 K km⁻¹) was obtained. However, we did not obtain thermal states that always lead to initial negative crustal thicknesses (Table 1). In other words, even the hottest interior model in our calculations does not necessarily lead to initial negative crustal thickness. Thus, we cannot obtain a very conservative upper limit for Th concentration based on the latter thermal condition for the PKT.
 It is worthwhile noting that lunar crustal thickness models derived from gravity and topography data are ambiguous in their reference crustal thickness, which needs to be determined by other measurements, such as seismic data or non-negative minimum crustal thickness conditions. Because our crustal thickness model assumes zero crustal thickness at its thinnest area, this model gives the estimate for the thinnest condition. As shown in Figure 5, a thicker crust leads to a larger degree of deformation. Consequently, a thinner crust model allows a warmer interior, making our estimate on the Th concentration very conservative. In order to assess the effect of crustal thickness offset, we used the crustal thickness model by Neumann et al. [ 1996], which is ∼ 10 km thicker than our model, for an analysis of Imbrium. As discussed in section A5, we could obtain the upper limit for the column-averaged Th concentration. This upper limit for a column-averaged Th concentration obtained under a thick crustal condition is shown in Table 4. This result indicates that further detailed analyses of crustal structure on the lunar nearside should be very important for investigating radioactive element concentrations deep in the PKT crust.
 The fact that the constraints obtained for the PKT are much “looser” than those for the other provinces suggests that the column-averaged Th concentration inside the PKT may be much higher than those for the other provinces. Such a strong regional dependence of column-averaged crustal Th concentration could be accounted for by (1) a heterogeneous distribution of KREEPy material underneath the anorthositic crust and/or (2) a large regional variation in the Th concentration within the anorthositic crust. The former type of vertical compositional structure may result from a global, degree-one-mode mantle overturn immediately after the solidification of the LMO [e.g., Parmentier et al., 2002]. The latter type of vertical compositional structure, in contrast, would be formed by horizontally heterogeneous crustal growth during solidification of the LMO [e.g., Longhi, 1978; Loper and Werner, 2002; Arai et al., 2008; Ohtake et al., 2012]. Since either model would predict a highly heterogeneous Th concentration as inferred from our analysis, we cannot tell which model is more consistent with the observed geodetic state of major lunar impact basins. However, our result that the deep portion of the FHT-An crust is highly depleted in radioactive elements may pose a rather strong constraint on these lunar thermal evolution models; a successful model would have to squeeze out radioactive elements from the FHT-An region very efficiently.
 Based on viscoelastic calculations under a wide variety of parameter conditions, we investigated the long-term deformation of impact basins. Using Kaguya geodetic data, we recovered initial surface and Moho topographies and obtained upper limits for surface temperature gradients and Moho temperatures approximate to the formation age of impact basins. Here we used non-negative post-impact crustal thickness to constrain the initial central mantle uplift. Based on thermal constraints, we further constrained the upper limit for the column-averaged radioactive element concentration in the crust, which cannot be observed directly with spectroscopic observations (i.e., visible, infrared, X-ray, and gamma-ray spectroscopies). We found that thermal constraints and upper limits for radioactive element concentrations varied greatly among major geological units, i.e., the FHT, the SPAT, and the PKT. The tightest thermal constraint (i.e., ≤ 20 K km⁻¹) is found for impact basins on the FHT-An, suggesting that the deep portion of the crust of the FHT-An is highly depleted in radioactive elements (i.e., Th ≤ 0.5 ppm), similar to its upper portion. This result indicates that an exposure of the lower crust cannot account for Th elevation on the SPA basin floor. We also found that a loose thermal constraint for the Apollo basin allows a much higher Th concentration (i.e., ∼2 ppm) in the peripheral region of the SPAT. Furthermore, the column-averaged Th concentration in the PKT is not constrained based on our analysis and may be very high. These results strongly suggest that early thermal evolution and column-averaged crustal radioactive element concentrations vary greatly in different provinces on the Moon, supporting an early mantle overturn and/or asymmetric crustal growth on the Moon.
Appendix A: The Effects of Different Model Assumptions and Parameter Conditions
 Both the thermal evolution calculation and viscoelastic deformation calculation require many assumptions and parameters. Since the goal of this study is to obtain conservative estimates on the upper limit for the surface temperature gradient and for column-averaged crustal radioactive element concentrations, we assume a simple and relatively stiff Moon model. If a lunar interior model more realistic and much weaker than our model is assumed, constraints on the thermal state and crustal radioactive concentrations more severe (i.e., lower surface temperature gradient, Moho temperature, and Th concentration) than those obtained in this study would be necessary. In this study, we use a Th concentration of 25 ppb for the mantle. This value is a low value among previous estimates. For example, Jolliff et al. [ 2000] and Hagerty et al. [ 2006] estimate average mantle Th concentrations of 40 ppb and > 50 ppb, respectively. If we use a mantle Th concentration higher than that used in our calculations (i.e., 25 ppb), crustal Th concentrations must be lower than our upper estimates in order to satisfy the thermal constraints obtained in this study.
 In this study, we assume density profiles, which satisfy the Adams-Williams condition. Consequently, the density depends on radius within each layer (i.e., crust and mantle) slightly. A model with a uniform crust and a uniform mantle is much simpler and may be sufficient for our calculations. Such a uniform density profile, however, would cause Rayleigh-Taylor instability in a compressible Moon because such a density profile is gravitationally unstable (Plag and Jüttner, 1995). This phenomenon is a kind of numerical instability due to the oversimplified assumption on initial density profile; an upper layer is heavier than a lower layer when adiabatic compression/decompression is taken into account. Then calculations would fail under all thermal conditions. In order to avoid such a situation, we assume depth-dependent density profiles. Nevertheless, the change in crustal density is extremely small and would not affect deformation significantly.
 One of the major simplifications in our thermal evolution calculations is that heat sources other than radiogenic heating, such as impact and tidal heating, are not considered. Such heat sources may increase temperature around impact basins significantly.
 In addition, the effect of nonlinear rheology is not taken into account; we assume the Maxwell model. Thomas and Schubert [ 1987] show that crater relaxation timescales for a nonlinear rheology are shorter than those for a linear rheology. Thus, the use of nonlinear rheology would probably result in a crust and/or mantle weaker than our model. Furthermore, we use dry rheologies for silicates in our viscoelastic deformation calculations. Recent re-analyses of Apollo samples, however, suggest a mantle with geophysically significant amounts of water [e.g., Saal et al., 2008; Hauri et al., 2011]. If this is the case, the lunar mantle may be much weaker than our model. Thus, we assume a relatively stiff lunar interior model.
 In this study, we use a spectral scheme for viscoelastic deformation calculations. Therefore, the amplitude of deformation is assumed to be small in our calculation. This condition is not necessarily satisfied by the problem addressed in this study because the height of a mantle uplift is on the order of the average pre-impact crustal thickness. However, the violation of this condition is not likely to affect our conclusion significantly. Nunes et al. [ 2004] compare calculation results using a spectral method and those using the finite element method, which can handle large-amplitude deformation. They investigate Venusian crustal deformation under a condition in which the amplitude of Moho undulation is about a half of crustal thickness. They found that the terminal amplitude is almost the same for both calculation schemes. Guest and Smrekar [ 2005] also compare calculation results using a spectral method and those using the finite element method by considering the relaxation of the Martian crustal dichotomy boundary. They found that results using different methods do not differ significantly. Thus, results obtained by a spectral scheme would give a good first-order estimate of the ratio of initial and terminal topographic amplitudes even under large-amplitude conditions [e.g., Mohit and Phillips, 2006].
 In the following, we examine the effect of different parameter conditions on our estimates on the upper limit for the surface temperature gradient and for column-averaged crustal radioactive element concentration.
A1 Initial Temperature Profiles in the Crust
 In this study, we assume an initially isothermal crust (the solidus temperature of peridotite on the Moho), which is the hottest possible crustal condition. To examine the effect of the initial condition on the thermal structure during basin formation ages (i.e., the first 1 Gyr), we calculate the thermal evolution with a much colder initial crust. More specifically, we assume that the initial temperature in the crust increases with depth linearly. The temperature at the base of the crust and the top of the mantle is the same. These profiles are shown in Figure A1(a). This figure also shows the temperature structure given by a steady state (i.e., the thermal state obtained by assuming the left-hand side of equation (2) to be zero) for k = 1.5 W m⁻¹ K⁻¹.
 Figure A1(b) compares the time evolution of the surface temperature gradient between initially hot and cold crust models, illustrating that both models give almost the same surface temperature gradient for 100 Myr after the major solidification of the lunar magma ocean. Figure A1(b) also illustrates that surface temperature gradients during these ages are almost the same as those for the steady state model. In other words, the temperature profile in the crust during the basin formation period is mainly determined by heat production in the crust. The same result is found for other calculation conditions. Such insensitivity to the initial temperature profile in the upper part of the Moon occurs because of the following reason. Since a typical thermal diffusion length for 100 Myr is about 100 km, the initial heat in the crust, which is several tens of km in thickness, is removed within this timescale. Consequently, the temperature profile in the crust for > 100 Myr is mainly determined by sustained heat production in the crust. Thus, although we assume an extremely hot crust in the main calculations, the use of different initial temperature profiles in the crust would not change our results significantly.
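The diffusion length quoted above can be checked to order of magnitude. The sketch below assumes the common estimate L = 2√(κt) and a thermal diffusivity of 10⁻⁶ m² s⁻¹, a typical order of magnitude for rock. Neither the prefactor nor the diffusivity is stated in the text, so both are assumptions.

```python
import math

SECONDS_PER_YEAR = 3.156e7


def diffusion_length_km(time_myr: float, kappa: float = 1.0e-6) -> float:
    """Thermal diffusion length 2*sqrt(kappa*t) in km.

    time_myr: elapsed time in Myr
    kappa:    thermal diffusivity in m^2 s^-1 (assumed typical value)
    """
    t_seconds = time_myr * 1.0e6 * SECONDS_PER_YEAR
    return 2.0 * math.sqrt(kappa * t_seconds) / 1.0e3


if __name__ == "__main__":
    print(f"{diffusion_length_km(100.0):.0f} km after 100 Myr")
```

With these assumptions the length after 100 Myr is on the order of 100 km, larger than a crust several tens of km thick, consistent with the initial crustal heat being lost within this timescale.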
 In our main calculations, we calculate viscosity using the stress σ = 20 MPa for the crust and the grain size d = 1 mm for the mantle. To examine the effect of these parameters, we calculated viscoelastic deformation using different values for stress and grain size.
 Figure A2 shows the time evolution of the Moho topography under harmonic degree 30 Moho loading conditions. The corresponding wavelength for harmonic degree 30 is about 360 km, which is a typical diameter of mantle uplifts. Thus, the effects of crustal stress and those of grain size on the deformation of a mantle uplift underneath impact basins can be seen in Figures A2(a) and (b), respectively. Differences in the terminal Moho topography are only minor (i.e., < 1%), strongly suggesting that our conclusions are not sensitive to stress in the crust and grain size in the mantle. These results further suggest that the use of nonlinear rheology will not change our conclusions significantly.
 Note that a very small effective activation energy (i.e., 120 kJ mol⁻¹) for the Earth's mantle, which is about half of that for the “wet” rheology of olivine obtained in laboratory experiments, is estimated from the mechanical behavior of the oceanic lithosphere [Watts and Zhong, 2000]. The use of such a small activation energy would lead to a mantle viscosity much lower than that in our rheological model. As discussed above, constraints on the thermal state and crustal radioactive concentrations more severe than those obtained in this study would be necessary when weak rheological models are used.
A3 Maria and Cryptomaria
 Since the density of mare basalt is higher than that of anorthositic crust, mare basalts cause gravity anomalies and behave as surface loads. Consequently, mare basalt fills, including cryptomaria, would influence crustal thickness estimates and the long-term viscoelastic deformation of impact basins. In this study, we assume a mare model used in previous crustal thickness modeling studies [e.g., Wieczorek and Phillips, 1998; Ishihara et al., 2009]. Thermal constraints for Imbrium, Serenitatis, Crisium, Smythii, Humorum, Nectaris, Grimaldi, and Orientale are obtained considering both an early loading case (i.e., all the mare basalts are emplaced immediately after basin formation) and a delayed loading case (i.e., all the mare basalts are emplaced after viscous deformation).
 In order to examine the effect of a surface mare fill on the thermal constraints, we assume a disk-shaped, 5 km thick mare basalt fill on Hertzsprung. The radius of the mare fill is assumed to be the same as that of the basin main rim. We recalculated the global crustal thickness, recovered the initial crustal structures, and obtained thermal constraints for Hertzsprung with mare basalt. Here we considered both an early loading case and a delayed loading case. Figure A3 shows the initial minimum crustal thickness as a function of the initial surface temperature gradient for the different loading cases. This calculation result indicates that this mare model causes no significant change in the upper limit for the initial surface temperature gradient. The upper limit for the Moho temperature is also unchanged. It is noted that we assume a thick mare basalt here; the thicknesses of maria are estimated to be smaller than 5 km [Wieczorek et al., 2006]. Consequently, while the presence of mare basalt would be important for estimating the present-day crustal structure accurately, its effect on the thermal constraints obtained in our analysis would be very small. Similarly, this result also indicates that the effect of cryptomaria, whose thicknesses are estimated to be ∼1 km for the major part [e.g., Antonenko and Head, 1994], would be extremely small.
A4 Mafic Lower Crust
 The abundances of mafic minerals, such as pyroxene, in the lunar crust may increase with depth [e.g., Wieczorek et al., 2006]. Wieczorek and Zuber [2001] suggest that the lower crust contains ∼20% mafic minerals by volume. As shown in Figure A4, the rheology is harder for higher abundances of mafic minerals. Consequently, the lower crust may be significantly harder than the highly anorthositic upper crust.
 Figure A5 shows the time evolution of Moho topography for harmonic degree 30, which corresponds to a typical size for the mantle uplift of Freundlich-Sharonov. As discussed in the main text, a column-averaged Th concentration lower than 0.5 ppm is required for this basin (under k = 1.5 W m⁻¹ K⁻¹). Figure A5 illustrates that the terminal Moho amplitude for Columbia diabase rheology and Th = 1 ppm is lower than that for plagioclase rheology and Th = 0.5 ppm. Because a lower terminal amplitude means a larger degree of relaxation, the initial mantle uplift recovered under the former condition is higher than that under the latter condition. Consequently, the use of Columbia diabase rheology would not allow a column-averaged Th concentration higher than 1 ppm for the FHT-An.
It is noted that Columbia diabase is more mafic than previous estimates of the lunar lower crust [e.g., Wieczorek and Zuber, 2001]. Also, the above calculation assumes the rheology of Columbia diabase for the whole crust. Consequently, the lunar crust would probably be weaker than assumed in the above calculation. Thus, even if we assume a mafic lower crust, a Th-rich (≥ 2 ppm) lower crust would probably not be hard enough to maintain a large mantle uplift underneath the FHT-An crust.
A5 Crustal Thickness
 Since the minimum crustal thickness is assumed to be zero, the crust in our model is relatively thin compared with that in several global lunar crustal thickness models [e.g., Zuber et al., 1994; Neumann et al., 1996; Wieczorek and Phillips, 1998, 1999; Ishihara et al., 2009]. As shown in Figure 5, a thicker crust leads to a higher degree of deformation. Consequently, a crust thicker than that in our model would lead to thermal constraints more severe (i.e., a colder interior) than those obtained in this study.
 In order to assess the effect of a crustal thickness offset, we used the crustal thickness model of Neumann et al. [1996], which is ∼10 km thicker than our model, for an analysis of Imbrium. Results for the Imbrium basin are shown in Figure A6, and they illustrate that the initial minimum crustal thickness is always negative when the initial surface temperature gradient is larger than 53 K km⁻¹. Assuming a crustal thickness of 50 km for the PKT, we found that this thermal constraint leads to an upper limit for the column-averaged Th concentration of 4.5–6.2 ppm. It is noted that, using our reference crustal thickness model, we cannot obtain (1) an initial thermal condition for Imbrium, because the initial crustal thickness is always negative, or (2) a conservative upper limit for the column-averaged Th concentration inside the PKT. Thus, a ∼10 km difference in crustal thickness is crucial when discussing crustal radioactive element concentrations.
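 As a rough illustration of how an upper limit on the surface temperature gradient bounds crustal heat production, the sketch below applies Fourier's law and a steady-state heat budget. It is not the procedure used in this study: it assumes steady-state conduction, zero heat flux from the mantle, and k = 1.5 W m⁻¹ K⁻¹ (the value quoted for the FHT-An case, applied here to the PKT as an assumption). The function names are hypothetical, and converting heat production into a Th concentration would require additional assumptions that are not attempted here.

```python
def surface_heat_flux_mw_m2(conductivity_w_mk, gradient_k_per_km):
    """Fourier's law: q = k * dT/dr.

    With k in W m^-1 K^-1 and the gradient in K km^-1, the product is
    numerically equal to the heat flux in mW m^-2.
    """
    return conductivity_w_mk * gradient_k_per_km


def max_mean_heat_production_uw_m3(flux_mw_m2, crust_thickness_km):
    """Upper bound on column-averaged crustal heat production.

    Assumes steady state and no mantle contribution, so that all of the
    surface heat flux is generated within the crust. mW m^-2 divided by
    km gives microwatts per cubic metre.
    """
    return flux_mw_m2 / crust_thickness_km


if __name__ == "__main__":
    # 53 K/km and 50 km are from the text; k = 1.5 is an assumed value here.
    q = surface_heat_flux_mw_m2(1.5, 53.0)
    h = max_mean_heat_production_uw_m3(q, 50.0)
    print(f"surface heat flux: {q:.1f} mW m^-2")
    print(f"max mean heat production: {h:.2f} uW m^-3")
```

 Because a thicker crust spreads the same surface heat flux over a longer column, the bound on the column-averaged heat production depends directly on the assumed crustal thickness, which is consistent with the sensitivity to a ∼10 km thickness difference noted above.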
Acknowledgments
 We would like to thank Shijie Zhong and Patrick McGovern for their careful reviews and constructive comments, which improved this manuscript. The authors wish to thank Makiko Ohtake, Yuzuru Karouji, Shingo Kobayashi, Makoto Hareyama, Hiroshi Nagaoka, Yasuhito Sekine, and Yuichiro Cho for fruitful discussions; Francis Nimmo for his constructive comments on an earlier version of the manuscript; and the GCOE program for proofreading and editing assistance. Our spherical harmonic analyses were conducted using the software archive SHTOOLS (available at shtools.ipgp.fr). This research was partially supported by a grant-in-aid from the Japan Society for the Promotion of Science (JSPS).