Synchronizing Geomagnetic Field Intensity Records in the Levant Between the 23rd and 15th Centuries BCE: Chronological and Methodological Implications

Archeomagnetic records are an important source of information on the past behavior of the geomagnetic field. Frequently, however, coeval archeomagnetic intensity (archeointensity) datasets from nearby locations display significant discrepancies, hampering precise reconstruction of high‐resolution secular variation curve. This is the case for the time interval between the later phase of the Early Bronze and the early phase of the Late Bronze Ages (23rd–15th centuries BCE) in the Levant and Mesopotamia. We address the problem by cross‐correlating archeointensity datasets from four major multilayered archeological sites in the southern Levant (Hazor and Megiddo), northern Levant (Ebla), and western Upper Mesopotamia (Mari). We report new archeointensity data, obtained using the Thellier‐IZZI‐MagIC and the Triaxe methods, from six strata at Hazor and four radiocarbon‐dated strata at Megiddo. From 39 pottery fragments, 199 specimens passed our selection criteria, from which we calculated the mean archeointensity for each stratum. To strengthen the comparison of these data with previously published data from Mari and Ebla, obtained using the Triaxe method, we conducted a blind test of the methods that resulted in indistinguishable results or a difference of less than 1 μT. The synchronized compilation, constrained by radiocarbon data from Megiddo, displays a V‐shaped pattern with a prominent minimum of at least 200 years centered around the 18th century BCE. The study highlights the importance of stacking archeomagnetic data obtained by different archeointensity methods only after cross‐testing the methods and ensuring that archeological samples were dated in a consistent manner.

effort has been invested over the past decades in improving the resolution, accuracy, and coverage of geomagnetic models (Arneitz et al., 2019;Campuzano et al., 2019;Hellio & Gillet, 2018;Korte et al., 2011;Nilsson et al., 2014;Osete et al., 2020;Pavón-Carrasco et al., 2014), leading to a growing need for high-quality Holocene paleomagnetic records. Archeomagnetic data of the past several millennia, derived from the analysis of well-dated baked clay artifacts, are significant in this effort because they enable recovery of the absolute intensity of the ancient field (archeointensity). One of the main challenges in this respect is synchronizing archeomagnetic datasets obtained from different archeological sites, because frequently, coeval records from the nearby locations display significant discrepancies. This is a result of biases both in age scales and archeointensity determinations. Age biases can, for example, result from stratigraphic uncertainties, usage of different terminologies for archeological periods, and limited quantity of radiocarbon data. Archeointensity biases can be caused by inconsistency in paleointensity methods and laboratory protocols, selection criteria, and averaging schemes.
The archeointensity data from Turkey, Syria, Jordan, Cyprus, and Israel, published over the past decades (Ben-Yosef et al., 2017;Ben-Yosef et al., 2008Ertepinar et al., 2012Ertepinar et al., , 2016Ertepinar et al., , 2020Gallet et al., , 2008Gallet et al., , 2014aGallet et al., , 2020Gallet & Al-Maqdissi, 2010;Gallet & Butterlin, 2015;Genevey et al., 2003;Shaar et al., 2011Shaar et al., , 2015Shaar et al., , 2016Stillinger et al., 2015) potentially provide sufficient information to inspect the behavior of the field in this region with an improved resolution and accuracy. Yet, when these recent data are plotted together with all other data published since the 1980s, substantial discrepancies arise ( Figure S1). The noise in this compilation can be attributed to large variabilities in dating approaches, laboratory methods, and interpretation techniques.
As part of our efforts to construct a detailed and consistent archeointensity database for the Levant, we focus in this study on the time interval between 2300 and 1400 BCE. Gallet et al. (2014a) suggested that field behavior during this period is characterized by a V-shaped pattern, with a minimum during the 19th and 18th centuries BCE, which marks the lowest field in the Levant during the past 4,500 years. This unique trend falls with several key historical events that make the archeointensity curve important for both archeomagnetic dating and geomagnetic secular variation research. Figure 1 shows archeointensity data within this time interval obtained using only modern paleointensity standards (Ertepinar et al., 2016(Ertepinar et al., , 2020Gallet et al., , 2008Gallet et al., , 2014aGenevey et al., 2003;Shaar et al., 2016;Stillinger et al., 2015). Older datasets, which do not include corrections for both anisotropy and cooling rate effects (Athavale, 1969;Games, 1980;Hussain, 1983;Odah, 1999Odah, , 2004Odah et al., 1995), are not shown. The significance of these corrections has been well known for decades (Fox & Aitken, 1980;Rogers et al., 1979) and we further demonstrate it in Section 3. In addition, Figure 1 ignores a dataset published without details on quality criteria (Aitken et al., 1984), which prevents comparison with modern data. The compilation shown in Figure 1 displays significant inconsistencies hindering attempts to construct precise high-resolution secular variation curve. Distinguishing between discrepancies resulting from archeological uncertainties or age biases and those resulting from archeointensity methodological issues is not trivial. From a dating viewpoint, only few sites are radiocarbon dated (Kültepe,Kinet Höyük,Tel Atchana,and Tel Megiddo [Figure 1a]); archeological dating of the other sites relies on different, and sometime contradicting, archeochronological frameworks (e.g., Figure S2). From an archeointensity perspective, the different datasets were obtained using a range of methods (several variants of the Thellier method applied using different selection criteria, Triaxe, and microwave) and different averaging and error estimation schemes. Thus, compatibility between the archeointensity values in the compilation shown in Figure 1 may be questionable without rigorous testing.
The purpose of this study is to minimize the noise in the archeointensity picture displayed in Figure 1. Our approach is as follows: first, we establish compatibility between data obtained using the Triaxe and the Thellier-IZZI-MagIC methods (see Section 2.2 for methods) in a "blind test." This is done because, for this paper, the majority of the data was obtained using these two methods. Then, we report new data for pottery collected from four radiocarbon-dated strata in Tel Megiddo and six archeologically dated strata in Tel Hazor (Israel, Figure 1a). Under a working assumption that contemporaneous ceramics from nearby archeological sites must have recorded the same field intensity, we assemble a reduced synchronized dataset, constrained by Tel Megiddo radiocarbon ages. The synchronized dataset is based on data from four major multilayered archeological sites studied using paleointensity methods that were tested against each  (Figure 1a). Finally, we compare the reduced synchronized compilation with other sites, which were analyzed using methods that were not tested against Thellier-IZ-ZI-MagIC and/or Triaxe.  ; circles: Triaxe and/or Thellier-Coe tested against Triaxe (Gallet et al., , 2008(Gallet et al., , 2014aGallet & Al-Maqdissi, 2010;Genevey et al., 2003); triangles: microwave (Ertepinar et al., 2020) or microwave combined with Thellier-Type and multispecimen (Ertepinar et al., 2016); pentagons: Thellier-IZZI (Stillinger et al., 2015). Yigael Yadin. Excavations were renewed by Amnon Ben-Tor in 1990. Both excavations were carried out under the auspices of the Hebrew University of Jerusalem. Directors of the current dig are Amnon Ben-Tor and Shlomit Bechar. Tel Hazor is composed of a sequence of more than 20 strata that span from the Early Bronze to the Persian period (ca. 30th-4th centuries BCE), representing a nearly continuous occupation. Hazor was first established as a large urban center in the Middle Bronze ( Figure S2). The Middle Bronze city encompassed an area of approximately 84 hectares -in both the acropolis and lower city -with temples, residential quarters, workshops, a palace, and open cultic places (for an extensive description of the site and its monuments, see Ben-Tor [2015] and Yadin [1972]). Two strata are attributed to the Middle Bronze: XVII and XVI in the acropolis and 4 and 3 in the lower city. In this study, we also consider Stratum XV, which begins at the end of the Middle Bronze III and continues to the Late Bronze I, and Stratum XVIII, which represents a rural settlement of the Early Bronze IV (ca. 2350-2000 BCE) (this period is termed Intermediate Bronze Age by many of the scholars studying the southern Levant). For archeointensity analysis, we collected pottery vessels, with emphasis on local domestic material, from a succession of six stages in Strata XVII-XVI in the acropolis, based on architectural and ceramic considerations . These stages, labeled from top to bottom -XVI-A, XVI-B, XVI-C, XVII-D, XVII-E, and XVII-F -were exposed in an open cultic area near the entrance to the palace. Their initial ages are assigned following Ben-Tor (2004), who dated the establishment of MB Hazor (i.e., the beginning of Stratum XVII) to 1720-1710 BCE, the beginning of its peak (Stratum XVI) to around 1680 BCE, and its end to 1550 BCE. The ages assigned by Ben-Tor (2004) follow the Mesopotamian "Ultra Low Chronology" ( Figure S2). Archeological information on each of the investigated artifacts is provided in the supporting information.

Tel Megiddo
Megiddo ( Ussishkin [2018]). The current excavation focuses on tight control over stratigraphy and ceramic typology and on a rigorous program of radiocarbon dating. The latter, with an unprecedented number of samples, encompasses almost the entire stratigraphic sequence; special attention was taken to process only samples that originate from secure stratigraphic contexts: Regev et al. (2014) Martin et al. (2020). Collection of the pottery samples was done with emphasis on local domestic material securely associated with stratigraphic and ceramic context and thus with radiocarbon dates.

Archeointensity Methods
Our reduced dataset is based on three methods, which were tested against each other and show compatibility: (1) the Thellier method (Thellier & Thellier, 1959) applied using the IZZI protocol (Tauxe & Staudigel, 2004;Yu et al., 2004) and analyzed using the automatic interpretation technique of Shaar and Tauxe (2013) with criteria defined in Shaar et al. (2016), hereafter called Thellier-IZZI-MagIC (squares in Figure 1) ; (2) the Thellier method applied using the Coe protocol (Coe, 1967), hereafter termed Thellier-Coe (circles in Figure 1 (Le Goff & Gallet, 2004) (circles in Figure 1) (Gallet et al., , 2008(Gallet et al., , 2014a. In the following, we review the main concepts of each method and assess the differences between them. In Section 3.1, we provide the experimental evidence for compatibility.

Thellier-IZZI-MagIC
Thellier-IZZI-MagIC archeointensity experiments of samples collected for this study were carried out in the shielded paleomagnetic laboratory at the Institute of Earth Sciences, The Hebrew University of Jerusalem (HUJI), using modified ASC TD-48 ovens, 2G-750 superconducting rock magnetometer (SRM), and 2G-RAPID SRM. We cut several specimens from each pottery fragment (sample) and glued them inside nonmagnetic glass vials of 12 mm in diameter (Hazor samples), or in 22 mm × 22 mm × 20 mm square alumina crucibles (Megiddo samples). Archeointensity experiments followed the IZZI protocol (Tauxe & Staudigel, 2004;Yu et al., 2004) with routine pTRM checks at every second temperature step (Coe et al., 1978), using an oven field of 40 or 50 μT. Heating time ranged from 40 to 65 min depending on the target temperature. Figure 3a shows typical successful data from these experiments, with Arai plot (Nagata SHAAR ET AL. et al., 1963) displaying a linear behavior, successful pTRM check (graphically displayed as triangles overlapping the IZZI datapoints), and a stable straight single-component Zijderveld diagram (Zijderveld, 1967) converging to the origin.
Anisotropy of thermoremanent magnetization (ATRM) experiments consisted of eight heating steps carried out at the highest temperature that the specimens reached during the Thellier experiment: a baseline step in a zero field (subtracted from the subsequent infield measurements), six infield steps at orthogonal directions (+x, +y, +z, −x, −y, −z) in 40 or 50 μT, and an additional infield alteration check at the end of the experiment. As the long, repeated high-temperature procedures can cause alteration of the magnetic minerals, we calculated an alteration parameter from the vector differences of four pairs of measurements: ([−x, +x], [−y, +y], [−z, +z], and [first measurement, alteration-check measurement]). ATRM data of specimens with alteration check >6% were rejected. For these specimens, anisotropy was measured using anhysteretic remanent magnetization (ARM) acquired at 100-mT AC field and 0.1-mT DC bias field at six orthogonal directions, similar to the ATRM experiment, where AF demagnetization step at 110 mT was applied before each ARM step. AARM experiments were carried out after thermal demagnetization of the specimens. Anisotropy tensors were calculated following Hext (1963), using the PmagPy Thellier GUI program (Shaar & Tauxe, 2013;Tauxe et al., 2016).
Cooling rate correction experiments consisted of 4-5 cooling steps from 590°C or 600°C to room temperature: a baseline zero-field step, an in-field step in a fast (regular) cooling rate, one or two in-field steps at slower cooling rate, and finally, an in-field alteration check step in a fast cooling rate. Cooling rate alteration parameter was calculated as the percentage of difference between the first and the last cooling steps. Specimens with alteration parameter >6% were rejected, and their cooling rate correction factor was calculated by averaging the correction factors of sister specimens from the same mother fragment. Cooling rate correction factors were calculated using the PmagPy Thellier-GUI program, assuming a logarithmic relation between oven TRM TRM and oven Cooling rate Cooling rate (Genevey & Gallet, 2002;Halgedahl et al., 1980). Correction factors were calculated from the best fit line through the experimental cooling rate data, as illustrated graphically in Figure 2b. In our calculation, we assumed an averaged ancient cooling rate of 6 h from 500°C to 200°C for all specimens following Genevey et al. (2003) and Shaar et al. (2016).
Archeointensity values were calculated with the Thellier-GUI program (Shaar & Tauxe, 2013), incorporated into the PmagPy software package , using the Thellier Auto Interpreter algorithm, and the acceptance criteria listed in Table 1 Table 2 are shown.
1. Line fitting to the Arai plot: The program analyzes all the possible best fit lines of each Arai plot separately and filters out the interpretations (best fit lines) that fail the specimen acceptance criteria 2. Applying corrections: Each interpretation that passes step 1 is corrected for the effect of anisotropy and cooling rate 3. Sample/site mean calculation: The program calculates all the possible sample/site means (where sample denotes for a pottery fragment, and site is attributed to archeological stratum, level, or context) using all the specimens' acceptable interpretations calculated in steps 1 and 2. Means that fail the sample/site acceptance criteria in Table 1 are screened out 4. STDEV-OPT sample/site mean calculation: From all the acceptable sample/site means calculated in step 3, the STDEV-OPT (standard deviation optimum) mean is the mean that has the lowest coefficient of variation statistic (σ % = 100 × [σ/mean], where σ is the standard deviation) 5. Calculating error bounds: Uncertainty bounds are defined by taking from all the acceptable means calculated in step 3, the two end-case interpretations with the lowest and the highest values, and adding or subtracting from them the standard deviation (these end case values are [B min − σ max ] and [B max + σ max ])

Triaxe
Triaxe measurements were done at the paleomagnetic laboratory of the Institut de Physique du Globe de Paris (IPGP) using a Triaxe three-axis vibrating sample magnetometer, which allows magnetization  Tauxe and Staudigel (2004).

Table 1
Acceptance Criteria Applied to the Thellier-IZZI-MagIC Data measurements at high temperatures. The experimental protocol is illustrated in Figure 2c, and it includes five heating/cooling steps between two reference temperatures T 1 (generally 150°C) and T 2 (around 500°C): the first (heating) step is natural remanent magnetization (NRM) demagnetization up to T 2 ; the next two demagnetization steps (cooling-heating) are designed to scrutinize the thermal variation between T 1 and T 2 of the magnetization fraction still unblocked at T 2 ; the fourth (cooling) is done in a laboratory field whose direction is automatically adjusted so that it leads to the acquisition of laboratory TRM (TRM lab ) parallel to the NRM; the fifth (heating) step demagnetizes the latter up to T 2 . The intensity value is estimated for each running temperature T i (every ~5°C) between T 1 and T 2 from the ra- Figure 2d), where M 1 , M 3 , and M 5 are moment measurements at the first, third, and fifth steps (i.e., while temperatures are increasing). If the R'(T i ) data display a straight and flat line, B anc is calculated at the specimen level by averaging all R'(T i ) values, provided that (1) the TRM anc exhibits a single directional component, being that acquired during the manufacture of the artifacts. In case of a secondary magnetization, the pivotal temperature T 1 can be shifted to a higher temperature T 1 '; (2) the magnetization isolated between T 1 /T 1 ' and T 2 represents more than 50% of the magnetization fraction remaining above T 1 /T 1 '. As the direction of TRM lab is parallel to that of NRM, no anisotropy correction on TRM acquisition is required. Furthermore, it was experimentally shown that the cooling rate effect is overcome by considering the R'(T i ) data (Le Goff & Gallet, 2004). For an in-depth review on the method and the selection criteria, see Le Goff and Gallet (2004), Genevey et al. (2009), andHartmann et al. (2010). Table 2 lists the selection criteria of the Triaxe method (see also Gallet et al., 2020).

Blind Test of the Methods
As the motivation for this study is synchronizing archeointensity datasets acquired using different laboratory procedures, we performed a "blind test" aimed at assessing the compatibility between the Thellier-IZ-ZI-MagIC and Triaxe methods.  already compared archeointensity data from Syria obtained using the Triaxe and the Thellier-Coe methods (additional comparisons were also carried out by Genevey et al. [2009], Hartmann et al. [2010], Hartmann et al. [2011], and Hervé et al. [2017]). They concluded that when using the selection criteria in those studies, the two methods yield equivalent data. The Thellier-IZZI-MagIC applied here adopts an automatic interpretation procedure with different acceptance criteria than in  and Genevey et al. (2003). Hence, we augment their conclusions with additional testing of the Triaxe with the Thellier-IZZI-MagIC.
The slope in the diagram, expressed in % through the temperature of analysis must be less than 10% (slope defined by : For mean computation of the R'(T i ) values: The magnetization fraction, with unblocking temperatures larger than T 1 (or T 1 '), must be at least 50%.

Fragment
Coherence of the intensity values Results obtained from at least two different specimens.
Estimated error ≤5% of the corresponding mean.
Abbreviation: TRM, thermoremanent magnetization. We split a set of randomly selected Bronze Age pottery fragments from Tel Hazor (Table 3) into two groups. One group was sent to the IPGP laboratory without the archeological details of the fragments for Triaxe measurements and the other group was analyzed at the HUJI laboratory using the Thellier-IZZI-MagIC method. As the interpretation procedure of Thellier-IZZI-MagIC is fully automatic, the test is free of any subjective considerations for both methods. Sixteen fragments passed the Triaxe criteria (Table 3 and Figure 3), 15 fragments passed the Thellier-IZZI-MagIC criteria (Table 4), 4 fragments failed criteria in both methods, and 4 fragments failed criteria in one of the methods. Thirteen fragments that passed the selection criteria in both methods are shown in Figure 4. The agreement between the two datasets is excellent, displaying differences of less than 3 μT between the means of each fragment. When considering the error bounds, only two fragments show distinguishable values in the two methods, but their difference is smaller than 1 μT. We note that owing to the way the error bounds of the   its error bars are larger. From this, we conclude that archeointensity data calculated using the two methods are indistinguishable.

Archeointensity Results
Out of 219 specimens, 181 passed the Thellier-IZZI-MagIC acceptance criteria, representing a success rate of 83%. Table S1 lists their statistics. Figure S3 displays histograms of the statistics used as selection criteria (Table 1), indicating the high quality of the accepted results. Histograms of the anisotropy and cooling rate correction factors are shown in Figure 5. The anisotropy correction of 27% of the specimens is higher than 10%, and the cooling rate correction of 10% of the specimens is higher than 10%. This emphasizes the importance of the two corrections. Also, it highlights the preference of fitting the cooling rate data to a line using three different cooling rates (e.g., Figure 2b), instead of the typical procedure of only two rates. All the raw measurement data and interpretations, including the ATRM, AARM, and cooling rate correction experiments, are available in the MagIC database (earthref.org/MagIC/16857).
Out of 52 fragments, 39 passed the Thellier-IZZI-MagIC criteria listed in Table 1, representing a success rate of 75%. We incorporated specimens analyzed using the Triaxe method in the fragment paleointensity calculation by assigning the value listed in Table 3 for each Triaxe specimen. The fragment results are presented in Table 4. Figure 6a shows fragment archeointensities plotted versus fragment group-level at Tel Megiddo and stratum/stage at Tel Hazor. Overall, there is a good agreement between fragments collected from the same group (archeological context) with some exceptions: one fragment from Megiddo H-15 had a lower value than the other four fragments; five fragments from Megiddo K-10 yielded two groups of results: two showing high values (~55 μT) and three indicating much lower values (near ~46 μT); one fragment from Hazor XVI-A showed a lower value than the other three fragments; one fragment from Hazor XVI-B showed significantly different values than the other four fragments from this stage. The latter fragment from Hazor XVI-B that showed values distinguishable from the rest of the fragments in the group, with error bounds distinct from all other fragments, is shown in red in Figure 6a, and is considered as an outlier. This fragment was not used in the calculation of the group mean. The scatter of the fragment data within the different groups can be explained by the nature of the potsherds found in the archeological context. Even after a careful preselection, the potsherds represent a time interval starting from the time a ceramic vessel was produced to the time it was deposited. Restricting ourselves to non-luxurious domestic vessels, this interval can last a few decades. Thus, by collecting several potsherds per context, we get a range of archeointensity values associated with the corresponding time interval. Also, we cannot eliminate the possibility of "contamination" of potsherds from other contexts, during the complex process of site formation.

(a) (b)
by being younger in age, for example, from Hazor XV. For this reason, we took the approach of analyzing several fragments per group and tested the consistency within the fragment group before averaging them.
A mean value for each group (level, stratum, or stage), representing the time interval associated with the archeological context, was calculated using two approaches. The first is to average the mean values of the fragments (STDEV-OPT means in Table 4). This is the simplest and most straightforward approach. However, it does not take into account the different uncertainty of each fragment. The second approach is to use all the specimens from all the fragments passing the criteria (excluding the outlier fragment) and to calculate their STDEV-OPT mean and error bounds using the Thellier Auto Interpreter algorithm. The mean values calculated using these two approaches are shown in Figure 6b and listed in   (Table 4). The red symbol mark an outlier not used in group mean calculation. Numbers correspond to the number of fragments in each group excluding outliers. (b) Groups mean archeointensity values calculated by averaging fragments' means (orange circles and error bars) and by averaging specimens using the Thellier Auto Interpreter (blue squares, where green and blue error bars are the Thellier Auto Interpreter bounds and STDEV-OPT standard deviation, respectively). Numbers correspond to the number of specimens in each group.

A Reduced Archeointensity Dataset
To clarify the sources for some of the noise displayed in the archeointensity compilation shown in Figure 1, we start by constructing a reduced archeointensity compilation assembled from contexts, which were dated using consistent archeological chronologies. We restrict ourselves to methods and selection criteria that were tested against each other: Triaxe (Gallet et al., , 2008(Gallet et al., , 2014aGallet & Al-Maqdissi, 2010;, Thellier-Coe (Genevey et al., 2003), Thellier-IZZI-MagIC , and Thellier-IZZI-MagIC + Triaxe (this study). Figure 7a shows data from Mari, Ebla, Megiddo, and Hazor. These are large stratified sites that include most of the data in the time span reported here, ca. 2300-1400 BCE. Table S2 outlines the archeological context of these data. It is important to note that data from Syrian sites were reported at both the fragment and the group levels, but they are plotted as group mean. Thus, each published data point in Figure 1 is an average of several fragments. The data from Megiddo and Hazor, however, were reported at the fragment level, thus each data point in Figure 1 is a specimen mean from a single fragment. To account for this difference and enable adequate comparison, we calculated a group mean for the previously published data from Megiddo and Hazor using the same Thellier Auto Interpreter technique deployed for the new data reported in the current study. These group means are listed in Table 5.  (Lev et al., 2019).

Table 5
Group Archeointensity From Hazor and Megiddo, from this study and Shaar et al. (2016) The radiocarbon-dated contexts are highlighted with a bold frame in Figure 7a; they include Levels F-10, K-11, K-10, H-15, and S-2 at Megiddo and Stratum XVIII at Hazor. Stratum XVIII at Hazor has recently been dated to the interval between the second half of the 24th century and the 23rd century (Lev et al., 2019); since the analysis of this date is still in progress, we treat it merely as the best estimate. Figure 7b shows the same subset shown in Figure 7a with additional data from nearby sites -Mishirfeh-Qatna (Gallet & Al-Maqdissi, 2010), Tel Masaikh (Gallet et al., 2008), Terqa (Gallet et al., , 2008, and Haft Tepe ) -obtained using the Triaxe or the Thellier-Coe methods.
Considering the error bounds both in age and archeointensity, the reduced dataset in Figures 7a and 7b shows agreement between the different data points except one significant discrepancy between Hazor XVI-A,B and the rest of the data in this time interval. This difference is explained by the different absolute chronologies adopted by the respective excavation teams. The archeological age range of Hazor XVII-XVI (MB II-III) was adjusted by the excavators to conform with the "Ultra Low Chronology", while the age of Ebla and Mari were adjusted to the "middle chronology" there ( Figure S2) literature regarding the absolute age of the Middle Bronze is long-standing and beyond the scope of this archeomagnetic article. Yet, in this case, the radiocarbon ages of the stratigraphic sequence at Megiddo Levels K-11, K-10, H-15, and F-10 provide objective chronological constraints to the archeological age of Hazor XVII-XVI: the high archeointensity data of Megiddo are significantly higher and different than Hazor XVII-XVI suggesting that the age limit of Hazor XVI-A should be shifted to earlier ages. The extent of this shift is calculated in Section 4.2 and its archeological implications are discussed in Section 4.4.
Megiddo Level S-2 (ca. 1950-1900 BCE) is a well-dated destruction layer that yielded the lowest archeointensity value in the whole interval, similar to the values of some of the data from Mari end of City 3. Only two fragments from Level S-2 pass our criteria because of severe secondary burning that resulted in secondary pTRM acquisition in most of the samples we analyzed. Yet, if confirmed with more fragments, this layer can constrain the duration of the low archeointensity episode that marks the minimum in the archeointensity curve. One option is that the minimum extended from the late 20th century to the late 17th century BCE. Another possibility is that there was a local maximum around 1800 BCE associated with Megiddo Level F-13 and the higher values in Mari. Graphically, the former interpretation would result in a "V-shaped" archeointensity curve, whereas the latter would make a "W-shaped" curve. We note that the low archeointensity values (virtual axis dipole moment [VADM] of 70-75 10 12 Am 2 ) mark the lowest field during the past 4,500 years in the Levant and Mesopotamia.

Bayesian Modeling
To calculate an archeomagnetic age limit of the Middle Bronze levels at Hazor, we constructed a Bayesian archeointensity variation curve with 95% credible envelope using the reduced dataset displayed in Figure 7b. To minimize divergence of the Bayesian model due to edge effects, we expanded the modeled age interval and added data from the 25th to 24th centuries BCE and from the 14th to 13th centuries BCE (from Mari, Tell Gudeda, Chogha Zanbil, Tel Hazor, and Tel Megiddo (Gallet et al., , 2008(Gallet et al., , 2020Gallet & Butterlin, 2015;Genevey et al., 2003;Shaar et al., 2016). For modeling purposes, we used the fragments' mean and standard deviation instead of the STDEV-OPT mean to keep consistency with the published data from the northern Levant and Mesopotamia. All ages of Hazor XVI-XVII were set to 1800-1550 BCE to account for all the possible age ranges within the Middle Bronze II/III ( Figure S2) according to both the traditional chronology in the southern Levant and the middle chronology in Mesopotamia. The Bayesian curve was generated using the age hyperparameter reverse-jump Monte Carlo Markov Chain (AH-RJMCMC) algorithm, which is based on piecewise linear interpolations between randomly drawn vertices (Livermore et al., 2018) (https://github.com/plivermore/AH-RJMCMC1). Here we follow similar approach used in Gallet et al. (2020). The prior assumptions regarding the model and data are as follows: (i) the allowed range of vertices of VADM values is between 60 and 120 Z Am 2 ; (ii) the allowed number of vertices (K) is between K min = 1 and K max = 150; (iii) the archeological ages of the data are uniformly distributed (archeological age range in Table 5 and Table S2); (iv) normal distribution of the intensity data defined by the groups mean and standard deviation; (v) ages of stratified layers with overlapping age range must agree with the stratification order. The likelihood function is calculated for each data point at an age drawn from its age distribution by the difference between the intensity of the data to the intensity of the model, where the model is calculated from linear extrapolation between the vertices. The RJ-MCMC sampling algorithm represents a random walk through the space of models permitted by the likelihood function. Each perturbation of a model from its predecessor is done by shifting vertices in age and/or intensity and/or by adding or deleting a vertex. The perturbation is calculated using the following parameters: σ move = 30 years, σ change = 10 Z Am 2 , and σ birth = 10 Z Am 2 that describe the distributions of a vertex move in age, vertex change in intensity, and intensity of a new vertex born with respect to the extrapolated intensity at the vertex age. In our model, each perturbation includes one age resampling of the data in each perturbation (num_age_changes = 1). The chain length is 10 8 . These parameters are close to the parameters defined by Gallet et al. (2020); for more details on the AH-RJMCMC code and algorithm, see Livermore et al. (2018). of fragment groups Hazor XVI-A and Hazor XVII-F. The 95% credible bounds suggest that Hazor XVII-F started in the beginning of the 18th century and that Hazor XVI-A ended before ca. 1575 BCE. With these archeomagnetic constraints in hand, we reassign the age range of Hazor XVII-XVI between 1800 BCE (the beginning of Middle Bronze II according to the middle chronology) and 1575 BCE (new archeomagnetic constraints) keeping as much as possible the original internal division within the strata provided by the excavators. We assign wide error age bounds to fit the historical constraints on one hand (see Section 4.4) and the archeomagnetic constraints on the other hand. The archeomagentcally revised ages of Hazor XVII-XVI are listed in Table 5 and shown in Figure 7b. The Bayesian archeointensity variation curve with 95% credible envelope are given in Table S3.

Comparison With Nearby Data
In Figure 7a, we synchronized the datasets of Mari, Ebla, Hazor, and Megiddo, because these sites are the source for the majority of the data in Figure 1; each of these sites includes a detailed succession of strata dated in a consistent manner; data were analyzed by archeointensity methods that were cross-tested with each other; and six radiocarbon-dated contexts enable partly tying the data to absolute ages. Data from other SHAAR ET AL. nearby sites, which were analyzed using similar archeomagnetic methods and dated following the middle chronology are shown in Figure 7b and are in agreement with the rest of the data. Both datasets displayed in Figures 7a and 7b were used to construct the Bayesian model. Figure 9 shows the rest of the data, which were obtained using methods that were not tested against Thellier-IZZI-MagIC or Triaxe. Tell Mozan (Stillinger et al., 2015) was analyzed using the Thellier-IZZI protocol with anisotropy and cooling rate corrections, but with different acceptance criteria than Thellier-IZZI-MagIC. Data are reported at the fragment level, typically with 1-2 fragments per age group, with some apparent internal inconsistency among fragments from the same age. The one group that includes at least two fragments with the same archeointensity value is dated between 1900 and 1600 BCE. The low value of this group is in agreement with the archeointensity minimum in the synchronized compilation. Two pottery fragment groups from Cyprus-Bellapais Vounous and Marki Alonia (Ertepinar et al., 2020)-analyzed using the microwave method-are in agreement with the synchronized dataset considering their error bars, but are not perfectly aligned with it.
The radiocarbon-dated data from Turkey can potentially provide critical key chronological tie points to the archeoinensity compilation. Some of the data from Tell Atchana and Kinet Hoyuk (Ertepinar et al., 2020) and Kale Höyük and Kültepe (Ertepinar et al., 2016) are in agreement with the synchronized dataset, but other data show significantly different values. These data were analyzed using the microwave method and/or the Coe or the IZZI variants of the Thellier method and/or the multispecimen method. We note that the acceptance criteria used in these datasets are significantly weaker than the criteria used in the Thellier-IZZI-MagIC used here (Table 1). For example, most of the Anatolian data use fraction parameter (f-statistic of Coe et al. [1978]) ≥0.35, whereas in the reduced compilation, we use fraction parameter (FRAC , Table 1) ≥0.79. As the raw measurement were not published, we could not reprocess these data using the same interpretation algorithm, selection criteria, and averaging schemes as in this study.
Figures 9 and 1 demonstrate that, when attempting to construct a coherent regional archeointensity compilation from the entire published data by a simple stack of modern data, there is a considerable noise that masks the true trend of the geomagnetic field behavior. We repeat that the sources of the noise are uncertainties in the archeological age determinations as well as from difference in the paleointensity interpretation approaches. Therefore, unless each of the discrepancies in Figure 9 are rigorously assessed, we suggest that future geomagnetic models that include the time interval 2300-1400 BCE use the synchronized compilation in Figure 7b ( Table 5 and Table S2).

Implications to Archeological Chronologies of the Middle Bronze
Our results for Strata XVII-XVI at Hazor and Level K-11 at Megiddo touch on two parallel, but related debates regarding the chronology of the first half of the second millennium BCE in the Ancient Near East. The first is the dispute over the historical Mesopotamian chronology, which involves vast and complex textual and archeological data. The focus of the discussion is the dating of the first dynasty in Babylon and its well-known king, Hammurabi. At least five chronological systems, stemming from astronomical observations of Venus in the eighth year of the 10th ruler of the dynasty, Ammisaduqa, have been proposed (Cryer, 1995;Rochberg-Halton, 1988;Schwartz, 2008 mat, 1992), and Mari is mentioned in a tablet found at Hazor (Horowitz & Wasserman, 2000); the suggestion that Hazor of the Mari texts is not the site discussed here (Astour, 1991)  Hazor XVII-XVI as well as the destruction of Mari is not radiocarbon dated, but Megiddo can serve here as a chronological anchor and proxy. Our magnetic intensity data show that Hazor strata XVII-XVI are earlier than Megiddo K-11 (Figures 7 and 8), the duration of which is radiocarbon dated to 1626-1579 BCE (68.2% probability range). The pottery assemblages of Hazor strata XVII-XVI cover the Middle Bronze II and the Middle Bronze III. Megiddo K-11 could have started in the late Middle Bronze II and continued into the early days of the Middle Bronze III. Middle Bronze II Megiddo K-12 was probably contemporaneous with the early days of Hazor XVI and, possibly, the late days of XVII (perhaps also Level K-13, but thorough work on the ceramic assemblages of both Levels K-12 and K-13 has not yet been carried out). The missing datum in order to wrap up all this information is the earliest possible date for "Greater Hazor," sometime during the life of Stratum XVII. Fixing it depends on the date of transition from the Middle Bronze I to the Middle Bronze II (ceramic phases), which is also debated, between ca. 1850 and close to 1700 BCE (see summary in Höflmayer [2017], Figure S2). Finally, we should note that judging from its material culture, including the six phases of construction , "Greater Hazor" must be given a longenough duration.
From the perspective of Hazor and Mari, when put together, all these can be summarized as follows (emphasis only on Middle and Ultra Low Mesopotamian chronology): •  (Höflmayer, 2017). In this case, the Hazor-Mari overlap would fall at ca. 1850/1800-1760 BCE In the compilation in Figure 7, we still cannot resolve the chronological debates concerning the absolute chronology of the destruction of Mari and the Middle Bronze I/II transition in the southern Levant. Instead, we plot the archeointensity data from Mari as published Gallet et al., 2008;Genevey et al., 2003), assuming its destruction in 1760 BCE (Middle Chronology) and Hazor XVI-XVII between 1800 and 1750 BCE as calculated from the archeomagnetic Bayesian model.

Conclusions
A considerable amount of archaointensity data were published from the northern Levant and Mesopotamia for the time interval between the 23rd and the 15th centuries BCE. Yet, a simple stack of the data displays large inconsistencies that cannot be easily resolved, due to the application of different archeointensity methods and selection criteria, as well as different historical-archeological chronological schemes. Here we established a basis for comparison between data derived by the Thellier-IZZI-MagIC and the Triaxe methods from a blind test that resulted in indistinguishable results or a difference that was less than 1 μT. Based on the conclusion that the two methods are compatible, we constructed a reduced and synchronized archeointensity compilation using 37 fragment groups from Mari, Ebla, Hazor, and Megiddo, which includes new data from 10 groups from Hazor and Megiddo. Additional eight fragment groups from four other nearby sites are in agreement with the reduced compilation and hence we include them in our compilation. The reduced compilation is constrained by five radiocarbon-dated levels at Megiddo and one radiocarbon-dated stratum at Hazor. The compilation illustrates a prominent field intensity minimum centered around the 18th century BCE, which spans over at least 200 years and marks the lowest field intensity in the Levant in the past 4,500 years.
A comparison of the synchronized-reduced compilation with other data, derived using different archeointensity methods and/or different acceptance criteria, is more complicated. Discrepancies can be attributed to archeointensity biases, different archeological chronologies and age terminologies, or problems of archeological contexts. As these biases can be problematic to address without rigorous comparison between the methods, we suggest using the reduced compilation presented here (Table 5 and Table S2) for modeling purposes and archeomagnetic dating rather than using the entire set of published data.
From the perspective of archeological chronology, the new data may provide a new tool for assessing the chronology of the first half of the second millennium BCE in Mesopotamia and the northern Levant, and the chronology of the Middle Bronze phases in the southern Levant. In particular, the new archeointensity data provide constraints to the age of the Middle Bronze II/III strata at Tel Hazor.

Data Availability Statement
The Thellier-IZZI-MagIC paleointensity measurements data are available in the MagIC database (earthref. org/MagIC/16857).