Non-Destructive Survey of Early Roman Copper-Alloy Brooches using Portable X-ray Fluorescence Spectrometry

This paper argues that portable X-ray fluorescence spectrometry (pXRF) is a suitable elemental measurement technique to study the production of copper-alloy artefacts. However, rather than try to imitate the accuracy and precision of laboratory techniques, it is more beneficial to deploy it in a survey role, one that attempts to model chronological and geographical changes within large quantities of artefacts. To achieve this, it was investigated to what extent corrosion and the issues surrounding surface measurements affect the potential of this type of research. Analyses on early Roman period brooches gathered in the Nijmegen region of the Netherlands were subsequently compared with published data.


INTRODUCTION
This paper argues that portable X-ray fluorescence spectrometry (pXRF) is a suitable elemental measurement technique with which to study the production of copper-alloy artefacts. Rather than attempt to match the accuracy and precision of laboratory-based techniques, we deploy pXRF in a survey, or screening, mode, arguing that it is more suitable to model the social, chronological and geographical nature of 'metal flows' through the non-destructive study of large groups of artefacts (Needham 1998, Pollard et al. 2015). The primary goal of the present research is to investigate to what extent analyses of the corrosion layer by pXRF can be used for classifying copper-alloy artefacts. The ability to classify objects by their alloy composition is critical if pXRF is to be a valid technique for this approach.
Our approach is to explore the relationship between form and bulk alloy composition, both of which result from intentional human choices: the choice of alloy composition can be guided by such factors as raw material availability, workshop organization, and trade or exchange. Many large collections of copper-alloy artefacts have become available for study since the 1970s, when the cost and availability of metal detectors led to their widespread use by both amateur and professional users. Many thousands of these items, which would otherwise have been lost to modern farming practices or building projects, provide ideal data sets for compositional study. Although much of the material has been recovered without a detailed archaeological context (in contrast to finds from carefully planned site excavations), it does provide an opportunity to study trends in availability and choice of alloys in a typological, geographical and chronological context. By using a large Roman period pXRF data set, measured from collections in the Nijmegen region of the Netherlands, we explore the effectiveness of the technique by comparing the traditionally held problems associated with pXRF against the requirements of more socially orientated research models. Other studies have demonstrated the usefulness of the technique for late Roman and early Medieval copper-alloy objects.

The traditional challenges facing pXRF
Recent years have seen a significant increase in the use of pXRF devices in archaeological research (Shackley 2010, 17), especially those that are compact enough to be held in the hand. These instruments are typically user friendly and allow for a high throughput of non-destructive analysis. Much criticism has been put forward, however, concerning their reliability when used by operators with inadequate analytical training (e.g., Shackley 2010, 17; Speakman and Shackley 2013). The point, shoot and read nature of these devices means they can be operated with the minimum of training and supervision. This does not mean, however, that interpretation of the results should be undertaken at that same operator level. Archaeological pXRF projects should include researchers who are experienced in both the interpretation of compositional data and the basics of XRF analysis (Shackley 2010, 18) to produce valid and reliable results.
Because of the continuing development and miniaturization of the XRF technique, many portable machines on the market today have better detector resolution than laboratory equipment in use a decade ago (Frahm and Doonan 2013, 1080; Shackley 2013, 1436). Although higher precision and lower detection limits may be obtained with more standard analytical laboratory techniques (e.g., Lab-XRF, neutron activation analysis (NAA), atomic absorption spectrometry (AAS), scanning electron microscopy (SEM) or inductively coupled plasma (ICP) techniques, which are also improving), the need for artefact damage, by drilling or the removal of corroded surface layers, may be problematic for many objects. The non-destructive nature of pXRF therefore makes it possible to access and analyse artefacts that would be unavailable for damaging techniques (see below).
However, while it has been shown that it is possible to obtain reliable compositional data on corroded, ancient copper-alloy surfaces using XRF (Lutz and Pernika 1996), it is still a surface-measuring technique. Therefore, it becomes necessary to consider any compositional variation present on the surface of artefacts. Variation can be caused by sample inhomogeneity and by the effects of corrosion, including surface irregularity and variations in thickness. These parameters affect the outcome of all analytical techniques, but with pXRF there is in most cases little or no optimization of the sample measurement conditions, both because of the nature of the technique and because of restrictions placed on it by artefact owners (e.g., forbidding destructive sample preparation).
Despite these less favourable conditions, a number of archaeological applications of pXRF have since been published that include, for instance, insights into the production organization behind the bronze weapons found with the Terracotta Army in Xi'an, China (Martinón-Torres et al. 2012) as well as later Roman and early Medieval studies by the present authors (Roxburgh et al. 2016b). A study of both early and late Roman brooches at Richborough, Kent, by Bayley and Butcher (2004) is particularly useful in its comparison of quantitative and qualitative techniques (XRF versus AAS), showing especially that the assignment of alloy names is comparable, but only when measuring large data sets (Bayley 1992, 301, Bayley and Butcher 2004, 22). Lately, the approach has also been applied in an analysis of the social organization behind the production of late Roman brooches in northern Gaul (van Thienen and Lycke 2017).
The use of pXRF devices in archaeometric research is dependent, therefore, on an appropriate methodological approach. This includes adequate sample preparation and a calibration process relying on internationally recognized standards (Kaiser and Shugar 2012). The usefulness of pXRF devices depends on whether the achievable level of measurement accuracy and precision is enough to address a particular archaeological problem.
In the case of metallic artefacts, the major advantage of pXRF devices is that they can be employed in non-destructive analysis. This is especially true for copper-based artefacts where preservation of the patina, or corrosion layer, is often considered to be of great importance by conservators. Since pXRF allows for safe, non-destructive analyses, much larger numbers of artefacts can be released by museum curators. An additional advantage is that analyses are quick and require short preparation times, which allows for faster analyses of large collections, with the same machine and settings. However, more than for other artefacts, the elemental composition of the corrosion layer from alloys will almost certainly differ from the uncorroded core. This would hinder the use of pXRF analyses in a pure 'provenance' role. But the technique has considerable potential to group artefacts according to variations in their alloy composition and thus to engage with a different set of questions: those relating to human interaction, the deliberate choice of alloying elements and recycling practices, all of which reflect social and economic change over broad chronological and geographical frameworks.

Artefact production and compositional variation
The composition of copper-alloy artefacts is influenced by intentional acts by the craftsmen as they manipulated the alloy in a liquid and a solid state. Studying variation in alloy composition is an avenue of research aimed at a better understanding of the social organization of craft production in past societies. For this, we need to determine and group the compositional variation in the alloys of a considerable number of objects of a uniform typology. Comparing alloy composition with form allows us to try to infer, for example, whether or not objects were produced in large production centres and subsequently distributed. Or, conversely, whether they were more likely produced in widely dispersed local workshops. Other, related, inferences could include whether the raw materials came from several sources or from centralized supply centres (e.g., Ling et al. 2014) or if scarcity of the raw materials induced recycling practices (and if that were the case, to what extent different alloys were sorted before being remelted?).
Even when the raw material sources are unknown, the degree of centralization of raw material procurement can be inferred from the compositional variation in the artefacts. In a recent pXRF study of the copper-alloy weaponry found with the terracotta warriors, Martinón-Torres et al. (2012) attempted to shed light on the production organization and craft specialization employed by ancient craftsmen. The degree of compositional uniformity found in large groups of objects such as these is likely to suggest something further about the organization of labour, transmission of technical knowledge and cross-craft interaction. High degrees of uniformity over a wide geographical area could imply that raw materials were sourced and supplied in a centralized fashion, or that artefacts were mass produced in a central workshop. In contrast, significant compositional variation may indicate separate production modes, and differences in raw material sourcing including variation due to recycling. Thus, the study of artefact composition can provide useful information when investigating the organization of production, provided the right questions are asked.

Analytical challenges
A reliable level of accuracy and precision has previously been demonstrated using XRF on the corrosion layer on copper alloys (Lutz and Pernika 1996). However, the degradation of copper alloys typically results in the formation of a corrosion layer that may be depleted in copper (decuprification) and, therefore, enriched by the alloying metals, when compared with the intact metal core. Different soil types may also influence the rate at which an object corrodes and, therefore, the type of corrosion, its intensity and the direction in which the changes take place interfere with the determination of an original alloy composition for an artefact. To complicate this issue further, surface concentrations may also vary considerably given the heterogeneous distribution of lead globules within a copper alloy (Smit 2012). In addition, deposition of iron hydroxides or sulphides onto metal objects often occurs in the soil. The effect of such an iron-rich layer is that secondary copper radiation is absorbed, but that higher energy secondary tin X-rays are relatively enhanced.
The reflection depth of X-rays in metals is dependent on the mass attenuation of the material and the incident X-ray energy. In metals, the critical reflection depth (i.e., the depth of analysis) is below 0.1 mm for most elements (Gigante et al. 2005). As a result, in the case of metals, XRF devices provide the composition of an object's surface and thus will be influenced by compositional changes caused by corrosion effects (see also Gigante et al. 2005, Orfanou and Rehren 2014).
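The shallow analysed depth follows from the exponential attenuation of X-rays in matter. As a reminder, the standard Beer-Lambert relation is given below; it is textbook physics rather than a formula taken from the studies cited here, and the symbols are introduced only for this illustration.

```latex
I(x) = I_0 \exp\!\left(-\frac{\mu}{\rho}\,\rho\,x\right),
\qquad
x_{1/e} = \frac{1}{(\mu/\rho)\,\rho}
```

Here $I_0$ is the incident intensity, $\mu/\rho$ the mass attenuation coefficient of the material at the photon energy in question, $\rho$ the density and $x$ the path length; $x_{1/e}$ is the path length over which the intensity falls to $1/e$. Because both the incoming beam and the outgoing fluorescence are attenuated, lower-energy lines such as Cu-K generally originate from shallower depths than higher-energy lines such as Sn-K.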
In the present study, we assessed the challenge presented by these surface variations by comparing XRF analyses on corroded and subsequently cleaned surfaces of a group of Roman brooch fragments. We then test whether the corrosion-induced changes in alloy ratios affect the potential for using the data for studying the human choices involved in production.

Materials
Non-destructive measurements were taken from 187 identifiable bow brooches from the private collection of Harry Sanders, which was made available for scientific study at the Bureau Archeologie & Monumenten at Nijmegen (Table S1). The objects, which date between 150 BC and 200 AD, were all recovered during archaeological fieldwork in and around the city of Nijmegen and from metal detection in ploughed fields in neighbouring municipalities. Nijmegen is located on the south bank of the River Waal (a branch of the Rhine) and was the site of a major Roman military and civil centre, situated on a hill at the apex of the Rhine and Meuse delta. The soils around Nijmegen and those to the south are generally sandy, well drained and relatively acidic (podzol types). North of the river, however, the region is dominated by clayey, lime-buffered soils with elevated groundwater tables. There were 14 additional bow brooch fragments (reserved for destructive mechanical cleaning) that were recovered from the villages of Elst and Oosterhout, both on the northern side of the river.
The following common typological names have been used to identify the brooch types in this survey. The earliest measured is the pre-Roman Nauheim series (150 to 70 BC), included as a comparison for the following early Roman types. For the 187 complete brooches, we have the Aucissa, Eye and Almgren 15 series (20 BC-80 AD, 5-100 AD and 30-180 AD respectively). The corrosion test includes the 14 additional brooch fragments, among them fragments from the Almgren 20 series (20 BC-100 AD) and also van Buchem 24 and Böhme 19 variants (both second century AD). See Heeren and van der Feijst (2017), types 8, 30, 20, 45, 17, 47-48 and 51 respectively for recent typological analyses and links to earlier publications. A common manufacturing characteristic for brooches of this period was that many types were wrought (hammered) out of a single piece of copper alloy, including the spring (Bayley and Butcher 2004, 32). This meant these particular brooches were made in an alloy with extremely low or no lead, because of the working limitations of lead in alloys (Bayley and Butcher 2004, 15). Other brooch types were composed of multipiece assemblies, with the spring being made separately from the body of the brooch. This meant that higher proportions of lead could be added to the main body because it was made separately from the spring. Of the brooch types used in this study, the Aucissa and Böhme 19 variants are multipiece assemblies. The rest are one-piece variants.
The organization behind brooch production is still attracting archaeological debate. One theory being explored is that brooch production was linked to military workshops, primarily due to their similarity in form to a number of military items (Roxburgh et al. 2016a, 413). The Aucissa type, for example, is considered by some to be the soldier's brooch. If this is the case, it raises the question of whether all brooches at this time had a close relation to the army or whether some types were more local in nature, perhaps linked to a particular regional group, which may be the case for the Almgren 15 series (Heeren and van der Feijst 2014, 99). The debate surrounding the introduction of brass into Roman production is of some importance to this question. It is thought that brass was first produced on an industrial scale during the first century BC (Bayley 1998, 8-9). Furthermore, it is also thought that during this early period the Roman state reserved it for the production of military gear and coins, but that at some point towards the end of the first century AD its use rapidly declined (Bayley and Butcher 1995, 118, Dungworth 1997, 903). To engage with this debate using pXRF, large numbers of individual brooches from well-defined typologies are required in order to assess how homogeneous they are over a wide area. This cannot as yet be done through other methods, which require the destructive cleaning or sampling of large numbers of items. If pXRF can do this non-destructively, then a comparison can be made between pre-conquest and Roman period production (between brass and bronze use in particular), and the alloy types found at later military production sites can then be compared with those from local tribal settlements.
For the Netherlands (the province of Germania Inferior in Roman times) this ability to explore alloy choice and the level of homogeneity present in large numbers of brooch types enables useful comparisons to be drawn with research in other regions, such as that undertaken by Bayley and Butcher (2004) at Richborough in Britain.

Methods
pXRF
A Niton XL3t GOLDD handheld XRF analyser was used for this study. It was factory calibrated with standards for metals and alloys and equipped with a large area silicon drift detector with optimized geometry. The electronic metals mode was selected and used throughout the data-gathering phase. This takes an off-the-shelf approach rather than relying on the development of custom calibrations. The advantage of this mode is that the same metals of interest (i.e., those found in Roman and Medieval alloys) are used in modern electronic equipment (Cu, Sn, Ag, Zn, Au), or are marked as potential hazardous materials (Pb, Hg, As, Se). In order to test whether this mode, which may not have been originally designed for use on copper alloys, was suitable for this application, we tested its performance on the Cultural Heritage Alloy Reference Material (CHARM) set of reference metals (see below).
The analyser was mounted on a portable test bench (hence, we use pXRF not hhXRF; see Frahm and Doonan 2013, 1426, for further labelling discussion), with a lead cover to provide a consistent operating environment whilst protecting the user from radiation. The test bench made it possible to place the objects over the 8 mm spot size easily. In most cases the objects fully covered the opening, but some were slightly smaller. However, slightly varying the angle of the object to the opening, including the amount of coverage across the opening, was found to have an insignificant effect in terms of the method proposed here. By checking the machine read-out during analyses, it was found that the machine-reported analytical error, for the elements of interest, was < 0.2% with reading times > 35 s, so this reading time was deemed sufficient for the present study (see below for the performance of the analyses with this reading time on the CHARM set of reference materials). Two spectrum readings were taken per 35 s interval, the first for the main range of elements at 50 kV (Cu-K to Ba-K, and Au-L to Pb-L) and the second for the low range at 10 kV (Al-K to Cu-K). After the analyses, the spectra were checked individually for inconsistencies and unexpected overlaps. Based on these checks, the analytical values for arsenic were discarded because of a peak overlap with lead.
One measurement was taken for each item, on the front, central bow section of the brooch, or on the front, head or foot sections when fragments were used. An external normalization of the completed data set was then undertaken in a spreadsheet program (Microsoft Excel™), which corrected for the contribution of the light elements that would be present in the patina due to contamination from soil residues (such as sand, clay and iron hydroxides). The elemental concentrations of the alloying elements were normalized on a light elements (Si-Fe)-free basis. In the present paper, only the main alloying elements (Cu, Sn, Zn, Pb) are considered.
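The light-element-free normalization described above can be sketched as follows. This is a minimal illustration and not the spreadsheet procedure actually used: the list of light elements is an assumption (reading "Si-Fe" as the range from silicon to iron by atomic number), and the function name and the example reading are hypothetical.

```python
# Assumed reading of "Si-Fe": elements from silicon to iron by atomic number
# that may be reported by the analyser (hypothetical list).
LIGHT_ELEMENTS = {"Si", "P", "S", "Cl", "K", "Ca", "Ti", "V", "Cr", "Mn", "Fe"}


def normalize_light_element_free(reading):
    """Drop the light elements and rescale the remaining values to sum to 100 wt%."""
    kept = {el: value for el, value in reading.items() if el not in LIGHT_ELEMENTS}
    total = sum(kept.values())
    if total <= 0:
        raise ValueError("no non-light elements in reading")
    return {el: 100.0 * value / total for el, value in kept.items()}


# Hypothetical raw surface reading in wt% (invented values, not from Table S1).
raw = {"Cu": 60.0, "Sn": 8.0, "Zn": 0.5, "Pb": 1.5, "Si": 15.0, "Fe": 10.0, "Ca": 5.0}
normalized = normalize_light_element_free(raw)
```

In this invented example the soil-derived elements make up 30 wt% of the raw reading, so every remaining value is scaled by 100/70 and the ratios between Cu, Sn, Zn and Pb are unchanged.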
The factory calibration of the device was checked against the reference samples of the copper CHARM set (Heginbotham et al., 2015). The results (Table 1 and Fig. 1) show that standard deviations remain within a few per cent. Plotting the results in a ternary diagram, such as the one used for all analytical results (see below), shows no deviation for all but one of the samples. Only sample 32xLB10 deviates in its lead content, possibly due to the inhomogeneity that is common for leaded copper alloys.

Corrosion study
The study was conducted in a similar manner to that of Fernandes et al. (2013) by comparing the compositions of the corrosion layer and uncorroded core of the artefacts. Unlike for some of the objects studied by Fernandes et al., we fully removed the corrosion layer for all tested artefacts. The pXRF analyses were conducted on both corroded and cleaned surfaces, using the same type of Niton machine as described above, including settings and calibration. Typically 0.5-1.0 mm in depth was removed to reach the uncorroded core (Fig. 2, inset).

Visualization of compositional data
The study of copper alloys in Roman Britain by Bayley and Butcher (1995, 2004) demonstrated the effectiveness of using triangular (ternary) diagrams for visualizing the concentrations of Sn, Zn and Pb measured using AAS (Fig. 2). The study identified multiple distinct compositional groups of Roman brooches that matched well with typo-chronological groupings. The use of ternary diagrams implies that concentrations of the selected alloying elements (Sn, Zn, Pb) are normalized to 100% so that copper content is not taken into account. This has the disadvantage that, without care, objects that are almost pure copper will still be classified as an alloy (e.g., a bronze or brass) on the basis of very low concentrations of alloying elements. It is important, therefore, that an assessment of the data is made before producing these diagrams, for example by checking for and removing items produced in copper, lead or tin. The measurements should also be used as reference when interpreting the nature of any groups present, such as determining if certain elements were deliberately added or not. It must also be kept in mind that the alloy classification in these diagrams does not match the common alloy definitions based on absolute concentrations. A major benefit, however, is that large numbers of compositional analyses can be visualized in one diagram, so that trends in alloy choice can be identified for differing groups of artefacts. Bayley and Butcher (1995, 2004) have set a benchmark by showing how triangular diagrams of compositional variation (in their case determined by AAS analyses) are especially suitable for Roman fibulae, and how distinct groups are easily distinguished by eye. Pollard et al. (2015) applied a different approach for interpreting a large database of copper-alloy artefacts: they classified copper-alloy artefacts based on absolute concentrations, using a threshold of 1% to identify alloying elements deliberately added to the metal melt.
The present approach is different, partly out of necessity: a 1% threshold cannot be maintained in corrosion-affected measurement data. More important, however, is that a 'hard' a priori classification may not do justice to the actual groupings that, for example, become apparent in ternary graphs such as those published by Bayley and Butcher (1995, 2004). Visually assessing the degree of separation between groups, or recognizing mixing lines between end members, is a necessity for studying trends in alloy choice and is only possible by using such graphs. Bayley and Butcher (2004, 24) divided their ternary diagrams into alloy classes (Fig. 1, inset). These classes (e.g., brass, bronze and gunmetal) are of course associated with terminology employed in modern metallurgy. A hard classification between leaded and unleaded alloys is also not maintainable in corroded objects, so a qualitative judgement based on the construction limitations mentioned above is preferred (i.e., one typological group may systematically show a higher lead measurement than another). In the present diagrams, we indicate a simplified version of this classification (with brass (> 4% Zn) and bronze (≥ 3% Sn)) as background without imposing a rigid classification. Note that these diagrams do not include copper contents and are therefore not suitable for determining alloy properties. It is also important to understand that ancient names and their corresponding alloy ratios are not well understood (Bayley and Butcher 2004, 14). Attempts to impose rigid modern classifications would be unlikely to reflect historical boundaries, intentionally created or otherwise, by ancient craftsmen. Plotting large numbers of measurements in ternary diagrams and observing their distributions allows for a better understanding of historical boundaries and, therefore, technical choices.
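A check of this kind can be summarized per element and per reference sample as a mean, a relative standard deviation and a bias against the certified value. The sketch below is only an illustration under assumed inputs: the function name, the repeated readings and the certified value are hypothetical and are not taken from Table 1.

```python
from statistics import mean, stdev


def summarize_reference_check(readings, certified):
    """Summarize repeated readings of one element on one reference sample."""
    avg = mean(readings)
    sd = stdev(readings)  # sample standard deviation (n - 1 in the denominator)
    return {
        "mean": avg,
        "rsd_percent": 100.0 * sd / avg,                      # precision
        "bias_percent": 100.0 * (avg - certified) / certified,  # accuracy
    }


# Hypothetical repeated tin readings (wt%) on a reference sample with an
# assumed certified tin content of 10.0 wt%.
tin_readings = [9.8, 10.1, 10.0, 10.1]
summary = summarize_reference_check(tin_readings, certified=10.0)
```

Keeping precision (relative standard deviation) and accuracy (bias) as separate numbers makes it easier to see whether a deviating sample, such as a leaded alloy, is scattered or systematically offset.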
In the present study, we therefore only lightly touch on this classification, and put more confidence in the study of alloy distributions in ternary diagrams.
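The copper-free recalculation behind the ternary diagrams, together with the screening step for near-pure copper, can be sketched as follows. The function name and the `min_alloy_total` cut-off are hypothetical; no fixed numerical cut-off is proposed in the text, so the default used here is purely a placeholder.

```python
def ternary_fractions(sn, zn, pb, min_alloy_total=1.0):
    """Return (Sn, Zn, Pb) as percentages of their own sum.

    Returns None when the summed alloying elements fall below
    min_alloy_total (wt%), so that near-pure copper objects can be
    reviewed separately instead of being plotted as an alloy.
    The default cut-off is an illustrative assumption.
    """
    total = sn + zn + pb
    if total < min_alloy_total:
        return None
    return tuple(100.0 * value / total for value in (sn, zn, pb))


# Hypothetical normalized measurements in wt% (copper makes up the balance).
tin_rich = ternary_fractions(sn=10.0, zn=0.5, pb=2.0)
near_pure_copper = ternary_fractions(sn=0.2, zn=0.1, pb=0.1)
```

With these invented values the first object plots at 80% Sn, 4% Zn and 16% Pb, while the second is flagged for inspection against its absolute concentrations rather than plotted.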

Corrosion effects on pXRF analyses
A comparison of the corroded and uncorroded measurements on the Nijmegen brooch fragments is shown in Table 2 (for additional data, see Table S1). In general, we can see an average depletion in copper contents of 35% for the seven brooches alloyed with tin. For the two brooches having relatively high tin and zinc contents, the average depletion in copper was 18.5%, but only a small depletion of 0.7% zinc was observed. The remaining five brooches were alloyed with zinc and exhibited a zinc depletion of 9%.
The results show that decuprification and dezincification represented the main corrosion processes at work. Depletion in copper content was observed to be the most relevant change in bronzes and the leaching of zinc from brass objects was also common.
When plotted in a ternary diagram, the effects of corrosion and heterogeneity can be compared with the alloy groups (Fig. 2). Since a triangular diagram is based on a copper-free recalculation (so Sn + Pb + Zn = 100%), small variations of alloying elements in copper-rich alloys are magnified. The comparison shows that two measurements that are initially classified as bronze become gunmetal after cleaning. A further two measurements that initially fall into the gunmetal classification, with around 50% zinc, cross into the brass classification after cleaning, with a new value of 95% zinc, although the composition has not changed. There is also a leaded bronze measurement that sees a slight increase in lead and an associated decrease in tin. This may be due to the inhomogeneous distribution of lead in copper alloys. For the remaining nine items, the corrosion effects are limited in the sense that they still fall within the same broad compositional group. For our study, it is important to realize that the corrosion effects have not affected the overall grouping: the zinc-dominated alloys ('brass'), the tin-dominated alloys ('bronze') and the intermediate ones ('gunmetal') are still recognizable as distinct clusters, even though the 'brass' group shows more inter-artefact variation, especially those samples with a high copper (i.e., low alloying elements) content.
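The magnification effect can be made concrete with a small worked example. The values below are invented for illustration and are not measurements from Table 2; they only show how a shift of under 1 wt% in a copper-rich object can move a point from about 50% to about 95% zinc in the copper-free recalculation.

```python
def zinc_share(sn, zn, pb):
    """Zinc as a percentage of Sn + Zn + Pb (the copper-free recalculation)."""
    return 100.0 * zn / (sn + zn + pb)


# Hypothetical copper-rich object: only 2 wt% alloying elements in total.
corroded_surface = zinc_share(sn=1.0, zn=1.0, pb=0.0)  # about 50% zinc
cleaned_core = zinc_share(sn=0.1, zn=1.9, pb=0.0)      # about 95% zinc

# Absolute change in tin between the two hypothetical readings (wt%).
absolute_tin_shift = 1.0 - 0.1
```

The same absolute shift in an object with, say, 20 wt% alloying elements would move the plotted point far less, which is why copper-rich items need to be read against their absolute concentrations.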

Grouping the early Roman brooches
The Nauheim brooches appear to have only been made with tin, whereas the (later) Aucissa brooches are made with zinc (Fig. 3) (for the data set for the Nijmegen brooches, see also Table S1). This observation is in line with the brass alloys from Roman period brooches and military equipment found in Masada, Israel, as well as in Britain and other locations in Western Europe (Bishop and Coulston 1993, Ponting and Segal 1998, Bayley and Butcher 2004). The group of measurements for the Eye brooch series show that they are also zinc based, similar to the Aucissa brooches. In contrast, however, the measurements for the Almgren 15 wire brooch series reveal two clusters: one tin based, the other showing a mixing line from a zinc-based to a lead- and tin-based copper alloy. Comparing this with the Eye brooch and the Almgren 15 measurements from Britain (for the British results, see Fig. 3) shows a similar division in zinc- and tin-based alloys, but with a different distribution for the outliers, probably due to the corrosion processes. If these distributions are compared with the displacement values seen in Figure 2, the outliers in the Nijmegen results would be more in line with the British results once decuprification, dezincification or secondary copper X-ray absorption processes have been considered. Although outliers exist in all cases, it is possible to distinguish the core alloy properties when observed in large numbers. As expected, given that the one-piece brooches by necessity contain little or no lead (see the second section), the large proportion of measurements fall below the 20% lead line in the ternary graphs.

DISCUSSION
The tests on the CHARM reference set showed good levels of accuracy and precision under more ideal analytical circumstances. On corroded brooches, however, lack of sample preparation and non-ideal measurement conditions (e.g., not removing the corrosion layer or preparing a flat surface, and variations in the angle of the brooch to the sensor) do not prevent a broader classification of alloys into zinc- or tin-rich traditions. Within the limitations of this measurement method (e.g., no removal of the corrosion layer) we can still address relevant archaeological questions such as those mentioned above. The offsets between corroded and uncorroded compositions demonstrated that the corrosion effects do not preclude the recognition of separate alloy groups. A measurement taken from a corroded item subsequently classified, for example, as zinc based will not have come from a tin-based alloy.
The results for the Nijmegen study can therefore be compared with the trends found by Bayley and Butcher (1995, 2004). Although we measured the corrosion layer, it is clear that pXRF measurements on corroded brooches can be classified broadly into archaeologically relevant groups. Therefore, subsequent interpretations can be formed alongside those presented in more ideal circumstances. Furthermore, because the rapid and non-destructive technique allows for more items to be measured than with any other method (enabling a better definition of the core distribution for brooch types), it gives a better chronological resolution, allowing a closer view of the transition between types such as those objects changing from bronze to brass, or those staying in bronze during the early Roman period.

CONCLUSIONS
Systematic compositional differences for the major elements (tin, lead, zinc) are observed between the corroded surfaces and the uncorroded metal cores. However, the magnitude of these changes only becomes relevant when considering the specific research question. For questions needing simple alloy classification, achieved through identifying the compositions of large sets of artefacts, it has been shown that compositional ratios remain within a satisfactory tolerance. The results for Nijmegen showed a difference between pre-Roman and early Roman brooches, the former being produced in a tin-rich tradition and the latter in a zinc tradition. Furthermore, where two alloy traditions were present within a brooch series, it was noticed that the same was true of the same series measured in Britain.
It has been demonstrated, therefore, that data provided by an appropriate application of pXRF (perhaps best described as a survey or reconnaissance role) are reliable enough to detect the deliberate control of composition between typological, copper-alloy groups and that the data are comparable with earlier research.
Understanding more about how corrosion affects the results of non-destructive measurements is an important step forward in adopting an appropriate application of pXRF to archaeological copper alloys. Once an association between typology and alloy is identified using this approach, the next step would be to explore the results in more detail using far fewer, carefully selected examples under more rigorous laboratory conditions.