Which isotopes should we choose? Entropy-based feature ranking enables evaluation of the information content of stable isotopes in archaeofaunal material

Funding information: Archaeobiocenter of the Ludwig Maximilian University, Munich; German Research Foundation (DFG), Grant/Award Number: Gr959/15-1,2

Rationale: Methods for multi-isotope analyses are gaining importance in anthropological, archaeological, and ecological studies. However, when material is limited (e.g., archaeological remains), it is necessary to decide a priori which isotopic system(s) could be omitted without losing information.
Methods: We introduce a method that enables feature ranking of isotopic systems on the basis of distance-based entropy. The feature ranking method is evaluated using Gaussian Mixture Model (GMM) clustering as well as a cluster validation index ("trace index").
Results: Combinations of features resulting in high entropy values are less important than those resulting in low entropy values, which structure the dataset into more distinct clusters. This allows isotopic systems to be ranked. The isotope ranking depends on the analyzed dataset, for example, whether it consists of terrestrial mammals or fish. The feature ranking results were verified by cluster analysis.
Conclusions: Entropy-based feature ranking can be used to select a priori the isotopic systems that should be analyzed. We therefore strongly recommend applying this method when only limited material is available.


| Stable isotopes
Diet can also partly be assessed by oxygen stable isotope ratios, because water is incorporated with food. 1,2 This creates an additional link between different isotopic systems. Conversely, differences in diet can also be explained by a different origin or even a different ecological niche of an individual, a signal that is also contained in oxygen stable isotope ratios.
Furthermore, the oxygen isotope ratios of bone carbonate and phosphate are partly linked. They have similar sources; however, δ18O phosphate values are more influenced by the isotopic composition of drinking water, while δ18O carbonate values depend more on diet. 1 Nevertheless, both can give hints on water source, diet, and a variety of environmental information, for example, climate, altitude, and latitude. 3 Given these relationships, the question arises whether it is mandatory to measure all isotopic systems.
Multi-dimensional isotope analyses using modern data mining methods can interpret isotopic data in more detail than common bivariate analyses. [4][5][6][7] For example, cluster analysis of isotopic data of fish from Haithabu and Schleswig using Gaussian Mixture Model (GMM) clustering (see sections 1.3 and 2.3) not only separated fish according to their habitat (freshwater, brackish, marine) but also revealed a fourth cluster of probably non-local fish from a colder environment. These groups could not be detected in bivariate plots. 4 Therefore, it is advisable to use multidimensional isotopic data whenever possible. However, especially for archaeological material, it is not always possible to measure all isotopic systems because sample material is lacking. Furthermore, some isotopic systems might not be useful for answering the research question anyway. Consequently, isotopic systems must be selected with respect to the basic hypothesis to be tested. Which isotopes, then, could be omitted without too great a loss of information for the dataset?
Which isotopes should we choose? In this paper, we present entropy-based feature ranking of multi-isotope fingerprints obtained from archaeozoological finds from a particular, complex ecosystem (see section 2.1).
Therefore, the aim of this study was to evaluate the isotopic systems of different subsets in order to examine whether certain isotopic systems generally contain little (additional) information and consequently need not be analyzed if (archaeological) material is insufficient. Furthermore, certain combinations of isotopes may contain more information than others. We tested several subsets with different potential research questions (e.g., diet and non-local origin) in order to analyze which combinations of isotopes contained the most information and which isotopic systems could be omitted owing to their low information content.
A distance-based entropy measure (see below) was used to rank the isotopic systems as well as the different combinations of these systems, in order to evaluate how important (in terms of information content) the different stable isotopes were in general with respect to the basic research questions. These research questions might include, for example, the detection of primarily non-local individuals, of individuals of different cultural or social status and thus with different dietary habits, and of individuals from different habitats and ecosystems. Consequently, we expect the dataset to be separated into different groups, i.e., clusters, if the analyzed isotopic systems carry some information. A clustered data structure is an important prerequisite for the method described below.
In addition, features (measurable properties or characteristics of, e.g., an individual) are often correlated with each other.
This might affect the feature ranking results. Therefore, we investigated the impact of both marginal and partial correlations on feature ranking (see 2.4).
In the following, we introduce a distance-based entropy measure that allows differentiation between datasets with and without a clustering structure, without actually performing cluster analysis. It can be used to rank the different features (here: isotopic systems) of a dataset according to their information content.

| Feature ranking using entropy
The aim of feature ranking is to find the most important feature for a specific task. A variety of methods are available for feature ranking.
Recently, the Adjusted Rand Index (ARI) has been applied to multi-isotope data to test the relative contribution and importance of δ18O phosphate values for provenance analysis. [7][8][9] However, feature ranking can also be based on entropy. Entropy is a measure of the information, choice, and uncertainty of a certain variable. 10 The entropy value of a variable corresponds to its information content. 11 Shannon entropy $H$ is defined as

$$H = -K \sum_{i=1}^{n} p_i \log p_i \tag{1}$$

with constant $K$ ($K > 0$) and probabilities $p_i$. 10 However, in the present study we refer to a modified definition of entropy, namely a distance-based entropy measure. The probabilities of points, which are needed for Shannon's entropy (see Equation (1)), are usually not known. Therefore, a proxy method was applied to estimate the entropy, in which distances between data points are used instead of probabilities. 12 Entropy is not necessarily a probabilistic measure as in Shannon's basic definition. 10 A common data mining approach is to choose a distance-based entropy as a measure of information. [12][13][14][15][16] Distance-based entropy $H_d$ can be expressed as

$$H_d = -\sum_{i=1}^{N}\sum_{j=1}^{N}\left[D_{ij}\log D_{ij} + \left(1 - D_{ij}\right)\log\left(1 - D_{ij}\right)\right] \tag{2}$$

with the normalized distances $D_{ij} \in [0, 1]$ between instances $X_i$ and $X_j$ (see section 2.2 for more details) and the convention $0 \log 0 = 0$. This entropy measure makes it possible to distinguish between a dataset with clusters and a dataset lacking any clustering structure: if the dataset is not structured into clusters, entropy is much higher than in a clearly clustered dataset.
This can be explained as follows. Each term of Equation (2) vanishes for $D_{ij} = 0$ and $D_{ij} = 1$ and is largest for $D_{ij} = 0.5$. In a dataset containing clusters, intra-cluster distances are smaller than inter-cluster distances, so most normalized distances lie close to 0 or 1 and contribute little entropy; in an unstructured dataset, many distances take intermediate values, and the entropy is correspondingly high. Minimum distance-based entropy should thus define the optimal combination of features, 12 and the measure can therefore be used for feature ranking. It is important to mention that Equation (2) does not imply that probabilities (see Equation (1)) and distances are equivalent. However, this proxy method still yields an entropy measure that can distinguish between unstructured and well-structured datasets. 12
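As an illustration of Equations (1) and (2), the Python sketch below computes both entropies. It assumes that Equation (2) takes the binary-entropy form $H_d = -\sum_i\sum_j [D_{ij}\log D_{ij} + (1-D_{ij})\log(1-D_{ij})]$ with $0\log 0 = 0$. It also uses base-2 logarithms, since the base only rescales $H_d$ and leaves any ranking unchanged. The function names and example data are hypothetical and are not taken from the authors' R code (supporting information S2).

```python
import math
from typing import Sequence


def shannon_entropy(probs: Sequence[float], k: float = 1.0) -> float:
    """Equation (1): H = -K * sum(p_i * log2 p_i); terms with p_i = 0 are skipped."""
    return -k * sum(p * math.log2(p) for p in probs if p > 0)


def binary_term(d: float) -> float:
    """Entropy contribution of one normalized distance d in [0, 1].

    It is 0 at d = 0 and d = 1 and reaches its maximum (1 bit) at d = 0.5.
    """
    if d <= 0.0 or d >= 1.0:
        return 0.0
    return -(d * math.log2(d) + (1.0 - d) * math.log2(1.0 - d))


def distance_entropy(points: Sequence[Sequence[float]]) -> float:
    """Assumed form of Equation (2): sum of binary_term over all pairs (i, j)."""
    n = len(points)
    dist = [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]
    d_max = max(max(row) for row in dist)
    if d_max == 0.0:  # all points identical: no structure to measure
        return 0.0
    # The minimum distance is 0 (diagonal), so min-max scaling is division by d_max.
    return sum(binary_term(dist[i][j] / d_max) for i in range(n) for j in range(n))


# Hypothetical one-dimensional data: two tight clusters vs. evenly spread values.
clustered = [(x,) for x in (0.0, 0.1, 0.2, 10.0, 10.1, 10.2)]
spread = [(float(x),) for x in range(6)]
print(round(distance_entropy(clustered), 2), round(distance_entropy(spread), 2))
```

With these hypothetical data, the two tight clusters place almost all normalized distances near 0 or 1. As a result, `distance_entropy(clustered)` comes out well below `distance_entropy(spread)`.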

| Gaussian Mixture Model clustering
For an illustration of the feature ranking results, optimal combinations of features were visualized using Gaussian Mixture Model (GMM) clustering. (…) Isotopic data were normalized prior to the ranking to eliminate this influencing factor, using the following formula, which yields values between zero and one:

$$x_i' = \frac{x_i - \min(x)}{\max(x) - \min(x)} \tag{3}$$

Afterward, distance matrices $D_{ij}$ were computed as Euclidean distances between all (normalized) data points. The resulting distance matrices were again normalized according to Equation (3).
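The preprocessing steps just described can be sketched in Python: each isotopic system is scaled with Equation (3), Euclidean distances are computed, and the distance matrix is rescaled. The sketch assumes that Equation (3) is ordinary min-max scaling; the helper names and the example values are hypothetical.

```python
import math
from typing import List, Sequence


def min_max(values: Sequence[float]) -> List[float]:
    """Min-max scaling to [0, 1] (assumed form of Equation (3)); constants map to 0."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in values]


def normalized_distance_matrix(columns: Sequence[Sequence[float]]) -> List[List[float]]:
    """Scale each isotopic system, compute Euclidean D_ij, then rescale D to [0, 1].

    `columns` holds one equally long sequence of values per isotopic system.
    """
    scaled = [min_max(col) for col in columns]
    points = list(zip(*scaled))  # one tuple of scaled values per specimen
    n = len(points)
    flat = [math.dist(points[i], points[j]) for i in range(n) for j in range(n)]
    flat = min_max(flat)  # the distance matrix is normalized again
    return [flat[i * n:(i + 1) * n] for i in range(n)]


# Hypothetical values for two isotopic systems measured on three specimens.
d13c = [-21.0, -20.0, -19.0]
d15n = [5.0, 7.0, 9.0]
for row in normalized_distance_matrix([d13c, d15n]):
    print([round(d, 3) for d in row])
```

Scaling each system first keeps an isotope with a wide numeric range from dominating the Euclidean distances.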
As mentioned earlier, the entropy measure used in this study is not identical to Shannon's definition of entropy (see Equation (1)), but uses distances instead of probabilities (see Equation (2)).
The distance-based entropy measure used in this study is similar to that of Dash et al, 12 but with three main differences, which are explained in the following: First, Dash et al 12 (…) suggested a value of around 10, which seems to work well in their study. 12 The meeting point μ should help to differentiate more accurately between intra- and inter-cluster distances, since it might be difficult to distinguish intra- and inter-cluster distances if the distance is 0.5. The parameter μ was calculated as 0.185 based on parameter β. 12 Both parameters can help to correct the entropy measure. However, both μ and β must be estimated or set to a (subjective) value, which might introduce additional inaccuracy. Thus, we did not include these two parameters in our formula (Equation (2)).
The described procedure was performed for all possible combinations of isotopic systems. The number of possible combinations can be calculated by

$$\sum_{i=1}^{k}\binom{k}{i} = 2^k - 1 \tag{4}$$

where $k$ is the total number of isotopic systems.
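Here is a short Python sketch of the enumeration. It assumes that isotopic systems are labeled 1 to k, as in the combination labels used in this paper (e.g., "134"). It checks that the number of non-empty subsets equals $2^k - 1$, i.e., 15 combinations for four and 31 for five isotopic systems. The function names are hypothetical.

```python
from itertools import combinations
from math import comb
from typing import List, Tuple


def feature_combinations(k: int) -> List[Tuple[int, ...]]:
    """All non-empty subsets of isotopic systems 1..k, ordered by size."""
    systems = range(1, k + 1)
    return [c for size in systems for c in combinations(systems, size)]


def n_combinations(k: int) -> int:
    """Number of non-empty subsets: sum of binomial coefficients = 2**k - 1."""
    return sum(comb(k, size) for size in range(1, k + 1))


def label(combo: Tuple[int, ...]) -> str:
    """Label in the style used in this paper, e.g. (1, 3, 4) -> "134"."""
    return "".join(str(i) for i in combo)


for k in (4, 5):
    print(k, n_combinations(k), len(feature_combinations(k)))
print([label(c) for c in feature_combinations(4) if len(c) == 3])
```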
All statistical and data mining analyses were performed using the R software. 28 The R programming code used for entropy calculation is available in the supporting information S2.

| Gaussian Mixture Model clustering
Gaussian Mixture Model (GMM) clustering was performed using the package "mclust" version 5.3 within the statistical program "R". 17,28 To compare entropy-based feature ranking (see above) with the clustering results, multivariate outliers were removed from the data as described earlier. Furthermore, isotopic data were normalized according to Equation (3).

| Marginal and partial correlations
Correlations between isotopic systems may have an impact on feature ranking results. Two types of correlation were computed, namely marginal and partial correlations. The marginal correlation between two variables $x_i$ and $x_j$ is described by

$$r_{ij} = \frac{\sum_{n=1}^{N}\left(x_{in}-\bar{x}_i\right)\left(x_{jn}-\bar{x}_j\right)}{\sqrt{\sum_{n=1}^{N}\left(x_{in}-\bar{x}_i\right)^2\,\sum_{n=1}^{N}\left(x_{jn}-\bar{x}_j\right)^2}} \tag{5}$$

The partial correlation of variables $x_i$ and $x_j$ while controlling for $x_k$ can be calculated by the following equation:

$$r_{ij\cdot k} = \frac{r_{ij} - r_{ik}\,r_{jk}}{\sqrt{\left(1 - r_{ik}^2\right)\left(1 - r_{jk}^2\right)}} \tag{6}$$

Partial correlation describes the relationship between two (random) variables after removing the effect of all other (random) variables. Thus, partial correlation gives only the direct correlation between two variables, without the potential influence of another variable, whose effect has been removed. Accordingly, partial correlations might be of interest if variables are removed, as in this study.
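The marginal and first-order partial correlations can be sketched in Python as below, assuming that the marginal correlation is Pearson's r. The sketch covers only the case of one control variable. The R package "ppcor" used in this study computes partial correlations controlling for all remaining variables. The example data are hypothetical and are constructed so that x and y are related only through z.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Marginal correlation, assumed here to be Pearson's r."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_correlation(x: Sequence[float], y: Sequence[float],
                        z: Sequence[float]) -> float:
    """First-order partial correlation of x and y while controlling for z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical example: x and y both follow z, plus deviations unrelated to z.
z = [1.0, 2.0, 3.0, 4.0]
x = [a + e for a, e in zip(z, [1.0, -1.0, -1.0, 1.0])]
y = [a + e for a, e in zip(z, [-1.0, 3.0, -3.0, 1.0])]
print(round(pearson(x, y), 3), round(partial_correlation(x, y, z), 3))
```

In this constructed example, x and y show a positive marginal correlation, but their partial correlation given z is essentially zero. This is the situation in which removing one variable loses little information that the others do not already carry.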
Correlation analyses were conducted using R software. 28 The R package "ppcor" was used to calculate the partial correlation. 30

| METHOD EVALUATION
Application of the feature ranking method described earlier strongly depends on the research question that should be answered using stable isotope ratios. In our study, we illustrate three possible subsets with different research questions and varying numbers of isotopic systems to demonstrate the effect of isotope ranking using entropy.
The entropy values differ between the datasets using four (dataset I) and five (dataset II) dimensions due to the different species included in the dataset as well as differences in sample numbers (see section 2.1).
Furthermore, we must point out that it is not possible to rank features across dimensions, as distance-based entropy mathematically increases with an increasing number of dimensions.
Thus, higher entropy values detected, for example, in the two-dimensional subset than in the one-dimensional subset are mathematical artifacts, since more distances are computed in the two-dimensional subset.

| Evaluation of the feature ranking method
The applied feature ranking method (without preliminary outlier removal) was tested using different artificially generated test sets (T1-T4) with certain properties as illustrated in Figure S1 and variable without any clustering structure (C4) ( Figure S1 and Table S1, supporting information S1).
When the feature ranking method was applied to the four test sets, the following results were obtained. For test set T1, where a single outlier was present in variable B1, ranking indeed resulted in the lowest entropy values for B1. This was clearly caused by the outlier, which is the only difference between variables A1 and B1 (Figure S2A, supporting information S1).
Consequently, the exclusion of (multivariate) outliers, as described in the Experimental section, is recommended. After removal of the outlier, entropy revealed no difference between variables A1 and B1 (not shown in this study). Test set T2, which includes the same single outlier in variables A2 and B2 but additionally groups variable C2 into two clusters, showed that an outlier (at least a single outlier value) does not affect feature ranking in the presence of a clearly structured variable: variable C2 exhibited the lowest entropy values (Figure S2B, supporting information S1).
Nevertheless, outlier exclusion might be recommended. In the third test set (T3), with no clustering structure for variable C3 but three clusters in both A3 and B3 (more clearly structured in the latter), the lowest entropy value was found for the well-structured variable B3 as well as for the combination of variables A3 and B3.
Variable C3 showed a high entropy value because of its unstructured ("chaotic") distribution (Figure S2C, supporting information S1). Test set T4 was used to evaluate the feature ranking method when the number of clusters in the dataset differed between variables. As expected, the separation into three clusters (B4) was favored over a separation into two groups (A4), resulting in low entropy. Variable C4, lacking any clustering structure, again showed a high entropy value. As for test set T3, the combination of variables A4 and B4 showed the lowest entropy values in the two-dimensional case (Figure S2D, supporting information S1).
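The test-set logic above can be reproduced in miniature by scoring every feature combination with the distance-based entropy and sorting the combinations within each dimension. The Python sketch below is self-contained. It assumes the binary-entropy form of Equation (2) and min-max scaling for Equation (3), and it uses a hypothetical three-variable test set rather than the data of Table S1.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def min_max(values: Sequence[float]) -> List[float]:
    lo, hi = min(values), max(values)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in values]


def binary_term(d: float) -> float:
    if d <= 0.0 or d >= 1.0:
        return 0.0
    return -(d * math.log2(d) + (1.0 - d) * math.log2(1.0 - d))


def entropy_of(columns: Sequence[Sequence[float]]) -> float:
    """Distance-based entropy of the specimens described by the given columns."""
    points = list(zip(*(min_max(col) for col in columns)))
    n = len(points)
    dists = [math.dist(points[i], points[j]) for i in range(n) for j in range(n)]
    return sum(binary_term(d) for d in min_max(dists))


def rank_features(data: Dict[str, Sequence[float]]) -> Dict[int, List[Tuple[str, float]]]:
    """For each dimension, list feature combinations from lowest to highest entropy.

    Rankings are compared only within one dimension, since entropy grows with it.
    """
    names = sorted(data)
    ranking = {}
    for size in range(1, len(names) + 1):
        scored = [("".join(combo), entropy_of([data[name] for name in combo]))
                  for combo in combinations(names, size)]
        ranking[size] = sorted(scored, key=lambda item: item[1])
    return ranking


# Hypothetical test set in the spirit of T1-T4 (not the published Table S1 data).
test_set = {
    "A": [0.0, 0.1, 0.2, 0.3, 9.7, 9.8, 9.9, 10.0],  # two tight clusters
    "B": [0.0, 1.0, 2.0, 3.0, 7.0, 8.0, 9.0, 10.0],  # two loose clusters
    "C": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0],   # no clustering structure
}
for size, entries in rank_features(test_set).items():
    print(size, [(name, round(h, 2)) for name, h in entries])
```

On these hypothetical data, the tightly clustered variable A ranks first and the unstructured variable C ranks last among the single features. This mirrors the pattern reported for T3 and T4.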
For all tested combinations, feature ranking gave the expected result. As already mentioned, we clearly recommend removing (multivariate) outliers prior to feature ranking to avoid biased results.

| Composition of datasets: an evaluation
For each of the eight combinations of herbivores, carnivores, and omnivores tested in this study (evaluation sets A-H, Table S2, supporting information S1), ten sample sets of a given sample size (n = 40-80; see Table S2, supporting information S1) and of a given absolute ratio of herbivores, carnivores, and omnivores (see Table S2, supporting information S1) were randomly drawn from the whole dataset I. To avoid bias, multivariate outliers were removed from each subset (herbivores, carnivores, and omnivores) separately. After the removal of outliers, the whole evaluation dataset (n = 92) consisted of a total of 55 herbivores, 21 carnivores, and 16 omnivores. The distance-based entropy was calculated for each feature combination as described earlier.
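The resampling design of this evaluation could be organized as below. It draws ten random sample sets with fixed numbers of herbivores, carnivores, and omnivores, ranks each set, and keeps the most frequent ranking. In this Python sketch, the ranking function is passed in as a parameter, so any scorer (such as the distance-based entropy) can be plugged in. The `toy_rank` stand-in, the group names, the counts, and the seeds are hypothetical and serve only to make the example runnable.

```python
import random
import statistics
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

Specimen = Dict[str, float]  # one specimen: isotopic system label -> value
Ranking = Tuple[str, ...]    # feature labels ordered from best to worst


def draw_sample_set(groups: Dict[str, Sequence[Specimen]], counts: Dict[str, int],
                    rng: random.Random) -> List[Specimen]:
    """Draw a fixed number of specimens, without replacement, from each diet group."""
    drawn: List[Specimen] = []
    for group, k in counts.items():
        drawn.extend(rng.sample(list(groups[group]), k))
    return drawn


def most_frequent_ranking(groups: Dict[str, Sequence[Specimen]], counts: Dict[str, int],
                          rank_fn: Callable[[List[Specimen]], Ranking],
                          runs: int = 10, seed: int = 0) -> Tuple[Ranking, int]:
    """Rank each of `runs` random sample sets; return the modal ranking and its count."""
    rng = random.Random(seed)
    tally = Counter(rank_fn(draw_sample_set(groups, counts, rng)) for _ in range(runs))
    return tally.most_common(1)[0]


def toy_rank(specimens: List[Specimen]) -> Ranking:
    """Stand-in scorer for illustration only: larger variance first (not entropy)."""
    keys = sorted(specimens[0])
    return tuple(sorted(keys, key=lambda k: -statistics.pvariance([s[k] for s in specimens])))


# Hypothetical pool sized like the cleaned evaluation dataset (55/21/16 specimens).
src = random.Random(42)
pool = {group: [{"1": src.gauss(0.0, 1.0), "2": src.gauss(0.0, 3.0)} for _ in range(size)]
        for group, size in (("herbivore", 55), ("carnivore", 21), ("omnivore", 16))}
counts = {"herbivore": 30, "carnivore": 10, "omnivore": 10}
print(most_frequent_ranking(pool, counts, toy_rank))
```

The returned count shows how stable the ranking is across draws. A low count would point to the sample-size sensitivity discussed for the smaller evaluation sets.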
Several subsets, including different proportions of herbivores, carnivores, and omnivores, were tested as shown in Table S2 (supporting information S1). The feature ranking result shown in Tables S2 and S3 (supporting information S1) was the most frequent result of the ten runs conducted on the ten sample sets tested in each dimension. Feature ranking showed some variability as a consequence of the varying ratio of herbivorous, carnivorous, and omnivorous terrestrial mammals (Table S2, supporting information S1).
In the two-dimensional case (Table S3,  values, and thus the lowest information content, in all but one (E) set. This seems to be in good accordance with the worst entropy results using one isotopic system only (Table S3, supporting information S1).
Removing a single isotopic system, consequently leading to a three-dimensional dataset, resulted in three different optimal combinations, depending on the proportion of herbivores, carnivores, and omnivores, namely "123," "134," and "234." Interestingly, the combination of δ 13 C collagen , δ 13

| Correlation between the isotopic systems
The results revealed by correlation analysis are shown in Tables S5 and S6 (supporting information S1) for datasets I and II, respectively.
The different subsets chosen for feature ranking (see below) showed several significant correlations between the isotopic systems for both marginal and partial correlations.
No overall pattern could be detected with respect to correlation.
In the present study only the δ 13 C collagen and δ 15 (Tables S5   and S6, supporting information S1).
The correlations detected between the isotopic systems might play an important role for feature ranking (see below).

| Terrestrial mammals
If  Table S7 (supporting information S1). The proportion of herbivorous, carnivorous, and omnivorous mammals was quite similar in the analyzed datasets I and II (Table S7, supporting information S1); therefore, we might expect similar feature ranking results. In both datasets, the proportion of herbivores was markedly higher than that of carnivores or omnivores (Table S7, supporting information S1). Furthermore, the proportions of herbivores, carnivores, and omnivores were quite similar to those in our evaluation set C (see section 3.2; Table S2, supporting information S1). In addition, we expect a separation of the isotopic data with respect to diet, because herbivores, carnivores, and omnivores are included in this subset.
The feature ranking results of the terrestrial mammals are illustrated in Figure 1 and  (Table S9, supporting information S1).
Regarding dataset II, clustering without δ18O carbonate values ("1235") resulted in three clusters instead of the four obtained in the five-isotope ("12345") scenario (Figures S5 and S6, Table S10, supporting information S1). Consequently, in this case at least some information loss can be observed.
These findings were in accordance with the cluster validation results using the trace index with an optimal trace index for exactly those combinations of isotopic systems resulting in optimal (minimum) entropy values (see Table 1). Therefore, we expect that combinations of isotopic dimensions exhibiting the lowest entropy values also result in good clusters due to their optimal trace index as demonstrated by GMM clustering.

| Fish
The fish subset (n = 46) showed low entropy values for δ13C collagen values alone as well as for combinations of isotopic systems including collagen carbon isotope ratios (Figure 3 and Table 3). As for the herbivorous subset, combinations with δ15N collagen values resulted in rather high entropy values, probably due to the low information content of this isotopic system. Among combinations of more dimensions, especially the combination of δ13C carbonate and δ18O carbonate values exhibited a low entropy value (Figure 3 and Table 3). According to the feature ranking results, the removal of δ15N collagen values should cause a rather small loss of information (Figure 3 and Table 3).
GMM clustering of all isotopic dimensions and clustering without δ15N collagen values ("134") yielded identical results, each with two relatively distinct clusters, with the exception of only two individuals (perch 48FB5Pop, pike 10H1C; Table S12, supporting information S1).
A previous cluster analysis (without data normalization) of the fish dataset from Haithabu and Schleswig revealed an optimal number of four clusters, namely a freshwater cluster (cluster 3), a brackish water cluster (cluster 4), and two marine clusters (clusters 1 and 2; see Table S13, supporting information S1). 4 Freshwater, brackish, and marine clusters were mainly separated from each other due to their  Figure 3 and Table 3), was very similar to the clustering of the whole dataset without prior normalization of the data. 4 This was especially conspicuous for the cluster including probably non-local fish (cluster 1 in case of "34"), which was the most important cluster when the task was to detect primarily non-local individuals (Table S13, supporting information S1; see Göhring et al. 4 ). Cluster 1 differs from the previous cluster 1 by only three individuals. Two individuals (cod 1D1V and cod 4D4V) were previously grouped into the marine cluster 2, and another cod (cod 42D4V) was previously grouped into cluster 1. However, when clustered using the (normalized) carbonate fraction only ("34"), this individual was grouped into the marine cluster (cluster 3 in case of "34"; Table S13, supporting information). Cluster 2 combined the previous freshwater cluster (cluster 3 in the case of the not-normalized "1234") and parts of the brackish water cluster 4. The remaining brackish water individuals were grouped into the fourth cluster in the case of "34" (Table S13,  ). The differences between the two cluster analyses are rather small, probably as an effect of data normalization. As already mentioned, clustering with the carbonate fraction only ("34") revealed the cluster of probably non-local fish, which can be considered one of the main goals of Göhring et al. 4 The trace index coincided with the ranking results only in the one-dimensional case. 
For both two-and three-dimensional feature combinations, the trace index was equal to the second-best feature ranking result ("13" and "123"; Figure 3 and Table 3).

| Evaluation of the feature ranking method
The evaluation procedure using isotopic data demonstrated that the ranking procedure is, logically, dependent on the composition of the  (Table S2, supporting information S1).
Thus, the shift in the ranking could also be caused by the rather small sample sizes. However, test set E had a similar (but slightly larger) sample size (n = 52), and the feature ranking method still showed the lowest entropy values for δ15N collagen (Table S2, supporting information S1). Nevertheless, it is advisable to apply the feature ranking method to larger datasets to obtain trustworthy results.
Entropy-based feature ranking was compared with the trace index, a cluster validation index. Clustering should be optimal when the trace index is maximal. Moreover, an optimal clustering, in the sense of clearly structured data points, should result in the lowest entropy values. Thus, we expect similar results for ranking and validation.
The optimal feature combinations according to the trace index were identical to those from entropy-based ranking for the subset of terrestrial mammals (Table 1). Some variations were present in the subsets of herbivores and fish (see Tables 2 and 3). These differences can be explained as follows: the entropy values of feature combinations "123" and "234" in dataset I of the subset of terrestrial herbivores were almost identical, with slightly better results for "123" (see Figure 2A). However, the removal of δ18O carbonate values ("123") did not result in the optimal clustering structure according to the trace index. Similarly, for the fish subset, feature combinations "34" and "134" were classified as optimal with respect to their entropy values. However, the combinations "13" and "123" also resulted in relatively low entropy values, indicating a quite well-structured dataset; they were the second-best ranking results for the two- and three-dimensional sets, respectively.
Indeed, both "13" and "123" were optimal according to the trace index. In addition, herbivores showed differences in the two-dimensional ranking of both datasets I and II. While both these combinations differed considerably from the optimal combination with respect to entropy, they showed relatively similar results for the trace index (not shown in this study). This could explain the divergences between entropy-based feature ranking and the trace index.

| Terrestrial mammals
For the subset including all terrestrial mammals, entropy-based feature ranking pointed towards relatively high information content, especially as regards both δ 13 C collagen and δ 15 N collagen values. Since the carbon and nitrogen isotope ratios of bone collagen are related to the protein part of the diet, it comes as no surprise that a dataset including a mixture of herbivores, carnivores, and omnivores (Table S7, supporting information S1) can be best separated according to these isotopic systems.
On the contrary, the information provided by δ 18 O carbonate values was not sufficient, resulting in quite high entropy values ( Figure 1 and

| Herbivorous mammals
Terrestrial herbivores were best separated by δ 13 C carbonate values and their combinations with other isotopic systems. The information contained in δ 13 C collagen and δ 15 N collagen values (see above) was no longer very important in a subset consisting of herbivores only.
δ18O carbonate values also seem to play a minor role for the herbivorous subset, resulting in relatively high entropy values (Figure 2 and Table 2). However, even in this case, this does not mean that the isotopic system is meaningless. Indeed, GMM clustering showed a loss of information when δ18O carbonate values were removed from dataset I ("123") compared with the complete dataset (Figures S7 and S8, Table S10, supporting information). This confirms that the feature suggested for removal by the entropy measure is not a meaningless isotopic system. Rather, it must be understood as the feature whose removal from the dataset would cause the smallest loss of information.
It is important to mention that feature combination "123" (dataset I) and "1235" (dataset II) exhibit only slightly lower entropy values than the "234" (dataset I) and "2345" (dataset II), respectively.  Figure 2 and Table 2). Consequently, here the decision on the removal of an isotopic system also relies on the underlying scientific question to be solved.

| Fish
Entropy-based feature ranking showed that δ13C collagen values, as well as combinations of especially δ18O carbonate and δ13C carbonate values, are relatively important in the fish subset (Figure 3 and Table 3). Differentiation between marine and freshwater fish is, among others, possible using δ13C collagen values. 33 However, the information content of δ15N collagen values was comparatively low.
Comparison of the GMM clustering of the total fish data ("1234") and the dataset without δ 15 N collagen ("134") showed almost no differences, with the exception of the clustering results of two single individuals (Figures S11 and S12, Table S12, supporting information).
Therefore, the exclusion of δ 15 N collagen data would result in almost no information loss.
Since δ13C collagen and δ15N collagen data are usually generated in parallel, the exclusion of both dimensions together must also be investigated.
This might be necessary if collagen is not (well) preserved in fish remains. Indeed, clustering with the carbonate fraction only ("34") led to the detection of four (compared with two) distinct clusters ( Figure S13,

| Correlation between isotopic systems
Depending on the subset, different isotopic systems resulted in relatively high entropy values and could, consequently, be removed from a dataset without too much loss of information. Indeed, a relationship can be detected between the least informative isotopic system in each subset and the correlations between the isotopic systems (Tables S5 and S6, supporting information). For all subsets examined in this study, the isotopic system proposed for omission by entropy-based feature ranking was linked to at least two other isotopic systems by a significant correlation. At least one of these relationships also showed a significant partial correlation (Tables S5 and S6, supporting information).
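The pattern just described can be written down as a simple screening rule. The Python sketch below assumes that significance has already been assessed elsewhere, for instance with `cor.test` in R and the "ppcor" package, and it only counts significant links. The default thresholds reflect the pattern observed in this study (at least two marginal and one partial correlation), not a validated general rule. All input pairs are hypothetical.

```python
from typing import Iterable, List, Sequence, Tuple

Pair = Tuple[str, str]  # two isotopic systems with a significant correlation


def n_links(system: str, pairs: Iterable[Pair]) -> int:
    """Number of significant correlations in which `system` takes part."""
    return sum(1 for a, b in pairs if system in (a, b))


def removal_candidates(systems: Sequence[str], marginal_sig: Sequence[Pair],
                       partial_sig: Sequence[Pair], min_marginal: int = 2,
                       min_partial: int = 1) -> List[str]:
    """Systems with >= min_marginal significant marginal and >= min_partial
    significant partial correlations (the pattern observed in this study)."""
    return [s for s in systems
            if n_links(s, marginal_sig) >= min_marginal
            and n_links(s, partial_sig) >= min_partial]


# Hypothetical significance results for four isotopic systems.
marginal = [("1", "2"), ("2", "3"), ("2", "4"), ("3", "4")]
partial = [("2", "4")]
print(removal_candidates(["1", "2", "3", "4"], marginal, partial))
```

Such a screen only suggests candidates. The entropy ranking and the research question still decide which system, if any, is dropped.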

| Applicability of the entropy-based feature ranking method
In order to reduce costs or sample material needed for isotope analysis, isotopic systems extracted and analyzed together (e.g., δ 13 C collagen and δ 15 N collagen , δ 13 C carbonate and δ 18 O carbonate ) must also be excluded from the analysis together. This is also considered in the fish subset (see above This is because the entropy would be lowest for a well-structured dataset with at least two clusters, but high for an unstructured dataset lacking any clusters. However, as mentioned before, many research questions related to stable isotope analyses aim to detect primarily non-local individuals, individuals with different diet or status, and individuals inhabiting different habitats or ecosystems. This would result in datasets with two or more clusters. Thus, entropy-based feature ranking is a valuable tool to validate the information content of different isotopic systems and to choose the feature combination with the highest information content if one or more isotopic systems must be excluded from analysis.
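The constraint that jointly extracted isotopic systems are kept or dropped together can be applied before ranking by filtering the candidate combinations. In the Python sketch below, the mapping of labels to isotopic systems (1 = δ13C collagen, 2 = δ15N collagen, 3 = δ13C carbonate, 4 = δ18O carbonate) is inferred from combinations discussed above, such as "134" (without δ15N collagen) and "34" (carbonate fraction only). It should be checked against the supporting information. The function names are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Set, Tuple


def admissible(combo: Sequence[str], linked: Sequence[Set[str]]) -> bool:
    """True if every linked group is either fully kept or fully removed."""
    chosen = set(combo)
    return all(group <= chosen or not (group & chosen) for group in linked)


def admissible_combinations(systems: Sequence[str],
                            linked: Sequence[Set[str]]) -> List[Tuple[str, ...]]:
    """All non-empty feature combinations that respect the linked groups."""
    return [combo for size in range(1, len(systems) + 1)
            for combo in combinations(systems, size) if admissible(combo, linked)]


# Assumed labels: collagen (1, 2) and carbonate (3, 4) are extracted together.
linked = [{"1", "2"}, {"3", "4"}]
print(["".join(c) for c in admissible_combinations(["1", "2", "3", "4"], linked)])
```

With both pairs linked, only the collagen fraction, the carbonate fraction, or all four systems remain as choices. This is why the fish subset was also evaluated with the carbonate fraction only.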

| CONCLUSIONS
Whenever possible, a multi-isotope approach should be preferred. As outlined in the Introduction section, multi-isotope data analyses will be an integral part of future isotope studies. New data mining methods are therefore needed to analyze isotopic datasets.
However, especially in archaeological studies, the material available for stable isotope analyses (e.g., bones, teeth, and hair) is often limited, or certain skeletal components are insufficiently preserved. Therefore, it is of particular importance to decide which isotopic system(s) could be omitted without losing too much information. Entropy-based feature ranking offers a feasible and objective method to rank isotopes as well as combinations of isotopes and to select the isotopic systems that are most important for answering an underlying research question. Isotopic systems that are less important according to entropy-based feature ranking of a pre-test subset could be excluded from analyses to reduce the required material. In addition, this method can also be applied to modern specimens, reducing the amount of sample material (e.g., blood) needed from live animals.
Our study showed that it is not possible to rank different isotopic dimensions in general without considering the dataset. Feature ranking clearly depends on the composition of the dataset with respect to, for example, species- and diet-specific peculiarities.
Terrestrial mammals, for example, showed a different ranking from herbivorous mammals alone, or from fish. While δ18O carbonate values showed low information content in the subsets of both terrestrial mammals and herbivorous mammals, the isotopic system exhibiting the lowest information content with respect to entropy in fish was collagen nitrogen: δ15N collagen values could have been excluded, although this in turn was one of the most important isotopic systems for terrestrial mammals. Consequently, a general exclusion of a certain isotopic system could be highly erroneous. Nevertheless, multi-isotope analyses on a small (representative) pre-test subset will allow ranking of the isotopic systems of the whole material under study.
The material needed for analyzing the remaining majority of the sample can consequently be reduced based on the feature ranking results of the pre-test. The present study can serve as a first hint when investigating different groups of animals. In addition, even the multi-isotope data of other studies can be used for a pilot ranking, provided that the investigated sample material is similar with respect to, for example, species, and that the research hypotheses are comparable.
In addition, we detected a relationship between the outcome of the entropy-based feature ranking and the correlations, both marginal and partial, between isotopic systems. Apparently, an isotopic system can be removed from a dataset without too much loss of information if it is sufficiently correlated with other isotopic systems (here: at least two marginal and one partial correlation). The information contained in the removed system is then still at least partly present in the remaining isotopic systems, and its removal causes no (or only a minor) loss of information. Consequently, correlations might also serve as a first indication when deciding on the removal of an isotopic system. We recommend using the described feature ranking method where no or only few data are available. However, a small (representative) subset of the collected material of a site should be analyzed as a pre-test, and the tested isotopic systems should be ranked according to our method. These ranking results can then be applied to the remaining majority of the collected material. Site-specific differences in stable isotopes are probably also reflected in the feature ranking results. Further knowledge of feature ranking results from other sites is needed to detect potential general patterns in the isotopic data. This would enable researchers to decide a priori on the set of isotopic systems that should be analyzed to gain as much information as possible when the available study material is limited. In addition, the removal of an isotopic system should certainly also be in accordance with the research question. Furthermore, it is important to emphasize that the described method also allows detection of the second-best combination of isotopic systems with respect to the entropy measure if, for example, gelatine could not be extracted.
Consequently, entropy-based feature ranking can help to support the clustering of isotope ratios even when one or more isotopic systems are not available for analysis.