Headline Environmental Indicators Revisited with the Global Multi-Regional Input-Output Database EXIOBASE

Environmentally extended multiregion input-output (EEMRIO) databases are used to quantify numerous environmental pressures and impacts from a consumption perspective. However, for targeted communication with decision makers, large sets of impact indicators are unfavorable. Small sets of headline indicators have been proposed to guide environmental policy, but these may not cover all relevant aspects of environmental impact. The aim of our study was to evaluate the extent to which a set of four headline indicators (material, land, water, and carbon) is representative of the total environmental impact embedded in an EEMRIO database. We also used principal component analysis combined with linear regression to investigate which environmental indicators are good candidates to supplement this headline indicator set, using 119 environmental indicators linked to the EEMRIO database, EXIOBASE. We found that the four headline indicators covered 59.9% of the variance in product-region rankings among environmental indicators, with carbon and land already explaining 57.4%. Five additional environmental indicators (marine eco-toxicity, terrestrial eco-toxicity, photochemical oxidation, terrestrial acidification, and eutrophication) were needed to cover 95% of the variance. In comparison, a statistically optimal set of seven indicators explained 95% of the variance as well. Our findings imply that there is (1) a significant statistical redundancy in the four headline indicators, and (2) a considerable share of the variance is caused by other environmental impacts not covered by the headline indicators. The results of our study can be used to further optimize the set of headline indicators for environmental policy.

While using a set of multiple complementary indicators is helpful to cover all relevant aspects of environmental impact, it is considered unfeasible and also unnecessary to base policy decisions on dozens of indicators simultaneously. In response to the potential overload of environmental indicators, small sets of (resource) indicators, called headline or dashboard indicators, have been proposed to serve as a basis for environmental policy (e.g., Galli et al. 2012; European Resource Efficiency Platform 2014). However, the smaller the proposed headline set of indicators, the higher the chance that the set is not representative of all relevant impact pathways. Various authors evaluated the usefulness of the cumulative energy demand (CED) or the carbon footprint as proxy indicator for environmental damage (e.g., Huijbregts et al. 2010; Röös et al. 2013; Kalbar et al. 2017; Simas et al. 2017). While relatively high correlations are found for most metrics of environmental damage, there are also impact categories (such as freshwater eco-toxicity) for which neither CED nor the carbon footprint is a good proxy. Focusing on just one indicator of impact clearly does not cover all relevant aspects of environmental impact. Systematic searches for an optimal set of indicators based on correlations between indicators were performed by Berger and Finkbeiner (2011) for indicators of resource scarcity and by Lasvaux and colleagues (2016) for a large number of indicators used to quantify environmental impacts of the building sector. Lasvaux and colleagues (2016) showed that five dimensions of environmental impact related to the building sector should be covered, namely fossil energy consumption, eco-toxicity, ionizing radiation and ozone depletion, land use, and mineral depletion.
Eurostat (2017) proposes to use a headline set consisting of the material, land, water, and carbon footprints. Recently, Steinmann and colleagues (2016) applied an indicator reduction procedure and showed that a headline set of four resource footprint indicators (energy, water, land, and material) together accounted for 84% of the variance in product impact rankings on 161 indicators of impact for a set of nearly 1,000 different commodities. It is crucial that a small set of headline indicators proposed for policy making covers all relevant types of environmental impact. This is also acknowledged by the European Commission (EC), stating that the headline set of indicators can be supplemented with more specific thematic indicators if necessary (Eurostat 2017). Since the number of potential thematic indicators is high, it is important to find an optimal balance between simplicity and exhaustiveness.
The goal of this study was to reveal the extent to which a set of four headline indicators (material, land, water, and carbon footprints) is representative of the total environmental impact embedded in the EXIOBASE database. We also investigated with a statistical analysis which environmental indicators are good candidates to supplement this headline indicator set. Finally, we applied our methodology on the full set of indicators to find an optimal set of indicators from a purely numerical point of view.

EXIOBASE
While a number of EEMRIO models are available (see Tukker and Dietzenbacher 2013), we used the EEMRIO EXIOBASE (base year 2011; version 3.2.4) in this study because of the relatively large coverage of emission and resource types (Wood et al. 2015). EXIOBASE includes 200 products with a relatively large amount of detail (table 1). For example, agricultural production is broken down into 15 product groups based on different livestock species and different crop types, which have dissimilar environmental impact. These 15 product groups are followed down the supply chain into 12 more groups, which include manufactured products related to food. Energy commodities are likewise detailed in EXIOBASE, with 69 types of energy carriers distinguished, based on International Energy Agency (IEA) energy balances (IEA 2012), and including the disaggregation of the electricity generation sector into 12 types of electricity producers. Further detail is included in the mining sector (11 types of ores and quarrying) and the manufacturing sector, which includes 42 types of manufacturing products in addition to the manufactured energy and food products previously identified. Not all countries produce all products.
In terms of environmental pressures, EXIOBASE records emissions to air and water, as well as land use, material extraction, and water use. Water accounts are provided for both blue water and green water and in terms of water consumption and withdrawal. Material accounts are provided in terms of energy content and mass of both used and unused extraction. Unused extractions form the part of the extraction that does not enter the economic system (e.g., excavated soil with no further economic use during building activities). Land accounts are broken down by activity (e.g., forestry vs. pasture). Air emissions accounts are provided for 27 substances, broken down by source (combustion, noncombustion, agricultural, and waste). All GHG emission categories are covered except emissions from land use, land-use change, and forestry. In addition, agricultural emissions of nitrogen and phosphorus to water are included (see Stadler et al. [2017] for a more extensive description of EXIOBASE 3).
Calculation of the environmental multipliers by product follows the standard demand model of Leontief (1966), where environmental pressures per million euro Q are calculated via
$$Q = S (I - A)^{-1}$$
where the vector S denotes the environmental stressors (e.g., land, water, and material accounts) per unit output of each product-region combination; the matrix A denotes the direct coefficients representing the global economic structure (I is an identity matrix of appropriate size). A full description of the variables and derivation of the Leontief calculation is available in various references (e.g., Miller and Blair 2009). Q is known as the multiplier matrix in I-O economics, and in this study it denotes the effect in terms of environmental pressure that is generated for each unit of final demand. It corresponds to the "system process" in life cycle assessment (LCA).
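The Leontief calculation described above, in its standard form Q = S (I - A)^-1, can be illustrated with a minimal sketch. The three-product economy, the two stressor rows, and all numbers below are invented for illustration and are not EXIOBASE data; the function name is hypothetical.

```python
import numpy as np

# Toy economy with 3 product-region combinations (all values invented).
# A[i, j]: input from product i needed per unit output of product j.
A = np.array([
    [0.10, 0.20, 0.00],
    [0.05, 0.10, 0.30],
    [0.00, 0.10, 0.05],
])

# S[k, j]: direct stressor k per unit output of product j.
S = np.array([
    [2.0, 0.5, 0.1],   # e.g., land use per million euro of output
    [0.3, 1.5, 0.2],   # e.g., blue water consumption per million euro of output
])

def environmental_multipliers(S, A):
    """Return Q = S (I - A)^-1 without forming the inverse explicitly."""
    identity = np.eye(A.shape[0])
    # Solving (I - A)^T Q^T = S^T is equivalent to Q = S (I - A)^-1.
    return np.linalg.solve((identity - A).T, S.T).T

# Q[k, j]: total (direct plus upstream) stressor k per unit final demand of j.
Q = environmental_multipliers(S, A)
```

Because the Leontief inverse adds all upstream rounds of production to the direct requirements, every entry of Q in this toy example is at least as large as the corresponding entry of S.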

Environmental Impact Indicators
To allow for a meaningful comparison across studies, we use the same set of indicators as Steinmann and colleagues (2016). This set includes indicators from all major life cycle impact assessment (LCIA) methods (CML 2001, Ecoindicator 99, Ecological Scarcity 2013, EDIP 2003, EPS 2000, Impact 2002, ReCiPe 2008, and Tool for Reduction and Assessment of Chemicals and Other Environmental Impacts [TRACI]) as well as the resource-based indicators (land, water, material, and energy). Only the latest version of each impact assessment method was included. The impact assessment methods include so-called midpoint and endpoint indicators. Midpoint indicators are used to quantify the impact for a single impact category, such as acidification or global warming, whereas endpoint indicators are more comprehensive indicators of damage, which include multiple impact categories to come to an impact in terms of overall ecosystem damage, overall human health damage, or even a combined score of human health and ecosystem impacts. Per assessment method, we included all midpoint indicators as well as endpoint indicators related to damage to ecosystems or human health. We excluded indicators reflecting resource scarcity because of a lack of adequate input data. For example, total amounts of extracted and used ore are available, but the amount of metal present in that ore cannot be (directly) calculated from those amounts. This means that the impact on metal scarcity cannot be meaningfully calculated. We used characterization factors to quantify the environmental impact indicators, thereby summarizing the amount of damage per unit of each environmental extension. To calculate the headline indicators, we summed all used extractions, including the metal ores (in kilotonnes) for the material footprint. 
The land footprint was calculated by summing all types of land use and the water footprint by summing all types of blue water consumption, while the carbon footprint was calculated using the characterization factors from the ReCiPe 2008 midpoint method (Hierarchist perspective).
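As a minimal sketch of how characterization factors turn environmental extensions into indicator scores, the snippet below multiplies a matrix of characterization factors by a matrix of multipliers. The extension names, the multiplier values, and the characterization factors are placeholders chosen for illustration; they are not the factors used in the study.

```python
import numpy as np

# Hypothetical multipliers: rows are extensions, columns are
# product-region combinations (pressure per million euro of final demand).
extensions = ["CO2 (kg)", "CH4 (kg)", "cropland (km2)", "pasture (km2)"]
Q = np.array([
    [1000.0, 200.0, 50.0],
    [   2.0,  10.0,  0.1],
    [   0.0,   0.5,  0.0],
    [   0.0,   0.3,  0.0],
])

# Placeholder characterization factors: rows are indicators, columns are
# extensions. A zero means the extension does not contribute to the indicator.
indicator_names = ["carbon footprint (kg CO2-eq)", "land footprint (km2)"]
cf = np.array([
    [1.0, 25.0, 0.0, 0.0],   # illustrative global warming weights
    [0.0,  0.0, 1.0, 1.0],   # land footprint as a plain sum of land-use types
])

# Indicator score = sum over extensions of (factor x pressure).
indicators = cf @ Q
```

In this layout a footprint that is a plain sum of extensions, such as the land footprint, is simply a row of ones and zeros in the factor matrix.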
For the analysis, we selected the consumptive environmental impacts of the 6,982 product-region combinations with a final demand of at least 1 million euros. We used the characterization factors as implemented in ecoinvent version 3.1 (see supporting information S2 available on the Journal's website for a complete overview of all characterization factors per environmental extension). However, since the number of environmental extensions in EXIOBASE is limited, not all 161 initial indicators could be included. Indicators were excluded if no environmental extensions related to an impact indicator were present in EXIOBASE, which was the case for indicators of ionizing radiation and ozone depletion. In the end, 119 different indicators (including the four headline indicators) were retained, from eight different LCIA methods.
To reveal the intrinsic relationships among all indicators, both within and across LCA methods, we calculated Pearson's correlation coefficients between the indicators based on their underlying characterization factors. To that end, extensions which did not contribute to an indicator, that is, did not have a characterization factor, were given a value of 0 for that indicator. The correlation matrix is provided as supporting information S3 on the Web. We then calculated the rank scores of the product-region combinations (ranging from 1 for the product-region combination with the lowest impact to 6,982 for the product-region combination with the highest impact for the concerned impact indicator) for each indicator and found that 21 impact indicators showed a perfect correlation (Spearman's rho = 1) with at least one other indicator in the data set. Note that perfectly correlated rank scores can occur even if the characterization factors from the underlying methods are not perfectly correlated.
Since our indicator optimization approach (see next section) is not able to deal with perfectly correlated indicators, we removed 14 indicators a priori. While many of the remaining indicators also showed very high correlations (up to 0.99), these are automatically grouped together in the optimization procedure. It was therefore not necessary to further reduce the number of indicators a priori. See supporting information S4 on the Web for a full list of the environmental impact indicators included and removed. The 105 remaining indicators are summarized per category and impact type in table 2.
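The rank transformation and the a priori removal of perfectly correlated indicators can be sketched as follows. This is an illustration on synthetic data, not the authors' R code; in particular, keeping the first indicator of each perfectly correlated group is an assumption, since the text does not state which member of a group was retained. The function names are hypothetical.

```python
import numpy as np

def rank_scores(values):
    """Rank each column from 1 (lowest impact) to n (highest); assumes no ties."""
    return values.argsort(axis=0).argsort(axis=0) + 1

def drop_perfectly_correlated(ranks, tol=1e-12):
    """Return column indices, keeping one indicator per group with rho = 1."""
    # Pearson correlation computed on rank scores equals Spearman's rho.
    rho = np.corrcoef(ranks, rowvar=False)
    keep = []
    for j in range(rho.shape[0]):
        if all(rho[j, k] < 1.0 - tol for k in keep):
            keep.append(j)
    return keep

# Synthetic impacts: 50 product-region combinations, 3 independent indicators.
rng = np.random.default_rng(0)
base = rng.lognormal(size=(50, 3))
# A fourth indicator that is a monotone transform of the first one:
# the values differ, but the resulting ranking is identical.
impacts = np.column_stack([base, base[:, 0] ** 2 + 1.0])

ranks = rank_scores(impacts)
kept = drop_perfectly_correlated(ranks)
```

The fourth synthetic indicator shows how two indicators with different values can still produce identical rank scores, which is the situation the removal step has to handle.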

Evaluation of Headline Indicators
To evaluate the extent to which the set of four headline indicators is representative of the total environmental impact embedded in EXIOBASE, we followed the two-step procedure as proposed by Steinmann and colleagues (2016). According to this procedure, first the dimensionality in the full set of indicator values is determined based on a principal component analysis (PCA). Next, a linear regression analysis is used to relate the resulting principal components to a selection of indicators (in this case the four headline indicators, if needed, supplemented with one or more thematic indicators) and evaluate the amount of variation explained. PCA was performed on the correlation matrix of the rank scores for the 105 indicators. We compared the explained variance of each component to the average explained variance of the same component based on a PCA on random data with the same number of indicators (105) and observations (6,982). Because we use rank scores, each random data set (1,000 in total) was a reordering of the numbers 1 to 6,982. A component was considered nontrivial if the explained variance of a component in our data set was larger than the average explained variance based on the random data sets. This procedure is an adaptation of the approach described by Peres-Neto and colleagues (2005).
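The comparison against random rank data can be sketched as below. This is a simplified illustration on synthetic data with 8 indicators and 300 observations rather than 105 and 6,982, and with 100 instead of 1,000 random data sets to keep it fast; the function names and the synthetic factor structure are assumptions.

```python
import numpy as np

def explained_variance(ranks):
    """Share of variance per principal component of the rank correlation matrix."""
    corr = np.corrcoef(ranks, rowvar=False)
    eigvals = np.linalg.eigvalsh(corr)[::-1]   # sort in descending order
    return eigvals / eigvals.sum()

def nontrivial_components(ranks, n_random=100, seed=0):
    """Flag components whose explained variance exceeds the random-data average."""
    rng = np.random.default_rng(seed)
    n_obs, n_ind = ranks.shape
    observed = explained_variance(ranks)
    baseline = np.zeros(n_ind)
    for _ in range(n_random):
        # Each random indicator is an independent reordering of 1..n_obs.
        random_ranks = np.column_stack(
            [rng.permutation(n_obs) + 1 for _ in range(n_ind)]
        )
        baseline += explained_variance(random_ranks)
    baseline /= n_random
    return observed, baseline, observed > baseline

# Synthetic data: 8 indicators driven by 2 latent factors plus a little noise.
rng = np.random.default_rng(1)
latent = rng.normal(size=(300, 2))
weights = np.array([
    [1.0, 1.0, 1.0, 1.0, 0.5, 0.2, 0.0, 0.0],
    [0.0, 0.2, 0.5, 1.0, 1.0, 1.0, 1.0, 1.0],
])
values = latent @ weights + 0.1 * rng.normal(size=(300, 8))
ranks = values.argsort(axis=0).argsort(axis=0) + 1

observed, baseline, is_nontrivial = nontrivial_components(ranks)
```

For uncorrelated random ranks every component explains roughly an equal share, so only components that capture shared structure among indicators rise above the baseline.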
In a second step, we combined the set of four headline indicators with the midpoint environmental impact indicators used in the first step and used these as predictors of the principal component scores in a linear regression analysis. We excluded the endpoint indicators (n = 10) as possible predictors because these are composite indicators that require multiple underlying indicators as input. To define the optimal size of the indicator set, we used the explained variance of the nontrivial principal components as a benchmark. We started with the headline set of indicators (material, land, water, and carbon footprints) to evaluate the amount of variation explained by this key set. Next, we supplemented this set with one additional thematic indicator at a time, selecting from the 91 midpoint indicators included in the PCA, in such a way that the resulting set covered the maximum amount of variance. This procedure was repeated until the amount of explained variance was equal to or higher than that of the set of nontrivial components. We also employed this methodology without starting with the headline set of indicators. In that approach, we started with one indicator that covered most of the variance and supplemented this with additional indicators until the required amount of variance was covered. This yields the best set of indicators from a purely numerical perspective, that is, without including the headline set a priori.
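The stepwise supplementation can be sketched as a greedy forward selection. The sketch below rests on one stated assumption: the variance covered by a set of indicators is taken as the R-squared of regressing the principal component scores on that set, weighted by each component's share of the total variance. The text does not spell out this weighting, so it should be read as one plausible interpretation, and the function names and synthetic data are hypothetical.

```python
import numpy as np

def principal_component_scores(ranks):
    """Standardize rank scores and project them onto all principal components."""
    z = (ranks - ranks.mean(axis=0)) / ranks.std(axis=0)
    corr = np.corrcoef(z, rowvar=False)
    _, eigvecs = np.linalg.eigh(corr)
    return z, z @ eigvecs

def covered_variance(z, scores, subset):
    """Variance-weighted R^2 of all component scores regressed on a subset."""
    columns = np.asarray(list(subset), dtype=int)
    X = np.column_stack([np.ones(len(z)), z[:, columns]])
    coef, *_ = np.linalg.lstsq(X, scores, rcond=None)
    residual = scores - X @ coef
    # Equals the sum of per-component R^2 weighted by each component's share.
    return 1.0 - residual.var(axis=0).sum() / scores.var(axis=0).sum()

def forward_select(z, scores, start, candidates, target):
    """Add the indicator that raises covered variance most until target is met."""
    selected = list(start)
    remaining = [c for c in candidates if c not in selected]
    while remaining and covered_variance(z, scores, selected) < target:
        best = max(
            remaining,
            key=lambda c: covered_variance(z, scores, selected + [c]),
        )
        selected.append(best)
        remaining.remove(best)
    return selected

# Synthetic data: 6 indicators driven by 2 latent factors plus noise.
rng = np.random.default_rng(2)
latent = rng.normal(size=(200, 2))
weights = np.array([
    [1.0, 0.8, 0.1, 0.0, 0.5, 0.9],
    [0.0, 0.3, 1.0, 0.9, 0.5, 0.1],
])
values = latent @ weights + 0.2 * rng.normal(size=(200, 6))
ranks = values.argsort(axis=0).argsort(axis=0) + 1

z, scores = principal_component_scores(ranks)
predefined = [0]   # stands in for a predefined (headline) set
selected = forward_select(z, scores, predefined, range(6), target=0.90)
```

Passing an empty list as `start` gives the variant without a predefined set, where the first indicator chosen is the single one that covers the most variance.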

Interpretation
To interpret the meaning of the nontrivial components, a twofold approach was used. First, the indicators with the highest and lowest loadings were compared for the first four principal components. Loadings are the weights given to each indicator for each principal component. Indicators contrast one another on a component (meaning they would lead to different rankings) if one indicator has a negative loading and another has a positive loading. Principal component scores can be calculated by taking the sum product of the loadings for a principal component and the standardized original rank scores. Second, the scores of the individual products were assessed for the first four principal components to see which types of products are particularly associated with that component. This analysis contributes to the interpretation of the components and provides insight into which type of indicator is most appropriate to differentiate between products of a certain product type. Products with the most extreme scores are said to be separated from one another by a component. We divided the products into eight different categories for this purpose (agricultural and food products, electricity, fossil fuels, metals and electronics, minerals chemicals and plastics, nonfood bio-based products, services, and waste and recycling). Results from this analysis can be found in supporting information S1 (section S1-2) on the Web. A list with the names of each product and its corresponding category is provided in supporting information S5 on the Web. All analyses were performed in the R environment (R Core Team 2016).
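The sum product mentioned above can be written out explicitly. The notation is introduced here for illustration only and does not appear in the original text: r_{ij} is the rank score of product-region combination i on indicator j, \bar{r}_j and s_j are the mean and standard deviation of the rank scores of indicator j, l_{jk} is the loading of indicator j on component k, and m is the number of indicators.

```latex
z_{ij} = \frac{r_{ij} - \bar{r}_j}{s_j},
\qquad
t_{ik} = \sum_{j=1}^{m} l_{jk}\, z_{ij}
```

Here t_{ik} is the score of product-region combination i on component k. Two indicators with loadings of opposite sign on component k push t_{ik} in opposite directions, which is why they are said to contrast one another on that component.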

Results
Six nontrivial components were found, which explained 95.3% of the variance in the data set (figure 1). The majority of the variance in the data set was explained by the first component (58.9%). Consecutive nontrivial components explained 22.5%, 7.4%, 3.2%, 2.1%, and 1.2% of the variance (see supporting information S1 [section S1-1] and supporting information S4 on the Web for an interpretation of the principal components). Because the six nontrivial components covered 95% of the variance, we searched for the smallest subsets of indicators that explain this amount of variance. With all correlations between the impact indicators being positive, the first  component can be seen as representative of overall impact. As a single indicator, the carbon footprint represents this impact fairly well (41.6% explained variance). Adding the land footprint as a second indicator raises the explained variance to 57.4%, while the four headline indicators together explained 59.9% of the variance (table 3, figure 1). This means that a substantial part of the variance in product rankings is not explained by the headline set of indicators, and that additional thematic indicators are necessary to cover the missing impacts. Adding an indicator of marine eco-toxicity boosted the explained variance to 85.3%. In total, five additional thematic indicators were necessary to cover >95% of the variance (figure 1). This set contained the headline indicators (carbon, land, water, and material footprints) as well as indicators of marine eco-toxicity, terrestrial eco-toxicity, photochemical ozone formation, terrestrial acidification, and eutrophication. From a purely numerical point of view, the headline set of indicators is not optimal because there is overlap between the different headline indicators. 
For example, the water footprint correlates well with the land footprint (Spearman's rho = 0.84), meaning that these two footprints lead to a similar ranking between product-region combinations and there is little added value in using both in EXIOBASE. Results of the numerical analysis show that 50.1% of the variation in rank scores could be covered by a single indicator of particulate matter (PM) formation (figure 1, supporting information S6 on the Web). Adding an indicator of freshwater toxicity increased the explained variance to 74.3%, and further adding an indicator of marine eco-toxicity raised the explained variance to 84.0%. In order to explain 95% of the variance in product-region ranks, a set of seven indicators was needed. This set contained indicators of PM formation, freshwater aquatic eco-toxicity, marine eco-toxicity, climate change, terrestrial eco-toxicity, photochemical oxidation, and land occupation (table 3).
Table 3. Explained variance of sets of headline indicators supplemented by additional thematic indicators (sizes 1 to 9) and the numerically best set of impact indicators.
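As a quick arithmetic check on the percentages reported in this section, the shares of the six nontrivial components add up to the stated total, and the differences between successive indicator sets give the marginal contribution of each addition:

```latex
\begin{align*}
58.9 + 22.5 + 7.4 + 3.2 + 2.1 + 1.2 &= 95.3 \\
57.4 - 41.6 &= 15.8 \quad \text{(land added to carbon)} \\
59.9 - 57.4 &= 2.5 \quad \text{(water and material added)} \\
85.3 - 59.9 &= 25.4 \quad \text{(marine eco-toxicity added)}
\end{align*}
```

The water and material footprints together add 2.5 percentage points, whereas the first thematic indicator adds 25.4, which is the numerical basis for the redundancy noted among the headline indicators.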

Discussion
Our analysis showed that a set of four headline indicators, consisting of the material, land, water, and carbon footprints, covers around 60% of the variance in product-region rankings based on a set of 119 impact indicators applied to the EXIOBASE EEMRIO database. In order to explain more than 95% of the variance present in the EXIOBASE data set, this headline set needs to be supplemented by five more indicators. Alternatively, one could employ a numerically optimal set of seven indicators. It is interesting to note that the resulting number of nine indicators (or seven for the numerically optimal set) is smaller than the number of impact categories that was originally present in the data set. These findings are in line with other studies on the potential for data reduction in terms of environmental indicators (Pozo et al. 2012; Brunet et al. 2012; de Saxcé et al. 2014; Sabio et al. 2012; Li et al. 2012; Gutierrez et al. 2010; Pascual-González et al. 2016). Compared to the original impact categories that were included, no indicators of human toxicity, PM formation, and freshwater eco-toxicity are among the supplemented headline set of indicators. This means that the emissions underlying these impact categories are correlated to other environmental extensions in the database, which may be caused by processes (e.g., the burning of coal) that generate multiple emissions simultaneously (e.g., carbon dioxide, PM, and lead). Because of these correlated emissions, part of the impacts resulting from these can be covered by proxies from other impact categories.
One of the reasons such high correlations between indicators were found is that several impact indicators are based on the same limited number of extensions. For example, there are 46 different indicators of toxicity (out of 119 indicators in total), which are all calculated based on the emissions of 11 different toxic substances. It might be argued that part of this correlation is caused by the fact that not all relevant toxicants were included in EXIOBASE. LCA databases, such as ecoinvent (Moreno Ruiz et al. 2013), may include up to a few hundred different emissions of toxicants, and toxicity information for even more substances is available through toxicity models such as USEtox (Rosenbaum et al. 2008). With a limited amount of underlying emissions, consisting mostly of heavy metals (which are relatively well covered in EXIOBASE), the toxicity indicators can be considered as over-represented in the LCIA methods used here. Note, however, that there are also intrinsic differences between the included toxicity indicators for different ecosystems. While the ecotoxic effects are often calculated through extrapolation from one ecosystem to another (freshwater, marine, and terrestrial), the characterization factors between the different types of toxicity indicators are not strongly correlated. This is because the fate part of the characterization factor is specific to the receiving compartments (marine vs. freshwater vs. terrestrial environment), which results in characterization factors for chemical impacts in different ecosystems that are intrinsically different.
Another limitation of our study is that various impact categories, such as metal scarcity, ozone depletion, and ionizing radiation, had to be excluded because corresponding extensions are not available in EXIOBASE. This might give an overestimation of the amount of variance that can be explained with a limited number of indicators. Regardless of the cause of the correlations, however, our results do show that not all impact indicators are required for efficient communication of the results of an EEMRIO database.
The reduction in number of indicators in this study, based on the consumption of 1 million euros of products from EXIOBASE, was approximately equal to the reduction for the ecoinvent data set, which was based on 1 kilogram (kg) of each product (Steinmann et al. 2016). In that study, 92.3% of the variance could be explained by six indicators, compared to 95.0% for the numerically optimal set of seven indicators in the current study. Given that the earlier study was based on the ecoinvent data set, which includes a much larger number of different emissions than EXIOBASE and therefore also has more variation in impact indicators, we expected that it would have a lower reduction potential. However, the differences between products and therefore the correlations between indicators are larger when they are compared on a 1-kg basis. The ranked impacts of 1 kg of gold are much larger than those of 1 kg of corn, for example. On a 1 million euro basis, this effect is weakened because of the price differences between the products. In other words, 1 million euros represents a lot more kg of grain than gold, making the impact per million euros more similar. Despite this effect, we still found that the consumption of services had the lowest impacts, while the consumption of products in the metals and electronics category showed the highest impacts per million euros spent (see supporting information S2 on the Web). While allowing for an equal base of comparison between indicators, the use of rank scores partly neglects the fact that some impacts might be very similar across product-regions whereas other indicators may show much more variation. By transforming each indicator to rank scores (as opposed to simply standardizing the impacts per indicator), the potential to distinguish between highly variable and nondiscriminating indicators is lost.
We feel that the use of rank scores is justified, however, because without transforming the scores to ranks the product-region combinations with the highest impacts would have dominated the correlation structure.
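The effect of a few very high-impact products on the correlation structure can be shown with a small constructed example. The eleven impact values below are invented: the first ten products are ordered in exactly opposite ways by the two indicators, and the eleventh has a very high impact on both.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])

def to_ranks(x):
    """Rank scores from 1 (lowest) to n (highest); assumes no ties."""
    return np.argsort(np.argsort(x)) + 1

# Invented impacts of 11 products on two indicators.
impact_a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1000], dtype=float)
impact_b = np.array([10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 1000], dtype=float)

# Correlation of the raw values is driven almost entirely by the last product.
raw_corr = pearson(impact_a, impact_b)

# Correlation of the rank scores reflects how all products are ordered.
rank_corr = pearson(to_ranks(impact_a), to_ranks(impact_b))
```

On the raw values the two indicators appear almost perfectly correlated (about 0.9998), whereas on rank scores the correlation is -0.5, because ten of the eleven products are ranked in opposite order.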
While there are numerous differences between a bottom-up approach like ecoinvent versus a more top-down EEMRIO model, results show a remarkable similarity as well. In the study from Steinmann and colleagues (2016), the best set of six indicators included indicators of climate change, land use, acidification and eutrophication, ozone depletion, marine eco-toxicity, and terrestrial eco-toxicity. The numerically optimal set of seven indicators in this study included four indicators of the same impact categories (climate change, land use, marine eco-toxicity, and terrestrial eco-toxicity). Emissions of ozone-depleting substances (as defined by the WMO [2011]) are not included in the EXIOBASE extensions, hence no indicators for this impact category could be included in this study.
We have demonstrated that the set of four headline indicators as proposed by Eurostat (2017) was not able to fully represent the environmental impacts embedded in EXIOBASE. This means that supplementing this set with additional thematic indicators is recommended. A limitation of using the indicators identified in this study is that they are optimal in terms of explained variance only. For policy making, however, additional criteria, such as the societal acceptance and relevancy of the indicators, are of vital importance as well. These criteria have been formalized under the acronym RACER (Relevant, Accepted, Credible, Easy, and Robust) (Lutter and Giljum 2008) and represent additional considerations when assessing the usefulness of an indicator. It is questionable whether the toxicity indicators we identified are regarded as robust enough by policy makers, given their relatively large uncertainties (Rosenbaum et al. 2008). Nonetheless, they do cover an aspect of environmental impact that cannot be approximated by simple footprints of resource use. Overall, our results are promising for policy makers who aim to design environmental policies for product manufacturers, for example. Instead of focusing on a large number of conflicting indicators, we argue that a relatively small subset of indicators can be used to guide environmental policy.

Funding Information
This work was funded by the European Commission-Seventh Framework Programme 308552.