Does it pay to be sustainable? Looking inside the black box of the relationship between sustainability performance and financial performance

wileyonlinelibrary.com/journal/csr

Abstract

The last three decades have witnessed a large body of research exploring the linkage between companies' sustainability performance (SP), sustainability disclosure and financial performance (FP). Researchers have applied various methods and techniques to investigate this relationship, yet the results remain equivocal. In this article, we look inside this black box by considering various manifestations of sustainability practices and investigating their link with FP. We apply a manual content analysis technique to analyse the sustainability reports of the 100 best‐performing US firms. Our results reveal that the fragmentation in prior results is caused by the way SP is measured. Additionally, we note that the interlinkages between different SP dimensions and sub‐dimensions are weak and somewhat contradictory. The results help draw important policy implications for the development of an SP reporting framework.


| INTRODUCTION
Does it pay to be sustainable? Many studies have asked this question over the last three decades, yet the results are fragmented (Callan & Thomas, 2009; Barnett & Salomon, 2012; Song, Zhao, & Zeng, 2017). Recent discursive and meta-analytical reviews by Horváthová (2010), Endrikat, Guenther, and Hoppe (2014) and Lu, Chau, Wang, and Pan (2014) suggest that the uneven application of sustainability performance (SP) measures is one of the main causes of the prevailing equivocality of results. The existing literature has so far neglected the multifaceted nature of sustainability measurement (Trumpp, Endrikat, Zopf, & Guenther, 2015). Most researchers working on the nexus between SP and financial performance (FP) have used either third-party SP measures such as KLD (e.g. Tang, Hull, & Rothenberg, 2012; Tebini, M'Zali, Lang, & Perez-Gladish, 2016; Waddock & Graves, 1997) or self-defined measures (e.g. Mahoney, LaGore, & Scazzero, 2008; Godfrey & Hatch, 2007). This lack of congruent SP measurement has created confusion about the relationship between SP and FP (Horváthová, 2010). To clear up this confusion, we conduct an in-depth analysis of the relationship between sustainability disclosure (SD), SP and FP.
Our measurement is based on a widely accepted reporting framework, namely the Global Reporting Initiative (GRI) framework. We analyse 152 sustainability reports from the 100 best-performing US firms by applying a manual content analysis technique. We categorize the SP information for each indicator category (economic, environmental and social) separately. Such a classification allows us to calculate an SP index for each indicator and sub-category (see Table 2 later for a detailed description). To test the inter-linkages (Antolin-Lopez, Delgado-Ceballos, & Montiel, 2016; Bradford, Earp, Showalter, & Williams, 2016; Lozano & Huisingh, 2011) [...]

The remainder of the paper is organized as follows: the next section discusses the findings of the extant literature. Section 3 is devoted to theory and hypothesis development. Section 4 describes our methodology. In Section 5, we present the empirical findings. In the last two sections, we discuss our results and outline conclusions, implications and future research directions.

| PRIOR EVIDENCE
There are different schools of thought 4 concerning the SP-FP nexus (see Molina-Azorín, Claver-Cortés, López-Gamero, & Tarí, 2009; Revelli & Viviani, 2015). Proponents of the neoclassical school ('traditionalist view') have argued that sustainability initiatives impose additional costs (see, e.g., Walley & Whitehead, 1994; Hamilton, 1995), whereas Porter (1991) and Porter and Van der Linde (1995) support the 'revisionist view' and argue that such initiatives create win-win situations by enhancing FP and social welfare. Flammer (2015) and Marti, Rovira-Val, and Drescher (2015) note that investment in sustainability yields positive accounting performance. Similarly, Wang and Tuttle (2014) argue that sustainability has become an important contributor to investment returns by sending a positive signal to the financial market.
A third stream of research challenges both the traditionalist and the revisionist views and supports an inverse U-shaped relationship (Lankoski, 2000; Wagner, 2001), arguing that sustainability is beneficial only up to a point. Others have argued for a neutral association between firms' responsible behaviour and the resulting benefits (McWilliams & Siegel, 2001). Table 1 provides an overview of the mixed empirical results. We systematically review the literature and present the competing approaches.
Conversely, Shane and Spicer (1983), Cordeiro and Sarkis (1997) and Preston and O'Bannon (1997) argue that sustainability engagement is detrimental for FP. Hamilton (1995) finds a negative relationship between the Toxic Release Inventory and share price. Similarly, Khanna and Damon (1999) find a negative impact of Toxic Release Inventory on return on investment. Likewise, Konar and Cohen (2001) note that information about toxic chemical disclosure impacts financial performance negatively in the US manufacturing sector. On the other hand, Pava and Krausz (1996), King and Lenox (2001) and Link and Naveh (2006) report an insignificant relationship between SP and FP.
Similarly conflicting results can be seen in many other studies. Horváthová (2010) conducts a meta-analysis of 64 outcomes from 37 empirical studies and concludes that the prevailing inconsistency is due to methodological inconsistency. More recently, Wang, Dou, and Jia (2016) analysed 119 outcomes from 42 empirical studies and found that the measurement of the SP constructs creates variation in the results. The body of knowledge is growing, yet the results remain inconclusive. In view of these competing results, our study aims to fill this void by using a more refined measurement of SP.

| HYPOTHESIS DEVELOPMENT
The review of the extant literature shows that not only are the empirical findings contradictory, but the use of theories is also inconsistent (see Table 1). Moreover, the theories used in the existing SP-FP literature rest on contending assumptions; for example, agency theory (Al-Najjar & Anfimiadou, 2012; Surroca & Tribó, 2008) and stakeholder theory (Hoepner, Oikonomou, Scholtens, & Schröder, 2016) are based on opposing assumptions (Hussain, Rigoni, & Orij, 2016), yet many researchers use these two theories to provide the rationale for similar research questions (McWilliams, Siegel, & Wright, 2006; Wahba, 2008). Among all these theories, stakeholder theory is dominant, suggesting a positive relationship between corporate sustainability initiatives and FP (McWilliams & Siegel, 2001).

Footnote 4: Traditionalists and revisionists hold competing views about firms' engagement with sustainability initiatives and its impact on FP. Friedman (1962) considers economic profit making as the only social responsibility of the firm. He argues that CSR is a 'subversive doctrine' (p. 133). On the other hand, Porter (1991) and Porter and Van der Linde (1995) have formulated the 'Porter hypothesis', according to which investment in sustainability is to the long-term benefit of stakeholders as well as investors.

Stakeholder theory assumes that a firm should take into consideration the needs of a wider variety of stakeholders and not only the profit requirements of its owners (Freeman, 1984). Endorsing stakeholder theory as a relevant theoretical lens, Freeman (2010) argues that, although shareholders' wealth creation is at the top of the corporate agenda, firms should not ignore the needs of a wider spectrum of stakeholders. He further argues that such stakeholders play a vital role in the success, survival and growth of a firm. Under a similar assumption, Russo and Fouts (1997) document a significant positive relationship between environmental disclosure and FP. Similarly, King and Lenox (2002) and Ducassy (2013) observe a positive relationship between environmental performance (EP) and FP. Waddock and Graves (1997) argue that, if the firm does not incur the explicit cost of being sustainable, then it has to incur an implicit cost of losing competitive advantage. Likewise, Hull and Rothenberg (2008) [...]

Our next two regression equations test the relationship between SP and FP. Formally, our second and third equations are [...]

Based upon the Hausman (1978) specification test results, we apply fixed-effect panel regression analysis for all our equations.
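As an illustration only, the fixed-effect (within) estimation mentioned above can be sketched for a single regressor. The function name `within_slope` and the toy data are hypothetical and are not taken from the study, whose equations include several SP dimensions and control variables; the Hausman test itself is not reproduced here.

```python
from collections import defaultdict


def within_slope(firm_ids, x, y):
    """Fixed-effects (within) slope for one regressor.

    Demeans x and y inside each firm, which removes any firm-specific
    constant, and then fits OLS through the origin on the demeaned data.
    """
    groups = defaultdict(list)
    for i, firm in enumerate(firm_ids):
        groups[firm].append(i)

    xd = [0.0] * len(x)
    yd = [0.0] * len(y)
    for idx in groups.values():
        mean_x = sum(x[i] for i in idx) / len(idx)
        mean_y = sum(y[i] for i in idx) / len(idx)
        for i in idx:
            xd[i] = x[i] - mean_x
            yd[i] = y[i] - mean_y

    sxx = sum(v * v for v in xd)
    if sxx == 0:
        raise ValueError("no within-firm variation in x")
    return sum(a * b for a, b in zip(xd, yd)) / sxx


# Hypothetical toy panel: two firms with different intercepts but the
# same slope of 2 linking an SP score (x) to an FP measure (y).
firms = ["A", "A", "A", "B", "B"]
sp = [1, 2, 3, 2, 4]
fp = [12, 14, 16, -1, 3]
print(within_slope(firms, sp, fp))  # 2.0
```

The two firms have very different levels of `fp`, yet the within transformation recovers the common slope because each firm's own mean is subtracted before estimation.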

| Measurement of variables
To test our first model, we use the environmental, social and governance (ESG) parameters provided by Bloomberg. Bloomberg ESG scores range from 0 to 100 depending on the number of data points disclosed by companies: the more a company discloses, the higher its score. ESG estimation covers a broad range of items (Lo & Kwan, 2017). ESG scores are broad, although not verifiable, measures of firm sustainability disclosure. Despite their limitations, we use ESG scores to understand whether SD is relevant for firms' FP.
In Models 2 and 3 we use verifiable SP measures that are based on GRI guidelines. GRI argues that sustainability reports based on its guidelines can be used as a benchmark for organizational performance and as a demonstration of organizational commitment towards sustainable development goals (GRI, 2006). The GRI reporting framework challenges firms to report on both positive and negative aspects of their performance [...]

For each dimension and sub-dimension, we measured the disclosure level on a binary scale (1 when the information on an item is provided and 0 otherwise). This procedure allows us to generate for each level a disclosure index as the ratio between the number of items disclosed and the overall number of items included in the dimension or sub-dimension. As for the quality of the sustainability disclosure, we calculated a quality index based on the classification of positive and [...] Patten and Crampton (2003, p. 40). This approach is also consistent with the work of Plumlee, Brown, Hayes, and Marshall (2015). The classification of the sustainability information as positive and negative allowed us to calculate a quality index, based on the normalization procedure proposed by Krajnc and Glavič (2005) and used by Hussain et al. (2016) for SP measurement: [...]

In Equation (4), 'real score' is the algebraic sum of positive and negative scores; 'minimum' is the minimum potential score assigned to each sustainability category, which occurs when all the information provided has been classified as negative, while 'maximum' indicates the contrary: the maximum potential number of information items with a positive sign.
Finally, we calculate our measure of SP by multiplying the disclosure index and the quality index of each dimension and sub-dimension.

[...] alpha on 25% of the data coded by two researchers. The value of alpha should be 'greater than 0.67 for useful conclusions' (Krippendorff, 2004, p. 241). We find that all the alpha values for the disclosure and quality indexes are above the acceptable threshold value.
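The index construction described above (a disclosure index, a quality index and their product) can be sketched as follows. This is a minimal illustration under two assumptions that the text does not confirm: the quality index is taken to be the min-max normalization (real score − minimum) / (maximum − minimum), and each disclosed item is coded +1 (positive) or −1 (negative). All function names are hypothetical.

```python
def disclosure_index(disclosed_flags):
    """Share of items disclosed; flags are 1 (disclosed) or 0 (not disclosed)."""
    return sum(disclosed_flags) / len(disclosed_flags)


def quality_index(signs):
    """Min-max normalized quality score (assumed form of the normalization).

    `signs` holds one entry per disclosed item: +1 for positive
    information, -1 for negative information (assumed coding).
    """
    n = len(signs)
    if n == 0:
        return 0.0  # assumption: nothing disclosed means no quality score
    real_score = sum(signs)      # algebraic sum of positive and negative scores
    minimum, maximum = -n, n     # all items negative vs. all items positive
    return (real_score - minimum) / (maximum - minimum)


def sp_score(disclosed_flags, signs):
    """SP measure: disclosure index multiplied by quality index."""
    if len(signs) != sum(disclosed_flags):
        raise ValueError("one sign is expected per disclosed item")
    return disclosure_index(disclosed_flags) * quality_index(signs)


# Hypothetical sub-dimension with four items, three of them disclosed,
# of which two are positive and one is negative.
flags = [1, 1, 1, 0]
signs = [1, 1, -1]
print(sp_score(flags, signs))  # approximately 0.5 (0.75 * 2/3)
```

Under these assumptions, a firm that discloses every item but reports only negative information scores 0, while full disclosure of exclusively positive information scores 1.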
To proxy a firm's performance we use both market and accounting performance measures. As the market-based measure, we select Tobin's Q, which captures the market's appreciation or depreciation of the firm's value relative to the book value of the company (Lindenberg & Ross, 1981). We select return on assets (ROA) and return on equity (ROE) as proxies for accounting performance. We also select a set of control variables in line with the extant literature: firm size, sales growth, capital intensity and the debt-to-equity ratio as firm-specific controls. In line with Hussain et al. (2016), we include ENV_SENS, a dummy variable capturing whether the company belongs to an environmentally sensitive industry.

| Descriptive statistics
We present the descriptive statistics in Table 3 [...] Panel A documents that, as expected and supported in the literature (see, e.g., Xu, 1999), the mean disclosure level of sustainability issues (as measured by the ESG parameters) depends systematically on the kind of industry considered: the ESG scores of environmentally sensitive industries are greater than the scores attributed to environmentally less sensitive industries. The Wilcoxon rank-sum test results support this notion. These results further support the idea that environmentally sensitive industries face multifaceted pressure from various stakeholder groups and that such companies disclose more (Lyon & Maxwell, 2011) [...]

Our findings show that no ESG parameter is significantly related to FP. This holds for both the accounting performance (ROA and ROE) and the market-based performance (TOBINQ). This evidence suggests that the level of a company's commitment to transparency and accountability, as captured by the ESG parameters, is not relevant to the FP of that company. As for the control variables, ENV_SENS has a positive and significant relationship with the accounting performance. Similarly, SALE_GROWTH has a positive linkage, but it seems weak. RD_INT is negatively associated with the accounting performance, but it does not show any relationship with the market-based FP. SIZE is significant for ROA and TOBINQ only, while the D/E ratio is strongly negatively associated with ROE.

| Sustainability performance and financial performance
Tables 6 and 7 report the results of our main regression models (Equations (2) and (3)). Table 6 shows that the impact of the three dimensions of sustainability performance differs depending on the financial performance proxy considered. More precisely, the environmental and social performance measures are significant and have a positive impact on ROA, ROE and Tobin's Q. The economic dimension, by contrast, is relevant only when we measure FP by the company's Tobin's Q ratio. In this case, the economic dimension shows a weak association with TOBINQ (p = 0.0562), but the relationship turns out to be negative. Table 7 reports the results concerning the broken-down SP dimensions. These findings allow us to identify which specific components of SP are related to FP. A number of aspects are worth pointing out. First, the result concerning EC_SUST detected in Table 6 [...]

Regarding the social sub-dimensions, SO_SUSTsub1 has a positive effect on TOBINQ, while SO_SUSTsub2 and SO_SUSTsub4 positively affect the accounting measures only. In Table 6 we note that social performance is weakly linked to TOBINQ. However, further in-depth analyses show that some aspects of the same measures are positively linked to market-based FP. For both Equations (2) and (3) we run the variance inflation factor test to check for multicollinearity. The results did not raise any concerns.
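For readers unfamiliar with the variance inflation factor (VIF) check mentioned above, the following sketch shows one way to compute it. It relies on the standard identity that the VIF of predictor j equals the j-th diagonal element of the inverse of the predictors' correlation matrix. The function names and toy data are hypothetical, and this is not necessarily the procedure or software used in the study.

```python
from math import sqrt


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length columns."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    centered = [[v - m for v in c] for c, m in zip(columns, means)]
    norms = [sqrt(sum(v * v for v in c)) for c in centered]
    if any(norm == 0 for norm in norms):
        raise ValueError("a predictor has zero variance")
    k = len(columns)
    return [
        [sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
         for j in range(k)]
        for i in range(k)
    ]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    k = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def vif(columns):
    """VIF per predictor: diagonal of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]


# Hypothetical predictors with a pairwise correlation of 0.6,
# so each VIF is 1 / (1 - 0.6**2) = 1.5625 (up to rounding).
print(vif([[1, 2, 3, 4], [2, 1, 4, 3]]))
```

Values close to 1 indicate that a predictor is nearly uncorrelated with the others; larger values signal increasing multicollinearity.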
In summary, our empirical evidence shows that the transparency of a company's sustainability commitment, as measured by the ESG parameters, is not related to the company's FP. However, SP is significantly linked to accounting as well as market-based measures of FP. Furthermore, we find a negative, although weak, relationship between the economic sustainability performance of reporting companies and their market value. This points to weak and contrasting links between the various pillars of SP.

| DISCUSSION OF THE RESULTS
Our analysis aimed to explore the relationship between SP and FP. Our findings provide a new lens for gaining deeper insight into the divergence in existing findings (see for comparison Brammer, Brooks, & Pavelin, 2006; Mishra & Suar, 2010; Fujii, Iwata, Kaneko, & Managi, 2013; Flammer, 2015; Hoepner et al., 2016). Our starting model (1), reported in [...]

Existing literature has so far neglected the multifaceted nature of sustainability measurement. This creates a substantial knowledge gap, which we address by providing evidence-based findings.
We elaborate a set of innovative indicators that are better adapted to capture the essence of companies' efforts towards sustainability: the SP measures included in Tables 6 and 7. As predicted, the results of these models support our intuition. The SP pillars, measured in terms of performance and not just disclosure, may significantly affect FP. Specifically, we find that the inclusion of our variables significantly improves the overall explanatory power of the regression models and that the coefficients differ considerably according to the specific sustainability dimension.
More specifically, the GRI has entirely eliminated EC_SUSTsub3. Moreover, 85% of the input dimension of the environmental indicator has been updated. Similarly, 50% of the society (SO_SUSTsub3) and 33% of the product responsibility (SO_SUSTsub4) dimensions have been updated (GRI, 2012). In the light of the observed results, we argue that there is a need for continuous improvement in the reporting frameworks. Alternatively, our empirical evidence can be interpreted as support for the choice of integrated reporting, as argued by Dong (2017) in his recent experiments. An integrated reporting framework provides a holistic view of a firm's financial and non-financial performance avenues. Building inter-linkages between economic and non-economic performance will provide better prospects for performance analysis.

| CONCLUSION, IMPLICATIONS AND FUTURE RESEARCH DIRECTIONS
The objective of this research is to gain a deeper insight into the relationship between SP and FP by utilizing unique measures of SP based on a globally accepted SP reporting framework. The review of the existing literature shows that there is a wide divergence in the existing evidence (Endrikat et al., 2014; Horváthová, 2010; Wang et al., 2016). These reviews motivated the present study to link SP and SD with FP. We find that SP measurement matters and can provide better and more conclusive results about the direction of the relationship between sustainability engagement and firms' performance. Our research also provides important insights concerning the compartmentalization of SP dimensions by showing that these dimensions need to be revisited and realigned.
Our results reveal that, no matter how great the disclosure is, the [...] Przychodzen (2017), we argue that firms should include sustainability in their strategic planning and invest more in social and environmental performance to achieve manifold performance objectives. We also conclude that firms that invest more in sustainability, particularly if characterized by outstanding visibility, perform better. Our results carry important policy implications for standard setters by providing new evidence about the need for more aligned parameters in overall sustainability reporting standards. Based on our findings on the relationships between various dimensions and sub-dimensions of SP, we invite future research into the global context and further investigation in less developed or developing economies. We consider that deploying a sub-dimensional analysis of SP can provide better insight into outcomes for managers as well as policy makers.