Uncertainty Quantification in Life Cycle Assessments: Exploring Distribution Choice and Greater Data Granularity to Characterize Product Use

The life cycle environmental profile of energy‐consuming products is dominated by the products’ use stage. Variation in real‐world product use can therefore yield large differences in the results of life cycle assessment (LCA). Adequate characterization of input parameters is paramount for uncertainty quantification and has been a challenge to wider adoption of the LCA method. After emphasis in recent years on methodological development, data development has become the primary focus again. Pervasive sensing presents the opportunity to collect rich data sets and improve profiling of use‐stage parameters. Illustrating a data‐driven approach, we examine energy use in domestic cooling systems, focusing on climate change as the impact category. Specific objectives were to examine: (1) how characterization of the use stage by different probability distributions and (2) how characterizing data aggregated at successively higher granularity affects LCA modeling results and the uncertainty in output. Appliance‐level electricity data were sourced from domestic residences for 3 years. Use‐stage variables were propagated in a stochastic model and analyses simulated by Monte Carlo procedure. Although distribution choice did not necessarily significantly impact the estimated output, there were differences in the estimated uncertainty. Characterization of use‐stage power consumption in the model at successively higher data granularity reduced the output uncertainty with diminishing returns. Results therefore justify the collection of high granularity data sets representing the life cycle use stage of high‐energy products. The availability of such data through proliferation of pervasive sensing presents increasing opportunities to better characterize data and increase confidence in results of LCA.


Introduction
Life cycle assessment (LCA) stands today as the leading method of accounting for the environmental impacts of products, which tend to have long useful lifetimes, during which use-stage emissions and their associated uncertainties are likely to dominate estimated life cycle impact (Weber 2012). Designing with the objective of minimizing environmental impact can therefore be limited by the lack of good-quality data pertaining to a product's real-world use and its associated uncertainty in the inventory. Although flows sampled from big data may appear to be a troublesome source of variability in an uncertainty analysis, they should provide a better representation of the reality LCA is intended to portray (Cooper et al. 2013). After an emphasis in recent years on methodological development in LCA, for example in allocation approaches or regionalized impact assessment methods, inventory data development is now the primary focus again (Hellweg and Zah 2016). For example, in an LCA study of electricity generation in the United States, Hauck and colleagues (2014) separated the effect of uncertainty due to lack of knowledge from the effect of inherent variability in supply chains. Virote and Neves-Silva (2012) employed stochastic modeling through the use of Markov chains to predict building energy consumption based on variability in occupant behavior, while other studies have examined the effect of user profiles on energy costs of electric and conventional vehicles (Faria et al. 2013; Hawkins et al. 2013). The rise of pervasive sensing presents opportunities in today's society to collect rich information, capturing data more accurately and informing us how products and systems are really used. Improved profiling of variation in product users' behavior, categorized as interindividual variability, enables more accurate characterization of use-stage parameters and their associated uncertainty in LCA modeling, in turn improving the credibility of LCA results.
Building electricity use varies over seasons and days, and this individual and temporal variability is largely dependent on human behavior. A recent study in Sweden used high-granularity household electricity data to assess the effectiveness of different probability distributions to characterize electricity load profiles (Mumkhammar et al. 2014), while Ross and Cheah (2017) demonstrated how high-granularity data could be used to profile interindividual variability in user behavior and identify potential reductions in the life cycle greenhouse gas (GHG) emissions of air-conditioning (AC) systems. This high-energy-consuming product is a pertinent case study, as AC is the most energy-intensive home appliance (Rapson 2014) and was present in 70% of U.S. domestic residences (Shah et al. 2008). Furthermore, the use stage of AC systems can account for 80% to 90% of the total life cycle GHG emissions (De Kleine 2009; Ross and Cheah 2017). Hertwich and Roux (2011) noted that smaller domestic products, such as televisions or personal computers, were of comparable importance to larger energy-intensive appliances in terms of life cycle GHG, but this was due to the manufacture stage contributing an equal or greater share of GHG to the product use.
Although variation in life cycle parameters is inherent, treatment of uncertainty is often still prescribed as optional or only subject to qualitative treatment in emissions accounting frameworks (BSI 2011; GHG Protocol 2013). The international standard, ISO (International Organization for Standardization) 14044 (ISO 2006), states that uncertainty should be determined where feasible, by either parameter ranges or probability distributions, but the treatment itself is not mandatory. Reap and colleagues (2008) stated that reliably incorporating uncertainty was one of the key unresolved challenges to wider adoption of the LCA method. In recent years, however, uncertainty propagation in life cycle studies has been conducted using a range of well-understood and generally accepted methods, such as stochastic modeling, fuzzy data sets, interval calculations, and analytical uncertainty propagation (Clavreul et al. 2013). The importance of considering how covariance among dependent LCA inputs influences output uncertainty has also been noted (Bojaca and Schrevens 2010). Uncertainty in LCA may also be treated qualitatively, employing pedigree estimates based on data quality indicators, which can then be quantified using expert judgment or empirical data (Ciroth et al. 2012). While there remain distinct differences regarding the appropriateness of different methods (Lloyd and Ries 2007), stochastic modeling in the form of Monte Carlo analysis has been the most commonly employed method (Henriksson et al. 2015a). In previous years, characterization of variables in stochastic LCA was commonly confined to the use of four default statistical distributions: the uniform, triangular, normal, and lognormal distributions (Heijungs and Frischknecht 2005), which were proposed to overcome bias that can arise when practitioners choose different probability distribution functions (Muller et al. 2017).
The widely adopted LCA software package, SimaPro, also employs these four distributions (Pré 2013), and although the most recent release of the prominent ecoinvent database now offers a wider range of up to eight distributions for uncertainty quantification (Weidema et al. 2013), 99.9% of uncertain parameters are modeled as lognormally distributed (Muller et al. 2017). Other statistical tools permit the evaluation and application of a large range of different probability distributions (Palisade 2015). In practice, the characterization of uncertainty for stochastic variables depends on the quality of data and the model employed.
Where rich data are available, the best-fit probability distribution may be estimated by goodness-of-fit tests. However, in the event of a limited data set, a selected distribution provides an approximation of parameter variance based on the available information, such as the mean value and standard deviation (SD). Hozo and colleagues (2005) suggested the SD could be estimated based on the sample size and range alone. In instances when expert judgment is solicited for input data, it may be easier for experts to identify a foreseeable range and most likely value, leading to use of a triangular distribution (Lloyd and Ries 2007). In instances where there has been no available information, uncertainty has been estimated employing data-quality indicators to represent the sample data (Weidema 1998), and the lognormal distribution is typically assumed when applying a simplified procedure (GHG Protocol 2013). In general, a wide range of probability distributions are employed to characterize energy consumption, and Mumkhammar and colleagues (2014) noted that, when drawing conclusions from the literature, there is generally no unique distribution type suitable for modeling household electricity use. The Weibull distribution, however, is commonly implemented in modeling the generation and consumption of electricity (Carillo et al. 2014; Mumkhammar et al. 2014; Muller et al. 2017) and the lifetime of products (Oguchi and Fuse 2015; Nishijima 2016).
Big data opportunities presented by pervasive sensing enable recording the use stage of products in ever higher data granularity, that is, the level of depth represented by the data. Through this, we can gain insight into not only interindividual variability, but also temporal variability such as variation introduced by seasonal effects. Accounting for these effects, a model will truly reflect the life cycle use stage of products and yield more precise estimation of output uncertainty. With a renewed focus from the LCA community on data development, there is a need to examine the efficacy of gathering and employing such high-granularity data for uncertainty quantification in a stochastic LCA model. This study aimed to investigate the benefits of high-granularity use-stage data collection in LCA through analysis of the case study of AC systems. Specific objectives of the study were to examine: (1) how characterization of interindividual variability in the use stage by different probability distributions and (2) how characterizing data aggregated at successively higher granularity affects LCA modeling results and the uncertainty in output.

Table 1 Equations describing model used for calculation of greenhouse gas emissions from the life cycle of the case study air conditioning system

Variable       Units      Equation
GHG_total      kgCO2-eq   = GHG_manufacture + GHG_transport + GHG_use + GHG_end-of-life
GHG_use        kgCO2-eq   = GHG_power + GHG_leakage
GHG_power      kgCO2-eq   = P_AC * EF_electricity * t_lifetime
GHG_leakage    kgCO2-eq   = l_line * m_R410a * loss_R410a * CF_R410a * t_lifetime
P_AC           kWh        = P_AC-Jan + P_AC-Feb + ... + P_AC-Nov + P_AC-Dec

Data Source
Data on AC use were sourced from the Pecan Street Dataport, one of the world's largest archives of disaggregated customer energy data (Pecan Street Inc. 2015), providing high-granularity data on appliance-level electricity consumption. Although the Pecan Street energy research network comprised an archive of over 1,300 users in the United States, data were necessarily filtered to ensure comparability. To be eligible for the present study, residences were required to: have AC; be situated in the same locality; provide complete AC electricity readings; and have AC systems with comparable power and efficiency ratings. Data were therefore assessed for a sample of 107 residences located in Austin, Texas, over a period of 3 calendar years from January 2013 to December 2015. Each incorporated an AC system with a single outdoor compressor, rated cooling power of 10.6 kilowatts (kW) (3 tons of refrigeration, a metric more commonly used in North America), and coefficient of performance (CoP) of 3.5 (14 SEER, the Seasonal Energy Efficiency Ratio). Pecan Street recorded appliance-level electricity consumption on a minute-by-minute basis, enabling this study to aggregate data at several granularities practical for characterization in LCA. Electricity consumption for each user was aggregated at the annual, monthly, weekly, and daily level.
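The aggregation step can be illustrated with a minimal sketch. It assumes readings for one household arrive as (timestamp, kWh) pairs; the function name, the data layout, and the use of ISO week numbering are assumptions made for illustration and do not describe the Pecan Street Dataport interface or the study's actual processing. The daily level follows the weekday-within-month definition used later in the study.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def aggregate(readings, level):
    """Sum (timestamp, kWh) readings for one household into buckets.

    level is one of 'annual', 'monthly', 'weekly', or 'daily'.
    """
    keyers = {
        "annual": lambda t: (t.year,),
        "monthly": lambda t: (t.year, t.month),
        # Assumption: ISO weeks; the study does not state how weeks were delimited.
        "weekly": lambda t: tuple(t.isocalendar()[:2]),
        # Weekday within month: 0 = Monday ... 6 = Sunday.
        "daily": lambda t: (t.year, t.month, t.weekday()),
    }
    key = keyers[level]
    totals = defaultdict(float)
    for timestamp, kwh in readings:
        totals[key(timestamp)] += kwh
    return dict(totals)


# Synthetic example data: a constant 0.5 kWh every hour through January 2013.
start = datetime(2013, 1, 1)
readings = [(start + timedelta(hours=h), 0.5) for h in range(31 * 24)]
monthly = aggregate(readings, "monthly")
daily = aggregate(readings, "daily")
```

In this synthetic month the daily level yields seven buckets, one per weekday, which is how 12 months give 7 × 12 = 84 distributions per year.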

Life Cycle Assessment
The life cycle system boundary was defined as cradle to grave, covering the stages from extraction or acquisition of raw materials up to the end-of-life (EoL) treatment within the framework of an attributional LCA. The impact category was climate change, with GHG expressed in kilograms of carbon dioxide equivalents (kgCO2-eq). The functional unit for this study was the lifetime of a 10.6-kW, 3.5-CoP rated inverter AC system used to cool a single household. Overall life cycle GHG impact was estimated by summation of emissions from the appliance manufacture and transport stages, the product use stage, and the end of product life stage. In LCA, emissions arising from the process energy required to operate a product are considered direct use-stage emissions (ISO 2006). In the present study the use stage therefore included all emissions arising from electricity for operation of the air conditioner throughout its useful life span, from installation to uninstallation of the system. This incorporated emissions associated with upstream power generation for electricity consumption, as well as those associated with refrigerant leakage. The use stage included background electrical demand for maintaining refrigerant temperature when the system was idle; however, this use was inherent in the recorded appliance-level electricity use. Equations describing the life cycle model employed and calculation of the use-stage GHG are presented in table 1.
An average emissions factor of 655 gCO2e per kilowatt-hour (kWh) was employed for electricity consumption from the ERCOT grid in Texas. This factor also accounted for transmission losses and additional upstream emissions for electricity generation (Anair and Mahmassani 2012). The refrigerant used was R410a, which has a global warming potential 1,725 times greater than that of CO2 (IPCC 2006). Appliances were rated assuming refrigerant was delivered via an average 7.6 meters (25 feet) of piping (Carrier 2011), and leakage was estimated to be 2% annually (IPCC 2006; De Kleine 2009). An exact breakdown of the system components was not available; therefore, component materials were assumed in line with Shah and colleagues (2008), and emissions embedded in the materials and manufacture processes were estimated from the GREET2 database (Argonne 2015). The real life span of users' AC systems could not be determined as systems were not replaced during the period of the study; therefore, the typical useful life span of a system was estimated from literature values. This was assumed to be 10 years (Attia 2012; ASHI [Dunlop 2012]), with a minimum life span of 5 years, in line with a standard manufacturer's warranty (Carrier 2011), and an assumed maximum life span of 20 years (Shah et al. 2008). Emissions associated with transport and delivery of the product system to Austin, Texas, assumed a transport distance of 800 kilometers by road from manufacture in the United States, split between long-haul truck and medium heavy duty vehicle (Argonne 2015). In the absence of available specific data, EoL emissions were estimated using a value from a pertinent U.S. case study in the literature (De Kleine 2009).
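As a rough illustration of how the use-stage equations in table 1 combine with the parameter values stated above, the following sketch computes GHG_use deterministically. The emissions factor, lifetime, line length, leakage rate, and characterization factor are taken from the text; the annual electricity use and the refrigerant charge per meter of line are hypothetical placeholders chosen only so that the example runs, and the function names are likewise illustrative.

```python
def ghg_power(p_ac_kwh_per_year, ef_kg_per_kwh, lifetime_years):
    """GHG_power = P_AC * EF_electricity * t_lifetime (kgCO2-eq)."""
    return p_ac_kwh_per_year * ef_kg_per_kwh * lifetime_years


def ghg_leakage(line_m, charge_kg_per_m, annual_loss, cf, lifetime_years):
    """GHG_leakage = l_line * m_R410a * loss_R410a * CF_R410a * t_lifetime."""
    return line_m * charge_kg_per_m * annual_loss * cf * lifetime_years


def ghg_use(p_ac, ef, lifetime, line_m, charge_kg_per_m, annual_loss, cf):
    """GHG_use = GHG_power + GHG_leakage (kgCO2-eq)."""
    return (ghg_power(p_ac, ef, lifetime)
            + ghg_leakage(line_m, charge_kg_per_m, annual_loss, cf, lifetime))


# Values stated in the text.
EF_ELECTRICITY = 0.655   # kgCO2e per kWh (655 gCO2e per kWh)
T_LIFETIME = 10.0        # years
L_LINE = 7.6             # meters of refrigerant piping
LOSS_R410A = 0.02        # fraction of charge leaked per year
CF_R410A = 1725.0        # kgCO2-eq per kg R410a

# Hypothetical placeholders, not taken from the study.
P_AC_EXAMPLE = 3000.0    # kWh per year
M_R410A_EXAMPLE = 0.05   # kg of refrigerant per meter of line

use_stage = ghg_use(P_AC_EXAMPLE, EF_ELECTRICITY, T_LIFETIME,
                    L_LINE, M_R410A_EXAMPLE, LOSS_R410A, CF_R410A)
```

With placeholder inputs of this size the electricity term is far larger than the leakage term, which matches the qualitative picture of a use stage driven by electricity consumption.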

Stochastic Simulation and Characterization of Data
A stochastic simulation was employed to quantify the influence of statistical variation in use-stage parameters upon the output from impact assessment. Input parameters were defined by probability distributions, and Monte Carlo simulations were conducted employing the @RISK package (Palisade 2015). This software and approach to life cycle uncertainty quantification has previously been effectively employed in many environmental LCA studies (Basset-Mens et al. 2009; Couth et al. 2011; Wriedt et al. 2014) and in quantification of life cycle use-stage variability (Ross and Cheah 2017). Output metrics employed for quantifying uncertainty in the LCA model were the coefficient of variation (CV) and the contribution to variance (CTV). The CV is a measure of spread of results that describes their variability relative to the output mean. The CTV estimates what percentage of the variance or uncertainty in the output forecast is caused by assumption of variation in each input parameter, and has been proposed as a global sensitivity test for LCA. Monte Carlo simulation ran 1,000 iterations of the model, generating a distribution of the resulting output (kgCO2e). This number of simulations is considered appropriate for LCA studies, and suitable for estimating differences between outputs from Monte Carlo analysis (Henriksson et al. 2015b; Klöpffer and Grahl 2014), and for estimation of parameters' CTV.
Use-stage AC electrical power consumption (P_AC) of an average household was characterized by probabilistic distributions satisfying the best fit to the data. In the absence of high-quality data, lognormal and normal distributions were used to represent other parameters in the model, as is standard in industry practice when the distribution shape is unknown (Weidema et al. 2013). Manufacturer guidance for comparable appliances stipulated minimum and maximum line lengths (Carrier 2011; Mitsubishi 2014), and the stated warranty represented a value for minimum useful life span. Fixed-bound Pert distributions were thus considered appropriate for the useful lifetime and refrigerant line length. Uncertainty parameters and distributions representing inputs were those presented in table 2.
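The simulation procedure can be sketched in a few lines of standard-library Python. This is an illustrative stand-in, not the @RISK model used in the study: only GHG_power is simulated, the monthly Weibull scale values, the Weibull shape of 1.5, and the lognormal spread on the emissions factor are hypothetical, and the Pert distribution is built from a Beta draw using the common shape parameter of 4. The lifetime bounds (5, 10, 20 years) and the 1,000 iterations follow the text.

```python
import math
import random
import statistics


def pert(rng, low, mode, high, lam=4.0):
    """Draw from a Beta-Pert distribution via a rescaled Beta variate."""
    span = high - low
    alpha = 1.0 + lam * (mode - low) / span
    beta = 1.0 + lam * (high - mode) / span
    return low + rng.betavariate(alpha, beta) * span


# Hypothetical Weibull scale parameters (kWh per month), January to December.
MONTHLY_SCALE = [15, 20, 60, 150, 350, 550, 650, 690, 500, 200, 50, 20]


def simulate(n_iter=1000, seed=1):
    """Return n_iter Monte Carlo draws of lifetime GHG_power (kgCO2-eq)."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(n_iter):
        # P_AC is the sum of 12 monthly draws (monthly-level model).
        p_ac = sum(rng.weibullvariate(scale, 1.5) for scale in MONTHLY_SCALE)
        # Median 0.655 kgCO2e per kWh; the spread of 0.05 is an assumption.
        ef = rng.lognormvariate(math.log(0.655), 0.05)
        lifetime = pert(rng, 5.0, 10.0, 20.0)
        outputs.append(p_ac * ef * lifetime)
    return outputs


def coefficient_of_variation(values):
    """CV = sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


outputs = simulate()
cv = coefficient_of_variation(outputs)
```

Seeding the generator makes a run repeatable, which is convenient when comparing CV across alternative characterizations of the same input.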

Different Probability Distributions and Data Granularity
The study examined how LCA modeling results and the uncertainty in output were affected by two scenarios: (1) the characterization of interindividual variability in the use stage by different probability distributions and (2) modeling at successively higher data granularity. The two stochastic models correspond directly to the two study objectives.
(i) The initial model characterized P_AC using monthly level data, equivalent to that available from a user's utility bill, in order that the model would make some account of seasonal variation in AC use. P_AC was therefore the summation of 12 probability distributions (P_AC-Jan to P_AC-Dec) representing the users' aggregated AC power consumption in each month of the year. Distributions were thus derived from the observed spread of data arising from variation amongst the 107 residences' power consumption in each month. Parameters for P_AC-Jan to P_AC-Dec are presented in table 2. The suitability of a range of 26 possible distributions to characterize components of P_AC was assessed in @RISK. Goodness of fit was determined using the Kolmogorov-Smirnov test and the Akaike Information Criterion. Stochastic simulation was then conducted using the best-fit distribution to the data. Three further simulations were conducted, characterizing P_AC-Jan to P_AC-Dec by each of three "default" distributions (normal, lognormal, and triangular), representing the scenario that we had only basic parameter information or expert opinion with which to define power consumption. Significant differences between the output estimated GHG were assessed by analysis of variance using the Tukey method. CTV was calculated following the method employed by Mutel and colleagues (2013), using Spearman's rank-order correlation, squaring the rank-order correlation coefficients and normalizing them to 100%.
(ii) P_AC was characterized for stochastic modeling at four successively higher granularities of data aggregation: annual, monthly, weekly, and daily. For example, at annual level, there was a single probability distribution in the model representing users' electricity consumption for the entire year, at monthly level there were 12 distributions, and at weekly level 52 distributions. In order to separate variation observed between, for example, weekends and midweek days, daily level was defined as the weekday within each month; therefore, there were 7 × 12 = 84 distributions aggregated to estimate P_AC at daily level. In this way, the life cycle model accounted for interindividual and seasonal variation over increasingly shorter periods within the year while characterizing P_AC. Goodness of fit for all distributions was determined using the Kolmogorov-Smirnov test. The best fitted distributions were broadly exponential in form during the cooler period (Oct-Mar) and Weibull or Beta distributions during the warmer period (Apr-Sep) with higher AC use. Detail of all probability distributions employed in modeling for each data level is available in the Supporting Information on the Journal's website. Stochastic simulation was conducted to estimate the lifetime GHG of the AC system and associated uncertainty when modeling at each data aggregation level.

Table 2 (final row): Refrigerant mass per l_line; kg m^-1; n/a; 55.8
Note: P_AC-Mmm = user AC electricity consumption each month; EF = emissions factor; CF = characterization factor.
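The CTV calculation described under (i), squaring Spearman rank-order correlation coefficients and normalizing them to 100%, can be sketched as follows. This is a minimal standard-library illustration rather than the @RISK or Mutel and colleagues implementation; the two input parameters and their lognormal spreads are hypothetical, and ties are ignored because the draws are continuous.

```python
import math
import random


def ranks(values):
    """Rank values from 1..n (ties are not averaged; fine for continuous draws)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = float(rank)
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def contribution_to_variance(inputs, output):
    """Squared Spearman coefficients per input, normalized to sum to 100%."""
    squared = {name: spearman(draws, output) ** 2
               for name, draws in inputs.items()}
    total = sum(squared.values())
    return {name: 100.0 * value / total for name, value in squared.items()}


# Hypothetical example: one widely spread and one narrowly spread input.
rng = random.Random(42)
wide = [rng.lognormvariate(0.0, 0.5) for _ in range(1000)]
narrow = [rng.lognormvariate(0.0, 0.05) for _ in range(1000)]
output = [w * n for w, n in zip(wide, narrow)]
ctv = contribution_to_variance({"wide": wide, "narrow": narrow}, output)
```

In this toy case the widely spread input receives nearly all of the CTV, mirroring how a dominant use-stage parameter such as P_AC can account for most of the output uncertainty.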

Results
The use stage was estimated to contribute 93% of the total life cycle emissions of the AC system, almost entirely resulting from electricity use. Embodied emissions in the manufacture and transport delivery of the system accounted for just 2% of the lifetime total. A breakdown of the estimated GHG from deterministic calculation of the different life cycle stages is presented in table 3.
(i) Table 4 presents the results from comparison of modeling with P_AC characterized by the best fit versus three standard or default distributions. Three models produced concurrent estimates for mean lifetime GHG; however, the triangular model significantly overestimated mean GHG by 65%. The triangular model also resulted in a considerably higher SD and 95% range of GHG results than the other three models. CV was largest in the best-fit model, with the lowest CV in the normal model. CTV from P_AC was different amongst the four models and was highest in the best-fit model.
(ii) Table 5 presents the results from stochastic modeling where P_AC was characterized at different data granularities for modeling. Figure 1 presents a plot of output distributions from each of the data granularity models. CV in results was largest when employing annual-level data and smaller when modeling at successively higher data granularity. CV reduced by 26%, 42%, and 43% when using monthly, weekly, and daily level data, respectively, compared to annual. Similarly, reductions in CTV were observed with successive increases in data granularity. CTV reduced by 40%, 76%, and 79% when using monthly, weekly, and daily level data, respectively, compared to annual. However, the observed reductions in uncertainty were smaller with each increase in aggregated data granularity. At daily level, the highest data granularity in the study, CV was just 1% lower and CTV 7% lower compared to weekly level.

Note: Different superscript letters within the same row denote a significant difference between mean values (p < .001). SD = standard deviation; CV = coefficient of variation; CTV = contribution to variance by P_AC.

Fitted vs. "Default" Distributions
The first objective of this study characterized Pecan Street users' AC electricity consumption by different probability distributions in the LCA model. The differences observed in estimated mean GHG among different scenarios can be attributed to the Monte Carlo sampling process, resulting in deviation from a static value such as would be obtained from a deterministic calculation. It is not unexpected, however, that the normal distribution model would produce the lowest mean value. The probabilistic mean value of distributions which are broadly lognormal in shape, such as the Weibull and lognormal here, will always be greater than the median value (McDonald 2014). Accordingly, the results of a Monte Carlo LCA estimate where a large number of lognormal distributions are present will be biased higher than the deterministic calculation. Furthermore, in the case of the triangular distribution, there is no a priori relationship between the underlying mode used in a deterministic calculation and the probabilistic mean (Weidema et al. 2013). Thus, the triangular distribution model is liable to considerably overestimate the output mean GHG, especially where parameter distribution tails are long. This is of some importance when considering that a key LCA parameter such as P_AC may often be characterized from limited data or expert opinion and characterized by the triangular distribution. The Beta-Pert distribution represents a viable alternative to the triangular distribution as, although still defined by the same location parameters (maximum and minimum) and the mode, its flexibility of shape makes it less sensitive to extreme values than the triangular (Muller et al. 2017). Although the estimated CV from the triangular model was in line with the lognormal, the scale of the overestimate of GHG cannot be overlooked. It should also be noted that the best-fit model estimated the largest CV and CTV of the scenarios.
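The gap between a distribution's mean and the central value used in a deterministic calculation can be made concrete with the standard closed-form expressions: the mean of a triangular distribution is (minimum + mode + maximum) / 3, the mean of a lognormal is exp(mu + sigma^2 / 2), and its median is exp(mu). The sketch below uses hypothetical parameter values chosen only to show a long right tail; they are not fitted to the study data.

```python
import math


def triangular_mean(low, mode, high):
    """Mean of a triangular distribution with the given bounds and mode."""
    return (low + mode + high) / 3.0


def lognormal_mean(mu, sigma):
    """Mean of a lognormal with log-scale parameters mu and sigma."""
    return math.exp(mu + sigma ** 2 / 2.0)


def lognormal_median(mu, sigma):
    """Median of a lognormal; it does not depend on sigma."""
    return math.exp(mu)


# Hypothetical long-tailed case: mode near the lower bound, distant maximum.
tri_mean = triangular_mean(0.0, 100.0, 1000.0)
tri_ratio = tri_mean / 100.0  # mean relative to the mode

# Hypothetical lognormal with unit log-scale spread.
ln_ratio = lognormal_mean(0.0, 1.0) / lognormal_median(0.0, 1.0)
```

With a symmetric triangle the mean equals the mode, so the bias appears only when the bounds are unevenly spaced around the most likely value, as is typical for long-tailed consumption data.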
If we are to assume that the best-fit probability distributions, defined by shape parameters as opposed to only basic statistics, are a true representation of the data, this suggests that the default distribution models all slightly underestimated the true uncertainty in results in this case. A similar effect was also observed by a study characterizing high-granularity data on household energy loads in Sweden (Mumkhammar et al. 2014). They noted that during seasonal periods of low energy use, the fitted probability distributions had lower SDs than those of the underlying data they represented. During periods of higher energy use (i.e., household heating in Swedish winter), the fitted Weibull distribution remained close to the data while the SD of the lognormal increased. Similar to deviations between estimated mean electricity use and the underlying data, this might be an artefact of the stochastic patterns of higher energy use. However, Mumkhammar and colleagues also concluded that this might indicate that heating electricity use, analogous to our study's AC use, might not necessarily be optimally modeled with a lognormal distribution, at least during times of intense use. Weidema and colleagues (2013) stated that the choice of distribution has limited influence on the overall uncertainty of a product system since, in accordance with the central limit theorem, the sum of many independent variables each with their own distribution will always approach a normally distributed result. Although the results for estimated mean GHG in the present study broadly agree, there were nonetheless differences in the estimated uncertainty in output between the best-fit model and those employing default distributions. In products such as AC systems, where the use stage can contribute up to 90% of the life cycle GHG, data characterization and distribution choice for use-stage variables can therefore yield important differences in the level of probabilistic output uncertainty.
To adequately account for uncertainty, LCA studies and industry should employ a wider choice of distributions. The prominent ecoinvent database models the vast majority of uncertain life cycle inventory parameters as lognormally distributed. This distribution lends itself to modeling through many useful properties; for example, it is guaranteed to yield a positive value, its right-skewness makes it convenient to model large values and uncertainties, and its definition parameters (mean and SD) are readily retrieved. Thus, while some inventory data may indeed be associated with lognormally distributed samples, for the majority this state simply reflects a choice made to default to the lognormal distribution for all data (Muller et al. 2017). Recent developments (Wernet et al. 2016) have enabled incorporating Beta-Pert and gamma distributions, which allow more flexibility in the shape of the distribution depending on the value of the mode. Although these developments were designed to accommodate different scenarios of data availability (e.g., mode, max/min, and bound upper limit) within the framework rather than to assess the best fit to data, they nonetheless provide the co-benefit of additional distribution choice for inventory data characterization. A recent study showed that LCA results can differ based on characterization factors that are functions purely of different software implementations of life cycle impact assessment methods, rather than of the underlying data (Speck et al. 2015). In that scenario, different models conducting the LCA of the same product, with the same inventory data, potentially provide different values of output GHG and associated uncertainty. Similarly, the distributions chosen to characterize key parameters in different models can give rise to differences in the overall uncertainty in outputs. This situation could arise whether the key parameters were user defined or predefined by the model.
However, if the ultimate goal is to better reflect the true variation observed in the real world, or often case-specific data, enabling users to make the appropriate fit to data instead of conforming to predefined distributions must be a priority. Ciroth and colleagues (2012) stated that a distribution defined by shape parameters, such as the gamma, can be chosen to model data with its total uncertainty only when the shape and the scale parameters are perfectly known; otherwise, the lognormal distribution, which can be defined by descriptive statistics, must be chosen instead. In the age of big data and pervasive sensing, with which to better quantify inventory values and profile product use, shape and scale parameters may be ever more accurately defined, in turn moving LCA away from default distributions and closer to true representation of modeled output uncertainty.

Benefits and Diminishing Returns from Additional Data
The second objective of the present study characterized users' AC power consumption in the life cycle model at four different levels of data granularity. With slight modification, the central limit theorem holds that the product of a large number of variables each with their associated errors will be lognormally distributed (Slob 1994). This implies that the higher the aggregation level of the power consumption data in this context, and the more probability distributions employed, the more likely the estimated output will tend toward a lognormal distribution. This effect can be observed in figure 1, although despite the difference in distribution shapes, the estimated output mean global warming potential values of the four output distributions were not significantly different to one another. Furthermore, results show that both CV and CTV were reducing with each subsequent increase in data granularity, thus reducing the estimated uncertainty associated with users' power consumption in the overall life cycle. Each increase in granularity provided a diminishing return on the uncertainty, however. The reduction in the spread of uncertainty can be observed in figure 1, where plotted distributions are sharper at each successive increase in data granularity. Although reductions in uncertainty presented diminishing returns at each step, the scale of the uncertainty reduction overall was considerable. Thus, we can say that in this case, the results certainly justify the effort of additional data collection beyond the annual level typically employed in LCA inventories. The effect is emphasized in this case by the considerable temporal variation in AC use owing to seasonal changes, where average daily high temperature in Austin, Texas, varies by 19°C between January and July (US Climate Data 2017). Illustrating the scale of temporal variation, mean AC energy consumption for a household ranged from 15 kWh (SD = 27) per month in January to 688 kWh (SD = 381) in August. 
This is, of course, consistent with the pattern of use that we would expect, with limited use in winter contrasting with sustained AC use in summer. Individual cases of high use in winter lead to larger observed variability in January (CV = 1.8) compared to lower variability amongst a high number of consistent users in summer (CV = 0.5). The energy-use profile of a product such as a television or an electric razor would still depend on the interindividual user variation, but is likely to be less influenced by seasonal variation on this scale. A recent life cycle study of AC use in a tropical city noted that CTV from variation in external temperature was considerably lower than that introduced by user decisions (Ross and Cheah 2017). The move from weekly to daily level data aggregation represented a further step up, theoretically incorporating a still greater account of users' temporal variability, but returned only a 1% reduction in CV and a 7% reduction in CTV. It is therefore fair to say that in this case, the additional benefit above weekly granularity was not as valuable. At daily level, we have arrived at a point where P_AC no longer dominates the overall output and contributes less to output uncertainty than the other use-stage variables that are presented in table 1. Conversely, in the example of the PC or razor, we might now expect the interindividual variation at daily level to make a more significant contribution. In the present study, however, we could continue to incrementally increase the data granularity at which P_AC is aggregated and characterized, but the overall uncertainty in output would become irreducible. Our study approach was broadly similar to that of Munkhammar and colleagues (2014), who characterized household energy consumption in Sweden using probability distributions at different temporal scales from monthly to hourly level. At hourly level, their model required 2,016 different probability distributions to estimate annual energy consumption. 
From a modeling perspective, it is desirable to incorporate as much as possible of the information on temporal variability contained in a data set, while keeping the number of model parameters to a reasonable minimum. Given the diminishing returns with respect to each successive data granularity, the present study suggests that modeling at weekly level, with 52 probability distributions to characterize P_AC, was sufficient to reduce uncertainty in estimated outputs.
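The effect of data granularity on output uncertainty described above can be illustrated with a minimal Monte Carlo sketch. It is not the model used in the present study: the household data are synthetic, the lognormal fits, the emission factor, the 364-day year, and the 13 four-week blocks standing in for months are all assumptions made for illustration, and draws for different periods are assumed to be independent. The second uncertain variable is included only to show how output uncertainty approaches a floor once power consumption no longer dominates.

```python
import math
import random
import statistics

N_DAYS = 364        # 52 whole weeks, so every level divides the year evenly (assumption)
N_HOUSEHOLDS = 30   # hypothetical sample size
N_TRIALS = 2000     # Monte Carlo iterations per granularity level


def synthetic_observations(rng):
    """Return obs[day] = list of daily AC energy use (kWh), one value per household.

    Purely synthetic: a cosine seasonal profile (low in midwinter, high in
    midsummer) multiplied by lognormal noise between households.
    """
    obs = []
    for day in range(N_DAYS):
        seasonal_level = 11.0 - 10.0 * math.cos(2.0 * math.pi * day / N_DAYS)
        obs.append([seasonal_level * rng.lognormvariate(0.0, 0.5)
                    for _ in range(N_HOUSEHOLDS)])
    return obs


def fit_lognormal_per_period(obs, n_periods):
    """Fit one lognormal (mean and SD of log values) to each period's pooled data."""
    days_per_period = N_DAYS // n_periods
    params = []
    for p in range(n_periods):
        days = obs[p * days_per_period:(p + 1) * days_per_period]
        logs = [math.log(x) for day_values in days for x in day_values]
        params.append((statistics.fmean(logs), statistics.stdev(logs)))
    return days_per_period, params


def simulate_use_stage(obs, n_periods, rng):
    """Monte Carlo: annual energy from independent per-period draws, multiplied
    by an uncertain emission factor (hypothetical value, kg CO2e per kWh)."""
    days_per_period, params = fit_lognormal_per_period(obs, n_periods)
    outputs = []
    for _ in range(N_TRIALS):
        annual_kwh = sum(days_per_period * rng.lognormvariate(mu, sigma)
                         for mu, sigma in params)
        emission_factor = rng.lognormvariate(math.log(0.6), 0.1)
        outputs.append(annual_kwh * emission_factor)
    return outputs


def coefficient_of_variation(values):
    """CV = standard deviation divided by mean."""
    return statistics.stdev(values) / statistics.fmean(values)


rng = random.Random(2017)
observations = synthetic_observations(rng)
levels = {"annual": 1, "four-weekly": 13, "weekly": 52, "daily": 364}
cv_by_level = {name: coefficient_of_variation(simulate_use_stage(observations, n, rng))
               for name, n in levels.items()}
for name, cv in cv_by_level.items():
    print(f"{name:>12}: CV = {cv:.3f}")
```

Under these assumptions, the output CV falls at each step, but by a smaller amount each time, because the uncertainty of the second variable is unaffected by how finely power consumption is characterized.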

Implications for Life Cycle Assessment and Design for Environment
Advances in both software and hardware of appliances have created a wealth of embedded platforms to facilitate real-time continuous data collection and communication. The rise of pervasive and remote sensing presents opportunities to gather rich data on the use of high-energy products, and a rapidly growing number of sensors around us also makes it possible to gain understanding of users in the real world (Park et al. 2014). In the past, large-scale data on household end-use energy consumption have been available at national or regional scales through extensive collection efforts, which can be found in the U.S. residential and commercial energy consumption survey (RECS) (EIA 2017) and Residential Monitoring to Decrease Energy Use and Carbon Emissions in Europe (REMODECE) (de Almeida 2008). Many studies since the millennium have estimated energy use associated with household appliances (Hertwich and Roux 2011), or future energy demand for AC and its response to climate-change scenarios (Sailor and Pavlova 2003; Isaac and van Vuuren 2009; Rapson 2014), based on the average household energy consumption and appliance holding from such survey data. A recent study of AC energy use in Japan, which used Weibull distributions and reverse logistic curves to estimate appliance lifetime and energy consumption (Nishijima 2016), was similarly based on catalogue data. The study of Munkhammar and colleagues (2014) employed high-granularity energy data recorded at 10-minute resolution, although these were collected at household level and not directly linked to appliances. In the future, high-granularity, appliance-level energy consumption information will be readily available for characterization through pervasive sensing. The Pecan Street Dataport (Pecan Street Inc. 2015), as employed in the present study, is an example of these disaggregated end-use appliance data already available for energy and water use at regional level. 
Many countries have now piloted or implemented smart electricity meters and app-based appliance-level monitoring: more than 65 million smart meters have been deployed in the United States (EIA 2016), and the UK government plans to install a smart meter in every home by 2020 (Smart Energy GB 2017). This primarily enables users to manage the efficiency of their energy use, but also provides the framework to collate a vast archive of use-stage data. This is further augmented by the proliferation of smartphone-based apps sharing data on user locations and activities. Cooper and colleagues (2013) noted that many LCA practitioners, notably those developing and using impact characterization factors, have for some time called for increased site specificity. Gathering data on this scale not only enables more accurate representation of variation and uncertainty in LCA, but would also inform design for environment, tailoring product design to prevailing user behavior and local environmental conditions. By harnessing the power of crowd-sourced information, sampling big data for smaller geographical areas should provide improved representation of the reality LCA is intended to portray (Cooper et al. 2013). Dunlop (ASHI; Dunlop 2012) noted that one third to one half of all domestic AC systems in the United States were systematically oversized. Pervasive sensing detailing interindividual variability in this context could record how frequently systems were actually running at capacity, and therefore guide sizing decisions and avoid overcapacity, as well as define temporal and spatial patterns of appliance use. Such spatial information will hold implications for LCA, not just in product design and manufacture, as emissions arising from the consumption of energy by a product are considered direct use-stage emissions (ISO 2006). 
A product life cycle with 90% or more of emissions occurring in the use stage is thus inherently dependent on the emissions factor for electricity generation, which can vary geographically and temporally, much like the use of the product itself. Modeling at higher granularities of product-use data therefore provides the opportunity to complement these data with rich information for other important parameters too, such as the electricity emissions factor. In the present study, an average emissions factor for electricity in the Texas ERCOT grid was employed with a measure of typical associated variability (Anair and Mahmassani 2012), but there may also exist extensive regional spatial and temporal variation in the electricity generation fuel mix. Hauck and colleagues (2014) stated that variability in electricity-generating plant technologies and plant efficiency contributes the most to uncertainty in an emission factor. Anair and Mahmassani (2012) noted that the regional grids serving Texas and California comprised similar proportions of natural gas in the energy mix, yet the emission factor for California was 40% lower, owing mainly to a greater reliance on coal in the Texas ERCOT grid. With a lower electricity emissions factor, the scale of the use-stage contribution to overall product life cycle emissions and uncertainty will be considerably reduced. Furthermore, an energy mix with a high renewables fraction will vary with the availability of certain sources, for example hydropower varying seasonally with rainfall or snowmelt, and wind potential changing seasonally or daily. Therefore, an energy mix with a high proportion of renewable sources would result in lower overall emissions, but a relatively higher uncertainty. Some studies have accounted for substate-level energy mixes (Collinge et al. 2013); however, power trading between regions tends to drive the mix toward an average (Weber et al. 2010). 
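The dependence of the use-stage contribution on the electricity emissions factor can be sketched numerically. The example below is illustrative only and does not reproduce the study's model: the non-use-stage emissions, the lifetime electricity use, the higher emissions factor of 0.60 kg CO2e per kWh, and the lognormal spreads are all hypothetical values. Only the 40% difference between the two factors is taken from the comparison of grids cited above.

```python
import math
import random
import statistics

NON_USE_KG = 1500.0          # hypothetical non-use-stage emissions (kg CO2e)
LIFETIME_KWH_MEDIAN = 40000  # hypothetical lifetime electricity use (kWh)


def simulate_life_cycle(ef_median, n_trials=5000, seed=1):
    """Return (mean total, SD of total, mean use-stage share) for one grid."""
    rng = random.Random(seed)  # same seed for each grid, so only the factor differs
    totals, shares = [], []
    for _ in range(n_trials):
        kwh = rng.lognormvariate(math.log(LIFETIME_KWH_MEDIAN), 0.3)
        ef = rng.lognormvariate(math.log(ef_median), 0.1)  # kg CO2e per kWh
        use_stage = kwh * ef
        totals.append(use_stage + NON_USE_KG)
        shares.append(use_stage / (use_stage + NON_USE_KG))
    return statistics.fmean(totals), statistics.stdev(totals), statistics.fmean(shares)


higher_ef = simulate_life_cycle(ef_median=0.60)        # hypothetical grid factor
lower_ef = simulate_life_cycle(ef_median=0.60 * 0.60)  # factor 40% lower
for label, (mean, sd, share) in [("higher EF", higher_ef), ("lower EF", lower_ef)]:
    print(f"{label}: mean = {mean:.0f} kg CO2e, SD = {sd:.0f}, "
          f"use-stage share = {share:.2f}")
```

Under these assumptions, the lower factor reduces the mean life cycle emissions, the absolute spread of the output, and the share of emissions attributable to the use stage.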
While the present study focused on GHG emissions as the impact category, a full LCA study will include many more categories, such as aquatic eco-toxicity or acidification and eutrophication potential. The influence of variation in product use and the electricity grid mix will also produce a range of uncertainty surrounding these wider impact categories, for example eco-toxicity scores associated with the extraction and processing of fossil fuels (Shah et al. 2008). Hertwich and Roux (2011) observed that the use stage of high-energy domestic appliances was so dominant with respect to the GHG profile that other life cycle stages could viably be ignored. However, in an LCA study involving AC systems, Shah and colleagues (2008) noted the high contribution to human health impact from the metal manufacturing stage for piping and ducts, while potential eco-toxicity impacts were also high due to dispersion of metallic ions during the manufacturing of appliances. There is therefore a need to adequately characterize and quantify the balance of a range of impacts and their uncertainty from all stages of the life cycle. As pervasive sensing and the Internet-of-Things continue to develop, it will be possible to combine ever higher granularity product-use data in LCA with complementary high-quality emission and impact factors to represent even more accurately the life cycle emissions from different products and systems, and their uncertainty.

Conclusions
The present study demonstrated that although distribution choice between best-fitted and default probability distributions does not necessarily significantly impact the estimated mean output GHG, there were differences in the estimated uncertainty in LCA output. Furthermore, the incorporation of use-stage power consumption data into the model aggregated at successively higher data granularity reduced the CTV and overall uncertainty, albeit with diminishing returns. Results therefore justify the collection of high-granularity data sets representing the life cycle use stage of high-energy products in GHG studies. The availability of such data through proliferation of pervasive sensing presents increasing opportunities to better characterize data and increase confidence in the results of LCA.

Data Accessibility
Data pertaining to probability distributions employed for modeling in the present study are available in the Figshare repository via the following link: https://doi.org/10.6084/m9.figshare.5987569.v1