Material intensity database for the Dutch building stock: Towards Big Data in material stock analysis

Re-use and recycling in the construction sector are essential to keep resource use in check. Data availability about the material contents of buildings is a significant challenge for planning future re-use potentials. Compiling material intensity (MI) data is time and resource intensive. Often studies end up with only a handful of datapoints. In order to adequately cover the diversity of buildings and materials found in cities, and accurately assess material stocks at detailed spatial scopes, many more MI datapoints are needed. In this work, we present a database on the material intensity of the Dutch building stock, containing 61 large-scale demolition projects with a total of 781 datapoints, representing more than 306,000 square meters of built floor space. This dataset is representative of the types of buildings being demolished in the Netherlands. Our data were empirically sourced in collaboration with a demolition company that explicitly focuses on re-using and recycling materials and components. The dataset includes both the structural building materials and component materials, and covers a wide range of building types, sizes, and construction years. Compared to the existing literature, this paper adds significantly more datapoints, and more detail to the different types of materials found in demolition streams. This increase in data volume is a necessary step toward enabling big data methods, such as data mining and machine learning. These methods could be used to uncover previously unrecognized patterns in material stocks, or to more accurately estimate material stocks in locations that have only sparse data available. This article met the requirements for a Gold-Gold JIE data openness badge described at http://jie.click/badges.


INTRODUCTION
The built environment currently accounts for roughly half of global material demand (IRP, 2019). Due to urbanization and population growth, demand for materials in the urban environment can be expected to keep increasing (Fishman et al., 2016; Wiedenhofer et al., 2019). Multiple strategies are required to keep this increasing demand manageable. Together with various strategies to improve material efficiency (IRP, 2020; Scott et al., 2018), it is essential that materials are sourced as much as possible from the demolition of obsolete buildings, rather than from primary sources. Re-use and recycling in the construction sector are often categorized under circular economy policy (EU, 2017). To support these circular economy efforts in the construction sector, data on the potential for re-use are essential. This starts with a general understanding of which materials can be found in the built environment and in what volume. This type of data is crucial for making realistic bottom-up models that accurately represent real-world material intensities.
Data availability about the material contents of buildings is one of the biggest challenges for planning future re-use potentials (Augiseau & Barles, 2017; Lanau et al., 2019). Naturally, materials and their characteristics (mass, end-uses, quality, etc.) can only feasibly be quantified empirically during either construction or demolition. The material contents of in-use building stocks can only be estimated. Therefore, studies that focus on the in-use stock typically assign statistically estimated material intensity coefficients (MIs, mass per floor area or building volume) to the in-use stock based on building archetypes, ideally matching similar construction types, locations, and periods. This approach is common in so-called "bottom-up" studies of building material stocks, as for example applied to Amsterdam by Van der Voet et al. (2017), to Taipei City by Cheng (2018), and to the whole of Switzerland by Heeren and Hellweg (2018). However, top-down studies that convert floor space to masses of materials also need this type of data (Hu et al., 2010; Huang et al., 2018; IRP, 2020).
Compiling MI data is time and resource intensive, and often studies end up with only a handful of datapoints, resulting in unclear uncertainties.
Furthermore, construction relies on localized styles, preferences, technologies, regulations, and traditions, which vary greatly, sometimes even between neighboring cities. On the scale of individual buildings, the actual masses of materials in a specific building are affected by multiple factors: the land plot's physical attributes (including slope, ground and soil type, and hydrology) as well as societal and economic factors such as the construction period, construction budget, architectural design, and intended use. The result is that MIs are virtually unique to individual buildings. This is unusual compared to nearly all other production sectors, which moved to standardized production lines decades ago. The representativeness of MIs as archetypes is therefore an acknowledged limitation in both material stock and flow studies, and in life cycle assessments of buildings and infrastructure (Saxe et al., 2020).
Compounding the issue of their relative scarcity, MI data are often compiled in an ad hoc fashion for a particular study, to fit its objectives and within the limitations of data compilation. There are numerous case studies reporting on MIs, amongst others for Padua, Italy; Luxembourg (Mastrucci, 2017); Germany (Schiller et al., 2017); and Los Angeles, USA (Reyna & Chester et al., 2015). Tanikawa et al. have published multiple papers on stocks, dynamics, and impacts of building and infrastructure materials in Japan (Tanikawa & Hashimoto et al., 2009; Tanikawa et al., 2015). However, coverage of material types, end-uses, resolutions, and even units of measurement varies greatly between studies (Heeren & Fishman, 2019).
Together with the natural variability of MIs, these methodological choices and limitations lead to difficulties in harmonizing MI values for comparison and transferability between studies. Some efforts have been made recently, such as a test of transferability between German and Japanese MIs (Schiller et al., 2019). On a global level, Marinova et al. (2020) and Deetman et al. (2020) have reported on building-related material stocks and dynamics by combining material intensity data from different regions. Heeren & Fishman (2019) analyzed the available material intensity data in the literature and implemented these data consistently in a per-country database.
In order to accurately assess material stocks at detailed spatial scopes, many more datapoints are needed to adequately cover the diversity of buildings in cities, and the diversities of materials found in buildings of the same type.
In this work, we present a database on the material intensity of the Dutch building stock, containing 61 large-scale demolition projects with a total of 781 datapoints, representing more than 306,000 m² of built floor space. In our database, one datapoint (sometimes also called a record or observation) consists of a single building unit's material intensity values for all applicable materials in mass per unit of floor area, plus additional information on the unit, such as age and use. Our data were empirically sourced in collaboration with a demolition company that explicitly focuses on re-using and recycling materials and components, and therefore keeps relatively detailed records of material flows. We give data for both the building structure and building components (e.g., window frames, doors, radiators  (Yang et al., 2020), by far the largest country-specific collection of MIs to this date, to which our study further contributes.

METHODOLOGY
The material intensity database is based on empirical data from a significant number of real-world demolition projects in the Netherlands. Buildings are divided into seven sub-types. Each project can involve the demolition of multiple buildings, and each building can contain multiple dwellings (see Tables 1 and 2). This data was augmented with GIS data (Kadaster, 2018), which provided the exact surface area, year of construction, and volume of the demolished objects. The combination allows us to construct a material intensity database per square meter. For each demolition project, the identifiable materials and corresponding material mass were derived from the reported prospective demolition material flows. These flows were subdivided into two classes, based on circular demolition practice: materials related to the structure of the building (e.g., foundation, walls, roof), and materials related to building components (e.g., doors, ceilings, lamps, window frames). This separation is made because structural elements and components are processed differently during the demolition process.
In most cases the division into structural and component materials also makes intuitive sense. The exception is glass, which we report under structural materials but which could also be argued to be a component material. During the demolition process the aluminum window frames are "harvested" on a component level to retain their material, environmental, and financial value. Unfortunately, most of the glass is broken during the harvesting of these components. This prevents the reuse of glass on a component level, and the glass is therefore removed from the demolition site together with the structural building materials.
For the present material intensity database, we use the split between structural elements and components to identify where certain materials are coming from in the building, thus increasing the accuracy of the material intensity estimates.
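As a minimal sketch of how an estimated material flow and a GIS floor area could be combined into a material intensity, the Python snippet below divides each flow's mass by the floor area and keeps the structure/component label. The names `MaterialFlow` and `material_intensities`, the record layout, and all numbers are hypothetical illustrations; they are not taken from the database or from the processing actually used for it.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MaterialFlow:
    """One estimated demolition flow (hypothetical record layout)."""
    material: str    # e.g. "concrete"
    category: str    # "structure" or "component"
    mass_kg: float   # estimated mass of the flow


def material_intensities(flows, floor_area_m2):
    """Return {(category, material): kg per m2 of floor area}."""
    if floor_area_m2 <= 0:
        raise ValueError("floor area must be positive")
    totals = defaultdict(float)
    for flow in flows:
        # Sum flows that share the same category and material.
        totals[(flow.category, flow.material)] += flow.mass_kg
    return {key: mass / floor_area_m2 for key, mass in totals.items()}


# Illustrative numbers only; they are not values from the dataset.
flows = [
    MaterialFlow("concrete", "structure", 80_000.0),
    MaterialFlow("brick", "structure", 20_000.0),
    MaterialFlow("aluminum", "component", 500.0),
]
mi = material_intensities(flows, floor_area_m2=100.0)
```

Keeping the category in the dictionary key means structural and component intensities can be reported separately or summed, depending on the question at hand.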

On-site data collection
We analyzed the documentation of 61 large-scale demolition projects with 317 buildings of varying sizes. Larger residential buildings contain multiple dwellings (represented as addresses in the governmental GIS dataset). The database contains in total 781 units. These data were provided by a specialized "circular" demolition company that focuses on re-use and recycling of demolition waste. This process involved several back-and-forth interactions between the researchers and the company to make their data consistent and machine readable. The provided building material data are based on a company specialist's estimate, the original purpose of which is to prepare an invoice for demolition. This estimate is made using on-site measurements of the building to be demolished.
Processing of the projects into corresponding material flows was carried out in Excel. The buildings we used for our dataset ranged in size from 59 m² floor area up to 23,857 m² floor area. An overview of the dataset is given in Table 1.

Data processing
After processing the raw data obtained from the demolition sites, we extracted several building attributes from a government-provided GIS dataset that describes the built environment. These were used to further characterize the buildings found in the demolition projects, beyond the information collected by the demolition company itself. These included the year of construction, functional floor area, building type, and the number of dwellings per building, and were extracted from the BAG3D (Kadaster, 2018). Using this information, the material intensity in kg/m² for each building type was calculated. Categorization of buildings in our database was based on the building types described in the BAG3D. Some of the components were simplified to a single material as they contained a multitude of materials that were impossible to identify separately. In these cases, the material with the biggest fraction was chosen.
The materials considered for building structure and components can be found in Table 3.
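The two processing steps described in this section, simplifying a mixed component to its largest material fraction and averaging material intensity per building type, can be sketched as follows. This is an illustration under stated assumptions rather than the actual processing, which was carried out in Excel: the function names, the tuple layout, and all numbers are hypothetical, and averaging per-unit intensities (instead of dividing total mass by total floor area) is one possible choice that the text does not specify.

```python
from collections import defaultdict
from statistics import mean


def dominant_material(mass_by_material):
    """Pick the material with the largest mass share of a mixed component."""
    return max(mass_by_material, key=mass_by_material.get)


def mean_mi_by_type(units):
    """Average total MI (kg/m2) per building type.

    `units` is an iterable of (building_type, total_mass_kg, floor_area_m2).
    Assumption: each unit's MI is computed first, then averaged per type.
    """
    per_type = defaultdict(list)
    for building_type, mass_kg, area_m2 in units:
        per_type[building_type].append(mass_kg / area_m2)
    return {btype: mean(values) for btype, values in per_type.items()}


# Hypothetical radiator composition in kg; steel is the largest fraction.
radiator = {"steel": 18.0, "paint": 0.5, "plastic": 0.2}
label = dominant_material(radiator)

# Hypothetical units: (building type, total mass in kg, floor area in m2).
units = [
    ("office", 100_000.0, 100.0),
    ("office", 240_000.0, 200.0),
    ("apartment", 90_000.0, 60.0),
]
averages = mean_mi_by_type(units)
```

Simplifying to the dominant material slightly overstates that material and drops the minor ones, which is worth keeping in mind when small flows matter.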

RESULTS
First, we present the average material intensities of the Dutch building stock. Then we will explore the variability in our dataset. The full dataset including all raw data is available in Supporting Information S-1 and S-2, and the data used in the key result figures are given in Supporting Information S-3.
As we can see in Figure 2, the average material intensity ranges from 612 to 1,909 kg/m². Residential buildings can be found on both ends of the intensity scale, with utility buildings being relatively consistent around 1 metric ton/m². Of note is the disparity between apartment buildings and residential high rises: an apartment building has an almost 50% higher material intensity than a high-rise building. This is because the apartments and high rises in our dataset have a comparably sized concrete foundation, but a high rise contains more floor space, leading to a lower material intensity per m². Conversely, "residential - single house" units have a relatively high MI because the mass of the foundation and the roof is divided by a relatively small floor area.
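The floor-space argument above can be illustrated with a small worked example. The sketch assumes a simple stacked building in which the foundation and roof masses stay fixed while each storey adds the same mass and floor area. The function `total_mi` and every number in it are hypothetical and chosen only to show the direction of the effect, not to reproduce values from the dataset.

```python
def total_mi(foundation_kg, roof_kg, per_floor_kg, floor_area_m2, n_floors):
    """MI in kg per m2 of total floor area for a simple stacked building."""
    total_mass = foundation_kg + roof_kg + per_floor_kg * n_floors
    total_area = floor_area_m2 * n_floors
    return total_mass / total_area


# Hypothetical inputs: 200 t foundation, 40 t roof, 150 t and 500 m2 per storey.
low_rise = total_mi(200_000, 40_000, 150_000, 500, n_floors=4)
high_rise = total_mi(200_000, 40_000, 150_000, 500, n_floors=12)

# The fixed foundation and roof mass is spread over three times the floor
# area in the taller building, so its MI per m2 comes out lower.
assert high_rise < low_rise
```

In practice taller buildings may need heavier foundations, so the fixed-foundation assumption only holds where foundations are of comparable size, as reported for the buildings in this dataset.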
See Supporting Information S-3 for the average material intensity per building type and age cohort. Figure 3 shows the variability of the datapoints per building type in a boxplot, except for the building types with too few datapoints to construct a boxplot (n < 4). In those cases, we show only the average value. The non-office utility buildings have a particularly high variability. This is because their functionality varies widely, which makes their building design the least uniform of all building types. The very high datapoints can be explained by the fact that some buildings contain parking garages; if these garages are part of the building structure, the resulting concrete flow is also counted under the "building structure." The distribution of demolition projects per age cohort and building type is shown in Figure 4. This figure also includes, for comparison, a representation of the distribution of the overall building stock of the Netherlands. Most demolished residential buildings were built in the period 1945-1970, while most office buildings are from 1971 to 2000. Some relatively recent utility buildings have been demolished (newer than 2000), but no residential buildings. This reflects the fact that demand for residential buildings has increased dramatically in recent years, while office buildings were overbuilt, and are currently converted to residential buildings where possible (Meeste Oppervlakteleegstand Bij Kantoren En Winkels, 2019). If an empty utility or office building cannot be refurbished into residential units, it will be demolished and replaced.
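The rule used for Figure 3, a boxplot where enough datapoints exist and only the average otherwise, can be sketched as a small summary function. The function name `summarize`, the example values, and the choice of the "inclusive" quartile method are assumptions made for illustration. The quartile convention behind the published figure is not stated in the text.

```python
from statistics import mean, quantiles


def summarize(values, min_n=4):
    """Boxplot statistics for one building type, or only the mean if n < min_n."""
    values = sorted(values)
    if len(values) < min_n:
        # Too few datapoints for a meaningful boxplot: report the average only.
        return {"n": len(values), "mean": mean(values)}
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "n": len(values),
        "min": values[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": values[-1],
        "mean": mean(values),
    }


# Hypothetical MI values in kg/m2 for two building types.
well_sampled = summarize([600.0, 800.0, 1000.0, 1200.0, 1400.0])
sparse = summarize([900.0, 1100.0])
```

Returning the sample size alongside the statistics makes it easy to see which building types rest on very few observations.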

DISCUSSION
Compared to the existing literature, this paper adds both significantly more datapoints, and more detail to the different types of materials found in demolition streams. Reporting on materials such as gypsum, and differentiating between high-quality and low-quality wood, allows more detailed planning for matching circular demolition and construction material flows.

FIGURE 3 Distribution of material intensities per building type
FIGURE 4 Number of demolition projects per age cohort and building type in the present dataset. The gray line is added for comparison, and shows the age distribution of the overall Dutch building stock (BAG3D)
The number of datapoints in our MI database is one or two orders of magnitude higher than in the vast majority of previous MI datasets, joining just one other database with a similar scale (Yang et al., 2020). This increased richness of data is a step toward the applicability of big data methods such as data mining and machine learning for industrial ecology research. For instance, the data can be used with classification methods such as decision trees and random forests, and with clustering approaches, to explore their inherent characteristics. These in turn can be used to study similarities and idiosyncrasies, or to test commonly accepted building classifications based on characteristics such as use types, periods of construction, and location. Combined with such methods, the high number of datapoints can also improve the assessment of archetypal MI ranges, which are currently typically based on simple averages of very few observations. The full dataset is provided in Supporting Information S-1, S-2, and S-3, so that researchers can use it as necessary.
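To give a sense of what a clustering approach might look like on MI data, the sketch below runs a plain k-means (Lloyd's algorithm) on two-dimensional points, here imagined as concrete and brick intensities per building unit. No such analysis was performed in this work. The function `k_means`, the feature choice, the starting centers, and all numbers are hypothetical.

```python
from math import dist


def k_means(points, centers, n_iter=20):
    """Plain Lloyd's algorithm; returns (centers, labels)."""
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(len(centers)), key=lambda k: dist(p, centers[k]))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for k, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centers.append(old)  # keep an empty cluster's center in place
        if new_centers == centers:
            break  # converged: centers no longer move
        centers = new_centers
    return centers, labels


# Hypothetical (concrete, brick) intensities in kg/m2 for four units.
points = [(900.0, 50.0), (950.0, 70.0), (400.0, 300.0), (450.0, 320.0)]
centers, labels = k_means(points, centers=[points[0], points[2]])
```

With real data, features on different scales would normally be standardized first, and the resulting clusters could then be compared against the building types from the BAG3D to test how well the accepted classification matches the material composition.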
Overall, we find material intensity per square meter to be in line with the findings of other studies (see Table 4), which increases our confidence in the accuracy of the detailed classification into separate materials. Several materials stand out. Regarding wood, the Dutch situation for residential buildings is similar to that of neighboring countries, but comes in at the higher end of the global spectrum. For concrete, our intensities are similar to those reported elsewhere, with the exception of Sweden and Norway. For bricks, our data are similar to Swiss data, and within range of the data reported for Germany.
Crucially, while we do not use this information in our analysis, the full dataset also differentiates between building age cohorts. The current dataset does not contain enough data to report a full analysis of the impact of building age. For example, we find a slight downward trend in overall material intensity (metric tons/m²) for office buildings, but it is not statistically significant. For some building categories, for example apartment buildings, we have quite a few datapoints, but all of these are in the same age cohort. Material composition of buildings is continuously changing. In the Dutch context, buildings newer than those contained in this dataset will typically contain more concrete and less brick. However, the latest building trend is toward reducing concrete and increasingly using biobased materials such as cross-laminated timber (Brabantse Corporaties Slaan Handen Ineen Voor Méér Houtbouw, 2020). By providing the raw data we hope to support other researchers who aim to understand the relationship between building age and circular economy potential.
Another observation follows from comparing the age cohorts of the demolition dataset with the age distribution of the overall Dutch building stock (see Figure 4). The latter clearly shows the post-war lack of new construction, while the former illustrates that post-war residential units were generally of lower quality and are therefore demolished at much higher rates than pre-war or modern residential units. There is also a large surplus of office buildings in the Netherlands. These are generally either being refurbished into housing or demolished and replaced with new housing (Meeste Oppervlakteleegstand Bij Kantoren En Winkels, 2019). This explains the relative abundance of office buildings in our dataset.
While the reported data was empirically sourced from a demolition company that focuses explicitly on re-use and recycling, the type of demolition projects that the company accepts is similar to regular demolition projects. However, the data does skew toward relatively large projects (e.g., office buildings, residential buildings, rather than single houses).
Other aspects that could skew the data are the choice for demolition over refurbishment, and project location. The reported data only include demolished objects, while the preferred option from an environmental and material re-use point of view is to refurbish a building. In fact, during this project we also obtained significant amounts of refurbishment data, but unfortunately their quality was insufficient and they were consequently left out of the present work. The choice for either refurbishment or demolition is based on a variety of aesthetic, economic, or regulatory aspects. We do not expect that reporting only demolition data skews the average material intensity of the building stock, other than that certain age cohorts are overrepresented because newer buildings tend to be refurbished while older buildings are more likely to be demolished (see also Figure 4). With regard to location, the exact location of demolition projects is withheld for privacy reasons. Because the dataset is fairly evenly distributed across the Netherlands, we do not expect a geographical impact on the reported average material intensities.
In conclusion, combining real-world demolition data, obtained from and processed in close cooperation with a demolition company, with GIS data is a novel data acquisition method compared to existing work. This approach yields material intensities for the building stock and, we believe, improves upon the accuracy of these data.

Limitations
Some smaller material flows are absent from the database, as they are removed from the building site with the general construction and demolition waste (C&DW) flow and are therefore not well documented. Notably, insulation materials are not well represented, and while copper is reported in the components part of the database, this does not fully cover all the applications of this metal in a building.
Furthermore, the data reported here are based on expert estimates made before the buildings are demolished. While these estimates have a high degree of accuracy, it would still be an improvement if actual weighed material flows could be reported. This was not possible because different materials are collected by different sub-contractors at different times. It proved infeasible from an organizational point of view to collect those data in a consistent manner.

Recommendations for future research
In a follow-up paper, we plan to apply this database to a dynamic model of several Dutch cities. Together with several other similarly sized datasets, first steps could be taken toward applying big data methods in analyzing stocks and flows of the built environment. Another logical step would be to add environmental impacts associated with the materials, as was done by Resch et al. (2020).
Beyond using the database in modeling exercises, we identify several opportunities for improving the data. The component section could be further subdivided into an "inside" and "outside" subset. Currently, components that are found outside the building (e.g., light fixtures on the parking lot) are also included. Because the relation between building size and number of components found in the area around the building is highly irregular (especially for utility buildings), future extrapolations could be made more accurate by modeling the area surrounding a building separately. The current dataset does not contain enough samples to further subdivide into age cohorts, which would be desirable. Finally, basements and foundations are a notable source of uncertainty, especially with respect to buildings with significant underground car parking.