Address correspondence to: Chris Davis Jaffalaan 5 2628 BX Delft The Netherlands email@example.com
Industrial ecology (IE) is an ambitious field of study where we seek to understand systems using a wide perspective ranging from the scale of molecules to that of the planet. Achieving such a holistic view is challenging and requires collecting, processing, curating, and sharing immense amounts of data and knowledge.
We are not capable of fully achieving this due to the current state of tools used in IE and current community practices. Although we deal with a vastly interconnected world, we are not so good at efficiently interconnecting what we learn about it. This is not a problem unique to IE, and other fields have begun to use tools supported by the World Wide Web to meet these challenges.
We discuss these sets of tools and illustrate how community-driven data collection, processing, curation, and sharing is allowing people to achieve more than ever before. In particular, we discuss standards that have been created to allow for interlinking of data dispersed across multiple Web sites. This is currently visible in the Linking Open Data initiative, which, among others, contains interlinked datasets from the U.S. and U.K. governments, biology databases, and Wikipedia. Since the types of technologies and standards involved are outside the normal scope of work of many industrial ecologists, we attempt to explain their relevance, implications, and benefits through a discussion of many real examples currently on the Web.
From these, we distill several best practices that can enable the IE community to meet its ambitions more efficiently and effectively—an agenda for Industrial Ecology 2.0.
Industrial ecology (IE) has been defined as “the study of all interactions between industrial systems and the environment” (Graedel 1994, 23) and “the science of sustainability” (Ehrenfeld 2004, 1). It is about “things connected to other things,” a “systems-based, multidisciplinary discourse that seeks to understand emergent behavior of complex integrated human/natural systems” (Allenby 2006, 33) that can be thought to be composed of interacting technical and social networks (Dijkema and Basson 2009, 159) embedded in the biosphere. By using this holistic systems view, we hope not only to understand but also to shape the linkages between the economy, social concerns, and environment, in order to guide the world toward sustainability.
Clearly, IE is an ambitious field of study, and it is not uncommon to study systems ranging from the scale of molecules to the ultimate system boundary of “Spaceship Earth” (Fuller 1969). In these efforts, technological networks have been the subject of many analyses and quantitative tools (cf. life cycle assessment [LCA], material flow analysis [MFA], substance flow analysis [SFA], input-output analysis [IOA], etc.) to analyze “industrial metabolism” (Suh 2004). At the same time, social networks have been the subject of many investigations into “social metabolism” (Fischer-Kowalski 1998; Fischer-Kowalski and Haberl 2007; Krausmann et al. 2008).
The study of both types of networks alone already represents formidable challenges. Data may be unavailable, inaccessible, incomplete, incompatible, or unreliable. To integrate data and knowledge to enable the analysis of socio-technical systems is even more difficult. Even when data on technical systems and knowledge on social networks does exist, there may not be a one-to-one mapping between concepts developed in different domains to allow their use in an integrated IE approach, as different scientific and engineering communities will each have their own vocabulary, perspectives, theories, methods, and tools.
Enabling collaboration is key to overcoming these barriers to advance and accelerate IE, since we are dealing with systems that cannot be centrally conceptualized (Allenby 2007) or understood by a single mind. Integrating information about these systems is difficult due to their complexity. Here, complexity should be understood as “the property of a real world system that is manifest in the inability of any one formalism being adequate to capture all its properties” (Mikulecky 2001, 344). For example, people can have different ways of describing the same things. Furthermore, information and knowledge relevant for analysis and development of these systems is often dispersed among different communities such as engineers, economists, and environmental and social scientists. Each of these communities in turn has its own vocabulary, perspectives, theories, and tools, which can hinder sharing between them, even though they may deal with related aspects.
This also comes at a time when several members of the Industrial Ecology community have called for IE to incorporate insights from complex systems theory (Allenby 2006; Ehrenfeld 2007; Dijkema and Basson 2009). By no means does this reduce the information needs, but rather it emphasizes the diverse nature of information that we need to integrate and process collectively.
Thus, we need to get better at managing and using data across disciplines and communities. In other words, we need to increase the effectiveness by which individual learning contributes to the collective IE body of knowledge. Just as our socio-technical systems have emerged as a result of the collective actions of millions, with useful parts being reused in ways unanticipated by the original contributors, in IE we should actively facilitate a similar type of evolution of the collection and reuse of data, information, and knowledge. Much progress, for example, can be made using structured data currently existing in many forms, ranging from spreadsheets to databases, although opportunities also exist for better managing unstructured data of a more narrative descriptive nature.
This calls for the use of state-of-the-art information and communication technology (ICT) to foster such collaboration and enable reuse, curation, and expansion of datasets and knowledge, which ultimately would prevent researchers from having to rediscover information already known. This would relieve them of tedious tasks that are better left to ICT and allow researchers to spend their time doing what they are best at: the intellectually challenging task of interpreting information and using their critical thinking skills to find relevant patterns.
However, as will be further argued in this article, IE is currently lagging behind other fields with regard to the sophistication with which it uses ICT to foster collaboration and enable the study of complex systems.
Already the Web has proven itself as an enabler for collective action, whether for the building of encyclopedias (Giles 2005) or coordination of political protests (Musgrove 2009), which raises the question “how can the IE community use the Web to its full potential in order to facilitate its research?” By this, we mean using the Web to build a collective body of knowledge more efficiently, creating feedback loops so that information once gathered can be reused more efficiently, and ultimately enabling the community to flourish.
Within this article, we will explore the implications of using existing tools available on the Internet today, particularly those that are not actively in use by IE researchers, who may not be aware of their potential or even their existence. What will be discussed goes beyond having a simple Web page with contact information, downloading an executable program to run some model exercises, or putting papers on a preprint server.
The first section of this article addresses the ways in which we handle information and introduces a number of key problems. Subsequently, we discuss how the Web already is being used to change how we collaborate and organize scientific information and knowledge, especially in ways relevant to IE. This will examine the trends occurring on the Web from past to current examples, and will highlight its implications. Here we will address tools and principles that can make the research process more efficient than it currently is.
Finally, after this exploration of existing tools and trends, we will build on the insights obtained and identify best practices that can lead us toward a Web-enabled IE that we have dubbed IE 2.0. What is proposed is not a silver bullet, but rather requires an iterative process and community dialogue where we identify opportunities that can be based on the tools discussed.
Our Relationship with Information
IE research can often be very data intensive due to the nature of the systems that we study. In trying to gather information about a particular system, we may find that some of it may be difficult or impossible to gather, of questionable accuracy, or of such a large amount that we do not know how to effectively navigate it. To begin to overcome these limitations, we need to not just reflect on what we are trying to achieve, but also employ a meta perspective on how we currently try to achieve our goals.
Consider, for example, an LCA study on electric cars. A researcher conducting the study may have access to proprietary data provided to her by various stakeholders or may use a commercial life cycle inventory (LCI) database. In both cases, she may still find that data about certain materials and associated processes are missing. She may then perform a literature review to see if other studies have been published containing process data of interest. Once this information is found, however, it is not always clear what the quality of the data is, and there is no easy way to know whether the data have already been superseded or if others have disagreements with certain aspects of them. Once she has completed a satisfactory data set, the researcher will then manually link together processes based on chains that are relevant.
Reflecting on this process reveals that it is tedious and to some extent subjective; more often than not it involves time-consuming expert consultation and discussions, and obtaining and processing feedback from multiple stakeholders. Along the way, not only the functional unit, but also the system boundary and allocation rules must be decided upon.
Adopting a meta perspective (i.e., what this means for IE research and advancement of the IE body of knowledge) reveals that without proper data and knowledge management the entire process cannot be replicated. If the specifics of the LCA procedure are not published or otherwise made publicly available, then other researchers interested in LCAs of electric cars will have to repeat the same procedure for the exact same set of data. Even if the underlying data were published and the LCA setup underwent a peer review process, if someone subsequently publishes more accurate material or process statistics, there is no easy way to trace how these improvements would trickle through to our researchers’ LCA results, let alone the conclusions based on them. It may thus be seen that data gathering and management are inefficient, while a large part of the information and knowledge generated during the completion of the LCA study on electric cars is under threat of fading away.
We face further problems related to the quality of data used, as illustrated by a recent study by Clarens and colleagues (2010) regarding an LCA of biofuel production from algae. This study came under fire for drawing conclusions from data that were ten years old. It was stated that although the “research was conducted in a sound fashion, it was extremely outdated” (Bhanoo 2010). This raises concerns about our work being out of date even before it is published (Hamilton 2010), even when the work uses IE methods and tools as intended and accredited by a recognized center of academic expertise (e.g., for LCA, Institute of Environmental Studies [CML]; for MFA, Trondheim; for SFA, Yale) and makes use of data supplied by a commercial enterprise such as EcoInvent.
We also face challenges due to the enormous complexity and diversity of human socio-technical systems. For example, it has been calculated that a typical Wal-Mart store has 100,000 different items in stock, with estimates placing the number of distinct products worldwide at around 10 billion (Beinhocker 2006, 8–9, 456–457). Clearly, this is a few orders of magnitude above our largest databases. While we desire a holistic view, we also need to consider “the real challenge posed by the systems idea: its message is not that in order to be rational we need to be omniscient but, rather, that we must learn to deal critically with the fact that we never are” (Ulrich 1988, 342; Ryan 2008, 9). While we cannot be omniscient, we can take concrete steps to improve the current situation.
Given these difficulties, we need to be using tools that enable our ambitions. In IE it is not only what we are doing that is new, but also how we are accomplishing our goals. Just as the idea of multidisciplinary work is still relatively new, the tools for fostering this type of collaboration are new and are still being explored. While some initiatives are emerging in IE (such as in the input-output and LCI communities1), truly collaborative online communities are still lacking. This leads to a problem where the field is not progressing as fast as it could and should to help build sustainable societies. For researchers, this is a concern because it has been noted that the tools we develop need to be eco-efficient (Schaltegger 1997), in that the benefit we receive from them needs to be greater than the effort put into creating and maintaining them. If we spend so much time rebuilding existing knowledge, then we are hardly helping ourselves. Beyond IE, however, progress has been made through various methods of Internet-enabled collaboration. While in several academic fields researchers are starting to realize benefits from using online tools, concerns have been raised that IE lacks a major online presence and exists primarily as an offline community (Hertwich 2007). This has deep implications beyond publicizing the field and enabling better discussions between researchers through tools such as Web sites and discussion forums.
The Web and IE
The idea that the progress of science can be enhanced through information techniques is hardly new, and over 60 years ago, Vannevar Bush (1945) expressed concerns that although we can enormously extend the record of scientific knowledge, it is much more difficult to make use of it. One of his criticisms related to traditional library indexing schemes as commonly exemplified by the Dewey Decimal System. While these schemes have served an important role in organizing libraries, they are fundamentally limited in that they provide a hierarchical organization scheme for all of humanity's knowledge. Every piece of information must fit within precisely one location. This reflects a reductionist view of science, and is not a scalable solution for multidisciplinary sciences, since it artificially separates topics that people are bringing closer together through their research.
Bush argued that hierarchical indexing schemes are unnatural since they do not mirror the way the brain naturally works. He said that the brain stores knowledge by “associative trails” (6), which can be thought of as a path where facts are connected to other associated facts. This is analogous to how a conversation may drift from topic to topic without an abrupt change, even though the starting and ending topics can be completely different.
He envisioned that researchers would be able to document “a trail of […] interest through the maze of materials” (7) as they connected related facts spread across multiple resources. He took this idea a step further and proposed that users would be able to share and interweave these trails with others, resulting in a group collectively creating a road map of a large body of knowledge. He considered this idea to be quite powerful and went as far as to say “[w]holly new forms of encyclopedias will appear, ready made with a mesh of associative trails running through them” (8). One only has to spend time on Wikipedia to notice that this has already happened. Bush's biomimetic reflections on the human mind and information management techniques have already been demonstrated to be viable in helping to organize large amounts of information.
An important aspect of Bush's idea is the ability to leverage the strengths of both humans and computers. He noted that we cannot “equal the speed and flexibility with which the mind follows an associative trail, but it should be possible to beat the mind decisively in regard to the permanence and clarity of the items resurrected from storage” (6). This issue of interaction between humans and computers will be a fundamental aspect of the discussion in the rest of this article.
Returning to the example of an LCA researcher, in her study of electric cars, she may also be interested in including the effects of vehicle-to-grid applications, where the car is used to both store and supply power to the electrical grid (Kempton and Tomić 2005). In her research, she may collaborate with different domain experts in electric car design, electric grid operation, and urban planning, who may not be aware of the interlinkages that they will necessarily have with one another as a result of this technology being realized on a large scale. The result of her research is then not just an LCA, but a set of associative trails documenting an ongoing conversation that may shape the future evolution of vehicle-to-grid systems.
The next section will discuss many concepts and tools that are being used to enable ideas about how the scientific record can be consulted more efficiently. As will be shown, the requirements and guidelines discussed above are inherent in these and part of their design.
The State of IE on the Web
While concerns were raised above about IE being an offline community, there has been a range of discussions about and working examples of tools that leverage IE through the Web. For example, for over a decade the Web site eiolca.net (Matthews and Small 2000) has provided an online interface for economic input-output LCA, where one can calculate the material and energy resources required for a commodity and its related supply chain. Various databases have been compiled, such as for the Stocks and Flows (STAF) Project (Lifset et al. 2002) conducted by Yale, and data on material flows are available online through the work of the Sustainable Europe Research Institute (2010).
Furthermore, since Hertwich (2007) first voiced these concerns, many steps have been taken, and IE research can be found through various blogs,2 Facebook,3,4 and even on Twitter.5
A notable recent example is the Web site GoodGuide.com, founded by Dara O’Rourke.6 This site provides ratings so that consumers can get an overview of the health, environmental, and societal impacts of the products they buy. When you select a specific product, you can quickly get a view ranging from the level of hazardous or questionable ingredients all the way up to various practices of the manufacturer. GoodGuide also has an application for mobile phones, which allows consumers to scan bar codes using the phone's camera and automatically bring up information about the product.
Further innovative uses abound, such as AMEE.com, which provides an application programming interface (API) where people can develop Web applications to monitor carbon footprints and energy consumption (AMEE 2010). One of these is realtimecarbon.org, which takes information on the current level of electricity production in the United Kingdom and calculates the resulting real-time CO2 emissions (Butcher 2010). The role of open source is being explored with the openLCA software (Ciroth 2007), and with Sourcemap.org, a community effort started by the Massachusetts Institute of Technology Media Lab to compile information about supply chains for products and their carbon intensity (Nicoll 2009).
These examples are encouraging, and show that the IE community and others are exploring different options available through the Web. However, going back to the original example of an LCA researcher, there are certain needs that are not yet met due to several problems. For example, some of these sites may make data available, but it is often essentially one-way communication, where it is either impossible or difficult for the community to discuss, curate, and improve the data. When sites do allow people to contribute and curate data, such as Sourcemap.org, they are often too limited to be of full use for researchers. For example, the problem with Sourcemap.org is that it only accounts for CO2 emissions and does not contain citations for data sources. A similar issue exists for GoodGuide.com, which has a consumer focus. While providing a large amount of detail on products, this information is presented at an aggregate level that does not give full insight into how the ratings are calculated. Finally, we can find no substantive examples of sites that are facilitating communication among a community of researchers. In the next section we will discuss emerging trends that hold promise in ameliorating these problems.
Toward a Web of Data and Knowledge
One of the ways that researchers are facilitating collaboration on the Web is through wikis. In our own experience using one as part of our daily work7 (Nikolic and Davis Forthcoming), we have found that it serves as a central focal and collection point for the work of the group. In doing so, it prevents relevant information from growing stale, getting lost within e-mail inboxes, or being recreated needlessly. Also, just as science is dependent on peer review for quality, the same applies for wikis. For every edit, there is a record of “who did what when,” and there are easy ways to monitor changes, revert to previous versions if necessary, and hold a community discussion about topics. Since we know “who did what,” people who perform responsible edits build reputation and trust, just as they do in offline communities.
Many notable examples of wikis facilitating data collection exist within the biological sciences (Waldrop 2008). While we study industrial metabolism, biology researchers are already using wikis to collectively map cellular metabolisms (Pico et al. 2008). This is an example of researchers creating their own wiki, and there are others where researchers have decided to use Wikipedia as a collaborative platform. For example, the RNA WikiProject8 started when researchers noticed that the top Google results for RNA were on Wikipedia. Instead of seeing the quality of the pages as a limitation, they saw it as an opportunity and decided to take ownership of them, using their own database to help populate and improve the quality of the pages. They also use further community contributions to these articles as a means to then keep their own database up to date (Daub et al. 2008).
This is a paradigm many people may be uncomfortable with at first, particularly due to concerns about how the quality of data can be maintained when contributions are allowed by people outside the trusted community. However, we should also consider that Wikipedia originates from a failed project, Nupedia,9 which mandated a seven-step editorial process for publishing articles. After three years, only 24 articles were completed and 74 were still under review. By letting go of up-front quality control and instead allowing for continual peer review on a global scale, Wikipedia within four years grew into a resource able to match Encyclopaedia Britannica on the quality of its science articles (Giles 2005). The key point here is that while having properly verified information is indeed of vital importance, experts should do what they can to avoid the situation where they themselves become the bottleneck to the flow of information, due to their own limited amounts of time and the amount of information they can feasibly process. It can be better to accept incomplete and possibly incorrect information, which can then be peer reviewed and cleaned up by the community, than to not have it at all.
While wikis have clear benefits and provide a platform whereby a community can continually contribute and update data, they are limited in a number of ways. In the next section we will discuss several emerging trends and tools that go beyond the basic functionality that wikis provide. The point is that very significant changes are occurring with regard to how we can manage information, and we need to evaluate the opportunities these can provide with regard to our research goals.
Leveraging Human- and Machine-Readable Formats
The World Wide Web began in a human readable format. In other words, it was composed of Web pages written by people that contained links to pages written by other people. This represented such an advantage that within several years the Web exploded. However, accessing information over the Internet is still relatively slow since it is constrained by the speed at which a person can read. While search engines have sped up the process of locating information, these still only serve us information one page at a time. While humans excel at understanding the context of information, computers are much better at handling large amounts of information and performing actions such as complicated queries and data mining. A large opportunity exists, therefore, for creating applications that better leverage the combination of human and computer strengths. To enable this, Web pages and information should also be machine readable and processable where possible (Antoniou and Harmelen 2008, 3); in other words, software can decode it, extract meaning, perform queries, and essentially do something useful with the data.
Presently, advanced methods are deployed to combine “machine readability” and “human readability,” notably through the development and application of novel (open) standards. The World Wide Web Consortium (W3C), which introduced the standards behind the Web, is actively developing standards to enable machine-readable formats to be dispersed across the Internet, just as HTML is a ubiquitous standard today for Web pages (Herman 2009). One of the most important standards developed is the Resource Description Framework (RDF), which is built upon the Extensible Markup Language (XML) standard. XML is already commonly used for various purposes, such as the EcoSpold data format for LCI data (EcoInvent Centre 2009).
One of the limitations of using XML for data exchange is that it does not have a way of representing the semantics, or meaning, of data (Antoniou and Harmelen 2008, 65). The RDF standard was created precisely to alleviate this problem. When using RDF, one represents data as a series of triples of the form “subject predicate object” (as in “apple hasColor green”). While such triples can be laid out in a simple table, the subject, predicate, and object can each be represented by a unique identifier in the form of a uniform resource locator (URL), or in other words, a Web address.
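To make the triple model concrete, the following sketch represents a handful of triples as plain Python tuples and answers a simple question over them. This is illustrative only: real RDF tools use full URLs as identifiers rather than the short names assumed here.

```python
# A minimal sketch of the RDF triple model using plain Python tuples.
# In real RDF, each of these names would be a URL.
triples = [
    ("apple", "hasColor", "green"),
    ("apple", "isA", "fruit"),
    ("grass", "hasColor", "green"),
]

def match(subject=None, predicate=None, obj=None):
    """Return every triple matching the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]

# "What is green?" -- find every subject linked to 'green' via 'hasColor'.
green_things = [s for (s, _, _) in match(predicate="hasColor", obj="green")]
print(green_things)  # ['apple', 'grass']
```

Even this toy version shows the essential idea: because every statement has the same three-part shape, generic pattern matching works over any collection of facts, whatever domain they describe.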
To illustrate what this allows, we return briefly to the example of wikis. An interesting aspect of wikis is that they can allow for both unstructured and structured information to coexist on the same page. In other words, as shown in Figure 1, a page may contain a narrative description of a particular wind farm, and then be accompanied by an infobox on the right side describing standard properties such as its location, number of turbines, and maximum capacity. Since wind farms are power-generating facilities, they typically use an infobox that lists properties that are generic to all types of power stations.10 Given this structure, we can already create machine-readable sentences in RDF expressing concepts such as “Tararua Wind Farm” “Owner” “Trustpower” and “Tararua Wind Farm” “Maximum Capacity” “160 MW”, where items such as “Tararua Wind Farm” and “Trustpower” are identified by a URL that corresponds to their wiki page.
Capitalizing on information that has been structured this way on the Web, we can perform queries that search multiple pages, for example, to provide us with a list of the maximum capacity of all power plants. Where things get interesting is that information in infoboxes often points to pages that contain infoboxes themselves. The simple act of connecting relevant pieces of information together on a wiki can result in a large network of structured information, containing paths that can be followed, which are essentially associative trails. Since URLs used by RDF can be any address on the Web, it allows one to connect together data stored on multiple sites across the Internet. In effect, this allows the creation of a World Wide Web of data.
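The chaining of infobox facts described above can be sketched in a few lines of Python. The Tararua facts come from the infobox example in the text; the second wind farm and the country link are hypothetical, added to show a query spanning several “pages”.

```python
# Infobox facts expressed as triples. The Tararua entries come from the
# example in the text; "Example Wind Farm" and the country fact are made up.
facts = [
    ("Tararua Wind Farm", "Owner", "Trustpower"),
    ("Tararua Wind Farm", "Maximum Capacity", "160 MW"),
    ("Trustpower", "Country", "New Zealand"),
    ("Example Wind Farm", "Owner", "Example Energy Co"),
    ("Example Wind Farm", "Maximum Capacity", "90 MW"),
]

def objects_of(subject, predicate):
    """Follow one link: all objects reachable from subject via predicate."""
    return [o for (s, p, o) in facts if s == subject and p == predicate]

# A one-step query: list the maximum capacity of every plant.
capacities = {s: o for (s, p, o) in facts if p == "Maximum Capacity"}
print(capacities)

# A two-step "associative trail": wind farm -> owner -> owner's country.
for owner in objects_of("Tararua Wind Farm", "Owner"):
    for country in objects_of(owner, "Country"):
        print(owner, "is based in", country)  # prints: Trustpower is based in New Zealand
```

The two-step loop is the point: because the object of one triple (“Trustpower”) is the subject of another, queries can hop from page to page, which is exactly what makes a web of infoboxes more than a collection of isolated tables.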
The RDF standard dates back to 1999 and has existed in its current form since 2004 (Klyne et al. 2004). The ability to query Wikipedia like a database started in early 2007 with the DBpedia project (Bizer et al. 2007), which will be described in more detail below. What has changed, though, is that a critical mass seems to be forming, in a similar way to what happened when the Web first took off. This is happening under the vision of what Berners-Lee, the creator of the Web, and colleagues (2001) have termed the Semantic Web. One of the key ideas is that information across the Web can be given semantic annotations to aid machine processing of its meaning, enabling us to better leverage the strengths of both humans and computers in understanding and managing information.
Returning to the example of the LCA researcher, she may be compiling a list of all the companies that make batteries for electric cars, as well as electric car manufacturers. She realizes that she needs to survey the companies involved in order to understand the economic situations in different areas and to inventory the different types of technologies used. She starts out by compiling the list by hand in an MS Word file where she adds information related to their economic performance, technologies employed, and contact information.
This list quickly grows to many pages long, and she finds that retrieving and organizing information in there is increasingly difficult. Because information is stored as unstructured text, she can only search for specific phrases, and more sophisticated analysis such as finding relationships between things is essentially impossible.
After reading a paper about the use of wikis in scientific research, she next decides to start moving her data into a wiki, where she creates a page for each of the companies and reuses an existing company description infobox11 to structure information about them. Besides just compiling information about individual companies, she starts to make links between those pages to document the supply chains. Without attempting to do so, she has created not just a list, but a network of interrelated concepts representing the companies, their supply chains, and the technologies.
Entering information in a structured way on a wiki is just the start of a number of interesting opportunities. While the wiki has helped her to organize information, she can still only read one page at a time. For examining vast amounts of information, another step is required to allow for machines to do more sophisticated processing of it.
DBpedia.org is a project that has taken information from Wikipedia12 and converted it into a form available using Semantic Web technologies, such as RDF. For every page on the English Wikipedia, you can find a corresponding version on DBpedia. This means that industrial ecologists with Wikipedia articles, such as Brad Allenby13 and Roland Clift,14 already have entries for themselves on the Semantic Web.
One of the powerful standards developed for the Semantic Web is the graph query language SPARQL (Prud’hommeaux and Seaborne 2008). This allows for data across the Semantic Web to be queried in a similar way to how SQL is used as a query language for relational databases (Antoniou and Harmelen 2008, 106). The implication of SPARQL and the web of infoboxes described above is that we now have a network of facts where we can build queries based on chaining together pathways of relationships we wish to follow.
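To make the idea of chaining relationships concrete, the toy sketch below mimics a SPARQL basic graph pattern in plain Python. The companies, predicates, and facts are invented for illustration; a real query would be written in SPARQL and sent to an endpoint such as DBpedia's.

```python
# Facts stored as (subject, predicate, object) triples, the RDF data
# model. All names below are hypothetical.
triples = [
    ("CarCo",     "buysFrom", "CellWorks"),
    ("CellWorks", "type",     "BatterySupplier"),
    ("CellWorks", "basedIn",  "Japan"),
    ("VoltAuto",  "buysFrom", "AmpCells"),
    ("AmpCells",  "type",     "BatterySupplier"),
    ("AmpCells",  "basedIn",  "Germany"),
]

def match(s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard,
    like a variable in a SPARQL basic graph pattern."""
    return [(ts, tp, to) for (ts, tp, to) in triples
            if s in (None, ts) and p in (None, tp) and o in (None, to)]

# Equivalent in spirit to:
#   SELECT ?maker ?country WHERE {
#     ?maker    :buysFrom ?supplier .
#     ?supplier :basedIn  ?country . }
results = [(maker, country)
           for (maker, _, supplier) in match(p="buysFrom")
           for (_, _, country) in match(s=supplier, p="basedIn")]
print(results)  # [('CarCo', 'Japan'), ('VoltAuto', 'Germany')]
```

The chained list comprehension joins two patterns on the shared `supplier` variable, which is exactly how SPARQL composes pathways of relationships across a graph of facts.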
Returning to the example of the LCA researcher, she finds that she wants to get an overview of existing electric cars with physical properties such as their type of engine, weight, and physical dimensions. At the same time, she also wants to understand how these properties may be influenced by properties of the manufacturers. Knowing where the manufacturer is based may indicate the general market they serve. She is also interested in knowing when the company was founded, since newer companies may be more willing to experiment with novel designs. All of this information can now be retrieved with a single query to DBpedia. The query performing this can be seen at http://ie.tbm.tudelft.nl/ie/index.php/JIEICT. This is a fundamentally different way of searching for information. Berners-Lee (2009) has stated that while Google is good for finding out what other people already know, the Semantic Web is intended to help you answer questions that no one has asked before.
The motivation behind DBpedia is not just an exercise in extracting data from Wikipedia; it is a clear attempt to create a hub for the Semantic Web (Bizer et al. 2009). One of the benefits of Wikipedia is that since its aim is to create an encyclopedia, only one page will exist for a particular topic. This means that for many concepts that people may use in their own data, there likely exists a unique URL that corresponds to it. By using unique URLs, separate datasets can then be linked together. Furthermore, the Wikipedia community has already worked out issues where one term may be used in multiple ways, through automatic redirects, or through disambiguation pages.
Linking Open Data
The Semantic Web goes beyond just DBpedia, and in an effort to gain a critical mass for it, the W3C and a variety of businesses and academic researchers are involved in the “Linking Open Data” initiative (Heath 2009). An aspect of this involves groups making databases available online using Semantic Web standards such as RDF. Databases are interesting for this initiative for two reasons. First, they potentially contain massive amounts of information, and when interlinked with other sources, this may create enormous value for a community. Second, they are already machine readable, meaning that they can be relatively easily converted to a proper format for the Semantic Web, or may be used directly in combination with tools that present database content to end users in RDF format.
The state of the Semantic Web as of July 2009 is illustrated by Figure 2, which shows the interlinkages in the current cloud of projects that publish data using the RDF standard. As can be seen, large companies such as the British Broadcasting Corporation and Thomson Reuters (via the Open Calais project) are already putting content online based on Semantic Web standards. Other projects involve people taking publicly available data sources and republishing them as RDF and making them available for others to query through SPARQL end points. Notable examples of this include information gathered from U.S. Census Data, the CIA World Factbook, and Eurostat. Also notable is the GovTrack project,15 which aims to increase transparency in the U.S. government by linking together information about members of Congress. This site is especially useful for those who wish to investigate how funding of election campaigns has an influence on voting records. Not yet in this image are efforts by the governments of the United Kingdom and United States to bring their data into this network as well. The United Kingdom has recently launched data.gov.uk, which natively uses Semantic Web standards, and efforts are underway to take the U.S. government equivalent at data.gov and convert it to RDF16 (Ding et al. 2010). Nearly the entire bottom half of the cloud represents the Bio2RDF project17 (Belleau et al. 2008), which is an effort to take documents from publicly available bioinformatics databases and allow them to be accessed and interlinked using Semantic Web standards, which enables mapping of connections and relationships between genes, enzymes, proteins, and organisms.
Returning to the example of the LCA researcher, she next wants to find out what particular parts of legislation might be trying to facilitate the introduction of electric vehicles. She goes to the site GovTrack.us,18 where she is able to run a query finding all U.S. Congress bills on the subject “Hybrid, electric, and advanced technology vehicles,” along with whether each bill has passed and the date on which the vote was conducted. If she wishes to investigate the political dimensions behind a bill, she can also list who first introduced the bill and then find the districts they represent, which in turn may give an indication of the particular industries that have a vested interest in the passage of the bill. Using sources such as U.S. Securities and Exchange Commission (SEC) filings (also available via a SPARQL end point),19 she can query other social aspects by asking “given the board of directors for a certain car company, list all the other companies on whose board they also sit.” In doing this, she hopes to find the patterns of influence at work that may steer future decisions of the companies.
The next step in the evolution of wikis has emerged in the form of semantic wikis, which natively combine the functionality of Wikipedia and DBpedia. On the surface, these look exactly like normal wikis, but they differ because they allow for special annotations on the content of pages, which can be directly mapped to an RDF format in the same way as illustrated in Figure 1.
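As an illustration of how such annotations map to RDF, the sketch below extracts Semantic MediaWiki-style [[property::value]] markup from a snippet of invented wiki text and turns each annotation into a subject-predicate-object triple. The page name, properties, and values are all hypothetical.

```python
import re

# Hypothetical wiki page text with inline semantic annotations.
page_name = "CellWorks"
page_text = """CellWorks is a battery maker [[based in::Japan]].
It was [[founded in::2004]] and is a
[[supplier of::VoltAuto]]."""

# Each [[property::value]] annotation maps directly to one
# RDF-style triple: (page, property, value).
annotations = re.findall(r"\[\[(.+?)::(.+?)\]\]", page_text)
triples = [(page_name, prop, value) for prop, value in annotations]
print(triples)
# [('CellWorks', 'based in', 'Japan'),
#  ('CellWorks', 'founded in', '2004'),
#  ('CellWorks', 'supplier of', 'VoltAuto')]
```

The annotated text remains readable as ordinary prose on the wiki page, while the structured triples become available for machine processing, which is the essential property of a semantic wiki.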
A key advantage of semantic wikis is that they allow people without a background in information technology to take advantage of the trends mentioned in the examples above as both users and contributors. This is similar to how on the Web today people use various applications, without having to be concerned about the software and data structures that support them. Users may simply contribute plain text, semantically annotate sections of text, or enter information into premade forms. Since wikis can facilitate a collaborative process of continual improvement, information initially added as plain text may be gradually annotated and formalized by the community of users. Wiki pages are never “done,” but consecutive edits build upon the structure left by those who contributed before.
One of the limitations of traditional wikis is that they constrain users to viewing the contents of a single page at a time. This is not always ideal, particularly when one is interested in aggregating information that is spread across multiple pages. For example, one could create a wiki collecting LCI information, where every process has a dedicated page. These pages would contain information on the size and types of flows into and out of these processes, with documentation describing the literature sources behind this data. At the same time, it would be useful to have a page that could list all the processes that generate electricity. This could be created by manually searching through all the pages and then including links to all the relevant pages. The problem is that as new processes are added, this list will become out of date. Even worse, it will not be obvious that it is out of date until someone manually searches through all the pages again. If one also wanted to maintain other list pages, such as on “processes emitting cadmium” or “processes designated as best available technology,” the administrative burden becomes prohibitively high for a task that is tedious and not the best use of human creativity.
With semantic wikis, you do not create lists manually; rather, you query for them. In other words, you create lists by gathering structured information spread across many pages. For example, a query can be set up to find the population values that are listed on pages tagged as describing something that is a type of city. By using premade forms and templates, consistent formats for structured data can be placed on multiple pages. As a result, information for a specific process exists only on a single page, so that when a value is updated, or a new process generating electricity is added, this change is automatically reflected through every other page that queries this information in some way. A further value of this is that it allows many different views on the same set of data, which greatly enables consistency checking. Returning to the power plant query example from DBpedia, one can easily find that properties such as “maximum capacity” and “fuel type” are not specified in a consistent way. Far from exposing a weakness of wikis, this is a very powerful technique that allows us to efficiently pinpoint and then correct pages with problems that would previously have been prohibitively difficult to discover and fix.
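The idea of querying for lists rather than maintaining them can be sketched as follows. The process pages and their properties are hypothetical, and the ask() function stands in for an inline wiki query such as Semantic MediaWiki's {{#ask:}} mechanism.

```python
# Structured data as it might exist spread across wiki pages,
# keyed by page name. All pages and properties are invented.
pages = {
    "Coal plant":    {"generates": "electricity", "emits": ["CO2", "cadmium"]},
    "Wind turbine":  {"generates": "electricity", "emits": []},
    "Steel furnace": {"generates": "steel",       "emits": ["CO2"]},
}

def ask(**conditions):
    """Return page names whose structured data meets all the given
    conditions, analogous to an inline semantic wiki query."""
    return sorted(name for name, data in pages.items()
                  if all(data.get(k) == v for k, v in conditions.items()))

# The "processes generating electricity" list is computed on demand:
print(ask(generates="electricity"))  # ['Coal plant', 'Wind turbine']

# Adding a new page updates every query result automatically,
# so no list page ever goes stale.
pages["Solar array"] = {"generates": "electricity", "emits": []}
print(ask(generates="electricity"))
# ['Coal plant', 'Solar array', 'Wind turbine']
```

Because the list is a query result rather than a hand-edited page, the maintenance burden described for traditional wikis disappears entirely.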
Let us return yet again to the example of the LCA researcher. While collecting data, she would create a wiki page for every type of process, economic flow, and environmental flow. On each page for a process, data required for an LCA would be stored within a semantic form, ensuring that information was properly structured. She would also be able to include loose notes, which may be of value to other researchers wishing to know more detail of the process. Additionally, she may tag the wiki page with various keywords such as “bioelectricity” or “energy” to help other researchers later find this entry. Literature sources for the data would be indicated by including a link to a wiki page dedicated to that source. This would also contain structured information such as the author, journal, article title, and so on. This structured information would also be complemented by the inclusion of unstructured information, such as researcher notes on various aspects of the source. On this page, the researcher could also include links to other wiki pages indicating other sources that are related to the current one.
After editing the wiki, she is then interested in taking these data and using them within LCA software. This is accomplished by using a utility that essentially queries the wiki and extracts from it the types of structured information required by the software. This information would be exported into machine-readable RDF. Since RDF can be serialized as XML, a format that a number of LCA software packages can already process, this will not require large modifications to existing software.
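A minimal sketch of such an export is shown below. The lci namespace and property names are invented for illustration; a real export would use the vocabulary of the target LCA software.

```python
from xml.sax.saxutils import escape

# Structured data for one hypothetical process page on the wiki.
NS = "http://example.org/lci#"
process = {"name": "Wind turbine", "output": "electricity", "unit": "kWh"}

# Serialize the page into RDF/XML, one property element per field.
about = NS + process["name"].replace(" ", "_")
lines = [
    '<?xml version="1.0"?>',
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"',
    '         xmlns:lci="%s">' % NS,
    '  <rdf:Description rdf:about="%s">' % about,
]
for prop in ("output", "unit"):
    lines.append('    <lci:%s>%s</lci:%s>'
                 % (prop, escape(process[prop]), prop))
lines += ['  </rdf:Description>', '</rdf:RDF>']
rdf_xml = "\n".join(lines)
print(rdf_xml)
```

Any software that can parse XML can consume this output, which is why the RDF/XML serialization is a convenient bridge to existing LCA tools.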
To make sure that information on the wiki is properly curated, the researcher would set up a watch list for pages that she has edited or is otherwise interested in. This means that whenever someone edits those pages or one of their accompanying discussion pages, she will be alerted and will be able to see who has performed the edit. When she visits the page, she can then click on the history link, which will highlight and pinpoint the exact changes between the current page and the previous revision, saving her the trouble of having to read through the entire page every time someone makes an edit. She may find that another researcher is adding information to her page to connect to industry sector data for use in a hybrid LCA.
Although semantic wikis are relatively new technologies, they are starting to come of age. In December 2009, the U.S. Department of Energy (DOE) released their Open Energy Information site,20 which is actually a semantic wiki. Steven Chu, the current U.S. Secretary of Energy, has said “The true potential of this tool will grow with the public's participation—as they add new data and share their expertise—to ensure that all communities have access to the information they need to broadly deploy the clean energy resources of the future” (U.S. Department of Energy 2009). The current data are quite extensive, and already one can query pages describing over a thousand individual renewable energy facilities and then construct graphs of the growth of renewable energy capacity in the United States per fuel type (Davis 2010). This government wiki is open to the public now, and people can simply create a user account and start contributing.
Furthermore, there is already an example of a semantic wiki in use by industrial ecologists21 (Wouters and Feldmar 2010). Through our own work, we have set this up as a platform for IE master's students to document eco-industrial parks around the world. As of June 2010, it contains 227 parks and 115 literature references used to document them. To the best of our knowledge, it is the largest resource of its kind and was built within a period of only two months.
Conclusion for the Discussion of Existing Tools and Trends
While there are concerns about how effectively IE is using the Web to achieve its goals, interesting innovative examples do exist within the field, and undoubtedly further exploration is currently underway. However, we should also be aware of the many emerging trends that may substantially change the way we do research. While many of these innovations are largely driven by those with a background in information technology, people in other fields are starting to see their value and are devising means to facilitate wider usage within society.
Towards Industrial Ecology 2.0
The previous section presented a rather broad view of how the Web is offering new possibilities that can facilitate research. In trying to take advantage of these trends, it is helpful to identify the design aspects that have enabled the success of these projects. This is not just about technology, but it also relates to people and how their research can be better facilitated by it. As such, the IE community and its work itself can be viewed as a type of socio-technical system (Dijkema and Basson 2009), where the technical elements of this system are composed of its methods, tools, data, information, and knowledge, while the social elements are made up of the community and the ongoing conversations between its members.
To give due credit, a detailed discussion of the use of the Semantic Web within IE has already been introduced by Kraines and colleagues (2005, 2006), who present a very thorough primer on the tools being developed and what these eventually could turn into. However, in our opinion, their focus is at too high a level to serve as a starting point for the IE community. For example, they emphasize using the Semantic Web as a platform for sharing research models. This definitely should be done, but it involves the added complexity and time investment of developing a meta-language to develop a collective representation and understanding of the design and assumptions built into these models. Furthermore, many of the tools described are more suited for advanced users with good programming skills and prior experience in using Semantic Web technologies. To be fair, many of the examples described here took off only after the articles by Kraines and colleagues were published.
Our concern here relates to creating a critical mass of people involved (Shirky 2008). Just as with the World Wide Web, the more people get involved, the more interesting it gets and the more types of benefits can be realized (Gilder 1993). For people to engage, however, there needs to be a clear motivation for why they should invest the time to learn these new tools, and they should incur some immediate benefits. Furthermore, while the technologies behind the Semantic Web have existed for quite some time, it has not been exactly clear what they would lead to and how that would happen (Marshall and Shipman 2003). This is not necessarily a critique of the technologies involved, but rather a reflection of how technology development progresses. For example, vacuum tubes found an early use as amplifiers for radios before it was realized that they could also be used to create logic circuits for early computers (Arthur 2009). This could not necessarily be foreseen, and required experimentation and exploration for people to understand the potential. While we now see that promising technologies and examples exist, we need to examine several of the general best practices that we as a community of IE practitioners can implement or apply to get the most out of ourselves, our knowledge, and the Web.
Allowing for Self-Organization and Evolution
In moving forward, we need to be aware of opportunities that allow for better self-organization and sharing of knowledge. This process of self-organization in using the Web and these new technologies is a type of socio-technical evolution where we learn how to use the tools, and the tools shape the way that we work.
To initiate, drive, and shape this evolution, several requirements and guidelines have been identified by Nikolic (2009), which are applicable for using the Web for fostering collaborative research, data, information, and knowledge storage and sharing. First, there must be what can be called “local optimization,” or enabling optimization based on local needs. Each of the domain experts who have developed a set of skills and knowledge relevant for their jobs requires ICT to facilitate their work. While they may not possess a systems view, they do one thing, and do it well. Second, we should accept that in generating knowledge, there is no termination criterion. There is a process of “learning by doing,” and the state of knowledge is never perfect or complete. It is “good enough” until further insights are gained. Third, the evolution of knowledge requires that we not place barriers on the continued use of knowledge, as research will always lead to further questions. Fourth, we must also recognize path dependency: each line of scientific inquiry represents a chain building upon previous knowledge and technologies, which represents only a single pathway that is explored. If we have a “memory” of this pathway, then when we arrive at a dead end we can backtrack instead of rebuilding from scratch. Bush (1945) recognized this in talking about how machines can beat humans with regard to the “permanence and clarity of items resurrected from storage” (6). Information may not be recorded and will be forgotten, or may be stored in a form that is not easy to retrieve. A fifth requirement is that of “modularity.” While path dependence is needed to build structure, we also need the ability to branch from it when necessary in order to explore new pathways and lines of inquiry. If our knowledge consists of modular pieces, it allows us to rebuild it in novel ways and to attach new pieces.
Sixth, we need to accept that there is no single correct way of looking at the system, since there are various static, dynamic, and ontological ways of describing and measuring it (Allenby 2006). For example, an electric car exists within multiple subsystems related to transportation, the power system, and social systems, among others. Finally, this entire process must involve a shared effort because it is too large for a single person to accomplish alone. We need there to be an overlap in visions, where there is an ecosystem of contributors ranging from specialized domain experts to generalists with system perspectives who can interconnect and translate different points of view for consumption by others.
The evolutionary perspective presented above hints at a different way of working: instead of converting an overall vision of what is to be into a hierarchical, massive top-down program that is under threat of over-engineering, these items acknowledge that things will happen in a web and on the Web. Adopting a bottom-up philosophy, we recognize the self-organizing capabilities of this socio-technical ecosystem and leverage them by providing the niche players with appropriate and effective tools and connections.
The most appropriate starting point in our view is with sharing data online, essentially in the form of raw data, observations, and annotations of data. The IE community already collects an enormous amount of information about monetary and physical flows from the level of individual processes to industrial sectors, a core activity that may be leveraged through Semantic Web technologies. This is not just about putting data online, but also about how we do it, and what we do once it is there. This will be explained below with a mix of guiding principles and by identifying some promising examples.
Connecting Both Producers and Consumers of Data
IE is clearly a very tools-oriented scientific field. Especially with the trends mentioned in the previous section, it is quite likely that the amount of information we use will increase, and we will need to continue to develop ways to manage and navigate information in intelligent ways. Although a variety of sources, such as databases (Curran et al. 2006) and journal articles, already exist containing relevant information, it is difficult to navigate these, make relevant connections among them, and perform quality checks. A further problem was illustrated previously with the example of the LCA study on biofuel production from algae (Clarens et al. 2010), where the most up-to-date information was confidential. As will be discussed, opportunities exist to better utilize the information we already have, while also reconsidering how to approach the problem of necessary information that we do not have.
While we live in an age where access to information is unprecedented, the ability to make information hard to find can be a crucial factor for a competitive advantage. This duality has been described by Brand (1988, 202) who states:
Information wants to be free because it has become so cheap to distribute, copy, and recombine—too cheap to meter. It wants to be expensive because it can be immeasurably valuable to the recipient. That tension will not go away. It leads to endless wrenching debate about price, copyright, “intellectual property,” and the moral rightness of casual distribution, because each round of new devices makes the tension worse, not better.
If you cling blindly to the expensive part of the paradox, you miss all the action going on in the free part. The pressure of the paradox forces information to explore incessantly.
The point here is that while there are times when confidentiality of information is vital, we argue that the IE community can go quite far in exploring the free part. As we promote and invite interdisciplinary studies, we should not be actively promoting walled gardens of data, and there are very interesting trends happening in the free part from which we can benefit. For example, this free/expensive paradox will soon be explored for LCI data, with the introduction of the United Nations Environment Programme/Society of Environmental Toxicology and Chemistry (UNEP/SETAC) database registry (Ciroth 2009), which, as of this writing, is still under active development. This project is intended to create a single online hub where the suppliers of LCI data can be more effectively connected with the users of the data. This is a long-standing aspiration of the LCI and LCA community, where one of the challenges faced is that actually getting the data describing the life cycle of products is very difficult (Dijkema and Mayer 2001). Database providers will be able to control the level of access to their data, whether making it freely accessible or publishing only certain aspects to encourage people to buy the dataset. The process of connecting providers and users will be facilitated by a search engine that allows users to search over multiple datasets for the process information they require. One of the principles of this project is that “data quality is result of properties of a data set and what users require from the data set” (4), whereby no absolute metric of data quality is created and users are responsible for how they use the data (Ciroth 2009). This represents a type of local optimization through which people can effectively find what is “useful” for their own needs.
Facilitating Community Discussions and Discovery
A key element of the UNEP/SETAC database registry is the proposed support for creating a community dialogue. Registered users will be able to comment on data sets and flag information that they believe needs attention, and moderators will be in place to help facilitate various activities. This is a significant advancement and enabler for peer review and improving the quality of data. While having dedicated research centers to compile LCI databases has been of tremendous value, they simply cannot know everything due to the complexity, diversity, and dynamic nature of the systems we study. Truly enabling data quality requires constant peer review by a large diversity of researchers, and this has simply been unachievable at such a scale. In describing the process of open source software projects, Raymond (2001) has said that “given enough eyeballs, all bugs are shallow” (19), and we need to employ the same kinds of processes.
Although the ideas embodied in this project are innovative, they are not entirely new, and we hope that this project will provoke the development of further collaborative tools. Already in different scientific fields it is not unusual to find communities collectively managing and curating data (Anonymous 2008; Doerr 2008). For example, the WikiPathways project is facilitating the study of metabolic pathways through the use of wikis and visual annotation tools (Pico et al. 2008). These activities are only possible because of the open source, open standards, open access approach. These tools serve as a means to aggregate individual contributions into an emerging higher level view of their systems of interest.
These examples represent a process of community learning by doing. To help further facilitate this, we have provided a place where we are educating ourselves about the mechanisms, practicalities, and possibilities of the Semantic Web through the use of a semantic wiki. We invite the community to learn and contribute with us through this portal where we have posted how-to's, manuals, and examples, including all those presented in this article.22
Using Machine-Readable Open Formats
A further enabling factor is the use of machine-readable formats, such as RDF, where possible. In particular, tools such as semantic wikis can facilitate this process since they can support a range of unstructured and structured information, while also hiding the technical details from common users. At a basic level, users can enter information with very little effort required beyond that involved in using a text editor. However, if structured information is present, options exist to “liberate” it into a more structured form and to perform operations such as one would with a database. As shown in Figure 3, while not all information belongs in a database, not all information should be left in plain text either. The format in which data exist can lead to different trade-offs. Although databases excel at manipulating structured data, this generally comes at the cost of expressiveness, and the addition of new types of data requires that the schema be properly extended, if possible, to accommodate them. While plain text can be useful for effectively communicating abstract concepts, it limits the speed at which information can be processed and makes it more difficult to compare across multiple sources. Furthermore, having data in machine-readable open standards is a way to avoid issues of technical path dependence, since one is allowed to see the specifications behind them and use this information to write software converters between different file formats.
Utilizing Shared Vocabularies
The IE community typically must deal with, connect, and integrate knowledge and vocabularies from diverse disciplines such as economics, biology, engineering, mathematics, and so on. The collective development of ontologies may accelerate the process of building a truly interdisciplinary community, where data, information, and knowledge are discussed and represented not only in shared, standardized machine-readable formats, but also in shared human-readable formats that foster unambiguous interpretation. Applications using the Semantic Web typically mix different ontologies and vocabularies (Herman 2008). This helps to connect knowledge domains and data in a modular fashion, often in ways that are “good enough” due to incomplete information.
IE is sufficiently diverse that it excludes the possibility of a single master ontology that can be used to describe all information (Mikulecky 2001; Allenby 2006). Indeed, different ontologies are being developed that concern topic areas on which people can generally agree. For instance, there is the Friend of a Friend (FOAF) project (Dodds 2004), which provides a standardized data structure, that is, a vocabulary, for describing facts about people. There is also the Simple Knowledge Organization System (SKOS) (Isaac and Summers 2008), which aims to provide a vocabulary for creating thesauri, taxonomies, and other classification schemes.
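The benefit of a shared vocabulary such as FOAF can be sketched as follows: two independently maintained datasets that use the same predicate URI can be merged and queried without any mapping step. The FOAF namespace shown is the project's real namespace; the people and datasets are invented.

```python
# The real FOAF namespace; predicates are identified by full URIs.
FOAF = "http://xmlns.com/foaf/0.1/"

# Two hypothetical datasets maintained by different groups,
# both using the shared foaf:name and foaf:knows terms.
dataset_a = [("person/1", FOAF + "name",  "A. Researcher"),
             ("person/1", FOAF + "knows", "person/2")]
dataset_b = [("person/2", FOAF + "name",  "B. Modeller")]

# Merging is just concatenation; no schema alignment is needed
# because both sources already agree on the predicate URIs.
merged = dataset_a + dataset_b

# A single query retrieves every name across both datasets:
names = [o for (s, p, o) in merged if p == FOAF + "name"]
print(names)  # ['A. Researcher', 'B. Modeller']
```

Had each group invented its own "name" term, a human would first have to assert that the two terms mean the same thing; reusing a shared vocabulary removes that translation step entirely.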
Standardization such as this is important because it enables us to more effectively search over multiple data sources simultaneously on the Internet, since everyone uses the same terms to describe particular types of things. At the same time, for terms that may not be easy to standardize, people are still free to define their own terms in their ontology.
Effectively utilizing shared vocabularies will require an ongoing community discussion as data maintainers learn more about how other datasets may describe the same objects, but with different terms. As mentioned, there will never be a single master ontology, but people will discover opportunities to better standardize certain descriptive elements. This may require significant human attention to ensure that the correct meanings are aligned.
Using Open Standards, Open Source, and Open Data
A further enabling factor is the use of open standards, open source, and open data. In facilitating a community of researchers, we should aim to be open by default, and only include restrictions where necessary.
The use of open standards is essential for interoperability and is a key part of the Semantic Web. The Web that we know today would be impossible without open standards such as HTML (O’Reilly 2004). For example, imagine having to use different browsers, or even different operating systems, for Web sites using various proprietary standards. Proprietary standards inherently limit how widely data can be used, whether intentionally or unintentionally. In using open standards, we allow for data to be used in a predictable way by different software, and it also helps to “future proof” data for when older proprietary standards are no longer supported.
Further problems can arise when using closed source software. For example, if you require specific functionality that does not currently exist, then you need to either convince the developers that they should extend the software for you, or you have to completely build your own software that meets your needs. The first option is not necessarily ideal since it may involve a long wait if you are successful at all. The second option may be impossible if the software is complicated and you are part of a small company or group of researchers who simply do not have the resources to dedicate to this task. This is not a problem with open source software because you are able to extend and modify it. This is indeed one of the motivations behind the openLCA project (Ciroth 2007). A further advantage of open source is that when software projects are abandoned, then others are free to take over instead of having to reconstruct the functionality of the code themselves.
Examples of open data were discussed in the previous section. They are part of a larger trend in which some have argued that we are entering the age of “big data” (Community cleverness required 2008, 1; A special report on managing information 2010, 3-5), where we will soon possess more information than we sensibly know how to handle. As this occurs, the value of experts will increasingly rest on their ability to interpret data, not merely on possessing it. The principle of open data is one means to help handle this flood.
Recapitulating, we started our discussion by pointing out that IE is a very data- and knowledge-intensive scientific discipline, dealing with a very wide range of system scales. Creating a coherent understanding requires bringing diverse knowledge and data together across these scales. We also noted that IE is mainly an offline community and that, even though positive developments are happening on the Web, it does not use the available technologies to their fullest potential. We then identified several promising technologies, such as human- and machine-readable formats, the Linking Open Data initiative, and semantic wikis. Extracting the design aspects from these developments, we argued that moving toward IE 2.0 is a socio-technical co-evolutionary process that can be shaped and guided. Toward this goal, we presented five best practices: connecting the producers and consumers of data, facilitating community discussion and discovery, using machine-readable formats, utilizing shared vocabularies, and using open standards, open source, and open data.
The presented examples are not theoretical, but all exist and are actively being adopted within different scientific disciplines. The challenge for IE is to maintain relevance in a digital age, when others are investigating and using these technologies to help integrate interdisciplinary perspectives, scale their efforts, and increase their effectiveness. This will require an ongoing community discussion to identify opportunities that these tools and principles present. In particular, we believe that structured data already existing in spreadsheets and databases offer the most immediate opportunity, while unstructured data will undergo a more iterative process of community organization through platforms such as wikis.
Furthermore, it should be kept in mind that everything discussed in this article is only a means to an end. The ultimate goal is to understand how our socio-technical systems work, and how we may be able to take appropriate actions to best manage them. This is about rationally dealing with the fact that we cannot be omniscient about the systems we study (Ulrich 1988; Ryan 2008). Although we cannot have perfect knowledge, there are concrete opportunities to make more efficient and effective use of those things that we can and do know through the use of the appropriate tools.
We are grateful to the three anonymous reviewers, whose feedback contributed to improving this article.
Chris Davis, Igor Nikolic, and Gerard P.J. Dijkema are affiliated with the Faculty of Technology, Policy and Management within Delft University of Technology in Delft, The Netherlands. Chris Davis is a Ph.D. researcher, Igor Nikolic is an assistant professor, and Gerard P.J. Dijkema is an associate professor.