An ontology for component-based models of water resource systems


  • Mostafa Elag,

    1. Department of Civil and Environmental Engineering, University of South Carolina, Columbia, South Carolina, USA
  • Jonathan L. Goodall

    Corresponding author
    1. Department of Civil and Environmental Engineering, University of South Carolina, Columbia, South Carolina, USA
    • Corresponding author: J. L. Goodall, Department of Civil and Environmental Engineering, University of South Carolina, Columbia, SC 29208, USA.



[1] Component-based modeling is an approach for simulating water resource systems where a model is composed of a set of components, each with a defined modeling objective, interlinked through data exchanges. Component-based modeling frameworks are used within the hydrologic, atmospheric, and earth surface dynamics modeling communities. While these efforts have been advancing, it has become clear that the water resources modeling community in particular, and arguably the larger earth science modeling community as well, faces a challenge of fully and precisely defining the metadata for model components. The lack of a unified framework for model component metadata limits interoperability between modeling communities and the reuse of models across modeling frameworks due to ambiguity about the model and its capabilities. To address this need, we propose an ontology for water resources model components that describes core concepts and relationships using the Web Ontology Language (OWL). The ontology that we present, which is termed the Water Resources Component (WRC) ontology, is meant to serve as a starting point that can be refined over time through engagement by the larger community until a robust knowledge framework for water resource model components is achieved. This paper presents the methodology used to arrive at the WRC ontology, the WRC ontology itself, and examples of how the ontology can aid in component-based water resources modeling by (i) assisting in identifying relevant models, (ii) encouraging proper model coupling, and (iii) facilitating interoperability across earth science modeling frameworks.

1. Introduction

[2] In the hydrologic community, solving complex water resources problems has migrated from a monodisciplinary to a multidisciplinary approach [Hornberger et al., 2012; Wagener et al., 2010; Scholten et al., 2007]. Earth science models have also grown in recent years from activities undertaken by individuals or small groups to larger and more collaborative activities [Syvitski et al., 2011; Voinov et al., 2010; Famiglietti et al., 2008; Blackmon et al., 2001]. Together, these trends have increased the sophistication of models, not only in terms of the mathematical representation of physical processes, but also in terms of the software required for constructing state-of-the-art simulation models.

[3] In building modern modeling systems capable of supporting multidisciplinary science within a community of developers and users, we see two basic paradigms that can be adopted. In the first paradigm, the model is designed, built, and controlled by a small group of developers. The model code is under the control of this group, thus simplifying the development and maintenance of the code, but at the same time limiting the size of the community contributing to the model development process. In the second paradigm, a modeling framework is designed, built, and controlled by a small group of developers, but it is possible for a larger community to contribute the models used within the framework. Model developers create their models as components that adhere to standards required for making their model interoperable with other models in the framework. This component-based approach for building models is newer and less well established; however, it has been gaining attention in recent years [Moore and Tindall, 2005; Syvitski et al., 2011; Goodall et al., 2011; Peckham et al., 2012].

[4] A key distinguishing feature of component-based modeling is that each model within the system is independent yet able to be integrated with other models in what has been described as a “plug-and-play” manner [Peckham et al., 2012]. Component-based modeling is a key principle in advancing modeling frameworks, providing the flexibility and extendability whereby a system can be assembled out of a set of independent functional units [Argent, 2004]. By decentralizing the model functionality into independent components, there can be a relative freedom from the assumptions controlling the more conventional centralized approach for constructing models. It allows specialists to focus on implementation details for individual components within a system, and it allows stakeholders a way to view holistic systems that are built from these more detailed components. These properties also make component-based modeling an attractive approach for building community-based modeling systems [Voinov et al., 2010].

[5] While there are obvious benefits to using a component-based approach for water resources modeling, there are also challenges that must be overcome in order to encourage broad adoption of the approach. One important challenge associated with the component-based modeling approach is that model integration is not simply the proper assemblage of components from a technical standpoint; rather, it also requires scientific knowledge of the underlying coupling between components [Athanasiadis et al., 2011; Voinov and Shugart, 2013; Elag et al., 2011; Castronova et al., 2013]. Coupled model components often exchange values based on a message-passing scheme where one component requests a particular variable from a second component. Because components may be built and maintained by different groups, the variables passed between components must be well described to ensure that basic characteristics such as variable units are consistent between coupled components. This requires both establishing core metadata for concepts such as “variable” that are shared between models and creating software tools that are able to analyze and properly align messages between components [Athanasiadis et al., 2011].
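
As a minimal sketch of the kind of check described above, the following Python fragment compares the metadata of a requested variable with the metadata of the variable a provider offers before an exchange is allowed. The `ExchangeItem` class, its two fields, and the example variables are hypothetical illustrations and are not taken from any of the frameworks cited here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ExchangeItem:
    """Hypothetical minimal metadata for a variable passed between components."""
    name: str   # quantity name, e.g. "streamflow"
    units: str  # unit symbol, e.g. "m3/s"


def check_exchange(requested: ExchangeItem, provided: ExchangeItem) -> List[str]:
    """Return human-readable mismatches; an empty list means the link is consistent."""
    problems = []
    if requested.name != provided.name:
        problems.append(f"quantity mismatch: {requested.name!r} vs {provided.name!r}")
    if requested.units != provided.units:
        problems.append(f"unit mismatch: {requested.units!r} vs {provided.units!r}")
    return problems


# A routing component requests streamflow in m3/s,
# while a runoff component offers the same quantity in ft3/s.
wanted = ExchangeItem("streamflow", "m3/s")
offered = ExchangeItem("streamflow", "ft3/s")
issues = check_exchange(wanted, offered)
```

A plain string comparison such as this one only detects that the two descriptions differ; deciding whether the units are nevertheless convertible requires the shared vocabulary that an ontology is meant to supply.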

[6] Scientists wanting to couple components across disciplinary boundaries or modeling frameworks face challenges that extend beyond simply ensuring variable units are consistent across coupled models. They must also consider (i) semantic heterogeneity across disciplines due to the variety of terminology used to describe the equations, variables, parameters, and units within models, (ii) diversity of concepts used to define a component's functionality and relationships, which results in overwhelming complexity for linked model compositions, (iii) syntactic heterogeneity in the metadata structure used to describe a component across modeling frameworks, which hinders a component's reusability, and (iv) coupling inconsistency resulting from mismatched spatial or temporal data exchanges, or from incompatible semantics used by different models [Argent, 2004; Voinov and Shugart, 2013; Peckham et al., 2012; Rizzoli et al., 2008; Elag et al., 2011; Janssen et al., 2011; Argent et al., 2006]. These issues collectively result in a lack of shared understanding and poor communication within and between users of the component-based modeling framework. They are an important reason why scientists have argued that working in communities to develop models may result in more focus on the process of creating a model rather than the final product of the model itself [Voinov and Shugart, 2013]. Given these challenges, it is clear that if component-based modeling is to be broadly adopted by the community, these issues must be overcome.

[7] We believe that an important step in overcoming these challenges is for the community to agree on an ontology that specifies and organizes the concepts and terminologies related to model components used in water-related disciplines. An ontology is an explicit conceptualization of human knowledge that focuses on shared understanding by defining vocabularies that represent and communicate knowledge about a specific domain [Gruber, 1993]. Establishing a shared understanding of concepts aids in eliminating conceptual and terminological confusion [Beran and Piasecki, 2009; Uschold and Gruninger, 1996]. Furthermore, an ontology is the backbone of the Semantic Web, which was introduced by the World Wide Web Consortium (W3C) as a method and technologies for information integration, processing, and querying on the Web [Berners-Lee and Fischetti, 2001; Fensel et al., 2011]. Gruber [1993] specifies the primary characteristics of an ontology to be (i) a clear structure, (ii) an easily inferred relationship between concepts, (iii) the flexibility to merge with other ontologies, (iv) the extensibility to accommodate any required future modifications, and (v) the ability to overcome semantic mismatches between the information provider and the user. This paper outlines our effort to build an ontology that satisfies these five characteristics for use in component-based modeling of water resource systems.

[8] Our work relates to recent work in other related scientific disciplines to use ontologies to describe their domains. For example, Zhong et al. [2009] introduced an ontology in the geology domain to organize the concepts of fractures in order to facilitate communication among the highly diverse professional and academic communities related to the domain. In the agricultural domain, the System for Environmental and Agricultural Modeling; Linking European Science and Society (SEAMLESS) project developed a modeling framework to integrate approaches from economic, environmental, and social sciences. In this project different ontologies have been created to ensure the semantic and conceptual integration of models and future scenarios. For example, Rizzoli et al. [2008] created an ontology to enrich the semantics of model exchange items including parameters, I/O variables, and state, and Janssen et al. [2011] developed ontologies to ensure the semantic and conceptual integration between coupled models from different domains to assess agricultural land use changes. Finally, Janssen et al. [2009] created the assessment project ontology to unify the concepts used among stakeholders, scientists, and modelers in implemented scenarios across models, policy problems, and scales. These ontologies were created for specific use cases, and none of the ontologies address the objective of this work: to define a component model used within the water resources domain.

[9] The major contribution of this research is, therefore, an ontology that provides a unified and structured view of water resources model components. We acknowledge that creating a complete and robust ontology for water resource model components is beyond the scope of a single paper because doing so is an iterative process that requires input and refinement from a larger community [Janssen et al., 2009]. Given this, our goal is for the ontology to serve as the starting point for a community agreed upon ontology for water resource model components. If such an ontology can be established, it will enhance component-based modeling activities both within the water resources community and across disciplinary boundaries by establishing an agreed upon understanding of the knowledge underlying model components. This process cannot proceed without a beginning ontology that establishes the core knowledge framework where experts from different domains can contribute their own conceptualizations and metadata needs. Our work is meant to provide this beginning ontology that builds on related efforts to create ontologies within the Earth science community, but will likely evolve as more developers and users engage in the design process.

[10] The remaining sections of the paper are organized as follows. Section 'Methodology' discusses the methodology used to create the ontology along with background theory on ontologies to orient the reader. Section 'Results and Discussion' presents the proposed ontology, which we have named the Water Resources Component (WRC) ontology, and provides examples of how the proposed ontology can be applied to support modeling activities. Finally, we summarize our work and discuss possible directions for future research in section 'Summary'.

2. Methodology

[11] In designing the water resources component ontology, we used the widely accepted skeletal methodology described by Uschold and Gruninger [1996], which has been successfully applied for building many ontologies [e.g., Kim, 2005; Brilhante and Robertson, 2001; Patil et al., 2005; Biletskiy et al., 2004; Bermudez and Piasecki, 2006]. The approach (summarized in Figure 1) begins by first defining the purpose of the ontology and its design requirements. Second, building the ontology is accomplished in three phases: (i) concept capture, (ii) coding, and (iii) integration with complementary ontologies. These steps are necessary to define the concepts used within the community, determine the method of presenting these concepts, and benefit from prior efforts in building related ontologies, respectively. Third, the ontology is evaluated before sharing with the community to ensure consistency of concepts and their relationships. Fourth, documentation of all important assumptions and key concepts is included within the ontology in the form of natural language. The fifth and final step is to describe the guidelines used in building the ontology. The following sections elaborate on these steps, especially on the ontology building phases because these are arguably the most complex and challenging steps in creating an ontology.

Figure 1.

Methodology used for development of the Water Resources Component (WRC) ontology, adopted from Uschold and Gruninger [1996].

2.1. Purpose and Design Requirements

[12] Defining the purpose, use, and target audience of the ontology provides a clear focus in the subsequent building stages [Scholten et al., 2007; Uschold and Gruninger, 1996; Beran and Piasecki, 2009; Athanasiadis et al., 2011]. The purpose of the Water Resources Component (WRC) ontology is to promote the interoperability of components across water-related disciplinary boundaries and modeling frameworks. We aim to provide modelers using a component-based modeling approach with a tool that helps them select the correct components to be coupled and aids in minimizing conceptual errors in coupling. Lastly, we intend for the ontology to be a tool adopted by researchers, educators, and practitioners in water-related disciplines.

2.2. Building the Ontology

2.2.1. Concept Capture

[13] Uschold and Gruninger [1996] describe the concept capture phase as identifying and defining the basic ideas, relationships, and terms corresponding to a domain. In the WRC ontology, concepts and terminologies were identified from the analysis of domain metadata initiatives and other ontologies developed in related domains. Specifically, two component-based modeling frameworks, one knowledge management system for water quality modeling, and one web-based hydrodynamic simulation system were analyzed to capture the coupling process concepts. In selecting these examples, the aim was to capture key initiatives in the Earth science community that directly relate to the purpose and design requirements identified in the prior step of the methodology. Below we summarize each example.

[14] The Earth System Curator (ESC) is a project to develop metadata describing the digital resources used in climate simulations [Dunlap et al., 2008]. The aim of the work is to develop a metadata schema that describes numerical climate simulation software as well as its output data sets. Dunlap et al. [2008] defined three tasks of the metadata to describe numerical climate models. First, the metadata has to relate the software and its associated output data sets. Second, it should provide sufficient detail about the degree of technical compatibility between two simulation models. Third, it should describe the climate modeling software interface where the models can be coupled. ESC recognizes six metadata keys to define these three features. These keys are represented as packages using the Unified Modeling Language (UML) to group data elements organized in a hierarchical structure. There are four packages used to describe a model: (i) the configuration package that contains data describing the model arrangement for a simulation, which is required for reproducing simulation runs, (ii) the modeling package for describing the technical, numerical, and scientific model details, (iii) the grid package for defining the spatial metadata used by a model, and (iv) the coupling package for defining technical aspects of model coupling (Table 1). The concepts used in the modeling package are heavily based on the Numerical Model Metadata (NMM) schemata and Earth System Modeling Framework (ESMF) metadata. The fifth package describes output data sets, and the sixth package is used for describing general metadata about any resource.

Table 1. Basic Concepts and Properties Used to Describe Models in Earth System Curator (ESC) [Dunlap et al., 2008]
Concept: Properties
Author: Name, organization name, position, telephone, address.
Institution/Agency: Address, telephone, e-mail, fax, hours of service.
Project: Purpose, resource provider, owner, principal investigator.
References: Title, ISBN, edition, publisher, author.
Technical: Platform, programming language, supported compilers.
Scientific: Parameters (name, values used in simulations), model initial conditions.
Numerical: Spatial and temporal discretization of the component, numerical method information.
Interface: Model I/O file (format and name), model I/O data set.
Coupling: Exchange data (quantity type, physical units, min/max value, data relation with the model, field dependency).
Grid: Projection, vertical/horizontal coordinate system, geometry, discretization, etc.

[15] The Community Surface Dynamics Modeling System (CSDMS) is a community-based modeling environment established on open source software modules that focus on simulating a wide variety of Earth surface processes that interact over various time and space resolutions [Syvitski et al., 2004]. CSDMS has three principal components: Standard Utilities, Modules, and a Toolkit [Syvitski et al., 2004]. Standard Utilities handle data structure, graphics rendering, module connectors, and a web interface [Anderson et al., 2004]. We used two sources for capturing concepts from the CSDMS. First, we used a questionnaire that the CSDMS team asks model developers to complete when submitting their model to CSDMS. The information collected from the questionnaire is used primarily to build help documents along with a reference key related to the model that are made available through the CSDMS web site. Second, we used the recently developed Model Metadata File (MMF) that CSDMS requires when models have been componentized to follow their Initialize, Run, Finalize (IRF) standard. Table 2 depicts the fields in the questionnaire and MMF that are used to describe a model. Within the MMF file, CSDMS uses a scheme for standardizing names that follows an object + quantity pattern. This work to standardize names was not used in the knowledge capture because standardizing names can be viewed as a related but parallel effort to establishing a knowledge framework [Villa et al., 2009].

Table 2. Basic Concepts and Properties Used to Describe Models in the Community Surface Dynamics Modeling System (CSDMS) [Peckham et al., 2012]
Concept: Properties
Modeler: First/last name, type of contact, institute/organization, address, email address, phone, fax.
Summary: Name, synonym, type, future plan.
Identity: Domain, description.
Documentation: Key papers, manual, module forum, comments.
Technical: Platform, programming language, memory requirement, typical runtime, license type, optimal processor, module availability type, source code location.
Software: Pre/postprocessing, visualization.
Data file: Type (I/O or calibration), file description, file physical location.
I/O parameter: Parameter description, parameter format.
Process: Description, equation, parameter, time/space constraints, limitations.
Coupling: Interface type, architecture type, framework availability.
Spatial resolution: Spatial dimensions, spatial extent.
Temporal resolution:
Development history and status: Revision dates.

[16] Chau [2007] introduced an ontology-based knowledge management system for water flow and quality models. It adopts a three-level architecture for intelligent decision support, namely the object, application, and description levels. The object level stores knowledge sources and information about models. Users interact with the object level through the application level, which is commonly a Graphical User Interface (GUI). The description level identifies two ontologies: (i) the information ontology, which includes generic concepts and attributes of knowledge sources and information stored in the object level, and (ii) the domain ontology, which includes key concepts and attributes of water quality models and the output flow. Chau [2007] defined features and conditions to describe water quality models in the domain ontology: (i) features describing the numerical model (e.g., numerical method, scheme, time stepping algorithm, initial conditions, and boundary conditions) and (ii) conditions for the model parameters (e.g., discretization method, dimension of influence, and the spatial and temporal conditions of the model parameter). Each feature and condition has one or more possible values, and these values are indexed with a unique identifier. Therefore, a numerical model can be described by a combination of one or more indices.
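
The indexing idea can be illustrated with a short Python sketch. The catalogue below is hypothetical: the feature names follow the examples in the text, but the specific values and index numbers are invented for illustration and are not taken from Chau [2007].

```python
# Hypothetical catalogue: each (feature, value) pair receives a unique index.
CATALOGUE = {
    1: ("numerical method", "finite difference"),
    2: ("numerical method", "finite element"),
    3: ("time stepping algorithm", "explicit"),
    4: ("time stepping algorithm", "implicit"),
    5: ("dimension of influence", "1-D"),
    6: ("dimension of influence", "2-D"),
}


def describe(indices):
    """Expand a set of indices into a feature -> value description of a model."""
    return {CATALOGUE[i][0]: CATALOGUE[i][1] for i in sorted(indices)}


# A model is then identified by nothing more than a combination of indices.
model_a = {2, 4, 6}
description = describe(model_a)
```

Under this assumption, two models can be compared simply by comparing their index sets.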

[17] Islam and Piasecki [2008] introduced an ontology for the data associated with running a hydrodynamic Web-Based Simulation (WBS) system. The WBS system depends on communication between the user and the central simulation server via a web browser. Its environment consists of a (i) simulation domain ontology, (ii) code and coding language, (iii) Graphical User Interface (GUI), and (iv) data storage system. The simulation ontology focuses on describing the model, as well as the geospatial and hydrodynamic data of the modeling environment. The ontology has three basic concepts: MetadataModel, MetadataModelData, and MetadataGeospatialData. First, the MetadataModel stores information about the model itself (e.g., name, description, start time, end time, etc.). Second, the MetadataModelData includes information about the variables used by the model, including metadata related to the grid, the organization of I/O data, and the flow of the data stream. Third, the MetadataGeospatialData contains the grid, vectors, and rasters. The Grid concept describes the numerical grid contents (nodes and elements) and the relationships of the grid contents with the model data.

[18] From these examples, we extracted a collection of unstructured information that needed to be organized in an intentional semantic structure, meaning that the information is structured independent of any specific interpretation or situation. There are three approaches for accomplishing this knowledge organization and restructuring task: top-down, middle-out, or bottom-up [Uschold and Gruninger, 1996]. Top-down starts with the highest concept and moves down for details while, logically, bottom-up begins with details and tries to generalize them. The middle-out approach conserves a balance in terms of the level of details and has the advantage over the other approaches that it allows higher-level classes to arise naturally [Uschold and Gruninger, 1996; Fernández-López et al., 1997]. This middle-out approach starts with the important concepts and then gradually abstracts the higher-level concepts; therefore details arise depending on their necessity. For this reason, and based on the recommendation from prior studies [Uschold and Gruninger, 1996; Beran and Piasecki, 2009; Scholten et al., 2007], we elected to use the middle-out approach in this study.

2.2.2. Coding the Ontology

[19] The process of coding the ontology is where the domain metaontology (i.e., information about ontology components) gathered in the concept capture phase is represented in a formal specification using an ontology coding language [Uschold and Gruninger, 1996]. The advantage of using a formal language is to establish the information about the concepts and their relationships in the form of axioms. Many ontology coding languages are based on the eXtensible Markup Language (XML) including the Ontology eXchange Language (XOL) and Ontology Markup Language (OML). XML is a flexible, self-describing markup language format designed for data exchange over the World Wide Web (WWW) [Bray et al., 1997]. It provides a hierarchical structure for encoding data and its relationships; however, the approach can suffer from syntactic heterogeneity because data can be organized in many different ways. To address this shortcoming, ontology languages based on the Resource Description Framework (RDF) have become popular, for example the Ontology Inference Layer (OIL) and Web Ontology Language (OWL). RDF is often described as a simple yet powerful data model and language for describing Web resources where a “resource” is defined as an object that is uniquely identified by a Uniform Resource Identifier (URI) [Antoniou and Harmelen, 2009; Fensel et al., 2011; Uschold and Gruninger, 1996]. RDF is used to represent data as statements about resources where a graph is used to connect resource nodes to their property values with arcs labeled with properties [Abadi et al., 2007]. However, RDF by itself does not provide a mechanism for defining internal relationships between properties and concepts [Garrido and Requena, 2011]. OWL provides a means for extending RDF with larger vocabularies and stronger syntax for describing properties and concepts in a human and machine-understandable format [Antoniou and Harmelen, 2009; Hitzler et al., 2009]. Thus, OWL enables a machine to infer relationships between concepts and make decisions like an expert user.
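
To make the RDF data model concrete, the sketch below represents a few statements as (subject, predicate, object) triples in plain Python and follows one labeled arc of the resulting graph. The `http://example.org/wrc#` namespace and the resource names are placeholders chosen for illustration; only the `rdf:type` URI is a real W3C identifier.

```python
# Each RDF statement is a (subject, predicate, object) triple of URIs.
EX = "http://example.org/wrc#"  # placeholder namespace, not the real WRC namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

triples = {
    (EX + "RunoffModel", RDF_TYPE, EX + "Component"),
    (EX + "RunoffModel", EX + "hasOutput", EX + "Streamflow"),
    (EX + "Streamflow", EX + "hasUnit", EX + "CubicMeterPerSecond"),
}


def objects(subject, predicate):
    """Follow the arcs labeled `predicate` that leave the `subject` node."""
    return {o for s, p, o in triples if s == subject and p == predicate}


outputs = objects(EX + "RunoffModel", EX + "hasOutput")
```

The triples state facts, but nothing in them says how `hasOutput` relates to other properties or classes; that additional vocabulary is what OWL layers on top of RDF.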

[20] OWL has been accepted and recommended as an ontology language by the World Wide Web Consortium (W3C). It is XML based and applies RDF syntax [Antoniou and Harmelen, 2009]. We elected to use OWL as the formal ontology coding language for these reasons and because it is currently the most prominent ontology language for the semantic web, and it is compatible with most querying languages [Antoniou and Harmelen, 2009; Hitzler et al., 2009; Garrido and Requena, 2011]. OWL provides sophisticated modeling constraints such as explicit cardinalities, universally and existentially quantified property constraints, and class definitions based on the union, intersection, or complement of other classes. These constraints provide a semantically rich conceptual model with advanced inferencing capabilities [Dunlap et al., 2008].

[21] There are three sublanguages of OWL, listed here in order of increasing expressiveness: OWL-Lite, OWL-Description Logic (OWL-DL), and OWL Full [McGuinness and Van Harmelen, 2004]. OWL Full uses the same semantics and syntax as RDF; however, users cannot verify how the machine understands the coded relationships between concepts [Antoniou and Harmelen, 2009]. OWL-DL is a sublanguage of OWL Full and is compatible with Description Logic, which allows the ontology to be expressed as mathematical relationships (e.g., “concept A is a subconcept of B” is interpreted as A ⊂ B). This allows a reasoner to turn the implicitly coded relationships between concepts into explicit relationships, which helps the user to revise the ontology relationships. OWL-Lite is a subset of OWL-DL that excludes some language constructors used in OWL-DL. We chose to use OWL-DL because it provides the needed expressiveness for concepts and relationships, and its consistency can be checked using efficient reasoning in a short time compared to OWL Full [Antoniou and Harmelen, 2009; McGuinness and Van Harmelen, 2004].
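
The step from implicit to explicit relationships can be sketched for the simplest case, the subconcept relation. The Python fragment below computes the transitive closure of a set of asserted subclass axioms. It is an illustration only, not the algorithm used by any OWL-DL reasoner, and the class names are examples (`ExchangeItem` in particular is hypothetical).

```python
# Asserted "A is a subconcept of B" axioms as (subclass, superclass) pairs.
asserted = {
    ("IndependentVariable", "Variable"),
    ("Variable", "ExchangeItem"),
}


def infer_subclass_closure(axioms):
    """Add every (A, C) implied by (A, B) and (B, C) until nothing new appears."""
    closure = set(axioms)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


# Relationships that were never written down but follow from the axioms.
inferred = infer_subclass_closure(asserted) - asserted
```

Here the only inferred pair states that `IndependentVariable` is a subconcept of `ExchangeItem`, which a user can inspect to confirm that the coded hierarchy matches the intended one.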

[22] Some of the available tools for creating OWL documents are OntoEdit, OilEd, and Protégé. The latter is developed in Java and is an open source, user friendly, and extensible software system. Protégé also enables an ontology structure to be defined by automatically generating forms that facilitate knowledge acquisition. While the other tools considered for this work, OntoEdit and OilEd, offer powerful ontological formats, we concluded that they lack the comprehensiveness and user-friendly features of Protégé. Therefore, we elected to use Protégé (version 4.2 alpha), which supports OWL (version 1), for our work.

[23] An OWL ontology consists of namespaces, classes, properties, and individuals. Namespaces are the first component declared in the ontology document and are used to identify the resources of the standard used to define the ontology and its elements. All namespaces used by the ontology are declared at the beginning of the ontology document. Ontology elements are prefixed with a tag referring to a namespace to declare that the element resides in this namespace. Classes define the concept that groups individuals (instances) that have common properties. Classes can be organized in a specific hierarchy using the “Is-A” relationship. For example, the Independent Variable class “Is-A” subclass of the Variable class. In this relationship, the Variable class is called a superclass. Properties can be one of two types: an object property, which describes the relationship between classes, or a data property, which defines a data type and value of an instance. Individuals, as stated before, are instances of a class. Property restrictions are used to subset a group of individuals from one class into a new subclass in which its members participate in a specific relationship.
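
A property restriction can be illustrated with a small Python sketch that selects the individuals of one class that participate in a given relationship. All individual, class, and property names below are hypothetical. The lookup also treats the listed facts as complete, which is a simplification: OWL reasoning does not assume that unstated facts are false.

```python
# Class membership of individuals (hypothetical names).
individual_types = {
    "precipitation": "Variable",
    "streamflow": "Variable",
    "time": "Variable",
    "m3_per_s": "Unit",
    "mm_per_hr": "Unit",
}

# Object-property assertions between individuals: (subject, property, object).
assertions = {
    ("streamflow", "hasUnit", "m3_per_s"),
    ("precipitation", "hasUnit", "mm_per_hr"),
}


def restrict(cls, prop, filler_cls):
    """Individuals of `cls` linked by `prop` to at least one `filler_cls` individual."""
    return {
        s
        for s, p, o in assertions
        if p == prop
        and individual_types[s] == cls
        and individual_types[o] == filler_cls
    }


# The new subclass: variables for which a unit has been stated.
variables_with_units = restrict("Variable", "hasUnit", "Unit")
```

In this example `time` is a `Variable` but has no `hasUnit` assertion, so it falls outside the restricted subclass.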

2.2.3. Integration With Existing Ontologies

[24] Ontologies are built to be reused in different applications [Fernández-López et al., 1997]. One common way ontologies are reused is through integration with other ontologies. Not only does this help in speeding up ontology construction, but it also allows reuse of definitions and properties already built and tested, providing consistency and robustness to new ontologies. However, it is necessary to check that the metaontology of the imported ontology corresponds to the design needs and that the imported ontology provides similar semantics before implementing the concepts within the new ontology [Fernández-López et al., 1997].

[25] An example of ontology reuse is describing measurement units. Units, of course, are a fundamental concept in water resources modeling. They are used to define variables, parameters, and universal constants, and are therefore necessary metadata for exchanging quantities between components. The Semantic Web for Earth and Environmental Terminology (SWEET) is a project developed to create a knowledge base for improving the shared semantic understanding of Earth science data [Raskin and Pan, 2005]. It provides a higher-level representation of keywords used in Earth sciences through a collection of formal ontologies coded in OWL. One of the ontologies encoded within SWEET is a units ontology that describes the hierarchical structure of units as depicted in Figure 2. Concepts, structure, and relationships are derived from Unidata's library, created by the University Corporation for Atmospheric Research (UCAR) to support manipulation of physical quantity units. The SWEET unit ontology satisfies our concepts about units because it includes an extensive database of units, definitions for prefixed units, and conversion factors between compatible units [Steward et al., 2009]. In addition, it is extendable to reflect units used in other scientific domains. Therefore, the SWEET unit ontology was integrated into the WRC ontology for describing units.
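
The role of conversion factors between compatible units can be sketched as follows. The table and function are hypothetical and do not reproduce the structure of the SWEET units ontology; they only illustrate why recording a dimension and a factor for each unit is enough to convert between compatible units and to reject incompatible ones.

```python
# Hypothetical unit table: symbol -> (dimension, factor to the SI base unit).
UNITS = {
    "m": ("length", 1.0),
    "km": ("length", 1000.0),  # prefixed unit
    "ft": ("length", 0.3048),
    "s": ("time", 1.0),
    "hr": ("time", 3600.0),
}


def convert(value, from_unit, to_unit):
    """Convert `value` between two units of the same dimension."""
    from_dim, from_factor = UNITS[from_unit]
    to_dim, to_factor = UNITS[to_unit]
    if from_dim != to_dim:
        raise ValueError(f"incompatible units: {from_unit} ({from_dim}) and {to_unit} ({to_dim})")
    return value * from_factor / to_factor


distance_m = convert(2.0, "km", "m")
```

Attempting `convert(1.0, "m", "s")` raises a `ValueError`, which is the behavior a coupling tool would need in order to flag an invalid link between components.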

Figure 2.

The hierarchical structure of the Semantic Web for Earth and Environmental Terminology (SWEET) Unit ontology, which integrates with the Water Resources Component (WRC) ontology to describe the units of symbols.

2.3. Evaluation, Documentation, and Guidelines

[26] Evaluation of the ontology includes both validation and verification [Gómez-Pérez et al., 1995]. Validation guarantees that the ontology definitions correspond to the system that they are representing. The validation also provides information about whether the ontology definitions are sufficient and necessary [Gómez-Pérez et al., 1995]. Verification ensures that the structure of the ontology is built correctly. It includes individual definitions and axioms, imported definitions from other ontologies, and the inferred axioms from the collection of definitions and explicitly mentioned axioms. We used Pellet as a reasoner for ontology evaluation because it is the most common reasoning engine used with Protégé OWL and because it is compatible with the DL system. Sirin et al. [2007] and Konstantinou et al. [2008] summarized the advantages of using the Pellet reasoner including its ability to (i) check the standard reasoning services such as cardinality restrictions, complex subproperty axioms, and user-defined datatypes, (ii) ensure consistency of an ontology's axioms, (iii) compute for each named class the expected subclass relationship to create the complete class hierarchy, (iv) infer the most specific classes that an individual belongs to, and (v) determine whether a class can accept instances. Documentation of the metaontology facilitates the ontology handling among community members and it guides users in the updating process. Finally, documentation of the WRC ontology is straightforward and uses natural language through the facilities provided by Protégé. Attached to each class in the WRC ontology is a comment defining the underlying concepts, assumptions, and relationships, along with a reference to the concept source if the source is available.

3. Results and Discussion

3.1. WRC Ontology

[27] An overview of the WRC ontology developed using the methodology described in the previous section is presented in Figure 3, and the ontology is available online. The analysis accomplished in the concept capture phase resulted in identifying 18 superclasses to describe a model component. The superclasses used to describe a component are grouped into four ontological layers: resources, scientific, coupling, and technical. In this section, we describe these four layers of the WRC ontology including their class hierarchy, relationships, axioms, and restrictions. We organized the concepts into these four layers because we wanted to center the ontology around the component concept. Other layer groupings, for example one that focused on model engines and model instances, could also be used to describe the ontology, and using a different layer grouping would not result in any major changes to the ontology concepts themselves.

Figure 3.

An overview of the Water Resources Component (WRC) ontology, describing the basic four layers and the superclasses of each layer.

3.1.1. Resources Layer

[28] The Resources layer has five superclasses that are collectively used to describe the component's digital resources (Figure 4). The Development Level class defines the component's development stage based on a four level development scheme proposed by Argent [2004]. These development levels range from Level I, signifying a model that is developed for research purposes, to Level IV, signifying a model used in planning and policy analysis. The Organization class identifies the agency or institute where the component is developed. Currently it consists of two subclasses, University and Company, but can easily be expanded to include other subclasses. The Organization class is related to the Developer class that stores information about the component's development team. Although not shown in Figure 4 due to space restrictions, both the Organization and the Developer classes include properties adopted from the Observation Data Model (ODM) [Horsburgh et al., 2008].

Figure 4.

Resource layer of the Water Resources Component (WRC) ontology, including classes and relationships.

[29] The Data class is divided into two subclasses: Data File and Data Value. The Data File has four subclasses: Geospatial, Tabular, Time Series, and Extensible Markup Language data. The Data Value class stores the numerical or categorical values used by the component. The relationship between the Component and Data classes can be one of three types: input, output, or associated data. Examples of associated data include model parameters or source code files. Identifying these existing data resources, and describing the exact format of the data document, could also enable components to utilize remote data sources in an automated manner. The Project class is used to define information about projects in which components are coupled to form a workflow. When a component is a part of a modeling workflow, it is also necessary to know where and how it is used within that project, including any specific project requirements.

[30] In addition to the class definitions themselves, another important feature of the WRC ontology expressed in Figure 4 is the set of relationships between classes. For example, the many-to-many relationship between individuals of the Component class and those of the Developer class means that every component must have one or more developers, and that each developer must develop at least one component. Also, because the component's development level classification will change during the component's life cycle, we use Sufficient and Necessary conditions to classify components based on their development level. Sufficient and Necessary conditions mean that, if an individual is a member of a class, then it must satisfy specific conditions, and likewise if any individual satisfies these conditions, then it must be a class member [Motik et al., 2009]. A data property is assigned to each individual of the Component class to define its current development status. Each subclass of the Development Level class uses a Sufficient and Necessary condition to capture components with the corresponding development status. For example, the condition for class Level-I states that each component that has a development status value equal to Level-I is considered a member of the Development Level-I class. The Pellet reasoner will capture the new classification of components after any updates are made in the component properties.
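The effect of a Sufficient and Necessary condition can be mimicked outside of a reasoner with a short Python sketch. This is only an analogy to what Pellet infers, not a description of how the reasoner works; the `Component` record, the `development_status` field, and the component name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    development_status: str  # hypothetical data property, e.g. "Level-I"


# The four development levels described in the text (after Argent [2004]).
LEVELS = ("Level-I", "Level-II", "Level-III", "Level-IV")


def classify(component):
    """Return the Development Level class whose defining condition is satisfied.

    Membership is decided only by the status value, which imitates a
    necessary-and-sufficient (defined class) condition.
    """
    for level in LEVELS:
        if component.development_status == level:
            return level
    return None


model = Component("ExampleRunoffComponent", "Level-I")
class_before = classify(model)
model.development_status = "Level-IV"   # the component matures over its life cycle
class_after = classify(model)           # re-running the check reclassifies it
```

Because membership depends only on the property value, updating the property and re-running the classification is all that is needed to move a component to a new Development Level class.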

3.1.2. Coupling Layer

[31] The Coupling Layer is designed to answer three questions about component coupling: What are the coupling standards used by the component? In which frameworks can the components be coupled? What is the computational resolution of the component? The Coupling Layer uses four classes to address these questions: (i) Modeling Framework, (ii) Standards Interface, (iii) Architecture, and (iv) Computational Resolution (Figure 5). The role of a Modeling Framework is to provide an environment for components to be coupled. In component-based modeling, a Modeling Framework couples components that adopt a specific Standards Interface and Architecture. A component can be used within a Modeling Framework if the Standards Interface and Architecture used are consistent with those supported by the Modeling Framework. A Modeling Framework is classified into two subclasses based on the allowed level of interaction between components: (i) Concurrent is where the framework allows components to communicate during the time horizon of the simulation and (ii) Sequential is where the framework allows components to communicate after the conclusion of the simulation time horizon.

Figure 5.

Coupling layer of the Water Resources Component (WRC) ontology, including classes and relationships.

[32] The Computational Resolution class covers both the temporal and spatial resolution of the component model. The Temporal Resolution class introduces the range of permissible operating time steps within which the component maintains its numerical stability. The Spatial Resolution class is based on the CSDMS definition of model space resolution and includes two subclasses: (1) the Spatial Extent defines the smallest/largest grid size that the component can operate on while maintaining its numerical stability and (2) the Spatial Dimension specifies the model's ability to be solved in different dimensions (e.g., 1-D, 2-D). We expanded this to also include the Chau [2007] definition of the plane of dimension for the computational models (e.g., 2-D vertical). We use these definitions as individuals in the Spatial Dimension class. The computational resolution (temporal and spatial) min/max values are defined using a data type property in the OWL document attached to each individual component.
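One way the min/max resolution values could be used is to test whether two components share a permissible time step. The sketch below assumes each component stores its temporal resolution as a (min, max) pair in seconds; the function name and the numeric ranges are hypothetical.

```python
def common_time_step_range(range_a, range_b):
    """Return the overlap (min, max) of two permissible time-step ranges.

    Each range is a (min_seconds, max_seconds) tuple. None is returned when
    the ranges do not overlap, i.e. no single time step suits both components.
    """
    low = max(range_a[0], range_b[0])
    high = min(range_a[1], range_b[1])
    return (low, high) if low <= high else None


# Hypothetical temporal resolution metadata for three components.
et_steps = (3600, 86400)      # hourly to daily
runoff_steps = (60, 3600)     # one minute to hourly
fast_steps = (1, 900)         # one second to fifteen minutes

shared = common_time_step_range(et_steps, runoff_steps)
no_overlap = common_time_step_range(et_steps, fast_steps)
```

The same interval test applies to the smallest/largest grid sizes stored for the Spatial Extent.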

3.1.3. Scientific Layer

[33] The objective of the Scientific Layer is to describe the component's equations, I/O variables, parameters, purpose, and mathematical classification. The Equation class stores all the equations used by the component using the Mathematical Markup Language (MathML) version 3.0 as a means for describing mathematical notation and capturing both its structure and content [Carlisle et al., 2009]. The information required to apply the equation is represented in the Assumption, Initial Condition, Boundary Condition, Equation Type, and Numerical Simulation classes, which are subclasses of the Equation class (Figure 6). The first three subclasses define, as their names imply, the assumptions, initial conditions, and boundary conditions applied to the equation for use in the model. Individuals of these subclasses are stored as mathematical equations expressed in MathML that a machine can interpret and understand.

Figure 6.

Scientific layer of the Water Resources Component (WRC) ontology, including classes and relationships.

[34] As an example of how model assumptions can be expressed as mathematical relationships within the ontology, consider the case of a model that assumes nonaccelerating flow. This assumption can be expressed mathematically as $\partial v / \partial t = 0$ and is linked to the component model through the hasAssumption relationships established by the WRC ontology. Using this relationship, a model coupling framework could be designed to ensure that model assumptions are not violated within the framework. In the case of this example, the modeling framework would check that flow velocity within the model component does not change over the simulation time period. By expressing model assumptions within the ontology, model developers are able to focus on the model code itself rather than implementing assumptions and error checking routines that can be handled by a modeling framework and shared across modeling components.
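A framework-side check of the nonaccelerating flow assumption could look like the following Python sketch. It assumes the framework has access to the flow velocity reported at each time step; the function name, the tolerance, and the sample values are hypothetical.

```python
def is_nonaccelerating(velocities, tolerance=1e-9):
    """Return True if velocity does not change between consecutive time steps.

    This is a discrete stand-in for the assumption that the time derivative
    of velocity is zero over the simulation period.
    """
    return all(abs(later - earlier) <= tolerance
               for earlier, later in zip(velocities, velocities[1:]))


# Hypothetical velocity series (m/s) reported by a component at each time step.
steady_series = [0.5, 0.5, 0.5, 0.5]
unsteady_series = [0.5, 0.5, 0.62, 0.7]

steady_ok = is_nonaccelerating(steady_series)
unsteady_ok = is_nonaccelerating(unsteady_series)
```

Keeping this check in the framework, keyed to the hasAssumption metadata, means that every component declaring the same assumption can share one routine.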

[35] The Equation Type is divided into three subclasses covering all equation types, Algebraic, Differential, and Integral. The Differential class has subclasses describing the types and orders of differential equations (Ordinary or Partial). The Numerical Simulation class contains (i) the Numerical Technique describing the method used to discretize the differential equation in space (e.g., backward, forward, etc.) and (ii) the Time Difference Scheme defining how the differential numerical equation is discretized with respect to time (e.g., explicit, implicit, semi-implicit).

[36] The symbols used by the equations and components are grouped into the Symbol class, which classifies symbols into Variable, Parameter, or Constant classes. Each symbol must have a unique and unambiguous name and we anticipate that experts of each subdiscipline will use well known sources for standard names to define a symbol. For example, the Consortium of Universities for the Advancement of Hydrologic Science, Inc. (CUAHSI) and its Hydrologic Information System (HIS) have developed an ontology to support the discovery of time series hydrologic observations including physical, chemical, and biological measurements [Beran and Piasecki, 2009] that could be used within the WRC ontology. The standard names in development within the CSDMS community could be used to describe I/O variables and assumptions related to a component model. Finally, the standard names defined by the NetCDF Climate and Forecast (CF) Metadata Convention and the U.S. Geological Survey Glossary of Hydrologic Terms are other sources for developing standard names. However, if standard names across multiple communities are used, then work must be done to remove redundancy by defining equivalent terms.

[37] Because symbols can be shared among many components using different equations, we use the 3-ary relationship to establish a definite relationship between component, symbol, and equation (Figure 7a). The 3-ary relationship links three individuals to describe a specific context where none of the individuals can be considered as the primary subject [Koivunen, 2001]. In the 3-ary relation shown in the figure, both the equation and variable are linked and point to the same component. One of the advantages of this approach is that it minimizes the semantic heterogeneity between components by providing a more complete description of variables (Figure 7b). For example, in the figure the two equations, Darcy's equation and an advective contaminant transport equation, have a common variable that would allow for coupling the two models. However, the variable has a different symbol within the two model components, so it may not be obvious to the framework that the two models can be coupled. If both symbols are related to the same variable concept, the framework can recognize that the two components can be coupled.

Figure 7.

(a) Excerpt from the WRC ontology illustrating the 3-ary relationship between a component, equation, and variable. (b) Description of the pore water velocity variable, which is passed between the two coupled model components, using the WRC ontology.
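The 3-ary relationship can be sketched as a record that ties a component, an equation, and a symbol together, with each symbol also pointing to a shared variable concept. In this Python sketch the component names, the symbols `v`, `u`, and `K`, and the concept identifiers are hypothetical; only the pairing of Darcy's equation with an advective transport equation through pore water velocity follows Figure 7.

```python
from collections import namedtuple

# One record per context: which symbol a component uses in which equation,
# plus the variable concept that the symbol stands for.
Usage = namedtuple("Usage", ["component", "equation", "symbol", "concept"])

usages = [
    Usage("GroundwaterFlow", "Darcy's equation", "v", "pore_water_velocity"),
    Usage("GroundwaterFlow", "Darcy's equation", "K", "hydraulic_conductivity"),
    Usage("ContaminantTransport", "advective transport equation", "u",
          "pore_water_velocity"),
]


def shared_concepts(component_a, component_b):
    """Return the variable concepts used by both components, whatever the symbol."""
    concepts_a = {u.concept for u in usages if u.component == component_a}
    concepts_b = {u.concept for u in usages if u.component == component_b}
    return concepts_a & concepts_b


couplable_on = shared_concepts("GroundwaterFlow", "ContaminantTransport")
```

Matching on the concept rather than on the symbol is what lets the two components be recognized as couplable even though one writes the variable as `v` and the other as `u`.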

[38] The design purpose of the component model is represented by the Water Resources Domain, which contains classes describing the water resources subsystems such as the Hydrology and Ground Water classes (Figure 6). Each subsystem class has subclasses representing the basic processes of the system. For example, the Evapotranspiration class “Is-A” Hydrology class, which in turn “Is-A” Water Resources Domain class. A complete hierarchical structure of the classes representing the water resources subsystems is not defined in this ontology because it would need to be provided through a collaborative effort that included experts across a variety of water resources domains. Finally, the Mathematical Classification class is used to define how the variables are treated in space and time, and if they are deterministic or stochastic as described by Chow et al. [1988] (Figure 6). Therefore, the Mathematical Classification class is divided into two subclasses: Deterministic and Stochastic.
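The “Is-A” chain described above is transitive, which a few lines of Python can make concrete. The dictionary below holds only the classes named in the text and is not the full WRC hierarchy; the function name is hypothetical.

```python
# Direct "Is-A" links named in the text (child class -> parent class).
SUBCLASS_OF = {
    "Evapotranspiration": "Hydrology",
    "Hydrology": "Water Resources Domain",
    "Ground Water": "Water Resources Domain",
}


def ancestors(class_name):
    """Follow the Is-A links upward and return every superclass in order."""
    chain = []
    while class_name in SUBCLASS_OF:
        class_name = SUBCLASS_OF[class_name]
        chain.append(class_name)
    return chain


et_ancestors = ancestors("Evapotranspiration")
```

A search for components in the Hydrology domain can therefore also return components whose purpose is Evapotranspiration.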

3.1.4. Technical Layer

[39] The Technical Layer answers questions about the required computer architecture that enables a user to (i) run a component simulation, (ii) edit or update the component code, (iii) determine the computational resources required by the component, and (iv) optimize the simulation time given available computational resources. This layer is based on the metadata used by the CSDMS to describe the technical requirements of a model and includes four classes (Figure 8). The Operating System class defines the different operating systems that are compatible with the component. The Programming Language class determines the language used in writing the component. This helps the user to know how to edit or update the component code. The Memory Required class is used to describe the required memory capacity for supporting a single component simulation. Finally, the Number Of Processors class includes elements representing the number of processors that the component is able to leverage.

Figure 8.

Technical layer of the Water Resources Component (WRC) ontology, including classes and relationships.

3.2. Example Applications

[40] Three simple example applications are presented to illustrate the capabilities of the WRC ontology for aiding in component-based water resources modeling. The examples assume a hypothetical case where a user wants to estimate stream discharge through the outlet of a given study watershed. We further assume that the user has access to hourly temperature and rainfall observations for the study region, as well as typical watershed geospatial data (e.g., Digital Elevation Model (DEM), soil properties, and land cover). Lastly, we assume that the user wishes to only model evapotranspiration and runoff processes for the purposes of the study. Given these assumptions, the following example applications demonstrate how the ontology can be used to guide the user in (i) identifying appropriate model components, (ii) correctly linking model components into a workflow, and (iii) utilizing model components from a discipline other than the modeler's primary discipline.

3.2.1. Model Component Selection

[41] Searching for an appropriate model can be a difficult task, especially when using a component-based modeling framework that may contain models which are not in the core area of expertise of the model user. The WRC ontology can assist in guiding users to the appropriate model component by relating needs expressed by the user with concepts expressed in the ontology. In the hypothetical scenario explained earlier, the user wishes to identify two model components: one that computes evapotranspiration and a second that computes runoff. In relation to the ontology, we can say that the evapotranspiration component must meet the following criteria: (i) the hasPurpose property points to the Evapotranspiration class, (ii) the hasInputVariable property links the component to the temperature variable only, and (iii) the hasOutputVariable property points to variables that have isUsedAsInput property with runoff components.

[42] Figure 9 shows the coded information and relationships for three components that meet the first criterion (hasPurpose = Evapotranspiration): Penman-Monteith, Priestley-Taylor, and Hargreaves-Samani. While all three of these components can be used to calculate evapotranspiration, they use different inputs as shown in Figure 9. Taking the second criterion into account (hasInputVariable = temperature), the WRC ontology is used to further filter the list of possible components to just one component: Hargreaves-Samani. Finally, the third criterion is used to check against the ontology that the Hargreaves-Samani component can be coupled with a runoff model, based on the compatibility of the data exchanges between the two components (i.e., that the input of each runoff component is the output of the evapotranspiration component) and their coupling attributes (i.e., components use the same standard interface).

Figure 9.

Excerpt from the WRC ontology illustrating the stored information about three interchangeable evapotranspiration components. This information can help users in the process of selecting the appropriate component to represent the evapotranspiration process for a given study.

[43] The second component required by the user, the runoff component, has to satisfy three criteria: (i) the hasInputVariable property points to two variables only, evapotranspiration and precipitation rates, (ii) this evapotranspiration rate is the hasOutputVariable of one of the previously named evapotranspiration components, and (iii) the hasInputData property is linked to the typical watershed data (DEM, soil properties, and land use). In this example, the two components used to calculate runoff are (i) a component that leverages the Green-Ampt Model for infiltration and (ii) a component that implements TOPMODEL [Beven, 2001]. The WRC ontology filters the runoff component list and selects the Green-Ampt component because it meets these criteria and uses the Potential Evapotranspiration (PET) rate as the input variable, which can be supplied by the Hargreaves-Samani component. The TOPMODEL component, on the other hand, only accepts standardized reference Evapotranspiration (ETsz) as input (Figure 9).
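The selection logic of this example can be sketched in Python as a filter over component metadata. The catalog below is a hypothetical stand-in for the information in Figure 9: the inputs listed for Penman-Monteith and Priestley-Taylor and the outputs assigned to them are assumptions made for illustration, while the Hargreaves-Samani, Green-Ampt, and TOPMODEL entries follow the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    purpose: str          # stands in for hasPurpose
    inputs: frozenset     # stands in for hasInputVariable
    outputs: frozenset    # stands in for hasOutputVariable


CATALOG = [
    # Inputs and outputs of the first two entries are assumed for illustration.
    Component("Penman-Monteith", "Evapotranspiration",
              frozenset({"temperature", "net_radiation", "wind_speed", "humidity"}),
              frozenset({"ETsz"})),
    Component("Priestley-Taylor", "Evapotranspiration",
              frozenset({"temperature", "net_radiation"}), frozenset({"PET"})),
    Component("Hargreaves-Samani", "Evapotranspiration",
              frozenset({"temperature"}), frozenset({"PET"})),
    Component("Green-Ampt", "Runoff",
              frozenset({"PET", "precipitation"}), frozenset({"runoff"})),
    Component("TOPMODEL", "Runoff",
              frozenset({"ETsz", "precipitation"}), frozenset({"runoff"})),
]


def select(purpose, available_variables):
    """Return components with the given purpose whose inputs are all available."""
    return [c for c in CATALOG
            if c.purpose == purpose and c.inputs <= available_variables]


# Step 1: the user only has temperature observations for evapotranspiration.
et_choices = select("Evapotranspiration", frozenset({"temperature"}))
# Step 2: the runoff component may use precipitation plus whatever step 1 outputs.
et_outputs = frozenset().union(*(c.outputs for c in et_choices))
runoff_choices = select("Runoff", et_outputs | {"precipitation"})

et_names = [c.name for c in et_choices]
runoff_names = [c.name for c in runoff_choices]
```

In an ontology-backed system the same two steps would be expressed as queries over the hasPurpose, hasInputVariable, and hasOutputVariable properties rather than as list filters.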

[44] Filtering of components based on a set of criteria like this can be easily automated by designing a software application that translates the user requirements into ontology querying statements, such as the questionnaire provided by Chau [2007] for guiding users to a recommended water quality model. Also, this example can be extended to highlight the advantages of the WRC ontology in overcoming search challenges such as inconsistent semantics used by different modeling communities. For example, suppose users from two different disciplines use different terms, “stream outflow” and “stream discharge,” to search for a component that computes the same variable. In the WRC ontology, semantic mediation can occur by having components that calculate either “stream discharge” or “stream outflow” grouped together because they both represent the same concept within the ontology. Therefore, the problem of differences in semantics between disciplines that may limit more basic text matching mechanisms for search can be overcome using the ontology [Grossman and Frieder, 2004]. Ideally, the definition of variables in the WRC ontology will leverage and extend when necessary other initiatives for defining hydrologic variables including work under the Consortium of Universities for the Advancement of Hydrologic Science, Inc. (CUAHSI) Hydrologic Information System (HIS) project [e.g., Piasecki and Bermudez, 2003; Beran and Piasecki, 2009; Horsburgh et al., 2009].
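Semantic mediation between search terms can be sketched as a lookup from community-specific terms to a shared concept. In the Python sketch below the concept identifiers and the component names are hypothetical; only the two terms “stream discharge” and “stream outflow” come from the example above.

```python
# Hypothetical mapping from community-specific terms to one concept identifier.
TERM_TO_CONCEPT = {
    "stream discharge": "streamflow",
    "stream outflow": "streamflow",
    "potential evapotranspiration": "PET",
}

# Hypothetical components, each described with the term its community uses.
COMPONENT_OUTPUT_TERM = {
    "RunoffComponentA": "stream discharge",
    "RunoffComponentB": "stream outflow",
    "ETComponent": "potential evapotranspiration",
}


def search(term):
    """Return components whose output maps to the same concept as the search term."""
    concept = TERM_TO_CONCEPT.get(term.lower())
    if concept is None:
        return []
    return sorted(name for name, output_term in COMPONENT_OUTPUT_TERM.items()
                  if TERM_TO_CONCEPT.get(output_term) == concept)


hits_outflow = search("stream outflow")
hits_discharge = search("Stream Discharge")
```

A plain text match on “stream outflow” would find only one of the two runoff components, whereas the concept lookup returns both regardless of which term the user types.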

3.2.2. Component Coupling Consistency

[45] One of the challenging aspects of component-based modeling is that, while coupling two components may be technically feasible, it may not be conceptually correct. Examples of such situations are presented by Voinov and Shugart [2013] and include the potential for mismatched temporal scales between coupled components. The axioms in the WRC ontology can be used to automate the compatibility check between components proposed by the user to be coupled. This facilitates the coupling process, minimizes conceptual errors, and encourages proper use of both models and data.

[46] To extend our scenario, suppose that the user wishes to couple the Hargreaves-Samani component with the TOPMODEL component due to the topographic features of the watershed. As we discussed in the prior scenario, this coupling would not be correct because of inconsistencies in the data exchanges between the two components, namely that the Hargreaves-Samani component supplies PET and the TOPMODEL component accepts ETsz (Figure 9). This is a subtle difference that may not be obvious to a nonexpert. The WRC ontology can be used to avoid such errors by checking the consistency of technical, coupling, and scientific features of these components, informing the user about inconsistencies, and even suggesting approaches for overcoming them.

[47] To do this in a general sense, the software would perform two checks. The first check would be at a broad level to test the technical and coupling feasibility of the two components based on the information stored in the ontological layers. In the case of incompatibility, the check would raise an error and the ontology would be used to provide the user with the reason for the coupling error. The second check would be at a fine level to test the scientific properties of the linked components in order to verify that both components are compatible in terms of the space and time context, as well as the exchange of variables. In the case of inconsistency, a recommendation could be offered to the user such as changing the time step of one component or providing the required unit conversion factor between the two variables.
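The two checks can be sketched as a single Python function that stops with errors at the broad level and otherwise returns recommendations from the fine level. The metadata keys, the interface and operating system values, and the time steps are hypothetical placeholders for what the ontological layers would supply.

```python
def check_coupling(source, target):
    """Return (errors, recommendations) for linking source output to target input."""
    errors, recommendations = [], []

    # Broad check: technical and coupling metadata must be compatible.
    if source["interface"] != target["interface"]:
        errors.append(
            f"interface mismatch: {source['interface']} vs {target['interface']}")
    if not set(source["operating_systems"]) & set(target["operating_systems"]):
        errors.append("no common operating system")
    if errors:
        return errors, recommendations

    # Fine check: scientific metadata of the exchanged variable and time context.
    if source["output"] != target["input"]:
        recommendations.append(
            f"add a translator from {source['output']} to {target['input']}")
    if source["time_step_s"] != target["time_step_s"]:
        recommendations.append(
            f"align time steps ({source['time_step_s']} s vs {target['time_step_s']} s)")
    return errors, recommendations


# Hypothetical metadata records for the two components in the scenario.
et_component = {"interface": "OpenMI", "operating_systems": ["Windows"],
                "output": "PET", "time_step_s": 86400}
runoff_component = {"interface": "OpenMI", "operating_systems": ["Windows", "Linux"],
                    "input": "ETsz", "time_step_s": 3600}
other_framework_component = dict(runoff_component, interface="CCA")

errors_ok, advice = check_coupling(et_component, runoff_component)
errors_bad, advice_bad = check_coupling(et_component, other_framework_component)
```

Separating hard errors from recommendations follows the distinction drawn above: an incompatibility at the broad level blocks the coupling, while an inconsistency at the fine level can often be resolved by a time step change or a conversion.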

[48] In the example scenario where the Hargreaves-Samani component cannot be coupled with the TOPMODEL component, either the Hargreaves-Samani component must be modified so that it outputs ETsz or the TOPMODEL component must be modified so that it inputs PET. A third option for overcoming the inconsistency would be to create a translator between the two components based on the relationship in which ETsz is multiplied by a crop coefficient (Kc) to estimate the PET, thereby adapting the output of the Hargreaves-Samani component to make it consistent with the required input of the TOPMODEL component. The relationship between ETsz and Kc used to estimate the PET can be expressed in the WRC ontology itself as depicted in Figure 10.

Figure 10.

Ontological representation of the relationship between two variables, the Potential Evapotranspiration (PET) rate and the standardized reference Evapotranspiration (ETsz), in the Water Resources Component (WRC) ontology, and the conversion factor between them.
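A translator built on this relationship could be as small as the following Python sketch. It assumes the relationship stated in the text, PET = Kc × ETsz, and inverts it to obtain ETsz from PET; the function names and the numeric values are hypothetical.

```python
def pet_from_etsz(etsz, kc):
    """Estimate PET from ETsz using the crop coefficient relationship in the text."""
    return kc * etsz


def etsz_from_pet(pet, kc):
    """Invert the same relationship to adapt a PET output to an ETsz input."""
    if kc <= 0:
        raise ValueError("crop coefficient must be positive")
    return pet / kc


# Hypothetical values: a PET rate in mm/day and an assumed crop coefficient.
pet_rate = 6.0
crop_coefficient = 1.2
etsz_rate = etsz_from_pet(pet_rate, crop_coefficient)
round_trip = pet_from_etsz(etsz_rate, crop_coefficient)
```

Storing the relationship in the ontology, rather than hard-coding it in either component, lets the framework insert a translator of this kind wherever a PET output meets an ETsz input.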

3.2.3. Multidisciplinary Modeling

[49] One of the objectives of the WRC ontology is to enhance the component metadata interoperability between modeling frameworks. As an example of this, the WRC ontology could be used to overcome semantic and syntactic heterogeneity between metadata schemas used by different frameworks. Metadata interoperability aids the user in correctly applying a component from a modeling framework used by a different scientific community because the metadata of the component for the unfamiliar framework can be mapped to the ontology used by the familiar modeling framework.

[50] To illustrate this concept, we focused on two modeling frameworks in the following example. The first, HydroModeler, is a plug-in application of the CUAHSI-HIS that extends the core HydroDesktop application to support model integration using the component-based approach [Castronova and Goodall, 2010]. It uses the Open Modeling Interface (OpenMI) standard and OpenMI Association Technical Committee (OATC) Software Development Kit (SDK) to provide a “plug-and-play” modeling framework within HydroDesktop. The second, Community Surface Dynamics Modeling System (CSDMS), uses the Common Component Architecture (CCA) to provide a plug-and-play environment for components [Peckham et al., 2012]. Both frameworks use an XML schema that describes the component's exchange variables metadata [Peckham et al., 2012; Castronova and Goodall, 2010], but with different syntactic structure.

[51] As an extension to our scenario, suppose that the user finds a component model in the CSDMS framework that calculates runoff based on an infiltration model that uses Richards' equation. The user is interested in including this component in a HydroModeler application, but the user is not familiar with the CSDMS and wishes to implement the same component in the HydroModeler framework. Using the metadata available in the CSDMS framework, the user is able to populate the WRC ontology for the component. This allows the user to visualize the model metadata (e.g., equations, assumptions, initial conditions, etc.) in a way that is consistent with how native HydroModeler components are visualized. Eventually an ontology could be used to express complete models as abstract entities, separate from the technical means of implementing the model as a software tool. Scientists would then express models as an ontology that could be implemented across modeling frameworks using software to automate the process of translating models expressed as ontologies into software components for specific modeling frameworks.

4. Summary

[52] A well-designed and broadly accepted ontology is needed to support current trends toward multidisciplinary, community-focused modeling of water resource systems. If such an ontology can be established, it has the potential to elevate component-based water resources modeling to a level of abstraction where modeling becomes a knowledge representation process. It will then be possible to use reasoners to automate classification, comparison, and search of models and their related elements. Such functionality is needed to support proper use of models both within and across disciplinary boundaries.

[53] We have taken steps along this path by creating the WRC ontology presented in this paper. The ontology attempts to serve as a knowledge-level specification for the joint conceptualization used in defining model components across disciplinary boundaries and frameworks. In creating the WRC ontology, we have seen that experts view components from different perspectives. The WRC ontology is an attempt to provide the environment for representing these perspectives as layers defining component resources, coupling, scientific, and technical information using a formal ontology language (OWL). The ontology defines concepts associated with the component in an explicit format that is readable for both the user and the system.

[54] In establishing the WRC ontology, our goal was to satisfy the five primary characteristics of a robust ontology defined by Gruber [1993]: clarity, coherence, extendability, flexibility to merge, and ability to overcome semantic heterogeneity. We described in this paper our approach for achieving these goals including (i) introducing examples to illustrate the class and relationship definitions, (ii) checking the consistency of concepts and relationships using the Pellet reasoner, (iii) increasing the WRC ontology extendability by minimizing the ontology commitment and encoding bias as recommended by Gruber [1993], (iv) merging with existing ontologies (e.g., SWEET Unit ontology), and (v) emphasizing through selected example applications how the WRC ontology can help users in overcoming the semantic heterogeneity between water resources related modeling communities. However, we acknowledge that our work is only a first step in building a robust model ontology to support component-based modeling within the water resources communities. It will not be possible to conclude that the WRC ontology has satisfied all five of these characteristics (e.g., “clear structure”) until the ontology has gone through additional testing from a wider set of users.

[55] While the ontology will no doubt benefit heavily from additional use case testing and from seeking community input from a wide set of potential users, this input will not result in a static and perfect ontology. Ontologies are like models in that, even when an ontology is agreed upon, revisions will be required to compensate for unforeseen conditions and new knowledge. As the underlying conceptual model evolves, new versions of the ontology can be released, and presumably the ontology will grow to become an increasingly robust means for representing component-based water resources models. Therefore, while we are certain that this proposed ontology will be revised in the future, we believe that this revision process must start from an initial ontology, which we have provided here.

[56] Future work should focus on conducting further tests of the ontology with a broad set of users and through incorporation into multiple component-based modeling systems. How the ontology can be incorporated into such software systems is one area of future work, and potential paths forward include: (i) developing software to guide users in the model building process through interactive questionnaires, especially when using components outside of the user's core domain of expertise, (ii) automating the consistency check process of coupling components in different modeling frameworks, and (iii) identifying existing sources of data, understanding their composition, and automating data processing steps. In general, we see great potential in combining data and model ontologies in order to improve the connection between data resources and required model input data sets. Also, using ontologies can provide intelligence to modeling frameworks by providing a description of information in a way that enables the framework to search, discover, and couple components. Work in this area will benefit hydrology by aiding in the tasks required for multidisciplinary, community-based studies.


[57] The authors wish to acknowledge the National Science Foundation (NSF) for supporting this research under the award CBET: 08–46244 “CAREER: Integrated Modeling for Watershed Management.” The authors also wish to acknowledge Michael N. Huhns, Chair of the Computer Science and Engineering Department at the University of South Carolina, for his helpful guidance and comments on the paper.