Abstract

  1. Top of page
  2. Abstract
  3. Introduction
  4. Related Work
  5. Semantic Description Matching
  6. Matching Method
  7. Creation of Atomic Queries
  8. Generalization of Query Classes
  9. Generalization of Query Properties
  10. Addition of Rules
  11. Creation of Instances Implied by Complex Class Definition
  12. Semi-automatic Pruning of Matching Results
  13. Conclusions
  14. Acknowledgements
  15. References

Researchers begin new research by acquiring pre-existing explicit scientific knowledge that is potentially relevant to the research subject. In order to find some potentially relevant explicit scientific knowledge items, such as knowledge whose content is similar to the targeted research, a researcher must examine the semantics of each item. In this paper, after reviewing related work, an automated semantic description matching-based approach is presented for comparing items of explicit scientific knowledge. This approach obtains a matching score between semantic descriptions of two items of explicit scientific knowledge that indicates their similarity. Three dimensions are considered in this approach: matching granularity, similarity scale for instance classes, and logic similarity scale. In order to match two semantic descriptions, a six-step method is presented: creation of atomic queries, generalization of query classes, generalization of query properties, addition of rules, creation of instances implied by complex class definition, and semi-automatic pruning of matching results. Finally, some conclusions regarding the approach are presented together with plans for future work.


Introduction

Explicit scientific knowledge (e.g. research papers, conference proceedings, workshop reports, presentations, and databases) is both the main deliverable and a major resource for scientific researchers. Researchers transform tacit knowledge such as inferences from experimental results into explicit scientific knowledge deliverables that contribute to the human knowledge repository. Other researchers can then draw items of knowledge from this repository and merge them with their own knowledge for furthering their own research work. This leads to the production of new explicit knowledge. The result is a continuous process of enlargement of the human knowledge repository and the development of human society.

More concretely, researchers starting a new research activity need to look for items of explicit scientific knowledge with similar content to help them avoid “reinventing the wheel”. For a small number of items of explicit scientific knowledge, a researcher can examine each one to find similarities. However, this manual approach does not scale well to large repositories of explicit knowledge. Information technologies are expected to play a critical role in supporting this task. A key problem is how to automatically calculate the similarity of pairs of items of explicit scientific knowledge. If it is possible to make accurate similarity assessments between pairs of items automatically, then we can repeat the calculation in order to obtain the similarities within a large knowledge repository. This paper focuses on the problem of automatic similarity assessment and presents an approach based on logic inference.

Our basic approach is as follows. First, we describe each item of explicit scientific knowledge in a format that can be interpreted semantically by a computer. We have developed the Expert Knowledge Ontology-based Semantic Search (EKOSS) system to support the sharing, discovery, and integration of scientific and other types of expert knowledge that are not amenable to simple keyword indexing. Through EKOSS (Kraines et al., 2005; Kraines et al., 2006a; Kraines et al., 2006b), researchers are empowered to create semantic descriptions of their explicit scientific knowledge by themselves in a computer-interpretable format. The semantic description takes the form of a network of labeled instances of semantic classes that are interconnected by semantic properties. The semantic classes and properties are formally defined in a domain ontology based on description logics (DL) that has been built for that particular scientific field. We then calculate a matching score (from 0 to 100) that expresses the degree of similarity between the pair of knowledge items by comparing the two semantic descriptions that represent them. This paper focuses on how to calculate the matching score between two semantic descriptions.

The rest of this paper is organized as follows. In the next section, we discuss some of the work in the literature related to semantic matching. In section 3, we analyze the problem of semantic description matching and propose three dimensions of matching: matching granularity, similarity scale for instance classes, and logic similarity scale. In section 4, we present a six-step method to implement semantic description matching. The first five steps yield a matching score that indicates the similarity of the two semantic descriptions. The sixth step uses an interactive tool with which the researcher can view and manipulate the details of the matching result. Finally, we conclude the paper with a summary and a description of future work.

Related Work

Matching is a critical operation in many application domains, such as the semantic web, data warehouses, and e-commerce. One application is catalog integration (Bouquet et al., 2003; Velegrakis et al., 2005). When a company chooses to participate in some marketplace, it must determine matches between the entries of its own catalogs and the entries of the marketplace catalog. Another application is the exchange of information between peers in P2P networks (Lenzerini, 2002; Giunchiglia & Zaihrayeu, 2002). Each peer potentially has a unique language for describing information, so the peers need to identify matches between the terms in each of their languages. Message exchanges between two autonomous and independently designed agents also require matching (van Eijk et al., 2001). Agents use their own agent communication languages (ACLs) to create messages. Agents that are using different ACLs must match the elements of their languages in order to translate the message. Additionally, matching plays a central role in the discovery and integration of web services (Paolucci et al., 2002). Web services describe their services using their own schemata, so web service discovery and integration needs some way to match the different schemata. All these matching applications have the same basic goal of translating semantic descriptions from one concept model to another. This is different from the goal in the work that we present here, which is to match descriptions of explicit scientific knowledge that are made under the same concept model.

Several kinds of solutions to the semantic matching problem have been proposed in the field of knowledge representation, e.g. (Bouquet et al., 2003; Velegrakis et al., 2005; Lenzerini, 2002; Giunchiglia & Zaihrayeu, 2002; Giunchiglia et al., 2004; van Eijk et al., 2001; Paolucci et al., 2002; Do & Rahm, 2001; Doan et al., 2003; Kang & Naughton, 2003; Shvaiko & Euzenat, 2004). Many of these are methods and tools for matching different schemata or ontologies. We can divide them into two kinds: element matching and structure matching. For element matching, the main methods are string-based methods (Do & Rahm, 2001) and language-based methods (Giunchiglia et al., 2004). For structure matching, the main methods are graph-based methods (Do & Rahm, 2001) and taxonomy-based methods (Shvaiko & Euzenat, 2004). Element matching methods can find instances with similar labels or meaning, but they cannot find similar structural patterns between elements in a semantic description. Structure matching methods can find similar structures in data such as graphs, but they cannot find similar semantics.

Our approach differs from these existing solutions in that it is based on reasoning against semantic descriptions. For example, our approach uses “addition of rules” and “creation of instances implied by complex class definition” methods to get matching results for descriptions of explicit scientific knowledge that are more semantically accurate than can be obtained using either the element matching methods or the structure matching methods (see Section 3).

Semantic Description Matching

A semantic description describes an item of explicit scientific knowledge in a computer-interpretable way using a domain ontology. The EKOSS system provides users with “wizard-like” web interfaces, including an “ontology browser” and a “graph viewer”, to help them to create semantic descriptions (Kraines et al., 2006b). Figure 1 shows an example of a graph view of a semantic description for the following natural text: “In the city of Tokyo, an electric train participated in a land shipping activity that transported some vehicles with big wheels and engines that use high-octane gasoline as fuel. The train collided with something on the track”. In the figure, instances are indicated with boxes where the first line of text gives the instance name and the second line of text gives the instance class. The different shapes and line types of the boxes indicate different upper-level classes: boxes with rounded corners are activities, boxes with hexagons are events, boxes with dashed lines are substances, boxes with double lines are spatial locations, and boxes with solid lines are physical objects. Properties are shown by directed arrows labeled with the property name.

Figure 1. Graph view of first semantic description.

Figure 2. Graph view of second semantic description.

Figure 2 shows another example for the following natural text: “A freight lorry was transporting light-weight vehicles with engines in the city of Columbus. The freight lorry collided with a low-rise apartment building under an old road bridge”. The symbols in figure 2 are the same as in figure 1.

We use the two examples shown in Figures 1 and 2 to illustrate our method for semantic description matching in this paper. These examples are simpler than most semantic descriptions of explicit scientific knowledge, such as the findings described in a research paper. However, even these simple examples illustrate how each of the steps in our method takes advantage of the rich semantics provided by the use of the ontology, including multiple inheritance class hierarchies, multiple properties for relating different kinds of classes, different logical characteristics of properties, use of DL to specify class characteristics, and use of rules to augment DL reasoning. The reader is encouraged to visit the EKOSS website to view examples of semantic descriptions created for actual items of explicit scientific knowledge such as research papers and simulation models (www.ekoss.org).

As shown in figures 1 and 2, a semantic description can be represented as a network graph whose nodes are instances of classes in the ontology, and whose arcs are properties that indicate specific semantic relations between the instances. The ontologies in EKOSS, which are specified using the OWL-DL web ontology language (www.w3.org/TR/owl-guide), provide the language for expressing concepts in the particular domain. In DL, the ontology is referred to as the TBox or terminological component. A semantic description in EKOSS is then constructed as an ABox or assertional component, also using OWL-DL. In particular, both ontologies and semantic descriptions in EKOSS can support object and data type properties, property restrictions (including cardinality restriction, universal value restriction and existential value restriction), and property characteristics (including transitive, symmetric, inverse of, functional, and inverse functional). The SCINTENG ontology, which was used to create the examples shown in figures 1 and 2, makes considerable use of all of these types of logical characteristics in class and property definition. Details of the SCINTENG ontology are given on the EKOSS website (www.ekoss.org).

We define semantic description matching as the process of comparing two semantic descriptions in order to find the overlapping subsets of both descriptions. In the matching process, one semantic description is the target description and the other is the search description. The overlapping subsets include the instances of the target description that have matching instances in the search description according to the conditions of the semantic description matching process. The matching process is directed in the sense that two semantic descriptions will typically have different matching scores depending on which is selected as the search description.

We propose that the following three main aspects are important for semantic description matching.

Matching granularity: The problem of determining whether or not two semantic descriptions match can be evaluated using reasoning software such as RacerPro (www.racer-systems.com) by adding one of the semantic descriptions, the target description, into the ABox in the reasoner's knowledge base and evaluating the other semantic description, the search description, as a query to that knowledge base. However, reasoning software that matches a semantic query with a semantic description can only tell if the semantic query is completely satisfied by the semantic description and cannot tell if a part of the semantic query is satisfied by the semantic description. In order to evaluate partial matches from overlapping subsets of the semantic descriptions, we can split the search description into atomic semantic queries that represent the minimum semantic network scale that must match in order for there to be considered at least a partial match between the semantic descriptions. We call the size of this minimum chain of instances and properties comprising an atomic semantic query the “matching granularity”.

By dividing the search description into atomic semantic queries with chain lengths equal to the matching granularity and then matching each atomic semantic query with the target semantic description, we can evaluate partial matches as the fraction of atomic queries that were found to match. If we require that the matching granularity be the size of the entire search description, a match will only occur when all of the instances and properties in the search description match against instances and properties in the target description. At the other extreme, if we let the matching granularity be a single instance, then the matching process is equivalent to a keyword or thesaurus-type matching that ignores the properties given in each description. Different situations may have different optimal matching granularities.

Similarity scale for instance classes: Each instance in a semantic description belongs to a single class in the ontology. An instance from the search description that is represented by one of the classes in an atomic semantic query will match with any instance in the target description whose class is the same as the query class or is a subclass of it, and is therefore subsumed by it. Consequently, if the class in the query is a highly specific class that is low in the ontology subsumption taxonomy, relatively few matching instances will be found. Often we will want to discover a match between two instances that are of similar classes even if there is no subsumption relationship. For example, the SCINTENG ontology contains the classes “truck”, “passenger car” and “building”. The classes “truck” and “passenger car” are both direct subclasses of the class “wheeled vehicle with engine”. Although “truck” and “passenger car” are certainly more similar than “truck” and “building”, the class “truck” in a semantic query will not match with an instance of “passenger car” in a knowledge base ABox, nor will the reverse match occur, because the two classes do not have a subsumption relationship. However, if we change the class in the search query to the subsuming super class, i.e. change “truck” to “wheeled vehicle with engine”, then a match will be inferred from the subsumption relationship of “wheeled vehicle with engine” and “passenger car”.

We can determine which instance classes to change based on the requirements of the semantic description matching application. The most precise matching results will be given when we keep the original classes. However, if we want to increase the recall of the semantic description matching in order to get more matching results, we can choose to change all query classes to some super class (with some logic to decide what to do in the case of multiple inheritance) or use a class-specific approach where the particular query class is changed based on the properties of that class (for example, changing any subclass of “vehicle” to “vehicle”). Different approaches for changing classes can be combined by using multiple versions of the atomic queries with each type of class modification including the case where the original classes are used, but the costs of computing the matches must be weighed against the benefits of increasing the number of queries.
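The subsumption behaviour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the EKOSS implementation: the toy hierarchy `PARENTS`, the helper names `ancestors`, `matches_class` and `generalize`, and the hypothetical root class “physical object” are all assumptions. Only the class names “truck”, “passenger car”, “building”, “wheeled vehicle with engine” and “vehicle” come from the text.

```python
# Toy single-inheritance fragment of a class hierarchy (child -> parent).
# The class names follow the text; the exact structure is an assumption.
PARENTS = {
    "truck": "wheeled vehicle with engine",
    "passenger car": "wheeled vehicle with engine",
    "wheeled vehicle with engine": "vehicle",
    "vehicle": "physical object",
    "building": "physical object",
}


def ancestors(cls):
    """Return cls followed by all of its super classes, nearest first."""
    chain = [cls]
    while chain[-1] in PARENTS:
        chain.append(PARENTS[chain[-1]])
    return chain


def matches_class(query_cls, instance_cls):
    """A query class matches an instance whose class it subsumes."""
    return query_cls in ancestors(instance_cls)


def generalize(cls, levels=1):
    """Replace cls by the super class `levels` steps up (stops at the root)."""
    chain = ancestors(cls)
    return chain[min(levels, len(chain) - 1)]


print(matches_class("truck", "passenger car"))              # False
print(matches_class(generalize("truck"), "passenger car"))  # True
print(matches_class(generalize("truck"), "building"))       # False
```

Raising `levels` increases recall, but at some point every query class collapses into the root class and the matches stop being informative, which is the trade-off discussed above.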

Logic similarity scale: The semantic descriptions are created based on DL, so semantic description matching can make use of DL reasoning in order to increase recall. However, additional forms of reasoning, such as rule-based reasoning or reasoning about equivalent classes, can also be included. Through rule-based reasoning, we can increase the number of matching results based on special relationships between the properties of the ontologies. For example, consider the description “vehicle1 has part engine1, vehicle1 has location city1” and the query “engine2 has location city2”. From inspection, it is clear that both descriptions include engines that are located in cities. Simply using the transitivity of has part and has location, we cannot infer a match. However, if we use the rule “if A has part B and A has location C, then B has location C”, then the query will match the description because the rule will cause the reasoner to infer the relationship “engine1 has location city1”.
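The effect of this rule can be illustrated with a small forward-chaining sketch. It is a simplified stand-in for a rule-enabled reasoner: triples are plain Python tuples, class matching is by exact equality, and the names `apply_part_location_rule`, `query_matches` and `CLASS_OF` are assumptions introduced for the example.

```python
def apply_part_location_rule(triples):
    """Forward-chain the rule from the text until no new facts appear:
    (A has part B) and (A has location C)  =>  (B has location C)."""
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        for a, p, b in list(facts):
            if p != "has part":
                continue
            for a2, q, c in list(facts):
                if q == "has location" and a2 == a:
                    inferred = (b, "has location", c)
                    if inferred not in facts:
                        facts.add(inferred)
                        changed = True
    return facts


# The description from the text, with an assumed instance -> class table.
description = {
    ("vehicle1", "has part", "engine1"),
    ("vehicle1", "has location", "city1"),
}
CLASS_OF = {"vehicle1": "vehicle", "engine1": "engine", "city1": "city"}


def query_matches(facts, subj_cls, prop, obj_cls):
    """True if some triple has the queried property and instance classes."""
    return any(
        p == prop and CLASS_OF[s] == subj_cls and CLASS_OF[o] == obj_cls
        for s, p, o in facts
    )


# The query "engine2 has location city2" reduces to (engine, has location, city).
print(query_matches(description, "engine", "has location", "city"))  # False
print(query_matches(apply_part_location_rule(description),
                    "engine", "has location", "city"))               # True
```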

We can also find more matching results by temporarily adding instances to the reasoner's knowledge base that are implied to exist through the complex class definitions given by DL. For example, consider a query for a “vehicle” that has part “engine” being matched against a description containing an instance of the class “vehicle with engine”. Because “vehicle with engine” is defined in the SCINTENG ontology as a subclass of “vehicle” that has part some values from “engine” (there is an existential restriction on the definition of the class “vehicle with engine” requiring that some values of “has part” point to instances of the class “engine”), an instance of “engine” is implied to exist that is part of the instance of “vehicle with engine”. By adding the implied instance of “engine” to the description with an “is part of” relationship with the “vehicle with engine”, the query will match.
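A minimal sketch of this idea follows, assuming that existential restrictions are available as a simple lookup table. The tables `PARENTS` and `EXISTENTIAL`, the function names, and the instance name `car1` are hypothetical; a real system would read the restrictions from the OWL-DL ontology. The sketch handles only restrictions declared directly on an instance's class and does not expand restrictions on the newly created instances.

```python
# Assumed encoding of the ontology fragment used in the text.
PARENTS = {"vehicle with engine": "vehicle"}                      # subclass -> super class
EXISTENTIAL = {"vehicle with engine": [("has part", "engine")]}  # "some values from"


def subsumes(query_cls, cls):
    """True if cls equals query_cls or is one of its (transitive) subclasses."""
    while cls is not None:
        if cls == query_cls:
            return True
        cls = PARENTS.get(cls)
    return False


def add_implied_instances(class_of, triples):
    """Return copies of the description extended with implied filler instances."""
    class_of, triples = dict(class_of), set(triples)
    for inst, cls in list(class_of.items()):
        for prop, filler in EXISTENTIAL.get(cls, []):
            already = any(
                s == inst and p == prop and subsumes(filler, class_of.get(o))
                for s, p, o in triples
            )
            if not already:
                temp = f"?{filler} of {inst}"  # placeholder name, hypothetical
                class_of[temp] = filler
                triples.add((inst, prop, temp))
    return class_of, triples


def query_matches(class_of, triples, subj_cls, prop, obj_cls):
    """Match one atomic query against the description using subsumption."""
    return any(
        p == prop and subsumes(subj_cls, class_of[s]) and subsumes(obj_cls, class_of[o])
        for s, p, o in triples
    )


class_of = {"car1": "vehicle with engine"}
triples = set()
query = ("vehicle", "has part", "engine")

print(query_matches(class_of, triples, *query))     # False
class_of2, triples2 = add_implied_instances(class_of, triples)
print(query_matches(class_of2, triples2, *query))   # True
```

Because the function returns copies, the original description is left unchanged, which mirrors the temporary nature of the added instances.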

Matching Method

In the research presented here, we want to obtain as many interesting semantic description matching results as possible. Therefore, we set the “matching granularity”, the “similarity scale for instance classes”, and the “logic similarity scale” as follows. We use a “matching granularity” of two instances with one linking property in order to find the most partial matches possible while preserving the relationship information between the instances. We use a “similarity scale for instance classes” large enough such that the semantic coverage of our matching process is increased but not so large that we end up with meaningless matching results between top level classes in the ontology. We use a large “logic similarity scale” to find as many interesting semantic matching results as possible.

The six steps of the matching method that we have proposed are as follows. In the first step, “creation of atomic queries”, we decompose the search semantic description into all of its constituent triples. Each triple forms one atomic semantic query that is matched against the target semantic description by an inference engine using DL reasoning. The percentage of atomic queries that match indicates the similarity of the two semantic descriptions. The second step, “generalization of query classes”, replaces the classes of the two instances in each atomic semantic query with upper level classes that are a pre-determined distance up the subsumption (is a) hierarchy given by the ontology. This step expands the semantic coverage of the atomic semantic queries derived from the search description. Similarly, the third step, “generalization of query properties”, replaces the property of each atomic semantic query with a super property that is a pre-determined distance up the property subsumption hierarchy in order to further expand the semantic coverage of the atomic semantic queries. In the fourth step, “addition of rules”, we augment the DL reasoning with a set of prescribed rules in order to infer additional matching results that cannot be determined by DL inference alone. In the fifth step, “creation of instances implied by complex class definition”, instances implied by complex class definitions in the target description are temporarily added to the ABox. For the sixth step, “semi-automatic pruning of matching results”, we have developed a “matching chooser” tool that helps users to choose from multiple possible matches of a particular instance in the target semantic description with each of the classes in the atomic semantic queries in order to establish a completely resolved overlapping semantic network between the two semantic descriptions.

Details for each of the six steps are described next. We use the two examples in Section 3 to illustrate. In all inference evaluations, RacerPro (www.racer-systems.com) is used as the basic reasoner.

Creation of Atomic Queries

First, we decompose the search description into atomic semantic sentences. As a result of the matching granularity that we have selected, each property with its domain instance and range instance forms one atomic semantic sentence. Second, we convert each atomic semantic sentence to an atomic semantic query. Third, we match each atomic semantic query with the target description using the reasoner. Fourth, we calculate the total score for the match as the percentage of the atomic semantic queries created from the search description that match with some set of instances in the target description. If there are a total of x atomic semantic queries, and y atomic semantic queries match with the target description, then we can compute the matching score as (100*y/x). We use the semantic description in figure 1 as the search description and the semantic description in figure 2 as the target description. Figure 3 shows all of the atomic semantic queries generated from the search description, where the symbols are the same as those in figure 1.

Figure 3. All of the atomic semantic queries generated from the search description.
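The decomposition and scoring procedure described in this section can be sketched as follows. This is an illustration only: the reasoner is replaced by an exact comparison of classes and properties, and the instance names, the `AtomicQuery` type and the function names are assumptions rather than part of EKOSS or RacerPro.

```python
from typing import NamedTuple


class AtomicQuery(NamedTuple):
    subject_class: str
    prop: str
    object_class: str


def atomic_queries(class_of, triples):
    """One atomic query per property, with its domain and range instance classes."""
    return [AtomicQuery(class_of[s], p, class_of[o]) for s, p, o in triples]


def matching_score(search, target):
    """Percentage of atomic queries from `search` that match `target`.
    Stand-in matcher: exact equality instead of DL reasoning."""
    queries = atomic_queries(*search)
    if not queries:
        return 0.0
    target_patterns = set(atomic_queries(*target))
    y = sum(1 for q in queries if q in target_patterns)
    return 100.0 * y / len(queries)


# Hypothetical fragments of a search and a target description.
search = (
    {"big wheel vehicle": "vehicle", "engine1": "engine",
     "electric train": "train", "Tokyo": "city"},
    [("big wheel vehicle", "has part", "engine1"),
     ("electric train", "has location", "Tokyo")],
)
target = (
    {"light vehicle": "vehicle", "engine2": "engine",
     "freight lorry": "truck", "Columbus": "city"},
    [("light vehicle", "has part", "engine2"),
     ("freight lorry", "has location", "Columbus")],
)

print(matching_score(search, target))  # 50.0
```

In this toy case x = 2 and y = 1: the “has part” query matches, while the “has location” query fails because “train” and “truck” are different classes. The later steps of the method exist to recover matches of that kind.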

Generalization of Query Classes

In order to increase the number of potentially interesting matching results, we change some of the classes in the atomic semantic queries to their super classes. We use pre-established mappings for generalizing from lower level classes to higher level classes. Figure 4 shows one set of generalization mappings for classes in the “activity” sub-tree of the SCINTENG ontology.

Figure 4. Some generalization mappings for query classes.

Figure 5 shows an example of a match that was made possible by generalization of query classes. The boxes in the figure show sets of matching instances, where the upper instance in black type is from the search query and the lower instance in grey type is from the target description. The search description has the atomic semantic sentence “electric train is activity participant of shipping big wheel vehicles”, where “electric train” is an instance of “train” and “shipping big wheel vehicles” is an instance of “land shipping activity”. The target description includes the predicate “freight lorry is actor of transporting vehicles”, where “freight lorry” is an instance of “truck” and “transporting vehicles” is an instance of “land transportation”. By replacing the class of the “electric train” instance with “vehicle”, a super class of both “train” and “truck”, the “electric train” instance matches with the “freight lorry” instance. By replacing the class of the “shipping big wheel vehicles” instance with “transportation activity”, a super class of both “land shipping activity” and “land transportation”, the “shipping big wheel vehicles” instance matches with the “transporting vehicles” instance. The property “actor of” is subsumed by “activity participant of”.

Figure 5. An example of generalization of query classes.

Generalization of Query Properties

We can also replace properties in the atomic semantic queries with their super properties in order to expand the scope of matching and find more potentially interesting matching results. As with the generalization of the query classes, we change properties according to pre-established mappings of lower level properties to upper level properties.

Figure 6 shows an example of a match that was made possible by generalization of query properties, where the symbols are as described in figure 5. The search description has the atomic semantic sentence “Collision event on the track has affected event participant electric train”. The target description has the predicate “Collision with building has event participant freight lorry”. If we change the property “has affected event participant” to its super property “has event participant”, then we obtain a match.

Figure 6. An example of generalization of query properties.
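Property generalization can be sketched in the same way as class generalization. The table `SUPER_PROPERTY` below encodes only the two sub-property relations mentioned in the text; the table itself and the function names are assumptions made for illustration.

```python
# Sub-property -> super property, taken from the examples in the text.
SUPER_PROPERTY = {
    "has affected event participant": "has event participant",
    "actor of": "activity participant of",
}


def generalize_property(prop, levels=1):
    """Move `levels` steps up the property hierarchy (stops at the top)."""
    for _ in range(levels):
        if prop not in SUPER_PROPERTY:
            break
        prop = SUPER_PROPERTY[prop]
    return prop


def property_matches(query_prop, asserted_prop):
    """A query property matches an asserted property that it subsumes."""
    p = asserted_prop
    while p is not None:
        if p == query_prop:
            return True
        p = SUPER_PROPERTY.get(p)
    return False


query_prop = "has affected event participant"
asserted = "has event participant"

print(property_matches(query_prop, asserted))                       # False
print(property_matches(generalize_property(query_prop), asserted))  # True
print(property_matches("activity participant of", "actor of"))      # True
```

The last line shows the case where no generalization is needed because the asserted property is already subsumed by the query property.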

Addition of Rules

In order to get more interesting matching results, we use rules to entail additional predicates in the target description. Different ontologies can be given different sets of rules. The SCINTENG ontology contains rules such as those shown in Table 1.

Table 1. Some rules of the SCINTENG ontology.
Rules
IF A has location B and A in physical contact with C, THEN C has location B
IF A has location B and A activity participant of C, THEN C has location B
IF A has location B and A is part of C, THEN C has location B
IF A has location B and A event participant of C, THEN C has location B
IF A has location B and A is sub-activity of C, THEN C has location B
IF A has location B and A is end event of C, THEN C has location B
IF A has location B and A is start event of C, THEN C has location B
IF A has location B and A physically contained by C, THEN C has location B

The rules are applied to entail additional predicates in the target description before the DL reasoner executes the actual matching step. For example, the target description has the predicates: “Freight lorry is actor of transporting vehicles” and “transporting vehicles has location Columbus”. The rule “if A has location B and A is activity participant of C, then C has location B” will cause the new predicate “Freight lorry has location Columbus” to be temporarily added to the ABox because “is actor of” is subsumed by “is activity participant of”. The atomic semantic query “Tokyo is location of electric train” will then match with the target description, as shown in figure 7. The symbols in figure 7 are as described in figure 5.

Creation of Instances Implied by Complex Class Definition

In DL ontologies, complex classes can be described as being subclasses of some subsuming class together with some logical restriction conditions. For example, in the SCINTENG ontology, the class “vehicle with engine” is defined as “a vehicle that has part at least one instance of the class engine”. In other words, an instance of the class “vehicle with engine” is an instance of the class “vehicle” that has the further condition of a “has part” property connecting it to some instance of “engine” (although this may not explicitly be stated in the semantic description). The reverse is not necessarily correct; that is, an instance of “vehicle” that “has part” an instance of “engine” is not necessarily an instance of “vehicle with engine”, because we are using subsumption rather than class equivalency to define complex classes.

For example, in the target description shown in figure 2, there is an instance “light-weight vehicle with engine” of the class “vehicle with engine”. One of the atomic semantic queries created from the search description contains an instance “big wheel vehicle” of the class “vehicle” that “has part” an instance of the class “engine” (figure 3). Unfortunately, the basic implementation of the semantic description matching that we have described requires a binding of each class in the atomic semantic query with a class instance in the target description. Because there is not actually an instance of “engine” in the target description, the query for a “vehicle” that “has part” an instance of “engine” will not be found to match with a target description containing only an instance of “vehicle with engine”.

In order to generate these matches, the “creation of instances implied by complex class definition” step temporarily adds an instance for each of the existential property restrictions (some values from) on a complex class. An instance of the class “vehicle with engine” is defined to have some values from the class “engine” on the property “has part”. Therefore, a temporary instance of “engine”, labeled “??”, is created in the target description, and “light-weight vehicle with engine” is connected to it by a “has part” relationship. The atomic semantic query from the search description “big wheel vehicle has part engine” will then match with the instance “light-weight vehicle with engine” in the target description. Figure 8 shows the matching result that is made possible by this method, where the symbols are as described in figure 5.
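
A minimal Python sketch of this step is given below. It assumes a simplified representation that is not part of the original system: the `EXISTENTIAL_RESTRICTIONS` table, the dictionary of instances, and the numbered labels “??1”, “??2”, and so on are all hypothetical. Restrictions inherited from superclasses are ignored for brevity.

```python
from itertools import count

# Assumed ontology fragment: class -> list of (property, filler class)
# existential ("some values from") restrictions.
EXISTENTIAL_RESTRICTIONS = {
    "vehicle with engine": [("has part", "engine")],
}


def add_implied_instances(instances, triples):
    """Return copies of instances and triples extended with temporary '??' instances.

    instances maps instance name -> class name; triples is a set of
    (subject, property, object) tuples.
    """
    new_instances = dict(instances)
    new_triples = set(triples)
    counter = count(1)
    for name, cls in instances.items():
        for prop, filler in EXISTENTIAL_RESTRICTIONS.get(cls, []):
            # Skip the restriction if a suitable filler is already asserted.
            has_filler = any(
                s == name and p == prop and instances.get(o) == filler
                for s, p, o in triples
            )
            if not has_filler:
                temp = f"??{next(counter)}"  # numbered only to keep labels unique
                new_instances[temp] = filler
                new_triples.add((name, prop, temp))
    return new_instances, new_triples


instances = {"light-weight vehicle with engine": "vehicle with engine"}
extended_instances, extended_triples = add_implied_instances(instances, set())
print(extended_instances)
print(extended_triples)
```

With the temporary instance in place, a query for a “vehicle” that “has part” an “engine” has something to bind to in the target description.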

Figure 7. An example of rule-based matching.

Figure 8. An example of a matching result with complex classes.

After the above five steps, the atomic semantic queries from the search description are matched against the target description using the reasoner, and the matching score is calculated. For this example, the matching score of the search description shown in figure 1 against the target description shown in figure 2 is 85. When the descriptions are switched, that is, when the semantic description shown in figure 2 is used as the search description, the matching score is 71. We can calculate the bidirectional matching score of the two semantic descriptions as the average of the two directed scores: (85 + 71)/2 = 78. This high matching score indicates that the two items of explicit scientific knowledge are similar.
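
The averaging of the two directed scores can be written as a short Python function. The function name is hypothetical, and the two input values are simply the scores reported for this example.

```python
def bidirectional_score(forward, backward):
    """Average of the two directed matching scores."""
    return (forward + backward) / 2


# Figure 1 as search description (85) and figure 2 as search description (71).
score = bidirectional_score(85, 71)
print(score)  # 78.0
```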

Semi-automatic Pruning of Matching Results

To help researchers study the results of the semantic matching in detail, we have created a user interface tool for displaying and manipulating the matching result. Following the step “creation of atomic queries”, we find matches for the atomic semantic queries of the search description as determined by the matching granularity. To establish the overall subset of instances in the search description that best match instances in the target description, we need to assemble the matching atomic semantic queries. However, two successfully matching atomic semantic queries may share a particular instance in the target description in such a way that both queries cannot be simultaneously evaluated as true. Alternatively, a single successfully matching atomic query may match more than one set of instances and properties in the target description.

In order to establish a meaningful overlap between the two semantic descriptions, each instance in the search description that is found to match should match with one and only one instance in the target description. Although we could create an algorithm that performs this “one to one” matching automatically, we believe that this stage of the matching process benefits from interaction with the human user. We therefore adopt a semi-automatic method: we have developed an interactive “matching chooser” tool that helps users decide which pairs of instances should be matched. Table 2 shows the matching candidates for each atomic semantic query. In the table, each atomic semantic query from the search description is listed first, followed by its matching candidate instances and properties from the target description. Temporary instances for the target description created according to the step “creation of instances implied by complex class definition” are indicated by “??”. Matches made using logic inference are indicated with the symbol “*” in the Property column, matches made using rule-based inference are indicated with “**”, and matches made according to complex class definitions are indicated with “***”.

Table 2. The matching candidates for each atomic semantic query.

Row        Domain instance                   Property                        Range instance
query      Tokyo                             is location of                  electric train
candidate  Columbus                          **                              Freight Lorry
candidate  Columbus                          **                              light-weight vehicle with engine
query      Electric train                    activity participant of         shipping big wheel vehicles
candidate  Freight Lorry                     *                               transporting vehicles
candidate  Light-weight vehicle with engine  **                              transporting vehicles
query      Collision Event On the track      has affected event participant  electric train
candidate  Collision with Building           *                               Freight Lorry
candidate  Collision with Building           *                               low-rise apartment
query      Electric train                    physically contains             big wheel vehicles
candidate  Freight Lorry                     *                               light-weight vehicle with engine
query      Big wheel vehicle                 has part                        Engine
candidate  Freight Lorry                     ***                             ??
candidate  Light-weight vehicle with engine  ***                             ??
query      Big wheel vehicle                 has part                        Big wheel
candidate  Freight Lorry                     ***                             ??
candidate  Light-weight vehicle with engine  ***                             ??

Figure 9. The one-to-one matching result.

After selecting the most suitable matching instances, the remaining matches are removed, and the user can obtain a one-to-one match between the search and target descriptions, as shown in figure 9. The symbols in figure 9 are as described in figure 5.
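
The “one to one” constraint can be made concrete with a small Python sketch that lists the consistent options among which a user could choose. This is not the “matching chooser” tool itself: the function name is hypothetical, the candidate lists are a simplified subset of the candidates discussed above, and the sketch requires every listed search instance to be matched, whereas the actual process leaves the final choice to the user.

```python
def one_to_one_assignments(candidates):
    """Yield every mapping that gives each search instance a distinct target instance."""
    names = list(candidates)

    def extend(index, mapping, used):
        if index == len(names):
            yield dict(mapping)
            return
        for target in candidates[names[index]]:
            if target not in used:  # a target instance may be used only once
                mapping[names[index]] = target
                yield from extend(index + 1, mapping, used | {target})
                del mapping[names[index]]

    yield from extend(0, {}, frozenset())


# Search instance -> candidate target instances (simplified, for illustration).
candidates = {
    "electric train": ["Freight Lorry", "light-weight vehicle with engine"],
    "big wheel vehicle": ["Freight Lorry", "light-weight vehicle with engine"],
    "Tokyo": ["Columbus"],
}

options = list(one_to_one_assignments(candidates))
for option in options:
    print(option)
```

Each printed mapping is one consistent selection. Choosing between them is the decision that the semi-automatic method leaves to the user.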

Conclusions

Examining repositories of explicit scientific knowledge to find knowledge with similar content is an important part of the scientific research process. A key problem is how to assess the similarity between two items of explicit scientific knowledge. After discussing some of the work in the literature that is related to this problem, we presented an approach for calculating a matching score that gives the degree of similarity between two items of explicit scientific knowledge. The approach applies logical inference and other information technologies to semantic descriptions that describe items of explicit scientific knowledge based on a domain ontology grounded in description logics. The semantic descriptions are created using the semantic description authoring tools provided by the EKOSS system that we have developed in previous work. We presented three dimensions for fine-tuning the semantic description matching process: matching granularity, similarity scale for instance classes, and logic similarity scale. In order to compute the partial match between two semantic descriptions, we adopted a six-step method: creation of atomic queries, generalization of query classes, generalization of query properties, addition of rules, creation of instances implied by complex class definition, and semi-automatic pruning of matching results.

In future work, we will pursue two main directions of research. The first is the development of additional methods to obtain more interesting matching results between individual semantic descriptions. The second is to develop methods and tools for analyzing the entire matching results among large sets of knowledge resources in order to identify interesting relationships between different knowledge resources that, for example, could point to important new avenues of integrative research.

The EKOSS system is being applied to several different science domains, including life sciences, technologies for sustainable development, and knowledge of failure mechanisms in engineering disasters. Domain ontologies have been created, and repositories of semantic descriptions are being constructed in each of these domains (see the EKOSS website www.ekoss.org for details). We are working to apply the method for semantic matching described in this paper to these repositories, and we have achieved promising initial results that we are in the process of publishing.

One important limitation of our approach is that the manual creation of semantic descriptions is time-consuming and error-prone. In other work, we are developing a semi-automatic method to assist researchers in creating accurate semantic descriptions of their explicit scientific knowledge.

The EKOSS system and the semantic matching approach described in this paper can be applied in many knowledge intensive areas outside of the sciences, such as publishing services and job matching services. For example, if the publishers of academic books and journals were to adopt this method in their online services, they could provide more intelligent knowledge searching and matching services to their users. A job agency could also apply the semantic description matching to match resumes with job descriptions based not only on the particular keywords, but also on the semantic relationships between those keywords. We welcome inquiries from people interested in using our approach in these and other applications.

Acknowledgements

We gratefully acknowledge support from the Japan Science and Technology Agency (JST) through the Shippai Chishiki Database Project. We also acknowledge partial support from the National Natural Science Foundation of China (NSFC) under grant numbers 70431001 and 70771019, and from the Liaoning Province Science and Technology Agency under grant number 20061063.

References

  • Bouquet, P., Serafini, L., & Zanobini, S. (2003). Semantic coordination: A new approach and an application. Lecture Notes in Computer Science (LNCS) 2870, 130-145.
  • Do, H.H., & Rahm, E. (2001). COMA – a system for flexible combination of schema matching approaches. In Proceedings of the Very Large Data Bases Conference (VLDB), 610-621.
  • Doan, A., Madhavan, J., Domingos, P., & Halevy, A. (2003). Learning to map ontologies on the semantic web. In Proceedings of the International World Wide Web Conference (WWW), 662-673.
  • Giunchiglia, F., Shvaiko, P., & Yatskevich, M. (2004). S-Match: an algorithm and an implementation of semantic matching. In Proceedings of the European Semantic Web Symposium (ESWS), 61-75.
  • Giunchiglia, F., & Zaihrayeu, I. (2002). Making peer databases interact – a vision for an architecture supporting data coordination. In Proceedings of the International Workshop on Cooperative Information Agents (CIA), 18-35.
  • Kang, J., & Naughton, J.F. (2003). On schema matching with opaque column names and data values. In Proceedings of the International Conference on Management of Data (SIGMOD), 205-216.
  • Kraines, S. B., Batres, R., Kemper, B., Koyama, M., & Wolowski, V. (2006a). Internet-based integrated environmental assessment, Part II: Semantic searching based on ontologies and agent systems for knowledge discovery. Journal of Industrial Ecology, 10 (4), 1-24.
  • Kraines, S. B., Batres, R., Koyama, M., Wallace, D. R., & Komiyama, H. (2005). Internet based integrated environmental assessment: Using ontologies to share computational models. Journal of Industrial Ecology, 9 (3), 31-50.
  • Kraines, S. B., Guo, W., Kemper, B. E., & Nakamura, Y. (2006b). EKOSS: A knowledge-user centered approach to knowledge sharing, discovery, and integration on the Semantic Web. Lecture Notes in Computer Science (LNCS) 4273, 833-846.
  • Lenzerini, M. (2002). Data integration: A theoretical perspective. In Proceedings of the Symposium on Principles of Database Systems (PODS), 233-246.
  • Paolucci, M., Kawamura, T., Payne, T., & Sycara, K. (2002). Semantic matching of web services capabilities. Lecture Notes in Computer Science (LNCS) 2342, 333-347.
  • Rahm, E., & Bernstein, P. (2001). A survey of approaches to automatic schema matching. The International Journal on Very Large Data Bases (VLDB), 10 (4), 334-350.
  • Shvaiko, P., & Euzenat, J. (2004). A Survey of Schema-based Matching Approaches. Technical Report, DIT-04-087, University of Trento.
  • van Eijk, R., de Boer, F., van de Hoek, W., & Meyer, J. J. (2001). On dynamically generated ontology translators in agent communication. International Journal of Intelligent Systems, 16 (5), 587-607.
  • Velegrakis, Y., Miller, R.J., & Mylopoulos, J. (2005). Representing and querying data transformations. In Proceedings of the International Conference on Data Engineering (ICDE), 81-92.