Research Article
Mining Taverna's semantic web of provenance
Article first published online: 22 AUG 2007
DOI: 10.1002/cpe.1231
Copyright © 2007 John Wiley & Sons, Ltd.
Issue
Concurrency and Computation: Practice and Experience
Special Issue: The First Provenance Challenge
Volume 20, Issue 5, pages 463–472, 10 April 2008
Additional Information
How to Cite
Zhao, J., Goble, C., Stevens, R. and Turi, D. (2008), Mining Taverna's semantic web of provenance. Concurrency Computat.: Pract. Exper., 20: 463–472. doi: 10.1002/cpe.1231
Publication History
- Issue published online: 1 MAR 2008
- Article first published online: 22 AUG 2007
- Manuscript Accepted: 1 MAY 2007
- Manuscript Revised: 22 APR 2007
- Manuscript Received: 28 NOV 2006
Funded by
- EPSRC. Grant Numbers: GR/R67743, EP/D044324/1, EP/C536444/1
Keywords:
- workflow;
- provenance;
- semantic annotation
Abstract
- Abstract
- 1. INTRODUCTION
- 2. GENERATING PROVENANCE INFORMATION FOR THE CHALLENGE
- 3. ANSWERING THE CHALLENGE QUERIES
- 4. OUZO FOR TAVERNA'S BIOINFORMATICS LANDSCAPE
- 5. MINING OUZO WITH PROQA
- 6. CONCLUSION
- Acknowledgements
- REFERENCES
Taverna is a workflow workbench developed as part of the UK's myGrid project. Taverna's provenance model captures both internal provenance locally generated in Taverna and external provenance gathered from third-party data providers. This model also supports overlaying secondary provenance over the primary logs and lineage. This design is motivated by the particular properties of bioinformatics data and services used in Taverna. A Semantic Web of provenance, Ouzo, is built to combine these different kinds of provenance by means of semantic annotations. This paper shows how Ouzo can be mined by a provenance usage component, Provenance Query and Answer (ProQA). ProQA supports provenance retrievals as well as provenance abstraction, aggregation, and semantic reasoning. ProQA is implemented as a suite of APIs which can be deployed as provenance services to compose system provenance workflows that analyse experiment results using the provenance records. We show how these features of Taverna's provenance support us in answering the questions from the provenance challenge workshop and a set of additional provenance queries.
1. INTRODUCTION
There is consensus about the importance of supporting provenance in various systems 1. In order to understand the capabilities of different provenance systems and the rationale behind the designs, the first provenance challenge workshop was held. In this workshop, nine provenance-related queries (see the Editorial 1) were used to examine 18 participating provenance systems, their provenance representations, query capabilities, and the scope of provenance.
This work has successfully answered all the challenge questions, and proposed a set of additional provenance queries. This query capability is underpinned by providing a Semantic Web of Provenance, Ouzo, and a provenance query component, Provenance Query and Answer (ProQA). Ouzo includes a broader scope of provenance information than most existing provenance systems. It combines the internal provenance data generated locally in a workflow system with the external provenance provided by the third-party data publishers. It also overlays secondary provenance over the primary logs and data lineage to express scientists' or third-parties' abstraction and interpretation about the primaries. The choices of Ouzo's representation (Resource Description Framework (RDF)/RDF Schema (RDFS)) and scope of information are motivated by the particular properties of bioinformatics data and services used in Taverna.
In the following, we describe how Ouzo and ProQA answer the provenance challenge questions (see Sections 2 and 3). Section 4 presents the particular challenges that motivate Ouzo and Section 5 shows the additional queries provided by ProQA for mining Ouzo. We conclude in Section 6 by comparing our research emphasis with others along the Editorial matrix 1.
2. GENERATING PROVENANCE INFORMATION FOR THE CHALLENGE
Our prototype is implemented in the context of Taverna 2, a workflow workbench that targets bioinformaticians and is provided by the U.K. e-Science pilot project myGrid (http://www.mygrid.org.uk). To address the queries of the provenance challenge workshop, we implemented the challenge workflow (see Figure 1 in the Editorial 1) using Taverna's Simplified conceptual workflow language (Scufl) 2 and annotated it with semantic metadata (introduced later in Section 2.2). A Scufl workflow defines the order in which the services (Processors) are performed, the location of these services, the data (Ports) passed between the services, and the datalinks between the Ports. A datalink links an output Port of one Processor with an input Port of another Processor.
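The structure of a Scufl workflow described above can be pictured with a minimal Python sketch. This is not Scufl itself, which is an XML-based language; the class names mirror the terms in the text, while the port names and the upstream helper are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Port:
    processor: str  # name of the Processor that owns this Port
    name: str       # Port name (hypothetical names are used below)
    direction: str  # "in" or "out"


@dataclass
class Workflow:
    datalinks: list = field(default_factory=list)  # (output Port, input Port) pairs

    def link(self, source: Port, sink: Port) -> None:
        # A datalink connects an output Port of one Processor
        # to an input Port of another Processor.
        if source.direction != "out" or sink.direction != "in":
            raise ValueError("datalink must run from an output Port to an input Port")
        if source.processor == sink.processor:
            raise ValueError("datalink must connect two different Processors")
        self.datalinks.append((source, sink))

    def upstream(self, processor: str) -> list:
        # Processors whose outputs feed the given Processor directly.
        return sorted({src.processor for src, dst in self.datalinks
                       if dst.processor == processor})


wf = Workflow()
wf.link(Port("align_warp", "warp_params", "out"), Port("reslice", "warp_params", "in"))
wf.link(Port("reslice", "resliced_image", "out"), Port("softmean", "images", "in"))
print(wf.upstream("reslice"))   # ['align_warp']
```

The order of the services is implied by the datalinks, so no separate ordering structure is kept in this sketch.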
The challenge workflow is simulated in Taverna using pseudo-data in order to avoid processing data of large volume. Although Taverna has been successfully applied to applications processing data of large volume, the reliability of the provenance collection plug-in was unproven at the time of the workshop.
The following describes the properties of our provenance information and how it is collected during the run of the challenge workflow.
2.1. Properties of provenance representation
Taverna's provenance information is represented using RDF in order to reflect its graph data structure. The provenance information generated in each run is a single provenance graph; and the information generated in different runs of the same workflow or different workflows form multi-graphs and mega-graphs, respectively.
RDF's flexible graph model and open identification system (i.e. Uniform Resource Identifiers (URIs)) allow provenance metadata to be merged from different runs and sources. These merged provenance graphs form a Provenance Web. Everything in our Provenance Web is named by a Life Science Identifier (LSID) 3, i.e. a URI. LSIDs are recognized as a community standard and have been adopted by the major life science databases to publish their provenance information 4. RDFS is used to define a provenance ontology. These three Semantic Web technologies enable us to build a Semantic Web of Provenance information, the Ouzo Web, which can be viewed from four views:
The process view: is similar to traditional event logs and tracks the ‘how, when and what’. For example, what inputs (DataObject) were used and what outputs (DataObject) were produced during the ProcessRun of the Process ‘reslice’.
The data view: concerns the origin and data lineage of a data product, and describes the ‘what and which’, i.e. the order of the ProcessRuns and DataObjects from which a DataObject was derived. For example, the outputs of the ProcessRun of ‘reslice’ are derivedFrom the outputs of the ProcessRun of ‘align_warp’.
The organizational view: includes the ‘who’, i.e. the Experimenter who created a Workflow (createdBy), launched a Run (launchedBy), or owns a DataObject or a Process, etc. The process, data, and organizational views capture the primary provenance.
The knowledge view: keeps the ‘what and why’ to provide an abstraction or interpretation about the primaries. It provides the upper property userPredicate, which can be extended to denote user-specified views over primary provenance, and the property instanceOf to associate an interpretation with a DataProduct. For example, to answer Questions 8 and 9 from the challenge workshop, we extend userPredicate with the property center, to denote the centre where the image was obtained, such as UChicago; and the property studyModality, to denote the mode of the image data, such as speech, video or image. If fully implemented, the ranges of these properties can be controlled terms from a domain ontology about image centres or modes.
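As an illustration of the four views, the sketch below stores provenance statements as plain (subject, property, value) triples and selects the statements belonging to one view. The properties derivedFrom, createdBy, launchedBy, center and studyModality are taken from the text; usedInput and producedOutput, and all identifiers, are hypothetical stand-ins rather than terms of the actual ontology.

```python
# Which view each property belongs to. usedInput and producedOutput are
# hypothetical names for the process-view properties.
VIEW_OF = {
    "usedInput": "process",
    "producedOutput": "process",
    "derivedFrom": "data",
    "createdBy": "organizational",
    "launchedBy": "organizational",
    "center": "knowledge",
    "studyModality": "knowledge",
}

# Placeholder identifiers; real Ouzo resources are named by LSIDs.
graph = [
    ("run:reslice_1", "usedInput", "data:warp_1"),
    ("run:reslice_1", "producedOutput", "data:resliced_1"),
    ("data:resliced_1", "derivedFrom", "data:warp_1"),
    ("run:workflow_1", "launchedBy", "person:experimenter_1"),
    ("data:anatomy_1", "center", "UChicago"),
]


def view(triples, name):
    # Keep only the statements whose property belongs to the requested view.
    return [t for t in triples if VIEW_OF.get(t[1]) == name]


print(view(graph, "data"))
print(view(graph, "knowledge"))
```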
2.2. Collecting the challenge provenance
The organizational and knowledge provenance can be obtained from three different sources: users' annotations of the Scufl workflows through a knowledge template plug-in; service descriptions from the myGrid semantic service discovery component Feta 5; and provenance published by the third-party data providers 6. To address the challenge queries, we used the knowledge template approach that is introduced below.
The knowledge template is a plug-in to the Taverna workbench that allows the scientists to enrich their workflows by annotating the relationship between the Ports of a Processor or annotating the semantics of the Ports using ontological concepts. This enables the scientists to build a domain-oriented view over the actual data derivation path. For example, to answer Question 8, the input Port of the Processor ‘align_warp’ is associated with the concept UChicago by the property center. These annotations are kept as metadata about these Ports within the Scufl workflow. During the workflow runs, the annotations about the Ports are passed to their corresponding actual data products and kept as their knowledge provenance.
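The passing of Port annotations to data products can be sketched as follows. This is only an illustration of the idea, not the plug-in's implementation; the port names, the data identifiers and the propagate function are hypothetical.

```python
# Annotations attached to Ports at design time: (processor, port) -> [(property, value)].
# The port name "anatomy_image" is hypothetical.
port_annotations = {
    ("align_warp", "anatomy_image"): [("center", "UChicago")],
}

# Recorded during a run: which data product was bound to which Port.
# The identifiers are placeholders, not real Taverna LSIDs.
bindings = {
    "data:anatomy_1": ("align_warp", "anatomy_image"),
    "data:reference_1": ("align_warp", "reference_image"),
}


def propagate(port_annotations, bindings):
    # Copy each Port's annotations onto the data product bound to it.
    triples = []
    for data_id, port in bindings.items():
        for prop, value in port_annotations.get(port, []):
            triples.append((data_id, prop, value))
    return triples


knowledge_provenance = propagate(port_annotations, bindings)
print(knowledge_provenance)   # [('data:anatomy_1', 'center', 'UChicago')]
```

Because the annotation lives on the Port, it is written once in the workflow and reappears on the data products of every run.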
As shown in Figure 1, once the workflow is annotated with users' semantic interpretations and provided with actual inputs, it can be invoked in Taverna. As the workflow runs, when a data product is passed to the enactor, it is allocated a Taverna LSID by the Taverna LSID authority. This identity is associated with the data product when it is stored or passed to invoke other services. The data products acquired by a run are stored in a customized database or a local ‘catch all’ store, the relational data store Baclava. This identity is also used to store the RDF provenance metadata of this data product in the metadata store, KAVE 6. If the data product is gathered from external sources, its external identity is kept along with its Taverna LSID in KAVE 4.
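The identity handling described above can be sketched as below. The general LSID layout is urn:lsid:authority:namespace:object with an optional revision; everything else here is an assumption made for illustration: the LsidAuthority class, the sequential object identifiers, the namespace "datathing" and the external identifier are hypothetical and do not describe the real Taverna LSID authority.

```python
import itertools


def parse_lsid(lsid):
    # LSID layout: urn:lsid:<authority>:<namespace>:<object>[:<revision>]
    parts = lsid.split(":")
    if len(parts) not in (5, 6) or [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {lsid!r}")
    return {
        "authority": parts[2],
        "namespace": parts[3],
        "object": parts[4],
        "revision": parts[5] if len(parts) == 6 else None,
    }


class LsidAuthority:
    # Toy stand-in for an LSID authority; real object identifiers are not sequential.
    def __init__(self, authority):
        self.authority = authority
        self._counter = itertools.count(1)
        self.external_ids = {}  # internal LSID -> external identity, if any

    def assign(self, namespace, external_id=None):
        lsid = f"urn:lsid:{self.authority}:{namespace}:{next(self._counter):06d}"
        if external_id is not None:
            # Keep the external identity alongside the internal one.
            self.external_ids[lsid] = external_id
        return lsid


authority = LsidAuthority("www.mygrid.org.uk")
internal = authority.assign("datathing", external_id="urn:lsid:example.org:sequence:ABC123")
print(internal, "->", authority.external_ids[internal])
print(parse_lsid("urn:lsid:www.mygrid.org.uk:experimentinstance:HXQOVQA2ZI0")["namespace"])
```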
3. ANSWERING THE CHALLENGE QUERIES
The questions from the challenge workshop are answered by the RDF queries implemented in ProQA. ProQA is implemented as a suite of APIs that support provenance usage of different complexities:
The RDF access API, which supports the Core API in accessing the underlying RDF provenance repository, KAVE. For the challenge queries, the RDF query language TriQL 7 is used. TriQL is part of the Named Graphs for Jena (NG4J) 7 framework and supports retrieving provenance information within a particular run.
The Core API, which provides a collection of generic queries and pre-canned queries for retrieving an Ouzo resource by its provenance metadata, or retrieving the provenance metadata or graph of an Ouzo resource. An Ouzo resource can be a workflow, a run, a service or its invocation, a data product, or an experimenter. The retrieval can be either semantic free or semantic rich, depending on whether semantic reasoning is used in the retrieval. The Core API also provides three provenance graph manipulation operations for aggregating, integrating and comparing multi- and mega-graphs.
The Analysis API, which makes use of the Core API to realize provenance-based tasks. As we understand it, Question 7 is an impact analysis task that uses mega-graphs (see Section 2.1) from two different runs of two different workflows. This task is supported by the graph comparison operation from the Core API.
Question 8 can be expressed as a TriQL query that uses the data and knowledge provenance information. The select clause defines what is to be returned, i.e. a data product. The where clause restricts the query to the provenance information generated during the run of urn:lsid:www.mygrid.org.uk:experimentinstance:HXQOVQA2ZI0. It also specifies that the query searches for a data product derived from an anatomy image that is annotated with the key-value pair center=UChicago. This example is implemented as a pre-canned query in the Core API.
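The logic of this query can be approximated in plain Python over quads, where the first element names the graph (here, the run) that a statement belongs to. This is a sketch of the query's intent and not TriQL syntax; the data identifiers, the second run and the function name are hypothetical, and only direct derivedFrom links are followed.

```python
RUN = "urn:lsid:www.mygrid.org.uk:experimentinstance:HXQOVQA2ZI0"

# Quads: (graph, subject, property, value). All data identifiers are placeholders.
quads = [
    (RUN, "data:anatomy_1", "instanceOf", "AnatomyImage"),
    (RUN, "data:anatomy_1", "center", "UChicago"),
    (RUN, "data:anatomy_2", "instanceOf", "AnatomyImage"),
    (RUN, "data:anatomy_2", "center", "Elsewhere"),
    (RUN, "data:warp_1", "derivedFrom", "data:anatomy_1"),
    (RUN, "data:warp_2", "derivedFrom", "data:anatomy_2"),
    ("urn:other-run", "data:warp_9", "derivedFrom", "data:anatomy_1"),
]


def derived_from_center(quads, run, center):
    # Restrict the search to statements recorded in the graph of one run.
    in_run = {(s, p, o) for g, s, p, o in quads if g == run}
    # Anatomy images annotated with the requested centre.
    images = {s for s, p, o in in_run if (p, o) == ("instanceOf", "AnatomyImage")}
    matching = {s for s in images if (s, "center", center) in in_run}
    # Data products directly derived from one of those images.
    return sorted(s for s, p, o in in_run if p == "derivedFrom" and o in matching)


print(derived_from_center(quads, RUN, "UChicago"))   # ['data:warp_1']
```

The statement recorded under the other run is ignored, which is the role the named graph plays in scoping the query.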
If the key-value pair is expressed using a vocabulary from a domain ontology, such as an ontology describing image centres, then we can search for anatomy images that are from centres within a distance of 100 km from Chicago, or within the state of Illinois, etc. This is the semantic reasoning about provenance supported by the Core API.
Question 7 is implemented by the impact analysis task in the Analysis API. Alternatively, this task can be implemented as a provenance workflow. The Core API can be deployed as services and used by the scientists to build provenance workflows. This provenance workflow approach helps the scientists use provenance to interpret their experiment results without composing queries themselves. This approach is consistent with the experiment practice of Taverna's users, and enables the provenance of the data analysis and interpretation process to be automatically collected during the runs of these workflows.
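The graph comparison that underlies the impact analysis task for Question 7 can be sketched as a set difference over triples. The sketch assumes that the two graphs have already been expressed over comparable names (here, service names rather than per-run identifiers); the property follows and the step names in the second run are illustrative only.

```python
def compare(graph_a, graph_b):
    # Split the statements into those unique to each graph and those shared.
    a, b = set(graph_a), set(graph_b)
    return {"only_in_a": a - b, "only_in_b": b - a, "shared": a & b}


# Hypothetical service-level descriptions of two runs.
run_a = [
    ("slice", "follows", "softmean"),
    ("convert", "follows", "slice"),
]
run_b = [
    ("slice", "follows", "softmean"),
    ("pgmtoppm", "follows", "slice"),
    ("pnmtojpeg", "follows", "pgmtoppm"),
]

diff = compare(run_a, run_b)
print(sorted(diff["only_in_a"]))
print(sorted(diff["only_in_b"]))
```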
4. OUZO FOR TAVERNA'S BIOINFORMATICS LANDSCAPE
Particular provenance challenges are presented by the properties of the bioinformatics data and services used in the Taverna system, and by the design and technologies adopted in Taverna. The following discusses these challenges that impact on the content and scope of Ouzo.
Describing and abstracting complex data derivations: Bioinformatics workflows are often data pipelines that frequently generate data collections and then iterate over each item in the collection. Iterations are provided in Taverna to support workflows processing collection data products 2. Nested workflows are created for building modularized workflows and for more efficient workflow reuse. The challenge workflow can be revised using iterations and nested workflows, which leads to provenance information containing complex data derivation paths. This means that the provenance model should not only accurately describe these derivation paths, but also provide a more abstract view over the primary logs and lineage in order to save the scientists from being overwhelmed.
Building a user-specified abstract view over the logs: Bioinformatics data resources and services are published in a highly autonomous manner. This autonomy leads to massive heterogeneity within those resources. There is no trans-domain type system used by bioinformatics services. Each tool provider exposes its sequence records in a different representation format. Shim services 8 are provided in Taverna to manage the mismatches between the heterogeneous data, such as re-formatting the data for further processing, mapping data identities from one source to another, etc. The logs of these shims are faithfully captured in Ouzo, even though they may not always be interesting to the scientists. Scientists need a mechanism to specify a more abstract view over the logs to hide the shim steps.
Connecting internal provenance with external provenance: The user-specified and primary provenance together constitute the internal provenance as they are generated within the confines of Taverna. Provenance may also be gathered from external domain services or third-party data publishers. This is the external provenance, which contains external logs that track how the state of an external domain service changed during its invocation (e.g. 0 min 3.17 s CPU time was used in the computation); and the external knowledge about the data resources. For example, the AnatomyImages published by Center X have a reputation for their high quality. For the scientists, this external provenance information is important for verifying the data quality and ownership. This motivates us to integrate this external provenance with the internal Ouzo Web.
Linking together the multi- and mega-graphs collected in different runs: Bioinformatics data sets and services are frequently updated. This volatility causes repetitive experiment effort. Scientists need to repeat their workflows in case an updated data product could be gathered or computed 4. Varying the settings of repeated runs might lead to completely different or completely identical outcomes. Provenance information generated in repeated runs needs to be linked together and compared in order to interpret and explain the results.
Our provenance model supports describing the complex data derivation paths of iterations and nested runs by defining different types of invocations, and collection versus atomic data products. The process ontology refines a ProcessRun as a ProcessRunWithIteration that iterates over more than one ProcessIteration. It also refines a WorkflowRun as a NestedWorkflowRun. The data ontology refines a DataObject as a DataCollection and an AtomicData, and describes their data derivation. An abstraction over the primary logs and lineage is obtained by either pre-defined typed views or user-specified abstraction.
This work focuses on integrating the external knowledge provenance. Many life science database providers, such as UniProt, GenBank and Affymetrix, are starting to publish their data by the LSID protocol and annotate them with semantic annotations 4. Some of these annotations are expressed as RDF statements or can be translated into RDF using technologies such as XSLT and XPath. The graph-based RDF model enables the external RDF graphs to be merged with the internal Ouzo RDF graphs.
The aggregation of the multi- and mega-graphs is supported by the graph-based RDF representation of Ouzo and the unique identities given to the same data appearing in the multi- and mega-graphs. Previous work 4 analysed and showed how we achieve the graph aggregation by reconciling the polyonymous identities given to the same data product that appears in the multi-graphs.
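A minimal sketch of this kind of aggregation is given below: graphs from different runs are merged by set union after every identifier has been mapped to one canonical name. The same_as mapping, the identifiers and the function names are hypothetical; the reconciliation procedure described in the previous work 4 is not reproduced here.

```python
def canonical(node, same_as):
    # Follow identity links until a node with no further alias is reached.
    seen = set()
    while node in same_as and node not in seen:
        seen.add(node)
        node = same_as[node]
    return node


def aggregate(graphs, same_as):
    # Merge graphs by set union after rewriting every node to its canonical name.
    merged = set()
    for graph in graphs:
        for s, p, o in graph:
            merged.add((canonical(s, same_as), p, canonical(o, same_as)))
    return merged


# Two runs gave the same external record two different internal identities.
run_1 = [("taverna:result_1", "derivedFrom", "taverna:seq_A")]
run_2 = [("taverna:result_2", "derivedFrom", "taverna:seq_B")]
same_as = {"taverna:seq_A": "ext:record_1", "taverna:seq_B": "ext:record_1"}

merged = aggregate([run_1, run_2], same_as)
print(sorted(merged))
```

After the merge, both results point at the same node, so a query over the aggregated graph can relate the two runs.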
5. MINING OUZO WITH PROQA
In order to mine this rich Ouzo Web, ProQA supports provenance usage beyond retrieval. Previous work has shown that ProQA supports aggregating the multi- and mega-graphs 4. The following shows that ProQA also supports presenting an abstraction over Ouzo, and reasoning about the knowledge provenance (either internally specified or gathered from external sources).
5.1. Abstraction over Ouzo
ProQA supports abstracting over the primary provenance either by a set of typed views or by users' specifications. These are introduced below.
Typed views: In order to provide a more abstract view over the complex logs and lineage caused by iterations and nested workflows, this work defines typed views. A typed view contains a subset of the concepts and properties of the provenance ontology in order to present a subset of the provenance graphs. Four typed views are defined to present the Ouzo Web according to the four views given in the provenance ontology. For example, the process view presents only the logs and the data view presents only the data lineage. In addition, two other typed views are defined in order to provide a more abstract process or data view for the scientists:
An abstract process view: which represents the logs of a workflow run, but hides the details of a nested run or an iteration;
An abstract data view: which represents the data lineage of a data product, but hides the elements of a collection data product and their data provenance information.
Conceptually, these views are similar to the views from relational databases. They are defined at the schema level using the concepts and properties of the provenance ontology, and the provenance information satisfying a typed view is obtained by a pre-canned query from the Core API.
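The abstract data view can be sketched as a filter over lineage triples that drops every statement about the elements of a collection. The property memberOf and all identifiers are hypothetical; the actual ontology distinguishes DataCollection from AtomicData, and the real view is obtained by a pre-canned query rather than by this function.

```python
# Lineage of a collection and of its elements. memberOf is a hypothetical
# property linking an element to its collection.
lineage = [
    ("data:atlas_collection", "derivedFrom", "data:input_collection"),
    ("data:atlas_item_1", "derivedFrom", "data:input_item_1"),
    ("data:atlas_item_2", "derivedFrom", "data:input_item_2"),
    ("data:atlas_item_1", "memberOf", "data:atlas_collection"),
    ("data:atlas_item_2", "memberOf", "data:atlas_collection"),
    ("data:input_item_1", "memberOf", "data:input_collection"),
    ("data:input_item_2", "memberOf", "data:input_collection"),
]


def abstract_data_view(triples):
    # Elements are the nodes that are members of some collection.
    elements = {s for s, p, o in triples if p == "memberOf"}
    # Keep only lineage statements that do not mention an element.
    return [(s, p, o) for s, p, o in triples
            if p == "derivedFrom" and s not in elements and o not in elements]


print(abstract_data_view(lineage))
```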
User-specified abstraction: In addition to these pre-defined typed views, this work allows the users to obtain an abstract view that hides the shim steps by specifying a virtual link between two Ports. This virtual link is specified at the workflow level, and stored in the Scufl workflows, in the same way as the user-specified provenance described earlier (Section 2.2). During the workflow runs, this virtual link between the Ports is passed to the actual data products that correspond to these Ports. A query over the user-specified view of provenance will present this user-specified abstract view to hide the shim steps.
For example, users can use the knowledge template plug-in to annotate the challenge workflow with a relationship between the output port (X_image) of the ‘convert’ service and the output port (Average_image) of the ‘softmean’ service as slicedAtXAxis. This helps the scientists who are solely interested in how an X_image is derived from an Average_image by hiding how the Average_image is sliced and converted. A query from the Core API can retrieve the user-specified provenance to show only this user-specified relationship between an X_image and an Average_image and hide the logs of the ‘slice’ and ‘convert’ services.
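The effect of such a virtual link can be sketched by contrasting the primary lineage with the user-specified view. The property slicedAtXAxis comes from the example above; the identifiers and both functions are hypothetical, and the lineage walk assumes an acyclic chain in which each data product has a single parent.

```python
# Primary lineage recorded by the system (placeholder identifiers).
primary = [
    ("data:slice_x", "derivedFrom", "data:average_image"),  # output of 'slice'
    ("data:x_image", "derivedFrom", "data:slice_x"),        # output of 'convert'
]
# Virtual link passed from the annotated Ports to the data products.
user_specified = [
    ("data:x_image", "slicedAtXAxis", "data:average_image"),
]
USER_PREDICATES = {"slicedAtXAxis"}  # properties that extend userPredicate


def user_view(triples):
    # Show only the user-specified relationships, hiding intermediate steps.
    return [t for t in triples if t[1] in USER_PREDICATES]


def lineage_path(triples, start):
    # Walk the primary derivedFrom chain from a data product to its origin.
    parents = {s: o for s, p, o in triples if p == "derivedFrom"}
    path = [start]
    while path[-1] in parents:
        path.append(parents[path[-1]])
    return path


graph = primary + user_specified
print(lineage_path(graph, "data:x_image"))  # full chain, including the slice output
print(user_view(graph))                     # single user-specified link
```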
The user-specified interpretative annotations are mostly expressed by users' keyword tags (such as slicedAtXAxis, etc.), which are created on the fly, in the context of a particular workflow. The users' tags are not necessarily from an ontology. We envisage that as all such tags contributed by a community are gathered together, they will form a folksonomy 9 that contains a classification of these users' tags. This folksonomy will enable a smarter search over the Ouzo Web using a shared vocabulary, and provides a more lightweight and flexible approach than the ontology-based reasoning of provenance 10.
This provenance abstraction (by the typed views or users' specification) is similar to Zoom's user view 11. It differs from Zoom in two ways: (1) it provides not only an abstraction over the logs and data lineage but also an interpretation of the abstraction by the users' tags; and (2) the user-specified abstraction is more flexible than the pre-defined user views in Zoom by allowing different interpretations to be expressed by different users, over the same pair of data products.
5.2. Interpretation of the Web
The interpretation about the data products on the Ouzo Web can be internally specified by the scientists or gathered from third parties. Previous work 4 has shown how we preserve the external LSIDs of a data product along with its internal Taverna LSIDs in order to integrate its external knowledge provenance. This internal and external knowledge provenance of a data product can be retrieved by a query over the knowledge view of provenance. If the knowledge is expressed by ontological concepts, a semantic reasoning about the integrated knowledge can be performed. The following example retrieves the internal data provenance and user-specified provenance of an AnatomyImage about HumanBody from Ouzo.
This query shows how the integrated internal and external provenance can be retrieved by ProQA. This query returns provenance of an image of the human body, including chest, legs, arms, etc. This semantic reasoning exposes the implicit links between provenance, i.e. workflows studying any part of the human body. Similar semantic reasoning about provenance has also been achieved in related work 10. However, our approach aims to integrate the external knowledge rather than creating it ourselves. Most of the external knowledge published by the life science data providers is expressed in a domain-specific taxonomy, such as a taxonomy about organisms, which simply contains a collection of structured and classified keywords instead of providing a complex property-based description. This limits the extent of the semantic reasoning that can be performed.
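Reasoning over a simple taxonomy of this kind can be sketched as a walk up a narrower-to-broader hierarchy. The taxonomy, the property about and the identifiers are hypothetical; they illustrate why a query for HumanBody can also return images annotated with Chest or Leg, and do not reflect the reasoning engine used by ProQA.

```python
# Hypothetical taxonomy: narrower term -> broader term.
BROADER = {
    "Chest": "HumanBody",
    "Leg": "HumanBody",
    "Arm": "HumanBody",
    "HumanBody": "Organism",
}


def is_about(term, target):
    # Walk up the taxonomy until the target is found or the root is passed.
    while term is not None:
        if term == target:
            return True
        term = BROADER.get(term)
    return False


# Knowledge provenance of three images; "about" is a hypothetical property.
annotations = [
    ("data:image_1", "about", "Chest"),
    ("data:image_2", "about", "Leg"),
    ("data:image_3", "about", "MouseBrain"),
]


def images_about(annotations, target):
    return sorted(s for s, p, o in annotations if p == "about" and is_about(o, target))


print(images_about(annotations, "HumanBody"))   # ['data:image_1', 'data:image_2']
```

A taxonomy supports only this subsumption-style inference, which is the limitation noted above.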
To summarize, ProQA can provide the following extra provenance queries, in addition to answering the challenge queries:
ProQA supports querying a revised challenge workflow containing iterations and nested workflows, as well as presenting an abstraction over the primary logs and data lineage.
The challenge queries require only the internal provenance, while ProQA can support querying both internal and external provenance.
ProQA can perform ontological reasoning and folksonomy-based reasoning for provenance queries.
The scope of the target information is not specified in the challenge queries, while our queries can return provenance information generated within one run or from multiple runs, demonstrating our capability of controlling the scope of queries.
ProQA provides provenance workflows to analyse provenance for the scientists and save them from composing provenance queries.
6. CONCLUSION
This paper shows how myGrid addresses the queries from the provenance challenge workshop, and introduces how Taverna supports possible variations of the challenge workflow, how our provenance information contains broader information than the event logs and data lineage, and how ProQA can answer extra provenance questions. We summarize our system using the matrix defined in the editorial.
Characteristics of myGrid provenance system: myGrid provenance information is generated in the Taverna workflow system and therefore is dependent on the execution environment. This provenance information is represented using RDF and queried using RDF query languages. The RDF query language TriQL is used to implement the challenge queries. This information is collected during simulated workflow runs using pseudo-data to avoid processing data of large volume. Our research focuses on executing (E) workflows, and recording (R) and querying (Q) the provenance information. Two issues remain to be solved: (1) increasing the reliability of provenance collection for workflows processing large volumes of data or containing complex nested workflows 11; and (2) optimizing the provenance storage, e.g. avoiding the preservation of repetitive provenance information about the same data generated during repeated runs 12.
Properties of myGrid provenance representation: A layered model presents the content of myGrid provenance information in four views. The logs include the workflow and services used for a workflow run, the invocation events that occurred during a run, and their start and end times. The logs do not include the causal flow of the events, unlike work such as PASOA 13. The data lineage describes the data derivation paths of data products, either atomic or a collection. Users' annotations reflect their understanding of the workflow, its data products and their relationships. They are included as part of the knowledge provenance in our model. Everything in myGrid provenance is named by a URI. This is also supported by other projects, such as 14–16.
Additionally, myGrid supports presenting an abstraction over the provenance information by two means: one is the user-specified annotations that draw an interpretative link between the ports of the Scufl processors, and the other is the typed views that hide or expose the execution details of an iteration or a nested run, or the data lineage of a collection and its elements. A similar approach is also provided in Zoom 11.
Finally, our provenance model describes not only the primary logs and data lineage but also the secondary abstraction and interpretation over the primaries. The secondary provenance can be internally contributed by Taverna users or integrated from third parties that are external to Taverna. The latter is the external knowledge provenance. This integration is supported by our open RDF-based provenance representation. Alternatively, this external provenance can be collected by an open architecture, such as that provided by EU PASOA 13.
Acknowledgements
The myGrid project, grant numbers GR/R67743, EP/D044324/1 and EP/C536444/1, is funded under the UK e-Science programme by the EPSRC. The authors would like to acknowledge the other members of the myGrid team for their contributions. We thank David De Roure and Antoon Goderis for their useful comments.
REFERENCES
- 1. The first provenance challenge. Concurrency and Computation: Practice and Experience 2007; DOI: 10.1002/cpe.1233.
- 2. Taverna: Lessons in creating a workflow environment for the life sciences. Concurrency and Computation: Practice and Experience 2006; 18(10):1067–1100.
- 3. The impact of life science identifier on informatics data. Drug Discovery Today 2005; 10(22):1566–1572.
- 4. An identity crisis in the life sciences. Proceedings of the 3rd International Provenance and Annotation Workshop, Chicago, U.S.A., May 2006; 254–269, extended paper.
- 5. Feta: A light-weight architecture for user oriented semantic service discovery. Proceedings of European Semantic Web Conference, Heraklion, Greece, 2005; 17–31.
- 6. Using semantic web technologies for representing e-science provenance. Proceedings of the 3rd International Semantic Web Conference, Hiroshima, Japan, 2004; 92–106.
- 7. Named graphs. Journal of Web Semantics 2005; 3(4):247–267.
- 8. Deciding semantic matching of stateless services. Proceedings of the 21st National Conference on Artificial Intelligence and the Eighteenth Innovative Applications of Artificial Intelligence Conference, Boston, U.S.A., 2006; 1319–1324.
- 9. Ontology of folksonomy: A mash-up of apples and oranges, 2005 [cited January 2007]. Available at: http://tomgruber.org/writing/ontology-of-folksonomy.htm.
- 10. Provenance trails in the wings/pegasus system. Concurrency and Computation: Practice and Experience 2007; DOI: 10.1002/cpe.1228.
- 11. Addressing the provenance challenge using zoom. Concurrency and Computation: Practice and Experience 2007; DOI: 10.1002/cpe.1232.
- 12. Automatic capture and efficient storage of escience experiment provenance. Concurrency and Computation: Practice and Experience 2007; DOI: 10.1002/cpe.1235.
- 13. Extracting causal graphs from an open provenance data model. Concurrency and Computation: Practice and Experience 2007; DOI: 10.1002/cpe.1236.
- 14. Tracking provenance in a virtual data grid. Concurrency and Computation: Practice and Experience 2007; DOI: 10.1002/cpe.1256.
- 15. Tracking provenance semantics in heterogeneous execution systems. Concurrency and Computation: Practice and Experience 2007; DOI: 10.1002/cpe.1253.
- 16. From computation models to models of provenance: the RWS approach. Concurrency and Computation: Practice and Experience 2007; DOI: 10.1002/cpe.1234.

