The Intersection Between Semantic Web and Materials Science

The application and benefits of Semantic Web Technologies (SWT) for managing, sharing, and (re-)using research data have been demonstrated in implementations in the field of Materials Science and Engineering (MSE). However, a compilation and classification are needed to fully recognize these scattered published works and their unique added value. Here, the primary uses of SWT at the interface with MSE are identified using specifically created categories. This overview highlights promising opportunities for the application of SWT to MSE, such as enhancing the quality of experimental processes, enriching data with contextual information in knowledge graphs, or using ontologies to perform specific queries on semantically structured data. While interdisciplinary work between the two fields is still in its early stages, a great need is identified to facilitate access for nonexperts and to develop and provide user-friendly tools and workflows. The full potential of SWT can best be achieved in the long term through the broad acceptance and active participation of the MSE community. In perspective, these technological solutions will advance the field of MSE by making data FAIR. Data-driven approaches will benefit from these data structures and their connections to catalyze knowledge generation in MSE.


Main
Innovative research often relies on advanced technological solutions to exploit (data) potential. However, the diversity of scientific data formats and structures leads to compatibility issues and slows down progress. Data-driven science benefits from consistent data generation, organization, storage, and sharing. In this context, systems need to become interoperable to enable better automation in data access and analysis. Semantic Web Technologies (SWT) are able to efficiently address these requirements.
It was only with the Materials Genome Initiative (MGI) in 2011, [11] which aims to shorten typical development and innovation cycles, that SWT became relevant and gained wider acceptance in the broader MSE domain (Figure 1c). Inspired by this new paradigm of data-driven research, further initiatives for open science and collaborative material data spaces were funded. [12] Within the European framework, such initiatives (we exclude initiatives not related to the Semantic Web and Materials Science) are mainly associated with Horizon 2020 (https://www.horizont2020.de/), a research funding program intended to build a knowledge- and innovation-based society and a competitive economy while contributing to sustainable development in Europe. Concerning the building of digital infrastructure, the FAIR-DI association (https://www.fair-di.eu/fair-di/) [13,14] and its European member institutes also promote an infrastructure for data from materials science, engineering, and astronomy. In addition, GAIA-X (https://www.datainfrastructure.eu/GAIAX/Navigation/EN/Home/home.html/) aims to build a competitive, secure, and trustworthy data infrastructure for Europe. Another example at the national level is the Platform Industry 4.0 (https://www.bmbf.de/bmbf/de/forschung/digitale-wirtschaft-und-gesellschaft/industrie-4-0/industrie-4-0/), a German government project whose mission is to better connect machines and processes in the manufacturing industry using digital technologies. Further German initiatives, such as the Platform MaterialDigital (https://www.materialdigital.de/), have been established to exploit SWT, and especially ontologies, in the MSE domain. MatPortal (https://matportal.org/), a repository for MSE ontologies, was also recently set up.
Beyond this, various consortia of the German national research data infrastructure (NFDI) (https://www.nfdi.de/) have committed to the goal of systematically providing access to valuable scientific data, an effort pursued in close connection with the European Open Science Cloud (EOSC) (https://digital-strategy.ec.europa.eu/de/policies/open-science-cloud/). All these research- and industry-driven efforts aim to increase community use and to harmonize appropriate digital tools and accepted standards.
So far, no detailed study has addressed the implementation of SWT in MSE in a classified and structured way, detailing both the benefits and the challenges at this intersection. It is therefore necessary to make the use of these technologies comprehensible and explicit to a larger community, as SWT has the potential to change the conventional MSE landscape. It is thus not surprising that the literature describes these benefits while implicitly calling for an overview of the state of the art. For example, the potential of using ontologies in materials science has been reported before. [15] An ontology that maps a standardized characterization method and is shared publicly could be used by anyone to organize the resulting data and metadata. [12] A next step could be for manufacturers of testing machines to prescribe this data structure directly as their output format. The formal knowledge organized in ontologies could also serve as a teaching tool, conveying already in education a basic understanding of the characterization method itself, of the structure of its data and metadata, and of the application of important technological tools. [16] Our article aims to provide stakeholders from both fields with an overview of existing approaches and implementations at this intersection. Relevant works are identified (Section 2) and summarized according to the categories we created (Section 3), providing insights based on current examples. Finally, the reader is guided through a distillation of the challenges and opportunities that arise where these areas overlap (Section 4).
Figure 1. Temporal connection between SW standards, ontological resources, and the advent of SWT in MSE. a) Evolution of important SW standards provided by the W3C. [49] b) Chronology of ontology development with important milestones: the agreement on Gruber's definition, [18] a globally successful applied example in the life sciences (GO [1]), the appearance of upper-level ontologies (SUMO, [61] DOLCE, [6] BFO, [7] PROV-O [43]), the creation of QUDT, [47] a rich ontology for quantities, units, and dimensions, and the availability of ontology repositories (BioPortal, [62] OntoPortal [63]). Advanced midlevel ontologies with worldwide scope have also been included (IOF, [64] CCO, [65] EMMO [45]), as well as the establishment of BFO [7] not only as a de facto standard but also de iure as an ISO document. [66] c) Increasing interest in SWT implementation in MSE, with the MGI [11] as an enabler. In addition to the globally active Research Data Alliance (RDA), [67] which aims at open data exchange, initiatives from the European region are shown to illustrate how development within a limited geographic scope in particular is increasing rapidly. Terms not described here, e.g., RDF, OWL, or SPARQL, which are also used in the remaining text, are covered in the section "Explanations of Technical Terms".

Explanations of Technical Terms
API: Computer programs can communicate with each other via an Application Programming Interface (https://en.wikipedia.org/wiki/API).
FAIR Data: The application of the FAIR principles [17] (https://www.go-fair.org/fair-principles/) is aimed at increasing the Findability, Accessibility, Interoperability, and Reusability of (research) data.
IRI: The Internationalized Resource Identifier (https://www.rfc-editor.org/rfc/rfc3987) generalizes the URI to Unicode characters and can be used instead of a URI to uniquely identify resources.
MSE: "Materials Science and Engineering combine engineering, physics, and chemistry principles to solve real-world problems associated with nanotechnology, biotechnology, information technology, energy, manufacturing, and other major engineering disciplines" (quoted from https://mse.umd.edu/about/what-is-mse/).
Ontology: Gruber defines "An ontology is an explicit specification of a conceptualization". [18] Ontology creation involves the description of knowledge through specified terms (formalized vocabularies) and their relationships to other terms, usually covering a specific domain represented by a community of users. The semantic web languages OWL (https://www.w3.org/OWL/) (Web Ontology Language) and its revised extension OWL 2 (https://www.w3.org/TR/2012/REC-owl2-overview-20121211/) enable the creation and sharing of ontologies over the Web, aiming to make Web content more accessible to machines.
SWT: Semantic Web Technologies, which apply the linked data and RDF concepts from [4]; in this context, they aim to provide RDF graphs from MSE knowledge and data.
W3C: The World Wide Web Consortium (https://www.w3.org/) provides standards for the World Wide Web.

Literature Search Strategy and Selection Process
For the literature search, we used a specific strategy to identify recent and relevant publications applying SWT in MSE. This search strategy and the resulting research questions are described in this section.

Selection Methodology
The purpose of our literature search strategy is to provide insights into existing research on MSE approaches that use SWT and to present this knowledge compactly in a written report. The literature analysis helps in understanding the benefits provided by SWT. The search strategy used in this work (Figure 2) follows a formal systematic literature review process; in particular, this study builds on the guidelines proposed in refs. [23][24][25][26]. Other surveys of relevant journals and on related topics, such as MSE and SWT, for example, [15,27], are also considered.

Research Questions
This work aims to provide an overview of the application of SWT in different areas of MSE, highlighting challenges and opportunities. To achieve this goal, we aimed to answer the following general research question: Which
www.advancedsciencenews.com www.advintellsyst.com the reader with a detailed overview of the current status and the benefits of using SWT in MSE.
Table 1. Selected publications with authors, year, and title. The matrix illustrates the areas of intersection by the following categories: Ontology Creation includes works that deal with creating ontologies (e.g., for the representation of MSE methods and knowledge). RDF Ontology Application includes approaches that use ontologies for specific tasks. RDF Instance Creation involves the conversion of data into RDF triples. RDF Information Retrieval includes approaches for querying triples via APIs with SPARQL. Provenance covers the collection of metadata for full traceability and reproducibility of generated MSE experiment and simulation data according to the FAIR data principles. Ontology Reasoning involves automatable inference and reasoning, that is, obtaining logical conclusions from a collection of axioms or asserted facts in ontologies.
Figure 3. a) Ontology Creation. The process encodes a particular subset of reality into an ontology, considering parameters such as initial requirements, range of applicability, or the methodology used. b) Ontology Application. The types of applications of ontologies in MSE are far from fully explored; the most common in our selection are: i) Combining: the creation of a new ontology from a previous set of ontologies to be used together. ii) Reusing: an ontology's maintenance also includes reviewing its components so that they can be adapted to new requirements. iii) Mapping: when the same reality is described independently with different approaches, it is helpful to map entities to extend their use. iv) Knowledge Extraction: the process of using an ontology as a framework for extracting terms and relationships. c) RDF Instances Creation and Information Retrieval. The process of creating RDF instances from tabular data using an ontology is shown. Once the data are in a triple store, the query language SPARQL makes information retrieval in federated data systems possible. d) Provenance of MSE Experiments. A structured way to encode the many steps of an experiment is to use an ontology that allows many degrees of modularization. e) Ontology Reasoning. The use of reasoners is possible with OWL. Different reasoners use different logic paradigms that allow different degrees of deduction.
Here we give an overview of how the techniques and approaches to ontology creation differ across the selected works. Zhao and Qian [29] proposed a semantic integration method that semiautomatically extracts a material database schema, providing an effective way to merge heterogeneous data from different sources. They used this approach to build a material ontology comprising five classes (material, structure, properties, processing, and performance) and their subclasses. Zhao and Wang et al. [30] designed an ontology following the NanoMine XML data schema and its underlying principles. This ontology represents the processing-structure-property knowledge about polymer nanocomposites with over 350 parameters. Here, a more general material data vocabulary is integrated, targeting interoperability and a wider field of application. Hakimi et al. [31] applied a text analysis approach to a gold-standard biomaterials literature set to identify the key terms for creating the Devices, Experimental scaffolds, and Biomaterials (DEB) ontology. The DEB ontology, which represents the field's terminology (and rules), allows unique naming, classifying, and organizing of manufactured biomaterials such as implants and medical devices. Another way of creating ontologies is to extend and reuse existing semantic artifacts, that is, to build on already available vocabularies. Vardeman et al. [32] presented an ontology design pattern that facilitates ontological reuse and the modeling of material transformations. This additionally allows capturing relationships between raw materials, intermediate components, and final products. Li et al. [33] used a phrase-based topic model approach and a formal topical concept analysis on 600 abstracts to find additional concepts and axioms for extending and improving NanoParticle and eNanoMapper, two nanotechnology ontologies. Nikooie et al. [34] presented the tool-based extension of another ontology, in which the user is guided step by step through an iterative development process.
The effectiveness of this phrase-based approach was demonstrated by extending the Materials Design Ontology (MDO) with 29 concepts and 27 additional axioms. In the rule-based approach by Zhang et al., [35] ontologies were built on the structure of the open knowledge base YAGO by defining appropriate keywords. YAGO combines facts from Wikidata with a standard ontology schema, and a string-matching algorithm was used to obtain metallic materials concepts. These concepts were used to extract domain materials knowledge and to generate the Metallic Materials Ontology (MMOY).
For such an approach, it is particularly important to use fundamental vocabularies and well-defined terminologies, which are ideally rich, abstract, and complete.
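To make the rule-based route from a relational database schema to an ontology (as pursued by Zhao and Qian [29]) more concrete, a minimal Python sketch follows. It assumes three hypothetical mapping rules (table to owl:Class, column to owl:DatatypeProperty, foreign key to owl:ObjectProperty), a hypothetical "has..." naming scheme, and an example.org namespace; it is not the authors' rule set, and it represents triples as plain tuples instead of using an RDF library.

```python
# Minimal sketch of rule-based schema-to-ontology conversion.
# The rules, the "has..." naming scheme, and the example namespace are
# hypothetical; they are not the rule set published by Zhao and Qian.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_DOMAIN = "http://www.w3.org/2000/01/rdf-schema#domain"
RDFS_RANGE = "http://www.w3.org/2000/01/rdf-schema#range"
OWL = "http://www.w3.org/2002/07/owl#"
EX = "http://example.org/mse#"  # hypothetical namespace


def camel(name):
    """Turn a snake_case table or column name into CamelCase."""
    return "".join(part.capitalize() for part in name.split("_"))


def schema_to_ontology(schema):
    """Apply three rules: table -> owl:Class, column -> owl:DatatypeProperty,
    foreign key -> owl:ObjectProperty linking the two table classes."""
    triples = set()
    for table, spec in schema.items():
        cls = EX + camel(table)
        triples.add((cls, RDF_TYPE, OWL + "Class"))
        for column in spec.get("columns", []):
            prop = EX + "has" + camel(column)
            triples.add((prop, RDF_TYPE, OWL + "DatatypeProperty"))
            triples.add((prop, RDFS_DOMAIN, cls))
        for target in spec.get("foreign_keys", {}).values():
            prop = EX + "has" + camel(target)
            triples.add((prop, RDF_TYPE, OWL + "ObjectProperty"))
            triples.add((prop, RDFS_DOMAIN, cls))
            triples.add((prop, RDFS_RANGE, EX + camel(target)))
    return triples


# Hypothetical two-table schema of a small material database.
SCHEMA = {
    "material": {"columns": ["name", "density"]},
    "processing": {"columns": ["temperature"], "foreign_keys": {"material_id": "material"}},
}

ontology = schema_to_ontology(SCHEMA)
for triple in sorted(ontology):
    print(triple)
```

A realistic rule set would additionally have to handle column names shared across tables, SQL datatypes, and composite keys; these are omitted here for brevity.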
Ontologies and their associated terminology are ideally created and harmonized in collaborative environments, where ontology engineers and MSE domain experts can reach an agreement and achieve a shared conceptualization. Moreno et al. [36] illustrated the experts' interaction within this creation process, in which the ontology engineer is at the center of the test description process to constructively guide the semantic transformation. Bayerlein et al. [37] highlighted the collaborative environment and its importance for digitizing MSE research; in particular, they pointed out agile processes for data-driven materials research and development in the context of the Mat-o-Lab framework. The work of Garabedian and Schreiber et al. [38] describes another example of collaborative work: the domain experts derived a controlled vocabulary describing tribological processes and objects with basic semantics in a MediaWiki-based database, which served as the template for creating the FAIR tribological experiments ontology (TriboDataFAIR Ontology).
To make data reproducible and reusable on a larger scale, additional information needs to be collected in the form of metadata. Romanos et al. [39] present CHADA, a transferable approach to structure generic material characterization data and metadata. Captured information, for example, about the sample, operator, laboratory conditions, calibration procedures, etc., increases quality and provides specific insights into the experiment and data evaluation.
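The kind of structured characterization metadata described above can be sketched as a simple record type. The field names below are hypothetical and only mirror the examples named in the text (sample, operator, laboratory conditions, calibration); they are not the CHADA template. The completeness score is likewise an assumed, deliberately crude quality indicator.

```python
from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class CharacterisationRecord:
    """Hypothetical metadata record; fields mirror the examples in the text."""
    sample_id: str
    method: str
    operator: Optional[str] = None
    lab_temperature_c: Optional[float] = None
    lab_humidity_percent: Optional[float] = None
    calibration_procedure: Optional[str] = None
    data_files: list = field(default_factory=list)

    def missing_fields(self):
        """Names of metadata fields that are still empty (None or an empty list)."""
        return [f.name for f in fields(self) if getattr(self, f.name) in (None, [])]

    def completeness(self):
        """Fraction of filled fields; an assumed, crude quality indicator."""
        total = len(fields(self))
        return (total - len(self.missing_fields())) / total


record = CharacterisationRecord(
    sample_id="S-001", method="tensile test", operator="operator A", lab_temperature_c=23.0
)
print(record.missing_fields())
print(round(record.completeness(), 2))
```

Such a check could flag incomplete records before they are converted into RDF, which is where the quality gains mentioned above would materialize.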
Ontological representation can also improve automation by facilitating human-machine communication through the creation of processable knowledge structures. In addition, programming language or data structuring standards can be created. Note that already standardized processes are particularly well suited for digital representation since the sequence of individual steps and actions and specific terminology are well-defined and agreed on. [42] The dissemination of ontologies helps to manifest these emerging standards and to create data interoperability in communities according to the FAIR principles.
Furthermore, interoperable systems using well-structured data will improve knowledge generation in data-driven research, for example, by increasing the effectiveness of machine learning methods. For this, it is necessary to create connections to upper-level ontologies. Although analyzing these is not the aim of this study, we observed a trend toward specific top-level ontologies in the reviewed works. For example, the ontologies most frequently used in [30,33,34,36,37,40] include the W3C Provenance Ontology (PROV-O), [43] the Basic Formal Ontology (BFO), [7] and the Chemical Entities of Biological Interest (ChEBI). [44] Other ontologies used several times in [8,28,30,32,34,38,40] were DOLCE, [6] EMMO, [45] EXPO, [46] QUDT, [47] and SIO. [48] A necessary future work would be a detailed survey of relevant ontologies for MSE, which, besides the suitability of certain upper-level ontologies, could also highlight existing material-specific representations.

Ontology Application
Ontologies and their creation described above provide the basis for the applications described below. Note that in this section we mainly focus on describing work and approaches that go beyond organizing MSE data according to modern standards from W3C. [49] Therefore, the RDF ontology application of the studied works is discussed together with RDF instance creation and information retrieval in the next section.
In total, ontology-based applications were observed in 14 of the selected papers. [28][29][30][32][33][34][35]37,38,40,41,[50][51][52] These applications include the reuse, combination, and extension of existing ontologies and semantic resources (Figure 3b,i,ii). An example is given by Li et al. [33] demonstrating the extension of two nanotechnology ontologies with new concepts and axioms in a two-step procedure. The tool-based extension of the MDO was shown by Nikooie et al., [34] where the user is supported and guided throughout the process.
An additional important application is the mapping or alignment of one ontology to another (Figure 3b,iii). An et al. [50] illustrated this by introducing the two-component system OTMapOnto, in which terms in different ontologies are identified and mapped to each other using ontology embedding and an optimal transport approach. Applied in the MSE domain, the approach improved precision and recall. Great potential for the application of ontologies also lies in knowledge extraction (Figure 3b,iv) and discovery through natural language processing (NLP) techniques, such as named entity recognition. These methods benefit particularly from detailed and rich semantic vocabularies. In this context, Greenberg et al. [51] presented the "automatic" linked data ontology application HIVE-4-MAT (Helping Interdisciplinary Vocabulary Engineering for Materials discovery). The application demonstrates the power of combining ontological resources with NLP by extracting knowledge and rules from unstructured text in a structured way.
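The mapping idea can be illustrated with a much simplified stand-in for the embedding plus optimal transport approach. As an assumption, the sketch below replaces learned ontology embeddings with character-trigram vectors of the term labels, uses one minus cosine similarity as the transport cost, and solves an entropy-regularized transport problem with Sinkhorn iterations; it is not OTMapOnto, and the term labels are hypothetical.

```python
import math
from collections import Counter


def normalise(label):
    """Lower-case a label and drop spaces/punctuation ('TensileStrength' == 'tensile strength')."""
    return "".join(ch for ch in label.lower() if ch.isalnum())


def trigram_vector(label):
    """Character-trigram counts: a crude, assumed stand-in for learned embeddings."""
    text = f"##{normalise(label)}##"
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def sinkhorn(cost, epsilon=0.1, iterations=200):
    """Entropy-regularized optimal transport plan with uniform marginals."""
    n, m = len(cost), len(cost[0])
    kernel = [[math.exp(-c / epsilon) for c in row] for row in cost]
    a, b = [1.0 / n] * n, [1.0 / m] * m
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iterations):
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def map_terms(source, target, epsilon=0.1):
    """Map each source term to the target term receiving most transported mass."""
    vectors_t = [trigram_vector(t) for t in target]
    cost = [[1.0 - cosine(trigram_vector(s), vt) for vt in vectors_t] for s in source]
    plan = sinkhorn(cost, epsilon)
    return {s: target[max(range(len(target)), key=lambda j: plan[i][j])]
            for i, s in enumerate(source)}


# Hypothetical labels from two independently developed vocabularies.
SOURCE = ["tensile strength", "yield strength", "hardness"]
TARGET = ["Hardness", "TensileStrength", "YieldStrength"]
print(map_terms(SOURCE, TARGET))
```

The transport formulation matters because it distributes mass across all terms jointly, so two source terms cannot both claim the same target as easily as with independent nearest-neighbor matching.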
The possible applications for ontologies and well-founded vocabularies are manifold. In this respect, the development of ontology infrastructures for MSE is essential for the next generation of approaches involving the convergence between MSE and SWT, bringing benefits such as the description of materials science knowledge, interoperability on heterogeneous data resources, and knowledge extraction with NLP.
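In its simplest form, knowledge extraction with a controlled vocabulary is dictionary-based entity recognition: vocabulary terms found in free text are linked to concept IRIs. The sketch below is a minimal illustration under assumed inputs (a hypothetical five-entry vocabulary with example.org IRIs); it does not reproduce how HIVE-4-MAT works internally, and real NLP pipelines add tokenization, lemmatization, and disambiguation.

```python
import re

# Hypothetical controlled vocabulary: surface form -> concept IRI (illustrative only).
VOCABULARY = {
    "tensile strength": "http://example.org/mse#TensileStrength",
    "yield strength": "http://example.org/mse#YieldStrength",
    "aluminium alloy": "http://example.org/mse#AluminiumAlloy",
    "aluminum alloy": "http://example.org/mse#AluminiumAlloy",
    "heat treatment": "http://example.org/mse#HeatTreatment",
}


def extract_concepts(text, vocabulary=VOCABULARY):
    """Return (matched phrase, concept IRI, start offset) for each vocabulary hit."""
    # Longest phrases first, so longer terms win over terms they contain.
    phrases = sorted(vocabulary, key=len, reverse=True)
    # Optional trailing "s" tolerates simple plurals; \b enforces word boundaries.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, phrases)) + r")s?\b", re.IGNORECASE)
    return [(m.group(1), vocabulary[m.group(1).lower()], m.start()) for m in pattern.finditer(text)]


TEXT = "The heat treatment increased the yield strength of the aluminum alloys."
for phrase, concept, start in extract_concepts(TEXT):
    print(start, phrase, concept)
```

The richer and more complete the vocabulary, the more of the text such a matcher can link, which is why detailed semantic resources benefit these methods.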
Further important criteria for the application and reuse of (MSE-relevant) ontologies are precise and domain-appropriate term definitions, user-friendly documentation, referenceability via persistent IRIs, and ongoing maintenance and curation. Midlevel ontologies are the most likely candidates to meet these criteria. A prominent example is the PMD Core Ontology (PMDco) (https://github.com/materialdigital/core-ontology). This midlevel ontology is maintained and curated by the Platform MaterialDigital (PMD) based on continuous interaction with the MSE community, and it serves as a semantic intermediate layer and amplifier for future application ontologies.

RDF Ontology Application, Instances Creation, and Information Retrieval
The motivation for RDF instance creation is to retrieve information from heterogeneous sources, for example, tabular or unstructured data, with the benefits of an RDF triple store and SPARQL queries (Figure 3c).
The main application of ontologies concerns FAIR data management and the creation of RDF instances. In this context, Bayerlein et al. [37] emphasized the crucial role of ontologies in annotating and structuring data and metadata. In their work, data and contextual information on aluminum alloys for high-temperature applications were semantically organized and transferred to RDF triple stores using purpose-specific MSE domain ontologies. For such methods and experiments, data can be incorporated at a much higher rate, and the data are more consistent and complete.
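The conversion of tabular data into RDF instances (Figure 3c) can be sketched as follows. The column-to-property mapping, which an MSE domain ontology would normally supply, the example.org namespace, and the CSV values are all hypothetical (invented placeholders, not measurements). Triples are kept as plain tuples, with literals represented as (lexical value, datatype IRI) pairs.

```python
import csv
import io

EX = "http://example.org/mse#"  # hypothetical domain namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical column -> (property IRI, datatype IRI) mapping an ontology would supply.
COLUMN_MAP = {
    "alloy": (EX + "hasAlloyDesignation", XSD + "string"),
    "test_temperature_C": (EX + "hasTestTemperatureCelsius", XSD + "decimal"),
    "yield_strength_MPa": (EX + "hasYieldStrengthMPa", XSD + "decimal"),
}


def rows_to_triples(csv_text, id_column="specimen"):
    """Create one instance per CSV row and one triple per non-empty mapped cell."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = EX + "specimen/" + row[id_column]
        triples.append((subject, RDF_TYPE, EX + "TestSpecimen"))
        for column, (predicate, datatype) in COLUMN_MAP.items():
            value = (row.get(column) or "").strip()
            if value:  # skip empty cells rather than asserting empty literals
                triples.append((subject, predicate, (value, datatype)))
    return triples


# Invented placeholder values, not measurements.
SAMPLE_CSV = (
    "specimen,alloy,test_temperature_C,yield_strength_MPa\n"
    "S1,Alloy-A,20,100\n"
    "S2,Alloy-A,250,\n"
)

for triple in rows_to_triples(SAMPLE_CSV):
    print(triple)
```

Keeping the mapping separate from the conversion loop reflects the role of the ontology: the same loop can serve other tables once a different mapping is supplied.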
A method that has also been applied in other areas is described by Zhao and Qian. [29] Here, the ontology is not developed by MSE domain experts, as in the previously presented work, but results from converting the schema of a relational material database by a set of rules. This ontology enables the conversion and integration of further heterogeneous databases by mapping their data to the created individuals. Zhao and Wang et al. [30] also aimed at providing a unified and well-structured data representation. Using the developed NanoMine schema and ontology, they curated 182 articles to share polymer nanocomposite material data, with the intention of advancing the development of new materials. To this end, they introduced the extraction of knowledge from the semantic relations defined in a directed, labeled RDF graph, understood as a set of triples. This shows that SWT allow individual researchers and larger research teams to obtain valuable, complete, and traceable data efficiently. RDF information retrieval also enables querying information that is hard to find because the semantic relations in the data are not explicit. In this regard, in the work of Li et al., [33] data from a material database are mapped into RDF using the MDO, and SPARQL query functionality is demonstrated based on its terminology. Zhang et al. [35] showed the creation and representation of RDF triples with the MMOY they developed; a prototype gave users access to the linked knowledge structure about metallic materials, their fields of application, their properties, and other information. Another example of the potential of information retrieval is provided by Sadigh et al., [41] who created an ontology for the experimental workflow of wire electrical discharge machining (WEDM). Their engineering aim was to select the most suitable machine tool based on manufacturing criteria. Based on this ontology, a platform stores and analyzes information about machine parts and machine tools and can thus identify machine tools capable of producing desired parts with multiscale features.
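The join that a SPARQL query performs over such triples can be illustrated with a toy basic-graph-pattern matcher. The sketch below is not a SPARQL engine; it assumes a hypothetical in-memory data set, loosely modeled on the machine-tool selection scenario, with invented resource names, and answers the equivalent of the query given in the code comment.

```python
# Hypothetical in-memory triples with compact "ex:" names (illustrative data only).
TRIPLES = [
    ("ex:Part1", "ex:hasFeature", "ex:MicroSlot"),
    ("ex:Part1", "ex:hasFeature", "ex:MacroContour"),
    ("ex:WEDM_1", "ex:canMachine", "ex:MicroSlot"),
    ("ex:WEDM_1", "ex:canMachine", "ex:MacroContour"),
    ("ex:WEDM_2", "ex:canMachine", "ex:MacroContour"),
]

# Equivalent of: SELECT ?f ?tool WHERE { ex:Part1 ex:hasFeature ?f . ?tool ex:canMachine ?f . }
QUERY = [("ex:Part1", "ex:hasFeature", "?f"), ("?tool", "ex:canMachine", "?f")]


def unify(term, value, binding):
    """Bind a '?variable' to a value, or check a constant or already-bound variable."""
    if term.startswith("?"):
        if term in binding:
            return binding[term] == value
        binding[term] = value
        return True
    return term == value


def match_pattern(triples, patterns, binding=None):
    """Yield every variable binding that satisfies all triple patterns (a join)."""
    binding = {} if binding is None else binding
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)  # copy so that failed matches leave no trace
        if all(unify(t, v, candidate) for t, v in zip(first, triple)):
            yield from match_pattern(triples, rest, candidate)


for row in match_pattern(TRIPLES, QUERY):
    print(row["?tool"], "can machine", row["?f"])
```

The shared variable ?f is what connects the part's features to the tools; a real triple store evaluates the same pattern with indexes instead of this nested scan.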
Regarding FAIR (research) data management, the TriboDataFAIR Ontology was designed and deployed by Garabedian and Schreiber et al. [38] with a specific focus on data interoperability and reusability. In this work, a knowledge graph based on collected experimental tribological data and metadata is created in a scalable environment, enabling the targeted retrieval of information.
We conclude this section by referring to other related works that present the wide potential of ontology application, RDF instance creation, and RDF information retrieval, for example, for domain knowledge representation and FAIR data management. [31,36,39,52]

Provenance
The objective of tracking data provenance (also referred to as "data lineage") is to know where data come from and where they were modified (Figure 3d). Data provenance is used to find errors within data and to trace them back to their origin. In this way, it helps to assess authenticity, reproduce MSE experiments, and facilitate data reuse. Provenance adds another layer of credibility and quality to the data and is the subject of 13 of the selected works, which trace the data with the overall goal of improving MSE experiments and their reproducibility. In this context, Romanos et al. [39] presented an approach to organize provenance by describing ontologies that facilitate the traceability of data. With their NanoMine method, Zhao and Wang et al. [30] specified a Semantic Extract, Transform, and Load (SETL) script in which the data, together with the ontology, enrich the knowledge graph with entity definitions that indicate the provenance and the curation source of the knowledge. Sadigh et al. [41] provided an ontology for a wire electrical discharge machining (WEDM) experimental workflow, which allowed them to obtain the provenance of MSE experiment data. The ontology defines materials, machines, and tools through classes, relationships, axioms, and constraints, and a use case shows the provenance of WEDM experiments. Li et al. [40] developed the Materials Design Ontology (MDO), in which provenance is given by a design pattern that provides information in the repository of ontologies, used together with entities from PROV-O. [43] MSE-related data provenance was approached by Bayerlein et al., [37] Garabedian and Schreiber et al., [38] and Nikooie et al. [34] using the concept of FAIR data, which implies the ability to provide the provenance of the data. Vardeman et al. [32] created an ontology design pattern for modeling material transformations.
This work offered a model of the relationships between products, resources, and catalysts in the transformation process, as well as of the spatial and temporal constraints necessary for a transformation to occur. The authors used this model for reasoning and provenance and presented a use case from sustainable construction that combined the material transformation pattern with the preexisting semantic trajectory ontology design pattern. Moreno et al. [36] provided an approach for data interoperability using ontologies developed explicitly for MSE tests. They used these ontologies both to capture the provenance of experiments and to address interoperability, so that results can be traced back and relevant information can be visualized.
Using RDF knowledge graphs to organize MSE data brings benefits, such as data provenance, which allows tracking the information even on heterogeneous datasets. The provenance of data plays an essential role in reproducing experiments and data, which, for instance, helps when applying FAIR data principles.
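Provenance tracing over an RDF graph amounts to following lineage relations backward. The sketch below uses actual PROV-O properties (prov:wasGeneratedBy, prov:wasDerivedFrom, prov:used, prov:wasAssociatedWith) but a hypothetical example.org namespace and an invented lineage from a raw test signal to a reported value; it collects every resource reachable upstream with a breadth-first search.

```python
from collections import deque

PROV = "http://www.w3.org/ns/prov#"  # W3C PROV-O namespace
EX = "http://example.org/lab#"       # hypothetical namespace

# Invented lineage: raw signal -> stress-strain curve -> reported yield strength.
TRIPLES = [
    (EX + "rawForceSignal", PROV + "wasGeneratedBy", EX + "tensileTest42"),
    (EX + "tensileTest42", PROV + "used", EX + "specimenS1"),
    (EX + "tensileTest42", PROV + "wasAssociatedWith", EX + "operatorA"),
    (EX + "stressStrainCurve", PROV + "wasDerivedFrom", EX + "rawForceSignal"),
    (EX + "stressStrainCurve", PROV + "wasGeneratedBy", EX + "evaluationRun"),
    (EX + "yieldStrengthValue", PROV + "wasDerivedFrom", EX + "stressStrainCurve"),
]

# PROV-O relations that point from a resource to something it depends on.
UPSTREAM = {PROV + "wasGeneratedBy", PROV + "wasDerivedFrom", PROV + "used", PROV + "wasAssociatedWith"}


def lineage(node, triples=TRIPLES):
    """Breadth-first search collecting every resource upstream of `node`."""
    seen, queue = set(), deque([node])
    while queue:
        current = queue.popleft()
        for subject, predicate, obj in triples:
            if subject == current and predicate in UPSTREAM and obj not in seen:
                seen.add(obj)
                queue.append(obj)
    return seen


for resource in sorted(lineage(EX + "yieldStrengthValue")):
    print(resource[len(EX):])
```

If the reported value turns out to be implausible, the returned set lists the specimen, test, operator, and evaluation run to inspect, which is the error-attribution use of provenance described above.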

Ontology Reasoning
Ontology reasoning is applied to infer logical conclusions that reveal new relations between concepts from the facts or axioms asserted in the ontology (Figure 3e). From a higher perspective, it can also be understood as deriving new axioms from a linked data structure to create a new ontology, as Zhang et al. [35] did to build the ontology MMOY from the YAGO knowledge base. Zhao and Wang et al. [30] provided an ontological framework called NanoMine, in which the role of the ontology and knowledge graph is to encode concept relationships that bind equation variables to specific properties. Thus, "utilizing this resource with coded inferences, existing knowledge, and inferred knowledge can be automatically incorporated rather than manually curated into the knowledge graph". Such automation is also part of the aim of Sadigh et al. [41] when they introduce a semantic reasoning process. This process creates appropriate relations between the desired characteristics of a part of a manufactured product and the machine tools needed to produce it. The system performs this reasoning using semantic rules embedded in the designed ontology model. Such semantic rules, also known as axioms in the context of SWT, are shown explicitly by Vardeman et al. [32] Their work proposed an ontology design pattern for material transformation, whose potential is revealed more fully when the related axioms are formalized in description logic. All the papers discussed in this section agree in their perspective on ontology reasoning with the conclusions of Lambrix et al. [52] There, reasoning is described as "work to be done", to be used "to debug and complete different resources, leading to higher-quality resources" and in the process of querying distributed databases. In summary, ontological reasoning is perceived as a tool for improvement and refinement to exploit the full potential of SWT.
In other words, ontological reasoning is an advanced rather than a basic task of SWT. This, together with the still limited maturity of SWT implementation in MSE, explains the relatively small number of publications on the subject.
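As a minimal illustration of what ontology reasoning infers, the sketch below forward-chains two RDFS entailment rules (rdfs9 for type inheritance and rdfs11 for subclass transitivity) over a hypothetical class hierarchy until no new triples appear. OWL reasoners support far richer logics; this covers only a small fragment and uses compact, invented names for readability.

```python
RDF_TYPE, SUBCLASS_OF = "rdf:type", "rdfs:subClassOf"

# Hypothetical class hierarchy and one instance (compact names for readability).
TRIPLES = {
    ("ex:CastAluminiumAlloy", SUBCLASS_OF, "ex:AluminiumAlloy"),
    ("ex:AluminiumAlloy", SUBCLASS_OF, "ex:MetallicMaterial"),
    ("ex:specimenS1", RDF_TYPE, "ex:CastAluminiumAlloy"),
}


def rdfs_closure(triples):
    """Apply two RDFS entailment rules until a fixpoint is reached:
    rdfs11: (A subClassOf B), (B subClassOf C) => (A subClassOf C)
    rdfs9:  (x type A), (A subClassOf B)       => (x type B)
    """
    inferred = set(triples)
    while True:
        subclass = {(s, o) for s, p, o in inferred if p == SUBCLASS_OF}
        new = {(a, SUBCLASS_OF, c) for a, b in subclass for b2, c in subclass if b == b2}
        new |= {(x, RDF_TYPE, b) for x, p, a in inferred if p == RDF_TYPE
                for a2, b in subclass if a == a2}
        if new <= inferred:  # nothing new was derived
            return inferred
        inferred |= new


for triple in sorted(rdfs_closure(TRIPLES) - TRIPLES):
    print("inferred:", triple)
```

The specimen was only asserted to be a cast aluminum alloy, yet a query for metallic materials would now also return it, which is the kind of automatic enrichment described above.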

Challenges and Perspectives
In the previous sections, the state of the art of SWT implementations, differentiated by specific categories, was highlighted for various MSE use cases. While various promising approaches and solutions were identified, many open challenges remain in dealing with and accessing long-established SWT standards, tools, and approaches. In the following, prominent challenges are explained and discussed.
The surveyed approaches to ontology creation vary widely. Key terms and relations of specific MSE subdomains are extracted manually, semiautomatically, or fully automatically from multiple sources, such as text bodies, database schemata, and ontological resources. At best, the developed ontologies are validated by consensus in interdisciplinary collaborative environments between MSE experts and ontology engineers. In this process, it is particularly challenging to establish a fundamental understanding of how ontology creation works and what its best practices are. The quality of the created ontologies with respect to the target application depends on various factors, such as expressiveness, richness, completeness, and degree of abstraction. For example, quality-controlled ontologies improve the results of semantic searches. They may also already contain the metadata structure, which is particularly relevant for the reproducibility and, consequently, the reusability of MSE experiment and simulation data. Furthermore, qualitative statements can then be made about the repeatability of experiments, data completeness, and data reliability.
In evaluating the reviewed works, we found that essential tools for various tasks are either missing or not sufficiently known. Especially for the creation, collection, and extension of terminology databases, collaboratively used tools could enable a coordinated, standardized, internationalized, and cross-domain effort. Graphical tools, such as MatVis (available at https://github.com/Mat-O-Lab/MatVis/) and Ontopanel, [53] can help MSE domain experts become familiar with the process of ontology creation more easily while fostering interdisciplinary communication. Critical for the reuse and further development of existing ontologies are their findability, availability, documentation, and formatting, and ultimately their sustainable maintenance and curation. An ontology with high reuse potential is, for example, QUDT, [47] which is suited particularly, though not exclusively, for quantity and unit standards. The use of standardized top-level ontologies, such as BFO [7] or even PROV-O, [43] was observed in several works. Nevertheless, a uniform connection and aggregation via an MSE domain ontology, and, as a result, cross-subdomain interoperability, have not yet been achieved. The resulting benefits of linked data as a basis for advanced techniques, such as machine learning, are still far from being exploited at this stage. For example, the performance of machine learning algorithms increases with the provision of coherent and well-annotated training data, in particular for small-to-medium-sized data sets. Likewise, the NOMAD Artificial Intelligence (AI) Toolkit, [54] which offers interactive AI-based analysis of materials science data from the NOMAD archive, benefits from well-annotated data. Curated repositories have a critical role in publishing and organizing ontologies consistently so that, in the long run, the diversity of data can be handled in a workable way.
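The value of a shared unit ontology such as QUDT can be sketched in a few lines: once every value is annotated with a unit identifier that carries a conversion multiplier to SI, values from different sources become directly comparable. The snippet below hard-codes the multipliers in the spirit of QUDT's conversionMultiplier property. The unit identifiers follow QUDT's naming style but are used here as plain strings, and the modulus values are hypothetical.

```python
# Sketch: normalizing values reported in different units to the SI unit,
# in the spirit of QUDT's conversionMultiplier. The multipliers are
# hard-coded for illustration; a real pipeline would look them up in QUDT.

CONVERSION_MULTIPLIER = {
    "unit:PA": 1.0,
    "unit:MegaPA": 1.0e6,
    "unit:GigaPA": 1.0e9,
}

def to_si(value, unit):
    """Convert a value to pascal using the unit's conversion multiplier."""
    return value * CONVERSION_MULTIPLIER[unit]

# Young's modulus of the same (hypothetical) steel reported by two sources
source_a = (210.0, "unit:GigaPA")
source_b = (210000.0, "unit:MegaPA")

print(to_si(*source_a) == to_si(*source_b))
```

Without the unit annotation, the two numbers 210 and 210000 would look contradictory to any automated comparison or machine learning pipeline.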
The central topic of several papers is the annotation of data and metadata (including provenance) using ontologies. This involves creating RDF instances and making them findable and available in triple stores using URIs and IRIs. Figure 4 illustrates how, in the context of FAIR research data management, unstructured data and metadata from MSE experiments and simulations are, for example, migrated through ontologies into machine-processable RDF triples with graph functionality to enable information retrieval and knowledge extraction. This is particularly important for high-throughput methods. It remains to be seen how this practice will be applied to the processing and analysis of data from imaging techniques, which were not covered in the selected papers.
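A minimal sketch of such an annotation step, assuming a flat record from a tensile test, is shown below: the record is mapped to RDF triples, including PROV-O provenance links, and serialized as N-Triples. The prov:, rdf:, and xsd: IRIs are the W3C namespaces; the ex: namespace, the property youngsModulus, and all identifiers are hypothetical.

```python
# Sketch: turning one record of experiment (meta)data into RDF triples and
# serializing them as N-Triples. The EX namespace and its property names are
# hypothetical; the PROV, RDF and XSD IRIs are the real W3C namespaces.

EX = "https://example.org/lab/"
PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DOUBLE = "http://www.w3.org/2001/XMLSchema#double"

def iri(value):
    return f"<{value}>"

def literal(value, datatype):
    return f'"{value}"^^<{datatype}>'

def annotate(record):
    """Map a flat record to (subject, predicate, object) terms in N-Triples syntax."""
    result = iri(EX + record["id"])
    activity = iri(EX + record["test_id"])
    return [
        (result, iri(RDF_TYPE), iri(PROV + "Entity")),
        (result, iri(EX + "youngsModulus"), literal(record["E_GPa"], XSD_DOUBLE)),
        (result, iri(PROV + "wasGeneratedBy"), activity),
        (activity, iri(RDF_TYPE), iri(PROV + "Activity")),
        (activity, iri(PROV + "wasAssociatedWith"), iri(EX + record["operator"])),
    ]

record = {"id": "result-001", "test_id": "tensile-test-17",
          "E_GPa": 205.3, "operator": "operator-A"}
ntriples = "\n".join(f"{s} {p} {o} ." for s, p, o in annotate(record))
print(ntriples)
```

The resulting lines can be loaded into a triple store; the provenance triples record which activity produced the value and who was responsible for it.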
Semantically structured data bundled in triple stores are mainly retrieved with the query language SPARQL. The challenges here are reliable data access, a consistent ontology structure, and stable SPARQL endpoints. The difficulty of formulating SPARQL queries must also be addressed, for example, by tools with appropriate user interfaces such as Sparklis, [55] to realize the promise of SWT. Another approach is formulating queries in natural language to facilitate RDF data access for the broad MSE community. [56,57] In the long run, systematic modeling based on recurrent upper-level ontologies will induce self-similar patterns and unify queries through recurrent property paths.
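At its core, a SPARQL engine matches basic graph patterns (triple patterns containing variables) against the stored triples and joins the resulting variable bindings. The following Python sketch reproduces only this core idea for a hypothetical in-memory data set; it omits everything else SPARQL offers, such as filters, property paths, optional patterns, and remote endpoints.

```python
# Sketch of basic graph pattern matching, the core of SPARQL query evaluation.
# Terms starting with "?" are variables. Data and property names are hypothetical.

def match(pattern, triple, binding):
    """Extend binding so that pattern matches triple, or return None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            # Bind the variable, or check consistency with an earlier binding
            if new.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return new

def query(patterns, triples):
    """Return all variable bindings satisfying every triple pattern."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for b in bindings:
            for t in triples:
                b2 = match(pattern, t, b)
                if b2 is not None:
                    extended.append(b2)
        bindings = extended
    return bindings

data = [
    ("ex:s1", "ex:madeOf", "ex:Al6061"),
    ("ex:s1", "ex:hardnessHV", "107"),
    ("ex:s2", "ex:madeOf", "ex:Ti6Al4V"),
    ("ex:s2", "ex:hardnessHV", "349"),
]

# Similar in spirit to:
# SELECT ?s ?hv WHERE { ?s ex:madeOf ex:Al6061 . ?s ex:hardnessHV ?hv }
print(query([("?s", "ex:madeOf", "ex:Al6061"), ("?s", "ex:hardnessHV", "?hv")], data))
```

The shared variable ?s acts as the join between the two patterns, which is also why consistent ontology structure matters: if one data source used a different property for hardness, the second pattern would silently match nothing.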
Reasoning is a more challenging technique, substantively addressed in only a few of the surveyed papers. These mainly offer perspectives on its quality-enhancing potential for the completion, refinement, and consistency checking of resources such as ontologies. The small number of publications on this topic also signals the complexity and low level of maturity of SWT implementation in MSE.
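Consistency checking can be illustrated with owl:disjointWith: an individual typed with two disjoint classes signals an error in the data or in the ontology. The sketch below, with hypothetical class and individual names, detects only direct violations and assumes that types have already been expanded along the class hierarchy (for example, by RDFS subclass inference); a full OWL reasoner covers far more cases.

```python
# Sketch of a consistency check: an individual must not be typed with two
# classes declared disjoint (owl:disjointWith). Types are assumed to be
# already expanded along rdfs:subClassOf; all names are hypothetical.

def disjointness_violations(triples):
    """Return sorted (individual, class_a, class_b) tuples that violate disjointness."""
    disjoint = {(s, o) for (s, p, o) in triples if p == "owl:disjointWith"}
    disjoint |= {(o, s) for (s, o) in disjoint}  # disjointness is symmetric
    types = {}
    for (s, p, o) in triples:
        if p == "rdf:type":
            types.setdefault(s, set()).add(o)
    return sorted(
        (ind, a, b)
        for ind, classes in types.items()
        for a in classes
        for b in classes
        if a < b and (a, b) in disjoint  # a < b reports each pair only once
    )

kb = [
    ("ex:Metal", "owl:disjointWith", "ex:Polymer"),
    ("ex:sample7", "rdf:type", "ex:Metal"),
    ("ex:sample7", "rdf:type", "ex:Polymer"),  # contradictory annotation
    ("ex:sample8", "rdf:type", "ex:Metal"),
]
print(disjointness_violations(kb))
```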
In summary, by focusing on a selection of papers from the MSE-SWT intersection, we have already been able to identify research and technology gaps, which provides the opportunity to derive future research topics and directions. We also observed many isolated approaches and ideas. This range of technological implementations is driven, on the one hand, by the complexity of the ever-evolving SWT and, on the other hand, by the creativity of MSE domain experts in applying them. Nevertheless, certain intersecting themes are already crystallizing. For example, combinations of specific SW modules in the form of tool chains can clearly be observed, which could become established with increasing adoption in the MSE community.
Figure 4. Schematic representation of the beneficial application of SWT in MSE. The combination of SW standards enables the comparison of FAIR data sets from different sources with multiple formats. For this purpose, unstructured (meta)data from MSE experiments and simulations are transformed into a machine-processable, unified triple format using ontologies. The RDF triples form the basis for: a) knowledge extraction, where new and/or complex relationships are derived to optimize experiments and procedures; b) information retrieval, i.e., mainly comparison and correlation with data sets according to desired criteria; and c) machine learning models that become more effective using coherent and interoperable data sets. Data provenance increases reliability, as the data can be verified at every stage. Overall, data-driven materials research benefits from SWT applications, for instance, in semiconductor optimization.
www.advancedsciencenews.com www.advintellsyst.com
Beyond the selected works studied in detail, there are other areas in which SWT can be applied that require detailed analysis. One such area is, for example, the enhancement of existing material databases. [12] These databases themselves are highly valuable resources of versatile information. Nevertheless, unified access to and interoperability between these heterogeneous structures are lacking. Andersen et al. [58] provide a solution in this area with OPTIMADE. This API allows data to be retrieved across different databases using a uniform URL path. Material databases (https://www.optimade.org/providersdashboard/) that support the OPTIMADE API include Materials Project (https://materialsproject.org/), AFLOW (https://www.aflowlib.org/), COD (http://www.crystallography.net/cod/), TCOD (http://www.crystallography.net/tcod/), and NOMAD (https://cms.nomad-lab.eu/services/repo-arch), to name a few. Another important topic, already discussed in Section 3.4 and essential for data reproducibility and reliability, is tracking and providing complete knowledge of data provenance. In this context, Merkys et al. [59] demonstrated how prospective materials data and the corresponding databases can be automatically enriched with provenance information. The simulation results generated with an ontology-based crystal database are refined and made available with AiiDA's provenance tracker (https://aiida.readthedocs.io/projects/aiida-core/en/v1.0.0b1/concepts/provenance.html). Ghiringhelli et al. [60] highlight the pressing need for (meta)data mapping that meets the requirements of complex (meta)data structures. Prospectively, ontologies are seen as the key to interoperability. Adopting the FAIR principles, supported by the use of SWT, also has considerable advantages for material databases. However, it remains to be evaluated in detail to what extent existing databases will benefit, what effort is involved, and, in the end, what impact it will have on users.
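To give a flavor of the uniform access OPTIMADE offers, the sketch below builds a request URL for the /v1/structures endpoint with a filter on chemical composition, using the elements and nelements properties of the OPTIMADE filter language. The base URL is a placeholder (each provider publishes its own), no request is actually sent, and whether a given provider supports every filter feature should be checked against its documentation.

```python
from urllib.parse import urlencode, quote

def optimade_query_url(base_url, elements, nelements):
    """Build an OPTIMADE /v1/structures request URL filtering on composition."""
    element_list = ",".join(f'"{e}"' for e in elements)
    filter_expr = f"elements HAS ALL {element_list} AND nelements={nelements}"
    # quote (not quote_plus) encodes spaces as %20 instead of "+"
    params = urlencode({"filter": filter_expr}, quote_via=quote)
    return f"{base_url.rstrip('/')}/v1/structures?{params}"

# Hypothetical base URL; real ones are listed by the OPTIMADE providers.
url = optimade_query_url("https://optimade.example.org/", ["Si", "O"], 2)
print(url)
```

Because the path and filter grammar are shared, the same function could target any compliant provider by changing only the base URL, which is the interoperability gain described above.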
Fundamentally, user-friendly ways must be established to ensure the proper application of cutting-edge technological solutions for MSE research and its advancement. This can be achieved via predefined data and metadata input templates in electronic lab notebooks, accompanied by guidance with scalable examples, and, especially in teaching at educational institutions and universities, by creating an early understanding of, for example, the RDF data model and ontologies.
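A predefined input template can be as simple as a list of required fields with expected types, checked before an entry is stored. The sketch below assumes a hypothetical tensile-test template; the field names and units are illustrative, and a real electronic lab notebook would additionally map the validated fields to ontology terms.

```python
# Sketch of a predefined metadata input template, as an electronic lab
# notebook might enforce it. Field names, types and units are hypothetical.

TENSILE_TEST_TEMPLATE = {
    "specimen_id": str,
    "material": str,
    "test_temperature_K": float,
    "strain_rate_per_s": float,
    "standard": str,
}

def validate(entry, template):
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    for field, expected_type in template.items():
        if field not in entry:
            problems.append(f"missing field: {field}")
        elif not isinstance(entry[field], expected_type):
            problems.append(f"{field}: expected {expected_type.__name__}")
    return problems

entry = {"specimen_id": "S-17", "material": "AlSi10Mg",
         "test_temperature_K": 293.15, "strain_rate_per_s": "0.00025"}
print(validate(entry, TENSILE_TEST_TEMPLATE))
```

Rejecting incomplete or mistyped entries at input time is far cheaper than repairing them later, when the data are already needed for reproducing an experiment.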

Conclusion and Future Work
With this work, we provide a systematic review of publications from various MSE fields that use and integrate SWT to establish beneficial environments for FAIR research data management, data analysis, and data publication. For this purpose, we identified 19 relevant publications through a search strategy and addressed four research questions (see Section 2). We answered the first question, RQ1, by listing and discussing all surveyed articles, and RQ2 by providing a list of all applied SW methods and the respective SW resources. For RQ3, in Sections 3 and 4, we discussed the impact of creating and applying ontologies in the context of FAIR data, which is the basic framework for data-driven materials research approaches in MSE. We conclude that ontological knowledge generally provides a comprehensible presentation of key terminologies and relations that enables result classification, while enrichment with contextual information makes experiments and simulations reproducible and therefore increases the reuse of the resulting, more complete data sets. We identified the following specific challenges and perspectives at the MSE-SWT intersection:
Competency Building: Creating a fundamental understanding and competencies for data management and computation in MSE education and teaching, for example, using best practice solutions that demonstrate the benefits as educational materials.
Reusability Increase: Capturing, with ontologies, the MSE metadata necessary for the reproducibility of experiments and simulations to ensure reliable, interoperable data for data-driven research.
Technology Use Facilitation: Fostering the development of user-friendly, collaborative ontology and vocabulary tools that support, for example, the creation and consensus-building process.
Ontology Reuse: Increasing the reuse of quality ontologies by organizing them in a comprehensible and sustainable way in recognized MSE repositories. Quality-enhancing criteria include the use of domain-appropriate, precise definitions of terms, ease of use, and continuous maintenance and curation.
Common Community Standards: Reaching agreement within the MSE community on well-defined standards, for example, a limited number of mutually compatible upper-level ontologies, to achieve cross-subdomain interoperability.
SWT-Driven FAIR Data: Facilitating the generation of FAIR data by more user-friendly and robust SWT, implementing MSE use cases also with more advanced techniques such as reasoning.
We aimed to highlight the intersection between SWT and MSE for experts in both fields. SWT experts learn which technologies are used for which purposes, and the MSE community is provided with an introductory overview of standard SWT implementations and their benefits and added value. However, the limited number of studies found also shows that this interdisciplinary area is still in its early stages. Technological implementations are mainly used for Big and FAIR data tasks, with ontology creation and its use for data and metadata structuring standing out. At the same time, this work makes clear that challenges still exist that prevent the full potential of SWT from being exploited. This is demonstrated, for example, by the lack of generic tools for converting tabular data into RDF knowledge graphs, owing to the heterogeneity of the data and other peculiarities of the MSE discipline.
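Part of why generic tabular-to-RDF conversion is hard can be seen in a minimal sketch: each table needs its own mapping from column headers to ontology properties. The example below, with a hypothetical namespace, mapping, and table, converts CSV rows into triples and skips empty cells. Units embedded in column headers, as here, are one of the peculiarities a generic tool would have to interpret.

```python
import csv
import io

# Sketch: a column-to-property mapping that converts a CSV table into triples.
# Namespace, mapping and column names are hypothetical; each heterogeneous
# MSE table would need its own mapping, which is what makes generic tools hard.

EX = "https://example.org/mse/"

COLUMN_MAPPING = {
    "Alloy": EX + "alloyDesignation",
    "Yield strength (MPa)": EX + "yieldStrengthMPa",
}

def table_to_triples(csv_text, id_column, mapping):
    """Return (subject, predicate, literal) triples for all non-empty mapped cells."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = EX + "specimen/" + row[id_column]
        for column, predicate in mapping.items():
            value = (row.get(column) or "").strip()
            if value:  # skip empty cells instead of emitting empty literals
                triples.append((subject, predicate, value))
    return triples

table = """Specimen,Alloy,Yield strength (MPa)
S1,AA6061-T6,276
S2,AA7075-T6,
"""
for triple in table_to_triples(table, "Specimen", COLUMN_MAPPING):
    print(triple)
```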
SWT have the potential to become a key driver for data-driven materials science by making materials data available, discoverable, interoperable, and, eventually, reusable. Driven by existing and new project initiatives, we foresee increased development and use of ontologies, workflows, and databases in the future. In summary, with this work, we showcase the application of SWT, identify challenges, and thus provide orientation for aligning future MSE work at this intersection.
in the working group meetings. The authors would like to thank Rukeia El-Athman, Pedro Dolabella Portella, Robert Maaß, and Tilmann Hickel for their particularly valuable comments and discussions. [Correction added on August 21, 2023, after first online publication: Projekt DEAL funding statement has been added.] Open Access funding enabled and organized by Projekt DEAL.