A framework for connecting two interoperability universes: OGC Web Feature Services and Linked Data

Diverse studies have shown that about 80% of all available data are related to a spatial location. Most of these geospatial data are available as structured and semi‐structured datasets; they often use distinct data models, are encoded using ad‐hoc vocabularies, and are sometimes published in non‐standard formats. Hence, these data are isolated within silos and cannot be shared and integrated across organizations and communities. Spatial Data Infrastructures (SDIs) have emerged and contributed significantly to enhancing data discovery and accessibility based on OGC (Open Geospatial Consortium) Web services. However, finding, accessing, and using data disseminated through SDIs are still difficult for non‐expert users. Overcoming the current geospatial data challenges involves adopting the best practices to expose, share, and integrate data on the Web, that is, Linked Data. In this article, we present a framework for generating, enriching, and exploiting geospatial Linked Data from multiple and heterogeneous geospatial data sources. This proposal connects two interoperability universes (SDIs, more specifically Web Feature Services, WFS, and Semantic Web technologies) and is evaluated through a case study in the (geo)biodiversity domain.

The main contribution of this article is a framework for generating, enriching, and exploiting geospatial Linked Data from multiple and heterogeneous data sources. This proposal allows a connection between two interoperability universes, that is, SDIs and Semantic Web technologies, offering a promising approach to link SDIs, and more specifically Web Feature Services (WFS), to mainstream information technologies, to enable further areas of SDI application, and to facilitate the reuse and connection of multiple and heterogeneous geospatial information. Herein, we describe the processes of generation, enrichment, and exploitation, illustrating them with a case study in the (geo)biodiversity domain. Furthermore, an important innovation of this work is that we consider geometries, which are the key component for connecting geospatial data, so that they can be retrieved and interlinked on an unprecedented level in the context of these two universes of interoperability (SDIs and Linked Data).
The remainder of this article is structured as follows. We start by providing a description of a case study, which is the application context of our proposal, in Section 2. In Section 3, we present a brief overview of the WFS and geospatial Linked Data, and related work. Our framework for generating, enriching, and exploiting Linked Data from geospatial data sources is described in Section 4, detailed using the proposed case study. Section 5 provides a discussion. Finally, we summarize some conclusions and identify future work in Section 6.

| A CASE STUDY WITH BIODIVERSITY DATA SOURCES
Global biodiversity is changing at an unprecedented rate as a complex response to several human-induced changes in the global environment. The magnitude of this change is so large and so strongly linked to ecosystem processes and society's use of natural resources that biodiversity change is now considered an important global change in its own right (Sala et al., 2000).
According to the Convention on Biological Diversity (https://www.cbd.int/), Colombia is listed as one of the world's "megadiverse" countries, hosting close to 10% of the planet's biodiversity. Worldwide, it ranks first in bird and orchid species diversity and second in plants, butterflies, freshwater fishes, and amphibians. With 314 types of ecosystems, Colombia possesses a rich complexity of ecological, climatic, and biological components. This varied richness represents a significant challenge for implementing sustainable development initiatives.
In recent years, the biodiversity informatics community has made tremendous strides by creating shared common vocabularies such as the Darwin Core (DwC) (Wieczorek et al., 2012) and publishing mechanisms such as the Integrated Publishing Toolkit (IPT) (Robertson et al., 2014). However, data published in this kind of system are stored in isolation for each dataset, which limits their integrated analysis with context information. Hence, coordinating, integrating, and making effective use of the vast amount of environmental data, which are generated by many different providers and across research domains, remain challenging (Stucky et al., 2014; Walls et al., 2014).
Driven by this scenario, we present a case study where heterogeneous (geo)biodiversity data sources are semantically integrated, enriched, and exploited in order to connect data from different domains and communities. These data sources are created and maintained by diverse national and international institutions associated with the biodiversity domain, such as the Alexander von Humboldt Research Institute (https://www.humboldt.org.co/en), the Institute of Hydrology, Meteorology and Environmental Studies (IDEAM; https://www.ideam.gov.co), the Global Biodiversity Information Facility (GBIF; https://www.gbif.org), the Colombian Biodiversity Information System (Sistema de Información sobre Biodiversidad de Colombia, SIB Colombia), and the European Nature Information System (EUNIS; https://eunis.eea.europa.eu/). This case study focuses on (geo)biodiversity information of the Cundinamarca Department, which was selected for its diversity of ecosystems and because a large amount of data has been gathered for it.
The heterogeneity of these data sources arises from their diversity of subdisciplines (e.g. ecosystems/community, marine/freshwater/terrestrial, and plant/animal/microbial biodiversity). In addition, adjacent disciplines in earth and life science, as well as relevant disciplines in the social sciences and humanities, have their own terminologies, specialized measurements, data models, and formats that generate heterogeneity (Reichman, Jones, & Schildhauer, 2011). Therefore, the main challenges of this case study are related to how to deal with such data sources in order to prevent data isolation and enable cross-dataset analysis.
In order to address these challenges, spatial data fusion techniques play a crucial role in fostering an integrated view of distributed spatial data sources on the Web. In addition, flexibility and interoperability are key factors in such data integration. Hence, the use of standards is a basic requirement (Wiemann & Bernard, 2016). In this sense, advances in Linked Data, ontologies, and geospatial semantics (Kuhn, 2005) are considered in this case study because they provide the decisive glue between models, data, and users (Janowicz & Hitzler, 2012). Moreover, the advantages of Linked Data technologies are recognized not only in biodiversity research and its related disciplines (e.g. Madin, Bowers, Schildhauer, & Jones, 2008; Roderic & Page, 2008; Deans, Yoder, & Balhoff, 2012; Parr, Guralnick, Cellinese, & Page, 2012), but also throughout the life sciences (Stevens, 2002; Blake & Bult, 2006; Good & Wilkinson, 2006; Antezana, Kuiper, & Mironov, 2009; Chen, Yu, & Chen, 2013).
Thus, we take different "traditional" (geo)biodiversity datasets and OGC Web Services (OWS) and transform them into machine-processable data. After that, we link these data with other sources of the Web of Data and OWS, using geometrical information related to (geo)biodiversity features for performing spatial comparisons between them. We also enrich existing biodiversity Linked Data with OWS, concretely with WFS such that, after transforming, linking, and enriching the (geo)biodiversity data, we exploit these data, setting connections between two interoperability universes, that is, SDIs and geospatial Linked Data.
In short, this work presents a process for generating, enriching, and exploiting (geo)biodiversity Linked Data from multiple and heterogeneous (national and international) data sources using our developed framework.
Furthermore, a connection between SDIs and Semantic Web technologies is established to facilitate the reuse and connections of contained information within these dissociated universes.

| BACKGROUND AND RELATED WORK
In this section, we present a brief introduction to WFS and geospatial Linked Data. Also, we provide a description of the main proposals related to our work.

| WFS
WFS (https://www.opengeospatial.org/standards/wfs; Sinha, 2008) is an interface specified by the OGC (https://www.opengeospatial.org/) that allows the exchange of geographic data across the Web. It defines the rules for requesting and retrieving geographic information using HTTP. The interface describes the data manipulation operations on geographic features. The Extensible Markup Language (XML)-based Geography Markup Language (GML) is used for the exchange of information. It should be noted that WFS supports the vector data model.
Within this kind of service, a feature may contain one or many geometries, optionally with attribute values.
The standard operations of WFS are based on the GetCapabilities, DescribeFeatureType, and GetFeature requests (Vretanos, 2010).
The GetCapabilities operation generates a service metadata document describing the WFS provided by a server. This XML document is composed of the following sections:
• Service. This section provides information about the service itself. Therefore, it contains service metadata, such as name, title, abstract, keyword, etc.
• Capabilities. This section describes the operations that are supported by a specific service. According to the WFS specification (Vretanos, 2010), this service must support, at least, GetCapabilities, DescribeFeatureType, and GetFeature operations.
• Feature type. This section contains a list of feature types that the WFS can return. Besides, it defines transactions and query elements that are supported by each feature type.
Listing 1 shows an example of how a GetCapabilities request can be sent to the WFS server.
LISTING 1 An example of GetCapabilities request
https://www.example.com/wfs?SERVICE=WFS&REQUEST=GetCapabilities
The DescribeFeatureType operation returns a schema description of feature types offered by an instance of the WFS. All features are composed of a set of properties, which are described in an XML schema. This schema description can be obtained through the DescribeFeatureType operation, as shown in Listing 2.
LISTING 2 An example of DescribeFeatureType request
https://www.example.com/wfs?SERVICE=WFS&VERSION=1.0.0&REQUEST=DescribeFeatureType&TYPENAME=FEATURE_ID&SRSNAME=EPSG:4326
The GetFeature operation returns a selection of features from a data store. WFS processes a GetFeature request and returns a response document to the client that contains zero or more feature instances that satisfy the query expressions specified in the request. In this operation, the client asks for one or more features in a specific WFS encoding version, as shown in Listing 3. Besides, this operation has the capability to perform queries using their own attributes or parameters, such as bounding box, type name, feature ID, etc. Finally, the client receives a document, which can be encoded in different formats (e.g. KVP [key/value pair], XML, or GML), that contains the feature and its attribute information.
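The three requests described above share the same key/value structure, so they can be assembled by a single helper. The following is a minimal sketch in Python using only the standard library; the function name `wfs_request_url`, the bounding box values, and the reuse of `FEATURE_ID` as a type name are assumptions made for illustration, and no request is actually sent.

```python
from urllib.parse import urlencode


def wfs_request_url(base_url, request, version=None, **params):
    """Build a WFS key/value-pair (KVP) request URL (illustrative helper)."""
    query = {"SERVICE": "WFS"}
    if version is not None:
        query["VERSION"] = version
    query["REQUEST"] = request
    # Upper-case the extra keys so the result follows the style of the listings.
    query.update({key.upper(): value for key, value in params.items()})
    # safe=":," keeps EPSG codes and bounding boxes readable in the URL.
    return base_url + "?" + urlencode(query, safe=":,")


BASE = "https://www.example.com/wfs"  # placeholder server, as in the listings

capabilities_url = wfs_request_url(BASE, "GetCapabilities")
describe_url = wfs_request_url(
    BASE, "DescribeFeatureType", version="1.0.0", typename="FEATURE_ID"
)
# Hypothetical bounding box given as minx,miny,maxx,maxy.
feature_url = wfs_request_url(
    BASE, "GetFeature", version="1.0.0",
    typename="FEATURE_ID", bbox="-74.5,4.0,-73.5,5.0",
)
```

A client would then issue an HTTP GET on each URL and parse the XML or GML response.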

| Geospatial Linked Data
The World Wide Web Consortium (W3C) has been working on a new generation of standards with capabilities for semantic expression and data integration to make the Semantic Web a reality. This Web is an extension of the current Web in which information is given well-defined meaning, better enabling computers and people to work in cooperation (Berners-Lee, Hendler, & Lassila, 2001). In this context, Linked Data has been identified as one of the best practices of the Semantic Web for creating a shared information space through interlinking single, context-specific data fragments within a distributed environment (Bizer, 2009; Auer, Lehmann, Ngomo, & Zaveri, 2013).
Since the representation and publication of geospatial data as Linked Data has only recently been addressed, we provide some background on the Linked Data initiative. The principles of Linked Data were first outlined by Berners-Lee (2006) using the following four guidelines: (1) use of URIs as names for things; (2) use of HTTP URIs so that people can look up those names; (3) when someone looks up a URI, provide useful information, using standards, such as the Resource Description Framework (RDF) and SPARQL Query Language for RDF (SPARQL); and (4) include links to other URIs, so that they can discover more things.
In these aforementioned guidelines, two key elements stand out in the Linked Data scenario: RDF and SPARQL.
On the one hand, RDF (https://www.w3.org/TR/rdf11-concepts/) is the standard knowledge representation language for the Semantic Web, an evolution of the World Wide Web that aims to provide a well-founded infrastructure for publishing, sharing, and querying structured data on the Semantic Web (McDonald & Levine-Clark, 2017).
RDF provides a framework for representing information on the Web, whose abstract syntax (a data model) has two key data structures: (1) RDF graphs, which are sets of subject-predicate-object triples, where the elements may be URIs, blank nodes, or datatyped literals (these graphs are used to express descriptions of resources, such as personal information, social networks, and metadata about digital artifacts, as well as providing a means of integration over disparate sources of information); and (2) RDF datasets, which are used to organize collections of RDF graphs, and comprise a default graph and zero or more named graphs.
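To make the two data structures concrete, the following sketch models an RDF graph as a Python set of triples and an RDF dataset as a default graph plus named graphs. It uses only the standard library rather than an RDF toolkit; the `https://example.org/` URIs and the sample occurrence are hypothetical, and the Darwin Core terms are used purely for illustration.

```python
# Hypothetical namespace for the example resources.
EX = "https://example.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DWC = "http://rs.tdwg.org/dwc/terms/"  # Darwin Core terms namespace

# An RDF graph is a set of (subject, predicate, object) triples.
default_graph = {
    (EX + "occurrence/1", RDF_TYPE, DWC + "Occurrence"),
    (EX + "occurrence/1", DWC + "scientificName", '"Vultur gryphus"'),
}

# An RDF dataset is one default graph plus zero or more named graphs.
dataset = {
    "default": default_graph,
    "named": {
        EX + "graph/provider-a": {
            (EX + "occurrence/2", DWC + "scientificName", '"Tremarctos ornatus"'),
        },
    },
}


def match(graph, s=None, p=None, o=None):
    """Return the triples of a graph that match a pattern (None = wildcard)."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


names = match(default_graph, p=DWC + "scientificName")
```

The `match` helper corresponds to the simplest kind of triple pattern that a query language over RDF has to evaluate.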
On the other hand, SPARQL (https://www.w3.org/TR/sparql11-query/) is a semantic query language able to retrieve and manipulate data stored in RDF. This language can be used to express queries across diverse data sources, whether the data are stored natively as RDF or viewed as RDF via middleware. Besides, SPARQL contains capabilities for querying required and optional graph patterns, along with their conjunctions and disjunctions.
SPARQL also supports aggregation, subqueries, negation, creating values by expressions, extensible value testing, and constraining queries by source RDF graph. The results of SPARQL queries can be result sets or RDF graphs.
In the geospatial context, the transformation and publication of geographical data as Linked Data was pioneered by initiatives from GeoNames (https://www.geonames.org/ontology/documentation.html), OpenStreetMap (Auer, Lehmann, & Hellmann, 2009), and Ordnance Survey (Goodwin, Dolbear, & Hart, 2009). After these initiatives, an increasing number of geospatial datasets have been published in the Linked Data cloud (https://lod-cloud.net/). Currently, among the existing vocabularies for describing geospatial Linked Data, the Basic Geo Vocabulary (https://www.w3.org/2003/01/geo/) (WGS84 lat/long) is one of the top five vocabularies by usage in datasets (https://stats.lod2.eu/stats) in the context of the Web of Data. This is a basic vocabulary that provides a namespace for representing lat(itude), long(itude), and other information about spatially located things, using WGS84 as a reference datum. Listing 4 shows an example using the WGS84 vocabulary.
OGC has defined an approach for representing and querying geospatial data on the Semantic Web. This effort, called GeoSPARQL (Perry & Herring, 2012), defines a vocabulary for representing geospatial data in RDF, and an extension to the SPARQL query language for processing geospatial data. In addition, GeoSPARQL is designed to accommodate systems based on qualitative spatial reasoning and systems based on quantitative spatial computations. This proposal allows distinct kinds of geometry (e.g. polygons, lines, points, multipoints, etc.) to be described, supports multiple coordinate reference systems, and brings to the Linked Data cloud the possibility of expressing spatial relations for querying geographic datasets (e.g. intersects, touches, overlaps, etc.). Listing 5 shows an example of a geometry expressed according to GeoSPARQL. Geometries are defined by the class Geometry and the coordinates can be encoded in an RDF literal of type Well-Known Text (WKT), which corresponds to Herring (2010), using a single RDF property, namely asWKT.
GeoSPARQL also offers the possibility of using GML to encode geometries. In this case, the data type (GMLLiteral), property (asGML), and the URL for the geometry type have to be changed accordingly. Further details about GeoSPARQL are described in Perry and Herring (2012).
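As an illustration of the two vocabularies discussed above, the following sketch serializes the same point first with the WGS84 lat/long properties and then as a GeoSPARQL geometry with a WKT literal. It is a hand-written Turtle generator rather than the output of any tool mentioned in this article; the function name, the resource URI, and the coordinates are assumptions. The WKT literal carries no CRS URI, so coordinates are written in longitude/latitude order.

```python
def point_as_turtle(uri, lon, lat):
    """Describe one point with both the WGS84 and GeoSPARQL vocabularies."""
    wkt = f"POINT({lon} {lat})"  # longitude first when no CRS URI is given
    return "\n".join([
        "@prefix geo: <http://www.opengis.net/ont/geosparql#> .",
        "@prefix sf: <http://www.opengis.net/ont/sf#> .",
        "@prefix wgs84: <http://www.w3.org/2003/01/geo/wgs84_pos#> .",
        "",
        # WGS84 Basic Geo vocabulary: plain latitude/longitude properties.
        f'<{uri}> wgs84:lat "{lat}" ;',
        f'    wgs84:long "{lon}" ;',
        # GeoSPARQL: the feature points to a separate geometry resource.
        f"    geo:hasGeometry <{uri}/geometry> .",
        "",
        f"<{uri}/geometry> a sf:Point ;",
        f'    geo:asWKT "{wkt}"^^geo:wktLiteral .',
    ])


# Hypothetical resource located near Bogotá.
turtle = point_as_turtle("https://example.org/occurrence/1", -74.08, 4.6)
print(turtle)
```

Encoding the geometry in GML instead would mean replacing `geo:asWKT` and `geo:wktLiteral` with `geo:asGML` and `geo:gmlLiteral`, and the WKT string with a GML fragment.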
Geospatial Linked Data are stored and managed by triple stores, also known as RDF stores or knowledge bases. According to Battle and Kolas (2012), they are capable of better handling several types of problems which relational databases struggle with, or are not intended to perform: queries with many joins across entities (Weiss, Karras, & Bernstein, 2008), queries with variable properties (Weiss et al., 2008), and ontological inference on

| Related work
Several works have contributed to provide geospatial context to the Web of Data. Thus, different proposals appear in the state-of-the-art associated with geospatial reference data (Varanka, 2008 In this context, different proposals have focused concretely on the transformation process from geospatial data to RDF. Schade and Cox (2010) present a proposal for transforming GML to RDF and for extending data types offered through WFS to other types, such as GML, RDF, and HTML. Usery and Varanka (2012) describe a conversion from vector and raster data to RDF. The authors encode vector data in GML format, by means of pre-computing of topological relations, and then convert them into RDF triples. Another proposal is described in van den Brink, Janssen, Quak, Stoter, and Kadaster (2014), where the authors present a semi-automatic transformation from geographic information models and GML data to RDF data.
Likewise, different tools have appeared in order to make the transformation of "traditional" geospatial data (shapefiles or geospatial databases) into RDF a little easier. Oracle spatial, etc.). This tool is based on Geometry2RDF and uses the WGS84 vocabulary and several geometric types of GeoSPARQL. Kyzirakos, Vlachopoulos, Savva, Manegold, and Koubarakis (2014) describe GeoTriples, a tool that allows the transformation of geospatial data stored in spatially enabled relational databases and raw files. This tool is implemented as an extension to the D2RQ platform (https://d2rq.org/) and uses GeoSPARQL and stSPARQL as the target vocabulary. The main limitation of these tools is that they deal only with "traditional" geospatial data (shapefiles, GML, and geospatial databases) and do not support OGC Web services.
On the other hand, there are some efforts where interactions between SDIs and the Semantic Web have been collected. In Lutz and Klien (2006), the authors present an approach to ontology-based geographic information retrieval that contributes to solving problems related to semantic heterogeneity using a graphical user interface (GUI) and a well-known domain vocabulary. Roman and Klien (2007)

| A FRAMEWORK FOR GENERATING, ENRICHING, AND EXPLOITING LINKED DATA FROM GEOSPATIAL DATA SOURCES
In this section, we present the core of our approach for the generation, enrichment, and exploitation of geospatial information using Linked Data principles in combination with SDIs. We have developed a framework, called GeoLOD (https://github.com/jasaavedra/GeoLOD), that gives support to this approach, and which is summarized graphically in Figure 1. Our framework consists of the following five main components: (1) generation, which performs a transformation from geospatial data to RDF; (2) linking, which sets connections with (geospatial) Linked Data and WFS; (3) enrichment, which enables us to enrich Linked Data with attributes collected from WFS; (4) repository, which stores the obtained results after applying our framework components; and (5) exploitation, which allows the display and querying of geospatial data using Linked Data principles in combination with SDIs.
In order to guide the application of our framework, two scenarios have been defined: generation and connection, and exploitation (see Figure 1). These scenarios consider access restrictions and needs related to data. The first scenario uses four components (generation, linking, enrichment, and repository) of our framework. On the other hand, the second scenario of our proposal is associated with exploitation, which is implemented in our framework through visualization and data query.
Next, we briefly describe these components, which are flexible and were tested with a case study in the (geo) biodiversity domain, as a running example, using the workflow shown in Figure 1 and data sources described in Section 2.

| Generation and connection
In this scenario, transformation, linking, and enrichment components are performed. The applicability of the different components in the workflow depends on two aspects: (1) original data may be transformed and published without restrictions, or (2) there exist some data access limitations, but partial transformation of some relevant attributes may be performed. Next, we provide details associated with the components of this scenario.

| Transforming geospatial data to RDF
This component of our framework transforms geospatial data from Geographic Information Systems (GIS) and SDIs, concretely shapefiles and WFS, to RDF. We opted for the use of RDF as the normal form for the geospatial datasets to be published, since we want to harmonize the different formats of our datasets (databases, shapefiles, Web services, etc.), to avoid using proprietary formats, and because we are pursuing a Linked Data approach. As described in Section 3, RDF is one of the standard languages in which information has to be made available, according to the Linked Data principles. The reason for this is that it offers several advantages, such as the provision of an extensible schema, de-referenceable URIs, and, because RDF links are typed, safe merging (linking) of different datasets (Omitola et al., 2010).
In this transformation process, we recommend exploiting all the advantages of Linked Data through a complete transformation from original geospatial data to RDF when there is complete access to data. Also, our approach proposes two different alternatives, described in Sections 4.1.2 and 4.1.3, when data access is limited.
We recommend performing this transformation process by using, reusing, or developing ontologies, although Linked Data can be generated with or without the use of a specific vocabulary, since just transforming data to RDF does not incorporate any semantics, as pointed out by Jain, Hitzler, Yeh, Verma, and Sheth (2010). Moreover, adding ontologies in this process allows making explicit the meaning of concepts in the datasets used, creating a harmonized model for the considered datasets (using common and shared vocabularies) and making it easier to search and access geospatial information (Vilches-Blázquez et al., 2014).
FIGURE 1 A framework for generating, enriching, and exploiting geospatial Linked Data
In order to perform the transformation process, our framework contains two elements, called SHP2GeoSPARQL and WFS2GeoSPARQL, which allow transforming shapefiles and WFS into RDF according to the GeoSPARQL vocabulary (Perry & Herring, 2012). These tools are based on: (1)  optional capabilities, such as reprojecting geometries into another spatial reference system, setting spatial relations between different geographical features, and calculating centroids (only for polygons). These operations are carried out thanks to the integrated GeoTools library and according to user specifications for the source and target operation. An example of the input and output of this transformation process using WFS2GeoSPARQL is shown in Figure 2.
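Of the optional capabilities listed above, the centroid calculation is the easiest to illustrate in isolation. The sketch below computes the planar centroid of a simple polygon ring with the shoelace formula and returns it as WKT. It is written in plain Python for illustration and is not the GeoTools-based code used by the tools; the function names are hypothetical, and treating geographic coordinates as planar is itself a simplifying assumption.

```python
def polygon_centroid(ring):
    """Planar centroid of a simple polygon ring given as [(x, y), ...]."""
    if ring[0] != ring[-1]:
        ring = ring + [ring[0]]  # close the ring if needed
    area2 = cx = cy = 0.0  # area2 accumulates twice the signed area
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon: zero area")
    # Centroid = sum / (6 * area) and area = area2 / 2, hence 3 * area2.
    return cx / (3 * area2), cy / (3 * area2)


def centroid_wkt(ring):
    """Return the centroid as a WKT point string."""
    x, y = polygon_centroid(ring)
    return f"POINT({x} {y})"


square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
centroid = polygon_centroid(square)
```

The resulting WKT string can be attached to a feature through `asWKT` in the same way as any other geometry.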
After RDF generation, the system stores the output in our repository, an RDF triple store (see Figure 1), where it can be queried through a SPARQL Endpoint. In the context of our case study, we have deployed Parliament (Kolas, Emmons, & Dean, 2009), since we have found that it offers an excellent tradeoff between load and query performance and supports the capabilities that GeoSPARQL provides (Battle & Kolas, 2012). This triple store is available at https://ec2-54-94-208-47.sa-east-1.compute.amazonaws.com:8080/parliament/. However, as mentioned previously, there exist other RDF triple stores that support GeoSPARQL, such as Apache Marmotta, OpenLink Virtuoso, Strabon, and USeekM.

| Linking data from geospatial data sources
The fourth Linked Data principle is: "Include links to other URIs, so that they can discover more things." Thus, an increasing number of thematic datasets are published as RDF graphs and linked to other datasets by identifying equivalent resources in other datasets (Feliachi et al., 2013). This highlights that the value and utility of data increase the more they are interrelated with other data (Heath & Bizer, 2011). The result of this interlinking process is often a list of owl:sameAs links between entities of each dataset. These relationships can be discovered using several tools that provide technological support, such as SILK (https://silkframework.org) or LIMES (https://aksw.org/Projects/LIMES.html), which have also started to include some geospatial metrics in the link discovery process.
Therefore, we have taken advantage of the benefits of SILK to provide additional advantages to our framework in the linking process. This tool is used for discovering links between generated RDF from geospatial data and other data of the Web of Data.
Thus, in the context of our case study, we have used SILK for interconnecting our previously generated RDF with information about species, habitats, and sites from EUNIS (https://eunis.eea.europa.eu/about) and DBpedia (https://wiki.dbpedia.org/). For that, we have configured SILK using different similarity metrics (such as Jaro, jaroWinkler, Levenshtein, and qGrams), and the link discovery process is performed based on the "scientific name" of species. An example of the links set between both RDF data sources is shown in Listing 7. It is important to notice that SILK focuses just on RDF data, sets owl:sameAs links, and deals with different spatial distances (centroid distance and minimum distance between two geometries) and relations (contains, intersects, touches, crosses, etc.). However, within the geospatial information domain, many features are represented by complex geometries and are collected in different types of geospatial resources (formats), for instance, WFS.
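The idea behind name-based link discovery can be sketched with one of the metrics named above. The following example implements the Levenshtein distance in plain Python, normalizes it to a similarity score, and emits an owl:sameAs triple for each pair of scientific names above a threshold. It does not reproduce SILK's configuration or behaviour; the function names, the 0.9 threshold, and the example URIs are assumptions chosen for illustration.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def levenshtein(a, b):
    """Edit distance between two strings (dynamic programming, two rows)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Normalized similarity in [0, 1], ignoring case and outer whitespace."""
    a, b = a.strip().lower(), b.strip().lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


def same_as_links(source, target, threshold=0.9):
    """Compare every pair of names and return N-Triples owl:sameAs links."""
    return [
        f"<{s_uri}> <{OWL_SAME_AS}> <{t_uri}> ."
        for s_uri, s_name in source.items()
        for t_uri, t_name in target.items()
        if similarity(s_name, t_name) >= threshold
    ]


# Hypothetical URIs keyed to the scientific name of each species.
local = {"https://example.org/species/1": "Vultur gryphus"}
remote = {
    "https://example.org/remote/10": "vultur gryphus",
    "https://example.org/remote/11": "Tremarctos ornatus",
}
links = same_as_links(local, remote)
```

Comparing all pairs is quadratic in the number of resources, so a real link discovery run would typically block or index candidates first.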
Our framework takes into account these issues by allowing the establishment of connections between two interoperability universes, such as (geospatial) Linked Data and SDIs.
SDIs often maintain directories of public geospatial Web services built from Web services listed in their registries and play the role of discovery node (Lopez-Pellicer, Renteria-Agualimpia, Nogueras-Iso, Zarazaga-Soria, &
Moreover, the WFS2LD_Connector allows us to return a selection of features from selected WFS, including geometrical and attribute values through the GetFeature operation. If the provided URL cannot be invoked successfully (e.g. the service is not available anymore), our system cannot continue without further human intervention. An instance of the GetFeature operation is shown in Listing 10, filtered by typeName and FeatureID, and an excerpt of the output associated with this GetFeature operation is shown in Listing 11. After that, our framework performs a link discovery process, using WFS2LD_Connector, between the data gathered from the WFS and a dataset selected from the LOD cloud. For that, after obtaining the data of the WFS, we have to configure our framework to also gather data from the aforementioned cloud. Thus, our framework requires connection parameters to the SPARQL Endpoint of the selected dataset from the LOD cloud and a (Geo)SPARQL query in order to collect existing features and geometries within this dataset. An example of a (Geo)SPARQL query is shown in Listing 12. Once we have collected both geospatial datasets (WFS and RDF), the WFS2LD_Connector performs the link discovery based on a spatial analysis process, which allows setting relations (e.g. contain, intersect, touch, etc.) between both datasets. Currently, WFS2LD_Connector requires geometries to be serialized as WKT to carry out this comparison. The output of this process is an RDF file where spatial matching is recorded according to the GeoSPARQL vocabulary. Furthermore, geospatial features of the WFS used are also added to this new RDF file, where pointers to different GetFeature requests are included as URIs. In this way, our system allows us to maintain a connection with the original data (WFS) and obtain these data in GML format, which is very useful for a "traditional" use of this information, for instance, in GIS.
LISTING 12
Following the previous example, we performed a spatial comparison between the previously generated RDF data (associated with biodiversity records) and WFS from the Alexander von Humboldt Research Institute, gathering information on areas of bird conservation in Colombia. In this case, our framework identifies which biodiversity records are contained within a specific area for bird conservation. When a result of this spatial comparison is set, our framework explicitly defines a simple features topological relation (Perry & Herring, 2012), concretely "Within." Furthermore, geospatial features of the WFS used are also added to this new RDF file, adding GetFeature requests that are included as URIs. Listing 13 shows an example of the output associated with this process.
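The spatial comparison described above can be sketched for the simplest case, a point tested against a polygon. The example below uses a standard ray-casting test in plain Python and records a match as a GeoSPARQL `sfWithin` triple whose object is a GetFeature request URL, mirroring the idea of keeping a pointer to the original WFS feature. It is not the WFS2LD_Connector implementation; the function names, the URIs, the coordinates, and the `FEATUREID` value are hypothetical.

```python
SF_WITHIN = "http://www.opengis.net/ont/geosparql#sfWithin"


def point_in_polygon(x, y, ring):
    """Ray-casting test: is (x, y) inside the ring [(x, y), ...]?"""
    inside = False
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can count.
        if (y0 > y) != (y1 > y):
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside
    return inside


def within_links(records, areas):
    """Return N-Triples stating which record lies within which area."""
    return [
        f"<{record_uri}> <{SF_WITHIN}> <{area_uri}> ."
        for record_uri, (x, y) in records.items()
        for area_uri, ring in areas.items()
        if point_in_polygon(x, y, ring)
    ]


# Hypothetical biodiversity records (lon, lat) and one conservation area whose
# URI is a GetFeature request pointing back to the original WFS feature.
records = {
    "https://example.org/record/1": (-74.0, 4.5),
    "https://example.org/record/2": (-72.0, 4.5),
}
areas = {
    "https://www.example.com/wfs?SERVICE=WFS&VERSION=1.0.0&REQUEST=GetFeature"
    "&TYPENAME=FEATURE_ID&FEATUREID=FEATURE_ID.1":
        [(-74.5, 4.0), (-73.5, 4.0), (-73.5, 5.0), (-74.5, 5.0)],
}
links = within_links(records, areas)
```

Polygons with holes, multipart geometries, and points lying exactly on a boundary need a full geometry library and are outside the scope of this sketch.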
Finally, the outputs (RDF files) are stored in a new graph into the RDF triple store associated with our framework (see "Repository" in Figure 1), where results can be queried.

| Enriching Linked Data with WFS
Any data source often contains implicit references to other data types, and geographic data sources are no exception. These references are more relevant when we deal with expert domain data sources. However, it is often difficult for non-expert users to understand and use these hidden data (Tandy et al., 2017). In this sense, according to Lehmann et al. (2015), the aim of the spatial enrichment process is to retrieve such information and make it explicit.
Our framework materializes this viewpoint by gathering new data from available WFS, which are then used to enrich existing Linked Data (following the workflow presented in Figure 1). In order to achieve this goal, the workflow relies on WFS2LD_Connector to perform the aforementioned spatial analysis process. In the context of our case study, we have invoked the following WFS: https://geoapps.ideam.gov.co:8080/geoserver/Clima/wfs?, which was developed by IDEAM (https://www.ideam.gov.co/geoservicios-institucionales) and contains Lang weather sections; we have also used the previously generated RDF data.
When spatial relations between resources of both data sources (WFS and RDF data) are established, the workflow uses the WFS2LD_Richer component of our framework, which identifies common attributes within both data sources (exact matching of attribute names) and collects, through GetFeature requests, the geospatial attributes of the WFS that differ. The mismatched attributes can be completely transformed to RDF or they can be shown to an end user, who selects those attributes that are transformed to RDF, since there may be many attributes from WFS that are not suitable for enriching the original Linked Data. These new (
Finally, following the aforementioned workflow (see Figure 1), the obtained outputs (RDF files) are stored in a new graph in the RDF triple store associated with our framework, where results can be queried. Likewise, when RDF data are managed by third parties, we recommend publishing these outputs in the same Linked Data repository from which they were collected.
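The attribute comparison step described above amounts to a set difference over attribute names, followed by an optional user selection. The sketch below illustrates it in plain Python; it is not the WFS2LD_Richer code, and the function names, the attribute names, the property namespace, and the sample feature are hypothetical. Literal escaping is deliberately left out.

```python
def split_attributes(rdf_attributes, wfs_attributes):
    """Exact name matching: return (common, WFS-only) attribute names."""
    common = sorted(set(rdf_attributes) & set(wfs_attributes))
    candidates = sorted(set(wfs_attributes) - set(rdf_attributes))
    return common, candidates


def enrich(resource_uri, wfs_feature, candidates, selected=None,
           namespace="https://example.org/property/"):
    """Turn the chosen WFS-only attributes into N-Triples for one resource."""
    # selected=None means "transform every mismatched attribute".
    chosen = candidates if selected is None else [
        name for name in candidates if name in selected
    ]
    return [
        f'<{resource_uri}> <{namespace}{name}> "{wfs_feature[name]}" .'
        for name in chosen
    ]


# Hypothetical attribute names on each side.
rdf_attributes = ["scientificName", "locality"]
wfs_feature = {"locality": "Cundinamarca", "climate": "Humid", "objectid": 17}

common, candidates = split_attributes(rdf_attributes, wfs_feature)
triples = enrich(
    "https://example.org/record/1", wfs_feature, candidates,
    selected={"climate"},  # the end user keeps only the useful attribute
)
```

Passing `selected=None` corresponds to the complete transformation of all mismatched attributes, while a non-empty selection corresponds to the user-driven case.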

| Exploitation
Data are ready to be exploited once they have been transformed, linked, and enriched (see Figure 1). The final goal of connecting and opening these data is that users can use semantic tools and SDI technologies in a coordinated way to search, analyze, visualize, or programmatically consume all the available data. Therefore, deploying applications on top of the generated (geo)biodiversity Linked Data is required to exploit these data and provide rich GUIs to users (Vilches-Blázquez et al., 2014). There are several tools available for exploiting geospatial Linked Data through graphical interfaces, such as LOD4WFS (https://github.com/jimjonesbr/lod4wfs) (Jones et al., 2014), Sextant (https://sextant.di.uoa.gr/), Map4RDF (https://oeg-upm.github.io/map4rdf/) (Llaves, Corcho, & Fernandez-Carrera, 2014), LinkedGeoData Browser (browser.linkedgeodata.org), OsOpenSpace (https://www.ordnancesurvey.co.uk/business-and-government/products/os-openspace/), or Mappify (https://github.com/GeoKnow/Mappify) (Lehmann et al., 2015). However, it is important to note that most of these tools create map views and transform Linked Data into map services. Therefore, these services cannot be queried using the SPARQL protocol directly through the mentioned tools.
In this scenario, our system reuses LOD4WFS Adapter, a service that listens to WFS requests and converts them into SPARQL. After the SPARQL query is processed, the LOD4WFS Adapter receives the RDF result set from a triple store, encodes it as a WFS XML document, and returns it to the client (e.g. GIS). Based on this tool, we enhance the query processing, since our framework allows GeoSPARQL queries to be issued against an RDF triple store and the results to be updated automatically on the WFS. This is another way in which our framework allows setting connections between two interoperability universes (geospatial Linked Data and SDIs).
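The translation of a WFS request into a SPARQL query can be sketched as follows. This is a simplified illustration of the idea, not the LOD4WFS Adapter code: the mapping from feature type names to RDF classes, the example URIs, and the function name are assumptions, while the `geo:hasGeometry` and `geo:asWKT` properties come from the GeoSPARQL vocabulary.

```python
from urllib.parse import parse_qs, urlparse

GEO_PREFIX = "PREFIX geo: <http://www.opengis.net/ont/geosparql#>"

# Hypothetical mapping from WFS feature type names to RDF classes.
FEATURE_TYPE_TO_CLASS = {
    "bio:species": "http://example.org/ontology#Species",
}


def getfeature_to_sparql(request_url):
    """Translate a key-value encoded WFS GetFeature request into a SPARQL
    query that returns each feature together with its WKT geometry."""
    query = parse_qs(urlparse(request_url).query)
    # WFS parameter names are case-insensitive, so normalize them.
    params = {key.lower(): values[0] for key, values in query.items()}
    if params.get("request", "").lower() != "getfeature":
        raise ValueError("only GetFeature requests are handled in this sketch")
    rdf_class = FEATURE_TYPE_TO_CLASS[params["typename"]]
    lines = [
        GEO_PREFIX,
        "SELECT ?feature ?wkt WHERE {",
        f"  ?feature a <{rdf_class}> ;",
        "           geo:hasGeometry ?geom .",
        "  ?geom geo:asWKT ?wkt .",
        "}",
    ]
    # The WFS 1.1.0 maxFeatures parameter maps naturally to LIMIT.
    if "maxfeatures" in params:
        lines.append(f"LIMIT {int(params['maxfeatures'])}")
    return "\n".join(lines)


sparql = getfeature_to_sparql(
    "https://example.org/wfs?service=WFS&version=1.1.0"
    "&request=GetFeature&typeName=bio:species&maxFeatures=10"
)
```

A complete adapter would also have to encode the result set as a WFS XML document, which is omitted here.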
In order to facilitate access to this connection, our framework has a Web-based application based on LOD4WFS Adapter, Leaflet (https://leafletjs.com/), and Apache Jena libraries. This application allows us to display data published as OGC Web services, concretely WFS, and query Linked Data sources using GeoSPARQL. These GeoSPARQL queries allow the generation of "dynamic" maps, since each interaction with the data (query) displays an updated map reflecting the results obtained.
Moreover, the deployed architecture associated with our framework allows geospatial queries to be performed on data. For instance, the SPARQL query in Listing 15 would get species within the municipality of Funza (see Note 7), with labels associated with their scientific names. Nevertheless, the true potential of our proposal is noticeable when we carry out integration of the published datasets and, hence, connection between both interoperability universes. As an example, a user may combine meteorological (WFS) and biodiversity (RDF) data and exploit the geospatial (GeoSPARQL) component associated with them. This way, the user may retrieve all species located in the study area (i.e. Cundinamarca Department) with their scientific name, their related description within EUNIS (dataset in Linked Data), their association with different climate classification (WFS) present in the study area, and their spatial location (see the SPARQL query in Listing 16 and the obtained results in Figure 4). The integration of datasets and interoperability universes (WFS and Linked Data), supported by the semantics provided by the links, is one of the main advantages of our proposal. This makes it easier to address problems associated with global biodiversity, which require a cross-disciplinary approach, and allows unlocking data silos facilitating their reuse across organizations and communities.
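To make the kind of geospatial query discussed above more concrete, the following sketch builds a GeoSPARQL query that selects resources located within a given area. It does not reproduce Listing 15 or Listing 16: the function name and the example URIs are hypothetical, and the data are assumed to expose geometries through `geo:hasGeometry` and `geo:asWKT` and labels through `rdfs:label`. The `geof:sfWithin` filter function is defined by GeoSPARQL.

```python
PREFIXES = """\
PREFIX geo: <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
"""


def species_within_query(area_uri, species_class_uri):
    """Return a GeoSPARQL query selecting the resources of the given class
    whose geometry lies within the geometry of the given area."""
    return PREFIXES + f"""\
SELECT ?species ?label ?speciesWkt WHERE {{
  <{area_uri}> geo:hasGeometry/geo:asWKT ?areaWkt .
  ?species a <{species_class_uri}> ;
           rdfs:label ?label ;
           geo:hasGeometry/geo:asWKT ?speciesWkt .
  FILTER(geof:sfWithin(?speciesWkt, ?areaWkt))
}}
"""


# Hypothetical URIs, for illustration only.
query = species_within_query(
    "http://example.org/resource/Funza",
    "http://example.org/ontology#Species",
)
```

The resulting string can be sent to any SPARQL endpoint that supports GeoSPARQL filter functions.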

| DISCUSSION
The need to make geospatial data available on the Web in an appropriate way (i.e. [geospatial] resources identified using HTTP URIs, published in such a way that they are indexable by search engines, and connected, or linked, to other resources) is acknowledged (Tandy et al., 2017). However, the combination of SDIs and Linked Data, which can facilitate the reuse and connection of multiple and heterogeneous geospatial information, is still a somewhat isolated approach. In this sense, our work proposes a framework to overcome the current limitations and increase the connection between SDIs and Linked Data.
In terms of generation and connection, the current state of tools for transforming and connecting OGC Web services, and especially WFS, to Linked Data is quite limited. In this sense, our framework provides elements for invoking WFS and transforming them to RDF; however, our proposal is limited to WFS and does not consider other OGC Web services, although it can serve as a starting point for new efforts. On the other hand, we take into account the key role of geometrical information in order to connect WFS and Linked Data. This allows our framework to discover links between both interoperability universes (WFS and Linked Data, and vice versa) based on spatial relations. This way, the connection is performed by setting relations (e.g. contain, intersect, touch, etc.) between both datasets. In this context, our approach focuses on the GeoSPARQL vocabulary for querying and processing geospatial data and, hence, it is not prepared for handling RDF data described with the WGS84 vocabulary, which is one of the top five vocabularies for describing geospatial Linked Data.
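Link discovery based on a spatial relation can be illustrated with a small, self-contained sketch. It is not the framework's spatial analysis component: it handles only points against simple polygons, uses a standard ray-casting test instead of a geometry library, and all URIs and coordinates are made up. The `geo:sfWithin` property used for the generated links is part of the GeoSPARQL vocabulary.

```python
SF_WITHIN = "http://www.opengis.net/ont/geosparql#sfWithin"


def point_in_polygon(point, polygon):
    """Ray-casting test. `polygon` is a list of (x, y) vertices; the ring
    does not need to repeat its first vertex at the end."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x coordinate at which the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def discover_within_links(rdf_points, wfs_polygons):
    """Return (subject, predicate, object) links for every RDF resource whose
    point geometry falls inside a WFS polygon feature."""
    links = []
    for resource_uri, point in rdf_points.items():
        for feature_uri, polygon in wfs_polygons.items():
            if point_in_polygon(point, polygon):
                links.append((resource_uri, SF_WITHIN, feature_uri))
    return links


# Hypothetical resources and coordinates, for illustration only.
rdf_points = {
    "http://example.org/species/1": (2.0, 2.0),
    "http://example.org/species/2": (5.0, 5.0),
}
wfs_polygons = {
    "http://example.org/climate/A": [(0, 0), (4, 0), (4, 4), (0, 4)],
}
links = discover_within_links(rdf_points, wfs_polygons)
```

Other relations mentioned in the text, such as intersect or touch, need general geometry operations and are better delegated to a GeoSPARQL-enabled triple store or a dedicated geometry library.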
With respect to enrichment issues, our framework allows enriching of geospatial Linked Data by means of collecting new data from WFS. In order to perform this process, our proposal needs to discard common attributes between both data sources; for that, the framework uses an exact matching of attribute names. However, since attribute names are often different, codified, or not sufficiently descriptive, it is necessary to consider different string distance metrics to enhance the identification of common attributes between sources.
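The suggested use of string similarity can be sketched as follows. This is an illustration of a possible enhancement rather than a feature of the framework, which relies on exact matching: the normalization rule, the 0.8 threshold, and the sample attribute names are assumptions, and the similarity score is the ratio computed by `difflib.SequenceMatcher` from the Python standard library, used here as one possible metric.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase a name and drop common separators before comparison."""
    return name.lower().replace("_", "").replace("-", "")


def similar_attributes(wfs_attributes, rdf_attributes, threshold=0.8):
    """Pair each WFS attribute with its most similar RDF attribute and keep
    the pairs whose similarity reaches the threshold."""
    matches = []
    for wfs_name in wfs_attributes:
        best_name, best_score = None, 0.0
        for rdf_name in rdf_attributes:
            score = SequenceMatcher(
                None, normalize(wfs_name), normalize(rdf_name)
            ).ratio()
            if score > best_score:
                best_name, best_score = rdf_name, score
        if best_score >= threshold:
            matches.append((wfs_name, best_name, best_score))
    return matches


# Hypothetical attribute names, for illustration only.
matches = similar_attributes(
    ["Scientific_Name", "temp"], ["scientificName", "label"]
)
```

Codified attribute names (e.g. short codes with no lexical overlap) would still escape such a metric and would require a lookup table or user confirmation.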
Regarding the exploitation component, our framework provides an element that allows GeoSPARQL queries to be issued against an RDF triple store. The results of each query are displayed on the WFS automatically, which allows the creation of "dynamic" maps in each interaction with data. Moreover, the SPARQL Endpoint of the published triple store (our repository) allows the retrieval and analysis of resource relations in a standardized manner.
Finally, this work provides different benefits because of the connection between two interoperability universes. On the one hand, data silos are unlocked and users may interact with structured, non-proprietary, and harmonized data. On the other hand, "traditional" geospatial data, WFS, and (geo)biodiversity Linked Data are semantically integrated as a unified view; therefore, users may take advantage of this integration for problems that require connecting SDI services and Linked Data. Besides, these results (data and developed tools) are freely available for the (geospatial) community. In short, these benefits help to facilitate reusing this information, allowing the public to use these data in new and innovative ways, improving the application of geospatial data, and increasing the (geo)biodiversity knowledge available to society (Vilches-Blázquez et al., 2014). However, additional challenges remain in bridging SDI and Linked Data, such as supporting other OGC Web services (e.g. Web coverage service, catalogue services, etc.), performing linking and enrichment processes using queries across diverse data sources (distributed queries) in the Linked Data cloud, and testing our proposal in depth with large volumes of verbose (geometrical) data.

| CONCLUSIONS AND FUTURE WORK
This article presents a framework for generating, linking, enriching, and exploiting geospatial Linked Data from multiple and heterogeneous geospatial data sources. The proposal has allowed us to connect two interoperability universes (SDIs and Semantic Web technologies). This is achieved using open standards from both the geospatial (WFS) and Semantic Web (ontologies and RDF data) domains.
Moreover, the work in this article has linked various datasets in the context of a study case in the (geo)biodiversity domain. In this context, we deal with geometrical information from WFS and RDF data, since it is the key component for connecting geospatial data, allowing them to be retrieved and interlinked on an unprecedented level in the context of SDIs and Linked Data.
The presented implementation of our framework represents an interesting option for guiding the transition from SDIs to best practice for publishing spatial data on the Web (Tandy et al., 2017), that is, the adoption of Linked Data principles in the geospatial data world. We also consider that it may help non-expert users to overcome current problems related to finding, accessing, and using data disseminated through SDIs, and to achieve the goal of finding geospatial data available on the Web in an appropriate way.
Future work will focus on supporting other OGC Web services in order to contribute to the aforementioned transition. We will also test our proposal with large volumes of verbose (geometrical) data and will continue publishing geospatial Linked Data for other related themes and providers (e.g. environmental indicators, climate change, etc.). Furthermore, we plan to improve the exploitation component of our framework by developing hybrid applications capable of combining semantic reasoning, machine learning, and traditional geospatial analysis.

ACKNOWLEDGMENTS
This work was partially sponsored by Colfuturo and Ministerio de Tecnologías de la Información y las Comunicaciones de Colombia. Additionally, we are grateful to Victor Saquicela for his constructive comments and suggestions.

NOTES
1 Structured data usually reside in relational databases (RDBMS). Such data are eminently searchable, both with human-generated queries and via algorithms using type of data and field names, such as alphabetical or numeric, currency or date. Source: https://bit.ly/2rdzwi9
2 Semi-structured data maintain internal tags and markings that identify separate data elements, enabling information grouping and hierarchies. Both documents and databases can be semi-structured. Source: https://bit.ly/2rdzwi9
3 For instance, we have created shapefiles and/or WFS associated with a project, or we can download or access these kinds of files or services without restrictions.
4 Note that throughout this article we use the terms vocabulary and ontology without distinction.
5 typeName and FeatureID are two parameters (attributes) contained within selected WFS, which can be used for filtering and recovering specific data of this service through the GetFeature operation.
6 Datasets that have been published in the Linked Data format (https://lod-cloud.net/).

7 Funza is a municipality of the Department of Cundinamarca in Colombia. This municipality is located in the Bogotá Savanna, the southwestern part of the Altiplano Cundiboyacense.