Systematic review of question answering over knowledge bases

Over the years, a growing number of semantic data repositories have been made available on the web. However, this has created new challenges in exploiting these resources efficiently. Querying services require knowledge beyond the typical user's expertise, which is a critical issue in adopting semantic information solutions. Several proposals to overcome this difficulty have suggested using question answering (QA) systems to provide user-friendly interfaces and allow natural language use. Because question answering over knowledge bases (KBQA) is a very active research topic, a comprehensive view of the field is essential. The purpose of this study was to conduct a systematic review of methods and systems for KBQA to identify their main advantages and limitations. The inclusion criteria covered English full-text articles published since 2015 on methods and systems for KBQA. Sixty-six articles were reviewed to describe their underlying reference architectures.


| INTRODUCTION
Question answering (QA) refers to systems that allow users to use natural language (NL) interfaces to ask questions and receive concise answers. The first solution was created in the 1960s to answer questions asked in English about baseball games from information saved in a list-structured database [1]. Some years later, with the emergence of the relational data model, considerable effort was put into developing natural language interfaces for databases (NLIDB). However, just five years after the creation of the World Wide Web, Androutsopoulos et al. reported the lack of interest in investigating NLIDB [2]. In those days, the focus was on information retrieval techniques to create web search engines using the keyword-based search paradigm. Meanwhile, QA over text advanced [3], and the Semantic Web (SW) vision formulated in 2001 by Berners-Lee et al. [4] brought attention to semantic data.
Search engines have come to offer direct answers to some user questions in recent years [5]. Instead of just presenting a list of links to documents where the answer is likely to be found, the idea is to satisfy the need for information without further searching and navigation. Questions whose answer is an entity are the ideal candidates for this type of approach, and large semantic databases that capture general knowledge have become of great value. In this context, extracting triples to answer questions is highly valuable and continues to motivate research work.
Since the formulation of SW principles and standards, semantic representations to capture knowledge in the life sciences have become common practice [6]. As a result, there was an explosion in the creation of ontologies and Resource Description Framework (RDF) databases made available online. This data has usually been accessed through visual navigation interfaces and, commonly, through SPARQL Protocol and RDF Query Language (SPARQL) endpoints, but both access methods have problems. The first approach is not rich enough to answer more complex questions, and the second is not suitable for users who have not mastered formal query languages. This situation creates pressure for new question answering over knowledge base (KBQA) solutions for biomedical data.
A couple of examples from the life sciences illustrate the use of these systems. Asiaee et al. applied a KBQA solution to parasite immunology [7], and Hamon et al. created a querying platform for linked biomedical data [8]. Other KBQA systems retrieve information from open knowledge databases, such as DBpedia or Wikidata, or use proprietary enterprise knowledge graphs, such as Google Knowledge Graph or Bing Satori [9].
This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
We identified several surveys and overviews published since 2015 that deal exclusively or partially with KBQA. Höffner et al. [10] surveyed 72 papers published until the middle of 2015 and mapped them against a set of challenges. By the depth of its analysis and selection of articles, this study is a reference work to understand the contributions that had emerged until that moment. Other contributions, such as those of Mishra and Jain [11], Dimitrakis et al. [12], Affolter et al. [13], and Ojokoh and Adebisi [14], are narrative reviews, meaning that they do not use systematic search methods. Unlike the latter studies, our review follows a systematic search strategy that provides a greater guarantee that essential contributions have not gone unnoticed. We also focus on the presentation of KBQA reference architectures, which none of the previous works addresses. Our attention to architectural decomposition allows us to highlight state-of-the-art practices for particular subtasks.
The remainder of this paper comprises four sections. Section 2 overviews the current aspects of KBQA. In Section 3, we introduce the methods for retrieving and selecting the articles surveyed and present the quantitative results of that selection. In Section 4, we critically summarize our findings. We close the paper with the conclusions in Section 5.

| BACKGROUND
The purpose of QA systems is to allow users to ask questions in NL without a formal query language. Considering the nature of the data sources, we can divide these solutions into three major groups: QA over unstructured data (e.g., text), QA over semi-structured data (e.g., tables), and QA over structured data (e.g., graph data sets). Hybrid systems operate on two or more kinds of data sources. Regarding the scope of data, on the one hand, we can consider domain-specific solutions when the data schema is narrowed down to a particular body of knowledge (e.g., biomedical data) that limits the question types that are accepted. On the other hand, open-domain systems consider data on generic subjects specified by general ontologies.
RDF is the data model for the SW, established in a suite of normative specifications by the World Wide Web Consortium [15]. An RDF triple (or statement) has three components: the subject, which is an Internationalized Resource Identifier (IRI) or a blank node (bnode); the predicate, which is an IRI; and the object, which is an IRI, a literal, or a bnode [16]. A set of RDF triples is an RDF graph, and an RDF data set is a collection of RDF graphs. Several RDF serialization formats are available, such as Turtle, TriG, and JSON-LD (JavaScript Object Notation for Linked Data).
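The constraints on the three components of a triple can be made concrete with a small data model. The following is a minimal sketch in plain Python, not an RDF library: the class names and the `http://example.org/` IRIs are illustrative assumptions, and features such as language tags and named graphs are left out.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class BlankNode:
    label: str


@dataclass(frozen=True)
class Literal:
    lexical_form: str
    datatype: str = "http://www.w3.org/2001/XMLSchema#string"


Subject = Union[IRI, BlankNode]
Object = Union[IRI, BlankNode, Literal]


@dataclass(frozen=True)
class Triple:
    subject: Subject
    predicate: IRI
    obj: Object

    def __post_init__(self):
        # Enforce the positional constraints described in the text.
        if not isinstance(self.subject, (IRI, BlankNode)):
            raise TypeError("subject must be an IRI or a blank node")
        if not isinstance(self.predicate, IRI):
            raise TypeError("predicate must be an IRI")
        if not isinstance(self.obj, (IRI, BlankNode, Literal)):
            raise TypeError("object must be an IRI, a literal, or a blank node")


# An RDF graph is a set of triples (example IRIs are hypothetical).
graph = {
    Triple(IRI("http://example.org/Lisbon"),
           IRI("http://example.org/capitalOf"),
           IRI("http://example.org/Portugal")),
    Triple(IRI("http://example.org/Lisbon"),
           IRI("http://www.w3.org/2000/01/rdf-schema#label"),
           Literal("Lisbon")),
}
```

Modelling the graph as a Python set mirrors the definition of an RDF graph as a set of triples: inserting the same statement twice has no effect.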
The real power of triples becomes apparent with large data sets. An RDF store (or triplestore) is a database designed for the storage and retrieval of triples. For quick reference, we can list some well-known solutions: OpenLink Virtuoso (https://virtuoso.openlinksw.com/), Eclipse RDF4J (formerly Sesame) (http://rdf4j.org/), Apache Jena (https://jena.apache.org/), and GraphDB (http://graphdb.ontotext.com/). By default, we use SPARQL [17] for querying RDF graphs, using one of four query forms. The SELECT form returns variables and their bindings directly, and the CONSTRUCT form returns a single RDF graph. The ASK form returns a boolean indicating whether a query pattern matches, and a DESCRIBE query returns an RDF graph describing the resources found [18].
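To make the four query forms tangible, the sketch below stores one example of each as a Python string and identifies its form with a naive keyword search. The queries use DBpedia-style prefixed names (`dbr:`, `dbo:`) for illustration only; `query_form` is a hypothetical helper, not part of any SPARQL library, and it would be fooled by a prefix or IRI that happens to contain one of the keywords.

```python
import re

# One illustrative query per SPARQL query form.
EXAMPLE_QUERIES = {
    "SELECT": """
        PREFIX dbo: <http://dbpedia.org/ontology/>
        PREFIX dbr: <http://dbpedia.org/resource/>
        SELECT ?capital WHERE { dbr:Portugal dbo:capital ?capital }
    """,
    "CONSTRUCT": """
        PREFIX dbo: <http://dbpedia.org/ontology/>
        CONSTRUCT { ?country dbo:capital ?city }
        WHERE { ?country dbo:capital ?city } LIMIT 10
    """,
    "ASK": """
        PREFIX dbo: <http://dbpedia.org/ontology/>
        PREFIX dbr: <http://dbpedia.org/resource/>
        ASK { dbr:Portugal dbo:capital dbr:Lisbon }
    """,
    "DESCRIBE": """
        PREFIX dbr: <http://dbpedia.org/resource/>
        DESCRIBE dbr:Lisbon
    """,
}

_FORM_PATTERN = re.compile(r"\b(SELECT|CONSTRUCT|ASK|DESCRIBE)\b", re.IGNORECASE)


def query_form(query):
    """Return the first query-form keyword found in the query, or None."""
    match = _FORM_PATTERN.search(query)
    return match.group(1).upper() if match else None
```

A KBQA system that answers yes/no questions would typically emit the ASK form, whereas factoid questions map naturally to SELECT.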
Several benchmarks and evaluation campaigns have promoted the advancement of KBQA systems. The Question Answering on Linked Data (QALD) challenge launched in 2011 is the oldest running campaign, and its ninth edition provided a training data set with 408 questions in 11 different languages for the open-domain semantic QA over DBpedia task [19]. To overcome the size limitations of the QALD data set, the Large-Scale Complex Question Answering Dataset (LC-QuAD) provides 30,000 questions with corresponding SPARQL queries for DBpedia and Wikidata [20]. Free917 is another benchmark and consists of 917 utterances using Freebase with a meaning representation in a variant of lambda calculus [21]. To avoid using logical forms, Berant et al. [22] created the WebQuestions data set containing 5810 question-answer pairs, to which Yih et al. [23] added SPARQL queries to create WebQuestionsSP. Then, Bordes et al. [24] achieved a significant scale-up with SimpleQuestions, which contains 108,442 questions that can be rephrased in the form (subject, relationship, ?). Finally, the BioASQ series of challenges has a task on domain-specific semantic QA on biomedical data to evaluate systems outputting relevant triples and text snippets [25].
To meet the challenges posed in implementing KBQA solutions, it is important to identify the most common architectures. From the analysis of the papers we selected for our work, we found that they fall into four architectural styles. Semantic parsing pipelines are solutions based on semantic parsing that use a pipe-and-filter style, where data flows through a series of components to generate a formal query from the original input in NL. It is the most straightforward architectural style of KBQA systems and relies on connecting components to form a pipeline, as shown in Figure 1.
The idea is to apply several data transformations from the question in NL to the logical form or formal query. To achieve that, we use natural language processing (NLP) techniques such as tokenization, named-entity recognition (NER), part-of-speech (POS) tagging, dependency parsing, and entity linking (EL). In addition, steps for query generation and answer generation are required.
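The pipe-and-filter idea can be sketched in a few lines. The toy pipeline below is an assumption-laden illustration, not any of the surveyed systems: the two lexicons are hypothetical hand-written dictionaries standing in for real NER, EL, and relation linking components, the prefixed names follow DBpedia conventions, and the emitted query omits PREFIX declarations.

```python
from functools import reduce

# Hypothetical lexicons standing in for entity and relation linking.
ENTITY_LEXICON = {"portugal": "dbr:Portugal"}
RELATION_LEXICON = {"capital": "dbo:capital"}


def tokenize(state):
    text = state["question"].lower().rstrip("?")
    return {**state, "tokens": text.split()}


def link_entities(state):
    entities = [ENTITY_LEXICON[t] for t in state["tokens"] if t in ENTITY_LEXICON]
    return {**state, "entities": entities}


def link_relations(state):
    relations = [RELATION_LEXICON[t] for t in state["tokens"] if t in RELATION_LEXICON]
    return {**state, "relations": relations}


def generate_query(state):
    # Simplest case only: one entity and one relation form one triple pattern.
    entity, relation = state["entities"][0], state["relations"][0]
    query = f"SELECT ?answer WHERE {{ {entity} {relation} ?answer }}"
    return {**state, "query": query}


# Each filter consumes and returns the same kind of state dictionary,
# so filters can be reordered or replaced independently.
PIPELINE = [tokenize, link_entities, link_relations, generate_query]


def run_pipeline(question, filters=PIPELINE):
    return reduce(lambda state, f: f(state), filters, {"question": question})


result = run_pipeline("What is the capital of Portugal?")
print(result["query"])
```

Because every filter only reads and extends a shared state, an error made early (for example, a wrong entity link) propagates unchanged to the generated query, which is the main weakness usually attributed to this style.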
An alternative way of using semantic parsing is based on the observation that executing a formal query is equivalent to finding a subgraph, as depicted in Figure 2.
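The equivalence between query execution and subgraph finding can be illustrated with a naive matcher for basic graph patterns. This is a sketch under simplifying assumptions: triples are plain string tuples, variables are strings starting with `?`, the `ex:` data is invented, and no indexing or optimization is attempted.

```python
def is_var(term):
    return term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # A variable must keep the value it was bound to earlier.
            if extended.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return extended


def match_bgp(patterns, graph, binding=None):
    """Yield every variable binding under which all patterns occur in graph."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        extended = match_pattern(first, triple, binding)
        if extended is not None:
            yield from match_bgp(rest, graph, extended)


# Invented toy knowledge base.
graph = [
    ("ex:Blindness", "ex:author", "ex:Saramago"),
    ("ex:Saramago", "ex:birthPlace", "ex:Azinhaga"),
    ("ex:Baltasar_and_Blimunda", "ex:author", "ex:Saramago"),
    ("ex:Pessoa", "ex:birthPlace", "ex:Lisbon"),
    ("ex:Message", "ex:author", "ex:Pessoa"),
]

# "Which books were written by authors born in Lisbon?"
query = [("?book", "ex:author", "?a"), ("?a", "ex:birthPlace", "ex:Lisbon")]
results = list(match_bgp(query, graph))
```

Each result is a mapping from query variables to KB nodes, that is, one embedding of the query graph into the data graph; a SELECT query returns these bindings, and an ASK query only checks whether at least one exists.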
Systems capable of answering complex questions (e.g., questions that cannot be reduced to a simple triple pattern) require a higher degree of sophistication than the systems presented so far. A template is a query skeleton with an arbitrary degree of complexity that fits the knowledge base (KB) to be questioned and has slots that must be filled with information from entities and relations. Naturally, the quality of the system depends on the effort put into creating the templates. In the early stages, the use of templates was mainly accomplished through manual annotations; in more recent times, automatic approaches have emerged. These systems rely on the manual or automatic creation of a template database assuming an architectural configuration such as that shown in Figure 3.
In the offline phase, it is necessary to create templates. This involves considering pairs of questions and answers used to obtain successively more abstract representations that are used to generate pairs of question-query templates after alignment. The online phase is straightforward: a question is matched with a template to produce a query template, the slots are filled with entities and relations, and the answer is provided by issuing the query candidate.
End-to-end solutions perform sequence-to-sequence translation or apply methods to extract triples directly from the KB. The selection of the final answer is based on the representations of the questions in NL obtained by applying machine learning techniques, as can be seen in Figure 4. After extracting the candidate answers from the KB, they are evaluated against a predefined score using a specialized function.
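The online phase can be sketched as follows. Everything here is an illustrative assumption: the two question patterns and query skeletons are hand-written rather than learned offline, the lexicons are hypothetical, and matching uses regular expressions instead of the syntactic structures that real systems align.

```python
import re

# Hand-written (question pattern, query skeleton) pairs standing in for a
# template database. Doubled braces are literal braces for str.format.
TEMPLATES = [
    (re.compile(r"^what is the (?P<relation>\w+) of (?P<entity>.+)\?$"),
     "SELECT ?x WHERE {{ {entity} {relation} ?x }}"),
    (re.compile(r"^how many (?P<relation>\w+) does (?P<entity>.+) have\?$"),
     "SELECT (COUNT(?x) AS ?n) WHERE {{ {entity} {relation} ?x }}"),
]

# Hypothetical lexicons used to fill the slots.
ENTITIES = {"portugal": "dbr:Portugal"}
RELATIONS = {"capital": "dbo:capital", "languages": "dbo:language"}


def build_query(question):
    """Match a question to a template and fill its slots, or return None."""
    normalized = question.strip().lower()
    for pattern, skeleton in TEMPLATES:
        match = pattern.match(normalized)
        if match is None:
            continue
        entity = ENTITIES.get(match.group("entity"))
        relation = RELATIONS.get(match.group("relation"))
        if entity and relation:
            return skeleton.format(entity=entity, relation=relation)
    return None
```

For instance, `build_query("How many languages does Portugal have?")` selects the counting skeleton, which shows how a template can carry an aggregate that a single triple pattern cannot express. A question that fits no template yields `None`, which is the coverage limitation discussed for manually curated templates.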
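The scoring step can be illustrated with a deliberately simple stand-in. In the sketch below, bag-of-words counts replace the learned vector representations, cosine similarity replaces the trained scoring function, and each candidate answer comes with an invented textual description of the KB path that leads to it; none of this is taken from a specific surveyed system.

```python
import math
from collections import Counter


def embed(text):
    """Toy 'embedding': a bag-of-words count vector."""
    return Counter(text.lower().replace("?", "").split())


def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0


def rank_candidates(question, candidates):
    """Score (answer, description) pairs against the question, best first."""
    q_vec = embed(question)
    scored = [(cosine(q_vec, embed(desc)), answer) for answer, desc in candidates]
    return sorted(scored, reverse=True)


# Invented candidates reachable from the topic entity of the question.
candidates = [
    ("dbr:Lisbon", "portugal capital"),
    ("dbr:Portuguese_language", "portugal official language"),
    ("dbr:Euro", "portugal currency"),
]
ranking = rank_candidates("What is the capital of Portugal?", candidates)
best_score, best_answer = ranking[0]
```

In a neural system, `embed` would be replaced by encoders trained jointly on questions and KB elements, so that paraphrases with no word overlap can still receive a high score.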
Höffner et al. [10] highlighted significant challenges faced by semantic QA. The lexical gap occurs when the surface forms used in a question are different from those used in the KB. The ambiguity stemming from the fact that the same word can represent various entities is also problematic. Another significant problem is answering questions that combine several units of information in complex queries requiring ordered, aggregated, or filtered outputs. Equally challenging is multilingualism, which concerns two distinct realities that may or may not co-occur. The first involves the problem of using the same interface to ask questions in several NLs, and the second has to do with the possibility of the KB data being multilingual. In addition, systems relying on languages other than English end up receiving far less attention from the scientific community, limiting the number of available solutions. For instance, very few developers have participated in challenges like QALD with multilingual systems. Some systems try to prevent difficulties by using controlled natural languages (CNLs), which are constructions that restrict in some way the lexicon, syntax, or semantics of the NL from which they start. This review will not focus on multilingualism or the use of CNL interfaces.

| PAPER SELECTION
Computer science and software engineering can benefit from using systematic literature reviews to synthesize the best evidence about the state of the art, especially for mature topics for which a large number of studies may not be appropriately acknowledged [26]. In this work, we followed a strict methodology. To enable repeatability, we have created a replication package that is available at https://osf.io/hxyvw. We have used PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines [27] to report the protocol's execution and present the findings.

| Search methodology
The following research questions guided our study: RQ1: Which methods of KBQA have been proposed? RQ2: What are the solved and unsolved challenges? Going through past surveys and overviews [10][11][12][13][14], we collected the keywords shown in Table 1 mapped against the Population, Intervention, Comparison, Outcomes (PICO) structure [28]. As pointed out previously, Höffner et al. [10] presented a systematic study of KBQA systems until mid-2015.
We have built upon that study. We then used Scopus, Web of Science, IEEE Xplore, and the ACM Digital Library to find KBQA papers. We emphasize that our research questions and search string allow us to retrieve all papers reviewed by Höffner et al. [10], in addition to the most recent ones. Next, we considered the following inclusion criteria: I1: Studies on methods and systems for question answering over knowledge bases. I2: Papers focussing on a specific challenge of KBQA solutions (answering complex questions, module reusability, etc.).
For the exclusion criteria, we have E1: Books, surveys and overviews, tutorials, talks, panel sessions, conference reviews, editorials, abstracts, summaries of a workshop or challenge, dissertations, or grey literature; papers not available in English; and papers whose full text could not be retrieved. E2: When faced with multiple papers by the same author about the same subject, we keep only those needed to report the main contribution.
The main threats to validity are losing relevant studies in the search step and risking the rejection of relevant studies in the review phase. The PICO strategy minimizes the first problem, ensuring high recall. In addition, we have queried four bibliographic databases. To mitigate the second problem, we allocated two people (the first two authors) to select and review papers. We have performed content analysis to collect information about the challenges addressed, proposed solutions, future work, and architectural styles. The fourth author participated as a referee when disagreements arose.
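The way the PICO slots combine into a search string can be sketched as below. The slot values are copied from Table 1 and the combination follows the template ((Population OR Comparison) AND (Intervention OR Outcomes)) stated in the table note. The function names are hypothetical, and the output is a generic boolean expression: each bibliographic database has its own field syntax, so the exact strings in the replication package may differ.

```python
# Slot values as listed in Table 1.
PICO = {
    "Population": ["'Knowledge Base*'", "'Knowledge Graph*'", "'Semantic Web'",
                   "'Linked Data*'", "'RDF Data*'", "'data web'"],
    "Intervention": ["Question-Answer*", "'natural language que*'",
                     "'Natural Language Interface'"],
    "Comparison": ["SPARQL", "'Query Graph*'"],
    "Outcomes": ["QALD*", "SimpleQuestions", "WebQuestions", "WebQSP", "LC-QuAD"],
}


def or_group(terms):
    """Join terms with OR and wrap the result in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def build_search_string(slots):
    """Combine slots as ((Population OR Comparison) AND (Intervention OR Outcomes))."""
    left = or_group([or_group(slots["Population"]), or_group(slots["Comparison"])])
    right = or_group([or_group(slots["Intervention"]), or_group(slots["Outcomes"])])
    return f"({left} AND {right})"


search_string = build_search_string(PICO)
print(search_string)
```

Placing Comparison alongside Population, and Outcomes alongside Intervention, widens each side of the conjunction, which is consistent with the stated goal of high recall.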

| Selection results
Figure 5 shows the flow diagram of the paper selection procedure. After applying the inclusion and exclusion criteria, we selected the 66 papers listed in Table 2. Figure 6 presents the relations of the keywords of the papers selected for qualitative analysis. Naturally, the term 'question answering' appears in most articles. It is interesting to note that the 'Semantic Web' entry is very prominent only
Figure 7 shows the distribution of the selected articles divided by types of architecture and distributed over years.
As we can see, there is a consistent decline in the use of pipeline-based approaches. On the other hand, after an increase in subgraph matching solutions, we saw a slight drop in 2020. After a boom in 2016, the proposals for information extraction fell to a plateau still higher than the other proposals. Finally, template-based systems fluctuated to an annual maximum of two proposals in 2017 and 2018.

| Semantic parsing pipelines
Hamon et al. [8] obtain answers from linked biomedical data using a four-step method that begins with the linguistic and semantic annotation of the input question. Similarly, a simple pipeline is proposed by Lopez et al. [37] for the QuerioDALI solution. In the first step, the system performs an NER to classify named entities related to the KB domain. At the next level, an EL filter binds a unique identity to each of the entities identified in the previous step. Then, the system uses fusion and ranking of possible answers. The lexical gap is a central issue in solving QA challenges. This refers to the mismatch between the vocabularies: on one hand, the user's vocabulary, and on the other hand, the vocabulary used in the data set of interest. The importance of the problem led to the emergence of several strategies.
Hakimov et al. [29] use a combinatory categorial grammar with handcrafted lexical items and lambda calculus expressions to obtain semantic representations. In this way, the input utterances must comply with the grammar. As the authors emphasize, performance improves with the lexicon size. A similar conclusion is reached by Yih et al. [23], who show that learning from labelled semantic parses significantly improves overall performance. TR Discover, a solution by Song et al. [34], uses a feature-based grammar. First, it parses an NL question into its first-order logic representation, which is, in turn, translated into SPARQL. Dubey et al. [35] also use a logical representation. Users can pose queries in English to a target RDF KB. The queries are first normalized into an intermediary canonical syntactic form, called normalized query structure, and then translated into SPARQL queries.
The direct use of an ontology makes it possible to reduce ambiguity. Ruseti et al. [31] use the structured KB DBpedia and a Wikipedia-based approach to match phrases from the question to entities in the ontology. Different solutions for matching each type of entity were developed, and the most probable interpretation is converted into a SPARQL query. Missing properties or types can also be inferred if they were not matched in the previous step.
The query-generation (QG) process of a QA pipeline occurs after the entity and relation linking subtasks. Zafar et al. [47] start with the set of identified entities and relationships and generate walks on the KB by using the adjacent relations within one-hop distance. Valid walks are the ones containing all the starting entities. Finally, candidate walks are evaluated against the type of question, and a SPARQL query is created. To extend QG to ordinal and filter questions, Abdelkawi et al. [51] add extra constraints to the list of all possible answers.
Within the WDAqua Marie Skłodowska Curie ITN (https://github.com/WDAqua) effort to advance the field of QA, several contributions related to KBQA can be reported. Both et al. [36] start from the realization that QA systems are very complex and usually monolithic to present Qanary (https://github.com/WDAqua/Qanary), a vocabulary-driven methodology to allow decoupling of the different components and thus achieve reconfiguration and reuse. First, the Web Annotation Data Model (https://www.w3.org/TR/annotation-model/) is used to create a vocabulary covering the common abstractions related to the authors' idea of a QA pipeline. In addition, the input and output of filters are described to achieve interoperability, forcing the components to have the same interface, like in a uniform pipe and filter architecture. Considering that no vocabulary can describe all existing components, the burden of creating a new description is naturally pushed to the creators of the components, which can compromise the adoption of this methodology. The problem of adapting the input and output of each module to comply with the shared vocabulary is also burdensome. Diefenbach et al. [45] present a reusable user interface to call the Qanary APIs. To allow user feedback, a way to change the descriptions of the module inputs at runtime was proposed that used timestamps to avoid conflict between Qanary annotations [43].
The idea of creating a generic (pipeline) architecture for QA on linked data to foster cooperation among developers is championed by QAestro (https://github.com/WDAqua/QAestro) [44], a proposal competing with Qanary that can be used to combine building blocks in tailored systems, allowing a semantic description of both QA components and requirements. Several important subtasks are covered, such as tokenization, POS tagging, NER, EL, dependency parsing, triple generation, data mapping, QG, and answer generation. Question type identification, answer type identification, query ranking, and syntactic parsing are also available.
Embracing the quest for component reuse, Frankenstein (https://github.com/WDAqua/Frankenstein) [48] is a platform that collects several core components to solve QA tasks and enable the creation of different QA pipelines, more precisely 380 at the time of the paper's publication. Highlighting the fact that modern QA systems rely on the flexible integration of many specialized filters, Singh et al. suggest that the construction of the pipeline could be considered an optimization problem [50], where each component could be selected from a set of options for NER and EL, relation extraction and query building. The prediction of the best-performing components facing a new NL question is tackled as a supervised learning problem in a training set of labelled questions.
The use of semantic pipelines for KBQA is the oldest and most documented approach in the literature and is preferred by authors who intend to integrate NL interfaces into their systems quickly. Reinforcing this statement is the existence of frameworks that allow decoupling of the different components used to filter the data, thus offering greater customization. It is also the easiest route for those who do not want to invest heavily in developing more technically elaborate solutions, which usually perform better. We can investigate each filter independently because they are of interest in many other applications, not just in QA. For instance, Shen et al. [56] surveyed EL issues, techniques, and solutions. Nevertheless, this way of solving the problem seems to be reaching its maturity, and in our opinion, more important future developments will almost certainly come from other approaches.

| Subgraph matching
Some proposals depart very little from the classic pipeline, building the query subgraph using a semantic tree, whereas others move away sharply by constructing the subgraph step by step from a starting entity. Hu et al. [46] start by finding the semantic tree, and then after extracting the semantic relations, they build a semantic query graph. More elaborately, Yih et al. [33] propose staged query graph generation, a solution that formulates a query graph by solving a search problem. A general query subgraph is supported by the existing entities in the KB, an existential node not mappable to the KB, and a node for identifying possible aggregation functionality. The solution revolves around creating an inferential chain starting with a root entity node and using legitimate actions to grow a query graph. The first step is to find root candidates by using a lexicon to perform EL over the input query. The next step considers the lexicon again to extract the expected answer.

PEREIRA ET AL.
By relating the root entity to the kind of answer, it is possible to create a set of candidate subgraphs constrained by an aggregation function. Finally, a convolutional neural network is used to select the best candidate. For this last classification task, we can use the proposal by Gauray et al. [52], which considers a self-attention mechanism that explores the intrinsic structure of subgraphs.

| Template-based KBQA
Zheng et al. [30] started from an initial set of NL questions and formal queries to propose a technique based on studying the similarity of graphs generated from the utterances and SPARQL queries to match the best candidate pairs to form a database with templates. Savenkov et al. [39] used external text data to explore the central topic of the question and select the best query candidates using a predefined collection of query templates. However, relying on a set of manually adjusted templates is necessarily limiting, for instance, when new relations are added to the KB. To overcome this limitation, Abujabal et al. [41] proposed the QUINT system, which automatically learns role-aligned utterance-query templates from user questions paired with their answers. When QUINT answers a question, it visualizes the complete derivation sequence from the NL utterance to the final answer. The derivation explains how the syntactic structure of the question was used to derive the structure of a SPARQL query and how the phrases in the question were used to instantiate different parts of the query. When an answer seems unsatisfactory, the derivation provides valuable insights for reformulating the question. An evolution of QUINT is Never Ending QA (NEQA) [49]. NEQA automatically learns templates mapping syntactic structures to semantic ones from a small number of training question-answer pairs. Once deployed, continuous learning is triggered in cases where templates are insufficient. Using a semantic similarity function between questions and a judicious invocation of non-expert user feedback, NEQA learns new templates that capture previously unseen syntactic structures, gradually extending its template repository.
We note that the literature offers few proposals for this type of system, which strikes us as surprising because it allows answers to a wide range of questions. Investing in research to create wider lexicons to be used in the production of templates promises the creation of systems with even higher performance regarding complex questions. However, it seems that the research effort is shifting to end-to-end systems, which we discuss in the next subsection.

| KBQA based on information extraction
Several KBQA solutions using some form of a deep neural network have been reported. Dong et al. [32] introduced a multicolumn convolutional neural network to understand questions from three different aspects (answer path, answer context, and answer type) and learn their distributed representations. Meanwhile, the system jointly learns low-dimensional embeddings of entities and relations in the KB. This approach can be expanded and enriched if we consider more dimensions to convert into vector representations. Xu et al. [38] present a neural network-based relation extractor to retrieve the candidate answers from Freebase and then infer from Wikipedia to validate these answers. More precisely, the process involves dividing the original question into subquestions by applying a set of syntactic patterns. Then, for each subquestion, EL and relation extraction are performed and refined by a joint inference model. After retrieving a set of candidate answers, the final solution is obtained by inference on Wikipedia, searching on the page of the topic entity for evidence about candidate answers.
The model proposed by Lukovnikov et al. [9] learns to rank subject-predicate pairs to enable the retrieval of relevant facts given a question. The network contains a nested word and character-level question encoder that allows the handling of new and rare words without compromising the exploitation of word-level semantics. This neural network approach yields a single-process solution that avoids complex NLP pipeline constructions and error propagation, and it can be retrained or reused for different domains. In scenarios where training data is limited, overfitting compromises network performance. To tackle this problem, instead of using a bidirectional long short-term memory (LSTM) network to create the language representation model, Lukovnikov et al. [53], Luo et al. [54], and Panchbhai et al. [55] independently evaluated the use of Bidirectional Encoder Representations from Transformers (BERT) [57], currently the best-performing solution for NL understanding tasks.
Hao et al. [40] present a model to represent the questions and their corresponding scores dynamically according to the various candidate answer aspects via the cross-attention mechanism. In addition, they leverage the global knowledge inside the underlying KB, aiming to integrate this information into the representation of the answers. As a result, it could alleviate the out-of-vocabulary problem, which helps the cross-attention model to represent the question more precisely.
Relation detection is essential to extract candidate answer triples. Yu et al. [42] use deep residual bidirectional LSTM networks to compare questions and relation names considering different abstraction hierarchies. This relation detector integrates EL for mutual enhancement, similar to the joint inference feature of Xu et al. [38].
The creation of models to generate vector representations of features of interest from the KB makes it possible to avoid the use of semantic pipelines. As there are multiple architectures of deep neural networks and varied ways of digesting the information to be processed, the literature already reports several possibilities, and many more will appear shortly. LSTMs with attention have great room for further development. On the other hand, we noticed that transfer learning using pretrained models is still underrepresented in new system implementations. Finally, the arrival of new and better-performing models allows better results but at computational costs that are not always bearable.

| Challenges (Research Question 2)
Several obstacles have prevented the full adoption of KBQA systems. Table 3 presents a summary of the challenges KBQA has faced.
The preferred technique for solving simple questions is sequence-to-sequence translation using neural networks. An encoder converts the NL question to a vector representation, and then a decoder outputs a query in a formal language. It is also possible to extract features using convolutions. One paper also reports using the BERT language model, but a transformer seems disproportionate for simple questions because of its high computational cost.
Research on this topic is marginal, and as such, we are led to believe that it has been successfully solved.
Research on complex questions is richer in proposals, starting with systems that add support for another set of SPARQL modifiers. More sophisticated techniques such as the generation of templates or the use of subgraphs are also on the agenda. The use of the information extraction approach with some neural model is common practice. We also find hybrid systems that use KB data and free text. This technique is also used to mitigate KB incompleteness. The renewed interest in both topics indicates that these challenges are not closed. Entity and relation linking are unsolved issues, although the joint entity and relation linking approach shows promise. Automatic labelling and distant supervision usually help in obtaining more training data.
In general, almost all papers promise to tune their proposals for better performance. However, two major problems remain open, as presented in Table 4.
Future work on answering complex questions revolves around exploring solutions that allow real-time feedback to the system, such as implementing a conversational agent or shifting to reinforcement learning so that new knowledge can be added continuously. On the other hand, KB incompleteness also limits these systems' usability. Hybrid systems that use free text to address this problem have been explored, but there is still a long way to go. We need more training data and more external knowledge.

| CONCLUSION
This systematic study collected information on methods and challenges of QA over KBs, a topic that has gained traction in the search engine industry. We analysed 66 documents to classify KBQA systems according to their architectural styles. We reported 25 semantic parsing pipeline systems, 12 using subgraph matching, 7 based on templates, and 22 performing information extraction. We look at the challenges ahead and identify some directions for future research. Two primary challenges remain that are particularly sensitive to the success of this technology. On the one hand, it is necessary to answer increasingly complex questions, and on the other hand, we need to deal with the natural incompleteness of KBs. Our

© 2021 The Authors. IET Software published by John Wiley & Sons Ltd on behalf of The Institution of Engineering and Technology. IET Soft. 2021;1-13.

FIGURE 1 General architecture for semantic parsing pipelines. The direction of the arrows denotes the direction of the data flows
FIGURE 2 Subgraph matching approach
Our search query followed the template ((Population OR Comparison) AND (Intervention OR Outcomes))
FIGURE 3 General architecture for template-based question answering over knowledge bases
FIGURE 4 General architecture for question answering over knowledge bases based on information extraction

FIGURE 6 Relationships between keywords
FIGURE 7 Distribution of papers by year and architecture

TABLE 3 Question answering over knowledge bases challenges and solutions
TABLE 1 Slot values for the population, intervention, comparison, and outcomes template
Population: 'Knowledge Base*' OR 'Knowledge Graph*' OR 'Semantic Web' OR 'Linked Data*' OR 'RDF Data*' OR 'data web'
Intervention: Question-Answer* OR 'natural language que*' OR 'Natural Language Interface'
Comparison: SPARQL OR 'Query Graph*'
Outcomes: QALD* OR SimpleQuestions OR WebQuestions OR WebQSP OR LC-QuAD
Abbreviations: SPARQL, SPARQL Protocol and RDF Query Language; QALD, Question Answering on Linked Data.
FIGURE 5 Study selection
the term 'knowledge graph', showing a greater interest in this type of structure, which is very suitable for applying deep learning techniques.