CogNLG: Cognitive graph for KG‐to‐text generation

Knowledge graphs (KGs) have been widely explored in natural language generation (NLG) tasks. A KG can help models generate controllable text and achieve better performance. However, most existing approaches still lack explainability and scalability in large-scale knowledge reasoning. In this work, we propose a novel CogNLG framework for KG-to-text generation tasks. CogNLG is implemented based on the dual-process theory in cognitive science. It consists of two systems: one acts as the analytic system for knowledge extraction, and the other is the perceptual system for text generation using the extracted knowledge. During text generation, CogNLG provides a visible and explainable reasoning path. Our framework shows excellent performance on all datasets and achieves a BLEU score of 36.7, an improvement of 6.7 over the best competitor.

recognition (NER) tasks, KGs, like dictionaries, help the model with weakly supervised learning without domain-specific annotated data (Lison et al., 2020). In NLG tasks, the explicit knowledge in a KG effectively guides models to generate controlled, factual text (Koncel-Kedziorski et al., 2019). However, providing too much knowledge in KG-to-text tasks leads to the over-generation problem, because useless knowledge acts as noise and degrades generation performance (Fu et al., 2020). Furthermore, manually selecting accurate knowledge from a large-scale KG for model training is expensive.
In general, traditional KG-to-text models still face two main challenges. First, most models lack explainability, making it difficult to find out where the issues come from when inappropriate content is generated. Second, most existing approaches lack scalability: when the provided knowledge contains errors or irrelevant information, the model's performance degrades severely. To address these issues, we propose a cognitive graph framework called CogNLG for KG-to-text tasks, inspired by the dual-process theory (Evans, 1984; Evans, 2003; Evans, 2008; Sloman, 1996). The theory implies that human cognitive processing is divided into two systems: System 1 and System 2. System 1 is a perceptual system that is intuitive, unconscious, and fast; its primary function is to collect and retrieve the intuitive information that humans perceive. System 2 is an analytic system for analyzing and reasoning over the information provided by System 1. These two systems work together to form human cognition. Inspired by the dual-process theory, our CogNLG framework consists of Systems 1 and 2. Benefiting from the cognitive graph structure, the generation process of CogNLG is explainable, and our approach can filter out accurate knowledge for the target text, which gives it scalability.
As shown in Figure 1, the cognitive graph for the KG-to-text task is constructed based on the input entities. Each input entity is initially defined as a source entity node. We add new extension entity nodes by retrieving the association information of the existing entities through the wiki.
FIGURE 1 An example of the cognitive graph for KG-to-text generation; the circles in blue are the source entities, the circles in green are the extension entities, and the circle in gold is one of the predicted best nodes.
Compared to the traditional approaches, which extract the entire KG, System 2 of CogNLG dynamically predicts the best nodes at each position and extracts only the valuable knowledge. Following the text-to-text concept of the T5 model (Roberts et al., 2019), we use prompt-based templates to convert the best-node triples to unstructured text. System 1 is a generator that predicts the next token based on the information provided by the input and the best nodes. In this work, we adopt GPT-2 as System 1, and System 2 is an extractor implemented based on a graph convolutional network (GCN). To evaluate our implementation, we test our model on two large-scale KG-to-text datasets, ENT-DESC (Cheng et al., 2020) and Person and Animal (Vrandečić & Krötzsch, 2014).
The contributions of this work are as follows: 1. We propose a novel CogNLG framework for KG-to-text generation tasks based on cognitive science. Moreover, experimental results show that the cognitive graph helps to generate controlled, factual text.
2. We demonstrate that the two-system structure of the cognitive graph provides strong explainability in the process of text generation and scalability in large-scale knowledge reasoning.
3. The performance of our implementation on multiple metrics on the ENT-DESC and Person and Animal datasets surpasses state-of-the-art work.
The structure of this paper is as follows. Section 2 introduces related work on KG-to-text tasks. The model and method details are presented in Section 3. Section 4 presents the experimental results. Finally, the conclusion and future work are presented in Section 5.

| KG-to-text
Knowledge graph (KG)-to-text generation is an essential task in NLG. In traditional NLG systems, rule-based approaches depend on many handcrafted templates (Belz & Reiter, 2006; Duma & Klein, 2013), which is time-consuming and unscalable. Among deep learning approaches, sequence-to-sequence (seq2seq) models (Mei et al., 2016; Wiseman et al., 2017) and variational autoencoders (VAEs) (Liu et al., 2019; Serban et al., 2017) have dramatically improved performance compared with rule-based systems. However, deep learning models cannot internalize all knowledge, and this problem also arises in deep pre-trained language models. The KG's primary function is to provide adequate knowledge support for NLP tasks to enhance the logic and the model performance. In KG-to-text generation, Li et al. (2020) implement a KG-to-text model through a multi-attention mechanism, which encodes the input knowledge triples through a bidirectional GRU unit. Recently, graph neural networks (GNNs) have developed rapidly, and some works attempt to combine GNNs and KGs for text generation. Koncel-Kedziorski et al. (2019) present a graph transformer to encode graph-structured input, and Beck et al. (2018) propose a gated graph neural network for graph-to-text generation. Moreover, some works store knowledge with memory networks to improve performance on tasks like multi-turn dialogue (Madotto et al., 2018; Yang et al., 2019). Zhu et al. (2020) propose a fact-aware summarization model to ensure that the content generated by the model conforms to factual logic. The MGCN model (Cheng et al., 2020) adopts multiple graph transformations to obtain context features at different scales, achieving strong performance on KG-to-text tasks. Chen et al. (2020) and Ji et al. (2020) use pre-trained language models with knowledge injection to generate content with commonsense. However, most of the existing work still lacks explainability of the generated text and scalable reasoning over large-scale knowledge.

| Cognitive graph
The idea of the cognitive graph is based on cognitive science, and it was first proposed by Ding et al. (2019) for multi-hop reading comprehension in NLP. Dong et al. (2015) propose the Cognitive Knowledge Graph reasoning framework for one-shot knowledge graph relation reasoning at scale. Both approaches have two systems, a perceptual system and an analytic system, and this is a significant feature of the cognitive graph compared with typical KGs. The two-system structure is inspired by the dual-process theory, and the two systems run at different speeds. The perceptual system is more intuitive and unconscious, which means faster computation; the analytic system refers to human logical thinking, and it is more complex, rational, and slower. For complex problems, the two systems cooperate effectively to improve performance in natural language understanding (NLU); this approach is also known as fast and slow thinking (Kahneman, 2011). The study of Rastogi et al. (2020) has shown that fast and slow thinking can effectively assist Artificial Intelligence (AI) decision-making. Another feature of the cognitive graph is that it supports reasoning over large-scale datasets. A cognitive graph framework requires scalability, just as the human brain can quickly retrieve information from large amounts of knowledge.

| METHODS
In this section, we describe the implementation of the CogNLG framework in detail. Based on the dual-process theory (Evans, 1984; Evans, 2003; Evans, 2008; Sloman, 1996), the human content creation process can be divided into two systems. One system is used for subjective language expression and is based on the prior linguistic knowledge accumulated in the human brain. The other system retrieves relevant clues in real time to support language generation. NLU ability is also critical to model performance in NLG tasks, and the NLG task can be treated as the reverse of a reasoning task.
The analytic system selects the most appropriate information from the existing knowledge to assist the perceptual system. Ji et al. (2020) adopt GPT-2 and a GCN (Vashishth et al., 2020) to build a two-system text generation model and achieve significant performance in commonsense text generation. In this work, we show that the cognitive graph is also effective for KG-to-text tasks. We obtain extension knowledge triples associated with the input through the Wikipedia API and sift the best information with the analytic system in real time. Our implementation can provide inference paths in large-scale data and reduce the cost of manually sifting knowledge triples.
Inspired by the dual-process theory, the CogNLG framework mainly consists of two systems, called the generator system (S1) and the extractor system (S2). S1 plays the role of the perceptual system and requires a large amount of prior linguistic knowledge; therefore, a pre-trained model is necessary because it has been trained intensively on large-scale corpora. S2 is the analytic system. In S2, we construct a cognitive graph to collect the supporting knowledge for text generation. Then we use a GNN model to update the hidden state of each node in the cognitive graph dynamically and iteratively, and predict the best nodes to support S1.
The overall model structure is illustrated in Figure 2. We adopt GPT-2 as S1 and a GCN model as S2. The structure and functionality of each system are described in detail below.
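To make the interaction between the two systems concrete before the detailed descriptions, the following is a minimal, hypothetical sketch of the per-position generation loop. The method names (`semantic_feature`, `predict_best_nodes`, `next_token`) are illustrative placeholders, not the actual CogNLG API.

```python
# Minimal sketch (not the actual CogNLG code): each decoding step alternates
# between S2 (extractor) picking supporting nodes and S1 (generator) emitting
# the next token conditioned on those nodes.

def generate(generator, extractor, graph, condition_entities, max_len=128):
    """generator: S1 (a GPT-2-style decoder); extractor: S2 (a GCN over the
    cognitive graph). Both are assumed to expose the hypothetical methods
    used below."""
    target_tokens = []
    for _ in range(max_len):
        # S1 encodes the current context and exposes its last hidden state,
        # which serves as the semantic feature H_sem for S2.
        h_sem = generator.semantic_feature(condition_entities, target_tokens)

        # S2 fuses H_sem with the node representations of the cognitive graph
        # and returns the "best nodes" for the current position.
        best_nodes = extractor.predict_best_nodes(graph, h_sem)

        # The best-node triples are verbalized (prompt templates) and fed to
        # S1 as the extension part E_E of its input.
        extension_text = " ".join(node.summary for node in best_nodes)
        token = generator.next_token(extension_text, condition_entities, target_tokens)
        if token == "<eos>":
            break
        target_tokens.append(token)
    return target_tokens
```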

| Generator system
The core task of S1 is text generation. We adopt a pre-trained decoder model as the base model of S1. GPT-2 is a decoder model for text generation, which is widely adopted as a pre-trained model in various NLG tasks (Chen et al., 2020; Ji et al., 2020). The state-of-the-art GPT-3 (Brown et al., 2020), a large language model (LLM), contains 175 billion parameters and cannot be trained on typical workstations. Besides, the issues of lacking explainability and generating inappropriate content also occur in GPT-3. GPT-2 has a much smaller number of parameters than GPT-3. In this task, our experimental results demonstrate that our approach can successfully address the above issues with GPT-2.
FIGURE 2 The overall model structure of the CogNLG framework.
The model needs accurate external knowledge support when generating a sequence in order to produce text that conforms to factual logic. Unlike other knowledge-based approaches, which introduce static external knowledge in each iteration, our S1 dynamically takes the filtered external knowledge from the extractor system at each position. The input of S1 is divided into three parts: the Extension ($E_E$), the Condition ($E_C$), and the Target ($E_T$), where $E_E = \{e_1, \ldots, e_N\}$ is the concatenation of the best-node summaries (the best nodes are predicted dynamically by S2 based on the current output semantics in S1 and each node's hidden representation in S2), $E_T$ is the ground truth target sequence, and <SEP> represents the special separator token.
As in common sequence generation tasks, the tokenizer encodes the input tokens and maps them to high-dimensional vectors through an embedding layer. In this task, GPT-2 only outputs the hidden states within the $E_T$ index range. The output hidden state at position $i$ can be denoted as $T^S_i \in \mathbb{R}^H$, where $H$ is the dimension size of the hidden features. The hidden state $T^S_i$ also serves as the S2 semantic feature input $H_{sem}$ at position $i$ and is used for next-token prediction during evaluation.
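The snippet below sketches the S1 input layout and hidden-state extraction, assuming the Huggingface GPT-2 implementation that the paper builds on; the exact <SEP> handling, example strings, and index slicing are illustrative rather than the paper's preprocessing.

```python
# Sketch of the S1 input ( [E_E] <SEP> [E_C] <SEP> [E_T] ) and of extracting
# the hidden states for the target positions with Huggingface GPT-2.
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.add_special_tokens({"additional_special_tokens": ["<SEP>"]})
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.resize_token_embeddings(len(tokenizer))

e_ext = "Villorsonnens is located in the canton of Fribourg."  # best-node summaries (E_E)
e_cond = "Villorsonnens"                                        # input entities (E_C)
e_tgt = "Villorsonnens is a municipality"                       # target prefix (E_T)

text = f"{e_ext} <SEP> {e_cond} <SEP> {e_tgt}"
enc = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    out = model(**enc, output_hidden_states=True)

# Only the hidden states within the E_T index range are kept; each position's
# state T^S_i (dimension H = 768) is the semantic feature H_sem passed to S2.
n_target = len(tokenizer(e_tgt)["input_ids"])
h_sem = out.hidden_states[-1][0, -n_target:, :]   # shape: (len(E_T), 768)
```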

| Extractor system
External knowledge plays a critical role in the generation performance of S1. Therefore, it is essential to select the most relevant nodes based on the current semantic feature of S1 and the hidden representations of S2. In particular, the external supporting knowledge should be closely related to the topic of the currently generated content; otherwise, the external knowledge acts as noise and hurts model performance.
We construct a cognitive graph $G = \langle V, E \rangle$ based on the input entities. There are two types of nodes in $G$: source entity (SE) nodes and extension entity (EE) nodes. SE nodes are derived from the input entities; EE nodes are derived from the links of each parent node. $G$ initially consists of multiple SE nodes; we then denote a layer depth variable $d$, and the EE nodes of each layer are generated from the links of their parent nodes. Algorithm 1 describes the construction procedure in detail; a simplified sketch is given below. Each node $v_i \in V$ contains the node name, links, and summary. The links array contains the related child nodes and is obtained by retrieving the node name from the wiki. The summary is the text computed from the triple formed by the node and its parent. Previous work has demonstrated that prompt-based input effectively improves the model's understanding of semantic information. We collect all the relation types from the datasets and design several templates according to their part of speech. As shown in Figure 3, the triples can be converted into summaries by the templates.
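The following is a simplified sketch of the construction procedure in the spirit of Algorithm 1, under stated assumptions: `get_links(name)` is a hypothetical stand-in for the Wikipedia retrieval, and `TEMPLATES` is a toy version of the prompt-based templates of Figure 3, not the paper's actual template set.

```python
# Simplified cognitive-graph construction sketch (not the paper's code).
from dataclasses import dataclass, field

TEMPLATES = {
    "country": "{s} is located in the country {o}.",
    "occupation": "{s} works as a {o}.",
    "instance of": "{s} is an instance of {o}.",
}

@dataclass
class Node:
    name: str
    summary: str = ""                 # verbalized triple (parent, relation, node)
    children: list = field(default_factory=list)

def build_graph(source_entities, get_links, depth=2):
    """Breadth-first expansion: source entities form the SE nodes, and each
    layer adds EE nodes from the links of its parent, up to `depth` hops."""
    graph = {name: Node(name) for name in source_entities}
    frontier = list(graph.values())
    for _ in range(depth):
        next_frontier = []
        for parent in frontier:
            for relation, child_name in get_links(parent.name):
                if child_name in graph:
                    continue
                template = TEMPLATES.get(relation, "{s} has {r} {o}.")
                summary = template.format(s=parent.name, r=relation, o=child_name)
                child = Node(child_name, summary)
                graph[child_name] = child
                parent.children.append(child)
                next_frontier.append(child)
        frontier = next_frontier
    return graph
```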
The initial hidden state of $G$ is denoted as $H_G \in \mathbb{R}^{H \times H}$, where $H$ is the max size of $G$. In our approach, we set $H$ equal to the dimension size of the hidden features in S1. For each node, the hidden feature is computed by S1, where $k$ is the index of the $k$-th node and the input $T_{Node}$ is the concatenation of the embeddings of the node name and summary. The Trm of S1 is a transformer decoder with $\mathbb{N}$ layers, and the initial hidden feature $H^k_G$ of the $k$-th node is the output at the last position $\mathcal{L}$ of the decoder's last layer. To predict the best nodes based on the context at each position, we first combine the semantic feature of each node with the relationships between neighboring nodes by adopting a variant GCN model. The hidden feature $H_G$ is updated iteratively, and the new hidden state $H'_G$ after one propagation step is computed using an activation function $\sigma$, the adjacency matrix $A$ of $G$, which represents the relationships between nodes, and a diagonal degree matrix $D$. $H_{fusion}$ is the fusion feature of $H_G$ and $H_{sem}$, and $W_{fusion} \in \mathbb{R}^{2H \times H}$ is a learnable weight matrix. Finally, $W_{cls} \in \mathbb{R}^{H \times 2}$ maps $H_{fusion}$ to $H$ two-dimensional vectors; we then take the index of the maximum value per node and choose the nodes with index one as the best nodes.
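Since the exact equations were lost in extraction, the sketch below shows one plausible reading of this predictor: a standard normalized GCN propagation step (used here as a stand-in for the paper's GCN variant), fusion with the S1 semantic feature, and a per-node binary classifier. Layer names and the activation choice are assumptions.

```python
# Hedged PyTorch sketch of the S2 best-node predictor (not the paper's code).
import torch
import torch.nn as nn

class Extractor(nn.Module):
    def __init__(self, hidden=768):
        super().__init__()
        self.w_gcn = nn.Linear(hidden, hidden)
        self.w_fusion = nn.Linear(2 * hidden, hidden)   # W_fusion in R^{2H x H}
        self.w_cls = nn.Linear(hidden, 2)               # W_cls in R^{H x 2}
        self.act = nn.GELU()

    def forward(self, h_graph, adj, h_sem):
        # h_graph: (N, H) node features; adj: (N, N) adjacency; h_sem: (H,)
        deg = adj.sum(dim=1)
        d_inv_sqrt = torch.diag(deg.clamp(min=1).pow(-0.5))
        norm_adj = d_inv_sqrt @ adj @ d_inv_sqrt        # D^-1/2 A D^-1/2
        h_graph = self.act(norm_adj @ self.w_gcn(h_graph))

        # Fuse each node state with the current semantic feature from S1.
        h_sem = h_sem.unsqueeze(0).expand_as(h_graph)
        h_fusion = self.act(self.w_fusion(torch.cat([h_graph, h_sem], dim=-1)))

        # Binary classification per node; index 1 marks a "best node".
        logits = self.w_cls(h_fusion)                   # (N, 2)
        best_idx = logits.argmax(dim=-1).nonzero(as_tuple=True)[0]
        return h_graph, logits, best_idx
```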

| Training implementation
A significant challenge in training the CogNLG framework is how to determine the best nodes for the current sequence. Most related datasets contain only the source input entities and the ground truth target text, and manually labeling the best nodes of a dataset is costly.
This section proposes a general unsupervised best-node labeling approach for KG-to-text datasets, based on the BLEU score and a pre-trained text similarity model.
For each input of the training set, let $Y = [w_1, \ldots, w_n]$ be the ground truth target text, where $w$ is a word of $Y$ and $n$ is the length of $Y$.
We split $Y$ into subsets $\{y_1, \ldots, y_m\}$ based on pauses (commas, periods, semicolons, and some conjunctions). In the meantime, we construct the input's cognitive graph $G$, and the best nodes in $y_i$ are computed from the correlation scores described below.
FIGURE 3 Examples of triple-to-text conversion, where the text in red (@s) represents the subject, the text in green (@o) represents the object, and the text in blue (@r) represents the relation.
ALGORITHM 1 Cognitive graph construction.
The Sim function is implemented based on a BERT (Devlin et al., 2019) pre-trained model with a binary-class fully connected layer. We use the sum of the BLEU score and the Sim function score to compute the correlation $e_j$ between node $v_j$ and $y_i$. Because of the difference between the correlation scores and the actual best-node distribution, we use a variant softmax with temperature $\tau$ to smooth the correlation scores and compute the final similarity score, as sketched below.
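As a rough illustration of this labeling procedure, the sketch below scores every node summary against each clause with BLEU plus a similarity callable, smooths the scores with a temperature softmax, and keeps the top-k nodes. The clause-splitting regex and `sim_fn` (standing in for the BERT-based Sim model) are assumptions, not the paper's implementation.

```python
# Hedged sketch of the unsupervised best-node labeling.
import math
import re
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def label_best_nodes(target_text, nodes, sim_fn, k=6, tau=0.2):
    smooth = SmoothingFunction().method1
    clauses = [c.strip() for c in re.split(r"[,.;]| and | but ", target_text) if c.strip()]
    labels = []
    for clause in clauses:
        scores = []
        for node in nodes:
            bleu = sentence_bleu([clause.split()], node.summary.split(),
                                 smoothing_function=smooth)
            scores.append(bleu + sim_fn(clause, node.summary))
        # Temperature softmax to smooth the raw correlation scores.
        exp = [math.exp(s / tau) for s in scores]
        z = sum(exp)
        probs = [e / z for e in exp]
        top_k = sorted(range(len(nodes)), key=lambda i: probs[i], reverse=True)[:k]
        labels.append(top_k)
    return labels   # one list of best-node indices per clause y_i
```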

| Loss function
In S1, the final task is to generate the ground truth target text $Y = [w_1, \ldots, w_n]$; suppose $w_t \in y_i$, where $y_i$ is a subset of $Y$. The loss function of S1 is defined as follows.
where $n$ is the length of $Y$, and the model stops after the <eos> token is generated. For best-node prediction in S2, we compute the loss between the predicted probabilities from the fusion feature and the best-node labels. The final loss combines the two objectives, where $\alpha$ is a hyper-parameter whose value decreases during training; the final loss is back-propagated to optimize both systems in CogNLG.
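Because the loss equations did not survive extraction, the following is only a plausible sketch of the joint objective under stated assumptions: token-level cross-entropy for S1, a binary node-classification loss for S2, and a weighted combination with an annealed alpha (the schedule values are taken from the experimental details in Section 4).

```python
# Hedged sketch of the joint CogNLG objective; the exact formulation in the
# paper may differ.
import torch
import torch.nn.functional as F

def cognlg_loss(lm_logits, target_ids, node_logits, node_labels, alpha):
    # lm_logits: (T, vocab) S1 predictions for the target positions
    # node_logits: (N, 2) S2 per-node predictions; node_labels: (N,) in {0, 1}
    loss_s1 = F.cross_entropy(lm_logits, target_ids)
    loss_s2 = F.cross_entropy(node_logits, node_labels)
    return alpha * loss_s1 + (1.0 - alpha) * loss_s2

def alpha_schedule(step, start=0.5, end=0.2, total=5000):
    # Linear decay of alpha from 0.5 to 0.2 over the first 5000 steps.
    frac = min(step, total) / total
    return start + (end - start) * frac
```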

| Dataset
ENT-DESC. The ENT-DESC dataset (Cheng et al., 2020) is extracted from Wikipedia and covers more than 9.9 million pages, spanning domains such as humans, events, and locations. It consists of 110k instances and is significantly larger than related data-to-text datasets such as WebNLG (Gardent et al., 2017), AGENDA (Koncel-Kedziorski et al., 2019), and E2E (Novikova et al., 2017). We follow the same experimental setup as Cheng et al. (2020): the dataset is randomly split into a training set (80%), a development set (10%), and a test set (10%). Table 1 presents the statistics of the datasets. Each dataset item contains a list of source input entities, a ground truth target text, and the topic-related entities associated with the source input. In training and evaluating our CogNLG framework, we only take the source input entities and the target text. We then use the Wikipedia API to search for 2-hop paths associated with the input entities (the node extension depth d is set to two, which is large enough for this task) and save all path triples. The node name, summary, and related child nodes are extracted from the node information returned by Wikipedia. We collect 620 million triples and store them in a database. Because training is time-consuming, we construct a cognitive graph for each item in advance according to the method described in Section 3.2. For the other related approaches, we use the default triples provided in the datasets.
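As a rough illustration of this 2-hop collection step, the sketch below uses the third-party `wikipedia` package to follow page links for two hops. This is only an approximation of the paper's pipeline, which pre-computes and stores roughly 620 million triples in a database and uses richer relation information than plain page links; the "linked_to" relation here is a placeholder.

```python
# Illustrative 2-hop neighbor collection (not the paper's actual retrieval).
import wikipedia

def two_hop_links(entity, max_links=20, depth=2):
    seen, frontier, triples = {entity}, [entity], []
    for _ in range(depth):
        next_frontier = []
        for title in frontier:
            try:
                page = wikipedia.page(title, auto_suggest=False)
            except wikipedia.exceptions.WikipediaException:
                continue  # skip ambiguous or missing pages
            for link in page.links[:max_links]:
                triples.append((title, "linked_to", link))  # relation is a placeholder
                if link not in seen:
                    seen.add(link)
                    next_frontier.append(link)
        frontier = next_frontier
    return triples
```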
We evaluate CogNLG and other approaches with multiple evaluation metrics, including BLEU (Papineni et al., 2002), METEOR (Denkowski & Lavie, 2011), TER (Snover et al., 2006), ROUGE (Lin, 2004), and PARENT (Dhingra et al., 2019). BLEU and ROUGE are both based on n-gram analysis: BLEU measures similarity to the reference based on n-gram overlap, while ROUGE-L measures the longest common subsequence (LCS) between the generated and the reference text. METEOR considers word order and synonymy to evaluate the quality of the generated content. TER quantifies the edit distance between the generated and reference text. PARENT evaluates the generated text against both the reference and the source data, rewarding faithfulness to the input. Utilizing these metrics, we obtain objective measures of model performance, enabling comparisons with other approaches in KG-to-text generation.
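For reference, two of these metrics can be computed with common libraries as shown below (nltk for BLEU, rouge-score for ROUGE-L); the example strings are made up, and the paper's own evaluation scripts may differ.

```python
# Example of computing BLEU and ROUGE-L with off-the-shelf libraries.
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction
from rouge_score import rouge_scorer

references = [["lee aaron is a canadian rock singer".split()]]
hypotheses = ["lee aaron is a canadian singer".split()]

bleu = corpus_bleu(references, hypotheses,
                   smoothing_function=SmoothingFunction().method1)

scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
rouge_l = scorer.score("lee aaron is a canadian rock singer",
                       "lee aaron is a canadian singer")["rougeL"].fmeasure

print(f"BLEU: {bleu:.3f}, ROUGE-L: {rouge_l:.3f}")
```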

| Experimental details
We implement the CogNLG framework based on Huggingface Transformers (Wolf et al., 2020). The pre-trained GPT-2 model in S1 is "gpt2", released by Radford et al. (2019). In this task, the number of transformer layers $\mathbb{N}$ is 12 and the hidden size $H$ is 768; all activation functions are GELU (Hendrycks & Gimpel, 2016); the value of $k$ in Equation (8) is set to 6, and the temperature $\tau$ in Equation (10) is set to 0.2; the hyper-parameter $\alpha$ is initialized to 0.5 and linearly decreased to 0.2 over 5000 steps. We optimize CogNLG with Adam (Kingma & Ba, 2017); the learning rate of S1 is $5 \times 10^{-5}$ and the learning rate of S2 is $1 \times 10^{-4}$. To accelerate the convergence of CogNLG, we further pre-train the vanilla GPT-2 with the input entities and target text. The batch size of the pre-training step is eight, and the batch size of CogNLG is set to one because its input length changes in real time. During decoding, we use nucleus sampling (Holtzman et al., 2019) restricted to the top-8 tokens, which is more efficient than beam search.
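The snippet below sketches how this configuration could be set up with Huggingface and PyTorch; `Extractor` refers to the S2 sketch given earlier, and the generation keywords are one reasonable reading of "sampling from the top-8 tokens", not the paper's exact decoding code.

```python
# Sketch of the optimizer and decoding configuration described above.
import torch
from transformers import GPT2LMHeadModel

model_s1 = GPT2LMHeadModel.from_pretrained("gpt2")
model_s2 = Extractor(hidden=768)          # from the extractor sketch above

optimizer = torch.optim.Adam([
    {"params": model_s1.parameters(), "lr": 5e-5},   # S1 learning rate
    {"params": model_s2.parameters(), "lr": 1e-4},   # S2 learning rate
])

# Decoding: sampling restricted to the top tokens instead of beam search.
generation_kwargs = dict(do_sample=True, top_k=8, max_new_tokens=128)
```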

| Results
The overall experimental results on the ENT-DESC and the Person and Animal datasets are shown in Tables 2 and 3. On the ENT-DESC dataset, our CogNLG framework is superior to all related approaches: our model and KGPT (Chen et al., 2020) outperform the vanilla MGCN in all evaluation metrics. The results also show that the performance of all approaches decreases when triple noise is introduced, while CogNLG shows great stability and still achieves the best performance compared with the other approaches.
We further analyze the knowledge extraction performance of S2 in CogNLG to evaluate its scalability in detail. The knowledge extraction performance can be treated as the best-node prediction performance described in Section 3.3. We use the F1 score as the evaluation metric and evaluate on the test sets of ENT-DESC and Person and Animal. We set the maximum size of the cognitive graph equal to the hidden size H. Table 5 presents the prediction performance at different hidden sizes. It shows that the performance of S2 remains stable on both datasets when the hidden size changes. The results also explain why CogNLG remains stable on the ENT-DESC dataset with triple noise and demonstrate that CogNLG has outstanding scalability for large-scale dataset inference.
Over-generation is one potential cause of the performance decrease under triple noise: when the model cannot effectively filter irrelevant entities, the generated text contains irrelevant entity information. We count the proportion of irrelevant entities generated by different models on the original ENT-DESC dataset and on the dataset with triple noise. As shown in Figure 4, the proportion of irrelevant entities increases for all models when triple noise is introduced. Thanks to the strong performance of S2, the text generated by CogNLG contains the lowest proportion of irrelevant entities, which implies that it can effectively suppress the over-generation issue.
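Since the paper does not spell out its exact counting procedure, the following is a hypothetical sketch of how the irrelevant-entity proportion could be computed: entities mentioned in the generated text that are not among the source or topic-related entities are treated as irrelevant.

```python
# Hypothetical irrelevant-entity proportion (not the paper's exact procedure).
def irrelevant_entity_ratio(generated_text, candidate_entities, relevant_entities):
    """candidate_entities: entities to look for in the generated text (e.g. via
    string matching or NER); relevant_entities: source + topic-related ones."""
    mentioned = [e for e in candidate_entities if e.lower() in generated_text.lower()]
    if not mentioned:
        return 0.0
    irrelevant = [e for e in mentioned if e not in set(relevant_entities)]
    return len(irrelevant) / len(mentioned)
```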

| Ablation study
We designed several experiments to analyze the performance impact of each component of CogNLG. To compare the results without external knowledge, using only S1, we train and evaluate the vanilla GPT-2 with the input entities and the ground truth target text. As shown in Table 6, the BLEU score of GPT-2 decreases by 12.5 compared with CogNLG, which is worse than most graph-based approaches in Table 2.
The results prove that external knowledge plays a vital role in the performance of KG-to-text tasks.
CogNLG-R is designed with the same structure as the CogNLG framework but disables the S2 predictor and replaces it with a random selector for best-node selection. CogNLG-R performs poorly, even worse than the vanilla GPT-2. As shown in Table 6, the performance of CogNLG-O is similar to that of CogNLG.

| Explainability analysis
To verify the explainability of CogNLG, we present some cases of how S2 reasons about the best nodes. As shown in Figure 5, each case consists of a simplified cognitive graph at the top and the generated text at the bottom. At the beginning, E_E and E_T in S1 are empty, and S2 predicts the best nodes based on the semantics of E_C. The best-node predictions at the beginning of the generated sentence are shown in cases (a) and (b). We observe that the relations of the best nodes in both cases are related to the sentence's subject. In case (a), the subject is a person, and the best nodes include "gender", "birthday", "country", "occupation", and so forth. The subject of case (b) is an airplane base, and the best nodes include the "instance of" the subject, "established date", "country", and so forth. In case (c), when the model encounters the preposition "in", S2 successfully predicts the best node "Villorsonnens" required by the next token. We observe that the best nodes usually remain constant within a clause; however, when the model comes to "is" or prepositions like "in" or "at", the best nodes change significantly. Case (d) illustrates that S2 successfully deduces the relationship between "Villorsonnens" and "Fribourg", which reflects reasoning ability over the depth information of the cognitive graph.
We also present the relation tags of the top-three best-node predictions for each token in Table 7.

| Error study
We visualize the distributions of the BLEU-4 and ROUGE-L scores of CogNLG on the ENT-DESC test set. The results are illustrated in Figure 7.
Most cases follow a Gaussian distribution around the average score. The bipolar distribution observed in some cases in (a) is caused by shorter text lengths and fewer best-node labels. We find that the generated outputs obtain high scores on short texts when the reference contains enough topic-related entity information; conversely, the outputs obtain low scores when the reference mainly consists of verbs, adjectives, and so forth, and lacks topic-related nouns.
We also observed that some cases obtain low scores due to missing or mismatched entities. The external knowledge is obtained from the Wikipedia API, and we use fuzzy matching to retrieve entity association information, which inevitably leads to some missing or mismatched entities. To study the consequences of missing entities, we selected an instance and manually removed the birthday and country from the graph.
As shown in Table 8, when CogNLG misses the birthday and country, it still generates a fake birthday and country.
FIGURE 5 Cases of cognitive graph reasoning paths. The text in red is the current last-position token, and the text in blue is the next predicted token. The red circles are the best nodes for the current token, and the blue circles are the best nodes for the next token (overlapping red and blue circles represent best nodes for both).
TABLE 7 Case results on predicting the top-three best nodes for each token.
Moreover, we observed that the birthday was randomly different each time the model generated it, but the country was always the same. The result is similar to using the vanilla GPT-2 without any external knowledge: the generated sentences are grammatically smooth but may not follow factual logic.
A deep end-to-end model cannot control whether it generates related entities randomly or consistently, because it is unknown what knowledge has been internalized. The two-system design of CogNLG helps us locate the wrong entities by inspecting the cognitive graph: after adding the missing entities, the model generates the relevant text accurately.

| Diversity analysis
To evaluate the diversity of the model, we randomly shuffle the input entities and make three predictions for each test set item. We adopt the Self-BLEU (Zhu et al., 2018) score to measure the diversity of the generated text. As shown in Table 9, CogNLG achieves the lowest Self-BLEU score on both datasets, indicating that the text generated by CogNLG is more diverse. We believe this is due to the two-system design, which makes S1 adapt to the information provided by S2, increasing generation diversity during training.
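As a reference point, Self-BLEU can be computed roughly as below: each generated sample is scored with BLEU against the remaining samples as references, and the scores are averaged. The example sentences are made up, and the original Self-BLEU implementation (Zhu et al., 2018) may differ in detail.

```python
# Minimal Self-BLEU sketch; lower values indicate more diverse outputs.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def self_bleu(samples):
    smooth = SmoothingFunction().method1
    scores = []
    for i, hyp in enumerate(samples):
        refs = [s.split() for j, s in enumerate(samples) if j != i]
        scores.append(sentence_bleu(refs, hyp.split(), smoothing_function=smooth))
    return sum(scores) / len(scores)

# Example: three generations for the same input entities, shuffled each time.
print(self_bleu([
    "lee aaron is a canadian rock singer",
    "lee aaron is a rock musician from canada",
    "the canadian singer lee aaron performs rock music",
]))
```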
The Person and Animal dataset (Vrandečić & Krötzsch, 2014) is extracted from the structured KG Wikidata and from Wikipedia. There are two main types of entities in this dataset, named Person and Animal. Compared with the ENT-DESC dataset, each input contains only one entity with multiple references. In our experiments, we only generate the first reference for better comparison.

FIGURE 4 The proportion of irrelevant entities in the generated text on the ENT-DESC dataset; "*noise" represents the value with triple noise.
To analyze the impact of the triple-to-text policy, CogNLG-T takes the original triples as the E_E input of S1. The performance of CogNLG-T decreases slightly compared to CogNLG, indicating that the transformation from triples to text effectively improves the model's internalization of the triple relations. CogNLG-O denotes the model trained with the original triples from the ENT-DESC dataset instead of the triples from the Wiki database.
Figure 6 compares different models on an example, where GCN and MGCN+SUM use the KG provided by the ENT-DESC dataset. The highlighted text in red represents the main input entity, and the highlighted text in blue represents the topic-related entities. The first row in Figure 6 is the gold reference. GCN fails to generate the main entity, which means it cannot extract knowledge accurately. The CogNLG S2 is implemented based on a variant GCN that aggregates semantic information from S1; it successfully filters out accurate knowledge by binary classification based on the current context and node relation information. Compared to the output text generated by MGCN+SUM, CogNLG describes the related entities more accurately. This further demonstrates that the excellent performance of CogNLG benefits from the two-system structure, which makes S1 focus on organizing language expression and S2 focus on knowledge extraction.

FIGURE 6 Comparison of different models on an example.
FIGURE 7 Data distribution visualization on the ENT-DESC test set.
TABLE 8 An example of CogNLG generation with knowledge missing.
Then we select the top-k nodes $X^{y_i}_{best}$ with the highest similarity scores as the set of best nodes in $y_i$, and the collection of best nodes in $Y$ is $[X^{y_1}_{best}, \ldots, X^{y_m}_{best}]$. It is important to note that in the evaluation step the model predicts the best nodes at each position, so the time cost of computing the best nodes at every position is expensive. Therefore, we consider the best nodes within each $y \in Y$ to be consistent across the range of $y$ and only compute the best nodes at the start index of $y$.
The BLEU score of CogNLG increases by 11.0 compared with MGCN, which implies a significant improvement in performance from adopting the pre-trained model and the two-system architecture. CogNLG also has excellent scalability: it can filter out knowledge noise, which is a highlight compared with other traditional approaches. To analyze the scalability of CogNLG, we introduce triple noise into the ENT-DESC dataset. For each item in the ENT-DESC dataset, we randomly add the same number of noise triples as original triples. Triple noise refers to triples unrelated to the target text; it leads to a severe decline in model performance when the model cannot effectively filter the noise. Table 4 illustrates the performance comparison on the ENT-DESC test set with noise.
Compared to the MGCN ensemble models (MGCN + CNN + delex and MGCN + SUM + delex), our CogNLG outperforms them in BLEU and METEOR scores but falls behind in ROUGE-L, indicating that our model can generate content with more diversity while ensuring fluency and accuracy.
TABLE 1 Statistics of datasets.
On the Person and Animal dataset, our model outperforms other approaches in BLEU, METEOR, and ROUGE-L. The results show that MGCN performs better in complex graph-structure scenarios with multiple input entities; since the average number of entities in the Person and Animal dataset is much smaller than in the ENT-DESC dataset, MGCN performs worse there than other related approaches. Our CogNLG still shows excellent performance on this dataset, and we can conclude that CogNLG is robust across datasets with different entity structures.
TABLE 2 Comparison of different models on the ENT-DESC test set.
TABLE 3 Comparison of different models on the Person and Animal test set.
Note: # represents lower is better.
This implies that inaccurate knowledge reduces the model performance as noise, and the CogNLG S2 contributes to filtering accurate knowledge for S1.
TABLE 4 Comparison of different models on the ENT-DESC test set with random triple noise.
TABLE 5 The performance of S2 on the ENT-DESC and the Person and Animal test sets.

The predicted relation tags of the best nodes have a high correlation with the current token, making it easy to forecast what token the model will generate at the next position. The visual analysis of best-node prediction demonstrates that CogNLG shows explainability in NLG tasks. It can be concluded that S2 provides all the nodes that it considers to have a high probability of being generated in the following text, to prevent information omission; S1 then decides what to generate based on the context. In particular, S1's selection of knowledge is influenced by the semantic distribution of the training set.

TABLE 6 Results of the ablation study on the ENT-DESC test set.
However, a bipolar distribution can be observed in many cases in (a); panels (b) and (c) display the average target reference length and the number of best-node labels in each ROUGE-L score range, and both follow a Gaussian distribution.
TABLE 8 An example of CogNLG generation with knowledge missing.
Missing birthday and country: Lee Aaron (born June 21, 1969) is an American rock musician.
+birthday: Lee Aaron (born July 21, 1962) is an American rock musician.
+birthday +country: Lee Aaron (born July 21, 1962) is a Canadian rock singer, songwriter, and musician.