Chinese Named Entity Recognition in the Geoscience Domain Based on BERT

Geological reports are frequently used by geologists involved in geological surveys and scientific research to record the results and outcomes of geological surveys. With such a rich data source, a substantial amount of knowledge has yet to be mined and analyzed. This paper focuses on automatic information extraction from geological reports, namely, geological named entity recognition (GNER). Geological named entity recognition plays an important role in data mining, knowledge discovery and knowledge graph construction. Existing general-purpose named entity recognition models and tools perform poorly in the geoscience domain because of the language irregularities of geological text, such as informal sentence structures, numerous domain-specific geoscience terms, long character sequences and entities formed from multiple combinations of independent words. We present BERT-BiGRU-CRF, a deep learning-based geological named entity recognition model built from bidirectional encoder representations from transformers (BERT), a bidirectional gated recurrent unit (BiGRU) network and a conditional random field (CRF), designed specifically with these linguistic irregularities in mind. The integrated model obtains character vectors rich in semantic information from the BERT pretrained language model, which compensates for the lack of context specificity of static word vectors (e.g., word2vec) and improves the extraction of complex geological entities. We demonstrate the proposed model by applying it to four test datasets, including a geoscience NER data set built from regional geological reports, and by comparing its performance with those of five baseline models.

et al., 2021). Such approaches depend on hand-crafted, pattern-matching-based rules to support the recognition and extraction of target entities from geological textual data. Outside the geoscience domain, several rule-based NER techniques and models have been proposed (Appelt et al., 1993; Elsebai et al., 2009; Lehnert et al., 1992). As the amount of data increases, the workload of rule extraction grows, maintaining rule consistency becomes more difficult, and rule-based and dictionary-based methods cannot address the heterogeneity and complexity of text and thus cannot achieve high GNER performance (Santoso et al., 2021). Compared with rule-based methods, statistical learning methods can learn from large amounts of annotated training data to guide the recognition and extraction of named entities (Liu et al., 2022; Molina-Villegas et al., 2021; Peng et al., 2021). The popular deep learning methods for named entity recognition are generally based on word embeddings, which learn similar representations for semantically or functionally similar words (Santoso et al., 2021; Tian et al., 2021). Researchers have started to construct models without complicated feature engineering and to minimize the reliance on NLP toolkits for feature acquisition.
However, the recognition of geological named entities still faces several difficulties and challenges (Qiu, Xie, Wu, Tao, & Li, 2019): compared with general-domain texts, geological named entities have (a) large character lengths, (b) many rare words, (c) multiple combinations of independent words, and (d) nested entities. For example, "garnet diopside silica" denotes a kind of silica with garnet and diopside as its main mineral components, and this geological named entity contains 10 Chinese characters. The entity combines the independent words garnet, diopside, and silica, and it nests the shorter entities garnet, diopside, silica, and diopside silica. Nesting relationships also exist among geological named entities at different conceptual levels; for example, a given stratigraphic entity contains several rocks, and the rocks contain several minerals. Moreover, there is presently no large-scale annotated corpus in the geological field, so samples for training models are scarce, and recall and accuracy are insufficient. Geological named entity recognition is therefore a challenging task.
The popular deep learning approaches for NER are generally based on word embeddings that learn similar representations for semantically or functionally similar words. One drawback of these approaches is that the embedding for the same word is identical across sentences with different semantics. A large-scale annotated corpus is required for model training to prevent underfitting or overfitting, yet producing an annotated corpus is costly, and building a large-scale annotated data set is challenging for most natural language processing tasks. As an advanced and widely employed language representation model, BERT (Devlin et al., 2018) extracts bidirectional semantic features of Chinese utterances from large-scale texts by unsupervised learning, further enhances the generalization ability of the word vector model, and better characterizes syntactic and semantic information in different contexts. This can, to a certain extent, free supervised learning from its dependence on large-scale annotated data and effectively reduce manual annotation effort. Therefore, integrating BERT into deep learning models is a promising way to improve the performance of Chinese geological named entity recognition.
To address the abovementioned issues, this paper proposes a character-level embedding-based BERT-BiGRU-CRF model for extracting Chinese geological named entities, in response to the characteristics of named entity recognition tasks in the geological domain and to the problems of small sample sizes and poor results for some target entities (note that BERT can be used for other languages, but in this paper we focus on Chinese GNER). The model consists, from the bottom up, of an encoder, a BiGRU neural network layer and a conditional random field (CRF) layer: the encoder is a character-level Chinese BERT-based model that maps the input Chinese characters into a low-dimensional, dense real space to mine the latent semantics embedded in Chinese entity elements; the BiGRU neural network layer takes the character vectors transformed by the encoder as input to capture the forwards (left-to-right) and backwards (right-to-left) bidirectional features of the Chinese entity sequences; and the CRF layer takes the bidirectional features extracted by the upstream BiGRU as input to generate the label that corresponds to each character of the geological entities in combination with the BIOES annotation specification. The experimental results demonstrate that the presented model outperforms previous models and achieves state-of-the-art performance on the constructed datasets.
The main contributions of this research are summarized as follows:
1. Based on the characteristics of Chinese geological domain text, we propose the BERT-BiGRU-CRF model for guiding the recognition and extraction of target information from unstructured textual data.
2. Our model is compared in detail with four other mainstream models, and experiments demonstrate that it achieves higher performance on both general-domain and geological-domain datasets.
3. We share the source code of BERT-BiGRU-CRF and the annotated test data at https://doi.org/10.5281/zenodo.5758466.
The remainder of the article is structured as follows: Section 2 discusses related work in the area of geological named entity recognition. Section 3 introduces the proposed approach, and Section 4 provides details on the experiments and results. Some concluding remarks are given in Section 5.

Text Mining in Geoscience
In the domain of geosciences, various systems and applications have been developed, such as mineral exploration (Holden et al., 2019; Shi et al., 2018), paleontological studies (Peters et al., 2014, 2017; Wang et al., 2018), and geological text mining and applications in Chinese (Qiu, Xie, Wu, Tao, & Li, 2019; Wang et al., 2018). Holden et al. (2019) analyzed 25,419 mineral exploration reports in a targeted manner using NLP pipeline methods (word recognition, keyword extraction, etc.), focusing on searching and summarizing the geology-related content in the reports, and developed a system named GeoDocA. This system retrieves geology-related subject terms based on a dictionary of pre-customized entity groups (geological timescales, mineralogy, host rock types and alteration types), calculates the co-occurrence of these terms, and ultimately creates a summary map for retrieval. Shi et al. (2018) applied a mature text mining algorithm directly to Chinese geological text data, performed a retrieval case study of the Lara copper deposit in Sichuan Province, China, and used convolutional neural networks (CNNs) to classify geological, geophysical, geochemical and remote sensing-related text data; entity co-occurrence and frequency statistics were computed for the deposit analysis. Enkhsaikhan et al. (2018) used a form of word embedding to analyze the semantic similarity among geology-related phrases and applied an analogical solver to establish the semantic relationships among terms, investigating mineral exploration through a semantic analysis approach. The final results demonstrate the potential application of semantic relationships among entities in the domain.
For the domain of paleontological text mining, Peters et al. (2014) developed PaleoDeepDive, a paleontology-oriented knowledge mining platform based on the original platform named DeepDive (De Sa et al., 2016), which was primarily utilized to generate historical and generic turnover rates of taxonomic diversity. In their recent work (Peters et al., 2017), they automated the analysis of stratigraphic databases using a computer reading system to automatically extract information about the three phases of occurrence of stromatolites arranged on a geological time scale, as well as predictors of stromatolite prevalence.
In the domain of geosciences in Chinese, a self-learning-based word segmentation method has been proposed to segment meaningful words from geoscience reports written in Chinese, in response to problems such as the drastic performance degradation of generic-domain word segmentation methods when applied to the geological domain. To address the lack of a massive corpus in the Chinese geology domain, a method was proposed to generate a corpus based on random combinations of word and word frequency information and to segment geology-related words using a BiLSTM model. Wang et al. (2018) segmented mineral deposit domain texts based on CRFs and then used the segmented texts for keyword extraction and co-occurrence word statistics to construct a knowledge graph for visual analysis. Their algorithm was able to segment both generic-domain words and geological-domain words. Li et al. (2021) constructed a Chinese word segmentation algorithm based on a geological domain ontology assisted by a self-loop approach to better segment geological-domain texts. Ma et al. (2021) employed a deep learning model trained on journal abstracts and titles in the field of Chinese geology to perform automatic abstract construction.
All of these studies have explored and visualized geological text but have not addressed the more fine-grained information (e.g., NER, keywords and relation extraction) contained in it. Instead, this paper focuses on the extraction of named entities from geological texts to build a knowledge graph of the geological domain and to mine and discover hidden knowledge in the geological domain.

Previous Research on Geological Named Entity Recognition
Geological named entity recognition is a domain-specific named entity recognition task that aims to identify important concepts in geology, including geological ages, geological formations, stratigraphy, rocks, minerals and locations. Several scholars have conducted research on geological named entity recognition. Zhang et al. (2018) developed a classification system for elements of geological entity information and an annotation specification based on the linguistic features of geological texts and applied a deep belief network model to geological entity information recognition. Ma (2018) combined a geological domain ontology with a mature BiLSTM-CRF model to carry out the named entity recognition task after preprocessing geological texts, such as geological reports, geological-domain journal papers and geological professional websites, with word segmentation and stop-word removal. Qiu, Xie, Wu, and Tao (2019) preprocessed the corpus and trained word embeddings in the geological domain with the word2vec model using a large amount of unlabeled data, used a BiLSTM-based recurrent neural network with an attention mechanism for semantic encoding of sentences, and finally combined it with conditional random fields to achieve geological entity recognition. Chu et al. (2021) fused multiple methods, such as ELMo (embeddings from language models), CNN, and BiLSTM-CRF, to extract geological entities, using a CNN to add character features and ELMo to extract dynamic word features for the input distributed representation. The commonality of these studies is that they exploit the ability of deep learning models to learn deep nonlinear features among words to tackle geological named entity recognition tasks.
In this paper, we focus on named entity recognition in the Chinese geological domain and compare the performance of the current mainstream deep learning models on geological named entity recognition, both on a standard data set in the general domain and on a constructed data set in the domain of geosciences.

BERT-BiLSTM-CRF
In the BERT-BiLSTM-CRF model, the BERT model is selected as the feature representation layer for word vector acquisition. The BiLSTM model is employed for deep learning of full-text feature information for named entity recognition in the geological domain. The output sequence of the BiLSTM model is then processed in the CRF layer, which uses the dependencies between neighboring labels to obtain the globally optimal label sequence, as shown in Figure 1.

BERT-BiGRU-CRF
The model framework consists of a BERT pretrained language model, BiGRU network, and CRF layer; a diagram of the model architecture is shown in Figure 2. First, the input sequence is input into the BERT layer for pretraining to obtain context-dependent representations, which are utilized to solve the key problem of entity recognition with many rare characters and nested entities. Second, the vectors obtained from the BERT layer are input into the BiGRU layer to solve the problem of long-term text memory and long text dependency, which can be applied to solve the key problem of large-length geological entity characters. Last, decoding is performed by the CRF layer to obtain the output label sequence.

BERT Layer
RNNs and CNNs have certain shortcomings when addressing NLP tasks: the recurrent structure of RNNs cannot be parallelized, so training is slow, and the inherent convolutional operation of CNNs is not well suited to serialized text. The transformer model (Vaswani et al., 2017) is a newer architecture for sequential text that is based on a self-attention mechanism, in which arbitrary units interact and there is no sequence length limitation. The BERT model uses a multilayered bidirectional transformer encoder structure conditioned jointly on left-to-right and right-to-left contexts and can therefore include richer contextual semantic information than the ELMo model (Peters et al., 2018), in which the left-to-right and right-to-left LSTMs are trained independently. In addition, the transformer uses positional embedding to add positional information, in response to the fact that the self-attention mechanism itself cannot capture sequence order. The BERT input representation is the sum of three embeddings, namely, the token embedding (word embedding), segment (sentence) embedding and positional embedding, which can unambiguously represent a single text sentence or a pair of text sentences in one token sequence, as shown in Figure 3. In addition, the BERT pretrained language model captures word-level and sentence-level representations through two jointly trained tasks, the masked language model and next sentence prediction, respectively. The masked language model trains a deep bidirectional language representation by randomly masking certain words in a sentence and then predicting the masked words. In contrast to standard language models (e.g., word2vec) that can only predict the target in one direction, from left to right or from right to left, the masked language model predicts the masked words using context from both directions.
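The masked-language-model corruption described above can be sketched in a few lines of Python. The 15%/80%/10%/10% rates follow the published BERT recipe (Devlin et al., 2018); the toy vocabulary and token lists are illustrative only, not part of this paper's pipeline.

```python
import random

def mask_tokens(tokens, mask_rate=0.15, rng=None):
    """Sketch of BERT's masked-LM corruption: select ~15% of positions;
    replace 80% of the selected tokens with [MASK], 10% with a random
    token, and leave 10% unchanged. Returns the corrupted sequence and
    the (position, original token) pairs the model must predict."""
    rng = rng or random.Random(0)
    vocab = ["rock", "fault", "strata", "mineral", "basin"]  # toy vocabulary
    corrupted, targets = list(tokens), []
    for i, tok in enumerate(tokens):
        if rng.random() < mask_rate:
            targets.append((i, tok))
            r = rng.random()
            if r < 0.8:
                corrupted[i] = "[MASK]"
            elif r < 0.9:
                corrupted[i] = rng.choice(vocab)
            # else: keep the original token (but it is still predicted)
    return corrupted, targets
```

The model then only computes its prediction loss over the returned target positions, which is what lets it condition on context from both directions.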

BiLSTM Layer
LSTM networks, a special kind of recurrent neural network, were proposed by Hochreiter and Schmidhuber (1997) to counter the gradient vanishing or gradient explosion that occurs in earlier recurrent neural networks after multilayer propagation. LSTM is widely applied in text processing tasks because it captures temporal information well and handles backwards and forwards dependencies (Chiu & Nichols, 2016). However, a standard LSTM only accepts preceding information and only considers the impact of preceding context on the current moment, disregarding the following context. Considering the close contextual connections in Chinese text, this paper adopts a bidirectional LSTM. BiLSTM extends LSTM by adding a backwards LSTM: the forwards hidden layer and the backwards hidden layer recursively produce two different vector representations of the input at the current moment, which are combined into a single representation that has access to both preceding and following information. The detailed encoding pattern of BiLSTM is shown in Figure 4.
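The bidirectional wiring described above can be sketched with a toy tanh recurrent cell standing in for each LSTM direction; only the forward/backward concatenation pattern is the point here, not the LSTM gate internals.

```python
import numpy as np

def simple_rnn(xs, W, U, h0):
    """Toy tanh recurrent cell, standing in for one LSTM direction."""
    h, out = h0, []
    for x in xs:
        h = np.tanh(W @ x + U @ h)
        out.append(h)
    return out

def bidirectional(xs, W_f, U_f, W_b, U_b, h0):
    """Run one cell left-to-right and another right-to-left, then
    concatenate the two hidden states at each timestep, as BiLSTM does."""
    fwd = simple_rnn(xs, W_f, U_f, h0)
    bwd = simple_rnn(xs[::-1], W_b, U_b, h0)[::-1]  # realign to input order
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]
```

With a hidden size of d per direction, each timestep therefore yields a 2d-dimensional vector that sees both preceding and following context.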

GRU Layer
A gated recurrent unit (GRU) is designed to solve the problems of long-term memory and vanishing backpropagation gradients in RNNs. A GRU performs similarly to an LSTM, and its advantages are fewer parameters, lower hardware and time costs, and better generalization on small-sample datasets. The internal structure of a GRU is shown in Figure 5.
The GRU combines the current node input x_t with the state h_(t−1) transmitted from the previous node to derive the output y_t of the current node and the hidden state h_t passed to the next node. The internal parameter transfer and update equations of the network are shown in Equations (1)-(4):

r = σ(w_rx·x_t + w_rh·h_(t−1) + b_r)  (1)
z = σ(w_zx·x_t + w_zh·h_(t−1) + b_z)  (2)
h′ = tanh(w_hx·x_t + w_hh·(r ⊙ h_(t−1)) + b_h)  (3)
h_t = (1 − z) ⊙ h_(t−1) + z ⊙ h′  (4)

where σ is the sigmoid function, which is utilized as a gating signal: the closer the gating signal is to 1, the more information is remembered, and conversely, the more is forgotten. r is the reset gate, and z is the update gate. h′ refers to the candidate hidden state. w_rx, w_rh, etc. are the weight matrices, and b_r, b_z, etc. are the bias vectors. ⊙ is the Hadamard product, that is, elementwise multiplication of the corresponding entries.
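A single GRU step can be sketched directly in NumPy; the parameter names mirror the symbols used in the text (w_rx, w_rh, b_r, ...), and the dictionary layout is purely illustrative.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, p):
    """One GRU update following the reset/update-gate equations; `p`
    holds the weight matrices (w_rx, w_rh, ...) and biases (b_r, ...)."""
    r = sigmoid(p["w_rx"] @ x_t + p["w_rh"] @ h_prev + p["b_r"])         # reset gate
    z = sigmoid(p["w_zx"] @ x_t + p["w_zh"] @ h_prev + p["b_z"])         # update gate
    h_cand = np.tanh(p["w_hx"] @ x_t + p["w_hh"] @ (r * h_prev) + p["b_h"])  # candidate state
    h_t = (1 - z) * h_prev + z * h_cand                                  # interpolation
    return h_t
```

The final line makes the gating behavior explicit: each hidden unit of h_t is an elementwise interpolation between the previous state and the candidate state, controlled by the update gate z.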

Attention Mechanism Layer
A GRU can solve the problem of long-term memory to a certain extent and extract global features. However, it struggles with long-distance dependencies in geological text and with retaining local detail in long texts. To compensate for the shortcomings of BiGRU in extracting local features (several remedies exist; in this paper we choose the attention mechanism), this paper introduces an attention mechanism to extract the degree of association between different characters in a sentence and their context, which helps solve the long-distance dependency problem caused by the large character length of geological named entities. The attention mechanism adds feature weights to the semantics related to geological named entities to improve local feature extraction.
The attention mechanism layer assigns weights to the feature vectors h t output by the previous layer and calculates the common output feature vector c t of the previous layer and the attention layer at time t.
a_(t,i) = exp(score(h_(t−1), h_i)) / Σ_j exp(score(h_(t−1), h_j))  (5)
score(h_t, h_i) = v_a · tanh(W_a·[h_t; h_i])  (6)
c_t = Σ_i a_(t,i)·h_i  (7)

where a_(t,i) is the attention weight. The score function is the alignment model, which assigns a score based on how well the input at position i matches the output at moment t, defining how much weight each output gives to each input hidden state.
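The alignment-score-and-softmax computation described here can be sketched as additive (Bahdanau-style) attention in NumPy; the parameter names W_a and v_a are illustrative and do not come from the authors' implementation.

```python
import numpy as np

def attention(h_states, query, W_a, v_a):
    """Additive attention sketch: score each hidden state against the
    query via v_a·tanh(W_a·[query; h_i]), softmax-normalize the scores
    into weights, and return the weighted context vector c_t."""
    scores = np.array([v_a @ np.tanh(W_a @ np.concatenate([query, h]))
                       for h in h_states])
    scores -= scores.max()                       # numerical stability
    a = np.exp(scores) / np.exp(scores).sum()    # attention weights a_{t,i}
    c_t = (a[:, None] * np.array(h_states)).sum(axis=0)  # context vector
    return a, c_t
```

Because the weights are a softmax, they are nonnegative and sum to one, so c_t is a convex combination of the hidden states, emphasizing the characters most associated with the current position.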

CRF Layer
The named entity recognition task can generally be considered a sequence labeling problem, and the output of a BiLSTM can usually be used for sequence labeling by adding a softmax layer on top and outputting the label with the highest probability at each position. Although the BiLSTM solves the problem of contextual linkage, it lacks constraints on the output label sequence.
The softmax layer makes its judgment from the information at the current moment alone and does not consider the label sequence as a whole; its output is only the optimal solution at the current moment, that is, a locally optimal solution. This approach may produce invalid label sequences, such as "B-ROC, I-TIME, ...". For the named entity recognition task, the BIOES method (Lample et al., 2016) is selected for labeling, under which B-ROC followed by I-TIME is not possible.
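The transition constraints that the CRF learns can be made explicit with a small validity check for BIOES label pairs; this hypothetical helper is only meant to illustrate why "B-ROC, I-TIME" is an invalid sequence.

```python
def _parts(label):
    """Split 'B-ROC' into ('B', 'ROC'); a bare 'O' becomes ('O', '')."""
    return tuple(label.split("-", 1)) if "-" in label else (label, "")

def valid_transition(prev, curr):
    """Check whether `curr` may follow `prev` under the BIOES scheme:
    I-X and E-X must continue an open chunk of the same type X, while
    B-X, S-X and O may only appear when no chunk is open (i.e., after
    O, E-Y or S-Y)."""
    p_tag, p_type = _parts(prev)
    c_tag, c_type = _parts(curr)
    if c_tag in ("I", "E"):                      # continuing a chunk
        return p_tag in ("B", "I") and p_type == c_type
    return p_tag in ("O", "E", "S")              # starting fresh
```

A softmax decoder can emit pairs that fail this check, whereas the CRF's transition matrix assigns such pairs very low scores, which is exactly the constraint the text describes.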
Given the set of random variables X as the observation sequence and the output sequence Y, the CRF model is described by the conditional probability P(Y|X). For a sentence of text, X = {x_1, x_2, ..., x_n} denotes its observation sequence, and the score of an output label sequence Y = {y_1, y_2, ..., y_n} is computed as

score(X, Y) = Σ_(i=0..n) A_(y_i, y_(i+1)) + Σ_(i=1..n) Q_(i, y_i)  (8)

where Q is the m × k matrix of scores output by the attention mechanism, m is the length of the sentence and k is the number of tags over the different entity types. Q_(i,j) denotes the score of the jth tag for the ith character. A is a transition score matrix of size (k + 2) × (k + 2), where the two additional tags mark the start and end of the sentence, and A_(y_i, y_(i+1)) denotes the score of transitioning from label y_i to label y_(i+1).
Y_X denotes the set of all possible label sequences for sentence X. The final decoding is performed with the Viterbi algorithm to obtain the predicted tag sequence with the highest score:

Y* = argmax_(Y′ ∈ Y_X) score(X, Y′)  (9)
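The Viterbi decoding of Equation (9) can be illustrated with a minimal NumPy sketch. For brevity the start/end transition scores are omitted, so this is a simplification of the full CRF decoder, not the paper's implementation.

```python
import numpy as np

def viterbi(Q, A):
    """Viterbi decoding sketch: Q[i, j] is the emission score of tag j
    at position i; A[j, k] is the transition score from tag j to tag k.
    Returns the highest-scoring tag index sequence."""
    n, k = Q.shape
    delta = Q[0].copy()                 # best score ending in each tag
    back = np.zeros((n, k), dtype=int)  # backpointers to the best previous tag
    for i in range(1, n):
        # cand[prev, tag] = best score up to i-1 in `prev` + transition + emission
        cand = delta[:, None] + A + Q[i][None, :]
        back[i] = cand.argmax(axis=0)
        delta = cand.max(axis=0)
    # follow backpointers from the best final tag
    path = [int(delta.argmax())]
    for i in range(n - 1, 0, -1):
        path.append(int(back[i, path[-1]]))
    return path[::-1]
```

Dynamic programming makes the argmax over all k^n label sequences tractable in O(n·k²) time.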

Experiments and Results
A set of primary experiments is conducted to evaluate the presented approach on the gold NER datasets and the GNER data set. First, we introduce the experimental environment and parameters. Second, we present the evaluation metrics based on precision, recall and F1-score. Third, we compare a set of models/algorithms with our proposed model. Last, we demonstrate the recognition results through a case study and analyze illustrative errors.

Datasets
Our presented approach is evaluated on the MSRA, Boson, PeopleNER and GeoNER2021 datasets, all of which are preprocessed. Among these datasets, GeoNER2021 is a geoscience-domain data set developed through human annotation; the other datasets are gold NER datasets in the generic domain.

Table 1
Statistical Analysis of the Datasets in Our Work
Note. Text (%) = the number of entities/number of words in the data set; count = the number of entities in the data set; Max length = the maximum length of an entity in the data set; Avg length = the average length of an entity in the data set. a Word = the number of words in the data set. b Sentence = the number of sentences in the data set.
Boson: This dataset was provided by the Boson Chinese Semantic Open Platform; it includes 2,000 sentences in total. There are 6 entity categories: time, location, person name, organization name, company name and product name.
PeopleNER: This data set is divided into two data files, with a total of approximately 286,000 sentences: the first file contains the original text after splitting, and the second file contains the corresponding tags, aligned one to one after splitting. The main entity types include location, organization, person name and time.

GeoNER2021:
The data in this paper were obtained from the National Geological Archive of China Geological Survey (NGAC) website, for a total of 43 regional geological survey reports. The collected geological reports were manually labeled into six geological named entity categories: GTM, GST, STR, ROC, MIN and PLA.

Experimental Environment and Parameter Settings
The model was trained and tested in Python 3.7.3 and TensorFlow 1.1. The experiments used the BERT-Base model, which contains 12 transformer layers, a 768-dimensional hidden layer and a 12-head multihead attention mechanism. The GRU network has a 128-dimensional hidden layer. The attention mechanism layer is set to 50 dimensions, and the maximum sequence length is set to 256. The optimization function is Adam; the learning rate is set to 5E−5; and the dropout rate is set to 0.5. All models were trained on a single RTX 3090 GPU (Table 2).
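For reference, the reported settings can be collected into a single configuration dictionary; the key names are illustrative and do not come from the released code.

```python
# Hyperparameters as reported in this section (key names are illustrative).
CONFIG = {
    "bert_model": "BERT-Base",   # 12 transformer layers
    "bert_hidden_size": 768,
    "attention_heads": 12,
    "gru_hidden_size": 128,
    "attention_dim": 50,
    "max_seq_length": 256,
    "optimizer": "Adam",
    "learning_rate": 5e-5,
    "dropout": 0.5,
}
```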

Evaluation Metrics
The performance is measured with precision (P), recall (R) and F1-score, calculated as follows:

P = TP / (TP + FP)
R = TP / (TP + FN)
F1 = 2 × P × R / (P + R)

where TP refers to a positive sample with a positive prediction, FP refers to a negative sample with a positive prediction, and FN refers to a positive sample with a negative prediction.
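The three metrics can be computed from entity-level counts as follows; this is a generic sketch, not the evaluation script released with the paper.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from true-positive,
    false-positive and false-negative counts, guarding against
    empty denominators."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```

For NER these counts are usually taken at the entity level: a predicted entity counts as a true positive only if both its span and its type exactly match the gold annotation.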
In this work, each experiment is repeated five times, and we report the average F1-score as the final result (Table 3).

Comparative With Other Algorithms
We selected six mainstream algorithms for testing on three generic-domain named entity recognition datasets. The experimental results are shown in Table 4; all the deep learning models obtained F1-scores above 85% on these datasets. Among them, BERT-BiGRU-CRF obtained the highest performance on all three datasets, with F1-scores of 98.1%, 99% and 99.1%. This finding further demonstrates that the model has good generalization capability.
In addition, six comparative experiments were conducted to validate the different deep learning models on the Boson data set, which is a relatively small test data set; the results are reported in Table 5. As seen in Table 5, the algorithm that achieves the best performance among all the models is BERT-BiGRU-CRF. As expected, BERT-BiGRU-CRF generalizes well for identifying named entities. Moreover, a more comprehensive and representative gold data set can help BERT-BiGRU-CRF obtain better performance, further illustrating the ability of the BERT-BiGRU-CRF model to achieve high performance even on small-sample datasets.
Further, a series of experiments was designed to validate the performance of the six algorithms on the PeopleNER data set, which contains a large number of named entities. Table 6 demonstrates the results. BERT-BiGRU-CRF achieves the best performance, reaching a precision, recall and F1-score of 0.981, 0.983 and 0.982, respectively. This again validates that the model exhibits superior performance compared with mainstream deep learning models and is capable of better recognition in large-scale corpora.
We compared six mainstream named entity recognition models on the geological named entity recognition data set that we constructed. The experimental results, shown in Table 7, suggest that the BERT-BiGRU-CRF model achieves the best performance among all deep learning-based models on the GeoNER2021 data set. The second-best model is the BERT-BiLSTM-CRF model. The experimental results demonstrate that the BERT model provides better input sequence characterization, while the GRU module yields better modeling results than BiLSTM.
The above three sets of experimental results show that the BERT model can fully extract character-level, word-level and sentence-level features and that the pretrained vectors better characterize the contextual semantics and enhance the generalization ability of the model, especially for the recognition of domain entities with small data set sizes. For the types of geological entities that consist of combinations of Chinese characters and numerals, it also avoids the problem of word segmentation errors affecting entity recognition. Among the experiments, it was also found that the training time increased significantly after incorporating the BERT model.

Impact of the Size of the Training Data Set
A set of primary experiments was conducted to determine how the amount of training data from the corpus affects GNER model performance, so that an optimal amount of training data could be determined and utilized for improved performance. A total of 45 controlled experiments were conducted with training data proportions ranging from 10% to 100% in steps of 10%. The experimental results, in terms of average F1-score, are shown in Figure 6.
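A sketch of how such proportion-controlled training subsets might be drawn is given below. The exact sampling procedure used in the experiments is not specified, so this nested-subset scheme (shuffle once, then take growing prefixes) is an assumption made for illustration.

```python
import random

def training_subsets(sentences, step=0.10, seed=42):
    """Yield (proportion, subset) pairs for proportions 10%..100%.
    The corpus is shuffled once so that each smaller subset is nested
    inside every larger one, keeping the comparison controlled."""
    rng = random.Random(seed)
    shuffled = sentences[:]
    rng.shuffle(shuffled)
    for i in range(1, 11):
        p = i * step
        yield p, shuffled[: max(1, round(p * len(shuffled)))]
```

Nesting the subsets ensures that a larger proportion never loses sentences a smaller one had, so any performance change is attributable to the added data alone.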
The experimental results show that the amount of training data affects GNER performance. As shown in Figure 6, increasing the amount of training data increases the average F1-score. For instance, the BERT-BiGRU-CRF model achieved its least satisfactory performance with a 10% training proportion, and increasing this proportion to 90% improved the average F1-score by 72%. Although increasing the amount of training data improves the performance of the model, optimal GNER performance cannot be achieved by increasing the amount of training data alone. For instance, when the training proportion was set to 100%, the average F1-score dropped by 2% compared with the best performance (obtained with a 90% proportion). Figure 6 also demonstrates that BERT-BiGRU-CRF is sensitive to relatively minor changes in the amount of training data in the 10%-90% range; for instance, increasing the training proportion from 50% to 90% clearly affected the average F1-score. Based on the experimental results, the optimal training proportion ranges between 80% and 90%.

Table 5
Performance of Different Models on the Boson Data Set
Note. Bold indicates the best performance.

Impact of Different Architectures
To evaluate the effects of different features and components on the overall performance of the presented method, we conducted an additional set of experiments to assess variants of BERT-BiGRU-CRF without certain features and components, as illustrated in Table 8. These results demonstrate that each enhancement incorporated into the proposed model contributes to the improvement of the overall performance. The following model settings were evaluated:
1. The entire architecture (BERT-BiGRU-CRF), as introduced in Section 3.2.
2. The entire architecture proposed in this study with a word embedding of low dimensionality (i.e., vectors of 150 dimensions instead of 200).
3. The entire architecture proposed in this study with a word embedding of high dimensionality (i.e., vectors of 300 dimensions instead of 200).
4. The entire architecture proposed in this study without the BERT representation layer.
5. The entire architecture proposed in this study without the CRF layer.
As is evident from Table 8, all the aforementioned variants of BERT-BiGRU-CRF outperformed the traditional matching method, although slight differences were observed among the scores obtained by the various approaches. The second and third variants were assessed to determine the effect of lower or higher dimensionalities of the word embeddings in the architecture. The results demonstrate that the quality of the results depends on dimensionality: an appropriate word embedding dimension is necessary for this DNN model, as a higher dimensionality requires more computational resources while producing slightly inferior results. The two final variants were assessed to determine the effects of specific layers within the proposed model. As expected, in either case, the performance of the model degraded slightly in terms of each metric, indicating the important impact of the two layers on the performance of the overall model.

Extraction Results and Error Analysis
In this research, a set of experiments was conducted to validate the extracted NER results and analyze the errors. Some illustrative examples of geological NER results are shown in Table 9. As shown in Table 9, basic geological nomenclature entities, such as the stratigraphic unit entities "Upper Cretaceous" and "Neoproterozoic" and the rock entity "red molasses", are effectively identified. In addition, the model can accurately identify the long entity with remote geographic nomenclature "Nima County Zhang'en-Shenzha County Kargol", a nested entity composed of several independent words: Nima County, Zhang'en, Shenzha County, and Kargol. The model outputs this identified span directly as a single geological named entity.
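The character-level BIO scheme underlying these results can be sketched briefly: each Chinese character receives a B-&lt;type&gt;, I-&lt;type&gt;, or O tag, so a multi-character entity such as the rock entity "red molasses" (红色磨拉石) becomes one B tag followed by I tags. The span indices and the "ROCK" type label below are illustrative assumptions.

```python
def to_bio(text, entities):
    """Convert (start, end, type) entity spans into one BIO tag per character."""
    tags = ["O"] * len(text)
    for start, end, etype in entities:
        tags[start] = f"B-{etype}"          # first character of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"          # remaining characters
    return tags


chars = "红色磨拉石"  # "red molasses", tagged as a single rock entity
bio = to_bio(chars, [(0, 5, "ROCK")])
# ["B-ROCK", "I-ROCK", "I-ROCK", "I-ROCK", "I-ROCK"]
```

Nested geographic entities like "Nima County Zhang'en-Shenzha County Kargol" are handled the same way, with one B tag at the start of the whole span, which is why the model can emit the long entity as a single unit.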
We also summarized and analyzed the model's recognition errors, as shown in Table 10, and identified the following main problems: (a) Some consecutive entity characters separated only by symbols are not recognized accurately. For example, in the entity "Raja-Kangru Fault", "Raja" is not recognized as part of the geological structure entity; instead, the "-" character followed by "Kangru" is treated as the starting character of the structure entity. (b) The model sometimes recognizes only local information. For example, for the rock entity "medium basal volcanic rock", only "volcanic rock" is recognized, and "medium basal" is labeled as other characters.

Note. Bold indicates the best performance.

Table 7
Performance of Different Models on the GeoNER2021 Data Set
Problem (a) can be addressed by constructing regular expressions to check the entities on either side of the "-" symbol: if both sides match the same entity type, the span is treated as a single whole entity. For Problem (b), regular expressions can likewise be employed to judge and fuse the lexical properties before identifying the entity; alternatively, the generalization ability of the model can be improved by extending the data set.
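The rule for Problem (a) could be sketched as a post-processing step along the following lines: if two recognized spans of the same entity type are separated only by connector symbols such as "-", they are merged into one entity. The span representation, type label, and symbol set here are illustrative assumptions, not the paper's actual implementation.

```python
import re


def merge_hyphenated(entities, sentence):
    """Merge same-type entity spans separated only by connector symbols.

    entities: list of (start, end, type) character spans into `sentence`,
    sorted by start position. Returns a new list of merged spans.
    """
    merged = []
    i = 0
    while i < len(entities):
        start, end, etype = entities[i]
        # Absorb following spans of the same type if only symbols intervene
        while (i + 1 < len(entities)
               and entities[i + 1][2] == etype
               and re.fullmatch(r"[-—~]+", sentence[end:entities[i + 1][0]])):
            end = entities[i + 1][1]
            i += 1
        merged.append((start, end, etype))
        i += 1
    return merged


sent = "Raja-Kangru Fault"
spans = [(0, 4, "STRUCT"), (5, 17, "STRUCT")]
result = merge_hyphenated(spans, sent)  # one merged STRUCT span
```

Here the two fragments "Raja" and "Kangru Fault" are fused into the single structure entity "Raja-Kangru Fault", mirroring the repair described above.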

Conclusions and Future Work
Geological named entity recognition is the foundational step for acquiring information and extracting knowledge from massive numbers of geological reports and documents, and it enables further relationship extraction and the construction of geological knowledge graphs. In this paper, we study and compare mainstream deep learning-based named entity recognition methods in the geological domain to address the current challenges of geological named entity recognition and the descriptive characteristics of geological-domain texts. An annotated corpus for geological-domain named entity recognition is constructed, and a named entity recognition model for geological-domain texts based on the BERT pretrained language model is proposed. We applied the proposed BERT-BiGRU-CRF model to four different datasets and evaluated its performance in terms of precision, recall, and F1-score. The experimental results show that the method significantly outperforms the baseline models and other deep learning models on these evaluation metrics for geological-domain named entity recognition.
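The evaluation metrics above can be sketched at the entity level: an entity counts as correct only if its boundaries and type both match the gold annotation. The gold and predicted spans below are illustrative assumptions, not results from the paper.

```python
def precision_recall_f1(gold, pred):
    """Entity-level scores over sets of (start, end, type) spans."""
    true_pos = len(gold & pred)  # exact boundary-and-type matches
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical example: one prediction has a boundary error and
# therefore counts as wrong, even though it overlaps the gold span.
gold = {(0, 4, "STRAT"), (10, 15, "ROCK"), (20, 30, "GPE")}
pred = {(0, 4, "STRAT"), (10, 15, "ROCK"), (21, 30, "GPE")}
p, r, f = precision_recall_f1(gold, pred)
```

This strict matching is why partial recognitions such as "volcanic rock" for "medium basal volcanic rock" reduce both precision and recall.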
The contributions of this research can be viewed from two perspectives. From the perspective of methodology, this work presents a deep learning model, BERT-BiGRU-CRF. As indicated by the experimental results, BERT-BiGRU-CRF performs better at extracting named entities than the other deep learning models. From the perspective of application, this paper develops a geoscience-domain NER data set for extracting geological named entities and supporting further knowledge discovery.
Future research will focus on two aspects: (a) the geological-domain corpus is an open-domain corpus, and its construction is a process of continuous updating and improvement; we will further improve and optimize the annotated corpus for geological named entities, expand the classification of geological named entity types, and enrich the corpus through various means such as offline collection; (b) we will investigate advanced deep learning models, explore the application of new models to the geological named entity recognition task, optimize the existing model structure, and design a named entity recognition model better suited to the geological domain to obtain better performance.

Conflict of Interest
The authors declare no conflicts of interest relevant to this study.