Research on the prediction and relationship between academic attention and network attention in chemistry teaching

In order to adapt to the development of modern education and to provide education practitioners with decision-making suggestions, it is necessary to understand the relationship between the public's attention to basic chemistry education (BCE) and the academic attention it receives. However, many existing studies rely on a single platform, such as Google Index, Baidu Index or Web of Science, to study socially popular information. They ignore the relationship between the Baidu Index and related academic platforms, as well as the common information reflected in both. This paper uses big data from the Baidu Index and from the China National Knowledge Infrastructure (CNKI) database to study the network attention and academic attention of BCE, and proposes a CVS-LSP-GP framework. The framework first selects keywords through correlation analysis, then uses the selected data to construct a nonlinear regression model, and finally combines this model with the results of gray prediction to predict the CNKI academic attention related to BCE. The results show that academic attention to BCE is mainly driven by the micro-lecture teaching mode, and relevant education practitioners should integrate the micro-lecture mode into teaching and pursue further research and practice.

INTRODUCTION
Countries around the world attach great importance to basic chemistry education (BCE). Chemistry is a practical subject which, together with mathematics and physics, forms the basis for the rapid development of the natural sciences, and its core knowledge has been applied across their various fields. Chemists today view social problems from a chemical perspective and use chemical knowledge to analyze and solve them, including problems of energy, food, the environment, health, resources and sustainable development. In short, chemistry is closely related to all aspects of human life and is a practical subject urgently needed by society. In basic education, chemistry is an examination subject in both the senior high school entrance examination and the college entrance examination, so it has always attracted public and academic attention.
Because the public habitually uses search engines to look up information related to chemistry teaching, the data that search engines record and store on this topic can, to a considerable extent, objectively reflect what real users care about. Analyzing these data is therefore valuable for understanding the public's needs and decision-making behavior regarding chemistry teaching, and ultimately for improving chemistry teaching practice.
Chemistry teaching, especially in basic education, has long attracted widespread social attention: media reports keep appearing, and scholarly research continues to deepen. The purpose of this paper is to identify the current focuses of public and academic attention to chemistry teaching and the relationship between them. Citation-index analysis of the papers on a topic can reveal how the related knowledge or theory has developed and, from an important angle, the basic trend of research on that topic. Therefore, this paper uses the China National Knowledge Infrastructure (CNKI) database and takes the number of articles on a given topic as the indicator of academic attention. CNKI is one of the most informative and valuable Chinese-language resources in the world. Its content is deeply processed, edited, integrated and managed in database form, with clear sources and credible, reliable content, so it can serve as a basis for academic research and scientific decision-making.
To meet the development needs of modern education, it has become increasingly important to understand how the public's attention to BCE relates to the academic attention it receives. However, many existing studies consider only a single level of attention and ignore the relationship between the public's network attention and scholars' academic attention. The research in this paper is therefore intended to be of value to relevant education practitioners. The main contributions of this paper are as follows: (1) we construct a framework for studying the relationship between public and academic attention in BCE; (2) we establish a model for predicting academic attention in BCE; and (3) by analyzing the predicted results, we draw conclusions of practical value to relevant education practitioners.
The study is divided into five sections. Section 1, the Introduction, states our research background, purpose and motivation. Section 2 reviews related work in two areas: research on BCE and research based on the Baidu Index. Section 3 presents the proposed framework and details the process and methodology of each step. Section 4 describes the experiments, covering the data sources, evaluation index, correlation analysis, prediction model and analysis of the experimental results. Section 5 concludes the paper, summarizing what we have done and outlining future work.

RELATED WORK
In the current era of rapidly developing big data, Internet big data has become an important tool for academic research. Empirical research based on collecting, sorting and analyzing such data can provide relatively sufficient evidence and a practical basis for educational development. The Baidu Index is a big data platform developed by Baidu that records netizens' search behavior in the Baidu search engine: every search for a keyword is logged in the background, and these records form a large database. Based on existing work that studies chemistry teaching with big data, we divide the related research into two categories: research on BCE and research based on the Baidu Index.

Research based on chemistry basic education
In 1869, the famous Russian chemist Mendeleev published the periodic table of elements, which provided a new way of thinking and exploration for the development of chemistry. Within just 100 years, chemistry flourished. Countries around the world have realized that chemistry has a huge impact on science and technology, daily life and society, so they attach great importance to chemistry education and conduct related research. The transition from junior high school chemistry to senior high school chemistry should take into account learners' psychological factors, the review of junior and senior high school knowledge, and the introduction of novel perspectives on individual knowledge points. 1 Since the promulgation of the overall framework of "Chinese students' development of core literacy" and the newly revised curriculum standards for the subjects of ordinary high schools, education practitioners have attached great importance to cultivating core literacy, and "evidential reasoning and model cognition" has been listed as one of the five core competencies of high school chemistry. In teaching, students' chemistry literacy can be improved by posing guiding questions, conducting scientific inquiry and paying attention to classroom evaluation, which promotes the efficient achievement of teaching and learning goals. 2 Fayun 3 proposed that a chemical society course brings students not only chemistry knowledge but, more importantly, the experience of a rigorous scientific spirit and a burst of learning motivation and potential. Wenrong and Ling 4 studied the disciplinary value of chemistry teaching through situational analysis of real teaching. With the implementation of the new curriculum standards, more and more attention has been paid to cultivating students' core literacy. 5
In high school chemistry teaching, focusing on the cultivation of chemical thinking ability can effectively promote students' learning of chemistry. 6 The integration of "teaching, learning and evaluation" advocated by the new curriculum standard embeds evaluation throughout teaching, forming a dynamic "teaching, learning and evaluation" cycle that improves the overall effect of teaching and promotes students' development of core competencies. 7 Under the current emphasis on core literacy, the humanistic dimension of the core literacy of chemistry is lacking; it can be strengthened through the concept of material selection, teaching preparation, lesson introduction, and summary and improvement, so as to address the weakness of humanistic education in contemporary science education. 8 Guoxian et al 9 examined the new textbooks based on the "General High School Chemistry Curriculum Standard (2017 Edition)" and found that their exercises differ greatly from those in the old textbooks: their breadth and depth have increased, placing higher demands on students' core chemistry literacy. Zhijun and Guangjing 10 proposed that using chemical knowledge as a tool to solve practical problems is an important strategy for literacy-oriented high school chemistry teaching, aiming to promote the high-level development of students' core literacy. Yuhe and Lingpeng 11 studied the application and changes of STSE in the new compulsory high school chemistry textbook of the Luke edition.

Research based on Baidu Index
Search engine data can help improve the accuracy of sales forecasts in retail, automobiles, housing, tourism and other industries, and thus provide managers with decision-making guidance. 12 Jihua and Mengdi 13 used a web crawler to extract keywords related to the Volkswagen brand from the Baidu Index website, obtained the daily web search volume of each keyword, synthesized the keywords with principal component analysis (PCA), and used the extracted main features to build a car sales forecasting model whose final prediction accuracy reached 96%. Qijuan et al 14 identified influenza keywords from the names, symptoms, treatment and prevention of influenza, and then analyzed the correlation between the Baidu Index of these keywords at different lag periods and the percentage of influenza-like cases among outpatient visits. Many other scholars have used the Baidu Index in related research. Zhiyi 15 proposed that the multi-dimensional data of the Baidu Index support a comprehensive analysis of an industry's development and can provide real and accurate data for guiding its development direction. The Baidu Index has been used to reflect society's attention to the spirit of craftsmanship and to put forward new requirements for political and ideological education in colleges and universities. 16 Because the Baidu Index is backed by the powerful information-capture capability of the Baidu search engine, it can provide comprehensive user portraits and regional, temporal and spatial analyses for research. 17 Renfu et al 17 used the Baidu Index big data platform to analyze public attention to vocational education with qualitative and quantitative methods; they found that social attention was low and vocational education was not very attractive, and concluded that its publicity and guidance need to be strengthened. Yulin et al 18 used the Baidu Index to build a short-term prediction model to analyze the relationship between public attention and teaching research.
There are many domestic and foreign studies on chemistry education, on the Baidu Index, and on social attention analyzed with the Baidu Index, which provide a solid basis for this paper. However, few studies use big data tools to examine the relationship between academic attention and public attention in chemistry education, or in basic education more broadly. To fill this gap, we use the proposed CVS-LSP-GP framework, combined with Baidu Index and CNKI big data, to study the relationship between the academic attention and public attention of BCE.

OUR PROPOSED CVS-LSP-GP FRAMEWORK
To address the problems mentioned above, we use the big data statistical analysis tool "Baidu Index" and the CNKI database to study the relationship between the public's Internet attention and academic attention. From the perspective of network search, we analyze the academic attention and social network attention of BCE over the past 10 years: correlation analysis (CVS) is used to filter the keywords, the least squares (LSP) method is used to build a prediction model on the correlated keywords, and a gray prediction (GP) model is used to forecast the Baidu Index values of those keywords. We name the whole research framework CVS-LSP-GP and use it to predict academic attention in the next few years. In this way, the chemistry discipline in basic education can be studied more objectively and comprehensively, providing guidance on the development direction of BCE and assistance to relevant education practitioners. The CVS-LSP-GP framework, shown in Figure 1, consists of four steps.
Step 1: Experimental data acquisition. Obtain the Baidu Index data on BCE and the number of related articles published in CNKI (details in Section 3.1).
Step 2: Keyword filtering and regression. Perform correlation analysis on the data obtained in Step 1 to select the keywords used to build the nonlinear regression model (details in Section 3.2).
Step 3: Gray prediction. Predict the Baidu Index of the keywords selected in Step 2 with the gray prediction model we built (details in Section 3.3).
Step 4: Academic attention prediction. Combine the model constructed in Step 2 with the predictions of Step 3 to predict the academic attention of BCE (details in Section 3.4).

Obtaining public attention and academic attention of BCE
In CNKI, we searched for academic articles on the themes "high school chemistry + teaching" and "junior high school chemistry + teaching" as a pre-analysis, obtaining the keywords under these themes from 2011 to 2020 and the number of related CNKI articles in each year of that period. We then obtained, from the Baidu Index database, the yearly average of the daily Baidu Index of each of these keywords from 2011 to 2020.
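The yearly averaging described above can be sketched as follows. This is a minimal illustration, assuming the daily Baidu Index of one keyword has already been exported as a mapping from `datetime.date` to value; the export format, the function name `annual_daily_average` and the sample values are hypothetical, not taken from the paper.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def annual_daily_average(daily_index: dict[date, float]) -> dict[int, float]:
    """Average a daily Baidu Index series within each calendar year."""
    by_year: dict[int, list[float]] = defaultdict(list)
    for day, value in daily_index.items():
        by_year[day.year].append(value)
    # One value per year: the mean of that year's daily values.
    return {year: mean(values) for year, values in sorted(by_year.items())}


# Hypothetical daily values for one keyword (not real Baidu Index data).
sample = {
    date(2019, 1, 1): 100.0,
    date(2019, 6, 1): 140.0,
    date(2020, 3, 1): 90.0,
    date(2020, 9, 1): 110.0,
}
print(annual_daily_average(sample))  # {2019: 120.0, 2020: 100.0}
```

In practice a full year contributes about 365 daily values per keyword; the same function applies unchanged.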

Building nonlinear regression model equations
Pearson correlation analysis is performed between the obtained public attention and academic attention to select the strongly correlated keywords, which are then used in nonlinear regression analysis: machine learning methods (least squares and the gradient descent algorithm) are applied to build the nonlinear regression model. The goodness of fit of the resulting model is then assessed, and the model parameters are adjusted accordingly to optimize the model.

Least squares method
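Least squares is a mathematical optimization technique that finds the function that best matches the data by minimizing the sum of squared errors. The general form of the least squares method is
$$\hat{\theta} = \arg\min_{\theta} \sum_{i=1}^{m} \left[ y_i - f(x_i; \theta) \right]^2, \tag{1}$$
where $(x_i, y_i)$ is the $i$-th pair of observations, $m$ is the number of observations, $f$ is the fitted function, and $\theta$ is the parameter to be determined.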
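As a minimal illustration of the least squares criterion, the sketch below assumes a straight-line model $f(x; \theta) = \theta_0 + \theta_1 x$ and hypothetical observations, and solves the problem with NumPy's `np.linalg.lstsq`. The model form and the data are assumptions for illustration only.

```python
import numpy as np

# Hypothetical observations (x_i, y_i); assumed model f(x; theta) = theta0 + theta1 * x.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])

# Design matrix: a column of ones for theta0 and the x values for theta1.
A = np.column_stack([np.ones_like(x), x])

# lstsq returns the theta that minimizes sum((y - A @ theta) ** 2).
theta, sse, rank, singular_values = np.linalg.lstsq(A, y, rcond=None)
print(theta)  # approximately [1. 2.], i.e. y = 1 + 2x
```

Because these sample points lie exactly on $y = 1 + 2x$, the minimized sum of squared errors is (numerically) zero.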

Gradient descent
Gradient descent is an iterative method often used to solve for the parameters of machine learning models, that is, to solve unconstrained optimization problems. It searches for the minimum by repeatedly moving in the direction of steepest descent, the negative gradient. Let the cost function be $J(\theta)$, where $\theta = [\theta_1, \theta_2, \dots, \theta_n]^T$ is the parameter vector to be optimized. Gradient descent updates $\theta$ in the direction in which $J(\theta)$ decreases fastest and repeats this step until it converges to a (local) minimum. The algorithm is not limited to linear regression and can be used to minimize any differentiable cost function. The update rule is
$$\theta_j := \theta_j - \alpha \frac{\partial J(\theta)}{\partial \theta_j}, \quad j = 1, 2, \dots, n, \tag{2}$$
where all components $\theta_j$ are updated simultaneously until convergence, and $\alpha$ is the learning rate, which determines the step size taken in the direction that decreases the cost function most.
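A minimal gradient descent loop, assuming the gradient of the cost is available as a Python function, is sketched below. The example cost $J(\theta) = (\theta_1 - 3)^2 + (\theta_2 + 1)^2$, the learning rate and the stopping tolerance are illustrative choices, not values from the paper. Because the full gradient vector is computed from the old $\theta$ before assignment, all components are updated simultaneously, as the method requires.

```python
import numpy as np


def gradient_descent(grad, theta0, alpha=0.1, tol=1e-10, max_iter=10_000):
    """Iterate theta := theta - alpha * grad(theta) until the step is tiny.

    The whole gradient is evaluated at the old theta before the assignment,
    so every component is updated simultaneously.
    """
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        step = alpha * grad(theta)
        theta = theta - step
        if np.linalg.norm(step) < tol:
            break
    return theta


# Example cost J(theta) = (theta1 - 3)^2 + (theta2 + 1)^2, gradient 2 * (theta - [3, -1]).
target = np.array([3.0, -1.0])
theta_star = gradient_descent(lambda t: 2.0 * (t - target), theta0=[0.0, 0.0])
print(theta_star)  # close to [ 3. -1.]
```

With $\alpha = 0.1$ the error shrinks by a factor of 0.8 per iteration here; a learning rate that is too large for the curvature of $J$ would instead make the iterates diverge.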

Predicting public attention of BCE
The gray prediction model is used to predict the Baidu Index of the keywords selected in Step 2. Beforehand, a grade ratio test is performed to check whether the conditions for gray prediction are met, and afterwards the posterior difference ratio is used to verify the accuracy of the results. Gray prediction is a method for forecasting systems that contain uncertain factors; it identifies the degree of dissimilarity between the development trends of system factors. The original data are generated and processed to find the law of system change, producing a data sequence with strong regularity, on which a corresponding differential equation model is built to predict the future development trend. The algorithm steps are as follows.
1. Grade ratio test of the data. To ensure that gray prediction is feasible, a grade ratio test must be performed on the original sequence. For the original data sequence $x^{(0)} = [x^{(0)}(1), x^{(0)}(2), \dots, x^{(0)}(n)]^T \in \mathbb{R}^n$, compute the grade ratios
$$\lambda(k) = \frac{x^{(0)}(k-1)}{x^{(0)}(k)}, \quad k = 2, 3, \dots, n. \tag{3}$$
If all grade ratios $\lambda(k)$ fall within the acceptable coverage interval $\Theta = \left(e^{-\frac{2}{n+1}}, e^{\frac{2}{n+2}}\right)$, gray prediction can be performed; otherwise, the sequence must be translated as $y^{(0)}(k) = x^{(0)}(k) + c$, with the constant $c$ chosen so that $y^{(0)}$ meets the grade ratio requirement.
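2. Build the GM(1, 1) model and compute the predicted sequence. GM(1, 1) denotes a gray model consisting of a first-order differential equation in a single variable. Take the original data $x^{(0)}$ and form its accumulated sequence $x^{(1)} = [x^{(1)}(1), \dots, x^{(1)}(n)]^T$ with
$$x^{(1)}(k) = \sum_{i=1}^{k} x^{(0)}(i), \quad k = 1, 2, \dots, n.$$
The adjacent values of $x^{(1)}$ generate the mean sequence
$$z^{(1)}(k) = 0.5\, x^{(1)}(k) + 0.5\, x^{(1)}(k-1), \quad k = 2, 3, \dots, n.$$
The gray differential equation of GM(1, 1) is then defined as
$$x^{(0)}(k) + a\, z^{(1)}(k) = b,$$
where $a$ is called the development coefficient, $z^{(1)}(k)$ is the whitening background value, and $b$ is called the gray action quantity. Let
$$u = \begin{bmatrix} a \\ b \end{bmatrix}, \quad Y = \begin{bmatrix} x^{(0)}(2) \\ x^{(0)}(3) \\ \vdots \\ x^{(0)}(n) \end{bmatrix}, \quad B = \begin{bmatrix} -z^{(1)}(2) & 1 \\ -z^{(1)}(3) & 1 \\ \vdots & \vdots \\ -z^{(1)}(n) & 1 \end{bmatrix}.$$
Then GM(1, 1) can be written as $Y = Bu$, and the normal equation gives the least squares estimates of $a$ and $b$:
$$\hat{u} = \begin{bmatrix} \hat{a} \\ \hat{b} \end{bmatrix} = (B^T B)^{-1} B^T Y.$$
The corresponding whitening equation is
$$\frac{d x^{(1)}(t)}{dt} + a\, x^{(1)}(t) = b,$$
whose solution, evaluated at $t = k$, gives
$$\hat{x}^{(1)}(k+1) = \left( x^{(0)}(1) - \frac{\hat{b}}{\hat{a}} \right) e^{-\hat{a} k} + \frac{\hat{b}}{\hat{a}}, \quad k = 0, 1, 2, \dots$$
This is the prediction result of the gray prediction model on the accumulated scale.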
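3. Test the predicted values. There are generally three methods for testing the accuracy of a gray prediction model: the relative error test, the correlation test and the posterior difference test. The posterior difference test is used here. First, restore the predicted values by inverse accumulation (successive differences) of $\hat{x}^{(1)}$:
$$\hat{x}^{(0)}(1) = \hat{x}^{(1)}(1), \qquad \hat{x}^{(0)}(k+1) = \hat{x}^{(1)}(k+1) - \hat{x}^{(1)}(k), \quad k = 1, 2, \dots$$
Second, calculate the residuals
$$e(k) = x^{(0)}(k) - \hat{x}^{(0)}(k), \quad k = 1, 2, \dots, n.$$
Next, calculate the variance $S_1^2$ of the original series $x^{(0)}$ and the variance $S_2^2$ of the residuals $e$:
$$S_1^2 = \frac{1}{n} \sum_{k=1}^{n} \left( x^{(0)}(k) - \bar{x} \right)^2, \qquad S_2^2 = \frac{1}{n} \sum_{k=1}^{n} \left( e(k) - \bar{e} \right)^2,$$
where $\bar{x}$ and $\bar{e}$ are the means of the original series and of the residuals. Finally, calculate the posterior difference ratio
$$C = \frac{S_2}{S_1}$$
and check the grading table to see whether the result meets the accuracy requirement.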
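The three steps above can be sketched in NumPy as follows. This is an illustrative implementation under our own assumptions (the function names, solving the normal equation with `np.linalg.solve`, and population standard deviations in the posterior difference ratio $C = S_2 / S_1$), not the authors' code. For reference, a commonly cited grading table treats $C \le 0.35$ as level 1 (good); the paper's own table is not reproduced here.

```python
import numpy as np


def grade_ratio_test(x0):
    """Step 1: every ratio x0(k-1)/x0(k) must lie in (e^{-2/(n+1)}, e^{2/(n+2)})."""
    x0 = np.asarray(x0, dtype=float)
    n = len(x0)
    ratios = x0[:-1] / x0[1:]
    low, high = np.exp(-2.0 / (n + 1)), np.exp(2.0 / (n + 2))
    return bool(np.all((ratios > low) & (ratios < high)))


def gm11_fit(x0):
    """Step 2: estimate a (development coefficient) and b (gray action quantity)."""
    x0 = np.asarray(x0, dtype=float)
    x1 = np.cumsum(x0)                           # accumulated sequence x^(1)
    z1 = 0.5 * x1[1:] + 0.5 * x1[:-1]            # background values z^(1)(k), k = 2..n
    B = np.column_stack([-z1, np.ones_like(z1)])
    Y = x0[1:]
    a, b = np.linalg.solve(B.T @ B, B.T @ Y)     # normal equation (B^T B) u = B^T Y
    return a, b


def gm11_predict(x0, a, b, horizon=0):
    """Fitted x^(0) values for the n observed points plus `horizon` future points."""
    x0 = np.asarray(x0, dtype=float)
    k = np.arange(len(x0) + horizon)
    x1_hat = (x0[0] - b / a) * np.exp(-a * k) + b / a     # x1_hat(k + 1)
    return np.concatenate([x1_hat[:1], np.diff(x1_hat)])  # inverse accumulation


def posterior_ratio(x0, x0_hat):
    """Step 3: C = S2 / S1 (residual std over original-series std)."""
    x0 = np.asarray(x0, dtype=float)
    residuals = x0 - np.asarray(x0_hat, dtype=float)[: len(x0)]
    return residuals.std() / x0.std()


# Hypothetical series growing by 10% per step (not Baidu Index data).
series = [100.0, 110.0, 121.0, 133.1, 146.41, 161.051]
assert grade_ratio_test(series)
a, b = gm11_fit(series)
fitted = gm11_predict(series, a, b, horizon=5)
C = posterior_ratio(series, fitted)
print(f"a = {a:.4f}, b = {b:.2f}, C = {C:.4f}")
print("next five values:", np.round(fitted[len(series):], 2))
```

A negative $a$ signals a growing series; for a declining series $a$ comes out positive and the forecasts decay.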

Predicting academic attention of BCE
Combining the gray prediction results of Step 3 with the nonlinear regression model of Step 2, we predict the number of BCE articles in CNKI, which serves as the measure of academic attention to BCE.

Data sources
The public's network attention data come from the Baidu Index, a massive body of cloud data aggregated from multiple platforms of Baidu, the largest Chinese search engine. On the one hand, it analyzes the search categories and popularity of keywords; on the other hand, it mines the characteristics of search keywords and of search users' information orientation and search needs. It can therefore reflect the hotspots of public concern and the degree of attention each hotspot receives, and its aggregated statistics can be applied effectively to scientific research and behavioral analysis. The academic attention data come from CNKI, currently the largest journal database in China and the most widely used academic resource database in the country.

Evaluation index
In this paper, the coefficient of determination $R^2$ of the regression (also called the goodness of fit) is used to measure the quality of the model:
$$R^2 = \frac{SSR}{SST},$$
where SSR is the regression sum of squares and SST is the total sum of squares. For a least squares fit with an intercept, this equals $1 - SSE/SST$, where SSE is the residual (error) sum of squares.
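A direct NumPy computation of the coefficient of determination is sketched below. It uses the form $1 - SSE/SST$, which coincides with $SSR/SST$ for a least squares fit with an intercept and also applies to predictions that do not come from such a fit (for example, values restored from a log scale). The function name and the sample values are hypothetical.

```python
import numpy as np


def r_squared(y, y_hat):
    """R^2 = 1 - SSE/SST (equals SSR/SST for a least-squares fit with intercept)."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    sse = np.sum((y - y_hat) ** 2)        # residual (error) sum of squares
    sst = np.sum((y - y.mean()) ** 2)     # total sum of squares
    return 1.0 - sse / sst


# Hypothetical observations and predictions: SSE = 0.10, SST = 5.0, so R^2 is about 0.98.
print(r_squared([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8]))
```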

Keyword relevance analysis
Because the initial set of keywords is large and the data quality is hard to guarantee, the data must be pre-processed before further analysis. Using the SPSS statistical analysis software, we analyze the correlation between the Baidu Index network attention data and the CNKI academic attention data and screen out the strongly correlated keywords. The correlation analysis of keywords over the past 10 years yields the results shown in Tables 1-4. These four tables show that the Pearson correlation coefficients of the keywords "high school chemistry," "junior high school chemistry" and "new course" are all above 0.6, indicating strong correlation, and are significant at the 95% confidence level. In addition, taking the natural logarithm ln of the annual Baidu Index of "micro-lecture" before computing the correlation also gives a good result. Thus, retrieval on the theme "Chemistry + Teaching" followed by correlation analysis selects four keywords, namely high school chemistry, junior high school chemistry, new course and micro-lecture, which are then used to build the prediction model.
TABLE 1 Correlation of the keyword "high school chemistry"
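The screening rule described above, keeping keywords whose Pearson correlation with the annual CNKI counts is at least 0.6, optionally after a natural-log transform, can be sketched as follows. The paper performs this step in SPSS; this NumPy version, its function name `screen_keywords` and the sample data are illustrative assumptions, and it does not compute significance levels (which would require, for instance, a t-test on r).

```python
import numpy as np


def screen_keywords(index_by_keyword, cnki_counts, threshold=0.6, use_log=False):
    """Keep keywords whose Pearson r with the CNKI counts is at least `threshold`."""
    y = np.asarray(cnki_counts, dtype=float)
    selected = {}
    for keyword, series in index_by_keyword.items():
        x = np.asarray(series, dtype=float)
        if use_log:
            x = np.log(x)                      # natural-log transform of the index
        r = np.corrcoef(x, y)[0, 1]            # Pearson correlation coefficient
        if r >= threshold:
            selected[keyword] = round(float(r), 3)
    return selected


# Hypothetical annual values (not the paper's data).
cnki = [50, 60, 72, 80, 95]
indices = {
    "keyword A": [100, 118, 140, 160, 190],   # rises together with the CNKI counts
    "keyword B": [300, 260, 310, 250, 280],   # no clear relationship
}
print(screen_keywords(indices, cnki))  # only "keyword A" passes the 0.6 threshold
```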

Nonlinear regression prediction
First, we build the nonlinear model, which has four features. Let $x = [x_1, x_2, x_3, x_4]^T$ denote the features, where $x_1$ represents "high school chemistry," $x_2$ "junior high school chemistry," $x_3$ "new course" and $x_4$ "micro-lecture." To keep the Baidu Index and the annual publication volume on the same order of magnitude, the independent variables $x$ are transformed with the natural logarithm ln. The cost function to be optimized is then
$$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right)^2 + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2,$$
where $m$ is the number of samples, $x^{(i)} = [x_1^{(i)}, x_2^{(i)}, x_3^{(i)}, x_4^{(i)}]^T$ is the feature data of the $i$-th sample, $h_\theta(x^{(i)})$ is the model's prediction for that sample with parameters $\theta = [\theta_0, \theta_1, \dots, \theta_n]^T$, and $n = 4$.
In this article, $x^{(i)}$ holds the annual Baidu Index of the four keywords retrieved under the themes "high school chemistry + teaching" and "junior high school chemistry + teaching," and $y^{(i)}$ is the annual CNKI publication volume on these themes. The second term, $\frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$, is a regularization term added to avoid overfitting during gradient descent, and $\lambda$ is the regularization parameter. Our goal is therefore to find the parameters $\theta$ that minimize $J(\theta)$.
Substituting this cost function into the gradient descent update of Equation (2) gives
$$\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) \frac{\partial h_\theta(x^{(i)})}{\partial \theta_0},$$
$$\theta_j := \theta_j - \alpha \left[ \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) \frac{\partial h_\theta(x^{(i)})}{\partial \theta_j} + \frac{\lambda}{m} \theta_j \right], \quad j = 1, \dots, n,$$
where, for a model that is linear in the log-transformed features, $\partial h_\theta(x^{(i)}) / \partial \theta_j = x_j^{(i)}$ with $x_0^{(i)} = 1$. These updates are repeated until convergence, with all parameters updated simultaneously. In the actual computation, the minimization is carried out with the SciPy scientific computing package for Python: the fmin_tnc function of the scipy.optimize module (a truncated Newton method that uses the cost function and its gradient) is called with the required arguments to obtain the fitted parameters. The goodness of fit is evaluated with
$$R^2 = 1 - \frac{\sum_{i=1}^{m} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{m} \left( y_i - \bar{y} \right)^2}, \tag{16}$$
where $\hat{y}_i$ is the prediction for the $i$-th sample and $\bar{y}$ is the mean of the original CNKI publication volume. $R^2$ is approximately 0.91 on the log-transformed data and approximately 0.82 after the data are restored to the original scale, so the model has a high goodness of fit and can predict the CNKI publication frequency reasonably accurately.
The fit of the model from 2011 to 2020 is shown in Figure 2.
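The regularized fit can be sketched with plain NumPy gradient descent as below. Several details are our assumptions rather than the paper's: the model is taken to be linear in the ln-transformed features, the target is ln(CNKI count) so that predictions are restored with exp, the ln-features are standardized so that a single learning rate works for all of them, and the data are hypothetical. The paper itself minimized the cost with `scipy.optimize.fmin_tnc`; since `cost_and_grad` returns the cost and the gradient together, it could be passed to that function in place of the loop.

```python
import numpy as np


def cost_and_grad(theta, X, y, lam):
    """Regularized cost J(theta) and its gradient; theta[0] (intercept) is not penalized."""
    m = len(y)
    err = X @ theta - y                           # h_theta(x^(i)) - y^(i) for all samples
    penalized = np.r_[0.0, theta[1:]]
    cost = err @ err / (2 * m) + lam / (2 * m) * (penalized @ penalized)
    grad = X.T @ err / m + lam / m * penalized
    return cost, grad


def design_matrix(index_values, mu, sigma):
    """Column of ones plus standardized ln-features (standardization is our choice)."""
    logs = np.log(np.asarray(index_values, dtype=float))
    return np.column_stack([np.ones(len(logs)), (logs - mu) / sigma])


def fit_log_model(index_values, cnki_counts, lam=0.01, alpha=0.1, iters=20000):
    logs = np.log(np.asarray(index_values, dtype=float))
    mu, sigma = logs.mean(axis=0), logs.std(axis=0)
    X = design_matrix(index_values, mu, sigma)
    y = np.log(np.asarray(cnki_counts, dtype=float))  # assumed log-scale target
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        _, grad = cost_and_grad(theta, X, y, lam)
        theta = theta - alpha * grad              # all theta_j updated simultaneously
    return theta, mu, sigma


# Hypothetical annual Baidu Index (columns: the four keywords) and CNKI counts.
index_values = np.array([
    [900, 520, 300, 10], [850, 500, 290, 20], [800, 480, 285, 40],
    [780, 470, 280, 80], [760, 465, 282, 120], [750, 460, 279, 160],
    [740, 455, 281, 200], [735, 452, 280, 230], [730, 450, 278, 260],
    [728, 449, 279, 280],
])
cnki_counts = np.array([300, 330, 380, 450, 500, 540, 580, 610, 630, 650])

theta, mu, sigma = fit_log_model(index_values, cnki_counts)
predicted = np.exp(design_matrix(index_values, mu, sigma) @ theta)  # back to counts
print(np.round(theta, 3))
print(np.round(predicted))
```

Reporting $R^2$ on both the log scale and the restored scale, as the paper does, only requires comparing `np.log(predicted)` with `np.log(cnki_counts)` and `predicted` with `cnki_counts`.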

Gray prediction
Before the gray forecast, the grade ratio test shows that the Baidu Index of some keywords fluctuates greatly from year to year, so these series are not suitable for gray forecasting, as shown in Figure 3. Therefore, we take the natural logarithm of the Baidu Index of every keyword, which compresses the fluctuations toward a smoother, near-linear trend. After this transformation, all series meet the conditions; we then build the gray model GM(1, 1), compute the predicted sequence, and finally check the predicted values against the grading table, as shown in Table 5.
For example, for the keyword "high school chemistry," we perform the grade ratio test on its Baidu Index and compute the grade ratios $\lambda(k)$ (Equation 3). All $\lambda(k)$ fall within the acceptable coverage interval, so the keyword can be modeled with gray prediction. Next, the original data are input into the pre-built GM(1, 1) model to obtain the predicted Baidu Index for "high school chemistry," which is the final predicted value. The other three keywords are processed in the same way, so the steps are not repeated here. According to the posterior difference test, the prediction accuracy for all keywords reaches the first-level (good) standard, as shown in Table 6.
Figure 4 shows the gray prediction result for the keyword "high school chemistry" as an example. The coefficient of determination $R^2$, computed after the log-transformed predictions are restored to the original scale, reaches 0.9. Together with Figure 4, this indicates that gray prediction fits the keyword "high school chemistry" well between 2011 and 2020.
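The effect of the log transform on the grade ratio test can be illustrated with a hypothetical series that jumps sharply in its early years (not the paper's data). The interval bounds follow the grade ratio test of the gray prediction procedure; the helper names are our own.

```python
import numpy as np


def ratio_bounds(n):
    """Acceptable interval for the grade ratios of an n-point series."""
    return np.exp(-2.0 / (n + 1)), np.exp(2.0 / (n + 2))


def passes_ratio_test(series):
    x = np.asarray(series, dtype=float)
    ratios = x[:-1] / x[1:]                       # x(k-1) / x(k)
    low, high = ratio_bounds(len(x))
    return bool(np.all((ratios > low) & (ratios < high)))


# Hypothetical yearly index with rapid early growth.
raw = [100, 180, 330, 520, 700, 820, 900, 960, 1000, 1030]
print(passes_ratio_test(raw))          # False: 100/180 is far below the lower bound
print(passes_ratio_test(np.log(raw)))  # True after the natural-log transform
```

For ten points the interval is roughly (0.834, 1.181); the log transform pulls the early ratios (for example 4.61/5.19, about 0.887) inside it.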

Predicting the annual publication volume of CNKI
Using the gray-predicted Baidu Index of each keyword for the next 5 years and the nonlinear regression model established above, we predict the annual CNKI publication volume for the next 5 years on the themes "high school chemistry + teaching" and "junior high school chemistry + teaching." The predicted results are shown in Figure 5.
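Steps 3 and 4 can be chained as in the sketch below. It assumes, as our own illustrative choices rather than the paper's specification, that each keyword's Baidu Index is gray-forecast on the ln scale with a compact GM(1, 1), and that the regression is linear in standardized ln-features with a log-scale target. `theta`, `mu`, `sigma` and the `history` matrix are hypothetical placeholders for the quantities produced in the earlier steps.

```python
import numpy as np


def gm11_forecast(x0, horizon):
    """Compact GM(1, 1): fit on x0 and return `horizon` (>= 1) out-of-sample values."""
    x0 = np.asarray(x0, dtype=float)
    x1 = np.cumsum(x0)
    z1 = 0.5 * x1[1:] + 0.5 * x1[:-1]
    B = np.column_stack([-z1, np.ones_like(z1)])
    a, b = np.linalg.solve(B.T @ B, B.T @ x0[1:])
    k = np.arange(len(x0) + horizon)
    x1_hat = (x0[0] - b / a) * np.exp(-a * k) + b / a
    return np.diff(x1_hat)[-horizon:]             # restored x^(0) values beyond the data


def forecast_publications(history, theta, mu, sigma, horizon=5):
    """history: years x keywords matrix of annual Baidu Index values."""
    log_history = np.log(np.asarray(history, dtype=float))
    # Step 3: gray-forecast every keyword on the ln scale.
    log_future = np.column_stack(
        [gm11_forecast(log_history[:, j], horizon) for j in range(log_history.shape[1])]
    )
    # Step 4: feed the forecasts into the regression fitted in Step 2.
    X = np.column_stack([np.ones(horizon), (log_future - mu) / sigma])
    return np.exp(X @ theta)                      # restore from ln(count) to count


# Hypothetical inputs standing in for the outputs of Steps 1 and 2.
history = np.array([
    [900, 520, 300, 10], [850, 500, 290, 20], [800, 480, 285, 40],
    [780, 470, 280, 80], [760, 465, 282, 120], [750, 460, 279, 160],
    [740, 455, 281, 200], [735, 452, 280, 230], [730, 450, 278, 260],
    [728, 449, 279, 280],
])
logs = np.log(history)
mu, sigma = logs.mean(axis=0), logs.std(axis=0)
theta = np.array([6.2, -0.05, -0.03, 0.0, 0.25])  # hypothetical regression parameters
print(np.round(forecast_publications(history, theta, mu, sigma)))
```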

Analysis of prediction results
It can be seen from Figure … However, attention to senior high school chemistry and junior high school chemistry has declined to varying degrees and then stabilized. The main reason for the decrease is that in 2014 the State Council issued the "Implementation Opinions on Deepening the Reform of the Examination and Enrollment System," which launched the reform of the college entrance examination. 21 Zhejiang Province and Shanghai implemented the 3 + 3 policy, in which the first "3" stands for Chinese, mathematics and English, and the second "3" means that students choose three subjects from politics, history, geography, physics, chemistry and biology. Because chemistry became an elective subject in the college entrance examination, attention to chemistry declined in all respects, especially for high school chemistry. At the same time, high school and junior high school chemistry teaching remain essential: they are the foundation of chemistry teaching, a top priority of basic education, and the basis for guiding the teaching of front-line teachers, so attention has stabilized after the decline.
Figure 3 shows that the trend of attention to the new curriculum has flattened, without major fluctuations. The main reason is that the new curriculum reform was launched in 2001: the "Basic Education Curriculum Reform Outline (Trial Implementation)" 22 was first carried out in experimental areas and then fully implemented after proving successful. By 2011, research on the new curriculum had already gone through its fluctuations, and the related research was relatively mature and abundant. Therefore, from 2011 to 2020, attention to the new curriculum was low but stable.
As can be seen from Figure 4, based on the data from 2011 to 2020, the number of academic papers in CNKI on "junior high school chemistry + teaching" and "high school chemistry + teaching" is predicted to keep increasing from 2021 to 2025. Over the next 5 years, academic attention is expected to grow exponentially, driven mainly by micro-lecture research, and to continue growing as micro-lecture teaching research grows. The main reason is the advent of the information and intelligence era and the wide use of smartphones and tablet computers; micro-lectures are short, concise and multimedia-based, which makes the micro-lecture teaching mode increasingly favored by front-line teachers. Micro-lecture teaching can take place anytime and anywhere. Emerging technologies have supported the development of modern education, and the national policy of education modernization has provided fertile soil for micro-lecture teaching. Therefore, micro-lecture teaching will be increasingly applied in chemistry teaching, and in the next 5 years the main research direction of chemistry education will still be the application of the micro-lecture teaching mode.

CONCLUSION
The research shows that over the next 5 years, academic attention and public network attention to BCE are predicted to keep rising, largely because of research attention to micro-lecture teaching in chemistry, while attention to high school and junior high school chemistry has declined slightly and leveled off. Therefore, in the next few years, scholars should strengthen research on micro-lecture teaching, increase research on high school and junior high school chemistry to consolidate the foundation, and make greater use of information-based teaching technology to achieve high-quality development of basic chemistry teaching. Our work still needs to be improved in the future; for example, representative computational intelligence algorithms such as elephant herding optimization (EHO), 23 the earthworm optimization algorithm (EWA), 24 and monarch butterfly optimization (MBO), 25 could be used to solve the problems raised in this paper.