Relationship modeling and short‐term prediction analysis between public attention and teaching research

Analyzing the relationship between Internet attention and teaching research can provide a reference for extracting and identifying research hot spots in a discipline. Currently, many studies obtain social hot-spot information from a single platform, such as the Baidu Index, China National Knowledge Infrastructure (CNKI), or Web of Science. However, they ignore the common information reflected by the Baidu Index platform and professional academic platforms, and they rarely demonstrate the relationship between the two. Therefore, based on the CNKI database and the Baidu Index platform, a major data statistics and analysis service platform in the Internet era, this article proposes a relationship modeling and prediction framework (RMPF). First, RMPF analyzes the relationship between public attention and teaching research from the perspective of "high school information technology" using word frequency analysis and the Pearson correlation coefficient. Second, RMPF constructs the relationship model between them using leave-one-out cross-validation. Third, RMPF predicts the development of subject teaching and research using the autoregressive integrated moving average model and multiple regression. The results show that the correlation coefficient between public attention and teaching research is more than 0.65 from the perspective of "high school information technology," showing a strong positive correlation. In recent years, the focus of the high school information technology discipline has tended toward integrating multimedia technology, cloud classrooms, artificial intelligence, and other technologies with the discipline. In addition, the growth of the discipline's literature volume will slow in the next 2 years, with an average annual publication of 350 articles. The overall number of publications is about 120 lower than in 2020.
Based on the proposed RMPF, this study clarifies the relationship between subject attention and teaching research through experimental demonstration and provides an implementation framework reference for researchers of relevant literature.


INTRODUCTION
Digital networks provide significant advantages for the extension and expansion of information. As a vital force of network public participation, the mass demand of ordinary netizens leads to a substantial increase in network traffic. Platform media, social media, multi-source, and multi-channel have become essential carriers of information dissemination. 1 Compared with various scientific research and academic platforms, the depth, breadth, and speed of information dissemination in the digital network environment can better reflect the online public's enthusiasm for related things to ensure the foresight of research hot-spots. At present, obtaining hot-spot information based on networks presents diversity. The knowledge atlas 2 helps monitor hot-spots of online public opinions by combing the literature. Theoretical analysis methods such as game theory 3 and case analysis 4 are primarily used to study hot spot communication mode. Single-pass, 5 K-means, 6 and other clustering algorithms are often used to improve the accuracy and precision of hot-spot identification. Networked hot spot extraction reflects the effectiveness of accurate decision-making and governance of big data in the intelligent era. As two significant paths to reflecting social hot spots, topics of public concern and literature on professional academic platforms are interrelated and interdependent. In the network environment, the public's strong willingness to pay attention to a hot topic promotes the output of relevant topic literature on professional academic platforms. Correspondingly, the amount of published literature on professional educational media is the equal interpretation of hot topics of public concern. 
As an essential window for scientific research achievements and academic exchanges, all professional academic platforms can actively pay attention to and grasp the focus and difficulties of scientific and technological research and timely reflect on the hot spots and public needs in various stages of society. 7 Based on relevant literature on professional academic platforms, some scholars have found that the research hot-spots of Chinese middle school geography textbooks mainly focus on seven aspects: comparison of geography textbooks, textbook system, textbook optimization, textbook analysis and redevelopment, infiltration of geographical thought, books and courses, and development and practice of local texts. 8 Since the 21st century, the three academic hot spots of higher vocal music education are verbal art, vocal music teaching, and national inheritance. 9 Some scholars also confirmed that investors' attention is conducive to improving enterprises' green innovation through online public opinion monitoring. 10 It can be seen that professional academic platforms and online attention are conducive to discovering current hot topics and general will. As a discipline with equal emphasis on theory and practice, high school information technology aims to comprehensively cultivate students' four core qualities of information awareness, computational thinking, digital learning and innovation, and information social responsibility, 11 which is also a powerful magic weapon to give play to students' knowledge, emotion, intention, and behavior. In recent years, this subject has gradually adopted new teaching methods to improve students' cognition under artificial intelligence (AI) background. As a product of the "micro era" in the field of education, micro-courses can promote students from "let me learn" to "I want to learn." Based on the three-dimensional teaching goal, flipped classrooms spread knowledge with the help of "micro-courses." 
Integrating vision, listening, feeling, and perception, multimedia fully express subject information with various materials, stimulates students' sensory perception, and improves classroom interaction. Thanks to the widespread use of Internet technology, the combination of programming cat and 3D printing with STEAM and Maker education helps to strengthen the creative materialization ability of primary and secondary school students. To further explore the subject hot-spot of high school information technology and verify the complex correlation and dynamic characteristics between network public attention and professional academic platform, this article takes "high school information technology" as the starting point and proposes a relationship modeling and prediction framework (RMPF). RMPF first analyzes the relationship between subject public attention and subject teaching research by using text word frequency and Pearson correlation coefficient. And on this basis, the relationship model between subject public awareness and subject teaching research is established according to leave-one-out cross-validation (LOOCV) theory. 12,13 Finally, RMPF predicts the discipline attention development trend and publication volume of discipline literature by the autoregressive integrated moving average (ARIMA) and the constructed relational model, providing experimental evidence for current studies.

RELATED WORK
Public attention manifests the current social public's interest in the event, a micro-data reflection of social focus and public demand. Network attention is conducive to the in-depth mining of surface information. Many scholars have done relevant research using public awareness. Feng et al. 14 found that the increase in Internet attention is conducive to the development of enterprises by matching China's A-share data and Internet search index. Lin et al. 15 found that online public attention can improve the prediction accuracy of USD/RMB exchange rate earnings. Zhang et al. 16 conducted in-depth mining through the social network attention factor and LDA topic model. They found that emotion analysis and deep learning are hot topics in AI and image processing. Many contemporary pieces of literature reflect that online attention can helpfully tap the needs of social industry, people, and disciplines. [17][18][19] At the same time, some scholars mine relevant information through professional academic platform literature. López Belmonte et al. 20 understood the degree of acceptance and application of methods in E-learning teaching through bibliometric analysis. Based on the quantitative analysis of Web of Science translation education literature, Yan 21 reveals that translation practice, implicit teaching, and translation ability are the hot spots of translation research. Kumar et al. 22 helped researchers to understand the development situation and mode of the Internet of Things (IoT) by exploring the hot spots and development trends of Industry 4.0 IoT-related literature. A professional academic platform can effectively reflect the frontier dynamics of discipline development and help the public capture hot-spot information at the current stage. Existing studies have done a lot of research from different angles of public attention and professional academic platform, laying a foundation for this article. 
However, the existing research has the following deficiencies and limitations:
1. Many scholars currently mine research hot spots based on a single platform, but they ignore the common ground between multiple platforms and lack in-depth consideration of the Baidu Index platform together with professional academic platforms.
2. Although online attention and professional academic literature are two ways to reflect social hot spots, whether there is a complex dynamic correlation between them is rarely demonstrated in the current literature.
3. Most existing studies explore and mine hot spots in the Chinese, mathematics, and foreign language disciplines. Few scholars have carried out an in-depth investigation of the high school information technology discipline.
Although there are some deficiencies in the current research, its discussion around multi-platform, multi-channel, and multi-disciplinary provided the theoretical basis for this article.
Considering the deficiencies and gaps in existing research, this article proposes the RMPF, a framework based on relationship modeling and short-term prediction, and contributes in the following respects:
1. Based on the proposed RMPF, this article uses the Pearson correlation coefficient, LOOCV, and other methods to analyze and demonstrate the relationship between discipline network attention and discipline literature research.
2. In the era of big network data, this article examines the common information reflected by the Baidu Index platform and the China National Knowledge Infrastructure (CNKI) academic literature platform according to the RMPF, and tries to extract the subject research hot spots from both platforms.
3. This article chooses the subject of high school information technology for research and discussion, and supplements the literature basis for the parallel progress of multiple disciplines in elementary education by mining hot topics and predicting discipline development trends.
This article draws on the research basis of existing scholars and incorporates the RMPF framework idea to deepen theories and supplement data. It provides evidence for the current research and fills the research gap. Meanwhile, it also guides the development of information technology discipline in high school.

OUR PROPOSED RMPF FRAMEWORK
Building on existing literature, this study proposes the RMPF, a framework for relationship modeling and short-term prediction. Figure 1 illustrates the framework. The RMPF is divided into three steps, each using a specific analysis method.
Start: Prepare the data. Obtain the required data from CNKI and Baidu Index, respectively, and organize the data for the follow-up experiments. Section 3.1 shows the detailed data.

FIGURE 1 Our proposed RMPF framework
Step 1: Analyze the correlation between public attention and academic platform teaching research. This article draws on the existing practice for experimental study by word frequency analysis in the knowledge graph and Pearson correlation coefficient. 23-25 Section 3.2 presents detailed methods.
Step 2: Build the relationship model between public attention and teaching research from the perspective of high school information technology disciplines. Based on the practice of Huang et al., 26 this article uses LOOCV and relative error to build a model and analyze the model error. Section 3.3 shows detailed methods.
Step 3: Predict the literature quantity of professional academic platforms. Based on relevant practices, 27,28 this article predicts the discipline development dynamics by the ARIMA and multiple regression. Section 3.4 presents details.
End: Analyze and discuss the experimental results and draw practical conclusions. See the next section for more details.

Data source
Based on the data required by RMPF, this study obtained data from CNKI 29 and Baidu Index, 30 respectively. First, the full-text database of CNKI academic journals is selected as the statistical data source, and the theme is "high school information technology," the time range is "2011-2021," and the journal category is "all journals" as the retrieval conditions for advanced retrieval. A total of 3965 articles on "high school information technology" were selected through manual screening and elimination of themes such as conferences and forums. Then, the retrieved pieces of literature were exported in EndNote format by year and then imported into VOSviewer software to obtain keyword word frequency. Hot words are extracted through keyword frequency and taken as the research object of high school information technology subject.

Data acquisition
This article takes the selected keywords as the research object, and according to this study's purpose, the public attention data are obtained from Baidu Index. The subject teaching research data are obtained from CNKI.
1. Getting the data of discipline network attention: On the Baidu Index platform, retrieve the annual average value of each keyword's Baidu Index as the discipline's network attention data.
2. Getting the data of subject teaching research:
a. On the CNKI platform, retrieve the annual number of articles published for each keyword, with the keyword as the retrieval condition. This is the first data set of discipline teaching research and is used to analyze the correlation between discipline attention and teaching research.
b. On the CNKI platform, retrieve the annual volume of literature on high school information technology, with the theme as the retrieval condition. This is the second data set of subject teaching research and is used to build the relationship model between discipline attention and teaching research.

Relationship analysis
Based on the proposed experimental framework RMPF, this study divided the work into three parts. The first part is step 1 of RMPF, analyzing the relationship between public attention and professional academic platform teaching research:

Full counting of word frequency
Keywords, as the concentration and refinement of the core content of a document, can describe the content of scientific and technological literature well; that is, they represent the document's main content to a large extent. Keyword-based hot-spot identification is the most commonly used analysis method and mainly includes word frequency analysis and co-occurrence analysis. This article uses the word frequency analysis in VOSviewer, which offers two counting methods: binary counting and full counting. 31 If "X" appears three times in a particular document, the frequency obtained by the binary counting method is 1. Under the full counting method, the frequency of "X" is 3.
Since high-frequency keywords are the primary unit of word frequency analysis, their extraction should follow two principles. First, the number of extracted words should be moderate, because too few words may lead to insufficient breadth of topic coverage and too many words may cause redundancy and triviality. Second, the selected high-frequency words should cover the complete information on the research topic. Therefore, this article chooses the full counting method to analyze word frequency and obtain the keywords that best reflect the content of the literature.
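The difference between the two counting methods can be illustrated with a minimal sketch. It is not VOSviewer's implementation; the function name `count_keywords` and the toy corpus are hypothetical and serve only to reproduce the "X appears three times" example.

```python
from collections import Counter


def count_keywords(documents, mode="full"):
    """Count keyword frequencies across documents.

    documents: list of keyword lists, one list per document.
    mode="full" counts every occurrence of a keyword;
    mode="binary" counts a keyword at most once per document.
    """
    totals = Counter()
    for keywords in documents:
        if mode == "binary":
            totals.update(set(keywords))  # one count per document
        elif mode == "full":
            totals.update(keywords)       # one count per occurrence
        else:
            raise ValueError("mode must be 'full' or 'binary'")
    return totals


# Hypothetical toy corpus: "X" appears three times in the first document.
docs = [["X", "X", "X", "micro-course"], ["micro-course"]]
full = count_keywords(docs, mode="full")
binary = count_keywords(docs, mode="binary")
print(full["X"], binary["X"])
```

Under full counting a keyword that is repeated inside one document carries more weight, which is the property this article relies on when ranking hot words.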

Pearson correlation coefficient
The Pearson correlation coefficient is a statistic that reflects the degree of linear correlation between two variables, and its value lies between −1 and 1. $P_{X,Y}$ denotes the correlation coefficient and quantifies the degree of correlation: the higher $|P_{X,Y}|$ is, the higher the degree of correlation. Its calculation formula is as follows:

$$P_{X,Y} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{n\sum_{i=1}^{n} X_i Y_i - \sum_{i=1}^{n} X_i \sum_{i=1}^{n} Y_i}{\sqrt{n\sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2}\,\sqrt{n\sum_{i=1}^{n} Y_i^2 - \left(\sum_{i=1}^{n} Y_i\right)^2}} \quad (1)$$

where $\operatorname{cov}(X, Y)$ represents the covariance, $\sigma_X$ and $\sigma_Y$ are the standard deviations of the two variables, $n$ is the sample size, and $X_i$, $Y_i$ are the actual values of the two variables. The judgment criteria for the degree of correlation between variables are as follows: $0.8 < P_{X,Y} < 1.0$: robust correlation; $0.6 < P_{X,Y} < 0.8$: strong correlation; $0.4 < P_{X,Y} < 0.6$: weak correlation; $0 < P_{X,Y} < 0.4$: very weak correlation/no correlation.
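As an illustration, the coefficient and the judgment bands can be computed with a few lines of standard-library Python. This is a sketch only: the study itself reports using SPSS, the sample values are made up, and the handling of values that fall exactly on a band boundary (0.4, 0.6, 0.8) is an assumption, since the criteria above use strict inequalities.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def correlation_label(r):
    """Map a coefficient to the bands listed in the text.

    Assumption: boundary values are assigned to the higher band.
    """
    size = abs(r)
    if size >= 0.8:
        return "robust"
    if size >= 0.6:
        return "strong"
    if size >= 0.4:
        return "weak"
    return "very weak or none"


# Hypothetical yearly values for one keyword (not the study's data).
attention = [1, 2, 3, 4, 5]
papers = [2, 4, 5, 4, 5]
r = pearson(attention, papers)
print(round(r, 4), correlation_label(r))
```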

Relationship modeling
The second part is step 2 of RMPF, establishing the relationship model between public attention and teaching research on a professional academic platform. The model is a LOOCV model based on multi-variable linear regression. The independent variable is public attention under "high school information technology," the annual Baidu Index value of the extracted keywords. The dependent variable is the teaching research on the CNKI academic platform under "high school information technology," the annual publication volume of articles with the theme of "high school information technology." The period is from 2011 to 2021; that is, the relational model between them is established using the sample size of 11 years. Finally, the model's error is tested and analyzed to verify the rationality of the model.

Leave-one-out cross-validation
LOOCV is a particular case of cross-validation. The core idea of cross-validation is to divide the samples into two sets, one for training the model and one for evaluating it. In LOOCV, for a data set composed of K samples, K − 1 samples are used as the training set and the remaining one is used as the test set. The next sample is then chosen as the test set, with the remaining K − 1 as the training set, and so on until every sample has been tested. This process is repeated K times, so K models are obtained. Finally, the average of the coefficients of the K models is taken as the coefficients of the final model.
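The procedure described above (fit K leave-one-out models, then average their coefficients) can be sketched as follows. This is a minimal illustration, not the authors' code: the ordinary-least-squares solver is written out in plain Python so the block is self-contained, whereas in practice a library such as NumPy or scikit-learn would normally be used. The function names and the synthetic data are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows, targets):
    """Least-squares fit of y = b + a1*x1 + ...; returns [b, a1, ...]."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(k)]
    return solve(xtx, xty)


def predict(coef, row):
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))


def loocv_average(rows, targets):
    """Fit K leave-one-out models; return mean coefficients and held-out errors."""
    k = len(rows)
    fits, errors = [], []
    for held in range(k):
        train_x = rows[:held] + rows[held + 1:]
        train_y = targets[:held] + targets[held + 1:]
        coef = fit_ols(train_x, train_y)
        fits.append(coef)
        errors.append(targets[held] - predict(coef, rows[held]))
    mean_coef = [sum(f[i] for f in fits) / k for i in range(len(fits[0]))]
    return mean_coef, errors


# Synthetic data generated exactly by y = 1 + 2*x1 + 3*x2 (not the study's data).
rows = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6], [6, 5]]
targets = [1 + 2 * a + 3 * b for a, b in rows]
coef, errors = loocv_average(rows, targets)
print([round(c, 6) for c in coef])
```

The held-out errors returned alongside the averaged coefficients give a direct check of how each of the K models performs on the sample it never saw.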

Relative error
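To ensure the error is within a controllable range, this article tests the relational model according to the model error test method used by existing scholars. 32 The calculation formula is:

$$\delta = \frac{\left|\ln X_{\mathrm{actual}} - \ln X_{\mathrm{predict}}\right|}{\ln X_{\mathrm{actual}}} \times 100\% \quad (2)$$

where $\delta$ represents the relative error of the regression analysis model, $\ln X_{\mathrm{actual}}$ represents the actual value, and $\ln X_{\mathrm{predict}}$ represents the predicted value.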
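A small sketch of this check follows. It assumes that the relative error is the absolute difference between the logarithms of the actual and predicted values, divided by the logarithm of the actual value and expressed as a percentage; the function name and the numbers are hypothetical.

```python
import math


def relative_error_pct(actual, predicted):
    """Relative error (in percent) between log actual and log predicted values.

    Assumption: error = |ln(actual) - ln(predicted)| / |ln(actual)| * 100.
    """
    ln_actual = math.log(actual)
    ln_predicted = math.log(predicted)
    if ln_actual == 0:
        raise ValueError("ln(actual) is zero; relative error is undefined")
    return abs(ln_actual - ln_predicted) / abs(ln_actual) * 100.0


# Hypothetical yearly publication counts (actual, predicted).
pairs = [(400, 380), (350, 360), (300, 300)]
errors = [relative_error_pct(a, p) for a, p in pairs]
mean_error = sum(errors) / len(errors)
print([round(e, 3) for e in errors], round(mean_error, 3))
```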

Prediction analysis
The third part is step 3 of RMPF, predicting the teaching research results of the number of articles on the CNKI academic platform with "high school information technology" as the theme. First, based on the attention data of keywords from 2011 to 2021, the article predicts the social attention of keywords from 2022 to 2024 using the ARIMA. Second, based on the relationship model between public attention and teaching research on professional academic platforms, the article predicts the publication volume of teaching research from the perspective of "high school information technology" in 2022-2024.

ARIMA model
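The literature review shows that many fields, such as economics, agriculture, and medicine, have made extensive use of ARIMA. 33,34 Considering the time-series characteristics of the sample data, this article uses Python to construct an ARIMA model to predict the change characteristics of network attention to hot words from the perspective of "high school information technology." The expression of the model is 35:

$$\left(1 - \sum_{i=1}^{p} \phi_i L^i\right)(1 - L)^d X_t = \left(1 + \sum_{i=1}^{q} \theta_i L^i\right)\varepsilon_t \quad (3)$$

In formula (3), $L$ is the lag operator, $p$, $d$, and $q$ are the autoregressive order, the differencing order, and the moving-average order, $\phi_i$ and $\theta_i$ are the corresponding coefficients, and $\varepsilon_t$ is the error term. ARIMA modeling requires stationary data, and its process is shown in Figure 2. The optimal model can be selected through four steps.

FIGURE 2 ARIMA modeling process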
Step A: Judge the stationarity of the sequence. Apply differencing or a logarithmic transformation to a nonstationary series to ensure that the series is stationary.
Step B: Identify parameters. The model orders are selected using the Akaike information criterion (AIC).
Step C: Test the model. The model error is tested according to root mean square error (RMSE) and the actual and fitting values of the samples to determine the rationality of the model parameters.
Step D: Use the model. Use models to predict future data.
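The differencing in Step A can be sketched in plain Python. This block only illustrates the transformation; it does not perform the stationarity test itself, for which a unit root test (for example, `adfuller` in statsmodels) would typically be applied to each differenced series. The function name and the toy series are hypothetical.

```python
def difference(series, order=1):
    """Return the order-th difference of a sequence.

    Each pass replaces the series with the changes between consecutive
    values, so the result is `order` elements shorter than the input.
    """
    if order < 0:
        raise ValueError("order must be non-negative")
    out = list(series)
    for _ in range(order):
        out = [later - earlier for earlier, later in zip(out, out[1:])]
    return out


# Toy series with a quadratic trend: the second difference is constant.
series = [0, 1, 4, 9, 16, 25]
first = difference(series, order=1)
second = difference(series, order=2)
print(first, second)
```

The number of differencing passes needed before the series passes the stationarity test corresponds to the parameter d of the ARIMA model.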

Model checking
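AIC and RMSE were selected in this study to check the model's goodness of fit. 36 Their calculation methods are, respectively:

$$\mathrm{AIC} = 2k - 2\ln(L) \quad (4)$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(y_{\mathrm{predict}(i)} - y_{(i)}\right)^2} \quad (5)$$

In formulas (4) and (5), $L$ is the maximum value of the likelihood function under this model, $k$ is the number of model parameters, $m$ is the number of predictions, $y_{\mathrm{predict}(i)}$ is the predicted value, and $y_{(i)}$ is the actual value.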
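Both criteria are simple to compute once a model has been fitted. The sketch below assumes the standard definitions, AIC = 2k − 2 ln L and RMSE as the square root of the mean squared difference between predicted and actual values; the function names and sample numbers are hypothetical.

```python
import math


def aic(log_likelihood, k):
    """Akaike information criterion from the maximized log-likelihood ln(L)."""
    return 2 * k - 2 * log_likelihood


def rmse(actual, predicted):
    """Root mean square error between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    m = len(actual)
    squared = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / m)


# Hypothetical candidates: a lower AIC indicates the preferred model.
candidates = {"model_a": aic(-10.0, 3), "model_b": aic(-9.5, 5)}
best = min(candidates, key=candidates.get)
print(candidates, best, round(rmse([0, 0], [3, 4]), 4))
```

Because AIC penalizes each extra parameter, the candidate with the slightly higher likelihood but two more parameters loses in this toy comparison.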

Data preprocessing
Data preprocessing includes the selection of sample journals for empirical study and, on this basis, the preliminary extraction of all high-frequency keywords of sample journals. First, the periodical database of CNKI is used for inquiry. Set retrieval conditions: take "high school information technology" as the theme, select the period from 2011 to 2021, respectively, and export all academic literature in that year as the source of sample journals. In other words, 3965 academic papers with the theme of "high school information technology" published in 2011-2021 were selected as samples for empirical research.
Second, with the help of VOSviewer and the full counting method in its text mining module, the keywords of all the sample literature are extracted through statistical analysis of the word frequency of the text. Taking 2011 as an example, the specific steps are as follows: 1. Export the academic literature with the theme of "high school information technology." Then, the hot words extracted for each year are assembled into the keyword word frequency matrix. Again taking 2011 as an example, all the keywords in 2011 are arranged in columns, and the word frequency corresponding to each keyword is placed in the row below. If 10 keywords were extracted in 2011, a 1 × 10 word frequency matrix is formed. Similarly, if a keyword in 2012 did not exist in the previous year, the new word is added to the list, and its word frequency for the previous year is set to 0. By analogy, after the final statistical analysis, there are 90 keywords in the list, forming an 11 × 90 word frequency matrix.
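The construction of the year-by-keyword matrix can be sketched as follows. The function name, the input layout (one keyword-to-frequency mapping per year), and the counts are hypothetical; the point is only to show how a keyword that is absent in a given year receives a frequency of 0.

```python
def build_frequency_matrix(yearly_counts):
    """Build a year x keyword frequency matrix.

    yearly_counts: dict mapping year -> dict mapping keyword -> frequency.
    Keywords keep the order in which they first appear; a keyword that is
    missing in a given year gets frequency 0 for that year.
    """
    years = sorted(yearly_counts)
    keywords, seen = [], set()
    for year in years:
        for keyword in yearly_counts[year]:
            if keyword not in seen:
                seen.add(keyword)
                keywords.append(keyword)
    matrix = [
        [yearly_counts[year].get(keyword, 0) for keyword in keywords]
        for year in years
    ]
    return years, keywords, matrix


# Hypothetical counts for two years (not the study's data).
counts = {
    2011: {"information technology": 12, "multimedia technology": 5},
    2012: {"information technology": 9, "micro-course": 4},
}
years, keywords, matrix = build_frequency_matrix(counts)
print(keywords)
print(matrix)
```

Summing each column of the matrix gives the total frequency per keyword, which is the quantity used for sorting the keywords afterward.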
Finally, the keywords are sorted according to the sum of their word frequencies. Some keywords are listed (Table 1) to complete the second part of data preprocessing and preliminarily determine the hot keywords on "high school information technology."

Data screening
According to the keyword frequency matrix, the keywords are further filtered. The screening follows three criteria: the word frequency of the keyword is relatively high, the keyword is related to the topic of "high school information technology," and the keyword is included in Baidu Index. The specific methods are as follows: First, according to the 11 × 90 keyword frequency matrix, calculate each keyword's share of the total word frequency, and arrange the keywords in descending order of share. Then, the shares of the sorted keywords are accumulated. The keywords that together account for 90% of the total are selected for further filtering: 1. Remove near-duplicate words. For example, between "information technology" and "high school information technology," remove "information technology"; between "educational informatization" and "informatization," remove "informatization." 2. Remove words unrelated to the topic of "high school information technology," such as "geography teaching," "Senior English," and so forth. 3. Remove words not included in Baidu Index, such as "core literacy," "Computational Thinking," "experimental teaching," and so forth.
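The cumulative-share selection can be sketched as follows. The function name and the totals are hypothetical, and including the keyword that crosses the 90% threshold is an assumption, since the text does not specify how the boundary is handled. The three manual filters (duplicates, off-topic words, words missing from Baidu Index) are not automated here.

```python
def top_by_cumulative_share(totals, threshold=0.9):
    """Select the highest-frequency keywords until their share reaches threshold.

    totals: dict mapping keyword -> total frequency over all years.
    Assumption: the keyword that crosses the threshold is included.
    """
    grand_total = sum(totals.values())
    if grand_total <= 0:
        raise ValueError("total frequency must be positive")
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    selected, cumulative = [], 0
    for keyword, frequency in ranked:
        if cumulative >= threshold * grand_total:
            break
        selected.append(keyword)
        cumulative += frequency
    return selected


# Hypothetical total frequencies (not the study's data).
totals = {
    "micro-course": 50,
    "flipped classroom": 30,
    "effective teaching": 15,
    "geography teaching": 5,
}
shortlist = top_by_cumulative_share(totals, threshold=0.9)
print(shortlist)
```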
At the same time, combined with the keyword recommendation function of the Baidu search engine, seven keywords were selected as the focus of teaching research from the perspective of "high school information technology." Finally, after the screening, filtering, and other operations, keywords are high school information technology, micro-course, flipped classroom, new curriculum reform, multimedia technology, effective teaching, and AI.

Relationship analysis
According to the first part of the RMPF, this article uses the Baidu Index and CNKI platform academic literature to explore the relationship between public attention and teaching research to demonstrate the complex correlation between network platforms and professional educational platforms.

Selection of relational indicators
This article takes the extracted keywords as the object and selects the Baidu Index value of keywords to measure public attention. The teaching research is measured by the number of CNKI documents of chosen keywords. Taking "micro-course" as an example, the specific step is as follows: Take each year as the unit, search the Baidu Index value of "micro-course" from 2011 to 2021 as a quantitative index of attention; Meanwhile, on the CNKI academic platform, the keyword is "micro-course," and the literature volume of each year from 2011 to 2021 is used as the quantitative index of teaching research, that is, the index system table of keyword "micro-course" from 2011 to 2021 is constructed. This way, the index data of other keywords are counted separately to build the relation table between public attention and teaching research from the perspective of "high school information technology."

Analysis of correlation result
According to the relation table between public attention and teaching research constructed by each keyword, and Pearson correlation coefficient (formula 1), the results were calculated by SPSS (Table 2). It can be seen from the table that under the perspective of "high school information technology," there is a significantly strong correlation between public attention and teaching research, indicating that the research hot-spots of academic literature tend to be more and more issues of public concern, that is, to a large extent, academic teaching research is a lateral exposition of things concerned by the public. Zhang et al. 16 also found that the cutting-edge hot knowledge in the subject field of text vocabulary mining based on social network attention has a particular frontier, timeliness, and accuracy.
In recent years, against the background of "Internet plus" teaching, the emergence of many new classroom teaching modes, such as micro-classes, flipped classrooms, and multimedia technology, has improved teaching efficiency, promoted students' learning enthusiasm, and stimulated high school students' innovative awareness of information technology. The General High School Information Technology Curriculum Standards (2017 edition) 37 advocate AI in the classroom. Teachers should stimulate students' attention to AI to cultivate students' innovative thinking. Therefore, more attention is paid to applying new teaching methods, such as micro-classes, multimedia technology, and flipped classrooms, to meet the challenges of diversified, personalized, and differentiated subject education in the digital environment. In curriculum reform and development, the senior high school information technology subject also pays more attention to creating an efficient class mode for effective teaching. For example, the well-known "121" mode of recent years, namely, 10 min of self-study, 20 min of interactive exploration, and 10 min of summary feedback and expansion, can be seen as a reasonable interpretation of the classroom's truth, reality, novelty, activity, emotion, specialty, and effectiveness. Against the background of intellectualization, as multiple forms of ubiquitous learning are advocated in education and teaching, the research results on professional academic platforms are also increasing. Therefore, from the perspective of the high school information technology discipline, public network attention is positively correlated with the teaching and research results of the professional academic platform. This also reflects the applicability of Bradford's law and the literature growth law to the Internet search index in the network environment. 38

Modeling analysis
Based on the second part of the RMPF, this article establishes a relationship model between public attention and teaching research under the perspective of "high school information technology" to predict the achievement trend of conducting research in the next few years and provide a reference for the development of subsequent disciplines.

Selection of independent and dependent variable
First, it is necessary to determine the model's independent and dependent variables. The independent variable selects the representative hot words under the perspective of "high school information technology" and uses the annual average of the Baidu Index to quantify. The dependent variable selects the yearly number of CNKI papers with the theme of "high school information technology," reflecting the communication influence of the subject research. The sample data from 2011 to 2021 is selected to establish a regression model of multiple independent and dependent variables. Table 3 defines variables.

Construction of relationship model
Because the study period runs from 2011 to 2021, the sample is small. To ensure the model's reliability, the article adopts the LOOCV method and builds the model in Python. The steps are as follows: from the 11 groups of sample data, one group is held out for testing each time, and the other 10 groups are used to train the model. Such training and testing are repeated 11 times. Since a different set of 10 samples is used each time, the fitted models are not identical, but the 11 models are approximately the same. A multiple regression model was built based on seven independent variables:

$$y = b + a_1 x_1 + a_2 x_2 + a_3 x_3 + \cdots + a_n x_n + \varepsilon \quad (6)$$

In formula (6), $y$ is the explained variable, the annual number of CNKI articles with the theme of "high school information technology." $b$ stands for the intercept, $x_n$ is the independent variable, $a_n$ is the coefficient of the independent variable, and $\varepsilon$ is the random perturbation term. Considering that the units of the variables are inconsistent, the logarithms of the independent and dependent variables are taken, respectively, to obtain formula (7):

$$\ln y = b + a_1 \ln x_1 + a_2 \ln x_2 + a_3 \ln x_3 + \cdots + a_n \ln x_n + \varepsilon \quad (7)$$

TABLE 3 Definition of variables (recoverable rows only)
Relationship        Indicators                           Variables                                                                          Symbol
Teaching research   Annual number of documents by CNKI   Annual number of papers with the theme of "high school information technology"   y
                                                         Effective teaching                                                                 x5
                                                         New curriculum reform                                                              x6
                                                         AI                                                                                 x7
The independent and dependent variables from 2011 to 2021 were substituted into formula (7). The mean values of coefficients of the respective variables were obtained through LOOCV model training, thus constructing a multiple linear regression model (formula 8).

Error analysis
The independent variables from 2011 to 2021 were substituted into the regression model (formula 8) to obtain the predicted value from 2011 to 2021. Second, the expected and actual values are analyzed by regression error (formula 2) to get the relative error of each year and then the average relative error (Table 4). It can be seen from the table that the relative error is distributed in the range of 0.1%-7.64%, while the average relative error in the recent 10 years is about 2%.
In addition, the RMSE of the relationship model is 0.1704. The model error is small, which indicates that the relationship model is accurate and its construction is reasonable.
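The two error measures used in this section can be computed as in the sketch below. It assumes the common definitions (relative error as the absolute deviation divided by the actual value, RMSE as the root of the mean squared deviation); formula 2 in the article may be written differently, and the function names are hypothetical.

```python
import numpy as np


def relative_errors(actual, predicted):
    """Per-sample relative error |predicted - actual| / |actual|."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.abs(predicted - actual) / np.abs(actual)


def rmse(actual, predicted):
    """Root mean squared error between actual and predicted values."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((predicted - actual) ** 2)))


if __name__ == "__main__":
    # Made-up illustrative numbers, not the values behind Table 4.
    actual = [400.0, 450.0, 500.0]
    predicted = [410.0, 441.0, 505.0]
    rel = relative_errors(actual, predicted)
    print("relative errors (%):", np.round(100 * rel, 2))
    print("average relative error (%):", round(100 * rel.mean(), 2))
    print("RMSE:", round(rmse(actual, predicted), 4))
```

Note that an RMSE is expressed in the units of the quantity being compared, so a value computed on log-transformed counts is not directly comparable with one computed on raw counts.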

Short-term predictive analysis
Based on the third part of the RMPF and the relationship model between public attention and teaching research, this article predicts the volume of CNKI teaching research literature on "high school information technology" over the next 3 years, to provide a reference for the development of the information technology discipline. The specific steps are:
1. Prediction of the Baidu Index value of each keyword: first, obtain the historical data of each keyword from 2011 to 2021; then predict the Baidu Index value of each keyword for the next 3 years (2022-2024) with the ARIMA model (formulas 3-5). These predictions serve as the independent variables of the relationship model.
2. Prediction of the number of CNKI articles themed "high school information technology": using the multiple linear relationship model of public attention and teaching research, together with the independent variables from step 1, forecast the publication volume of teaching research for the next 3 years (2022-2024).

TABLE 4 Error analysis of regression model

Application verification of ARIMA
Take the keyword AI as an example. Figure 3 shows the Baidu Index value from January 1, 2011, to December 31, 2021, using the monthly average as the statistical unit, which gives 132 samples. The original series fluctuates and is nonlinear. The unit root test of the original sequence (p = 0.4049) indicates that the series is nonstationary. Therefore, the original data must be converted into first- and second-order differences to obtain a stationary sequence. Figures 3 and 4 show that the second-order difference is stationary.
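The differencing step can be illustrated with a short sketch. The series below is a synthetic placeholder with a quadratic trend, not the Baidu Index data, and `half_stats` is a hypothetical helper that only gives a rough visual-style check; a formal unit root test such as the augmented Dickey-Fuller test (available, for example, as `adfuller` in statsmodels) is what the stationarity decision should rest on.

```python
import numpy as np


def difference(series, d=1):
    """Return the d-th order difference of a 1-D series (length shrinks by d)."""
    return np.diff(np.asarray(series, dtype=float), n=d)


def half_stats(series):
    """Mean and variance of the first and second half of a series.

    Similar values in both halves are a rough hint of stationarity; this is
    not a substitute for a formal unit root test.
    """
    series = np.asarray(series, dtype=float)
    mid = len(series) // 2
    first, second = series[:mid], series[mid:]
    return (first.mean(), first.var()), (second.mean(), second.var())


if __name__ == "__main__":
    # Synthetic placeholder series (NOT the Baidu Index data): 132 monthly
    # points with a quadratic trend, so two rounds of differencing are needed.
    t = np.arange(132)
    series = 0.05 * t**2 + 3.0 * t + 200.0
    for d in (0, 1, 2):
        diffed = difference(series, d)
        print("d =", d, "length =", len(diffed), "half stats =", half_stats(diffed))
```

In this toy case the mean still drifts after one difference and becomes constant after two, which is the pattern that motivates choosing d = 2.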
To confirm the stationarity of the second-order difference sequence, its mean and variance are tested (Figure 5). The unit root test results in Figure 5 and Table 5 show that the test statistic of the second-order difference sequence is below the 1% critical value, so the sequence meets the stationarity requirement. Therefore, the parameter d of the model is 2. The critical values reported in Table 5 are −3.486535 at the 1% level, −2.886151 at the 5% level, and −2.579896 at the 10% level.
To determine the parameters p and q, this study followed the ARIMA modeling process and selected ARIMA(0, 2, 1) as the optimal model. To further check its applicability, the article compares the actual and predicted values in a fitting diagram (Figure 6) and analyzes the error. Figure 6 shows that the predicted values are consistent with the actual values, and RMSE = 14.5763 indicates a relatively small model error. Therefore, ARIMA(0, 2, 1) is suitable for predicting the network attention of the hot word AI from the perspective of high school information technology.
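For reference, the selected ARIMA(0, 2, 1) model can be written out explicitly. The notation here is an assumption and may differ from formulas 3-5: B is the backshift operator, \theta is the moving-average coefficient (written with a plus sign, as in most software; some textbooks use a minus sign), and \varepsilon_t is white noise.

```latex
% ARIMA(0, 2, 1): the second difference follows an MA(1) process
(1 - B)^2 y_t = (1 + \theta B)\,\varepsilon_t
\quad\Longleftrightarrow\quad
y_t = 2y_{t-1} - y_{t-2} + \varepsilon_t + \theta\,\varepsilon_{t-1}

% Forecasts from the last observation T (future shocks have mean zero)
\hat{y}_{T+1} = 2y_T - y_{T-1} + \theta\,\hat{\varepsilon}_T, \qquad
\hat{y}_{T+h} = 2\hat{y}_{T+h-1} - \hat{y}_{T+h-2} \quad (h \ge 2,\ \hat{y}_T = y_T)
```

Beyond the first step the forecasts extend the last estimated slope linearly, which is why ARIMA(0, 2, 1) suits short horizons such as the 3 years used here.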

Prediction result of keyword attention
Using the historical Baidu Index data of each keyword, ARIMA models were also built for the remaining keywords, such as "micro-course" and "multimedia technology," following the same validation process as for the AI model above. Figure 7 shows the predicted results, where the x-axis represents time and the y-axis represents the search frequency of the keywords on the Baidu platform, namely, the Baidu Index value.

FIGURE 7 Prediction results of each keyword
According to the prediction results in Figure 7, micro-course, flipped classroom, and AI have attracted significant attention in the development of the discipline, and this attention has been increasing year by year. That is to say, information technology teaching in high school increasingly emphasizes students' subjectivity and trains their self-learning ability and way of thinking. Ma et al. 39 said AI education is conducive to developing students' problem-solving awareness. The new curriculum reform, multimedia technology, and effective teaching show a gradually fluctuating trend, indicating that attention to "high school information technology" will still cover these aspects in the next few years. However, the main focus has gradually shifted to micro-video, cloud classroom, AI, subject integration, and other elements.

Prediction result of academic literature
The predicted values of each keyword from 2022 to 2024 are combined with the multiple linear relationship model between public attention and teaching research to forecast the number of CNKI articles themed "high school information technology" in the next 3 years. The prediction of the number of CNKI documents proceeds as follows.
Using the multiple regression model established above, the index values of each keyword in 2022, 2023, and 2024 are substituted into the model (formulas 8-10) to obtain the logarithm of the article volume in the corresponding years. The number of CNKI articles under the "high school information technology" theme in 2022-2024 is then obtained by taking the antilogarithm (Figure 8).
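The substitution and antilogarithm step can be sketched as follows. The function name `predict_article_count` is hypothetical, and the coefficients and index values in the demo are made-up placeholders, not the fitted values of formulas 8-10 or the forecasts in Figure 7.

```python
import numpy as np


def predict_article_count(coef, index_values):
    """Evaluate ln y = b + sum_i a_i * ln x_i and return y = exp(ln y).

    coef         -- [b, a_1, ..., a_n] from the log-log regression
    index_values -- predicted keyword index values x_1, ..., x_n (all > 0)
    """
    coef = np.asarray(coef, dtype=float)
    x = np.asarray(index_values, dtype=float)
    log_y = coef[0] + coef[1:] @ np.log(x)
    return float(np.exp(log_y))  # antilogarithm recovers the article count


if __name__ == "__main__":
    # Placeholder coefficients and index forecasts, for illustration only.
    coef = [2.0, 0.4, 0.1]
    forecasts_by_year = {2022: [1200.0, 900.0], 2023: [1250.0, 880.0]}
    for year, x in forecasts_by_year.items():
        print(year, round(predict_article_count(coef, x), 1))
```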
As shown in Figure 8, over the next 3 years the annual volume of CNKI literature themed "high school information technology" first increases and then decreases, fluctuating slightly around 350 papers. The slight increase may be due to the release of the comprehensive practice curriculum guidelines for primary and secondary schools (2021 edition) and the pilot application of "5G + smart education," which lead more primary and secondary school teachers and education scholars to focus on the relevant documents and on their interpretation and implementation, promoting the output of teaching achievements in the discipline. In the following 2 years, the literature volume shows a slight downward trend. A possible reason is that the research hot spots of information technology education in primary and secondary schools are of limited diversity, focusing mainly on students' information literacy, the professional development of information technology teachers, discipline integration, inquiry-based learning, and information technology education in underdeveloped areas such as rural areas. 40,41 The resulting homogeneity of teaching and research content leads to a decline in the number of articles published.
High school information technology courses have more depth and breadth than other subjects. They can open up students' closed knowledge systems, free their thinking, and enable them to obtain knowledge from multiple angles and gradually learn to construct and internalize it. There is still much room for research on information technology, such as the innovation of subject teaching modes, the cultivation of students' creative ability in an intelligent environment, and the improvement of teaching design under the new media literacy. Therefore, it is necessary to broaden the research field of information technology, fully excavate the critical information points within the subject, and improve the quality of information technology teaching and research. It is also necessary to break the relative isolation of the research field, open up new research space, extend and expand the field's themes, and increase the quantity of academic literature.

DISCUSSION
Based on the proposed RMPF, this study uses several methods to conduct relationship modeling and short-term prediction, demonstrates the relationship between public attention and teaching research from a micro perspective, and discusses the development of high school information technology. In addition to providing evidence for existing research, this article puts forward some suggestions for information technology teaching and research. For example, literature research can be mined and innovated in the following aspects: innovation of the teaching mode of information technology; how to integrate students' information emotion and information ethics into teaching design; the cultivation of students' digital literacy in the digital age; and the specific aspects of students' core literacy reflected in the AI module. However, this study still has some deficiencies. First, it is based on the Baidu Index, so there are limitations in data collection. Subsequent research can be extended to other platforms such as the 360 Index, and a comparative analysis of the two indexes can also be conducted. Second, when constructing the relationship between the two, the public attention of keywords is taken as the independent variable and the number of articles published as the dependent variable. The argument is therefore unidirectional, and follow-up research can test the reverse direction to strengthen the case for a bidirectional relationship. Finally, the applicability of the proposed experimental framework RMPF to research objects or disciplines in other fields needs to be further implemented and demonstrated.

CONCLUSIONS
Based on CNKI and Baidu Index platform data from 2011 to 2021, this study proposed an experimental framework, RMPF, built on relationship modeling and short-term prediction. Using the RMPF, this article constructs a relationship model between public attention and teaching research and makes a short-term prediction of the number of articles themed "high school information technology" in the next few years. This study draws the following conclusions. First, RMPF uses word frequency analysis of the knowledge map to conclude that the hot words of the "high school information technology" discipline are high school information technology, micro class, flipped classroom, multimedia technology, effective teaching, new curriculum reform, and AI. The correlation test shows that the correlation coefficient between the subject attention represented by these keywords and the teaching research is more than 0.65, a strong correlation. Second, RMPF constructs the relationship model between public attention and teaching research from the perspective of "high school information technology" based on LOOCV. Tested on the data from 2011 to 2021, the model has an average relative error of about 2% and RMSE = 0.1704. The model has high accuracy, and the relationship model is reasonable. The model shows that in the era of big data, there is a two-way interactive relationship between the network attention of disciplines and their academic platforms. Finally, using ARIMA, RMPF predicts the attention trend in the high school information technology discipline. Although the increase in attention to micro class, flipped classroom, and AI is small, their index values all remain above 1000, indicating that the focus of public attention in the future will still be the integration of information technology subjects and modern teaching technology.
In addition, the number of high school information technology subject documents may decline in the 3 years after 2021, with an average annual publication of about 350 papers, about 120 fewer than in 2020. Therefore, it is necessary to break the limited diversity of topics and the homogeneity of content in the information technology subject, fully excavate the key information points, expand the research space in the subject field, and improve the quality and quantity of literature research.
This study demonstrates the relationship between the Baidu Index and CNKI and clarifies that, as two platforms for obtaining information hot spots, they are interdependent and interactive. It also helps to understand the research trends of the information technology discipline in high school and provides a reference for the direction of attention in the subject. The contributions of this research are: (1) Building on existing studies, this article clarifies the dynamic relationship between public attention and a professional academic platform through experimental demonstration, supplementing the arguments of current research. (2) Focusing on high school information technology, this article analyzes and excavates its research hot spots in depth, providing a reference for the direction of teaching research on this subject. (3) This article proposes the RMPF framework and discusses theory and experiment using a range of data, methods, and models, providing a methodological reference for relevant researchers. The original intention of this article is to contribute to the development of the information technology subject.