Mining public behavior patterns from social media data during emergencies: A multidimensional analytical framework considering spatial–temporal–semantic features

Studying human behavioral patterns from social media data is an important part of emergency management. However, the multidimensional characteristics of social media data have rarely been fully utilized. This study proposes a multidimensional analytical framework for social media user behavior that integrates time–geographic–semantic features. The framework defines the spatiotemporal semantic multidimensional relationship of social media user behavior and maps it into a time–geographic–semantic (TGS) cube, based on which a TGS‐weighted similarity measure was created. We then applied a spectral clustering algorithm to cluster the trajectories of the user behavior. Subsequently, a prefix‐projected pattern growth algorithm was used to mine frequent semantic patterns from the clustering results and analyze their spatiotemporal distribution characteristics. Taking the COVID‐19 pandemic as a case study, we analyzed Weibo user behavior in China from January 9 to March 10, 2020. The results showed that clustering based on TGS similarity performed better than clustering based on the commonly used edit distance on real sequence (EDR) and longest common subsequence (LCSS) measures. Five semantic patterns of public responses were identified during the COVID‐19 pandemic. The semantic patterns of categories 1, 2, 4, and 5 were “spindle‐shaped,” meaning that their core semantics were stable and concentrated on one or several topics despite the frequent semantic changes in the middle stage. Category 3 was “wave‐shaped,” indicating that its semantics fluctuated between several topics during the pandemic. This discovery shows that the framework is suitable for analyzing and comprehensively understanding public behavior during pandemic emergencies. This framework has good universality and great potential for extension to other emergencies.


| INTRODUCTION
With the rapid development of location-based services, social media have generated massive amounts of human-related data, providing enormous opportunities for geographic information science research. These emerging "human sensor" technologies of social media can provide an innovative approach to reveal human behavioral patterns during the preparedness for, response to, and recovery from disasters in real time (Koylu, 2019; Miller & Goodchild, 2015; Steiger et al., 2016). Human behavior is defined as "something done, felt, or thought in response to a situation or event, which includes not only direct actions but elements related to emotions" (Dunkel et al., 2019). User behavior in emergencies is characterized by irrationality, strong infectivity, conformity, and abundant appearance in a short time (Yuan, 2010), which is a great challenge for urban disaster risk management.
Group behavior patterns refer to the regularities of public behavior such as space-time distribution, topic evolution, and behavior laws. Studying group behavior patterns can provide a scientific basis for the governance of public opinion, resource allocation, and strategic planning in urban emergency management (Chen et al., 2020; Hong et al., 2021; Loo & Leung, 2017; Meerow et al., 2016; Wang & Ye, 2018). Traditional public behavior acquisition methods (e.g., questionnaires and interviews) are time-consuming, costly, slow to reflect current conditions, and difficult to implement at a large scale (Cao et al., 2017; Chen et al., 2011; Jiang et al., 2022; Wang & Ye, 2018). By contrast, social media data are dynamic, real time, large-scale, and rich in semantics, making descriptions of human behavior more abundant and accurate (Heikinheimo et al., 2022; Sui & Goodchild, 2011). Social media data have been extensively used to study group behavior patterns in geographic information science (Hong et al., 2021; Miller & Goodchild, 2015; Zheng et al., 2019).
Despite the growing use of social media data for mining human behavioral patterns, constraints such as dimensional differences and heterogeneity make the correlation and quantification of time-geographic-semantic (TGS) dimensions difficult, thereby causing methodological challenges in revealing potential public behavioral patterns (Li et al., 2018; Wang et al., 2018). Most existing studies begin by studying dimensions separately and then simultaneously analyzing them from a two-dimensional perspective based on spatial statistics, machine learning, and deep learning techniques (Dunkel et al., 2019; Kryvasheyeu et al., 2015; Yuan et al., 2020). For example, scholars have separately investigated temporal, spatial, and semantic characteristics by describing the temporal dimension as statistical charts (e.g., histograms and line plots) of social media texts over time, the spatial dimension as spatial distribution maps (e.g., kernel density maps), and the semantic dimension as statistical charts (e.g., pie charts) on different topics. For simultaneous analyses, these studies combined the spatial and semantic dimensions by exploring the spatial distribution of social media text topics and integrated the time and semantic dimensions by investigating the temporal trend of text quantity under different topics. Although methods and concepts have been developed to analyze public behavior by integrating three-dimensional characteristics (Dunkel et al., 2019; Koylu, 2019; Koylu et al., 2019; Steiger et al., 2016), the lack of computable models and dynamic behavior descriptions necessitates the urgent study of a multidimensional analytical methodology for public behavior patterns integrating TGS dimensions.
Methodological approaches for mining social media user behavior patterns include similarity measurements of user behavior, user behavior clustering, and pattern recognition (Fedoryszak et al., 2019; Liu & Guo, 2020; Nguyen et al., 2016). Similarity measurements form the basis for clustering and pattern mining. Common similarity measurements (Chen et al., 2005; Jin et al., 2020; Tao et al., 2021; Vlachos et al., 2002), such as edit distance on real sequence (EDR) and longest common subsequence (LCSS), are effective for one-dimensional time-series data and two-dimensional trajectory data, but their applicability in three dimensions is unclear. For multidimensional trajectories, some researchers have proposed multi-attribute-based similarity measures such as MSM (Furtado et al., 2016) and MUITAS (Petry et al., 2019), which are mostly used for mobility trajectory data rather than social media data.
Our goal is to analyze the spatiotemporal dynamics of the appearances of a topic in social media texts during disaster events by proposing a multidimensional analytical framework of social media user behavior that integrates spatial-temporal-semantic features: (1) based on the results of topic extraction, a TGS cube (TGSC) was constructed to map public behavior into the spatiotemporal semantic space and realize three-dimensional dynamic modeling at different scales; (2) a TGS-weighted behavioral trajectory similarity measurement algorithm was designed to convert the visualization model into a quantifiable index; (3) a spectral clustering algorithm was utilized to identify user communities with similar behaviors, based on which frequent semantic behavior patterns of different groups were mined. We then analyzed the spatial-temporal-semantic characteristics of the behavior patterns. This study is expected to provide insights into discovering potential public response behavioral patterns to disaster events using large-scale social media data.

| DATA AND METHODS
This study proposes a method for mining social media user behavior patterns that combines the features of time, space, and semantics. Figure 1 shows the framework, which includes the following steps. (1) Data and user behavioral definitions. Based on collected social media data, the user topics were extracted and converted into behavioral trajectory data. (2) Construction of the TGSC. The relatively independent dimensions of time, space, and semantics were integrated into a cube to clearly and intuitively express the multidimensional and dynamic characteristics of user behavior. (3) TGS-weighted similarity measurements. This method decomposes the similarity of user behavior into the dimensions of time, space, and semantics to balance the multidimensional characteristics of the behavioral trajectories of a user. (4) Clustering user behavioral trajectories. Based on the TGS-weighted similarity measurement, a spectral clustering algorithm was applied to obtain trajectory clustering, thereby identifying social media user groups with similar behavioral patterns. (5) Mining frequent patterns of social media user behavior. Based on the clustering of user behavioral trajectories, the frequent semantic behavior patterns and corresponding user groups of each user trajectory set were extracted using the prefix-projected pattern growth (PrefixSpan) algorithm, and their spatiotemporal distribution characteristics were analyzed.

| Data and user behavioral definition
Data were collected through the Sina Weibo (a Twitter-like microblogging service in China) APIs with the Chinese keywords "pneumonia" and/or "coronavirus" related to COVID-19. The time frame was from January 9, 2020 to March 10, 2020. The original Weibo texts contained interference information, such as spaces, HTTP links, and punctuation marks. Raw data must be filtered to eliminate noise and improve segmentation efficiency (Liu et al., 2022). In this study, Python regular expressions were used to filter the original social media texts, removing interference information, stop words, low-quality texts, and duplicate texts. After text filtering, 3,427,933 Weibo texts were obtained, of which 197,118 had geographical coordinates. We used the topic extraction and classification model proposed in our previous studies (Han & Wang, 2019, 2022) to process Weibo texts. The model combines the latent Dirichlet allocation (LDA) model and the random forest (RF) algorithm. The LDA model was utilized to realize semiautomatic sample labelling, which efficiently generated sample data for the RF. Furthermore, RF was used to classify Weibo texts into different topics by training a text classification model. In the experiment, we randomly selected 10,000 Weibo texts from which to extract topics with the LDA model, which was implemented using the "Gensim" package in Python. Through repeated experiments, the optimal number of initial topics was determined to be 20. We assigned each Weibo text to the topic with the highest probability in the document-topic lists. Based on the topic-term lists, the 20 topics were generalized into 15 by merging similar topics and discarding irrelevant ones.

Based on the following definition, Weibo texts were converted into user trajectory data, and the behavioral trajectories of 9881 Weibo users during COVID-19 were obtained. A histogram of the trajectory length, which exhibits a long-tailed distribution, is shown in Figure 3. The maximum and minimum values of the trajectory length were 518 and 2, respectively, with an average of 4.858, a median of 2, and a standard deviation of 13.566. The 80th percentile of trajectory length was 4.
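The text-filtering step described above can be sketched with Python's standard `re` module. This is a minimal illustration, not the authors' pipeline: the function names `clean_text` and `filter_posts`, the patterns, and the `min_chars` cutoff for "low-quality" texts are all assumptions, and stop-word removal is omitted because it depends on a word segmenter and a stop-word list that the text does not specify.

```python
import re

# Assumed patterns: strip HTTP links, then collapse punctuation and whitespace.
URL_RE = re.compile(r"https?://\S+")
NON_WORD_RE = re.compile(r"[^\w]+")  # \w also matches CJK characters in Python 3


def clean_text(text):
    """Remove links, punctuation, and redundant spaces from one post."""
    text = URL_RE.sub(" ", text)
    text = NON_WORD_RE.sub(" ", text)
    return " ".join(text.split())


def filter_posts(posts, min_chars=5):
    """Drop very short (assumed low-quality) and duplicate posts."""
    seen = set()
    kept = []
    for raw in posts:
        cleaned = clean_text(raw)
        if len(cleaned) < min_chars or cleaned in seen:
            continue
        seen.add(cleaned)
        kept.append(cleaned)
    return kept
```

For instance, `filter_posts(["Stay home!", "Stay home", "ok"])` keeps a single cleaned post, because the second entry duplicates the first after cleaning and the third falls below the assumed length cutoff.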
The original Weibo data included user ID, time, text, and location information. After data pre-processing and topic extraction, the original data were converted into user ID, topic, longitude, latitude, and time. Each Weibo message was regarded as a user-behavior point, and its data structure was <topic, longitude, latitude, time>, as shown in Definition 1 (Figure 4).

Definition 1. (user behavior). When user $u$ sent a Weibo text related to disaster events, we considered that the user had exhibited one user behavior. User behavior can be expressed as a four-dimensional vector $r_i = \langle s_i, x_i, y_i, t_i \rangle$; that is, user $u$ posts on Weibo with topic $s_i$ at position $(x_i, y_i)$ at time $t_i$.
FIGURE 2 COVID-19 related topics and corresponding codes.

Definition 2. (behavior change event). In the timeline of user $u$, when the topic of the text published by user $u$ changes, for example, from A to B, user $u$ is considered to have a behavior change event.

| Construction of TGSC
The TGSC maps the dimensions of time, space, and semantics of the public behavior stated in Definition 1 to a cube. At the macro level, the interaction between online public behavior and urban geographic space can be displayed on the scale of counties. At the mesoscale level, we can study the spatiotemporal semantic characteristics of group behavior at the city level. At the micro level, the changes of individual behavior in the spatial-temporal-semantic dimensions can be revealed. The TGSC framework is illustrated in Figure 5.

FIGURE 4 User behavior example.

| Gridding of spatial data
This study transforms the longitude and latitude of user behavior data into a spatial index in three steps. First, we convert the longitude and latitude coordinates into projection coordinates. Second, we divide the projection coordinate range into regular grid cells. Third, we place the trajectory points into the grid cells that contain them, with each grid cell corresponding to a unique spatial index value.
The projection coordinate range of the trajectory data is defined as $(X_{\max}, X_{\min}, Y_{\max}, Y_{\min})$, and the cell width is defined as $d$. The number of columns and rows are given by Equations (1) and (2), respectively. Suppose $(x_i, y_i)$ are the coordinates of the user's behavioral point. The column ID $col_i$ and row ID $row_i$ of the grid cell corresponding to this point are expressed in Equations (3) and (4), respectively. The spatial index of the point is then represented by Equation (5).
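The gridding procedure can be illustrated with a short sketch. Because Equations (1) to (5) are not reproduced here, the rounding rules and the row-major index formula below are assumptions chosen for illustration, as are the function names `grid_shape` and `spatial_index`; the published equations may differ.

```python
import math


def grid_shape(x_min, x_max, y_min, y_max, d):
    """Number of columns and rows needed to cover the extent with cell width d."""
    n_cols = math.ceil((x_max - x_min) / d)
    n_rows = math.ceil((y_max - y_min) / d)
    return n_cols, n_rows


def spatial_index(x, y, x_min, y_min, d, n_cols, n_rows):
    """Map a projected point to one integer cell ID (assumed row-major order)."""
    col = int((x - x_min) // d)
    row = int((y - y_min) // d)
    # Points lying exactly on the upper boundary are assigned to the last cell.
    col = min(col, n_cols - 1)
    row = min(row, n_rows - 1)
    return row * n_cols + col
```

With an extent of 1000 by 500 units and `d = 100`, `grid_shape` gives 10 columns and 5 rows, and the point (250, 120) falls in column 2 of row 1, that is, cell 12 under the assumed numbering.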

| Behavioral trajectory cube of user
Given a time interval $t$, the behavioral trajectory of user $u$ can be expressed as $\{uid, \langle s_1, g_1, t_1 \rangle, \langle s_2, g_2, t_2 \rangle, \ldots, \langle s_n, g_n, t_n \rangle\}$, where $uid$ is the unique ID of user $u$ and the identifier of the trajectory, $\langle s_i, g_i, t_i \rangle$ $(1 \le i \le n)$ are the triplets of the user behavior points, $\langle s_1, g_1, t_1 \rangle, \langle s_2, g_2, t_2 \rangle, \ldots, \langle s_n, g_n, t_n \rangle$ is the sequence of action points in the trajectory, and $n$ is the length of the trajectory, that is, the number of different topics in the trajectory. In the cube, X is the semantic dimension, Y is the spatial dimension, and Z is the time dimension; a three-dimensional point represents a user's behavior. The line between the points in the Z-axis direction represents the change of user behavior.

| TGS-weighted similarity
Based on the TGSC model described in Section 2.2.1, the following concepts were defined. For the stay time, $t_i$ represents the moment at which user $u$ generates topic $s_i$, and $t_j$ is the moment at which user $u$ changes the topic.

Definition 7. (TGS trajectory). The TGS trajectory of user $u$ is composed of a group of ordered trajectory points, $TGS_u = \{tgs_1, tgs_2, \ldots, tgs_i, \ldots, tgs_n\}$. Each TGS trajectory point $tgs_i$ can be expressed as $\langle s_i, g_i, tin_i, tout_i \rangle$, where $s_i$ is the user's topic, $g_i$ is the spatial index of the user's location, $tin_i$ is the moment when user $u$ generates topic $s_i$, $tout_i$ is the moment when user $u$'s topic changes, and $n$ is the number of trajectory points.

Given a time threshold, the time similarity $tsim(a, b)$ of points $a$ and $b$ equals 1 when the time distance $tdis(a, b)$ does not exceed the threshold and equals 0 otherwise (Equation 7).
The $tsim$ between each point in $TGS_A$ and $TGS_B$ was calculated to generate an $m \times n$ time similarity matrix (Equation 8). The $gsim$ between each point in $TGS_A$ and $TGS_B$ was calculated to generate an $m \times n$ geographic similarity matrix (Equation 11).
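The thresholded time similarity and its $m \times n$ matrix can be sketched as follows. The definition of $tdis$ is not reproduced in this section, so the sketch assumes it is the absolute difference between two timestamps expressed in the same unit; the names `tsim` and `time_similarity_matrix` are chosen for illustration only.

```python
def tsim(time_a, time_b, threshold):
    """Return 1 if the two moments are within the threshold, else 0.

    Assumption: tdis(a, b) is the absolute difference of two timestamps.
    """
    return 1 if abs(time_a - time_b) <= threshold else 0


def time_similarity_matrix(times_a, times_b, threshold):
    """Build the m x n matrix of pairwise time similarities."""
    return [[tsim(ta, tb, threshold) for tb in times_b] for ta in times_a]
```

Calling `time_similarity_matrix([0, 10], [1, 5, 20], 5)` yields a 2 × 3 matrix whose entry (i, j) is 1 only when the i-th point of the first trajectory and the j-th point of the second lie within 5 time units of each other. The geographic and semantic matrices have the same shape and can be filled analogously once their point-level similarities are defined.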

| Semantic similarity
The $ssim$ between each point in $TGS_A$ and $TGS_B$ was calculated to generate an $m \times n$ semantic similarity matrix (Equation 13).

| Similarity fusion
The similarity matrix $TGS_{m \times n}$ of two trajectories $TGS_A$ and $TGS_B$ can be obtained as the weighted sum of the time, geographic, and semantic similarity matrices, as shown in Equation (14). Each of the three weights is between 0 and 1, and the weights sum to 1.
Following the idea of dynamic programming, the similarity value of trajectories $TGS_A$ and $TGS_B$ is the mean value of the maximum path sum of the similarity matrix $TGS_{m \times n}$. Here, $s(i, j)$ represents the maximum path sum from the upper-left corner to the element $(i, j)$, as shown in Equation (15).
Let $s(m-1, n-1)$ be the maximum path sum of the similarity matrix $TGS_{m \times n}$. The similarity value $Sim(TGS_A, TGS_B)$ can then be obtained using Equation (16).
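The fusion and dynamic-programming steps can be sketched in plain Python. Several details are assumptions because Equations (14) to (16) are not reproduced here: the weight names `w_t`, `w_g`, and `w_s` are hypothetical, the path is assumed to move only right or down from the upper-left to the lower-right element, and the "mean value" is assumed to be the path sum divided by the number of cells on such a path, $m + n - 1$. The published equations may use different moves or a different normalization.

```python
def fuse(t_mat, g_mat, s_mat, w_t, w_g, w_s):
    """Weighted sum of the time, geographic, and semantic similarity matrices."""
    if abs(w_t + w_g + w_s - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return [
        [w_t * t + w_g * g + w_s * s for t, g, s in zip(t_row, g_row, s_row)]
        for t_row, g_row, s_row in zip(t_mat, g_mat, s_mat)
    ]


def max_path_sum(mat):
    """Maximum sum over right/down paths from the top-left to the bottom-right."""
    m, n = len(mat), len(mat[0])
    s = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            candidates = []
            if i > 0:
                candidates.append(s[i - 1][j])
            if j > 0:
                candidates.append(s[i][j - 1])
            s[i][j] = mat[i][j] + (max(candidates) if candidates else 0.0)
    return s[m - 1][n - 1]


def trajectory_similarity(mat):
    """Assumed normalization: divide by the number of cells on the path."""
    m, n = len(mat), len(mat[0])
    return max_path_sum(mat) / (m + n - 1)
```

Under these assumptions, the matrix `[[1, 2], [3, 4]]` has a maximum path sum of 8 (through 1, 3, and 4), so `trajectory_similarity` returns 8/3.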

| Clustering of user behavioral trajectory
Based on similarity measurements, clustering algorithms can identify user groups with similar behaviors. The DBSCAN (density-based spatial clustering of applications with noise) algorithm (Heikinheimo et al., 2022; Wang et al., 2018), K-means (Alsayat & El-Sayed, 2016; JafariAsbagh et al., 2014), K-medoids (Chen et al., 2017), hierarchical clustering (Hong et al., 2021), and spectral clustering (Hong et al., 2017) are extensively used for pattern mining based on a predefined similarity matrix. Among them, spectral clustering is easy to implement, has low complexity when processing high-dimensional data, and is generally superior to traditional clustering algorithms, such as the K-means algorithm (Park & Zhao, 2018; von Luxburg, 2007). In contrast to K-means clustering, which is not applicable to non-convex sets, spectral clustering can solve general problems, such as intertwined spirals, and can be implemented efficiently even for large datasets (von Luxburg, 2007). Consequently, this study uses a spectral clustering algorithm to cluster social media user trajectory data and identify user groups with similar behavioral changes. scikit-learn (https://scikit-learn.org/stable/), also known as sklearn, is a free machine-learning library for Python.
Spectral clustering is an unsupervised clustering approach; hence, ground-truth cluster labels are unavailable. Therefore, internal evaluation indicators are more suitable for assessing the clustering. This study compared the running times of four internal evaluation indicators (Xu & Tian, 2015): the Silhouette Coefficient (SC; Rousseeuw, 1987), the Davies-Bouldin index (DBI; Davies & Bouldin, 1979), the Calinski-Harabasz index (CH; Caliński & Harabasz, 1974), and Dunn's index (Dunn, 1973). As shown in Figure 7, the DBI and CH, which had the shortest running times for different numbers of clusters, were utilized as metrics for evaluating the clustering algorithm. The DBI is an internal evaluation scheme used to validate clustering using quantities and features inherent to the dataset (Ros et al., 2023). The DBI uses the ratio of within-cluster distance (the similarity within the same cluster) to between-cluster distance (the difference between different clusters) as the evaluation index. The smaller the ratio of the within-cluster distance to the between-cluster distance, the better the data concentration of the clustering results.
CH is a metric used to calculate the goodness of a clustering technique based on the ratio of separation to compactness (Caliński & Harabasz, 1974). The compactness within the cluster is the sum of the squares of the distance between each point in the cluster and the center of the cluster. The separation of the clusters is the sum of the squares of the distances between the center of each cluster and the center of the dataset. Thus, a higher CH value indicates that the clusters are dense and well separated.
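The CH definition above can be made concrete with a small pure-Python sketch. It follows the standard form of the index, in which each cluster's squared center-to-center distance is weighted by the cluster size and the two sums are scaled by their degrees of freedom, $(k - 1)$ and $(n - k)$. It assumes that the samples are available as numeric feature vectors; how user trajectories would be embedded as vectors is outside the scope of this illustration, and the function name is hypothetical.

```python
def calinski_harabasz(points, labels):
    """Calinski-Harabasz index for points (sequences of floats) and labels."""
    n = len(points)
    dim = len(points[0])
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    k = len(clusters)
    if k < 2 or k >= n:
        raise ValueError("need at least 2 clusters and fewer clusters than points")

    def centroid(rows):
        return [sum(row[d] for row in rows) / len(rows) for d in range(dim)]

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    overall = centroid(points)
    between = 0.0  # separation: size-weighted squared distance to dataset center
    within = 0.0   # compactness: squared distance of points to their own center
    for rows in clusters.values():
        center = centroid(rows)
        between += len(rows) * sq_dist(center, overall)
        within += sum(sq_dist(row, center) for row in rows)
    # Raises ZeroDivisionError if every point coincides with its cluster center.
    return (between / (k - 1)) / (within / (n - k))
```

For four points at (0, 0), (0, 2), (10, 0), and (10, 2), grouping by the x-coordinate gives a much higher index than grouping by the y-coordinate, which matches the interpretation that higher CH values indicate denser and better-separated clusters.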
We used the PrefixSpan library in Python to mine frequent semantic patterns from the clustering results of social media users' behavioral trajectories.
First, we define the minimum number of occurrences of sequential patterns (i.e., the minimum support) and then determine the frequent sequential patterns that meet the minimum support requirements using a prefix projection (Pei, 2001). The topic sequence of each user finally consists of a group of "topic codes." An example of a generated user topic sequence is presented in Table 1.
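The idea of prefix projection with a minimum support can be illustrated by a compact pure-Python miner for sequences of single topic codes. This is a simplified sketch written for illustration, not the API of the PrefixSpan library used in the study; the function name `mine_frequent_sequences` and its parameters are hypothetical.

```python
def mine_frequent_sequences(sequences, min_support, min_len=1, max_len=5):
    """Return (support, pattern) pairs for frequent subsequences.

    A pattern's support is the number of input sequences that contain it
    as a (not necessarily contiguous) subsequence.
    """
    results = []

    def mine(prefix, projected):
        # projected holds (sequence index, start position) pairs: the suffix of
        # each supporting sequence that follows the current prefix.
        extensions = {}
        for seq_idx, start in projected:
            seen = set()
            seq = sequences[seq_idx]
            for pos in range(start, len(seq)):
                item = seq[pos]
                if item not in seen:  # first occurrence keeps the longest suffix
                    seen.add(item)
                    extensions.setdefault(item, []).append((seq_idx, pos + 1))
        for item, next_projected in sorted(extensions.items()):
            support = len(next_projected)
            if support < min_support:
                continue
            pattern = prefix + [item]
            if len(pattern) >= min_len:
                results.append((support, pattern))
            if len(pattern) < max_len:
                mine(pattern, next_projected)

    mine([], [(i, 0) for i in range(len(sequences))])
    return results
```

As a usage example, for the three toy sequences `["S3", "S10", "S3"]`, `["S3", "S6", "S3"]`, and `["S10", "S3"]` with `min_support=2`, `min_len=2`, and `max_len=3`, the miner returns the patterns `["S10", "S3"]` and `["S3", "S3"]`, each supported by two sequences.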

| Visualization of TGSC
We constructed a Weibo user behavior cube related to COVID-19 based on the TGSC model proposed in this study. In spatial data gridding, the latitude and longitude coordinate (WGS84) range of China is (50.43, 5.43, 133.66, 75.99). The WGS84 coordinates were transformed into a Pseudo-Mercator projection. The projection coordinate ranges of the trajectory data $(X_{\max}, X_{\min}, Y_{\max}, Y_{\min})$ were approximately (6,521,079.996; 605,371.717; 14,878,963.139; 8,459,168.105). The north-south and east-west distances in China are 5200 and 5500 km, respectively. We set the grid cell width $d$ to 100 km. For convenience, the number of rows and columns in the regular grid is approximately 50 × 50. The spatial index ID of each behavior point corresponding to the grid cell was calculated to generate a trajectory data triplet. This study used Plotly (a Python graphing library; https://plotly.com/python/) to visualize the trajectory cube. As shown in Figure 8, the points in the cube represent user-behavior points, the colors of the points represent different topics, and the lines represent changes in user behavior. As shown in Figure 8b, the visualization results of Weibo users' behavioral trajectories during the COVID-19 pandemic showed a high-density columnar distribution. From the perspective of time semantics (Figure 8c), before January 19, the topic change was relatively sparse; after January 19, it gradually intensified. Topics 0-10 were distributed over the entire time range, whereas topics 11-13 were relatively sparse before February 10 and gradually became denser after February 10. Topics 14 and 15 were primarily in the later stages. The cube shows that the spatial distribution of user behavior did not change significantly with time and was mainly concentrated in the eastern region, whereas it was relatively sparse in the western region (Figure 8d). In the semantic-space distribution (Figure 8e), topics 15, 10, 6, 5, 4, and 3 were

TABLE 1 Example of a user topic sequence.

| Clustering of user behavior
Using the similarity measurement method described in Section 2.2.2, we built TGS similarity matrices using 9881 Weibo user trajectories during the COVID-19 pandemic. The three weights in Equation (14) represent the importance of temporal, spatial, and semantic features in calculating the similarity between two user behaviors. The higher the feature weight, the greater the impact on the similarity score (Furtado et al., 2016; Liu & Guo, 2020; Petry et al., 2019).
In this model, if a user's topic changes are more important, the weight of the semantic features can be set higher. If the most important feature is the spatial location of the user, the weight of the spatial feature can be increased. Because this study focuses more on the semantic attributes of user behavior than the spatiotemporal attributes, the weight of semantics is greater. The weights of Equation (14) were selected after comparing the clustering evaluation indicators of different semantic weights through experiments (Tables S1 and S2). The results indicate that spectral clustering outperforms the K-medoids and hierarchical clustering algorithms. Therefore, spectral clustering was used to complete the clustering experiments.
According to the statistics of the five cluster categories (Table 2), the number of user trajectories of each category accounted for 14.79, 15.65, 24.16, 19.65, and 25.75%, respectively. Categories 5 and 1 had the highest and lowest numbers of user trajectories, respectively. A visualization of the clustering results is shown in Figure 12. The time series of categories 1 and 2 show a similar overall trend, reaching a peak in the early stage and decreasing in the later stage (Figure 12a). The sequences of categories 4 and 5 peaked during the early and later fluctuations, respectively. Category 3 was unique and peaked in the medium term. Figure 12b shows the spatial distributions of the five cluster categories. The spatial distributions of the five categories were similar. A more detailed spatial analysis was conducted for the five cluster categories, as described in Section 3.2. Figure 12c shows the semantic distribution of topics in each category. Category 1 mainly focuses on S3, S10, and S1. Category 2 focuses on S4, S5, S1, and S15. Category 3 focuses on S6, S4, S3, and S10. Category 4 focuses on S10, S1, and S5. Category 5 focuses on S6, S10, and S5.

| Mining of semantic pattern of user behavior
In this study, the PrefixSpan algorithm was used to mine the frequent semantic patterns of each category. The minimum and maximum lengths of frequent sequences are the required parameters in PrefixSpan. As shown in Figure 3, the minimum trajectory length was 2 and the mean was 4.858. Further, 80% of the trajectories were less than 5 in length. Moreover, we discarded the minimum value to avoid the impact of extreme values. Therefore, we set the minimum and maximum sequence lengths to 3 and 5, respectively. The Python prefixspan (https://pypi.org/project/prefixspan/) library was used to mine the top 20 patterns. Figure 13 shows a visualization of the five semantic patterns. Figure 14 shows the spatial distribution of the five semantic patterns.
As shown in Table 3, the semantic pattern of category 1 mainly consisted of sequence d, and the numbers of sequences c, b, and a gradually decreased. The most common initial and final topic was "S3: criticizing bad habits," while the intermediate topics consisted mainly of "S10: staying at home and taking necessary precautions," "S3: criticizing bad habits," and "S6: spreading positivity and encouragement." Therefore, the semantic pattern of category 1 changed mainly between the three topics of "S10: staying at home and taking necessary precautions," "S3: criticizing bad habits," and "S6: spreading positivity and encouragement." Figure 14a shows the spatial heat map of category 1, which was scattered across Beijing, Tianjin, Wuhan, Chengdu, and Guangzhou.
The semantic change sequence of category 2 was dominated by sequence a, and the number of sequences b, c, and d decreased accordingly. The semantic pattern of category 2 can be summarized as changing between "S4: factual comment," "S5: taking scientific protective measures," and "S15: staying at home and taking necessary precautions." Figure 14b shows the spatial distribution of category 2, which formed hotspot centers in Wuhan, Beijing, and the Pearl River Delta. The semantic change pattern in category 3 was dominated by sequence b. The overall semantic pattern can be summarized as a transformation between "S6: spreading positivity and encouragement/S4: factual comment/S10: staying at home and taking necessary precautions/S3: criticizing bad habits" in the early stage, which finally converged to "S6: spreading positivity and encouragement/S4: factual comment/S3: criticizing bad habits." As shown in Figure 14c, the spatial distribution of category 3 is centered in Wuhan and Beijing.
The semantic pattern of category 4 can be summarized as changing from "S10: staying at home and taking necessary precautions" and "S5: taking scientific protective measures" to other topics in the earlier stage, and finally focusing on "S10: staying at home and taking necessary precautions," "S1: fear and worry," and "S5: taking scientific protective measures."

This study constructed a TGS-weighted similarity measurement algorithm to mine the behavior patterns of different groups through multidimensional integration. The developed analytical framework can be fully utilized for comprehensive mining and quantitative analysis of public behavior during disaster events. The proposed framework is universal and can be applied to data with similar structures in other disaster events (fields).
Weibo data related to the COVID-19 pandemic were comprehensively analyzed in terms of time, space, and semantics for the experimental verification of the developed analytical framework. The results show that public behavior during the COVID-19 pandemic is complex and changeable. We obtained five semantic patterns using clustering and frequent pattern mining. The semantic patterns of categories 1, 2, 4, and 5 were spindle-shaped; that is, the initial topic was the same as the end topic, with changing topics in the middle, and different categories had different initial and end aggregation topics. Category 1 was dominated by "S3: criticizing bad habits." Category 2 was dominated by "S4: factual comment." Category 4 was dominated by "S10: staying at home and taking necessary precautions." Category 5 was dominated by "S6: spreading positivity and encouragement." These observations indicate that, although the semantic patterns of the above four groups have undergone various changes, their core semantics are relatively stable and focus on a single topic. Category 3 was wave-shaped; that is, the initial, middle, and end topics were the same, changing, and fluctuating within a fixed number of topics, and the proportion of each topic varied during different periods. Obvious geographical differences were observed in the semantic changes of the users. In emergencies, this framework can provide decision support to urban managers regarding emergency response, public opinion monitoring, restoration, and reconstruction.
Furthermore, the interaction between cyberspace and geospatial information has emerged as a new field of geographic research in this information age (Gao et al., 2019).We demonstrated methods for the collection and integration of information on public behavior from cyberspace, the visualization of such information in geographic space, and intelligent analysis of the evolutionary pattern of public behavior by linking the characteristics of cyber and geographic spaces.This approach enables the quantitative analysis of public responses to disaster events and realizes a transformation from static to dynamic, two-dimensional to three-dimensional, and cyberspace to geographical space.Moreover, the framework proposed in this article extends the research on the multidimensional modeling and mining of geographic semantic data.The traditional spatiotemporal cube takes two-dimensional geographic coordinates as the X-Y axis and time as the Z-axis.Based on the traditional spatiotemporal cube, this research converts two-dimensional geographic coordinates into a one-dimensional spatial grid and fuses them with the topic and time information of user behavior to obtain the spatial-temporal-semantic triplet, which is mapped into the traditional spatiotemporal cube, and clearly and intuitively realizes the spatial-temporal-semantic integrated modeling of user behavior.
Nevertheless, this study had some limitations. First, the spatial scale (d = 100 km) used in the experiment was empirically defined, and the TGS cube was sensitive to it. Excessively large spatiotemporal units may lead to data mixing and information loss, and overly small spatiotemporal units may cause data sparsity and overfitting of patterns. Further research is required to investigate the impact of different spatiotemporal scales on the research results. By exploring the optimal spatiotemporal scale, we can improve the accuracy and effectiveness of the framework and gain a deeper understanding of the spatiotemporal dynamics of public behavior and semantic changes on social media. Second, the study focused on the spatiotemporal distribution of semantic changes and did not provide a detailed analysis of the underlying factors driving such changes. Furthermore, an investigation is needed to explore the effectiveness of emerging large language models (LLMs), such as bidirectional encoder representations from transformers (BERT) or generative pre-trained transformer (GPT), for topic modeling. Future studies should employ additional strategies to mitigate the representativeness and bias issues associated with Weibo data, such as incorporating data from multiple social media platforms and demographic information. Moreover, there is scope for exploring key parameter settings (e.g., the grid scale and similarity weight) and conducting sensitivity analyses to evaluate their impact on the topic-modeling process.

| CONCLUSIONS
We propose a framework for mining social media user behavior patterns by integrating spatial-temporal-semantic features to describe the public semantic behavior pattern during emergencies and verify the effec- events.The developed framework is portable and can be applied to data with similar structures during emergencies.In addition, the framework can provide comprehensive situational awareness of public behavior during disaster events to help decision-makers better understand public opinion during emergencies and develop effective and targeted urban emergency management strategies.

FIGURE Process framework for mining TGS-based user behavioral patterns.
… training samples. The trained RF model was then used to classify the entire Weibo dataset. The Weibo texts were divided into 15 topics, as shown in Figure 2.
Users may post multiple Weibo posts during a disaster. When the topic of a user's posts changes, the user's behavior is considered to change. By collecting all the behavioral changes in the user's timeline, we obtain the user's behavioral trajectory, which is composed of several behavioral points and is expressed as {<topic, longitude, latitude, time>, …, <topic, longitude, latitude, time>}, as shown in Definition 2.
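As a minimal sketch of this construction, the following code keeps one behavioral point per topic change in a user's time-ordered posts. The names `BehaviorPoint` and `build_trajectory` are hypothetical, and the rule that only a topic change (not a change of location) starts a new point is an assumption drawn from the description above rather than a confirmed detail of the implementation.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class BehaviorPoint:
    """One <topic, longitude, latitude, time> element of a trajectory."""
    topic: str
    longitude: float
    latitude: float
    time: float  # for example, a POSIX timestamp


def build_trajectory(posts: Iterable[BehaviorPoint]) -> List[BehaviorPoint]:
    """Order posts by time and keep a point whenever the topic changes."""
    trajectory: List[BehaviorPoint] = []
    for post in sorted(posts, key=lambda p: p.time):
        # Assumption: consecutive posts with the same topic belong to the
        # same behavior, so only the first post of each run is kept.
        if not trajectory or post.topic != trajectory[-1].topic:
            trajectory.append(post)
    return trajectory


demo_posts = [
    BehaviorPoint("S1", 114.3, 30.6, 3.0),
    BehaviorPoint("S1", 114.3, 30.6, 1.0),
    BehaviorPoint("S6", 114.3, 30.6, 2.0),
    BehaviorPoint("S1", 116.4, 39.9, 4.0),
]
print([p.topic for p in build_trajectory(demo_posts)])
```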
FIGURE 5 TGSC frame of the trajectory of public behavior.
Definition 3. (semantic sequence). A set of ordered semantic points $S_u = \{s_1, s_2, \ldots, s_n\}$, which represents the topic sequence in user $u$'s trajectory.
Definition 4. (spatial sequence). A set of ordered position points $G_u = \{g_1, g_2, \ldots, g_n\}$, which represents the sequence of the geographical position in user $u$'s trajectory.
Definition 5. (time series). A set of ordered time points $T_u = \{t_1, t_2, \ldots, t_n\}$, which represents the time series in user $u$'s trajectory.
Definition 6. (stay time). The residence time of a user $u$ with topic $s_i$ in position $g_i$ can be expressed …
Based on Definition 7, we acquired the trajectory points for each user behavior. The temporal, semantic, and spatial similarities between the trajectory points were calculated. Finally, a weighted sum was used to obtain the TGS similarity. Given that the TGS trajectories of users A and B are $TGS_A = \{a_1, a_2, \ldots, a_m\}$ and $TGS_B = \{b_1, b_2, \ldots, b_n\}$, the following aspects are defined.
FIGURE 6 Example diagram of behavioral trajectory cube.
2.3.1 | Time similarity
Let $a \in TGS_A$, $b \in TGS_B$, $a = (s_a, g_a, tin_a, tout_a)$, $b = (s_b, g_b, tin_b, tout_b)$, and $\mathrm{dist}(t_1, t_2) = |t_1 - t_2|$. The time-distance function between $a$ and $b$ is defined in Equation (6) as follows:
$$\mathrm{tdis}(a, b) = 1 - \frac{\mathrm{dist}(tout_a, tin_b)}{\mathrm{dist}\big(\max(tout_a, tout_b), \min(tin_a, tin_b)\big)} \tag{6}$$
Let $a \in TGS_A$, $b \in TGS_B$. The spatial distance function between $a$ and $b$, $\mathrm{gdis}(a, b)$, is given by Equation (9). Given a spatial threshold, the geographic similarity of $a$ and $b$ equals 1 when $\mathrm{gdis}(a, b)$ does not exceed the threshold and equals 0 otherwise (Equation 10).
Let $a \in TGS_A$, $b \in TGS_B$. The semantic similarity function between $a$ and $b$ is shown in Equation (12).
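The point-level measures can be sketched as follows. The time distance is a direct transcription of Equation (6), and the geographic similarity follows the threshold rule of Equation (10). Several pieces are assumptions, because the corresponding equations are not available in this text: the haversine formula stands in for the spatial distance of Equation (9), the zero-span fallback in `time_distance` is an arbitrary convention, and `TGSPoint`, `weighted_similarity`, and the weight arguments are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TGSPoint:
    topic: str
    lon: float
    lat: float
    tin: float   # time at which the user enters this state
    tout: float  # time at which the user leaves this state


def time_distance(a: TGSPoint, b: TGSPoint) -> float:
    """Equation (6): 1 - |tout_a - tin_b| / |max(tout) - min(tin)|."""
    span = abs(max(a.tout, b.tout) - min(a.tin, b.tin))
    if span == 0:
        # Assumption: Equation (6) is undefined for two identical instants;
        # 1.0 is returned here purely as a convention.
        return 1.0
    return 1.0 - abs(a.tout - b.tin) / span


def haversine_km(a: TGSPoint, b: TGSPoint) -> float:
    """Great-circle distance in km (assumed stand-in for Equation 9)."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def geographic_similarity(a: TGSPoint, b: TGSPoint, threshold_km: float) -> float:
    """Equation (10): 1 if the spatial distance is within the threshold, else 0."""
    return 1.0 if haversine_km(a, b) <= threshold_km else 0.0


def weighted_similarity(sim_t, sim_g, sim_s, w_t, w_g, w_s):
    """Weighted sum of the temporal, geographic, and semantic similarities."""
    return w_t * sim_t + w_g * sim_g + w_s * sim_s


wuhan = TGSPoint("S1", 114.3, 30.6, tin=0.0, tout=4.0)
beijing = TGSPoint("S1", 116.4, 39.9, tin=2.0, tout=6.0)
print(time_distance(wuhan, beijing), geographic_similarity(wuhan, beijing, 100.0))
```

Keeping the three component similarities separate from the weighted sum makes it straightforward to vary the weights in a sensitivity analysis.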

Definition 8. (topic sequence). A user's topic sequence consists of a group of elements arranged by time, $ST_u = \{s_1, \ldots, s_i, \ldots, s_n\}$, where $s_i$ represents the $i$th topic in the user's topic sequence.
FIGURE 7 Running time comparison of the SC, DBI, CH, and Dunn indicators.

Definition 9. (topic subsequence and super-sequence). For topic sequences $\alpha = \{a_1, a_2, \ldots, a_n\}$ and $\beta = \{b_1, b_2, \ldots, b_m\}$, if there exists a sequence of integers $1 \le j_1 < j_2 < \cdots < j_n \le m$ such that $a_1 \subseteq b_{j_1}, a_2 \subseteq b_{j_2}, \ldots, a_n \subseteq b_{j_n}$, then $\alpha$ is called a subsequence of $\beta$, and $\beta$ is a super-sequence of $\alpha$.
Definition 10. (support threshold). Consider a set of multiple-user topic sequences $ST = \{ST_u\}_{u=1}^{m}$, where $m$ represents the number of users in the topic sequence collection. If topic sequence $\alpha$ is a subsequence of sequences in $ST$, the support of $\alpha$ is the proportion of the number of occurrences of $\alpha$ to the total number of topic sequences in $ST$.
Definition 11. (frequent sequence). Given a minimum support threshold, if the support of topic subsequence $\alpha$ in topic sequence set $ST$ exceeds this threshold, $\alpha$ is called a frequent sequence of $ST$.
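Definitions 9 to 11 can be checked directly with a few lines of code. The sketch below is a brute-force illustration of the definitions, not the PrefixSpan algorithm itself. It assumes that every element of a sequence is a single topic label, so set inclusion reduces to equality, and the function names are hypothetical.

```python
from typing import List, Sequence


def is_subsequence(alpha: Sequence[str], beta: Sequence[str]) -> bool:
    """True if the topics of alpha appear in beta in the same order."""
    remaining = iter(beta)
    # `topic in remaining` advances the iterator past the match, so each
    # element of beta is used at most once and order is preserved.
    return all(topic in remaining for topic in alpha)


def support(alpha: Sequence[str], sequences: List[Sequence[str]]) -> float:
    """Share of user topic sequences that contain alpha as a subsequence."""
    if not sequences:
        return 0.0
    return sum(is_subsequence(alpha, s) for s in sequences) / len(sequences)


def is_frequent(alpha: Sequence[str], sequences: List[Sequence[str]],
                min_support: float) -> bool:
    """Definition 11: the support must exceed the minimum support threshold."""
    return support(alpha, sequences) > min_support


user_sequences = [
    ["S1", "S6", "S1"],
    ["S1", "S2"],
    ["S6", "S1"],
]
print(support(["S6", "S1"], user_sequences))
```

PrefixSpan reaches the same frequent sequences without enumerating every candidate, by recursively projecting the sequence set onto each frequent prefix.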

FIGURE 8 Visualization of Weibo user time-geographic-semantic cube related to the COVID-19 pandemic: (a) spatial distribution; (b) time-geographic-semantic cube; (c) observation from a temporal-semantic perspective; (d) observation from a temporal-spatial perspective; and (e) observation from a semantic-spatial perspective.
…) was set to 0.3, 0.3, and 0.4. We evaluated the effectiveness of the proposed TGS similarity measure by comparing it with EDR and LCSS using the DBI and CH indices. Three types of clustering methods were used in the experiment: K-medoids, spectral clustering, and hierarchical clustering. All these methods were implemented using scikit-learn in Python. We compared the results of single runs for numbers of clusters ranging from 2 to 10. The lower the DBI value and the higher the CH value, the better the clustering effect. As shown in Figures 9 and 10, the TGS similarity proposed in this study had a higher CH index value, a lower DBI index value, and thus a better clustering effect than the EDR and LCSS similarities across the clustering methods. The number of clusters corresponding to the lowest point of the DBI curve and the highest point of the CH curve was selected as optimal. Hence, the number of clusters was set to five. To validate the choice of spectral clustering, we compared it with the hierarchical clustering and K-medoids algorithms by calculating the DBI and CH indices based on the TGS similarity. As shown in Figure 11, …
FIGURE 9 DBI index of the proposed TGS compared with the LCSS and EDR methods under three algorithms: (a) spectral clustering; (b) K-medoids; and (c) hierarchical clustering.
FIGURE 10 CH index of the proposed TGS compared with the LCSS and EDR methods under three algorithms: (a) spectral clustering; (b) K-medoids; and (c) hierarchical clustering.
FIGURE 11 Comparison of three clustering algorithms based on TGS similarity: (a) DBI index; and (b) CH index.
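The selection procedure described above, which scans cluster counts from 2 to 10 and scores each result with the DBI and CH indices, can be sketched with scikit-learn. This is an illustration under stated assumptions rather than the experimental code: synthetic two-dimensional points stand in for user trajectories, an RBF kernel stands in for the pairwise TGS similarity matrix, and both indices are computed on the synthetic feature vectors because the text does not state how they were derived from trajectory similarities. The helper name `scan_cluster_counts` is hypothetical.

```python
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import calinski_harabasz_score, davies_bouldin_score
from sklearn.metrics.pairwise import rbf_kernel

# Assumption: synthetic points replace the real trajectory data.
features, _ = make_blobs(n_samples=150, centers=5, cluster_std=0.6, random_state=0)

# Assumption: an RBF kernel replaces the pairwise TGS similarity matrix.
similarity = rbf_kernel(features, gamma=0.05)


def scan_cluster_counts(similarity, features, k_values):
    """Return {k: (DBI, CH)} for spectral clustering on a precomputed similarity."""
    scores = {}
    for k in k_values:
        model = SpectralClustering(n_clusters=k, affinity="precomputed", random_state=0)
        labels = model.fit_predict(similarity)
        scores[k] = (
            davies_bouldin_score(features, labels),     # lower is better
            calinski_harabasz_score(features, labels),  # higher is better
        )
    return scores


scores = scan_cluster_counts(similarity, features, range(2, 11))
best_by_dbi = min(scores, key=lambda k: scores[k][0])
best_by_ch = max(scores, key=lambda k: scores[k][1])
print(best_by_dbi, best_by_ch)
```

Passing `affinity="precomputed"` is what allows a custom similarity such as the TGS measure to drive spectral clustering without an explicit feature space.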

FIGURE Visualization of clustering results of the behavior of Weibo users related to the COVID-19 pandemic: (a) time series; (b) spatial distribution; and (c) semantic distribution.
… scientific protective measures." Category 4 was mainly distributed in Beijing, Zhengzhou, Wuhan, Xi'an, Chengdu, the Yangtze River Delta, and the Pearl River Delta (Figure 14d). The semantic pattern of category 5 can be summarized as changes from "S6: spreading positivity and encouragement" to other topics in the early stage and, finally, a return to "S6: spreading positivity and encouragement." Figure 14e shows the spatial distribution of category 5, with hotspots mainly distributed in Wuhan, Beijing, Shanghai, Xi'an, Chengdu, and the Pearl River Delta.

Future research should investigate the socio-geographical factors that influence public behavior patterns and semantic changes, such as cultural norms, political ideologies, and economic conditions. By revealing the influence mechanisms of these factors, we can better understand the dynamics of public opinion and behavior on social media and develop more effective strategies for crisis management and public communication. Third, our study assumes that each Weibo post contains only one topic, whereas in reality, posts can encompass multiple themes. Future research should explore methodologies to analyze multiple topics within a single post, providing a more comprehensive representation of the diverse perspectives on Weibo. Finally, the framework considers three features (time, space, and semantics) of social media data. Other dimensions of social media data, such as user demographics, network structure, and social connections, may also play important roles in mining public behavior and semantic changes. The possibility of incorporating these additional features into the framework can be explored to improve its accuracy and usefulness.
tiveness and applicability of this framework based on Weibo data during the COVID-19 pandemic.First, the TGSC model provides a detailed definition of the spatiotemporal semantic relationships of social media user behavior, transforms user behavior data into spatiotemporal semantic trajectories, and visualizes the TGSC of user behavior.The spectral clustering algorithm was then used to cluster Weibo user behaviors during the COVID-19 pandemic into five categories based on the constructed TGS-weighted similarity measures.The results of the DBI and CH indices show that the clustering effect of the TGS similarity measure proposed in this study is better than that of the commonly used EDR and LCSS similarities.Finally, five categories of semantic patterns were mined using the PrefixSpan algorithm, and the spatial distribution characteristics of each pattern were analyzed.The experimental results indicate that the TGS-based framework proposed in this study can effectively realize the quantitative analysis and comprehensive mining of public behavior during disaster Results of clustering of Weibo user trajectories related to the COVID-19 pandemic based on TSG similarity.
TABLE 2