The Relationship Between Cyberbalkanization and Opinion Polarization: Time-Series Analysis on Facebook Pages and Opinion Polls During the Hong Kong Occupy Movement and the Associated Debate on Political Reform


  • Editorial Record: First manuscript received on November 19, 2015. Revisions received on May 22, 2016, September 12, 2016, December 16, 2016 and May 13, 2017. Accepted by Noshir Contractor on June 25, 2017. Final manuscript received on July 3, 2017.


Online activity is often cyberbalkanized, but it remains unclear whether this phenomenon leads to polarization of public opinion or whether the relationship works in the reverse direction. This study tested the temporal association between cyberbalkanization and opinion polarization during the debate on political reform in Hong Kong. Online communities were identified from a post-sharing network of 1,644 Facebook pages (101,410 shares); the differences between intra- and inter-community shares were derived, and a cyberbalkanization index was computed. A time-series analysis showed that the index temporally preceded opinion polarization, measured as the proportion of opinion poll respondents who gave extreme ratings to government leaders, but not vice versa. The index was particularly predictive of polarization among youth.

During its inception, the Internet was commonly conceived as a virtual venue for pluralistic and rational communication (Fuchs, 2014) or deliberation and collaboration (Dahlberg, 2001). But such an ideal expectation of an online public sphere does not often materialize. Recent research suggests a rather balkanized form of public discussion on the Internet.

The term cyberbalkanization2 was first coined in Van Alstyne and Brynjolfsson's early article (1996) to describe the information technology-driven division of virtual space into special interest groups. It was later placed in a political context as an online phenomenon in which “people seek out only like-minded others and thereby close themselves off from ideological opposition, alternative understandings, and uncomfortable discussions” (Brainard, 2009, p. 598). Sunstein (2008, p. 94) also describes a similar online phenomenon as a group of bloggers living “in echo chambers of their own design” or “in information cocoons.”

One typical example of cyberbalkanization occurs in the online political debate among English-speaking Americans during presidential elections. Previous results indicate that online discussions follow a bipartisan pattern, within which ideologically compatible online users tend to cite and mention each other more frequently (Adamic & Glance, 2005; Conover et al., 2011)3. However, evidence also suggests that the segregation of communication according to political ideology on social media is largely issue-dependent, showing that such segregation was seen more often in the discussions of political issues, such as the 2012 presidential debates, but less profound in the public's exchanges on other issues, for example the 2013 Boston Marathon Bombing (Barberá et al., 2015).

Cyberbalkanization and Opinion Polarization

Even though cyberbalkanization has been described as a pervasive online phenomenon, whether or not it actually results in polarization of people's opinions in real life is subject to further investigation. Opinion polarization as a state refers to “the extent to which opinions on an issue are opposed in relation to some theoretical maximum,” and polarization as a process “refers to the increase in such opposition over time” (DiMaggio, Evans, & Bryson, 1996, p. 693). It has been a global concern.4

At the core of our theoretical concern, the central question of this study is to examine whether or not, and to what extent, cyberbalkanization on social media reflects or contributes to the polarization of the public's views. Sunstein (2009) theorizes the process of selective exposure and polarization of political views, suggesting that Internet communication can increase political polarization because like-minded people tend to discuss political issues with each other, and consequently they end up reinforcing each other, leaving them holding more extreme or more polarized views than they had before (Sunstein, 2009). Sunstein's hypothesis is supported only, however, with “unnatural” experimental results in which “Internet-like” face-to-face interactions with like-minded individuals drove subjects to adopt more extreme views. On the other hand, Farrell argues that even though research clearly supports the case that the Internet can bring like-minded people together, there is insufficient evidence to substantiate Sunstein's hypothesis that cyberbalkanization causes opinion polarization (Farrell, 2012). Specifically, Farrell notes that the causal mechanism of cyberbalkanization and opinion polarization remains uncertain.

Sunstein (2008) lists three possible reasons why cyberbalkanization causes polarization, namely selective exposure, social comparison, and social corroboration. However, his explanations consider only online information seekers (readers). Lawrence et al. (2010) point out the differences in the roles played by readers and authors on social media, and this distinction must be considered in studying the process of cyberbalkanization.

Information Seeker's Information Bias

Cyberbalkanization is thought of as a mechanism through which an individual's preference for certain information sources reinforces that individual's skewed opinion, i.e., a user voluntarily selects like-minded peers for interactions and filters out, consciously or unconsciously, less-preferred contacts, which is an innate aspect of human communication long documented in the literature on selective exposure (Zillmann & Bryant, 1985). Selective exposure in human communication represents an individual's tendency to favor information that reinforces pre-existing views and to filter out contradictory content (Klapper, 1960). The process is grounded in the theory of cognitive dissonance (Festinger, 1957), which posits that individuals prefer cognitive consistency and avoid information that is likely to induce discomfort.

Indeed, selective exposure to information sources seems to play a key role in polarizing the public (Hollander, 2008). Selective exposure to traditional media sources (such as television viewing) has long been seen as a factor that can drive public opinion sectors apart (Iyengar & Hahn, 2009). Even though selective exposure to blogs is found to be pervasive among Internet users (Lawrence, Sides, & Farrell, 2010), the ease of receiving information via the Internet can also facilitate online exposure to opposite viewpoints (Garrett, 2009). Nonetheless, evidence for a causal relationship between exposure to partisan information and change in political attitude or behavior is equivocal (Prior, 2013).

Another reason is social comparison. Individuals in a group opt for adjustment of opinions towards the perceived norm to be perceived well by their fellow group members (Stroud, 2010) and this shift consequently generates opinion polarization. As the third explanation suggested by Sunstein (2008), social corroboration conveys that when an individual's opinion is reconfirmed by the members in a group setting, one gains social acceptance, becomes more confident and thus extreme in belief (Baron et al., 1996).

Some scholars further argue that in the online environment, algorithmic personalization of Internet experience, for example Google's personalized search or Facebook's news feed, might promote exposure to biased information because less preferred information is algorithmically eliminated, a pattern known as the “filter bubble” (Pariser, 2012). However, in a study supported by Facebook (Bakshy, Messing, & Adamic, 2015) and a few independent works (e.g., Flaxman, Goel, & Rao, 2016), the effect of algorithmic personalization on the diversity of information consumption appears modest. Humans' voluntary selective exposure seems to play a stronger role than the “filter bubble” in promoting cyberbalkanization (Bakshy, Messing, & Adamic, 2015).

Information Source's Selective Sharing

The subject of our study is the Facebook page, not the individual Facebook user. Like open Twitter accounts or blogs, Facebook pages act like public media that are designed for one-to-many communication and function as information producers and curators. Apart from publishing original messages to their readers, Facebook pages can also rebroadcast information from other Facebook pages that share common interests or political views while ignoring the information from pages with differing opinions. This selective sharing of like-minded information on Facebook pages can overemphasize one-sided arguments and effectively downplay counterarguments to their readers. Based on persuasive arguments theory, sharing of new arguments can induce attitude change, particularly in the case of a restricted “argument pool,” e.g., abundance of similar arguments and high perceived persuasiveness of arguments (Vinokur & Burstein, 1974). Hence, Facebook pages' selective sharing of posts can reduce the diversity of perspectives presented to their readers, i.e., a limited “argument pool.” This restriction of the “argument pool” has been found to promote polarization within groups (Hamlett & Cobb, 2006). Sunstein (2000) argues that a limited diversity in perspectives can also promote online enclave deliberation and make reaching consensus difficult. Such a persuasion-based explanation can explain how ideologically slanted news outlets drive polarization (Prior, 2013), even though exposure to slanted news can also drive opinion polarization through non-persuasive routes such as reinforcement of group identity and promotion of motivational reasoning (Prior, 2013).

Empirical studies find that frequency of information shares between like-minded information seekers/producers is associated with online polarization (Conover et al., 2011; Gruzd & Roy, 2014). In particular, Conover et al. (2011) reveal that the network of content sharing exhibits ideologically segregated community structure, but the network of mentioning does not. These studies suggest that the difference in finding between sharing networks and mention networks comes from politically motivated users' ability to insert partisan content into the timeline of users with opposing viewpoint by mentioning their name. Sharing content cannot function as such and therefore it is mostly an act of endorsement.

Early social psychology literature has suggested that effective persuasive communication depends on the source, the message, and the audience, each of which plays an important part (Hovland, Janis, & Kelley, 1953). In this study, we argue that both Facebook's information seekers and Facebook's information producers are formative parts of the mechanism of opinion polarization but they have not been holistically studied. A study has indicated that overlapping ‘audienceship’ is much higher between those Facebook Pages who share with each other more frequently, suggesting that the cyberbalkanization of information producers and that of information seekers are closely related to each other (Chan & Fu, 2017). The conceptual diagram of the holistic integration of the two processes is presented in Figure 1.

Figure 1.

Conceptual Diagram showing the interactions between information producers and information seekers in the process of cyberbalkanization-induced opinion polarization.


In this study, we aim at showing the relationship between cyberbalkanization of information producers and offline opinion polarization. In the subsequent paragraph, cyberbalkanization refers exclusively to cyberbalkanization of information producers. This area has not been studied extensively. Previously, a few survey studies examined the correlations between a set of social network characteristics, including media consumption habits and political attitudes of online information seekers (Dvir-Gvirsman, 2016; Huckfeldt & Sprague, 1987; Lee, 2016; Mutz, 2006). These survey data relied primarily on individually self-reported information. However, social network information derived from survey methodology based on self-reporting suffers from both random and nonrandom measurement errors5.

The availability of social media data can partly resolve the above measurement problems because they provide a relatively complete digital trace of a user's online activities. However, social media data also come with the problem of representativeness, as the population of social media users is not a statistically representative sample of the target population (Tufekci, 2014) and therefore we cannot infer individual political behavior based solely on social media data (Freelon, 2014; Jungherr, 2015). As the current study focuses only on a very specific target population of Facebook pages, it is appropriate to use social media data to study the cyberbalkanization of those pages.

We expected that the indicator of cyberbalkanization (like-minded sharing) is related to opinion polarization but the effect on the public has not been studied. Empirical evidence from our previous pilot study of three months of Facebook data (Chan & Fu, 2015) has shown that the quantity of sharing between like-minded Facebook Pages is a leading indicator of offline opinion polarization. In the current study, we seek to test the same hypothesis with data collected over a longer period (12 months). Therefore, this study seeks to test the following hypothesis.

H1: Cyberbalkanization is positively correlated with opinion polarization

Although some studies have indicated that the online sphere is largely balkanized, many scholars still agree that different political views can still be channeled to individuals via an online social network's strong ties (mostly like-minded close friends and family) as well as weak ties (mostly acquaintances as sources of novel and non-redundant information). The definition of cyberbalkanization only considers the communications between “like-minded others” and therefore only considers strong-tie connections. However, recent studies reveal that social media not only bring like-minded individuals together but also facilitate broader exposure to alternative views, usually coming from weak connections (Grabowicz et al., 2012; Bakshy et al., 2012). Barberá (2014) also suggests that weak ties can moderate political extremism and reduce opinion polarization based on results from mixed methods of survey study and web trace analysis. He proposes two possible mechanisms for the mediation of political extremism by information sourced to weak ties in the social network: 1) “greater awareness of rationales for opposing views” (Mutz, 2006, p. 69) and 2) triggering affection to acknowledge that there are people inside one's social networks who hold opposite views (Iyengar, Sood, & Lelkes, 2012).

With this background, this study proposed a new conceptualization of cyberbalkanization which isolates the contributions of strong- and weak-tie connections. Based on previous studies, we hypothesized that information from strong-tie connections largely reinforces pre-existing views, resulting in more extreme views, but the information from weak-tie connections tends to ease polarization because the information flow facilitates broader exposure to alternative views. The frequency of sharing determines the tie strength. This conceptualization is not optimal: It is possible to have social media users who rarely share with each other on the social network but still share with those who have the same political stance. Nonetheless, our approach isolates the relative contributions of strong-tie connections and weak-tie connections, and the combination of strong- and weak-tie connections is expected to have a stronger correlation with opinion polarization than strong-tie connections alone. Thus, we derive the second hypothesis as follows:

H2: Cyberbalkanization among both weak-ties and strong-ties connections can explain the correlation between cyberbalkanization and opinion polarization better than that found among strong-ties connections alone.

Sunstein's hypothesis suggests that selective exposure to online media content can lead to opinion polarization. One essential criterion for such a causal claim is to establish the temporal precedence of events (Hill, 1965), establishing that an increase or decrease in cyberbalkanization temporally precedes an increase or decrease in opinion polarization. Even though a statistical correlation between the level of cyberbalkanization and opinion polarization might have been established, an alternative explanation for such correlation might be that the degree of cyberbalkanization is simply an online manifestation of offline opinion polarization. In the current study, we intend to show that the relationship between cyberbalkanization and opinion polarization can either be 1) unidirectional, i.e., only cyberbalkanization can lead to future opinion polarization; or 2) bidirectional, with only the relationship between cyberbalkanization and future opinion polarization statistically and practically significant. We must emphasize that, per Hill's criteria of causation (Hill, 1965), establishing a temporal precedence relationship between two events is only one among many essential conditions for causality. It is not our intention to claim causality in this study.

H3: Changes in cyberbalkanization temporally lead to a change in offline opinion polarization.

Nonetheless, no empirical study has so far examined the above three hypotheses with empirical social media data nor investigated the changes in degree of cyberbalkanization, i.e. the extent of strong or weak ties, with respect to the process of opinion polarization, in a longitudinal research setting. This research approach is missing not only because it is difficult to measure cyberbalkanization, which poses a methodological challenge to extract online social network interactions between users in a specific polity, but also because there is no available method to quantify and operationalize the level of cyberbalkanization inside a single communication network that can be effectively tracked across time. Previous approaches to measuring selective exposure (e.g., Clay, Barber, & Shook, 2013) might be adopted to track cyberbalkanization but the main methodological challenge is the feasibility of classifying “pre-existing” stance of the user, which is essential for determining whether communications are among like-minded others or not. The approach using human classification is often labor-intensive. Following another study of selective exposure on Twitter (Himelboim, Smith, & Shneiderman, 2013), we developed a social network analysis methodology to assign a possible stance of users automatically based on the content source they shared.


Research setting: 2014 Hong Kong Occupy Movement

In March 2012, Hong Kong's city leader, the Chief Executive Chun-Ying Leung, was elected with 689 votes out of 1,200 Selection Committee members, the majority of whom “could be easily manipulated” by the Chinese government (Chan, 2014, p. 574). Although the Hong Kong Basic Law, the constitutional document of Hong Kong, states that the Chief Executive will be ultimately selected “by universal suffrage upon nomination by a broadly representative nominating committee in accordance with democratic procedures” (The People's Republic of China, 1991), this legally binding article has yet to be implemented. In 2007, the Beijing Government announced that the Hong Kong Chief Executive may be selected by universal suffrage in 2017. But in 2014, when Hong Kong citizens were being consulted on reform of the electoral system, the Standing Committee of the National People's Congress (NPCSC) on 31 August established a set of nomination procedures that would eliminate candidates whom the Beijing government disliked. The 2014 Hong Kong Occupy Movement was a collective action of a large group of citizens who were mobilized to protest against the decision of the NPCSC and the Hong Kong Government. The community was dichotomized into two camps that were symbolized by colored ribbons: Yellow ribbons supported the Occupy Movement and blue ribbons supported the police and the authorities in their crackdown on the social movement6.

Hong Kong is among the places with the highest Facebook penetration rate in the world, with over 60.1% of the population using Facebook at least once a month7. During the Occupy movement, Hong Kong's social media, especially Facebook, played a key role in mobilizing citizens to join the protest and in disseminating updates, especially among the youth (Wong & Chan, 2015). In another survey8, 19.4% of respondents said their main source of information about the Occupy Movement in Hong Kong was Facebook. A survey study during the Hong Kong Occupy Movement reveals that political communication employing social media was associated with extreme attitudes toward the political situation in Hong Kong (Lee, 2016).

Data collection and Social Network Analysis

The unit of analysis in the current study is a Facebook page9. We selected five Facebook pages—namely scholarism, supporthktv, passiontimes, salutetohkpolice, and supportnationaleducation—to begin snowball sampling based on their indications of explicit support or opposition to the movement. Using these elements as the initial set of pages, we collected additional Facebook pages by tracing the links shared on the included Facebook pages. The sample inclusion criteria were that the pages originated from Hong Kong or that most of the posted messages were about Hong Kong. The first author (CHC) manually confirmed the inclusion of pages.
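The snowball procedure described above can be sketched as a breadth-first expansion from the seed pages. This is only an illustration: the function name `snowball_sample`, the `shared_from` mapping, and the `is_eligible` callback (standing in for the manual inclusion check) are hypothetical, and the choice not to expand pages that fail the inclusion check is an assumption rather than a documented detail of the study.

```python
from collections import deque


def snowball_sample(seeds, shared_from, is_eligible):
    """Breadth-first expansion from seed pages along observed sharing links.

    seeds       -- initial page identifiers
    shared_from -- hypothetical mapping: page -> pages whose posts it shared
    is_eligible -- stand-in for the manual inclusion check (Hong Kong pages)
    """
    included = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        if not is_eligible(page):
            continue  # assumption: excluded pages are not expanded further
        included.append(page)
        for source in shared_from.get(page, []):
            if source not in seen:
                seen.add(source)
                queue.append(source)
    return included


# Toy data for illustration only; these links are invented.
toy_links = {
    "scholarism": ["passiontimes", "foreign_news"],
    "passiontimes": ["scholarism", "local_blog"],
    "local_blog": [],
    "foreign_news": ["other_foreign"],
}
hk_pages = {"scholarism", "passiontimes", "local_blog"}
sample = snowball_sample(["scholarism"], toy_links, lambda p: p in hk_pages)
```

In this toy run, `foreign_news` is discovered but fails the eligibility check, so neither it nor the page it links to enters the sample.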

In total, we included 2,983 pages. All sampled pages' publicly available posts published between July 1, 2014, and June 30, 2015 were retrieved using the Facebook Graph API, and in turn the data constituted a Facebook post-sharing network. The term “sharing” in this study is the same as “retweeting” in the parlance of Twitter: re-posting someone else's message10. It represents a page owner's click on the “Share” button underneath a post to “send this [post] to friends or post it on your timeline.” A Python program was developed by the first author to scan the timeline of each included page for all posts shared from other pages during the study period. In the post-sharing network, a node represents an individual Facebook page and an edge denotes a sharing relationship between pages such that an edge's weight is the total number of shares between two pages (nodes). A directed edge is used to denote shares, and the direction of the edge is modeled as the flow of information between pages, i.e., A -8-> B means eight posts of Page A are shared by Page B.
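As a minimal sketch of the network representation just described, the weighted directed edges can be accumulated from a list of share records. The record format `(source_page, sharing_page)` and the function name are hypothetical, and dropping self-shares is an assumption made here for simplicity.

```python
from collections import Counter


def build_sharing_network(share_records):
    """Return {(source_page, sharing_page): number_of_shares}.

    Each record means that `sharing_page` re-posted one post of `source_page`,
    so the edge direction follows the flow of information.
    """
    edges = Counter()
    for source_page, sharing_page in share_records:
        if source_page != sharing_page:  # assumption: self-shares are ignored
            edges[(source_page, sharing_page)] += 1
    return edges


# Eight posts of page A shared by page B, i.e. A -8-> B.
records = [("A", "B")] * 8 + [("B", "C"), ("B", "C"), ("C", "C")]
network = build_sharing_network(records)
```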

In order to study the temporal association between cyberbalkanization and opinion polarization, daily time series of the degree of cyberbalkanization and of opinion polarization were derived.

Time series of cyberbalkanization

In order to test our H2, we developed our own index of cyberbalkanization11. In this study, the daily degree of cyberbalkanization was quantified based on the relative proportion of “strong-tie sharing” and “weak-tie sharing” within the sharing network. To define page sharing within the same or between different communities, the operationalization is based on the behavioral process of cyberbalkanization in which people “seek out only like-minded others.” Strong-tie sharing is measured by the sharing count among pages belonging to the same community, and weak-tie sharing by the sharing count between pages of different communities. Community structure in a network is defined as groups of nodes in which each group has a high intragroup concentration of edges and a low intergroup concentration of edges. Similar to cluster analysis as used in social science, community detection algorithms are a set of unsupervised learning algorithms developed to assign community membership to each node of a network based on the concentration of edges between nodes (Fortunato, 2010; Girvan & Newman, 2002). In the current study, Facebook pages that frequently shared posts with each other tended to belong to the same community. Community membership of each Facebook page in the current study was determined by the Walktrap community detection algorithm (Pons & Latapy, 2006). The Walktrap community detection method is implemented by modelling a random flow of information within the network, in which pages belonging to the same community are more likely to connect frequently with each other.
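To make the random-walk intuition concrete, the sketch below computes a random-walk distance between two nodes of the kind used by Walktrap: two nodes are close when short random walks started from each of them end up in similar places. It is not the full algorithm (the agglomerative merging of communities is omitted), and the toy graph is unweighted and undirected, which is a simplifying assumption that does not mirror the weighted, directed sharing network of the study. All function names are illustrative.

```python
import math


def transition_matrix(adjacency):
    """Row-normalize an adjacency matrix into random-walk probabilities."""
    return [[a / sum(row) for a in row] for row in adjacency]


def walk_distribution(P, start, steps):
    """Probability of being at each node after `steps` steps from `start`."""
    n = len(P)
    dist = [0.0] * n
    dist[start] = 1.0
    for _ in range(steps):
        dist = [sum(dist[k] * P[k][j] for k in range(n)) for j in range(n)]
    return dist


def walk_distance(adjacency, i, j, steps=3):
    """Degree-scaled Euclidean distance between two walk distributions."""
    P = transition_matrix(adjacency)
    degree = [sum(row) for row in adjacency]
    pi = walk_distribution(P, i, steps)
    pj = walk_distribution(P, j, steps)
    return math.sqrt(sum((a - b) ** 2 / d for a, b, d in zip(pi, pj, degree)))


# Toy graph: two triangles (0-1-2 and 3-4-5) joined by the bridge 2-3.
A = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
within = walk_distance(A, 0, 1)  # two nodes in the same triangle
across = walk_distance(A, 0, 5)  # two nodes in different triangles
```

Nodes in the same triangle come out much closer than nodes on opposite sides of the bridge, which is the property a distance-based community detection method exploits when grouping nodes.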

Only large communities, i.e., those with at least 30 Facebook pages, were included in the analysis. This decision was built upon an assumption that the pages of large communities were more likely to demonstrate the behavioral trait of “people seeking out only like-minded others.” The daily counts of within-community sharing (strong ties, Sst) and between-community sharing (weak ties, Swt) were calculated. The daily degree of cyberbalkanization, also called the Cyberbalkanization Index (CBI), was quantified using the following three operational indicators12:

\[
\mathrm{CBI}_{rate} = \frac{S_{st}}{S_{st} + S_{wt}}, \qquad
\mathrm{CBI}_{diff} = S_{st} - S_{wt}, \qquad
\mathrm{CBI}_{raw} = S_{st}
\]
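For illustration, the three indices can be computed for a single day from that day's shares and a community assignment. The definitions follow the verbal descriptions in the paper (ratio of strong-tie sharing to total sharing, difference of strong-tie and weak-tie sharing, and strong-tie sharing only); treating "total sharing" as the sum of strong-tie and weak-tie shares, and skipping shares that involve pages outside the retained communities, are assumptions of this sketch. The function and variable names are hypothetical.

```python
def cyberbalkanization_indices(shares, community):
    """Compute CBI_rate, CBI_diff and CBI_raw for one day.

    shares    -- iterable of (source_page, sharing_page) for that day
    community -- mapping: page -> community label (large communities only)
    """
    s_st = 0  # within-community (strong-tie) shares
    s_wt = 0  # between-community (weak-tie) shares
    for source, sharer in shares:
        if source not in community or sharer not in community:
            continue  # assumption: pages outside large communities are skipped
        if community[source] == community[sharer]:
            s_st += 1
        else:
            s_wt += 1
    total = s_st + s_wt
    return {
        "CBI_rate": s_st / total if total else None,  # undefined on empty days
        "CBI_diff": s_st - s_wt,
        "CBI_raw": s_st,
    }


toy_community = {"A": 1, "B": 1, "C": 2, "D": 2}
toy_shares = [("A", "B"), ("B", "A"), ("C", "D"), ("A", "C"), ("E", "A")]
indices = cyberbalkanization_indices(toy_shares, toy_community)
```

Here three shares stay within a community and one crosses communities, so the rate is 0.75, the difference is 2, and the raw count is 3.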

Time series of opinion polarization

The opinion polarization data were obtained from a secondary data source, the University of Hong Kong's Public Opinion Programme (HKUPOP)13. The data were collected by the phone survey method and were used to derive an index of opinion polarization. HKUPOP is an independent pollster that announces approval ratings for the Chief Executive of Hong Kong on a monthly basis and makes the raw data available online. One question item was selected to represent opinion polarization, asking the respondents to rate the extent of their support for the Chief Executive on a 0-to-100 scale, where 0 indicates absolutely no support and 100 indicates absolute support. For each wave of the survey, a Political Polarization Index (PolPolI) is defined as the proportion of respondents giving extreme ratings, i.e., below 2.5 or above 97.5 marks, adjusted for the population's age-gender distribution.
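A minimal sketch of the index for one survey wave follows. The cutoffs of 2.5 and 97.5 come from the definition above; the optional `weights` argument is a hypothetical stand-in for the age-gender adjustment, whose exact weighting scheme is not described here.

```python
def polarization_index(ratings, weights=None, low=2.5, high=97.5):
    """Weighted proportion of respondents giving extreme ratings.

    ratings -- 0-to-100 support ratings for one survey wave
    weights -- hypothetical per-respondent weights (defaults to equal weights)
    """
    if weights is None:
        weights = [1.0] * len(ratings)
    total = sum(weights)
    extreme = sum(w for r, w in zip(ratings, weights) if r < low or r > high)
    return extreme / total


toy_ratings = [0, 0, 50, 60, 100, 45, 30, 0]
unweighted = polarization_index(toy_ratings)
weighted = polarization_index(toy_ratings, weights=[1, 1, 2, 2, 1, 1, 1, 1])
```

Four of the eight toy respondents give a rating of 0 or 100, so the unweighted index is 0.5; up-weighting two moderate respondents lowers it to 0.4.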

In order to test the hypothesis about opinion polarization and cyberbalkanization among youth, another PolPolI, namely PolPolIyouth, was also derived with only data for respondents aged 49 years or less. The PolPolI and PolPolIyouth of each wave of the survey in the study period were interpolated and resulted in two sets of daily time-series data.
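The interpolation step can be sketched as follows. The interpolation method is not specified in the text, so linear interpolation between consecutive waves is an assumption here, as is the use of a single representative date per wave (each wave actually spans several polling days). The function name and the toy values are illustrative only.

```python
from datetime import date, timedelta


def interpolate_daily(waves):
    """Linearly interpolate wave-level values into a daily series.

    waves -- list of (date, value) pairs sorted by date
    Returns a dict mapping each day from the first to the last wave to a value.
    """
    daily = {}
    for (d0, v0), (d1, v1) in zip(waves, waves[1:]):
        span = (d1 - d0).days
        for offset in range(span):
            daily[d0 + timedelta(days=offset)] = v0 + (v1 - v0) * offset / span
    last_date, last_value = waves[-1]
    daily[last_date] = last_value
    return daily


# Invented wave values for illustration.
toy_waves = [
    (date(2014, 7, 1), 0.20),
    (date(2014, 7, 11), 0.30),
    (date(2014, 7, 21), 0.25),
]
daily_series = interpolate_daily(toy_waves)
```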

Time-series analysis

In the tests for Hypotheses 1 and 2, lead-lag associations between CBIs and PolPolIs were examined by using the cross-correlation function (CCF) approach (Cryer & Chan, 2008). The CCF approach has been used to analyze temporal relationships between online sentiment and public opinion (Fu & Chan, 2013). The CCF of two time series is the correlation between a lagged version of one series and the other series, expressed as a function of the time lag between the two. Since autocorrelation within a time series confounds the true value of the cross correlation in the CCF, both time series were adjusted for autocorrelation before conducting the CCF analysis, a step known as prewhitening. Each pair of the three CBIs and the two PolPolIs was “prewhitened” (Cryer & Chan, 2008), which results in white-noise time series, i.e., the current values of a time series have no relationship with past values14. The CCF between the two prewhitened time series of CBI and PolPolI was then calculated and plotted. A CBI and a PolPolI were concluded to be significantly associated if the cross correlation was significant at the 5 percent level at any lag or lead within the range of −30 to +30 days.
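The prewhitening and CCF steps can be illustrated with a small self-contained sketch. It makes several simplifying assumptions that are not taken from the paper: a first-order autoregressive (AR(1)) filter fitted by least squares stands in for the fitted time-series model, the data are synthetic, and all function names are hypothetical. The lag convention follows the one used in the figure notes: the value at lag k estimates the correlation between x at time t+k and y at time t, so a peak at a negative lag means that x leads y.

```python
import math
import random


def center(x):
    m = sum(x) / len(x)
    return [v - m for v in x]


def fit_ar1(x):
    """Least-squares AR(1) coefficient of a mean-centered series."""
    xc = center(x)
    num = sum(xc[t] * xc[t - 1] for t in range(1, len(xc)))
    den = sum(v * v for v in xc[:-1])
    return num / den


def ar1_filter(x, phi):
    """Residuals after removing first-order autocorrelation with `phi`."""
    xc = center(x)
    return [xc[t] - phi * xc[t - 1] for t in range(1, len(xc))]


def ccf(x, y, max_lag):
    """Sample cross-correlation; lag k pairs x[t + k] with y[t]."""
    n = len(x)
    xc, yc = center(x), center(y)
    scale = math.sqrt(sum(v * v for v in xc) * sum(v * v for v in yc))
    result = {}
    for k in range(-max_lag, max_lag + 1):
        pairs = ((xc[t + k], yc[t]) for t in range(n) if 0 <= t + k < n)
        result[k] = sum(a * b for a, b in pairs) / scale
    return result


def prewhitened_ccf(x, y, max_lag=30):
    """Filter both series with the model fitted to x, then cross-correlate."""
    phi = fit_ar1(x)
    fx, fy = ar1_filter(x, phi), ar1_filter(y, phi)
    bound = 1.96 / math.sqrt(len(fx))  # approximate 95% limit for white noise
    return ccf(fx, fy, max_lag), bound


# Synthetic check: y repeats x with a 5-day delay, so x leads y.
random.seed(0)
raw = [0.0]
for _ in range(304):
    raw.append(0.7 * raw[-1] + random.gauss(0, 1))
x = raw[5:]
y = [raw[t] + random.gauss(0, 0.1) for t in range(300)]
values, bound = prewhitened_ccf(x, y)
peak_lag = max(values, key=values.get)
```

In the synthetic check the largest cross correlation appears at lag −5 and exceeds the approximate 95% limit. Fitting the filter on `y` instead and applying it to `x` gives the reversed-order variant of the same procedure.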

In order to test the robustness of the above prewhitening and CCF approach as well as to establish the temporal precedence of cyberbalkanization over opinion polarization (i.e., Hypothesis 3), we reversed the input and output order of the procedure, using the auto.arima function to determine the best-fitted ARIMA models of the PolPolI time series. Then we used those models to filter the time series of CBIs. If the reversed-order analysis also arrived at the same conclusion as the original analysis, evidence for the temporal precedence of the events was considered to be reconfirmed.

All data analyses were conducted with R version 3.1.2 for Linux (R Core Team, 2013).


The overall sharing network was constructed with 38,295 weighted edges through which 2,983 pages (nodes) were connected. Ten communities with a size of at least 30 pages were detected, and the number of members in each community ranged from 30 to 605. Basic characteristics of these communities are listed in the online appendix. Political communities within the Facebook sharing network seemed to be organized by political ideologies, and the segregation of communities was consistent with the general understanding of political polarization in Hong Kong (Cheng, 2014). In the subsequent analysis, only the 1,644 pages belonging to the ten large communities were included.

H1: Correlations between cyberbalkanization and opinion polarization

The time series of the two PolPolIs and the three CBIs are presented in Figure 2 and Figure 3, respectively. To contextualize the analysis, four significant political events that happened within the study period are marked in the plots. Spikes are observed only in the plots of CBIdiff and CBIraw at the four significant time points. The CCFs between the prewhitened PolPolIs and CBIs are presented in Figure 4. In general, the cross correlations between CBIs and PolPolIyouth were more significant than those between CBIs and PolPolI. The CBIdiff and CBIraw series were significantly associated with the two PolPolIs at some negative lag units. The findings indicate that the changes in CBIdiff and CBIraw were associated with future changes in PolPolIs. Therefore, Hypothesis 1 is supported.

Figure 2.

Time series of Political Polarization Indices (PolPolI) in the overall population (Overall PolPolI, Top) and the subpopulation of those age < 49 (PolPolIyouth, Bottom). Note: The three lines on both graphs represent the proportion of the HKUPOP survey samples who gave extreme ratings in telephone polls in relation to the Chief Executive's approval (y-axis) during the study period (x-axis). The dashed and dotted lines show the proportions of extremely low (<2.5) and extremely high (>97.5) ratings, and the solid lines are the sum of the two. The gray bar shows the actual duration of telephone polls. The vertical dotted lines denote the dates of four key events within the study period: 1) September 28, 2014 - Hong Kong Police Force fired tear gas to disperse protesters, which marked the beginning of the Occupy Movement; 2) December 15, 2014 - The final crackdown of the Occupy Movement; 3) April 22, 2015 - The Hong Kong government unveiled the political reform package; and 4) June 18, 2015 - The Legislative Council vetoed the political reform package.

Figure 3.

Time series of Cyberbalkanization Indices (CBI) calculated as a ratio of strong ties sharing to total sharing (CBIrate, Top), difference of strong ties and weak ties sharing (CBIdiff, Middle) and strong ties sharing only (CBIraw, Bottom). Note: The dotted vertical lines denote the dates of four key events within the study period: 1) September 28, 2014 - Hong Kong Police Force fired tear gas to disperse protesters, which marked the beginning of the Occupy Movement; 2) December 15, 2014 - The final crackdown of the Occupy Movement; 3) April 22, 2015 - The Hong Kong government unveiled the political reform package; and 4) June 18, 2015 - The Legislative Council vetoed the political reform package.

Figure 4.

Cross Correlation Functions (CCF) among overall PolPolI (Top panel) and PolPolIyouth (Bottom panel) with CBIrate (Left), CBIraw (Middle) and CBIdiff (Right). Note: For each CCF plot, each vertical line represents a cross-correlation coefficient between PolPolI and CBI (y-axis) at a specific lag unit (x-axis, from -30 days to 30 days). The blue dotted line boundaries are the 95% confidence limits of the cross correlation. A lag unit with a cross correlation exceeding the blue dotted line boundaries is considered statistically significant at the 95% level. For example, a cross-correlation coefficient that is positive and statistically significant at a negative lag indicates that the change in CBI temporally precedes the change in future PolPolI.

H2: Cyberbalkanization Indices that take weak ties into account are more predictive of opinion polarization

The operationalization of cyberbalkanization that considers weak-tie sharing, i.e., CBIdiff, was found to be a more significant leading indicator for overall PolPolI and PolPolIYouth for at least 21 days. The cross correlations between CBIdiff and the PolPolIs were stronger than those between CBIraw and the PolPolIs, as indicated by the higher cross-correlation coefficients in the CCF. Therefore, Hypothesis 2 is supported, suggesting that the inclusion of both strong-tie sharing and weak-tie sharing in the operationalization of cyberbalkanization can uncover a stronger association with opinion polarization.
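
To make the three operationalizations concrete, the following minimal sketch reproduces the worked example given in footnote 12. It is an illustration only, not the code used in the study: the function name, the tuple layout of the share records, and the handling of days on which strong-tie sharing does not exceed weak-tie sharing (returned here as NaN) are assumptions introduced for this example. The natural logarithm is used because it matches the footnote's figures.

```python
import math


def cyberbalkanization_indices(shares, community):
    """Compute one day's CBIs from share records.

    shares:    list of (source_page, sharing_page, count) tuples (hypothetical layout)
    community: dict mapping each page to its detected community label
    """
    # Strong-tie sharing: source and sharer belong to the same community.
    s_strong = sum(n for src, dst, n in shares if community[src] == community[dst])
    # Weak-tie sharing: source and sharer belong to different communities.
    s_weak = sum(n for src, dst, n in shares if community[src] != community[dst])

    diff = s_strong - s_weak
    return {
        "S_st": s_strong,
        "S_wt": s_weak,
        # Assumption: the log is undefined for diff <= 0, so NaN is returned.
        "CBIdiff": math.log(diff) if diff > 0 else float("nan"),
        "CBIrate": s_strong / (s_strong + s_weak),
        "CBIraw": math.log(s_strong) if s_strong > 0 else float("nan"),
    }


# Footnote 12: A and B share a community; C is in another one.
community = {"A": "c1", "B": "c1", "C": "c2"}
shares = [
    ("B", "A", 5),  # A shares B's posts 5 times (strong tie)
    ("B", "C", 1),  # C shares B's posts once (weak tie)
    ("C", "B", 2),  # B shares C's posts twice (weak tie)
]

example = cyberbalkanization_indices(shares, community)
for name, value in example.items():
    print(name, round(value, 3))
```

With these inputs the sketch yields the same values as the footnote: strong-tie shares of 5, weak-tie shares of 3, CBIdiff of about 0.693, CBIrate of 0.625, and CBIraw of about 1.609.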

H3: Temporal precedence of cyberbalkanization over opinion polarization

When we reversed the order of independent and dependent variables in the above time-series analysis, it generated additional results, presented in Figure 5. While the positive correlation appeared at around −20 days in the original analysis, we observed positive correlation peaks at around +20 days in all combinations of CBIs and PolPolIs in the reversed-order analysis, with the only exception being the results for CBIrate and PolPolIYouth. Therefore, both the original analysis and the reversed-order analysis arrived at the same conclusion, and thus Hypothesis 3 is supported.

Figure 5.

The reversed Cross Correlation Functions (CCF) among overall PolPolI (Top panel) and PolPolIYouth (Bottom panel) with CBIrate (Left), CBIraw (Middle) and CBIdiff (Right). Note: For each CCF plot, each vertical line represents a cross-correlation coefficient between PolPolI and CBI (y-axis) at a specific lag unit (x-axis, from -30 days to 30 days). The blue dotted lines mark the 95% confidence limits of the cross correlation. A lag unit whose cross correlation exceeds these limits is considered statistically significant at the 95% level. For example, a cross-correlation coefficient that is positive and statistically significant at a positive lag indicates that the change in CBI temporally precedes the change in future PolPolI.


To the authors' knowledge, this paper reports the first empirical study that intends to quantify the degree of cyberbalkanization in a social media environment and the real-life opinion polarization amid a political controversy, and the results support a temporal association between the two variables. Our approach to operationalization of cyberbalkanization is computationally derived and therefore is not subject to coder bias, which is an unavoidable methodological limitation in previous media content analysis studies. We also reveal that the association between cyberbalkanization and opinion polarization among the younger generation is markedly profound, which echoes a survey finding in Hong Kong showing that members of the younger generation were more inclined to use Facebook as their source of political information than was the case among other age groups (Wong & Chan, 2015). This finding can also serve as indirect supporting evidence of a dose-response relationship between opinion polarization and the extent to which social media were balkanized. But, while the study is specifically situated within a political controversy, further analysis in other contexts is required to verify the finding.

We found that both CBIrate and CBIdiff are correlated with the future degree of opinion polarization, and CBIdiff is the better predictor for opinion polarization of the two. This finding echoes a previous finding (Barberá, 2014) showing that weak ties could possibly ease opinion polarization. However, the current study also explored the association between the volume of weak-tie sharing per day and opinion polarization, and no significant association was found. Therefore, the current study cannot establish the claim that weak ties per se can ease political polarization. However, the combination of weak ties and strong ties can certainly become a better leading indicator for opinion polarization. But again, further analysis, over a longer study period, is warranted to confirm the result.

Alternative explanations and limitations

There are two alternative explanations that can significantly threaten the validity of our conclusion. The survey data showed that the proportion of Hong Kong citizens giving the Chief Executive an extremely low rating (<2.5) fluctuated during the study period, whereas the proportion who gave high ratings (>97.5) remained stable over the same period. From the results of community detection, the blue-ribbon camps, which held more favorable views toward the government, constituted only a minority of the sharing network. This seems to suggest that strong-tie sharing can motivate only those who did not support the city leader to take a more extreme stance. One alternative explanation for this observation is that the surge in strong-tie sharing is a surrogate marker for a set of unknown exogenous factors that hampered the Chief Executive's rating, and therefore this set of factors renders the association between cyberbalkanization and opinion polarization spurious. But this explanation is technically unverifiable in an observational study.

Another alternative explanation is that the interpolation of the monthly telephone poll data might have generated measurement artifacts that lagged the telephone poll time series. One previous study has shown that approval ratings based on a telephone poll constitute a lagging indicator for online public opinion (Fu & Chan, 2013). It is possible that such an artifact can produce spurious temporal precedence. We acknowledge that this is an inherent limitation of using data interpolation and that it is impossible to tease out the possible impact under the current study design. This problem can be solved partially by using non-interpolated, daily telephone-poll data, but such a dataset is not available because of the massive human resources and administration costs required (Fu & Chan, 2013). Future research should consider using alternative, high-frequency but validated methods to track the changes in real-life public opinion.

With these alternative explanations, the temporal precedence found in the current observational study should be interpreted cautiously. However, we have also conducted a series of post hoc analyses to study whether our main finding could be an artifact of the opinion polarization measurement-related limitations (see the Online Appendix). Our analysis showed that our main conclusion is robust and unlikely to be merely a measurement artifact.

Because temporal precedence is only one necessary condition for causality (Hill, 1965), the finding should not be understood as definitive evidence for causality between cyberbalkanization and opinion polarization. In order to establish causality, future studies should test other criteria of causation (Hill, 1965), for example by ruling out alternative explanations and studying the underlying mechanism.

There are three limitations of this study. First, the time period of the current study was only one year, and therefore we can only demonstrate a relatively short-term relationship between cyberbalkanization and political polarization. The study's conclusion may not generalize to a longer period. We have attempted to experimentally extend our analysis by a few months, and the correlation between cyberbalkanization and opinion polarization is still significant (see the online appendix). Second, one assumption of this study was that opinion polarization is largely bipolar, i.e., either supporting or not supporting the city leader. But the political spectrum in Hong Kong is far more complicated than a simple bipolar model. Even among the antigovernment communities, the groups have completely different views of the ways to achieve a genuine mode of democracy in Hong Kong. Future studies should deploy a more stringent indicator of opinion polarization to reflect the real-life political dynamic. Last, this study only considered the volume of sharing but did not assess media content. Weak-tie sharing may occur between pages that nonetheless reflect common political stances.


We conceptualized and operationalized cyberbalkanization of the social media landscape in Hong Kong in the context of a Facebook sharing network, within which we evaluated the degree of cyberbalkanization. We found that phenomenon to be associated with the polarization of public opinion in Hong Kong, especially among the younger generation. The contributions of this study are twofold, in both theoretical and practical aspects. Theoretically, this study's result provides empirical support for the Sunstein hypothesis, at least in the short term, even though the causal mechanism linking cyberbalkanization and opinion polarization is still elusive and warrants further investigation. Practically speaking, the degree of cyberbalkanization can serve as an opinion polarization indicator to supplement traditional opinion measurement, which mainly employs resource-intensive methodologies such as telephone polling. We call for replication studies in other research settings and a re-examination of our finding over a longer study period.


  1. 1

    This research project (Project Number: 2013.A8.009.14A) is funded by the Public Policy Research Funding Scheme of the Central Policy Unit of the Government of the Hong Kong Special Administrative Region. Part of the first author's PhD studentship is supported by the HKU SPACE Postgraduate Fund.

  2. 2

    The reasons for using the term “cyberbalkanization” throughout the text instead of similar terms such as “social network homophily”, “echo chamber” and “selective exposure” are twofold. First, we want to emphasize that such phenomena are facilitated by the Internet, whereas the other terms can also refer to non-Internet-mediated scenarios. Second, none of these terms alone describes the construct of cyberbalkanization entirely, as it is an inseparable trinity of the three.

  3. 3

    An almost identical pattern was also observed among Canadian Twitter users during the 2011 Canadian Federal Election (Gruzd & Roy, 2014) and among Egyptian Twitter users during the June 2013 protests between secularists and Islamists (Borge-Holthoefer et al., 2013; Weber, Garimella, & Batayneh, 2013).

  4. 4

    Regardless of political culture, political polarization is observed in some Western democratic countries such as America (Fiorina & Abrams, 2008), nonwestern democratic countries like Taiwan (Clark & Tan, 2012), and even in authoritarian regimes like Communist China (Wu, 2014).

  5. 5

    The self-reported network data has a less than 50% accuracy rate (Marsden, 1990). Dvir-Gvirsman, Tsfati, & Menchen-Trevino (2016) even argue that, when compared with objective behavioral data (i.e. web-log data), measurement of self-reported online ideological exposure is highly inflated.

  6. 6

    Dearden, L. (2014, October 5). Hong Kong protests: A guide to yellow ribbons, blue ribbons and all the other colours. Independent.

  7. 7

    Perez, B. (2013, September 9). Facebook to Spur More Digital Advertising in Hong Kong. South China Morning Post.

  8. 8

    Lee, F.L.F. (2014, December 5). Hong Kong Citizens media repertoire during the Occupy Movement [In Chinese]. Ming Pao.

  9. 9

    The reasons for analyzing Facebook Pages instead of individual users are twofold: 1) there are privacy and authorization concerns and restrictions related to data collection from individual user accounts, but Facebook Pages are open to the public, especially those pages of social organizations or protest groups; 2) Facebook Pages are mostly associated with political organizations involved in the Occupy Movement, from media outlets (online version of traditional media and new media) and political figures. These pages are widely read among Hong Kong citizens.

  10. 10

  11. 11

    Previous research has developed metrics to measure network polarization that are intended for network-to-network comparison, such as modularity (Waugh, Pei, Fowler, Mucha, & Porter, 2009). But to our knowledge, there is no generally accepted method to quantify the longitudinal change in the degree of cyberbalkanization of a single network. Moreover, the existing network polarization metrics cannot determine the relative contribution of ‘weak ties’ and ‘strong ties’.

  12. 12

    For example, consider a network with nodes A, B and C in which only A and B belong to the same community. Suppose that on a given day, A shares B's posts 5 times, C shares B's posts once, and B shares C's posts twice, i.e., B -5-> A, B -1-> C and C -2-> B. Only B -5-> A represents strong-tie sharing. The CBIs of that day in this network are calculated as follows: Sst = 5; Swt = 1 + 2 = 3; CBIdiff = log(5 - 3) = 0.693; CBIrate = 5 / (5 + 3) = 0.625; CBIraw = log(5) = 1.609.

  13. 13

  14. 14

    The prewhitening procedure is as follows (Cryer & Chan, 2008). The optimal autoregressive integrated moving average (ARIMA) model parameters of the CBI time series were first determined by the auto.arima function provided by the forecast R package (Hyndman, 2013). The auto.arima function is a model selection function that determines the set of ARIMA parameters minimizing the Akaike Information Criterion (AIC), a measure of model goodness of fit. Residuals of the best-fitted ARIMA model for the CBI time series, ResidCBI, were then extracted. The same ARIMA model structure was applied to the PolPolI time series and the resultant residuals, ResidPolPolI, were obtained. These two residual time series are supposed to be white noise and still preserve the variation of values due to extraneous events.
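
The prewhitening and cross-correlation steps described in footnote 14 can be sketched as follows. This is a simplified illustration under stated assumptions, not the study's implementation: a least-squares AR(1) filter stands in for the ARIMA model selected by auto.arima, the lag convention follows R's ccf (lag k relates x[t + k] to y[t], so a significant negative lag means the first series leads), the approximate 95% limits are taken as ±1.96/√n, and the two input series are synthetic, generated only to show the mechanics. All function names are hypothetical.

```python
import math
import random


def ar1_coefficient(series):
    """Least-squares AR(1) coefficient of the mean-centred series."""
    mean = sum(series) / len(series)
    c = [v - mean for v in series]
    num = sum(c[t] * c[t - 1] for t in range(1, len(c)))
    den = sum(v * v for v in c[:-1])
    return num / den


def ar1_filter(series, phi):
    """Residuals after removing an AR(1) structure with coefficient phi."""
    mean = sum(series) / len(series)
    c = [v - mean for v in series]
    return [c[t] - phi * c[t - 1] for t in range(1, len(c))]


def cross_correlation(x, y, lag):
    """Sample correlation between x[t + lag] and y[t]."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((v - mx) ** 2 for v in x) / n)
    sy = math.sqrt(sum((v - my) ** 2 for v in y) / n)
    cov = sum((x[t + lag] - mx) * (y[t] - my)
              for t in range(n) if 0 <= t + lag < n) / n
    return cov / (sx * sy)


def prewhitened_ccf(cbi, polpoli, max_lag=30):
    """Fit the filter on CBI, apply it to both series, then compute the CCF."""
    phi = ar1_coefficient(cbi)
    resid_cbi = ar1_filter(cbi, phi)
    resid_polpoli = ar1_filter(polpoli, phi)  # same filter, as in footnote 14
    bound = 1.96 / math.sqrt(len(resid_cbi))  # approximate 95% limits
    values = {k: cross_correlation(resid_cbi, resid_polpoli, k)
              for k in range(-max_lag, max_lag + 1)}
    return values, bound


# Synthetic data (not the study's): "polpoli" follows "cbi" with a 3-day delay.
rng = random.Random(0)
n_days = 300
cbi = [0.0]
for _ in range(n_days - 1):
    cbi.append(0.6 * cbi[-1] + rng.gauss(0, 1))
polpoli = [(0.8 * cbi[t - 3] if t >= 3 else 0.0) + rng.gauss(0, 0.5)
           for t in range(n_days)]

ccf_values, bound = prewhitened_ccf(cbi, polpoli)
peak_lag = max(ccf_values, key=lambda k: abs(ccf_values[k]))
print("peak lag:", peak_lag, "significant:", abs(ccf_values[peak_lag]) > bound)
```

Because the synthetic polarization series is built to trail the synthetic CBI series by three days, the largest cross correlation falls at a negative lag, which is how temporal precedence of CBI over PolPolI is read in Figure 4.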


  • Chung-hong Chan is a PhD candidate at the Journalism and Media Studies Centre, The University of Hong Kong. His research interests include computational media studies, online opinion polarization, and platform intervention. E-mail:

    Address: Journalism and Media Studies Centre, Room 116, Eliot Hall, Pokfulam Road, The University of Hong Kong, Hong Kong.

  • King-wa Fu is Associate Professor at the Journalism and Media Studies Centre, The University of Hong Kong. His research interests cover political participation and media use, computational media studies, health and the media, and younger generation's Internet use. E-mail:

    Address: Journalism and Media Studies Centre, Room 206, Eliot Hall, Pokfulam Road, The University of Hong Kong, Hong Kong.