Importance of Linking Patterns
While the Internet allows a user to view content from any server, the technical ability to view information does not guarantee the ability to understand it. As content in languages other than English grows and the number of non-English-speaking users increases, information has become fragmented into different language groups. Pimienta, Prado, and Blanco (2009) found the percentage of English webpages fell steadily from 75% to less than 45% from 1996 to 2006. During the same time, the percentage of Internet users who were native English speakers also fell from 80% to less than 30%. While the other languages in the study (Spanish, French, Italian, Portuguese, Romanian, and Greek) made steady gains, each often accounted for less than 5% of webpages. The number of pages in each language is important, but so is an understanding of how the pages link together, and little work has examined links between language groups.2
How Languages are Connected and Why it Matters
Hyperlinks may be used as a proxy to measure the awareness of foreign-language content among bloggers. Although not all the nuanced motivations for creating cross-lingual hyperlinks are known, hyperlinks are among the best data available as they can be observed passively, are publicly available, and possess a similarity to citations. While bloggers create hyperlinks for a variety of reasons, a hyperlink within a blog post at the very least signals the author's awareness of the content linked to. With this minimal definition, it is possible to measure to what extent bloggers are aware of content in languages other than the languages in which they write. This definition is well-supported by previous work, which has justified an even deeper meaning of interaction or communication (e.g. Adamic & Glance, 2005; Hargittai, Gallo, & Kane, 2007), and the awareness individuals have for information in other languages is important. Crystal (2003), discussing the dangers of unawareness as linguistic complacency (p. 17), states that a third of British exporters miss opportunities because of poor language skills according to a study by the UK-based Centre for Information on Language Teaching and Research.
Multilingual individuals creating content in peer-produced spheres (e.g. blogs, Wikipedia, open-source software) may create opportunities for information exchanges akin to Granovetter's (1973) “weak ties.” Weak-tie acquaintanceships form “crucial bridge[s] between two densely knit clumps of close friends” (Granovetter, 1983, p. 202) and have been found to be important to many areas including the spread of ideas and innovations (e.g. Fine & Kleinman, 1979; Burt, 2004). In the same manner, cross-lingual hyperlinks may represent similarly crucial bridges in the exchange of information online. Human-produced translations, while not as ubiquitous as their machine-made counterparts, often better capture nuances in meaning and have the potential to translate cultural meaning in addition to linguistic meaning. This is especially true of more distant language pairs such as Japanese and English. These exchanges could present novel information as the content available in various languages may be very different: Hecht and Gergle (2010) found very little overlap in topics and article content between different language editions of Wikipedia, for example.
Many link analysis studies of the blogosphere have focused on the structure of links between U.S. political blogs. These studies (e.g. Adamic & Glance, 2005; Hargittai et al., 2007) showed bloggers in the U.S. political blogosphere were highly polarized by ideology and linked to blogs with similar political affiliations over those with different affiliations, demonstrating that homophily (Lazarsfeld & Merton, 1954), commonly expressed by the adage “birds of a feather flock together,” truly does “structure [] network ties of every type, including marriage, friendship, work, advice, support, information transfer, exchange, [etc.]” (McPherson, Smith-Lovin, & Cook, 2001). Perhaps unsurprisingly then, Internet websites have been shown to cluster by topic (Chakrabarti, Joshi, Punera, & Pennock, 2002) and language (Hale, 2010). However, even a small number of intercluster bridging ties in a highly clustered network can drastically decrease the path length between any two nodes (Watts & Strogatz, 1998). These large networks with comparatively small average path lengths are said to exhibit the small-world property, and their impact upon innovation and the spread of ideas (e.g. Fleming, King, & Juda, 2007; Uzzi & Spiro, 2005) demonstrates the importance of weak ties.
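The effect of a few bridging ties on path length can be illustrated with a minimal sketch. It is not drawn from Watts and Strogatz's own procedure, which rewires edges at random. Instead it assumes a ring of 200 nodes, each tied to its two nearest neighbours on either side, and adds five fixed long-range ties. The function names, network size, and choice of bridges are illustrative assumptions only.

```python
from collections import deque


def ring_lattice(n, k):
    """Build a ring in which each node is tied to its k nearest neighbours on each side."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj


def average_path_length(adj):
    """Mean shortest-path length over all reachable ordered pairs, via BFS from every node."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


# A highly clustered network with no long-range ties (sizes are assumptions).
network = ring_lattice(200, 2)
before = average_path_length(network)

# Add five bridging ties, each joining nodes on opposite sides of the ring.
for i in range(5):
    a, b = 40 * i, (40 * i + 100) % 200
    network[a].add(b)
    network[b].add(a)
after = average_path_length(network)

print(f"average path length before bridges: {before:.2f}")
print(f"average path length after bridges:  {after:.2f}")
```

Because the five bridges are a tiny fraction of the 400 original ties, comparing the two printed values gives a rough sense of how strongly a handful of intercluster links can shorten paths across the whole network.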
The limited prior research available suggests different languages have different interlinking patterns. A Berkman Center project mapping the Arabic blogosphere found no hard division between English and Arabic blogs (Etling et al., 2009, p. 19), while a similar study by the Berkman Center did find a clear division between Farsi and English blogs (Kelly & Etling, 2008). The Arabic project (Etling et al., 2009) found several large national clusters as well as two clusters linking more to foreign language blogs: one to English and one to French. Thelwall, Tang, and Price (2003) investigated linking patterns between academic institutions in Western Europe. They found most interlinking throughout Europe occurred in English. Regional linking between countries sharing a common non-English language was also present. Notably, a typical academic site had about half of its pages in English, with the remaining half in the national language(s). Finally, Zuckerman (2008) states that Japanese language blogs are widely considered less political and more personal than U.S. blogs. Exploration of a set of 9.2 million Japanese blog posts by Fujimura, Inoue, and Sugisaki (2005) seems to confirm this by revealing remarkably few posts (about 1.25%) linked to other blog posts in the set. Indeed, only 16.3% of the blog posts linked to any other webpage at all.
Analysis of data from a pilot study led to three findings: First, the blogosphere demonstrated linguistic homophily with bloggers preferring to link to same-language content over foreign-language content. Second, most cross-lingual links were found to involve English as opposed to directly connecting Spanish and Japanese pages. Finally, the data suggested English might be used more to broadcast than to receive cross-lingual information; however, the number of cross-lingual links to English pages was higher than, but not significantly different from, the number of cross-lingual links from English pages.3 The pilot study used a sample of 1,968 pages in Spanish, Japanese, and English about the Haitian earthquake at a single point in time. It began with a seed sample of 100 blogs in each language and expanded the set by following all off-site links to pages mentioning Haiti and earthquake. The present study allows analysis of how the network of links changes over time by collecting blogs over a longer period and capturing the date each blog was published. By aggregating a much larger initial set of blogs and not expanding outward from this set, better conditions are established to measure insularity, modularity, and other network properties.
This final result of the pilot study, while not significant, is consistent with findings about the diffusion of television and would suggest bloggers writing in English are generally less aware of foreign language content than bloggers writing in other languages. Nordenstreng and Varis (1974) found television content flow was generally one-way in that a small number of countries exported but did not import content, while many countries imported television content without exporting much content. This also stands in line with the fear of linguistic complacency that Crystal (2003, p. 17) identifies as a danger of English as a global language and with a 2002 European Business Survey by Grant Thornton (cited in Crystal, 2003), which found the percentage of businesses with an executive able to negotiate in another language was much lower in the UK than elsewhere in Europe.
This study compares the number of cross-lingual links from and to blog posts in English, Japanese, and Spanish. The literature suggests the following hypothesis, which this study tests: fewer cross-lingual links will originate from English language blogs than either from Spanish or Japanese language blogs (H1). In addition to looking at links for the full time period, this study will also look at how the distribution of hyperlinks changes over the 45-day period following the earthquake. In particular, this study will test the hypothesis that bloggers' awareness of foreign-language content as measured by the separation between language groups will increase with time (H2). This follows the conventional thinking that if one blogger bridges a language gap, and bloggers read one another, then other bloggers may cite the same foreign source or the blog referencing it. That is, for each translation or cross-lingual link made, there should be a ripple or knock-on effect as additional bloggers become aware of the foreign-language content and possibly link to it or a blog citing it.
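The comparison underlying H1 can be sketched as a simple tally. The following is only an illustration under stated assumptions: each hyperlink is represented as a hypothetical (source language, target language) pair, the language codes and the toy data are invented for the example, and no claim is made that this mirrors the study's actual pipeline.

```python
from collections import Counter


def cross_lingual_counts(links):
    """Count cross-lingual links by the language they originate from and point to.

    `links` is an iterable of (source_language, target_language) pairs;
    same-language links are ignored.
    """
    outgoing, incoming = Counter(), Counter()
    for source_lang, target_lang in links:
        if source_lang != target_lang:
            outgoing[source_lang] += 1
            incoming[target_lang] += 1
    return outgoing, incoming


# Invented toy data, for illustration only (not results from the study).
toy_links = [
    ("en", "en"), ("es", "en"), ("ja", "en"), ("en", "es"),
    ("es", "es"), ("ja", "ja"), ("es", "en"),
]

outgoing, incoming = cross_lingual_counts(toy_links)
for lang in ("en", "es", "ja"):
    print(lang, "cross-lingual links out:", outgoing[lang], "in:", incoming[lang])
```

Under H1, the outgoing count for English would be lower than that for Spanish or Japanese. Raw counts of this kind would still need to be normalized by the number of posts or links in each language before any comparison is drawn.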
Who Connects Languages?
Mainstream media sources and others at times rely heavily on cross-lingual blogs to monitor foreign events. A survey of foreign correspondents in China found that nearly three times as many survey respondents followed English-language blogs on a daily basis as compared with Chinese-language blogs (MacKinnon, 2008). MacKinnon (2008) writes that “this suggests that English-language ‘bridge blogs’ about China have greater direct influence on China correspondents than Chinese-language blogs” (p. 19). Thus, it is important to analyze who is creating the multilingual connections and the nature of these connections.
Within the blogosphere, multilingual bloggers may bridge language gaps by blogging about content in other languages. Qualitative evidence (e.g. Zuckerman, 2008) shows examples of cross-lingual or bridgeblogging,4 but how common it is and the nature of cross-lingual links remain unclear. Nevertheless, where it occurs, qualitative studies suggest cross-lingual blogging “play[s] an increasingly important role in connecting [culturally and linguistically] disparate spheres of conversation and argument together [online]” (Zuckerman, 2008, p. 47).
Global Voices (http://globalvoicesonline.org/), founded by MacKinnon and Zuckerman, seeks to aggregate bridgeblogs and encourage translation between languages. Other services such as Meedan (http://news.meedan.net/) and Mojofiti (http://www.mojofiti.com/) also seek to encourage cross-lingual blogging through a combination of machine and human translation. This study examines the impact and the importance of encouragement in such communities.
The meaning of cross-lingual hyperlinks in the blogosphere has not been previously studied. Links are often considered a form of citation, and Benkler (2006) suggests the prevalence and importance of linking to sources online is part of a “see for yourself” link culture. This culture and the general linking structure of the Internet, Benkler argues, militate against polarizing and fragmenting forces and actually form a more egalitarian landscape online than is possible with the skewing forces of capital investment required in the mass-media sphere. However, Benkler only cites examples from English language websites, and it is unclear to what extent a “see for yourself” culture can overcome fragmentation tendencies between languages online. Hargittai et al. (2007) found a diversity of types of links through a qualitative coding of cross-ideological links in the U.S. political blogosphere, suggesting that reasons for creating cross-lingual links may be equally varied.
Building upon the recognition of Zuckerman (2008) and Hargittai et al. (2007) that there are meaningful differences between bloggers and types of hyperlinks, this work examines all blog posts with cross-lingual links qualitatively. The type of author creating the cross-lingual link and the nature of the link are classified by human coders in order to test the hypothesis that blogs with multiple authors, professional affiliation, and/or higher traffic are more likely to create cross-lingual links (H3). Categories are developed through analysis of data from a pilot study and refined through the coding process.
The next section will describe the methods used to test the three research hypotheses in reference to English, Spanish, and Japanese blogs about the Haitian earthquake. Based on the literature above, the three research hypotheses are that there will be fewer cross-lingual links from English than either from Spanish or from Japanese (H1), the awareness of foreign-language content will increase with time (H2), and blogs creating cross-lingual links will more likely have multiple authors, professional affiliation, and/or a high amount of traffic (H3).