Nowadays, there is little controversy over how to deal with Hebrew on the Internet. The consensual solution has two institutionally interrelated but technologically distinct parts, each of which deals with a different aspect of the problems presented above. As I shall explain below, the solution involves using Unicode to solve the problem of encoding and logical Hebrew to solve the problem of directionality.
The recognized solution to the problem of directionality in Hebrew websites is to build them using logical Hebrew. As mentioned, this is now the standard used in government websites, and it was incorporated into HTML 4.01 as far back as 1999.13
The dominance of logical Hebrew nowadays has a lot to do with Microsoft. Microsoft wanted to enter the Middle Eastern market and quickly learnt that there were a number of ways of dealing with Hebrew (and Arabic) directionality issues. It understandably wanted one standard to be agreed upon that it could subsequently integrate into its operating system, but expressed no opinion as to which standard that should be. Indeed, this was Microsoft's policy in every country it entered. The standard that was decided upon in Israel was that of logical Hebrew, which was subsequently integrated within Windows. The fact that around 90% of the world's desktop computers have some version of Windows installed is part of the explanation for the dominance of logical Hebrew as the consensual standard. In other words, while Microsoft may have brought the issue to a head, it did not offer a priori support to one standard over another.14
Moreover, as powerful as Microsoft has become, its success has depended on the personal computer (PC) replacing mainframe computer systems. This suggests that the dominance of logical Hebrew can be partly explained in terms of actor-network theory (ANT), and particularly its recognition that nonhuman actors, termed actants, may play as important a role in constituting technology as human ones. As John Law puts it, a successful account of a technology is one that stresses ‘the heterogeneity of the elements involved in technological problem solving’ (Law, 1987). ANT also teaches us to view technologies as contingent ways of dividing up tasks between humans and objects (Latour, 1992). Accordingly, it seems appropriate to introduce into this analysis two objects—the dumb terminal of a mainframe computer system, and the personal computer—a move which requires a small amount of explication.
Simply put, representing visually stored text on a screen demands much less computer power than representing logically stored text. Indeed, dumb terminals—display monitors attached to mainframes with no processing capabilities of their own—were barely capable of doing so. Visual Hebrew was an apt solution for the world of mainframes. Personal computers with their own microprocessors, however, offered much more computing power and were more than capable of putting logical Hebrew on screen. Because using logical Hebrew saves time at the programming end—it cuts out the ‘flipping’ stage, since there is no need to turn the Hebrew back to front—programmers preferred it.15 In terms taken from ANT, therefore, we can understand the shift from mainframes to PCs as also involving the redistribution of a certain task, namely, ‘flipping’ Hebrew text: The dumb terminal was not able to do so, and so a human had to do it; the PC, however, is able to take on that task by itself.
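The ‘flipping’ task redistributed from human to machine can be sketched in a few lines of code. The following Python sketch is my own illustration (the function name is invented, and it simplifies the full Unicode Bidirectional Algorithm down to reversing single right-to-left runs): with visual Hebrew, text was stored in the already-flipped form, so a simple display needed no processing; with logical Hebrew, text is stored in reading order and this reversal is performed at display time.

```python
import unicodedata

def flip_rtl_runs(logical_text):
    """Reverse each run of right-to-left characters so that logically
    stored Hebrew appears in reading order on a display that can only
    draw characters left to right. A simplified sketch of the
    'flipping' task, not the full Unicode Bidirectional Algorithm."""
    display, rtl_run = [], []
    for ch in logical_text:
        # Hebrew letters carry the bidirectional class 'R'
        # (Arabic letters carry 'AL')
        if unicodedata.bidirectional(ch) in ('R', 'AL'):
            rtl_run.append(ch)
        else:
            display.extend(reversed(rtl_run))
            rtl_run = []
            display.append(ch)
    display.extend(reversed(rtl_run))
    return ''.join(display)

logical = 'שלום'                 # stored in reading order: shin, lamed, vav, mem
visual = flip_rtl_runs(logical)  # back-to-front, as a dumb terminal needed it
```

Under visual Hebrew, it was the stored form itself that looked like `visual` above; under logical Hebrew, only the renderer ever sees that reversed form.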
This, however, merely raises another question, one to do with the timing of the dominance of logical Hebrew on the Internet: If ‘everyone knew’ that logical Hebrew was better, and if Microsoft started integrating it into its operating system from the early 1990s, how do we explain the Israeli government's decision in 1997 that all government sites must be written in visual Hebrew? Why was logical Hebrew not adopted for use on the Internet at this stage?
Again, the answer does not lie with the technological superiority of one solution over another, but rather with a particular aspect of computing at the time, namely, the political economy of web browsers. In 1997, the most popular Internet browser by far was Netscape Navigator, a browser that did not support logical Hebrew. In other words, sites written in logical Hebrew would simply not be viewable in the browser that most people were using. Netscape Navigator did support visual Hebrew, however.16 Therefore, if you wanted to write a website that would be accessible to the largest number of people—users of Netscape Navigator—you would do so using visual Hebrew. Netscape Navigator only introduced support for logical Hebrew in version 6.1, which was released in August 2001. By then, however, Microsoft's own browser, Internet Explorer, had attained supremacy, and it, of course, had full support for logical Hebrew. Previously, then, there had been a trade-off between ease of website design and number of potential surfers to that site. With the emergence of Internet Explorer, there was no longer any need to make that trade-off.
In summary, for quite understandable technical reasons, programmers and web designers have long preferred to produce sites using logical Hebrew. However, the dominance of that standard has more to do with the shift from the dumb terminal to the PC, Microsoft's interest in the late 1980s and early 1990s in global expansion, and the success of that company's browser, at the expense of Netscape Navigator, than it does with its inherent technological qualities.
The consensual solution to the problem of encoding has been provided by the Unicode Consortium, whose website declares: ‘Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language.’17 In other words, instead of each set of scripts and alphabets requiring their own code sheet, Unicode provides one huge code sheet for all of them. It offers a standardized way of encoding all documents in all languages. As Gillam writes in his guidebook on the subject, Unicode solves the problem of encoding
by providing a unified representation for the characters in the various written languages. By providing a unique bit pattern for every single character, you eliminate the problem of having to keep track of which of many different characters this specific instance of a particular bit pattern is meant to represent (Gillam, 2003, p. 7).
Put even more simply, Unicode makes it impossible that the same number might refer to more than one character, as we saw with the example of code number 240 being able to represent both the Greek letter pi and the Hebrew letter nun. Unicode also incorporates logical Hebrew within its standards, thereby providing a comprehensive solution not only for Hebrew, but also for Arabic (another script written from right to left), Greek, Russian, Swedish, Japanese, Swahili, and, so says the Unicode Consortium, every language on the planet.
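The pi/nun collision can in fact be reproduced directly with the legacy code sheets in question. In the following Python sketch (my own illustration, using the ISO 8859 Greek and Hebrew code pages), the single code number 240 decodes to a different letter depending on which sheet is assumed, whereas Unicode assigns each letter its own permanent code point:

```python
ambiguous = bytes([240])  # one byte, code number 240 (0xF0)

# The same number means different things under different legacy code sheets:
as_greek = ambiguous.decode('iso-8859-7')   # Greek small letter pi
as_hebrew = ambiguous.decode('iso-8859-8')  # Hebrew letter nun

# Under Unicode, each character has its own unique number (code point),
# so no bookkeeping about 'which sheet was this written with?' is needed:
print(hex(ord(as_greek)))   # U+03C0, reserved for pi alone
print(hex(ord(as_hebrew)))  # U+05E0, reserved for nun alone
```

This is all that Gillam's ‘unique bit pattern for every single character’ amounts to in practice: the receiving computer no longer needs out-of-band knowledge of which code sheet the sender used.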
Unicode has been widely implemented in websites. In January 2010 Google reported that 50% of web pages were encoded in Unicode,18 and today over 70% of the web's 10,000 most visited sites use Unicode.19 It is also used in the Java programming language (which, among other things, is used to program web applets such as browser games). In addition, it is built into XML,20 which lies at the foundation of Microsoft Office and Apple's iWork software. Google is another technology giant that has adopted Unicode, and this has enabled it to roll out services in languages other than English. For example, having been released in April 2004 in English, Gmail was made available in Hebrew in May 2006;21 in November 2008 Google Calendar was offered in Hebrew (and Arabic);22 in August 2009 Google started offering search results in Hebrew and Arabic on mobile devices;23 and in September 2009 Google Sites started operating in Hebrew and Arabic.24 Google itself has said that the reason for its adoption of Unicode ‘is to give everyone using Google the information they want, wherever they are, in whatever language they speak’.25
This is not to say that the user experience for users of non-Latin scripts is immediately equivalent to that of English speakers. Shortly after the release of the iPad in April 2010, for instance, Elizabeth Pyatt wrote a blog entry about the state of play of Unicode on those new devices. Her overall impression was that there was still work to be done, but she acknowledged that improvements appeared to be on the way (Pyatt, 2010). For example, the ability to input Arabic was not supported on first-generation iPads immediately upon their release, though one could buy an Arabic keyboard from the App Store, and an official update from Apple in November 2010 included a native Arabic keyboard. One blogger's first impression was that, in general, the multilingual capabilities of the iPad lagged behind those of the iPhone (Gewecke, 2010). However, this was attributed to Apple's wanting to release the product quickly rather than lacking the required technological know-how for dealing with Arabic. Non-English speaking users of Android mobile devices have also experienced certain difficulties: While the Android operating system uses Unicode, not all fonts are preinstalled on all devices, meaning that not all languages are equally accessible on them.26 As noted by John Paolillo, Unicode encoding ‘causes text in a non-roman script to require two to three times more space than comparable text in a roman script’ (Paolillo, 2005b, p. 47), a cost that might be ‘enough of a penalty to discourage use of Unicode in some contexts’ (p. 73). So readers and writers of non-Latin scripts would still appear to be at a disadvantage when it comes to using the newest communication devices. However, the adoption of Unicode by the world's largest technology companies means that this disadvantage is both smaller and much shorter-lived than that faced by the community of Hebrew-language webmasters in the early- to mid-1990s.
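Paolillo's space penalty is easy to verify for UTF-8, the most common Unicode encoding on the web: Latin letters occupy one byte each, Hebrew (and Arabic) letters two, and most East Asian characters three. A small Python check, with sample words of my own choosing:

```python
samples = {
    'Latin': 'shalom',   # six Latin letters
    'Hebrew': 'שלום',    # four Hebrew letters
    'Chinese': '你好',    # two Chinese characters
}
for script, word in samples.items():
    encoded = word.encode('utf-8')
    # Bytes per character in UTF-8: 1 for Latin, 2 for Hebrew, 3 for Chinese,
    # since Hebrew letters fall in the two-byte range (U+0080 to U+07FF)
    # and common CJK characters in the three-byte range (U+0800 to U+FFFF).
    print(script, len(encoded) / len(word))
```

The penalty is thus a property of the variable-width UTF-8 encoding rather than of Unicode's code-point assignments themselves, but it falls, as Paolillo notes, entirely on readers and writers of non-roman scripts.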
The appearance and increasing popularity of Unicode have been met with widespread (but not universal) approval across the computing industry. Indeed, articles in newspapers and trade magazines greeted the emergence of Unicode extremely warmly (for instance, Ellsworth, 1991; Johnston, 1991; Schofield, 1993). For example, one article in a trade magazine talks about how Unicode is ‘bringing more of the world in’ (Hoffman, 2000). The Unicode Consortium itself claims that ‘[t]he emergence of the Unicode Standard, and the availability of tools supporting it, are among the most significant recent global software technology trends’ (Unicode Consortium, 2004). In a book on Unicode for programmers, Tony Graham deploys Victor Hugo to describe it as ‘an idea whose time has come’ (T. Graham, 2000, p. 3), and says that for him, a computer programmer, it is a ‘dream come true’ (p. x).
However, there are three elements of the discourse surrounding Unicode which call for critical attention: the first is the determinist tendency to represent Unicode as the next stage in a natural process; the second is the existence of alternatives to the dominant technology; and the third is a discussion of the technology's purported ‘impacts.’
The first element—the tendency to see Unicode as the outcome of an almost natural process of development—can be seen, for instance, in an article by a programmer and writer who terms Unicode ‘the next evolution’ for alphabets (Celko, 2003). In their books, both Graham and Gillam present Unicode as the natural and obvious solution to a problematic state of affairs. In keeping with technological determinist ways of thinking, they talk as if Unicode were out there waiting to be discovered.
One way of presenting an alternative to this discourse is to use the concept of ‘relevant social groups,’ and particularly Pinch and Bijker's assertion that ‘a problem is only defined as such, when there is a social group for which it constitutes a “problem”’ (Pinch & Bijker, 1987, p. 414). To this we could add that the relevant social group also needs to have the means of formulating and disseminating a solution. Unicode expert Tony Graham, for instance, offers an explanation of the origins of Unicode:
The Unicode effort was born out of frustration by software manufacturers with the fragmented, complicated, and contradictory character encodings in use around the world. […] This meant that the ‘other language’ versions of software had to be significantly changed internally because of the different character handling requirements, which resulted in delays (T. Graham, 2000, p. x).
This is borne out by other histories of Unicode, which locate the origins of its invention in the difficulties of rendering a piece of English-language software into an Asian language. And indeed, the Unicode project was initiated by programmers at Xerox and Apple.27 The most obvious relevant social group, then, is clearly that of computer programmers working in multilingual environments.
Their superiors also had a clear interest in Unicode insofar as it can dramatically reduce the time it takes to turn software from English into another language. For instance, the American version of Microsoft Windows 3.0 was released in May 1990, but the Japanese version was shipped only 18 months later. Partly as a result of the technology being discussed here, the English and Japanese versions of Windows 2000 were released on the same date. Unicode is thus represented as aiding in computer companies' internationalization efforts.
However, another social group can be identified which benefits from the spread of Unicode, but which could not have developed it itself. This group is made up of librarians, who have long been working with large numbers of scripts. The implementation of Unicode means that American students of Chinese literature, for example, can search for titles and authors using Chinese characters without having to guess at how they might have been transliterated into Latin text. It also means that libraries with multilingual holdings can maintain them all in a single database, which has obvious implications for more efficient information management and searching (see Nichols, Witten, Keegan, Bainbridge, & Dewsnip, 2005 for a description of Unicode-based software for libraries with multilingual holdings).
The concept of relevant social groups can also be deployed in order to understand the limits of Unicode, whose focus has largely been on scripts used in business. For the core membership of Unicode, the problem that it purports to solve is that of internationalization. However, alternative relevant social groups, such as UNESCO and the Script Encoding Initiative,28 attribute a different meaning to the project. For the latter, the importance of successfully integrating a minority language into Unicode has nothing to do with business; rather, it ‘will help to promote native-language education, universal literacy, cultural preservation, and remove the linguistic barriers to participation in the technological advancements of computing.’29 Moreover, the reasons given by such groups for the absence of minority languages from Unicode are explicitly political and include references to the relative poverty of speakers of minority languages, the obvious barriers they face in attending standardization meetings and drawing up proposals, and the fact that they do not constitute a large consumer base. Indeed, as Gee notes, citing Anderson (2004), ‘[w]hile the business interests have been actively behind much of the character encoding. … advocates for the lesser-known scripts have not had a similarly strong presence among the Unicode Consortium membership’ (Gee, 2005, p. 249).
UNESCO devotes a large section of its website to the issue of multilingualism on the Internet, and its framing of the issue is patently clear. A page entitled Multilingualism in Cyberspace opens with the following sentence: ‘Today various forces threaten linguistic diversity, particularly on the information networks,’30 thus locating their interest in encoding issues in the field of language preservation. However, it is also framed as pertaining to the digital divide:
Increasingly, knowledge and information are key determinants of wealth creation, social transformation and human development. Language is the primary vector for communicating knowledge and traditions, thus the opportunity to use one's language on global information networks such as the Internet will determine the extent to which one can participate in the emerging knowledge society. Thousands of languages worldwide are absent from Internet content and there are no tools for creating or translating information into these excluded tongues. Huge sections of the world's population are thus prevented from enjoying the benefits of technological advances and obtaining information essential to their wellbeing and development.
Likewise, linguistics expert and UNESCO consultant John Paolillo writes that, ‘[f]or the Internet to allow equivalent use of all of the world's languages, Unicode needs to be more widely adopted’ (Paolillo, 2005a, p. 73).
This is an example of ‘interpretive flexibility’ (Kline & Pinch, 1996; Pinch & Bijker, 1987). For one group Unicode is a way to simplify software internationalization and thus increase profit margins, while for another it is a means of preserving endangered languages and narrowing the digital divide. The former interpretation is currently dominant, though organizations such as the Script Encoding Initiative and UNESCO are trying to impose their interpretation as well.31
The second aspect that a student of technology must be sure to highlight is that of alternatives to the dominant technology; that is, we must avoid the tendency to see the ‘victorious’ technology as the only one in the field (Pinch & Bijker, 1987) and make room in our analyses for competing, though less successful, technologies too. In the case of Hebrew, we saw how visual Hebrew constituted competition to logical Hebrew; with Unicode, the competition, such as it is, would seem to be coming from what is known as the TRON project, a Japanese-based multilingual computing environment.32 Raising a theme to which I shall return below, a senior engineer from that project sees the ascendancy of Unicode as directly linked to its support from leading U.S. computer manufacturers and software houses, who promoted Unicode for reasons of economic gain, not out of consideration for the end user. Indeed, the full members of the Unicode Consortium are Microsoft, Apple, Google, Yahoo!, HP, IBM, Oracle, Adobe, and Sun Microsystems.33 Their economic gain, it is argued, lies in the development of a unified market, especially across East Asia, which would ‘make it easier for U.S. firms to manufacture and market computers around the world.’34 Programmer Steven J. Searle, leading representative of the competing TRON project, makes the point, therefore, that Unicode did not become the dominant standard on account of its technological superiority alone (indeed, that in itself is questioned), but rather because of the alliance of U.S. firms supporting it.