“Speaking Shadows”: A History of the Voice in the Transition from Silent to Sound Film in the United States


Department of Anthropology
University of Toronto
19 Russell St.
Toronto, Ontario, M5S 2S2, Canada


In this paper I examine the media discourse surrounding the voice in the silent to sound film transition in American cinema. When the technologies of synchronized sound became widespread in the late 1920s the question of how this new technology would be incorporated into the well-established film culture was of great interest, revealing some of the underlying ideologies of language at the time. These discussions worked to stabilize the new sound cinema around an ideology of the voice, closely tied to an ideology of American society, which became less audible as it became more certain, leaving behind its now naturalized structures of voiced race, class, gender and ethnicity. [voice, technology, cinema, race, gender]

The magic of the familiar is that it often seems entirely natural. It is because of this familiarity that moviegoers can watch an entire film without once being aware that there is no necessary connection between the bodies they see on the screen and the voices they hear as emanating from those bodies. It is only in unusual circumstances that these fragile connections become noticeable, circumstances usually involving a mismatch between expected voice and expected body which makes their conjuncture appear unnatural and reminds us that film is a construction and not a simple reproduction. For example, some years ago I watched an old Jackie Chan film which had been dubbed in a series of startling Australian accents, disturbing my expectations of the (naturalized) American accent generally used in dubbing in North America (and of the connections between visible and audible race) and bringing the act of dubbing into view. Yet the cinema did not always contain a voice and body unified and unremarkable. In the early years of cinema, it developed as a form mostly free of the human voice (or, at least, free of the voice as unified with the screen image).1 When the technologies of synchronized sound became widespread in the United States in the late 1920s, then, the question of how this new unification of sound and shadow would be incorporated into the already well-established film culture was a matter of great interest, revealing some of the underlying ideologies of language current at the time.

In this paper I take a closer look at the development of this seemingly seamless joining of voice and body through an examination of some of the public discourses surrounding the voice in the transition from silent to sound films in American cinema. First, I take a look at how the voice has been considered in scholarly work, suggesting that there is a place for work that considers voice as a category beyond its use as the metaphorical location of authenticity. Next, I give a short history of the silent to sound transition itself. In the 1950s, the story of this period was told in the popular film musical Singin' in the Rain (1952). The story of an established silent film star and an aspiring actress finding their way through the uncertainty of the transition, Singin' in the Rain embodies many of the issues that surrounded the unification of voice and body in film: those of gender, ethnicity, race, and class. While the film was made twenty years after the transition, amidst very different social conditions, it is what most people think of when they think of this period. In this paper I will use it as a way into the historical issues through their mythologization as popular film, alongside an examination of commentary in popular film media of the time, such as Variety and Photoplay. Even before the full advent of sound, film critics such as Edwin Justus Mayer (in an article titled “Speaking Shadows”) wondered whether

the blending of shadow and voice is really a desirable thing after all both to the public and the star. The possibility that Madge Kennedy will perhaps be heard saying, “I love you,” is not particularly alarming, for instance, because Miss Kennedy's charming voice helped her to achieve her stage triumphs. But with others how will it fare? When dainty ingenues are heard to speak with a Forty-seventh street (New York) accent? When burly heroes are heard to speak with a lisp? When granddames are heard to talk as though they lived, off screen, on Eighth avenue (also New York)? [Moving Picture Stories 1920]

Discussion of the “talkies” in the media revealed a number of propositions underlying ideologies of the voice and its connection to bodies and star personas. Not just anyone's voice was validated by the media as worthy of being heard and appropriate for mass consumption. This discussion revealed that the unity of body and voice was a desirable thing, but only when there was a “match” between the social meanings of the voice and those of the image. On certain occasions commentators found this match of body and voice in black actors and actresses. In the case of women stars, this match required the proper class accent and the proper ethnic accent (as above, where the prospect of an ingénue with a Forty-Seventh St. accent is off-putting); while the discussion on men centred on the voice as an expression of masculinity (therefore the horror of a lisping hero). The movie magazines of the time, then, exemplify the heteroglossia that Bakhtin (1981) discusses in the case of the novel. They took what many perceived to be a cacophony of voices and organized them into categories of success and failure. These discussions worked to stabilize the new sound cinema around an ideology of the voice, closely tied to an ideology of American society, one which became less and less audible as it became more certain, leaving behind its now naturalized structures of voiced race, class, gender and ethnicity.

Making Voice Audible

In cinema studies the voice is often the forgotten sister of the image. As Chion states, “discussions of sound films rarely mention the voice, speaking instead of ‘the soundtrack’ ” (1999:3)—a fact which necessitates a move towards, as Branston suggests, new “ways of thinking about voices in cinema in their perceived relation to the body” (1995:37). Chion suggests that voice is forgotten because it becomes submerged in speech: “By what incomprehensible thoughtlessness can we, in considering what after all is called the talking picture, ‘forget’ the voice? Because we confuse it with speech. From the speech act we usually retain only the significations it bears, forgetting the medium of the voice itself” (1999:1). That is, we take the voice at its referential face value. When it first appeared in cinema, however, the voice was not forgotten by commentators, nor was it only confused with speech or its content in words. It was hyperaudible—both entertaining and disturbing.

As Webb Keane describes it in his article on voice, “the concept of voice, meaning the linguistic construction of social personae, addresses the question ‘Who is speaking?’ ” (2001:268). Yet, when “voice” is used in anthropology, even linguistic anthropology, it often slides into metaphor. Amanda Weidman, in her study of the politics of voice in classical music in South India, suggests that “in anthropology especially, the voice [. . .] has been identified as a vehicle of empowerment, self-representation, authentic knowledge, and agency. The assumption that underlies this metaphorization of voice, a central tenet of western philosophy, is that the speaking subject is the ground of subjectivity and the source of agency” (2006:11). From this follows the use of “voice” to refer to a group's or individual's opinions and perspectives. Thus a title like American Silent Film: Discovering Marginalized Voices (Bachman and Slater 2002), where voice stands for the agentive presence of subjects, rather than a necessarily linguistic or auditory presence. Indeed, Weidman suggests that modern subjectivity itself “hinges on the notion of voice as a metaphor for self and authenticity and on the various techniques—musical, linguistic, and literary—by which particular voices are made to seem authentic” (2006:8). While the term authentic is itself seldom used in movie magazines, much of the discussion centers on these questions of voice and authenticity, within the context of an especially modern technology.

This association between the speaking subject and modernity has been traced by a number of those interested in language ideologies such as Bauman and Briggs (2003). As Weidman suggests, within the context of modernity there are two main ways in which the voice becomes conceptualized:

on the one hand, the association of voice with agency and sincerity is at the heart of notions of the rational subject; the voice in this sense is imagined as referring to, or directly expressive of, an individual, interiorized self. On the other hand, such a notion of voice is formed in relation to other voices that come to be labelled in their plurality “oral tradition”—those voices which call attention to performance, sound, and materiality and thus fail to privilege referentiality. [2006:8]

Miyako Inoue, in her studies of Japanese women's language (2002, 2003, 2006), analyzes an exemplary case of this, where the circulation of “fragments of female voices” by male intellectuals in the late 19th and early 20th century became essential to the intellectuals' development of themselves as rational modern subjects (2006:70). The movie magazines of the 1920s and 1930s develop both of these notions of voice. The rational speaking subject of the stars with “good voices” is developed in relation to both those whose voices draw too much attention to their materiality (accent, etc.) and those for whom the referentiality of the rational speaking subject is not allowed (e.g., black actors). In the voice, then, there is an excess beyond language. As Weidman suggests, “ideas about the voice (what it should sound like, where it comes from, how it relates to a singer's or speaker's body, its status in relation to writing and recorded sound, etc.) are undoubtedly part of language ideology, just as they are part of a perhaps broader regime or politics of voice that includes but is not exhausted by language” (2006:9–10). As Feld et al. suggest, “the physical grain of the voice has a fundamentally social life” (2004:341). Weidman suggests that there is a need to “imagine the voice itself as having a history. In doing so, we begin to think toward a critical anthropology of the voice, one that might destabilize [. . .] the representational metaphor of ‘voice’ ” (2003:2).

Just like the body, which is increasingly viewed as something which must be historicized and contextualized, “the voice” is not a natural object, but a discursive category (with physical manifestations) which is produced in particular moments in time and space. For example, we must remember that the voices discussed here are not being directly produced by human vocal cords, nor are the bodies discussed flesh and blood. Instead, they are previously recorded sounds produced by loudspeakers and unified, in the minds of audience members, with images produced by passing light through film (see Figure 1). Yet the commentators of the time address them both as technological representations and as authentic reproductions of actors and actresses. With sound technologies (not just sound film), the category of voice expanded to include a voice separated from its owner's physical presence, a voice which is then reattached to a representation of its body in sound film. The unsettling of the status quo during the silent to sound transition allows us to denaturalize the fusion of voice and image in cinema as it currently exists, to remember that, as Altman states, “recordings do not reproduce sound, they represent sound” (1992:40). It gives us an insight into “voice” as a category continually in development, with particular interactions with categories of gender, race, class, nation and technology. This, then, is one moment in the history of the voice.

Figure 1.

Western Electric Sound System Advertisement, Photoplay September 1929

“Out of the Silence”: A Short Introduction to the Silent to Sound Transition

While the voice is generally overlooked in studies of the cinema, in the consideration of the transition from silent to sound cinema in Hollywood it is hyperaudible. As told in stories about stars with failed voices and movies such as Singin' in the Rain, the transition was a period of upheaval, where the right voice could make or break a career. As Crafton states, “over the years the story [. . .] has been retold so many times that it has become a kind of urban legend [. . . with] the emphasis often on the effects of sound on individual actors—the great lover whose career was wrecked by a squeaky voice” (1997:1). While often described in terms of an abrupt change, the shift between silent and sound cinema was in fact more of an evolution than a strict division (Crafton 1997:4). In fact, the term silent film is a bit of a misnomer, as silent films were seldom silent, often including live or recorded music, sound effects, and other accompaniment. Throughout film's early history experiments in synchronized sound were developed and exhibited many times without much long-lasting success. The Film Daily 1929 Yearbook, in a special section “Sound Pictures: Revolutionizing an Industry” traces the beginning of the revolution to August 7, 1926 when Vitaphone (a synchronization system) was introduced with the film Don Juan at the Warner Theater in New York. Don Juan was not a talkie, as such, but it had a synchronized score and was shown with a short talking film. Film Daily described it as “electrical” and suggested that “through the industry the event was hailed as presaging a new era in entertainment.” Yet the beginning of the end for silent film is often dated a year later to the release of the movie The Jazz Singer (1927), a film starring stage performer Al Jolson singing in blackface, and to Al Jolson's second sound picture The Singing Fool (1928) which broke box office records. 
During the transition years (approximately 1927 to 1931), silent films were being made as well as talkies, and talkies often included only certain segments of synchronized dialogue and sound (percentages varied, allowing films to be advertised as “all dialogue” or “part dialogue”). It was only in the 1930–1931 season that distribution in the United States switched completely to full-sound features (Gomery 2005:2). Sound technology also moved into the international film world, often mediated by American engineers and companies, who saw their project as one of bringing a modern American unification of sound and image to the world (Thompson 2004).

While scholars such as Williams (1992) challenge the mythology of events which traces the changes in the cinema in terms of “bad voices” and “good voices”,2 an attention to both what was said about voices at the time and what has been repeated in the folk history can reveal some interesting ways in which voice in the cinema is imagined and related to ideas of embodiment, gender, race, and ethnicity. The story of the voice in the silent to sound transition, then, might be described using Miyako Inoue's term as a “socially powerful truth” (2002:393). Inoue uses this term in relation to Japanese women's language, suggesting that it “is a critical cultural category [. . .] a space of discourse in which the Japanese woman is objectified, evaluated, studied, staged, and normalized through her imputed language use and is thus rendered a knowable and unified object” (2002:393). The importance of voice as it travelled through industry magazines like Variety, fan magazines like Photoplay and the pages of the New York Times, is thus less whether John Gilbert's voice “recorded badly” or not, but how discussions formed the voice as a “critical cultural category.” An examination of the classic instantiation of this “socially powerful truth,” the movie musical Singin' in the Rain, might help to bring out some of the relevant issues.

The movie Singin' in the Rain is what many of us picture when we think about the transition from silent to sound cinema. In the film, Don Lockwood (played by Gene Kelly) and Lina Lamont (Jean Hagen) are silent film stars at the top of their game when the news about the instant success of The Jazz Singer disrupts their world. While Don's voice is just fine for pictures, Lina Lamont's is not (Branston describes it as a “signifier of all that is to be rendered outside the feminine norm: ‘harsh’, ‘over-loud’, with a Bronx accent clearly signified as vulgar and lower class” (1995:41–42); Crafton as “like a chainsaw” (1997:2–3); and Chion as “shrill, nasal, piercing” (1999:133)). After a preview showing of their first sound film is laughed out due to their melodramatic acting and Lina Lamont's unacceptable voice, Don and the studio enlist Kathy Selden (Don's girlfriend, played by Debbie Reynolds, who, according to Crafton is “everything Lina is not. She is bright, spunky, and independent, has a golden voice, and she can dance!” (1997:2–3)) to ghost (dub) Lina's voice. The new version of the film is launched successfully, with Lina lip-synching a song at the premiere to Kathy's live accompaniment hidden behind a curtain. The deception is finally unveiled by Don and a studio head, when they draw back the curtain to reveal Kathy singing. Thus, the proper connection between voice and body is restored, as Kathy wins Lina's career and screen partner (despite the fact that there seems to be little evidence that she shows well on film).3 As Chion describes it, “the audience [at the film premiere] understands and attributes the voice to its true body. [. . .] The voice carries the day in this strange contest where men, those who decide whether to raise or lower the [. . .] curtain, play at being masters of the voice” (1999:133).

Matching the Voice to the Image

Singin' in the Rain brings to the forefront one of the key issues in discussions of the melding of voice and image in the talking picture: that of the need for a match between the sound of the voice and the look of the body. Not only was this match considered necessary, but, as is obvious from Lina's loss of her film career, the voice and the body had to belong to the same star.4 As Chion points out, “the sound film [. . .] is dualistic [. . .]. The physical nature of film necessarily makes an incision or cut between the body and the voice. Then the cinema does its best to restitch the two together at the seam” (1999:125). There is no necessary connection between the voice and the body on the screen, although, as Chion states, “we are often given to believe, implicitly or explicitly, that the body and voice cohere in some self-evident, natural way” (1999:126). Instead, this connection is created through the techniques of cinema and the discourses surrounding it.

While dubbing was used temporarily as a stop-gap, the idea that there could be two people involved in the sound cinema, one providing the image and the other the voice, was discounted by film critics and those involved in film production as untenable. Eyman quotes Sam Warner as stating that “the audience would soon discover what was happening and resent the imposition . . . suppose the thing could be faked . . . what a battle there would be between the face and the voice for the money!” (1997:80). An article in Variety describes the difficulties one British production company was encountering with the issue:

Ghosting dialog for dumb stars has British International in for a lot of razzing. Company has had to explain that its first all-talker, “Blackmail,” just trade-shown, had to double Anny Ondra, Czechoslovakian, as Anny no speak English and picture was made into a talker after having begun as a silent production.

Joan Barry stage actress, doubled and got no screen credit [. . .]

Company officials say Hollywood used to double hands, legs and even whole bodies and never gave screen credit to doubles–so why the squawking on voices.

Press boys and stage actors reply that voice is the chief added attraction in a talker; when an English stage actress is doubling for a dumb bohunk in a British production they're against plugging the outsider and giving the native the blue pencil.

Patriotic argument usually wins over here and producers are burning plenty about how the mob will take all this. [Variety 1929a]

Here, the voice is presented as naturally connected to the image (this, presumably, is the reason no credit is given to the voice double—to credit her would be to destroy the illusion of unity and thus discredit the picture).5 To separate them, while a seemingly straightforward solution to the problem of a non-English speaking star, generates patriotic outcry. The reason the voice doubling generates an outcry not occasioned by the doubling of hands and legs is not just the added attraction of the voice (after all, legs are an attraction), but the modern location of authenticity and self in the voice, as discussed earlier. To let one's legs go uncredited does not offend one's personhood in the same way as letting one's voice go uncredited would.

It is this assumption of a natural fit between body, voice, and self which shapes the intense focus on the act of hearing the voices of the stars. If the voice is an expression of inner personhood, then it must match the outer persona, in order to maintain a unity of self. Thus James R. Quirk's editorial in Photoplay (1929c):

Colleen came through her test with a voice that matched her sweet personality, and from Clara's voice the sound apparatus returned a pert echo that fitted her shadow self perfectly.

One hundred per cent was the report of the new gods of the studios, the sound technicians, on the inimitable Marion, and the same judges said that Corinne's voice sounded like Corinne looked. You cannot ask any more than that and expect to get it.

The ideal result of a sound test is a voice that “matches” or “fits.” Here the voice is something beyond words, Clara Bow's “pert echo” which fits her “shadow self.” As Herb Howe in “Hollywood finds its Voice” flippantly writes, “one must cultivate a pleasing voice, or at least one that matches one's pictorial personality, like the perfume or the cigarette” (Photoplay 1928b).

This idea of a match between stars' voices and their bodies is audible behind the outcry over the matches that failed. William deMille, a director, summed it up by stating that

Many delightful young women lose all their charm the moment their voices are heard; stalwart “he-men” may shed their virility with the first sentence they speak; the rolling Western “r” gives the lie to an otherwise excellent “society” characterization, and uncultured enunciation destroys the illusion created by beauty. [Scribner's 1929 quoted in Crafton 1997:450]

Here greater authenticity is located in the voice, as hearing it can destroy the “illusion” created by the image. Likewise, the critic George Nathan makes veiled references to Clara Bow, Mary Pickford, and Greta Garbo, in an acid description of the disillusioning effects of hearing disappointing stars' voices:

The yokel who once imagined that the Mlle. X., were she to whisper to him “I love you,” would sound like a melted mandolin, now hears his goddess speak like a gum-chewing shopgirl. The worshipper of the Mlle. Y.'s seductive girlishness now beholds her, in the grim, hard light of the talkies, to be a middle-aged woman with the voice of a middle-aged woman. The farmhand who once dreamed of the Mlle. Z. as an exotic and mysterious dose of cantharides will now see her simply as a fat immigrant with deradenoncus and over-developed laryngeal muscles assisting in the negotiation of pidgin-English. Valentino died in time. Think what would have happened to his flock of women admirers if the unsparing lighting of the talkies had betrayed his imminent baldness and the movietone his bootblack voice. [American Mercury 1929 quoted in Crafton 1997:451]

Despite the talk around disappointingly non-matching voices and bodies, though, the melding of sound and sight was envisioned as opening new and wonderful doors for the cinema. Mrs. Denison Clift, a director's wife, is quoted in Variety as making “quite an impression” with her remark that “there is a secret chamber in the heart of every human being that can be opened only by the human voice, and that is what your sound and effect pictures will do if properly handled” (Variety 1928a). Edmund Goulding, a director, states his hope that with the sound film

Now, the whole will be presented, for the story, being seen and heard, will be fully sensed for the first time in human history. This theatre of the future will completely picture human life. The world and all its human mind and soul reactions, every detail of its drama–its intensity, throbs, holiest emotions and worst iniquities will be, not merely thinly imitated, but will be reproduced in actuality, including sound! [Variety 1928e]

and an advertisement for Vitaphone in Photoplay (1929b; Figure 2) read as follows:

Figure 2.

Advertisement for Vitaphone, Photoplay January 1929

Audiences are saying it, everywhere—At last, “pictures that talk like living people!”

Vitaphone recreates them All before your eyes. You see and hear them act, talk, sing and play—like human beings in the flesh!

These commentators all praise the sound film as being particularly “human.” Here, the authenticity which resides in the speaking subject, “the human voice” which can open “the secret chamber of the heart,” is central. To speak (and to hear the voice) is to be human. Yet, as is clear elsewhere in these magazines, just to “talk like living people” is not enough. Only some kinds of voices from the world's drama were heard as being appropriate for an on-screen melding of sound and image.

The Jazz Singer: Race and Voice

In Singin' in the Rain the catalyst of the silent to sound transition is the film The Jazz Singer. The Jazz Singer is less a talking picture than a singing picture. While it has a few segments of synchronized speech, The Jazz Singer is anchored by its star Al Jolson's singing performances, most of which he does in blackface. In The Jazz Singer, Al Jolson plays a young Jewish man, the son of immigrants, who is caught between his father's wish for him to be a cantor and his own desire to be a jazz singer and burgeoning career as a blackface performer. Thus, the voice enters film surrounded by tensions of ethnicity and race.

Michael Rogin (1996) has suggested that blackface in films in the late 1920s and early 1930s served to Americanize immigrants. It “accepted ethnic difference by insisting on racial division” (Rogin 1996:56). That is, immigrant characters such as those played by Al Jolson become American through their performance of blackness. But while films like Singin' in the Rain and The Jazz Singer leave actual black actors out of the picture, Alice Maurice (2002) suggests that some critics in the twenties and thirties saw African American voices as particularly appropriate for the new talking pictures. In 1929 the first feature-length movies with all-black casts were released; “both were musicals, and both capitalized on the combined ‘novelty’ of an ‘all talkie’ and ‘all Negro’ spectacle” (Maurice 2002:31). If the ideal was a voice which matched the image perfectly, then black voices were heard as the perfect match. Maurice quotes from a reviewer named Robert Benchley who articulates the suitability of black voices for the sound cinema in his review of Hearts in Dixie (one of the two aforementioned musicals):

With the opening of “Hearts in Dixie” . . . the future of the talking-movie has taken on a rosier hue. Voices can be found which will register perfectly. Personalities can be found which are ideal for this medium. It may be that the talking-movies must be participated in exclusively by Negroes, but, if so, then so be it. In the Negro the sound-picture has found its ideal protagonist. [Opportunity: A Journal of Negro Life 1929 quoted in Maurice 2002:32]

Unlike some white male voices which did not fulfill the promise of appropriate gender made by their body (the “stalwart he-men” who “shed their virility with the first sentence they speak” described by deMille), “in the case of black voices, the action is reciprocal: colour/race promises a particular kind of sound, and that sound, once heard, is supposed to refer back to the colour/race that produced it” (Maurice 2002:33). The New York Times review of Hearts in Dixie (which described it as an “Outstanding Achievement in Dialogue and Singing”) comments extensively on the authenticity of the film, its racehorses, steamboat, and its sound:

The same fidelity to detail is conspicuous in those scenes in the cotton fields where the black folk are heard singing as they go about their work with no great haste. The steamboat's whistle is heard and so that it will seem all the more natural the whistle becomes fainter and as the camera leaves it and approaches nearer the Negro workers their melodies become louder, but never so loud as to spoil the effect. The vocal renditions and the talking are a tribute to the patience of the producers. [New York Times 1929c]

Yet what was audible as authentically black and “wonderfully natural” was not only a particular timbre of voice, but also a particular racially marked accent. Thus, an advertisement for Hearts in Dixie in the New York Times hails the audience with a Southern accent: “Goin' South! All Abo'd Fo' Vickburg-Natchez N'awlins! The Fields of cotton and the land of Song! HEAR the happy beat of ‘Hearts in Dixie’ ” (1929a; Figure 3). Concerning another film, Herb Howe gossips in Photoplay that “the Christies hired a troupe of colored players from a Los Angeles theater and a white man had to be engaged to tutor them in Negro dialect. I guess they'd never heard of Mammy. She's in the cold, cold ground so far as they're concerned. And so goes another illusion with Santa Claus” (1929i). The “authentic” accent is not necessarily the one you use in daily life, but the one which matches up to audience expectations. African American actors, then, had matching voices when they had an accent, when their voices were audibly marked as a particular kind of African American (generally speaking, Southern).

Figure 3.

Advertisement for Hearts in Dixie, New York Times, March 7, 1929

Yet while African American performers' voices may have been heard to “register perfectly,” the racial segregation of the movie industry guaranteed that they would not be the main players in the new sound film. For instance, Hearts in Dixie was not promoted in the same manner as films with mostly white casts. The previously quoted New York Times review is part of a review spread that includes publicity stills from the other films and an illustration for Hearts in Dixie. Likewise, the advertisements for the film elsewhere include no photos of the film's stars, but instead only illustrations. While ads for films with non-black stars in the New York Times at that period included both illustrations and photos of stars, ads for Hearts in Dixie, with an almost all-black cast, included no photos. While black actors like Stepin Fetchit (the stage name of Lincoln Perry) became famous and their voices may have been lauded, they were not accorded the same star status. Those positions would be reserved for actors whose issues of voice would be discussed in terms of gender, ethnicity, and class, naturalizing the role of their whiteness.

Recording Gender

As discussed above, the ideal during the transition and in the new sound cinema was that of a union of voice and body, where the voice matched the expectations raised by the body. These expectations were, in many ways, determined by discourses around appropriate gender performance. Weidman (2003) has discussed how in the development of Classical Indian music in South India in the early twentieth century, the idea of a disembodied recorded voice allowed certain groups of women to participate in musical performance and recording—stating that “in South India, the disembodied female voice came to be thought of as the essence of music itself” (Weidman 2003:22).6 She references a male reviewer who suggested that one reason “for women vocalists' rising popularity was the gramophone. Indeed, in the early years of recording in South India, more records were made of female vocalists than male vocalists (Menon 1999:74). The recording microphone ‘favoured’ female voices perhaps because their higher pitch made gamakas more easily audible when reproduced” (Weidman 2003:20–21). In contrast, as she goes on to point out, the opinion on the suitability of gendered voices for recording was exactly the opposite in the West.

Indeed, media references around the transition implied that women's voices were less suited to the technology than men's.7 This was not a phenomenon exclusive to the cinema: McKay has described how female radio announcers were initially unsuccessful, with listeners commenting on the unsuitability of their voices, such as “that women depend upon everything else but the voice for their appeal. Their voices are flat or they are shrill, and they are usually pitched far too high to be modulated correctly” (quoted in McKay 1988:200). Crafton suggests that weakness in the voice (for which tutors were being hired in the early days) was “almost always a female trait. Their travails when they faced the technology of the recording microphone was said to be physiological:

Most of their [the USC experts'] effort will be concentrated on the feminine player. “Women, more than men,” states one of the professors, “will be forced to [take] intensive and scientific training for talking pictures, because of a simple scientific fact. The voice of a man is naturally heavier, vibrating at between 100 and 300 vibrations a second, while woman's goes up to around 500 to 700. At this vibration the sibilant sounds, such as the ‘s’, ‘z,’ the hard ‘o,’ ‘x,’ and ‘p’ become hisses or blasts, as they are vibrated at a higher speed than the balance of the vocal sounds” that is why, he explains, “few soprano singers have succeeded in making successful phonograph records.” [Mayme Ober Peak, quoted in Literary Digest, 20 October 1928, quoted in Crafton 1997:453]

Ruth Waterbury, in Photoplay (1930a), makes a similar suggestion that “it is easier for a male star to succeed in talkies than it is for a female star” because of scientific differences between men's and women's voices:

[Science] knows the average female voice is just an octave—that is, eight notes—above the male voice. It knows, likewise, that the bass voice has the greatest auditory range; the tenor next; then the contralto; then the soprano. This makes male voices easier to reproduce than female voices and bass voices better than tenors and contraltos better than sopranos. Yet, just to be contrary, the greatest personality voices are those of tenors and sopranos.

As the tag line of Waterbury's article states, the sound movie is a “wedding of science and romance,” with science playing the part of the groom and romance the part of the bride. Science, then, must examine its bride. This trope appeared not only in Photoplay's reporting, but also in its fiction set in the movie industry. During the transition period, at least three different stories traced the troubles of a female star and her romance with a man involved with the recording industry. In “The Broad A Baby” (Photoplay 1929g), film star Brenda meets a recording engineer at a studio party. A romance follows, but trouble ensues when the voice training she has been receiving fails to garner her good reviews. Unlike Singin' in the Rain, where the recording engineers themselves are not important characters, in these stories they are central. Here, the negotiations in the industry are played out on the stage of romance, with science represented by a man and the voice by a woman. And if the voice has troubles, they must be female troubles. Yet, for all the discussion of the difficulties of recording women's voices, the one specific star Photoplay incessantly discusses as having trouble recording his voice is John Gilbert, a man.

These stories about the difficulty of recording women's voices strongly contrast with the fact that, as discussed earlier, African American voices were talked about as particularly well suited to recording technologies. For example, an advertisement for Hearts in Dixie suggested that “Stephen [sic] Fetchit [is] the funniest thing imaginable, his dialect recording perfectly via Movietone” (New York Times 1929b). Likewise, Bill Foster, an actor and agent, claimed that “tests proved one great outstanding fact—the low, mellow voice of the Negro was ideally suited for the pictures” (Maurice 2002:45). What Maurice does not discuss is whether this “low, mellow voice” is a particularly black male voice, or whether black women were also heard as evading the apparent difficulties of recording. In any case, while white women's voices may have been seen as presenting more difficulties to the technology, their bodies were an important part of the star system and they could not be cut out.

Speaking American: Gender and Accent

Beyond the question of recordability, then, issues of gender-appropriate speech were still in play. Different aspects of the voice were considered inappropriate for men and for women. Generally speaking, inappropriate voices for female stars were those that too strongly indexed working-class status or non-Anglo-American ethnicity. Thus, in Singin' in the Rain, Lina's unsuitable voice is very much a classed (particularly New York) one. As Rogin describes it, the end of Singin' in the Rain “reunites image to voice by freeing the all-American girl from the older ethnic woman whose voice she has been ventriloquizing” (1996:205). Just as Rosina Lippi-Green has suggested in the context of Disney films, “to be truly sexually attractive and available [. . .], a character must not only look the idealized part, but he or she must also sound white and middle-class American or British” (1997:97).

The prominence that the issue of accent had in the transition speaks to the high visibility of immigration in the United States at the time. Between 1900 and 1914, more than 13 million immigrants arrived in the United States, mostly from Europe (Hollitz 2004:140), with an increasing percentage of them being Eastern and Southern Europeans. The early 1920s saw the increasing success of efforts to restrict this immigration. National and racial differences were key topics of discussion. The silent film industry had a number of actors whose first language was not English and/or who came from outside of North America. This was one of the common tropes in Photoplay for representing the movie industry. Thus, the following joke in Photoplay (1928a): “Just for the benefit of historians, I want to record the first talking picture gag. They are saying that a certain producer ordered a retake of a dialogue scene because he couldn't hear the ‘k’ in ‘swimming.’ ” This joke is, as it says, particularly a talkie joke. Its humor rests on the shared assumptions that producers speak with an accent (perhaps an Eastern European Jewish one?) and that the readers of Photoplay know better. In one talking short reviewed in Variety called “America or Bust” and described as “a gem,” the main character, “the diminutive Daphne is a cockney woman making her sixth attempt to crash Ellis Island, having missed the quota on five previous occasions. She wants to get in because ‘Arnold, her freckled kid, wants to see the Hinjuns’ ” (Variety 1930b). Here the accent of the immigrant matches her immigrant image in a particularly American story.

However, in many cases to be too much of an immigrant was not desired. As Rogin describes it, “cultural guardians feared early silent cinema as an immigrant menace to the dominant culture. Attending storefront nickelodeons and small-time vaudeville in their own neighborhoods, immigrants watched (in addition to foreign imports) depictions of life around them and comic violence against authority” (1996:78). As cinema became more popular, “the 1920s motion picture palace, with its narrativized feature, live orchestral accompaniment, lavish appurtenances, and mass audiences, silenced and incorporated the participant, immigrant crowds” (78). Many reviewers before the full advent of sound film comment on the possibility that the stars' voices would not match those that the audience has imagined for them. The unspoken assumption behind these worries is that the audience is imagining an idealized white Anglo middle-class voice, that their imaginations have been assimilated and that any disruption of this assimilation by immigrant or lower class voices coming from the mouths of stars would disrupt the audience's more general assimilation. Crafton too suggests that statements from producers such as Fox and Sarnoff that the actors could not speak properly reveal that “they expected the talkies to disseminate an ideal of cultural homogenization and assimilation through quality speech” (Crafton 1997:449).8 Thus, Vilma Banky's “decidedly heavy” Hungarian accent would not allow her to make sound films (Eyman 1997:267). Some of the vitriol in George Nathan's mockery quoted earlier (“The farmhand who once dreamed of the Mlle. Z. as an exotic and mysterious dose of cantharides will now see her simply as a fat immigrant with deradenoncus and over-developed laryngeal muscles assisting in the negotiation of pidgin-English”) depends on the identity of immigrant being negatively viewed. Certainly that attitude is also present in the earlier excerpt concerning “ghost voicing” (“when an English stage actress is doubling for a dumb bohunk in a British production they're against plugging the outsider and giving the native the blue pencil”). The accent marks the actress as an outsider and unable to be properly incorporated into the star system.

However, accents, if they fell properly into the matching of voice and image and personality, were sometimes welcomed. Certain accents were seen to be particularly appropriate to comedy, as a few reviews of short comedy films in Variety speak favourably of “hebe dialect comedy” and of what seems to be a fake Spanish accent skit:9 “Smith and Dale arrive in a gondola to keep a date. They spill some mild hebe dialect comedy before disembarking” (Variety 1929b); Nat Carr is described as “a hebe dialect comedian with a peculiar type of voice capable of glorifying eccentric lyrics [. . .] In his opening number ‘My Hungarian Rose’ [. . .] Carr injects various Yiddish vocal expressions, also adding a great deal of interest in the offering, particularly surefire here” (Variety 1928c); Billy and Elsa Newell's piece “Those Hot Tamales” was written up as “good talkie material, registering nicely on appearance and voice [. . .] acts of this type seem to manage nicely in the talkers” (Variety 1928d).

Similarly, Alastair Phillips (2002) has explored the move of Charles Boyer, a native French speaker, into the American sound cinema and how he managed to create an image of which his accent was a great part. Within this context, immigrant performers acted to “create exotic aural and visual appeal for domestic North American film-goers” (Phillips 2002:188). Charles Boyer developed a “commercially successful profile with US and international audiences as ‘the French lover’, and it introduced that key mixture of emotion, romance, Frenchness and authenticity that came to define his star persona in many of his subsequent Hollywood roles” (Phillips 2002:190). Likewise, Photoplay praises Adolphe Menjou in Fashions in Love, saying that he “breaks out with a voice, a French accent and the best performance he has given in many a movie moon. [. . .] His French accent is excellent, although he was born in Pennsylvania. Not a great picture but big entertainment” (1929e). Thus, if an accent was sufficiently “intelligible” and matched the star's screen persona and image, then the voice and the image could fuse together harmoniously.

“Lusty as Walt Whitman”: Masculinity, Voice, and Melodrama

A man with an accent, then, was acceptable. What was unacceptable in male actors was an insufficiently manly voice. As Chion discusses, the outfitting of a body with an inappropriate voice (in sex, age, expression) in film produces “a profound malaise” and is most popular in horror films “giving a hoarse and vulgar voice for example to the little girl in The Exorcist” and occasionally in comedy, where there may be “amusement in exchanging male and female voices” (1999:132). Singin' in the Rain contains one such scene at the initial showing of the sound film Don and Lina make together. At this showing, the sound loses synchronization at one point and Lina's and the villain's voices are briefly exchanged, with the villain exclaiming “No, no, no” in her voice and she “Yes, yes, yes” in his. Branston suggests that while in cinema

the voice in both sexes seems to be more naturalised than, say, the face which is more readily seen as coded, as “made-up” [. . .] in both there are of course systematic codes at work, often inherited from theatrical melodrama, and often in relationship with assumptions about appropriate male and female voices in the rest of our social lives. The “evidentiality” of masculinity is often signified by a deep voice, and this in itself supports and recreates cultural over-emphases on real biological differences between men and women. [1995:38–39]

Men's voices which are heard as too high are a particularly popular topic in the myths of the silent to sound transition. John Gilbert, especially, is often cited as someone whose too high voice prevented him from successfully making the transition. This idea that a “somewhat high and nasal voice” (Chion 1999:12) would ruin someone's career only makes sense within the discourses discussed above of the perfect fusion of voice and image. As Photoplay exclaims:

Gilbert's voice! What about Gilbert's voice? What about the voice of the man who is virile as a steel mill, lusty as Walt Whitman, romantic as a June moon? Gilbert's voice! You heard it in ‘His Glorious Night.’ It is high-pitched, tense, almost piping at times. His friends have known for years that it was completely unsuited to the strength and fire of the man. [1930b]

So, Gilbert (and his image) is virile, lusty, and romantic, yet his voice (as recorded) is high-pitched and piping. A certain genre of masculinity is therefore equated with a certain timbre of voice and a mismatch generates instant disapproval. The appropriate voices, then, as described under the sub-heading of Voices by Edmund Goulding in Variety, are highly gendered in very particular ways:

The soft, insinuating voice of an Elsie Janis, the attractive utterance of a whispering Smith, the characteristic gruff shout of a policeman, voices which can imply so much more than their words say, will be sought-for treasures. Voices will be effective more because of their color and implication than because of any mere sound quality. Only when talking motion picture projection has been developed to a perfection not as yet attained will the quality and tone of the voice, its graded richnesses and tonal picturesquenesses be of interest to the public.

The girl who in a close-up can sing a soft lullaby to her baby and whisper—“good night, my darling,” in such a way that the camera might be listening in through the key-hole—she will be the new star. [Variety 1928e]

Here, women's voices are described as “soft” and male voices as “gruff.” The new female star will be one who uses her voice to express maternal sentiments and perform the approved care-giving in quiet tones, while the men will undertake the vocal work of maintaining law and order.

But it is not just the voice which must fit with the new balance between sound and sight. Williams suggests that the image also had to adjust, as men whose acting style was expressive (and somehow feminized) did not do well in the new film culture (1992:134–135). Thus, in Singin' in the Rain, Don's melodramatic pronouncement “I love you, I love you, I love you” in the first screening of his film generates laughter, not romantic sighs. Related to this, the voice and body action of crying became more limited for men, as “around the time of the transition to sound that male characters begin to cry far less frequently—and that crying begins to signify, not admirable sensitivity, but hysteria and sexual ambiguity” (Williams 1992:134–135). This change can be seen in a review of a short film Head Guy in Variety, which describes the actor Harry Langdon's “goof manner and dizzy gestures [as] always good for a laugh, but strongest when done in pantomime and without his verbal accompaniment” and states that “one of the weakest bits here and muchly in need of cutting was Langdon's ‘crying’ scene. Too drawn out and repetitious” (1930c).

Citation and Imitation

It may now be clear that throughout the transition, the movie magazines themselves managed a number of voices in order to report on the developments of the sound film, much like Bakhtin's heteroglossic novel which “permits a multiplicity of social voices” (263), “drawn in by the novelist for the orchestration of his themes and for the refracted (indirect) expression of his intentions and values” (292). In her examination of Japanese women's language, Inoue has looked at reported speech “as a product of the modern observer's social practice of listening and citing,” in particular, “how the male elite crafted narratives of the indexical order of linguistic corruption of schoolgirl speech and how this metapragmatic practice was a form of strategic containment to domesticate competing forms of Japanese modernity and modernization” (2006:70–1). In discussing language and alterity, Hastings and Manning (2004) have suggested, with reference to Goffman (1974), that we take a closer look at the analytic term of figures, especially cited figures and the “voices attributed to others—‘anti-registers’- [which] create monstrous or deviant figures of alterity, with respect to which the (normal) identity of the speaker emerges as a sort of unmarked ground to the figure of abnormal alterity” (304). How, then, is the multiplicity of voices cited by movie magazines in the transition?

These newly audible movie star voices appear in a number of ways in the fan magazines. Photoplay, of course, makes direct descriptive reference to voices, as discussed above, but they also appear as a cited figure and what Goffman might call a “mocking or say-for” figure (as Hastings and Manning refer to it, “acts of performative mimicry or ventriloquism with respect to individual persons or objects, but also including certain kinds of stereotyped ‘voices’ or ‘registers’ of social categories such as ‘baby talk, ethnic and racial accents, national accents, and gender role expressions’ (Goffman, 1974, p. 536)” (161)). Especially interesting are a series of interviews with stars by various Photoplay writers, which both cite the stars' marked voices and then reproduce them in the voice of the interviewer. Caught between mockery and playful imitation, these practices point to the underlying source of some of the tension behind many of the discussions of the voices on screen—the idea that the audience will reproduce in their own speech what they hear from the stars.

Take, for example, “The films go Baby Talk” by Helen Huston, in which she cites actress Helen Kane and then reproduces her style: “ ‘Init silly?’ Helen went on. (Aw, gee, she's the only person in the wurruld can do it and get away with it)” (Photoplay 1929h). Or “A Jungle Lorelai” by Herbert Howe (an interview with African American actress Nina May):

“Oh, you the gentleman from Photoplay magazine?” her eyes bulged and her being jelled. “Um-um! I just love write-ups!”

“Um-um!” said I. “I just love being a writer-up!” [Photoplay 1929a]

Or “Stepin's High-Colored Past” also by Herbert Howe:

“Mah real name is Lincoln Theodore Peary, yes-suh [. . .] understand what Ah'm talkin’ ‘bout?”

“Ah does,” said Ah. [Photoplay 1929f]

In these articles, actors outside the unmarked category (baby talk, African American) have their voices cited, then reproduced. Unlike the schoolgirl voices Inoue discusses, these voices are not being cited to be censured (although they do come off as remarkably racist representations). Instead, they are being collected and found slightly out of the ordinary, but entertaining for that very reason. Organized into the economy of voices, they are acceptable, but not central, not the voices of the modern rational speaking subject. The humor in these pieces, then, comes from the tensions between the writer's voice, their “mocking” voice and the cited voice. The writer has heard and reproduced a voice that contrasts with their own, one that perhaps they ought not to have.

“See and Hear!”: Audience and the Voice

The issue at hand was not only that the movies were talking, but that the audiences were hearing. Thus, many early advertisements for sound films included the tag line “See and Hear!” Letters to the Editor in Photoplay (in their colorfully named section “Brickbats and Bouquets”) covered this issue of the relationship between the sound films and their audience. Arguments over what kind of voices and what kinds of accents were best suited for the movies often hinged on theories of influence. Tom Boellstorff (2003) has suggested that the late 1990s controversy over a broadcasting bill in Indonesia rested on fears that dubbing foreign television into Indonesian (rather than subtitling it) would make it too Indonesian and would encourage viewers to model their own behaviour on that of the dubbed actors. Discussions in Photoplay draw on a similar kind of argument—that voices coming from behind the screen acted as a model for everyday behavior.

For example, in a letter to Photoplay, Mrs. W. L. Johnston from Brooklyn writes:

Last night I took my little girl to see a splendid picture, showing the unselfishness of a young, courageous Marine. It was applauded by old and young, and, I am sure, each youngster expected to take pattern. But my objection to the picture was that such expressions as ‘ain't cher,’ ‘yez gotta,’ ‘yeh’ and ‘I don't wanna’ were frequently used when correct English could easily have been employed. Naturally little children, especially from foreign homes, think that is English. My little French neighbor told me the result of a football game was “nuttin’ to nuttin’.” [1929d]

With a different opinion on the talkies, but similar assumptions, H. P. Doughty writes from Los Angeles:

I have been going with a very nice young man, well mannered and gentlemanly, but who, owing to a very little education, made many and noticeable mistakes in grammar. How to correct him without hurting his pride?

Then the talkies came along and solved my problem! We went to see ‘Bulldog Drummond.’ I remarked about the excellent English used. Then came ‘Charming Sinners,’ ‘The Idle Rich,’ ‘Dynamite’ and others, all containing dialogue with impeccable grammar, and yet not seeming stilted or affected. I made it a point always to point out the excellent diction of this character or that one. And, believe it or not, after these several months the ‘ain'ts’ and ‘I seens,’ etc., have disappeared from my friend's speech, and the improvements are still going on! Thus the talkie solved my problem. [Photoplay 1930c]

Here the concern is that immigrants and the undereducated become appropriately assimilated to the standard voice. The questions about what kinds of voices would end up on the screen were not only about how stars should sound, but also about what kinds of voices should end up coming out of the mouths of Americans.


In conclusion, media discourses about the voice in the transition from silent to sound cinema in the United States reveal a concern for an appropriately embodied voice (and appropriately voiced body) based on an ideology of authenticity which continued on in a less noticeable naturalized form after the hyper-audibility of the voice dispersed. Ideas about appropriate gender expression and the link between gender and certain vocal styles were a large part of the discussion surrounding the meaning of these newly audible voices. These theories of the vocal embodiment suited for the new talking pictures also interacted with discourses about the embodiment of African Americans and their voices in a way that constructed them as particularly fit to play a certain part in the new films. Finally, anxieties over accent and nationality prevented some silent film actors from crossing over into the talkies and allowed others to flourish, if properly matched with their star image. A closer look at periods like this one is an important part of constructing a critical history of the voice. As technologies developed which expanded the category of the voice and detached it from its physical localization, media discourse on the sound film worked to reattach it, at least in the arena of film, with stars' bodies, while endeavouring to smooth out many of its social markers. But despite all this talk about a properly embodied voice, it was not the only ingredient necessary for Hollywood success. As one reviewer in Variety trenchantly states about a short featuring actor George Jessel: “Pretty much of a flop due to lack of material. Reveals George Jessel's voice as natural, rich, and melodious for recording but does not reveal Jessel as a funny man” (1928b).


  • Acknowledgments. I would like to thank Paul Manning and Miyako Inoue, three anonymous reviewers, members of the Publication Club, Bonnie McElhinny, Tania Li and members of the Dissertation Writing Workshop for their feedback and helpful suggestions for revision of this article.

  • 1

    When voices did appear, they were often visibly disconnected from the screen. For example, during the early years of cinema in Japan, performers called benshi often narrated movies live (Fujiki 2006).

  • 2

    Williams suggests that, “for example, the decline in the careers of certain stars of silent cinema, such as John Gilbert or Ramon Novarro, is said to have occurred because their voices ‘recorded badly.’ This is at best an obscure notion, except in the cases of players such as Vilma Banky who had such thick accents that their lines were difficult to understand. Listening today to Gilbert or Novarro, one is struck by how well their voices recorded. And even foreign accents were not an insurmountable barrier to success in the talkies, as Garbo's case demonstrates. Leaving aside the truly vocally impaired—probably only a handful of players at most—the ‘bad voices’ of the most notorious Hollywood stars were probably in part a product of their association with melodramatic sensibilité” (1992:134–135).

  • 3

    Branston points out that “one of the ironies is that Jean Hagen, acting the part of Lina Lamont, dubbed the singing voice of the Debbie Reynolds' character, Cathy Seldin” (1995:41–42).

  • 4

    Although, as Siefert (1995) discusses, studios often broke this rule in the case of movie musicals, where dubbing continued longer, while still being dispreferred in publicity.

  • 5

    There are also some obvious issues around ethnicity/nationality here, which will be picked up later in the article.

  • 6

    See also Amy Lawrence (1991) and Kaja Silverman's (1988) discussions of the disembodiment of women's voices in classic Hollywood cinema.

  • 7

    There is even an odd little article in Variety which suggests that even women's speaking movements are less suitable for the medium than men's: “Curious feature is that the dubbing has been made locally in the Gaumont studios under the supervision of Tiffany's local film expert, using the office personnel instead of actors to do the speaking parts. Result only passable for the women, but excellent for the men, whose lip movements are less pronounced than the girls” (Variety 1930).

  • 8

    This vision of the impact of sound film was not unique to America, as Damousi (2007) discusses in her account of negative nationalistic Australian responses to the American accent in American sound films.

  • 9

    I do not intend here to suggest that the framing of these accents as appropriate for comedy was any sort of grand progressive move forward, but simply that accents did have their place in the transition between silent and sound film.