Viral video: Describing online multimedia information flows



This paper examines why certain videos go viral while others languish and die. To answer this question, metadata associated with YouTube videos was collected over three separate weeks in 2009 and analyzed. The results show that virality is intimately connected to how one defines popularity, because different components of the metadata highlight the importance of different sets of videos over time. For this reason, different elements of the metadata are explored, and examples from the dataset are given for each indicator of popularity.

Data Retrieval

First, the YouTube API was used to query lists of videos from the following categories: most views, top rated, recently featured, watched on mobile, most discussed, top favorites, most linked, most responded and most recent videos.
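The collection step can be sketched as a loop over the feed categories. This is a minimal illustration under stated assumptions, not the code used in the study: `fetch_feed` is a hypothetical callable standing in for whatever API client is available, and the feed labels simply repeat the category names listed above rather than actual API identifiers.

```python
from datetime import date

# Feed labels copied from the text; the real API identifiers may differ.
FEEDS = [
    "most views", "top rated", "recently featured", "watched on mobile",
    "most discussed", "top favorites", "most linked", "most responded",
    "most recent",
]


def collect(fetch_feed, day=None, feeds=FEEDS):
    """Return one record per (video, feed) pair for a single day.

    fetch_feed is a hypothetical callable: given a feed label, it returns
    an iterable of dicts, each with at least a "video" key.
    """
    day = day or date.today()
    records = []
    for feed in feeds:
        for entry in fetch_feed(feed):
            record = dict(entry)                   # copy; do not mutate input
            record["category"] = feed              # category of origin
            record["timestamp"] = day.isoformat()  # collection date
            records.append(record)
    return records
```

Because every record carries its feed of origin, a video that appears in several feeds on the same day yields several records.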

Second, both Google and YouTube were queried for additional metrics: the number of results that the Google search interface returns for the YouTube video ID, and the inbound link data that the YouTube interface provides when viewing the details of a particular video.

Creating the Database

All videos were collected by category and stored with a date and the category of origin. The resulting database has the same number of records every day (N per category) and allows for category overlap. Each record has the following fields: Video, pubdate, duration, author, category, ghits, views, favs, rating, votes, via, timestamp, title, href, ytqty, ytlinks, keywords (kw2-kw35).
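One way to hold such records is a single SQLite table. The sketch below is illustrative only: the column names follow the field list in the text, while the SQL types, the single `keywords` column (in place of kw2-kw35), the composite primary key, and the helper names `create_db` and `insert_record` are all assumptions.

```python
import sqlite3

# Column names follow the field list in the text. The SQL types, the single
# "keywords" column (instead of kw2-kw35), and the primary key are assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS videos (
    video     TEXT NOT NULL,
    pubdate   TEXT,
    duration  INTEGER,
    author    TEXT,
    category  TEXT NOT NULL,
    ghits     INTEGER,
    views     INTEGER,
    favs      INTEGER,
    rating    REAL,
    votes     INTEGER,
    via       TEXT,
    timestamp TEXT NOT NULL,
    title     TEXT,
    href      TEXT,
    ytqty     INTEGER,
    ytlinks   INTEGER,
    keywords  TEXT,
    PRIMARY KEY (video, category, timestamp)
)
"""

COLUMNS = [
    "video", "pubdate", "duration", "author", "category", "ghits", "views",
    "favs", "rating", "votes", "via", "timestamp", "title", "href",
    "ytqty", "ytlinks", "keywords",
]


def create_db(path=":memory:"):
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def insert_record(conn, record):
    """Insert one dict; fields missing from the dict are stored as NULL."""
    placeholders = ", ".join("?" for _ in COLUMNS)
    conn.execute(
        f"INSERT INTO videos ({', '.join(COLUMNS)}) VALUES ({placeholders})",
        [record.get(col) for col in COLUMNS],
    )
```

Keying on video, category, and timestamp together lets the same video be stored once per feed per day, which is what category overlap requires.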

Visualizing the Matrix of Relations within the Database

Network graphing with ReseauLu results in niche network profiles that identify the distribution of authors, genres and linking sites in a given time period.
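Before any graphing tool can draw such a network, the records have to be reduced to weighted relations between two kinds of entities. The sketch below counts how often two fields co-occur across records. It does not reproduce ReseauLu's input format or algorithm. The function name `cooccurrence_edges` and the default field pair (author and category) are assumptions chosen for illustration.

```python
from collections import Counter


def cooccurrence_edges(records, left="author", right="category"):
    """Count how often each (left, right) pair of values occurs together.

    records is an iterable of dicts; records missing either field are
    skipped. The result maps (left_value, right_value) to a count and can
    serve as a weighted edge list for a two-mode network.
    """
    edges = Counter()
    for rec in records:
        a, b = rec.get(left), rec.get(right)
        if a is not None and b is not None:
            edges[(a, b)] += 1
    return edges
```

Passing other field names, for example `left="via"`, gives the corresponding edge list for any pair of fields present in the records.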

Executive Summary of Findings

The general findings are similar week over week: user-generated feedback counts toward boosting popularity, understood as viewership over time. Many well-viewed videos score high on at least one user-generated metadata variable in addition to raw viewership. Corporate accounts (for example, film and music companies) appear to make up the bulk of the videos with high views but low user-generated feedback. Accounts that are more user-generated are also linked to more user-generated indicators of popularity. Cross-feed posting activity is more frequent than cross-genre posting activity at the top end of viewership for the week. Watched_on_mobile again seems to be an isolated, unique feed.
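Feed overlap of this kind can be checked directly from the stored records. The sketch below takes one possible reading of cross-feed activity, namely a video appearing in more than one feed, and is not the analysis used in the study. The function names and the `video` and `category` keys are assumptions based on the field list.

```python
from collections import defaultdict


def feeds_per_video(records):
    """Map each video ID to the set of feeds (categories) it appears in."""
    feeds = defaultdict(set)
    for rec in records:
        feeds[rec["video"]].add(rec["category"])
    return dict(feeds)


def cross_feed_videos(records):
    """Return only the videos found in more than one feed, feeds sorted."""
    return {
        video: sorted(feeds)
        for video, feeds in feeds_per_video(records).items()
        if len(feeds) > 1
    }
```

Under this reading, a feed whose videos never appear in the result would be an isolated feed.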


Figure 1. Genre by Feed


Figure 2. Top Genres


Figure 3. Top Link Providing Websites


This research was made possible by the Ontario Centres of Excellence, Government of Ontario University – Industry Partnership Program, as well as by the Social Sciences and Humanities Research Council Doctoral Fellowship Program, Canada.