Abstract

  1. Top of page
  2. Abstract
  3. Introduction
  4. Literature Review
  5. Theoretical Framework
  6. Research Questions and Hypotheses
  7. Method
  8. Data Analysis: Results of Hypotheses Testing
  9. Discussion and Conclusion
  10. Acknowledgements
  11. References

This study examines the design and evaluation of a key-frame arrangement framework for video storyboard surrogates. First, we constructed a model for arranging the key-frames of a storyboard based on the Preserved Context Indexing System (PRECIS) and narrative theory. Second, we evaluated the model with six sample videos and 26 participants and analyzed the results using a t-test and a repeated-measures MANOVA. The results revealed two things. Structurally arranged storyboards (STR), which follow the arrangement framework, are not more effective than sequentially arranged storyboards (SEQ) for either summarizing or indexing videos. Viewing storyboards repeatedly is effective for summarizing (but not indexing) videos. The order in which the two storyboards are viewed, that is, whether STR or SEQ comes first, is also crucial for improving subjects' summarizing ability: subjects summarized videos more effectively when they viewed STR after SEQ than when they viewed SEQ after STR.


Introduction

Video surrogates are used to make relevance judgments about a video or to identify its contents. Video surrogates can be textual, visual, aural, or any combination of the three. Many researchers have asked whether visual surrogates are sufficiently useful for summarizing or indexing videos. Previous studies have verified that each of these three types of surrogates plays a unique role and that visual surrogates (hereafter ‘storyboards’, or specially arranged key-frames) are most effective when combined with text or audio. It is now time for researchers to consider ways to enhance the quality of storyboards. At least two tasks are at hand: designing an algorithm for extracting key-frames that represent the theme of a video, and designing a model for displaying key-frames in a particular way. This study aims to design a model for displaying key-frames, to evaluate the model, and to suggest how digital video library systems can be improved so that users can browse or search videos more effectively.

Literature Review

The methods used for the control of and access to still and moving images have been grounded in conventions for textual classification and retrieval (Goodrum, 2001). Yang and Marchionini (2004) found that users liked to see visual surrogates when making relevance judgments, especially surrogates that contained motion, although topicality, a textual relevance criterion, was still considered the most important criterion for video relevance judgments.

Several studies have investigated the individual roles that textual and non-textual video surrogates play. Wildemuth et al. (2002) argued that textual video surrogates can facilitate the process of determining relevance, and that non-textual surrogates can effectively augment textual ones. Hughes et al. (2003) suggested that textual surrogates appear to convey information about the contents of a video, while non-textual surrogates appear to convey information about what the video is like. More recently, Song and Marchionini (2007) compared the effectiveness of three surrogates: visual alone (storyboard), audio alone (spoken description), and visual and audio combined (a storyboard augmented with spoken description). Combined surrogates were more effective and strongly preferred. Song and Marchionini also showed that spoken descriptions alone led to better comprehension of the video segments than visual storyboards alone; however, people preferred to have visual surrogates and used them to confirm interpretations and add context.

Several previous studies have tested whether visual storyboard surrogates can serve effectively as sources for indexing and searching videos. Stachowicz (2002) investigated whether an indexer using automatically generated storyboard surrogates could effectively assign subject keywords to digital videos. Indexing three videos from their surrogates took 82% less time than downloading and indexing the three complete videos, while the retrievability of the surrogate-indexed videos was only 6% lower. Using 12 sample videos and 14 participants, Kim (2007) examined whether visual storyboard surrogates are sufficiently useful as sources for indexing and searching a video. The study showed that the effectiveness of storyboards for both video indexing and searching varies with the type of video: it is high for videos that convey their meaning primarily through images and low for videos that convey it primarily through narration.

Iyer and Lewis (2007) investigated the ability of video storyboards to summarize and communicate the themes of arts-related videos. Their results demonstrated the importance of storyboards as video surrogates: 75% of the responses indicated that participants considered storyboards useful in deciding whether to view a full-length video. However, the authors questioned the linear sequence and narrative structure of storyboards that provide no thematic links between key-frames, and they proposed a model to improve a storyboard's ability to communicate the essential message of a video. Although research on visual storyboard surrogates has been active and productive, few studies have experimentally tested whether the order of key-frames in a storyboard affects its ability to summarize and index a video.

Theoretical Framework

Most video storyboards, which are composed of key-frames selected from a video and arranged in chronological order, have a linear structure. We assumed that storyboards with a nonlinear structure, in which key-frames are classified according to their semantic context, would be more effective than those with a linear structure at conveying the basic content of a video, both for users making relevance judgments and for indexers assigning terms. We therefore designed a key-frame arrangement framework that restructures storyboard key-frames into a pattern meaningful to users, as shown in Table 1. The model has three layers: the first consists of key-frames related to the video's central theme; the second, of key-frames containing background information; and the third, of key-frames pertaining to attributes, locations, and time periods.

We used PRECIS as the theoretical basis for reordering key-frames according to their contextual relationships. Next, we determined classification schemes suitable for structuring key-frames, drawing the classification categories from narrative theory. According to Chatman (1978), each narrative has two parts: a story, that is, the content or chain of events (actions, happenings) together with what may be called the existents (characters, items of setting); and a discourse, that is, the expression, the means by which the content is communicated. In Chatman's terms, the story is the ‘what’ that a narrative depicts and the discourse is the ‘how’. Setting the role of discourse aside, his account of story structure is simple: a story consists of events, characters, and setting (background).

We adapted these three elements of narrative content. We kept the scope of the events and background categories but, to include objects (animate and inanimate) as well as characters as the subjects of events, we broadened the characters category into an agents category. Within the background category, we placed location, defined as geographical setting, and period, referring to time period. We also added to the background category the processes, tools, and methods that describe events and agents in detail, as well as narrators and interviewees, who do not fit the agents category.

Table 1. Key-frame Arrangement Framework
Events: key events → peripheral events
Agents (characters, objects): key agents → peripheral agents
Background: space/location → period → process/tools/methods → narrators (or interviewees)
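The arrows in Table 1 can be read as a sort order. The following Python sketch shows one way to produce an STR layout from an SEQ storyboard, assuming that each key-frame has already been labeled by hand with a category and a sub-category (the study classified key-frames manually; no automatic classifier is implied). The `KeyFrame` class, the label strings, and the rule that frames sharing a label keep their original sequence order are illustrative assumptions, not part of the framework itself.

```python
from dataclasses import dataclass

# Ranks follow the arrows in Table 1; the label strings are hypothetical.
CATEGORY_ORDER = ["events", "agents", "background"]
SUBCATEGORY_ORDER = {
    "events": ["key", "peripheral"],
    "agents": ["key", "peripheral"],
    "background": ["location", "period", "process", "narrator"],
}


@dataclass
class KeyFrame:
    position: int     # index of the frame in the sequential (SEQ) storyboard
    category: str     # "events", "agents", or "background", assigned by hand
    subcategory: str  # one of SUBCATEGORY_ORDER[category]


def structural_order(frames):
    """Sort frames by category, then sub-category, then original position."""
    def sort_key(frame):
        return (
            CATEGORY_ORDER.index(frame.category),
            SUBCATEGORY_ORDER[frame.category].index(frame.subcategory),
            frame.position,  # assumed tie-breaker: keep the author's order
        )
    return sorted(frames, key=sort_key)


def storyboard_rows(frames):
    """Group STR-ordered frame positions into one display row per category."""
    rows = {category: [] for category in CATEGORY_ORDER}
    for frame in structural_order(frames):
        rows[frame.category].append(frame.position)
    return rows


# Hypothetical labels for a six-frame sequential storyboard.
example = [
    KeyFrame(0, "background", "location"),
    KeyFrame(1, "agents", "key"),
    KeyFrame(2, "events", "peripheral"),
    KeyFrame(3, "events", "key"),
    KeyFrame(4, "background", "narrator"),
    KeyFrame(5, "agents", "peripheral"),
]
rows = storyboard_rows(example)  # {'events': [3, 2], 'agents': [1, 5], 'background': [0, 4]}
```

Using the original position as the last sort key means that frames with the same label still appear in the author's order; this is one plausible reading of the framework rather than a rule stated in it.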

Research Questions and Hypotheses

Key-frames in a storyboard may be arranged either in the sequence in which the original author arranged the shots in the video or according to the structural features of the story in the video, as in the key-frame arrangement framework of Table 1. We call the former ‘sequentially arranged storyboards’ (SEQ) and the latter ‘structurally arranged storyboards’ (STR). Our first research question is therefore: “Which of the two types of storyboards, SEQ or STR, better increases users' ability to summarize or index the content of a video without viewing the whole video?” We derived two hypotheses from this research question:

Hypothesis 1 (H1): Structurally arranged storyboards (STR) will be more effective than sequentially arranged storyboards (SEQ) in increasing users' ability to summarize videos.

Hypothesis 2 (H2): Structurally arranged storyboards (STR) will be more effective than sequentially arranged storyboards (SEQ) in increasing users' ability to index videos.

Our second research question concerns the repetition effect of viewing storyboards when summarizing and indexing a video. We noted that most users view a video repeatedly while browsing it. We therefore assumed that viewing a storyboard twice would be more effective than viewing it once. We further expected that browsing two different types of storyboard (SEQ and STR) in succession would be even better than viewing the same type twice. This reasoning leads to the third and fourth hypotheses.

Hypothesis 3 (H3): Browsing two storyboards, regardless of type, will be more effective than viewing only one storyboard for summarizing a video.

Hypothesis 4 (H4): Browsing two storyboards, regardless of type, will be more effective than viewing only one storyboard for indexing a video.

The third research question concerns the order effect, that is, whether it matters which of the two storyboard types is viewed first when subjects summarize or index videos. It may be predicted that browsing the sequentially arranged storyboard before the structurally arranged one (SEQ-to-STR) will be more effective than browsing the two types in the reverse order (STR-to-SEQ). We return to this line of reasoning in the discussion section after hypothesis testing. The fifth and sixth hypotheses were derived from the third research question.

Hypothesis 5 (H5): Browsing storyboards in the SEQ-to-STR order will be more effective in increasing users' ability to summarize videos than browsing them in the reverse order.

Hypothesis 6 (H6): Browsing storyboards in the SEQ-to-STR order will be more effective in increasing users' ability to index videos than browsing them in the reverse order.

Method

Participants and Sample Videos

We recruited 26 participants from two universities: eight undergraduate and ten graduate students majoring in Library and Information Science at Myongji University, and eight undergraduate students majoring in Communication at Pukyong National University. We used six videos from the Open Video Project repository at http://www.open-video.org. The videos had been selected in our previous study (Kim, 2007), in which 12 sample videos were classified into two groups: Group 1 consisted of six videos that convey their meanings best through images, and Group 2 comprised six videos that convey their meanings best through narration or through a combination of imagery and narration. For this study, we used the six Group 1 videos listed in Table 2, because in the earlier experiments the mean accuracy of the summaries and index terms produced from the surrogates of Group 2 videos was very low.

Table 2. Sample Video List
Video No. | Segment Title | Duration (m:ss)
1 | Food Preservation | 04:27
2 | Earthquake | 02:36
3 | First Flying Machines | 04:18
4 | Computer Rage | 02:51
5 | Ubiquitous Computing in the Living Room | 04:02
6 | Automobile Tomorrow | 10:00

Creation of Storyboard Interfaces

We developed an HTML interface that contained both versions of each storyboard. The top part of Figure 1 shows the list of the six sample videos. As Figure 1 shows, for three videos (Videos 1, 3, and 5) the sequential storyboard was shown first, followed by the structural storyboard; for the other three (Videos 2, 4, and 6), the structural (non-sequential) storyboard was shown first, followed by the sequential one.

Each video was identified only by a video number and had two storyboards. For the sequential storyboards, we used the storyboards created by the Open Video Project team, excluding some key-frames that contained captions (or were redundant), because we wanted to focus on the expressive power of the images in the key-frames. To extract key-frames from the video files, the team used a hybrid approach: they manually selected the most representative key-frames from candidates extracted automatically on the basis of large intra-frame differences (Marchionini, Wildemuth and Geisler, 2006). To construct the structural storyboards, we classified the same key-frames used in the sequential storyboards into three categories (events, agents, and background) and displayed them accordingly. The bottom part of Figure 1 shows the sequential and structural storyboards of Video 1.
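The hybrid extraction attributed to the Open Video Project team above has two stages: an automatic pass that flags frames with large differences, followed by manual selection among those candidates. The Python sketch below covers only the automatic pass and rests on assumptions the text does not state: frames are equal-length lists of grayscale pixel values, the difference measure is the mean absolute pixel difference between consecutive frames, and `threshold` is a hypothetical tuning parameter. The team's actual measure may well differ.

```python
def mean_abs_difference(frame_a, frame_b):
    """Mean absolute pixel difference between two equal-length grayscale frames."""
    if len(frame_a) != len(frame_b):
        raise ValueError("frames must have the same number of pixels")
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)


def candidate_keyframes(frames, threshold):
    """Return indices of frames whose difference from the previous frame
    exceeds `threshold` (an assumed parameter). Frame 0 is always kept."""
    if not frames:
        return []
    candidates = [0]
    for i in range(1, len(frames)):
        if mean_abs_difference(frames[i - 1], frames[i]) > threshold:
            candidates.append(i)
    return candidates


# Toy "video" of four 4-pixel frames with a cut between frames 1 and 2.
toy_frames = [
    [10, 10, 10, 10],
    [12, 11, 10, 10],
    [200, 190, 180, 210],
    [201, 191, 182, 209],
]
toy_candidates = candidate_keyframes(toy_frames, threshold=30)  # [0, 2]
```

In the hybrid process, a person would then choose the most representative frames from such a candidate list; that manual step is not modeled here.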

Figure 1. Storyboard Interface

Test Procedures

We asked the participants to summarize and index each of the six videos after viewing each of its two storyboard versions, and we then compared the results for the two versions. The tests were conducted separately in the computer labs of the two universities. Testing consisted of tasks performed on the pilot system and was conducted in two steps:

Orientation Session: An orientation session of approximately 25 minutes informed the participants about the purpose of the study, the pilot system, the logistics of the testing sessions, and the tasks to be performed.

Task Performance: First, the participants viewed the first storyboard of Video 1, summarized and indexed the video, and submitted their responses. They then repeated the process with the second storyboard of Video 1 and performed the same tasks for the remaining five videos. Participants were allowed up to 8 minutes to summarize and index each storyboard version of each video (16 minutes for both versions), so a total of 96 minutes was allotted for the tasks on all six videos.
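The counterbalanced presentation order (SEQ first for Videos 1, 3, and 5; STR first for Videos 2, 4, and 6) and the time budget can be restated as a small schedule. This Python sketch only re-expresses the procedure described above; the names `first_storyboard` and `build_schedule` and the tuple layout are illustrative.

```python
MINUTES_PER_STORYBOARD = 8  # time allowed for each storyboard version


def first_storyboard(video_no):
    """Odd-numbered videos (1, 3, 5) show SEQ first; even-numbered (2, 4, 6) show STR first."""
    return "SEQ" if video_no % 2 == 1 else "STR"


def build_schedule(video_numbers):
    """List (video, viewing number, storyboard type) triples in presentation order."""
    schedule = []
    for video in video_numbers:
        first = first_storyboard(video)
        second = "STR" if first == "SEQ" else "SEQ"
        schedule.append((video, 1, first))
        schedule.append((video, 2, second))
    return schedule


schedule = build_schedule(range(1, 7))
total_minutes = len(schedule) * MINUTES_PER_STORYBOARD  # 12 viewings x 8 minutes = 96
```

Because each participant meets both orders (three videos each), order varies within participants rather than between groups.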

Operational Definition

For the test analysis, we viewed the six full-length videos and assigned keywords to each. These keywords were collected from the Open Video Project site, with new keywords added where necessary; each video received between 4 and 15 terms. We also constructed reference summaries of the six videos based on the abstracts written by the Open Video Project team.

Written Gist Determination: After viewing each storyboard, the participants wrote a summary of the video of no more than three sentences. Two researchers independently compared each participant's summary with the researchers' summary and scored it between 0.00 and 1.00 (e.g., 0.25). The two scores were then averaged to yield the final score for each trial.

Subject Keyword Assignment: The participants assigned four terms (or phrases) to each digital video. These terms were compared with the uncontrolled index terms assigned by the researchers, and each matching term contributed 0.25 to the participant's score: if one of the four terms matched a researcher-assigned term, the score for that video was 0.25; if three matched, it was 0.75 (= 0.25 × 3).
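Both scoring rules reduce to simple arithmetic, sketched below in Python. The summary score is the mean of the two raters' scores, as described above. For keywords, the sketch assumes that a participant's term matches when it is identical to a researcher-assigned term after lower-casing and trimming whitespace, and that duplicate terms count once; the text does not say how near-matches such as synonyms or plural forms were judged, so that matching rule is an assumption. The example terms are hypothetical.

```python
def summary_score(rater_1, rater_2):
    """Final gist score: average of two independent ratings in [0.00, 1.00]."""
    for score in (rater_1, rater_2):
        if not 0.0 <= score <= 1.0:
            raise ValueError("ratings must lie between 0.00 and 1.00")
    return (rater_1 + rater_2) / 2


def keyword_score(participant_terms, reference_terms):
    """0.25 per matching term among the four terms a participant assigns.
    Assumed matching rule: exact match after lower-casing and trimming;
    duplicate participant terms are counted once."""
    if len(participant_terms) != 4:
        raise ValueError("participants assign exactly four terms")
    reference = {term.strip().lower() for term in reference_terms}
    assigned = {term.strip().lower() for term in participant_terms}
    return 0.25 * len(assigned & reference)


# Hypothetical terms, not taken from the study.
participant = ["earthquake", "Buildings ", "ocean", "fire"]
researcher = ["earthquake", "buildings", "fire", "damage"]
example_keyword_score = keyword_score(participant, researcher)  # 0.75 (= 0.25 x 3)
example_summary_score = summary_score(0.25, 0.5)  # 0.375
```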

Data Analysis: Results of Hypotheses Testing

To test H1 and H2, we used a paired-samples t-test. The significance level (α) was set at 0.05; if p is smaller than 0.05, the finding is statistically significant and the null hypothesis is rejected. SPSS was used for the data analysis. Table 3 shows the results.

Table 3. Results of Paired-Samples t-test (Mean Match Rate, Standard Deviation in Parentheses)
Match Rate | SEQ (n=78) | STR (n=78) | t | p
Summarizing | 0.42 (0.22) | 0.42 (0.24) | 0.41 | 0.97
Indexing | 0.45 (0.22) | 0.46 (0.23) | −0.33 | 0.74

Hypotheses 1 and 2: H1 states that structural storyboards (STR) based on the arrangement model will be more effective than sequential storyboards (SEQ) for summarizing a video. The summarizing match rates obtained from viewing the sequential storyboards first (78 cases, 3 per participant) were compared with those obtained from viewing the structural storyboards first (78 cases). As shown in Table 3, the mean match rate for the sequential storyboards (0.42) was the same as that for the structural storyboards (0.42), and the t-test showed no significant difference (t = 0.41, p = 0.97 > 0.05); thus, H1 was not supported.

H2 states that structural storyboards will be more effective than sequential ones for indexing a video. The indexing match rates obtained from viewing the sequential storyboards first (78 cases, 3 per participant) were compared with those obtained from viewing the structural storyboards first (78 cases). As shown in Table 3, the mean match rate for the structural storyboards (0.46) was not significantly higher than that for the sequential storyboards (0.45) (t = −0.33, p = 0.74 > 0.05); thus, H2 was not supported. We therefore found no significant difference between sequential and structural storyboards in either summarizing or indexing match rates. In other words, viewing a structurally arranged storyboard once did not affect the subjects' ability to summarize or index the video contents.
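For readers who want to see what a paired-samples t-test such as the one in Table 3 computes, here is a minimal Python sketch using only the standard library. It assumes the SEQ and STR match rates arrive as two equal-length lists whose i-th elements form a pair; the text does not detail how the 78 pairs were formed. The function returns t and its degrees of freedom; a p-value would normally come from a statistics package (for example SciPy's `scipy.stats.ttest_rel`), which is left out to keep the sketch dependency-free. The match rates below are made up.

```python
import math
from statistics import mean, stdev


def paired_t(sample_a, sample_b):
    """Paired-samples t statistic: mean of the differences over its standard error."""
    if len(sample_a) != len(sample_b):
        raise ValueError("paired samples must have equal length")
    n = len(sample_a)
    if n < 2:
        raise ValueError("need at least two pairs")
    diffs = [a - b for a, b in zip(sample_a, sample_b)]
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical match rates for five pairs (not the study's data).
seq_rates = [0.50, 0.25, 0.75, 0.50, 0.25]
str_rates = [0.25, 0.25, 0.50, 0.50, 0.00]
t_value, df = paired_t(seq_rates, str_rates)  # t is about 2.449, df = 4
```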

Research questions 2 and 3 concern the repetition effect of viewing storyboards more than once and the order effect of viewing two storyboards in succession, either SEQ-to-STR or STR-to-SEQ. To test H3, H4, H5, and H6, we used a repeated-measures MANOVA and administered follow-up t-tests to analyze the MANOVA results more closely. Table 4 presents the descriptive statistics of the match rates, and Tables 5 and 6 show the results of the repeated-measures MANOVA tests for summarizing and indexing, respectively.

Table 4. Descriptive Statistics (Mean and S.D. of Match Rate)
Table 5. Repeated-measures MANOVA Test (Summarizing)
Table 6. Repeated-measures MANOVA Test (Indexing)

Hypotheses 3 and 4: H3 states that browsing two storyboards in succession will be more effective for summarizing a video than viewing only one storyboard. Table 5 shows that the repetition effect is statistically significant (F = 5.79, p = 0.02 < 0.05), which means that viewing video storyboards twice is more effective than browsing them once for summarizing a video. Thus, H3 was supported. H4 concerns the repetition effect of browsing two storyboards on the indexing task. Table 6 shows that the repetition effect is not statistically significant (F = 2.11, p = 0.15 > 0.05); thus, H4 was not supported. Tables 5 and 6 also show that the order effect is not significant for either summarizing (F = 2.80, p = 0.10 > 0.05) or indexing (F = 0.03, p = 0.86 > 0.05).

Hypotheses 5 and 6: H5 states that browsing storyboards in the SEQ-to-STR order will be more effective for summarizing a video than browsing them in the reverse order, STR-to-SEQ. Table 5 shows a strong interaction effect between repetition and order (F = 14.36, p < 0.01). Follow-up t-tests indicated that repetition was effective only in the SEQ-to-STR order (t = −3.02, p < 0.01) and not in the STR-to-SEQ order (t = 0.81, p = 0.42 > 0.05); in the STR-to-SEQ order, the mean match rate for the 2nd browsing (0.40) was in fact lower than that for the 1st browsing (0.42). We conclude that repeated viewing of storyboards is effective for summarizing a video only when users browse the two storyboard versions of a video in the SEQ-to-STR order (the sequentially arranged storyboard first, followed by the structurally arranged one). Thus, H5 was supported. H6 states that browsing storyboards in the SEQ-to-STR order will be more effective for indexing videos than browsing them in the reverse order. Table 6 shows that none of the three effects is significant; thus, H6 was not supported.
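The repetition × order interaction asks whether the gain from a second viewing differs between the two orders. As an illustration, and assuming that both factors vary within each unit of analysis with one match rate per cell (the text does not say how the 78 cases were laid out in SPSS), the interaction can be expressed per unit as a difference of differences. In a 2 × 2 design where both factors are within-subject, the F statistic for this 1-df interaction equals the square of a one-sample t-test of these scores against zero. The Python sketch below computes both; the scores are hypothetical and are not meant to reproduce Table 5.

```python
import math
from statistics import mean, stdev


def interaction_contrast(seq_first_1st, seq_first_2nd, str_first_1st, str_first_2nd):
    """Per-unit difference of differences:
    (2nd - 1st viewing under SEQ-to-STR) - (2nd - 1st viewing under STR-to-SEQ)."""
    cells = (seq_first_1st, seq_first_2nd, str_first_1st, str_first_2nd)
    if len({len(cell) for cell in cells}) != 1:
        raise ValueError("all four cells need one score per unit")
    return [
        (b - a) - (d - c)
        for a, b, c, d in zip(seq_first_1st, seq_first_2nd, str_first_1st, str_first_2nd)
    ]


def one_sample_t(values):
    """One-sample t against zero; for a 2 x 2 within-subject design, t squared
    equals the interaction F with (1, n - 1) degrees of freedom."""
    return mean(values) / (stdev(values) / math.sqrt(len(values)))


# Hypothetical match rates for four units (not the study's data).
contrast = interaction_contrast(
    [0.25, 0.50, 0.25, 0.50],  # SEQ-to-STR, 1st viewing
    [0.50, 0.75, 0.50, 0.50],  # SEQ-to-STR, 2nd viewing
    [0.50, 0.50, 0.25, 0.50],  # STR-to-SEQ, 1st viewing
    [0.50, 0.25, 0.25, 0.50],  # STR-to-SEQ, 2nd viewing
)
F = one_sample_t(contrast) ** 2  # about 6.0 for these made-up scores
```

A positive mean contrast corresponds to the pattern reported above: a larger gain from repetition in the SEQ-to-STR order than in the STR-to-SEQ order.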

Discussion and Conclusion

We assumed that a sequential storyboard follows the author's own structure, whereas a structural storyboard is restructured into a pattern meaningful to viewers. We therefore hypothesized that structural storyboards would be more effective than sequential ones in increasing viewers' ability to summarize or index the content of videos, but these hypotheses were not supported. The meaning of the repetition effect is straightforward: viewing storyboards more than once makes viewers more effective at summarizing a video. But why did repetition not help with indexing? Another point that deserves serious consideration is the interaction between repetition and order. The results showed that viewing the two storyboard versions of a video in the SEQ-to-STR order (the sequential storyboard first, followed by the structural one) is more effective for improving viewers' ability to summarize a video. Why is that so? If we follow the reasoning behind the arrangement model, in which authors and viewers share a structure, viewers may use key-frames, and the structure or relations among them, differently in different viewing sessions. In the first session they may simply view the key-frames; in the second, they may further arrange the key-frames according to their own mental structural framework. The SEQ-to-STR order may match this sequence of first gathering key-frames and then structuring them, whereas the STR-to-SEQ order may not. However, this line of thinking needs to be tested by deriving detailed hypotheses and designing more rigorous experiments.

One implication of our findings concerns how the structural features of storyboards can be used in digital video libraries. During the tests, we observed that most participants viewed the storyboards repeatedly. It may therefore be more efficient to arrange key-frames differently each time users browse a storyboard, so that users can view two storyboard versions of the same video, which would increase the effectiveness of visual surrogates. For example, users could view the sequential storyboard by default and then view the structural storyboard if they wish to look at the video again. Structural features of storyboards could also enhance the speed and efficiency of image-based queries in the same way that index files do for text-based queries. For instance, if a video retrieval system receives an image-based query, it could classify the query into one of the three categories of the key-frame arrangement model and then match it only against key-frames in the corresponding category, thus improving search efficiency.
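The proposed use of the three categories for image-based search resembles partitioning an index by key. The Python sketch below assumes that every stored key-frame and every incoming query already carries a category label and a numeric feature vector; how queries would be classified and which visual features would be used are open questions the text does not settle, so the `CategoryIndex` class, the frame IDs, the feature vectors, and the use of Euclidean distance are all illustrative choices.

```python
import math
from collections import defaultdict


class CategoryIndex:
    """Hypothetical index: key-frames bucketed by category, so a query is
    compared only with frames in its own bucket, not the whole collection."""

    def __init__(self):
        self._buckets = defaultdict(list)  # category -> [(frame_id, features)]

    def add(self, frame_id, category, features):
        self._buckets[category].append((frame_id, tuple(features)))

    def search(self, category, query_features, k=3):
        """Return up to k frame IDs in `category`, nearest (Euclidean) first."""
        query = tuple(query_features)
        scored = [
            (math.dist(query, features), frame_id)
            for frame_id, features in self._buckets.get(category, [])
        ]
        scored.sort()
        return [frame_id for _, frame_id in scored[:k]]


# Hypothetical frames and two-dimensional features.
index = CategoryIndex()
index.add("v1-f03", "events", [0.9, 0.1])
index.add("v1-f07", "events", [0.2, 0.8])
index.add("v1-f01", "background", [0.9, 0.1])
hits = index.search("events", [1.0, 0.0], k=2)  # ['v1-f03', 'v1-f07']
```

The background frame is never scored for an events query even though its features are identical to `v1-f03`; skipping other buckets is where the hoped-for efficiency gain would come from.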

Acknowledgements

This work was supported by the Korea Research Foundation Grant funded by the Korean Government (MOEHRD, Basic Research Promotion Fund) (KRF-2007-327-H00017).

References

  • Chatman, S. B. (1978). Story and discourse: Narrative structure in fiction and film. Cornell University Press.
  • Goodrum, A. (2001). Multidimensional scaling of video surrogates. Journal of the American Society for Information Science and Technology, 52(2), 174–182.
  • Hughes, A., et al. (2003). Text or pictures? An eye-tracking study of how people view digital video surrogates. In Proceedings of CIVR 2003 (pp. 271–280).
  • Iyer, H., and Lewis, C. D. (2007). Prioritization strategies for video storyboard keyframes. Journal of the American Society for Information Science and Technology, 58(5), 629–644.
  • Kim, H. H. (2007). An experimental study on the effectiveness of storyboard surrogates in the meanings extraction of digital video. Journal of the Korean Society for Information Management, 24(4), 53–72.
  • Marchionini, G., Wildemuth, B. M., and Geisler, G. (2006). The Open Video Digital Library: A Möbius strip of research and practice. Journal of the American Society for Information Science and Technology, 57(12), 1629–1643.
  • Song, Y., and Marchionini, G. (2007). Effects of audio and visual surrogates for making sense of digital video. In Proceedings of CHI 2007 (pp. 867–876). San Jose, CA, USA.
  • Stachowicz, C. (2002). The effectiveness of storyboard surrogates in the subject indexing of digital video. Master's thesis, University of North Carolina at Chapel Hill.
  • Wildemuth, B. M., et al. (2002). Alternative surrogates for video objects in a digital library: Users' perspectives on their relative usability. In Proceedings of the 6th European Conference on Digital Libraries (pp. 493–507). New York: Springer.
  • Yang, M., and Marchionini, G. (2004). Exploring users' video relevance criteria: A pilot study. In Proceedings of the Annual Meeting of the American Society for Information Science and Technology (pp. 229–238), Nov. 12–17, 2004, Providence, RI.