The Value of Web Log Data in Use-Based Design and Testing
Address: 9000 South Rita Road, Tucson, AZ 85744 Tel: 520-799-2938
Address: 110 8th St., Troy, NY 12180 Tel: 518-276-2557
Web-based logs contain potentially useful empirical data with which World Wide Web (Web) designers and design theorists can assess usability and effectiveness of design choices. Most Web design guidelines from artistic or usability principles feature no empirical validation, while empirical studies of Web use typically rely on observer ratings. Web server logs and client-side logs can provide naturally-occurring, unobtrusive usage data, partially amenable to normative use assessments but particularly useful in experimental research comparing alternative Web designs. Types of Web server logs and client logs, types and uses of log data, and issues associated with the validity of these data are enumerated. Frameworks that outline how sources of use-based data can be triangulated to assess Web design are illustrated. Finally, an approach to experimentation that overcomes many data validity issues is presented and illustrated through a pilot experiment that used server logs to compare user responses to frames, pop-up, and scrolling arrangements of a single Web site.
The ways in which users interact with a World Wide Web (Web) site provide potentially valuable data on the usefulness and effectiveness of Web design elements and content. Numerous sources offer rules and axioms for Web page design, derived from rhetoric, visual communication, cognitive science, and usability studies. While these guidelines seem to constitute the “gold standard” for design decisions, few if any have verified these recommendations using large-scale empirical methods. Yet the log files recorded by Web servers, and client logs, offer potentially useful data about users' Web site interactions. These data may be studied to generate inferences about Web site design, to test prototypes of Web sites or their modifications over time, and to test theoretical hypotheses about the effects of different design variables on Web user behavior. At the same time, aspects of Internet connectivity such as dynamic Internet Protocol (IP) numbers, shared computers, and other issues limit the validity of normative interpretations about Web usage based on server log data alone.
While information pertaining to the previous observations has appeared in a variety of information sources, a synthesis is provided here, and is keyed to the features of commercially available logfile analysis programs. Moreover, we develop frameworks which extend the ability of analysts to utilize these data: Triangulating server data with client-side log data and other sources of information, or collecting server data in the context of formal experiments overcomes many of the problems described elsewhere. This article explores the use of Web site user interaction data or use-based data–Web server logs, client-side logs, and usability testing–in the context of Web site evaluation and design. We highlight both what can and cannot be learned from them. Finally, a simple experimental study is presented that illustrates how server log data can resolve conflicting hypotheses about some Web design features, showing significant differences in users' content-accessing behaviors as a result of different Web layouts.
Rationale: Assessing the Impact of Web sites
In their article “A Multivariate Analysis of Web Usage,” Korgaonkar and Wolin (1999) observed that at that time an estimated 55 million people surfed the Web, and that on-line traffic had been doubling every 100 days. Yet, according to Korgaonkar and Wolin (1999), many companies were disappointed with the Web and its commercial potential. Studies showed that Web user purchases were lower than expected. In order to remedy disappointing commercial results on the Web, they proposed, a better understanding of Web users is needed. Korgaonkar and Wolin's original research methods offered a fine-grained analysis of Web users' motivations, concerns, and demographics in the context of three uses of the Web: the number of hours spent on the Web per day; the percentage of time spent on the Web for personal versus business use; and purchases made on the Web and the approximate number of orders placed on the Web. Such analyses begin to provide a use-based glimpse into the behavior of Web users. However, such analytic data need not be restricted to large-scale trends; local, Web site-based data can be harnessed to provide particularistic views of the use of one's own Web site.
For example, a frequent goal for Web site owners and designers is to keep as many Web users as possible at their site for long periods of time and to have users navigate to numerous Web pages within their site. Of importance to Web site owners is “stickiness,” or the quality of a Web site that brings users to a site and keeps them there. Media Metrix, the Web equivalent of the television rating company Nielsen Media Research, tracks “sticky traffic.” It tracks 21,000 Web sites and generates monthly reports based on the number of “unique visitors” at each Web site for a given reporting period (Larson, 1999). A more detailed understanding of user activities on a Web site is needed to understand what users find most useful and satisfying. “Stickiness” of a Web site is one indication of Web site popularity and usefulness. Yet stickiness, and other characteristics of Web visitation, need not only reflect gross levels of popularity. The data from Web traffic can also inform design and evaluation in more detailed ways, and are available to anyone with access to servers' internal files.
Web Design Guidelines: Rhetorical, Empirical, and “Click” Studies
What are the possible criteria, and sources of guidance to achieve those criteria, that pertain to the planning and design of Web sites? The key to good Web design is to understand Web site users and their tasks and to design to the needs and expectations of the user, and many Web design guidelines attempt to address these concerns. While this work may be considered a sub-field of usability, it tends to differ in two respects. First, much Web design guidance tends not to be presented in the context of use goals, but as universals. Second, there is often a less apparent basis in a research or disciplinary foundation in Web usability than in other usability sub-fields. Hence, much such advice is questionably based in traditional rather than empirical knowledge. Guidance on designing for the Web is available in a number of forms, from books to Web sites to corporate guidelines. The spirit of these guidelines is to increase consistency, predictability, and ease of use of Web based user interfaces. As often as not, guidelines emerge through commonly used graphical user interface elements that become standard, user testing, and designer and programmer preferences.
Among the most popular of such sets of guidelines is Jakob Nielsen's. Nielsen, a highly respected Web user interface designer, provides valuable Web design guidance in his monthly “Alertbox” column. He published two columns, “Top Ten Mistakes in Web Design” (Nielsen, 1996) and “The Top Ten New Mistakes of Web Design” (Nielsen, 1999), which describe mistakes commonly made by Web designers. The mistakes Web designers should avoid include (1) page elements that are in a constant state of animation, (2) long scrolling pages, (3) non-standard link colors, (4) long download times, (5) launching new browser windows, (6) lack of author information, (7) moving pages to new URLs, and (8) designs that look like advertisements.
Nielsen also offered “Ten Good Deeds in Web Design” (Nielsen, 1999) in which he outlined “dos” of Web design. These include (1) placing a name and logo on every page, (2) providing search on a Web site with more than 100 pages, (3) using simple headlines and page titles, (4) using a page structure that makes scanning easy, (5) providing pages that are accessible for disabled users, and (6) following what large Web sites are doing.
Even though Nielsen (1999) recommends that Web designers follow what large Web sites are doing, he observes that the most commonly used design elements on Web pages may not be the most usable. However, while a design element may not be the most appropriate for the situation, users will expect to see design elements that they have learned how to use through their experiences with larger Web sites. Such conventions include using blue hypertext links (blue text reportedly reduces reading speed), the use of horizontal tabs across the top of the screen to indicate main topics (because tabs are meant to represent different views of the same information), or a colored stripe to indicate primary navigation elements on the left side of a Web page.
In addition to advice derived from usability principles and rhetorical schemes, a small number of empirical studies on Web use suggest alternative design decisions, as well. For instance, Spool, Scanlon, Schroeder, Snyder and DeAngelo (1997) found that when users are looking for information, they are focused and that design approaches that are meant for “surfers” (e.g., advertisements) are distracting for information seekers. In contrast, Catledge and Pitkow (1995) determined that a hierarchy of information or database search might work for the goal-oriented user, but that these methods may prove frustrating to a user whose desire is to happen across unexpected information. In their study of user Web behavior, Catledge and Pitkow (1995) found that eighty percent of requests from a server were of type http (as opposed to other protocols such as ftp). They also found that the navigation method preferred by users was the hyperlink: hyperlinks made up 52% of all requests for documents. Therefore, Catledge and Pitkow (1995) determined that to a great degree, hyperlinks are the preferred method of Web site navigation. Second to hyperlinks in popularity among users was the browser “Back” command, which accounted for 41% of all requests for documents. The researchers determined that the users in their study interacted in a small area of a Web site and frequently backtracked, as evidenced by the use of the “Back” command. They also found that users typically navigate two levels within a site before they return to the point at which they entered the site. Therefore, the “Back” button was an important navigation tool for users. Catledge and Pitkow (1995) offered the following design advice from their study:
- Important information should be located within two to three clicks from the home page, given that users accessed an average of 10 pages per Web server
- Too many links on a Web page may increase the time it takes users to find the information they desire
- Groups of related information should be used, given that users interact with small areas of a Web site
Tauscher and Greenberg (1997) reported on the patterns of user revisitation to Web pages. They found that users often revisit Web pages, with a recurrence rate of 58%. In interviews following their study, Tauscher and Greenberg learned that users revisit pages for several purposes: to (1) monitor changing information, (2) further explore a Web page, (3) use a “special purpose” page (e.g., a search engine page), (4) modify a page as its author, and (5) access another revisited page. In contrast, people access new Web pages in order to (1) satisfy changing information needs, (2) explore a Web site, (3) visit a recommended Web page, and (4) explore a page while browsing for another item.
Byrne, John, Wehrle, and Crow (1999) studied user tasks in the context of daily use of the World Wide Web. They built a “taskonomy” or a taxonomy of Web tasks. They established six general categories of Web tasks:
- Use information: Tasks include most Web browsing actions and are defined as one or more tasks in which a user attempted to make use of (read, listen to, view, watch, duplicate, download, display) information from the Web.
- Locate on page: Tasks where a user must find a link on a Web page in order to use information or go to a URL.
- Go to page: Tasks include typing a URL, using the Back or Forward button in a browser, etc.
- Provide information: Tasks might involve the completion of a Web-based form.
- Configure browser: Tasks where a user may resize a browser window.
- React to environment: Tasks where a user may respond to a dialogue box that is displayed by the browser.
Byrne et al. expected a hierarchy of tasks but instead found a “flat” structure (any one of the above general task categories can have any other of the tasks as a subgoal). Byrne et al. concluded that the category with the most numerous events was Configure but that Use Information tasks accounted for the most time spent by users, followed by Locate.
The rationale for the Byrne et al. (1999) study was based on their observation that research into user navigation patterns on the Web, or “click-studies,” includes very little information about user tasks and user context. Their study, therefore, focused on user context and user tasks. From the results of their study, Byrne et al. concluded that users spend more time reading Web pages, visually searching Web pages, and waiting for Web pages to load than they do interacting with graphical user interface buttons and browser history mechanisms. It was not clear to the authors, however, if users did not interact with graphical user interface elements due to poor design. Byrne et al. also recommended the use of caching and rendering algorithms to improve performance and thus decrease Web page load times. The authors also found that users are willing to scroll and read long texts, although they cite a tradeoff between designing a Web page for reading (in the case of online documents that users find and are willing to read) versus scanning (in the case of users who are searching among online documents for a particular topic). Byrne et al. suggest that a task-oriented Web behavior study like their own could be combined with “click studies,” where the click studies would provide more detailed information on users' interactions with Web pages (e.g., most frequently visited links on a page).
Each of the above approaches provides an example of using some form of analysis to articulate or to validate design decisions. At the same time, the actual browsing behavior of Web users creates numerous discernable data trails, speaking to a number of different variables, the inferences from which offer a rich source of usability information. Such data on the actual experience of users on a particular Web site are needed to validate Web design decisions and to enhance the design of a Web site to increase ease of use and business value. The following sections will explore the sources of Web site user behavior, or use-based, data. Three primary sources of Web interaction user data are considered: Web server logs, client-side logs, and usability data. Particular attention will be paid to Web server logs and their usefulness as a resource for Web page design and Web site usability assessment.
Sources of Web Use-Based Data
Web Server Logs
A Web server, according to Webopedia (2001), is “a computer that delivers (serves up) Web pages. Every Web server has an IP address and possibly a domain name. For example, if you enter (a URL) in your browser, this sends a request to the server…. The server then fetches the page…and sends it to your browser.” Web server data are created from the relationship between a person(s) interacting with a Web site and the Web server. A Web server log, containing Web server data, is created as a result of the httpd process that is run on Web servers (Buchner & Mulvenna, 1998).
All server activity–success, errors, and lack of response–is logged into a server log file (HTTP-ANALYZE). According to Bertot, McClure, Moen, and Rubin (1997), Web servers produce and update dynamically four types of “usage” log files: access log, agent log, error log, and referrer log.
Access Logs provide the bulk of the Web server data, including the date, time, user's IP address, and user action (e.g., whether or not the user downloaded a document or image). The following is some of the information that can be obtained from an access log:
- a. The IP (Internet Protocol) address of the computer making the request for a document
- b. The time stamp (user access date and time)
- c. The user's request (e.g., html document or image requested, or data posted) (Bertot et al., 1997)
Agent Logs supply data on the browser, browser version, and operating system of the accessing user (Bertot et al., 1997).
Error Logs contain information on specific events such as “file not found,” “document contains no data,” or configuration errors; the time, user domain name, and the page on which a user received the error are recorded, providing a server administrator with information on “problematic and erroneous” links on the server (Bertot et al., 1997, p. 377). Other kinds of data written to the error log include stopped transmissions; information on “user-interrupted” transfers is recorded (e.g., a user might click the browser “Stop” button, which would produce a “stopped transmission” error message; Bertot et al., 1997).
Referrer Logs provide information on what Web pages, from both the site itself and other sites, contain links to documents stored on the server. The log provides information such as the URLs of sites and pages on sites that referred visitors to a particular page. For example, users may often arrive at a particular Web site through a search engine, and the referring search engine along with the keywords used in the originating query, can be obtained from the Referrer log (Bertot et al., 1997).
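As a minimal sketch of how the originating query might be recovered from a referring URL, the following Python fragment pulls the host name and search keywords out of a referrer string. The parameter names in `KEYWORD_PARAMS` and the example URL are assumptions made for illustration; each search engine uses its own query format, and none of this is taken from the logging tools discussed here.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical list of query parameters that carry the search terms.
# Real search engines differ; adjust this for the referrers actually observed.
KEYWORD_PARAMS = ("q", "query", "p")


def referrer_keywords(referrer_url):
    """Return (referring host, list of keywords) for one referrer entry."""
    parts = urlparse(referrer_url)
    query = parse_qs(parts.query)  # decodes "+" and %xx escapes
    for name in KEYWORD_PARAMS:
        if name in query:
            return parts.netloc, query[name][0].split()
    # No recognizable search parameter: an ordinary referring page.
    return parts.netloc, []


host, words = referrer_keywords(
    "http://search.example.com/find?q=web+log+analysis"
)
print(host, words)
```

A referrer with no recognizable search parameter is simply reported with an empty keyword list, so ordinary linking pages and search engines can be tallied with the same function.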
Web server logs are stored, in general, in Common Logfile Format or Extended Logfile Format. Common Logfile Format includes date (date, time, and timezone of a request), client IP (remote host IP and/or DNS entry), user name (remote log name of a user), bytes transferred, server name, request (URI query), and status (http status code returned). Extended Logfile Format includes bytes sent and received, server (name, IP address, and port), request (URI query and stem), requested service name, time taken for transaction to complete, version of transfer protocol used, user agent (the program making the request, such as a Netscape browser or a search engine “spider”), cookie ID, and referrer (Buchner & Mulvenna, 1998).
Web server logging tools, also known as Web traffic analyzers, analyze the log files of a Web server and create reports from this information (HTTP-ANALYZE). Bertot et al. (1997) suggest that Web server logs are “user-based measures of Web services” and can be used to “begin to understand the path users take through a server, the problems users encounter during a session, and technology users use while navigating a site” (p. 393). They believe that these data can be used in planning and designing Web sites.
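To make the format concrete, the sketch below parses one access log line in Python. It assumes the widely used NCSA layout of Common Logfile Format (host, identity, user name, bracketed timestamp, quoted request, status, bytes); the field list a given server writes may differ from this, and the sample line and the names `CLF_PATTERN` and `parse_clf_line` are illustrative only.

```python
import re
from datetime import datetime

# Assumed layout: host ident user [timestamp] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def parse_clf_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # Month abbreviations are parsed with the current locale (English assumed).
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    # A "-" in the bytes field means no body was transferred.
    record["bytes"] = 0 if record["bytes"] == "-" else int(record["bytes"])
    return record


sample = ('192.0.2.10 - - [10/Oct/2000:13:55:36 -0700] '
          '"GET /index.html HTTP/1.0" 200 2326')
print(parse_clf_line(sample))
```

Lines that do not match the assumed layout return `None` rather than raising an error, so malformed entries can be counted and inspected separately.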
A Web Server Log Data Primer
The following discussion of server log data is divided into three groups: Navigation and Activity, Demographic, and Performance.
Navigation and Activity Server Log Data. Navigation and activity server log data provide information on user interaction aspects of a Web site, i.e., navigation paths, number of accesses, time spent on a page, etc.
Various server logging tools provide an array of information about the activities of users on a Web site. The information collected by these tools, and the terms used to describe data, are not always consistent between server logging tools. Table 1 lists typical navigation and activity server log data collected by server logging software programs. A more detailed discussion of each log data item follows.
Table 1. Server Log Data Types
|Data type|Description|
|---|---|
|Visitor|A unique Internet Protocol (IP) address.|
|Visit|A set of requests depicting all the pages and graphics seen by a unique visitor at one time.|
|Hit|Any file from a Web site that a user downloads.|
|Access|An entire page downloaded by a user regardless of the number of images, sounds, or movies.|
|Request|When a Web server is asked to provide a page, graphic or other object.|
|Path|How the user navigated through the site (e.g., entrance, intermediate, and exit points); the length of a user's session, specific location duration (e.g. time on a page), download times.|
|Entry page|The first page users actually access when entering a Web site (which may not be a home page).|
|Exit page|The last page users actually access before leaving a Web site.|
|Click-through|When a visitor to a different Web site clicks on an advertisement that ultimately redirects the visitor to the logged site.|
|Duration|Average time per visit or average time per page.|
|Downloads|Information on files that are downloaded from a Web site.|
|Browsers|Browsers used to access a Web site.|
|Errors|Errors associated with accessing a Web site.|
|Search engines|Search engines used that pointed to a particular Web site.|
Visitors and visits. A visitor may be defined as a unique Internet Protocol (IP) address. Although an IP address may represent one person only, an IP address is in many cases shared by more than one person (Accrue HitList, 1999).
A visit is a set of requests depicting all the pages and graphics seen by a unique visitor at one time. For example, a visitor to a Web site may go to eight HTML pages and in the process request fifteen graphics. In total, these 23 requests equal one visit. Note that the total number of visits is typically greater than the total number of visitors as each visitor can visit a Web site more than once.
Inferences about visits are imperfect, however, and visits are merely estimates: one cannot be certain that a series of requests is associated with one person, or the same person, within the same visit (Accrue HitList). AccessWatch defines a visit as “a unique host active during the period of an hour.” An example of a unique host is an IP address such as 22.214.171.124 (AccessWatch). According to AccessWatch, this type of data provides an indication of the degree to which users are interested in a particular Web site.
Hits, Accesses, and Requests. According to Bertot et al. (1997), a hit is “any file from a Web site that a user downloads” (p. 375) and an access (sometimes called a page view) is “an entire page downloaded by a user regardless of the number of images, sounds, or movies” (p. 375). A user is counted as accessing only one Web page even if that downloaded page has a number of images on it (Bertot et al., 1997). Neither hits nor accesses represent unique users: many Internet Service Providers use proxy servers, which further complicates the situation because the Access Log will reflect the number of hits or accesses by a referring server, instead of by the number of users (Bertot et al., 1997).
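The hour-based definition can be sketched in a few lines of Python. The fragment below counts visitors as distinct hosts and visits as distinct pairs of host and clock hour. Reading “the period of an hour” as a clock hour is an assumption made here for simplicity, not a description of how AccessWatch is implemented, and the sample requests are invented.

```python
from datetime import datetime


def count_visitors_and_visits(requests):
    """requests: iterable of (host, datetime) pairs taken from an access log.

    Returns (number of distinct hosts, number of distinct host/hour pairs).
    """
    hosts = set()
    host_hours = set()
    for host, when in requests:
        hosts.add(host)
        # Truncate the timestamp to the start of its clock hour.
        hour = when.replace(minute=0, second=0, microsecond=0)
        host_hours.add((host, hour))
    return len(hosts), len(host_hours)


requests = [
    ("192.0.2.10", datetime(2000, 10, 10, 13, 5)),
    ("192.0.2.10", datetime(2000, 10, 10, 13, 40)),  # same host, same hour
    ("192.0.2.10", datetime(2000, 10, 10, 15, 2)),   # same host, later hour
    ("192.0.2.77", datetime(2000, 10, 10, 13, 20)),
]
print(count_visitors_and_visits(requests))  # (2, 3)
```

Because several people may share one host, and one person may appear under several, both counts remain estimates in the sense described above.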
Definitions of hits and accesses vary somewhat among server log analysis tools. Some server log analysis tools track “requests.” In the Web logfile analysis program Analog, a request is “when a Web server is asked to provide a page, graphic, or other object.” A request may be created by a “visitor going to a page or by the page itself requesting an object (usually a graphic)” (Analog). Analog distinguishes between requests (the number of transfers of any file type) and page requests (the number of transfers of HTML pages) (Analog).
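The distinction between requests and page requests can be illustrated with a short Python sketch. The rule used here, that a page is any path ending in `.html`, `.htm`, or `/`, is an assumption chosen for the example; an analysis tool may let the analyst configure which files count as pages.

```python
# Assumed rule for what counts as a "page"; not taken from any tool's defaults.
PAGE_SUFFIXES = (".html", ".htm", "/")


def count_requests(paths):
    """Return (all requests, page requests) for a list of requested paths."""
    total = len(paths)
    pages = sum(1 for path in paths if path.lower().endswith(PAGE_SUFFIXES))
    return total, pages


paths = ["/index.html", "/logo.gif", "/photo.jpg", "/docs/", "/about.htm"]
print(count_requests(paths))  # (5, 3)
```

In this example one visit to three pages generates five requests, which is why raw hit counts overstate the number of pages viewed.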
Analog tracks the “success” of requests. Success is defined in terms of HTTP status codes, i.e., status codes in the 200 range (meaning a document was returned) or a code of 304 (a user could use a cached copy of a document so the document was not required from the server). Analog treats logfile lines with no status code as a success. A redirected request has a status code in the 300 range (with the exception of 304) and indicates that a user was directed to a file other than the file originally requested. A common use of redirected requests is for click-through advertising banners. Failed requests have status codes in the 400 range (indicating an error in the request) or in the 500 range (indicating a server error). The most common failures occur when a file is read-protected or not found. HTTP status codes in the 100 range are information status codes and are rare (Analog).
The following represents a subset of the hit and access data provided by server log analysis tools: number of hits, number of visits per hour, visitor view of pages (AccessWatch), number of requests, average successful requests per day (Analog), average successful requests for pages per day, failed requests, redirected requests, distinct files requested (Analog), most requested pages, least requested pages (WebTrends), 7 most popular pages, average requests per day of the week (Accrue HitList), top 40 pages that were requested at least once (SurfReport), most common single page visits, and most popular directories, which could reveal the most requested information types (Accrue HitList). Virtually all of the log analysis programs provide a default listing (e.g. top ten requested pages) but allow customization if, say, the top 100 requested pages are of interest.
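The classification just described can be written as a small Python function. This is a sketch of the rules as summarized in the paragraph above, not code from Analog itself; the function name and category labels are chosen for illustration.

```python
from collections import Counter


def classify_status(status):
    """Map an HTTP status code (or None if absent) to a request category."""
    if status is None:
        return "success"        # lines with no status code count as successes
    if 200 <= status <= 299 or status == 304:
        return "success"        # document returned, or cached copy still valid
    if 300 <= status <= 399:
        return "redirected"     # user sent to a different file
    if 400 <= status <= 599:
        return "failed"         # error in the request or on the server
    if 100 <= status <= 199:
        return "informational"  # rare
    return "unknown"


statuses = [200, 304, 302, 404, 500, None, 200]
print(Counter(classify_status(s) for s in statuses))
```

Note that the check for 304 must come before the general 300-range test, since 304 is the one code in that range that counts as a success.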
Paths. Paths may be defined as “the average length of a user's sessions, specific location duration (e.g., average time on a page), average download times, and how the user navigated through the site (e.g., entrance and exit points)” (Bertot et al., 1997, p. 376). Server logging software tools provide the following output related to paths: unique paths, average path length (reported in pages), longest path (reported in pages) (SurfReport), previous pages viewed within a site–which may aid in determining how a user navigated to pages within the site–and jumps from the home page, which can indicate the most often used links (Accrue HitList).
Entry and exit pages. Entry and exit page data provide information on where users enter and exit a Web site. Log analysis output includes data such as top ten entry pages, top ten exit pages (SurfReport), most popular entry pages, and most common exit pages (Accrue HitList).
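Path, entry page, and exit page summaries of this kind can be derived from page requests once they have been grouped into visits. The Python sketch below assumes that grouping has already been done and that each record carries a hypothetical visit identifier, a timestamp, and a page; it is an illustration of the idea rather than the method of any tool named above.

```python
from collections import Counter, defaultdict


def summarize_paths(page_requests):
    """page_requests: iterable of (visit_id, timestamp, page) tuples."""
    paths = defaultdict(list)
    # Order each visit's pages by time to reconstruct the path taken.
    for visit_id, _, page in sorted(page_requests, key=lambda r: (r[0], r[1])):
        paths[visit_id].append(page)
    if not paths:
        return None
    lengths = [len(pages) for pages in paths.values()]
    return {
        "entry_pages": Counter(pages[0] for pages in paths.values()),
        "exit_pages": Counter(pages[-1] for pages in paths.values()),
        "average_path_length": sum(lengths) / len(lengths),
        "longest_path": max(lengths),
    }


records = [
    ("v1", 1, "/index.html"),
    ("v1", 2, "/products.html"),
    ("v1", 3, "/order.html"),
    ("v2", 1, "/products.html"),  # this visit did not enter at the home page
    ("v2", 5, "/index.html"),
]
summary = summarize_paths(records)
print(summary["entry_pages"].most_common(), summary["longest_path"])
```

Pages served from a browser or proxy cache never reach the log, so reconstructed paths may omit steps, especially those reached with the “Back” command.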
Click-throughs. A click-through occurs when a visitor to a different Web site clicks on an advertisement that ultimately redirects the visitor to the logged site. Web servers track click-throughs only when the HTML that contains the advertisement has been written so that a click does not go directly to another Web site but instead goes to an application on the site where the advertisement was displayed, and then to the final destination. Applications that handle these transactions are called redirection programs (Accrue HitList).
The Accrue HitList server log analysis product tracks “impressions,” or the number of times an advertisement is requested from a Web server. The number of impressions may be less than the actual number of times an advertisement was viewed by a visitor due to Web browser caching. Accrue HitList calculates the click-through rate as the number of click-throughs divided by the number of impressions. According to Accrue HitList, the click-through rate is an “indirect estimate” of the effectiveness of an advertisement.
According to Nielsen (1998), in October 1998 clickthrough rates had dropped to 0.5% (as reported by NetRatings). In his September 1997 Alertbox column, “Why Advertising Doesn't Work on the Web,” Nielsen (1997) stated that the value of Web advertising should be assessed in terms of new users who are brought to a Web site. Since clickthrough rates are so small, Nielsen concluded that Web advertising does not work well, and further that it will not be important to the future of the Web.
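The calculation itself is a simple ratio, shown here in Python with invented counts. The function name and the choice to return `None` when no impressions were logged are assumptions for the example.

```python
def click_through_rate(click_throughs, impressions):
    """Click-throughs divided by impressions; None if nothing was served."""
    if impressions == 0:
        return None
    return click_throughs / impressions


# Hypothetical counts: 5 click-throughs on 1,000 logged impressions.
rate = click_through_rate(5, 1000)
print(f"{rate:.1%}")  # 0.5%
```

Because cached views are not logged as impressions, the denominator tends to be understated, so a rate computed this way is, if anything, an overestimate.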
Duration. Duration may be defined in terms of average time per visit (Accrue HitList) or average time per page (Accrue HitList). Duration, however, may not describe interactive use of a Web site. Catledge and Pitkow (1995) note that users often leave a Web browser open and running for extended lengths of time without interacting with the browser. Also, a Web user may not look at an entire page.
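One way such durations might be estimated from a server log is to take the gap between consecutive page requests within a visit, as in the Python sketch below. This is an assumed approach for illustration, with invented timestamps in seconds; the tools cited may compute duration differently.

```python
def page_durations(visit):
    """visit: list of (timestamp_seconds, page) tuples ordered by time.

    Returns (page, seconds until the next request) for every page but the last.
    """
    return [
        (page, next_time - time)
        for (time, page), (next_time, _) in zip(visit, visit[1:])
    ]


visit = [(0, "/index.html"), (30, "/products.html"), (150, "/order.html")]
print(page_durations(visit))
# [('/index.html', 30), ('/products.html', 120)]
```

The last page of a visit has no following request, so its viewing time cannot be estimated this way, and a long gap may reflect an idle browser rather than reading.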
Downloads. Server log data can provide information on items that are downloaded from a Web site. In the Accrue HitList product, downloads can include, for example, .zip files and applications (e.g., .exe files). Download information that is tracked by Web server analysis tools includes visitor download (AccessWatch), and 7 most popular downloads (Accrue HitList).
Browsers. Server log data can provide information on Web browsers used to access a Web site, including browser software (Microsoft Internet Explorer, Netscape Navigator, Other) (AccessWatch), and most popular browsers (Accrue HitList).
Errors. Server logging tools can provide error information: status code report (e.g., “Access Forbidden”) (Analog); top 10 bad requests, and top 10 bad source pages (Accrue HitList).
Search engines. Server log data can indicate the use of search engines in the context of a particular Web site, indicating the following: percentage of traffic generated by search engine (based on a percentage of visitors and a percentage of visits), top 10 keywords used to find the site, top 9 search engines referring to the site (SurfReport), and most common search engine crawlers (robots) (Accrue HitList).
Demographic Server Log Data
Demographic data describe the “kinds of people” accessing a site. Examples of these data include accesses by domain, e.g., .org, .com, .jp (AccessWatch), most active organizations (as determined by IP address or domain name, e.g., aol.com), most active countries (as determined by the suffix of the domain name), new versus returning users (reported as a percent of the total number of visitor sessions) (WebTrends), and visits by distinctly authorized users (requires a user id and password) (Wusage).
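A tally of accesses by domain suffix can be sketched as follows in Python. The sketch assumes the log already contains resolved host names; hosts recorded only as numeric IP addresses are reported as “unresolved”. The host names are invented.

```python
from collections import Counter


def domain_suffix(host):
    """Return the final label of a host name (e.g. '.com'), or 'unresolved'."""
    labels = host.lower().rstrip(".").split(".")
    # A numeric final label indicates a bare IP address, not a host name.
    if len(labels) < 2 or labels[-1].isdigit():
        return "unresolved"
    return "." + labels[-1]


def accesses_by_suffix(hosts):
    return Counter(domain_suffix(host) for host in hosts)


hosts = ["proxy.aol.com", "www.example.org", "host.example.jp",
         "192.0.2.10", "dial1.aol.com"]
print(accesses_by_suffix(hosts).most_common())
```

A suffix identifies where a host name is registered, not where the person using it is located, so country figures derived this way are rough indications at best.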
Performance Server Log Data
Performance describes the load on a Web server and the responsiveness of the Web server. Performance information can include: megabytes of information served by the site, page demand, defined as the average number of pages traversed and average time to download a given amount of information for a specified number of visitors within a specified amount of time (AccessWatch), average data transferred per day (Analog).
These types of data are often used for other types of analyses than Web site design issues. For example, McLaughlin, Goldberg, Ellison, and Lucas (1999) describe server log analyses that illuminated the kinds of users who visited, and the methods used to reach an on-line art museum. Their analyses provided the typical indications of how users connected to the Internet, their likely country of origin, and type of service provider, and length of user sessions. But of particular merit in this analysis were the correspondences drawn between publicity episodes about the site in off-line news releases (such as USA Today, the Los Angeles Times, and the Chronicle of Higher Education) as well as via on-line publicity sources. The analysts essentially triangulated known publicity occurrences against server logs, in order to assess the peaks and flows of visitation as a result of external exposure. Moreover, referrer logs were used to determine whether the majority of visitors came to the site via these publications' sites, or whether users were more likely to have found their way in through other linked sources or by typing the URL directly. While the validity of inferences about demographic trends faces some serious concerns, the triangulation of internal with external event data is an exemplary strategy. These concerns, and other triangulation methods for usability, are taken up below.
Web Server Log Data Validity Issues
There are key issues associated with the completeness, accuracy and representativeness of server log data. These include caching and unique user identification. Due to these issues, it is suggested that Web server log data should be used for high level, general information (Linder, 1999). Two relatively safe conclusions from server logs are (1) that hits received were at least as many as what the server log revealed, and (2) each different site/machine listed in a server log reflects at least one unique user access–it is impossible to determine if the site/machine represents one user, or more than one user using the same site/machine to access a Web site (Linder, 1999).
Caching and Browsers
A Web browser may make what is known as a “conditional request” to the Web server. In a conditional request, the browser only requests a document or inline object from the server if a page is not already stored in the browser's “disk cache.” This method reduces network traffic. However, from a Web server logging perspective, pages that are served from the browser cache will not be recorded in the Web server log. Therefore, user data will not be captured in this situation (HTTP-ANALYZE).
Caching and Proxy Servers
Proxy servers are used by Internet service providers, and private and public institutions with a large user base, in order to protect a network from unauthorized parties, and/or to reduce network traffic (HTTP-ANALYZE). To reduce network traffic, pages that are requested and loaded into a browser, via a proxy server, are stored in the proxy server “disk cache.” The idea is that documents that are often requested by users may be accessed from the proxy server cache, rather than from the Web server where the document originally resided. As in the case of browser cache, from a Web server logging perspective, pages that are served from the proxy server cache will not be recorded in the Web server log. Therefore, user data will not be captured in this situation.
Internet service providers and private and public institutions with users who are located in a limited geographic area might consider disabling proxy servers so that they could more accurately track server usage via Web server logging tools. However, the decision to disable a proxy server would need to be weighed carefully against any resulting degradation in performance: if users experience long wait times in page downloads, users may abandon the use of a Web site.
Unique User Identification
Each unique IP address in a server log may represent one or more unique users (Linder, 1999). For example, if 300 hits are recorded from an IP from the .au domain, it is impossible to know if this is 300 hits from one person in Australia, or 300 people in Australia all requesting the same page (Goldberg, 1999).
In addition, an IP address does not necessarily represent the same computer, because of dynamic IP addressing via Dynamic Host Configuration Protocol (DHCP). According to Vicomsoft (2000), computers are manually assigned a permanent (fixed) IP address in traditional TCP/IP networks. When DHCP is used (for example, when users dial up their service providers via modem), computers are assigned IP addresses dynamically without manual intervention. In either case, IP addresses are not reliable sources of user identification, because it is the computer, not the user, that is assigned an IP address, and multiple users may use a single computer to access Web sites.
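The two "relatively safe" conclusions described above can be illustrated with a minimal sketch. It assumes that the server writes entries in the Common Log Format; the sample entries, host names, and the function name `summarize_hosts` are hypothetical and are used only for illustration.

```python
import re
from collections import Counter

# Common Log Format: host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+)')


def summarize_hosts(log_lines):
    """Return (total_hits, hits_per_host) for the lines that can be parsed."""
    hits_per_host = Counter()
    for line in log_lines:
        match = CLF_PATTERN.match(line)
        if match:
            hits_per_host[match.group(1)] += 1
    return sum(hits_per_host.values()), hits_per_host


# Hypothetical sample entries, invented for illustration.
sample_log = [
    '192.0.2.10 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '192.0.2.10 - - [10/Oct/2000:13:55:40 -0700] "GET /logo.gif HTTP/1.0" 200 4500',
    'proxy.example.edu - - [10/Oct/2000:14:02:11 -0700] "GET /index.html HTTP/1.0" 200 2326',
]

total_hits, hits_per_host = summarize_hosts(sample_log)
distinct_hosts = len(hits_per_host)

# Both figures are lower bounds: cached pages never reach the server, and
# one host (for example a proxy or shared machine) may stand for many users.
print(f"hits recorded (at least this many requests occurred): {total_hits}")
print(f"distinct hosts (at least this many unique users): {distinct_hosts}")
```

Neither count should be read as an exact number of users or page views; the sketch only makes explicit which quantities the log can bound from below.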
Client-side logging tools have been used to address some of the shortcomings of server logging data and tools. As Ellis, Jankowski, Jasper, and Tharuvai (1998) note, client-side logging tools can capture Web site navigation from cached documents, “overcoming some of the problems associated with analyzing standard web server logs” (p. 573).
Client-side logging tools are predominantly used as a means of collecting data in a controlled study environment, rather than in commercial applications. Etgen and Cantor (1999) developed the Web Event-logging Tool (WET) as an alternative to Web server log data, due to the following limitations of server log data: Web server logs do not collect data on client-side user interfaces, including Java applets and form element interactions; proxy server and browser caching impacts the validity of server log data. WET was designed to provide usability data on Web site use. It is currently considered as a complement to other usability testing data collection techniques, including usability tester notes that are collected manually. Another client-side tool, Listener, was also developed to capture client-side Web site usability data. Listener is designed to capture a user's navigation through a Web site through navigation elements such as links (Ellis et al., 1998). The benefits of Listener are described as follows:
- Provides access to Web site user interaction behavior in the case when a usability tester does not have access to server logs
- Records Web site user interactions that are not captured by Web server logs, i.e., records actions on Web pages that would not be recorded in a server log due to caching
- Operates without an HTTP server connection, which could be an advantage for usability testers who do not have access to a Web server and/or network (Ellis et al., 1998)
Client-side logging tools have been used to advantage to capture Web user interactions in usability testing situations. Client-side logging tools provide more detail about user interactions with Web sites and also address the problem of browser and proxy caching. This detail could include a user load of a page or a user click in a checkbox in a form and submission of the form (Ellis et al., 1998). Data gathered in client-side logs include event date or time, type of event (e.g., load, click, submit), elements of an online form including source type (e.g., checkbox), source name (“Submit Now”) and source value, and event/source location (e.g., http:///orderstuff/order_form.html) (Ellis et al., 1998). Therefore, as Catledge and Pitkow (1995) note, “actual user behavior, as determined from client-side log file analysis, can supplement the understanding of Web users with more concrete data” (p. 1065). In other words, client-side log data could be used with server log data and other forms of usability data collection to provide a more complete description of user interaction on a Web site. However, a method to capture client-side interactions in a large-scale, commercial setting has not been developed.
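The fields listed above suggest a simple record structure for client-side events. The following is a minimal sketch, not the format used by WET or Listener; the class name `ClientEvent`, the field names, the sample values, and the page path are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClientEvent:
    """One client-side event, using the kinds of fields described in the text."""
    timestamp: str                      # event date and time
    event_type: str                     # e.g., "load", "click", "submit"
    location: str                       # page on which the event occurred
    source_type: Optional[str] = None   # e.g., "checkbox"
    source_name: Optional[str] = None   # e.g., "Submit Now"
    source_value: Optional[str] = None  # value carried by the form element


# Hypothetical events for a single visit to a hypothetical order form.
events = [
    ClientEvent("2000-10-10T13:55:36", "load", "/example/order_form.html"),
    ClientEvent("2000-10-10T13:55:50", "click", "/example/order_form.html",
                "checkbox", "gift_wrap", "on"),
    ClientEvent("2000-10-10T13:56:02", "submit", "/example/order_form.html",
                "button", "Submit Now"),
]


def count_by_type(event_list):
    """Tally events by type, a first step toward element-level analysis."""
    return Counter(event.event_type for event in event_list)


event_counts = count_by_type(events)
print(dict(event_counts))
```

A server log would typically record only the page request and the form submission in this scenario; the checkbox click is the kind of within-page detail that only a client-side record would retain.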
Validating Web Design
A combination of traditional usability testing techniques, client-side logging and Web server logging may provide the best opportunity to understand how users interact with a Web site, what tasks users are trying to accomplish, and what improvements should be made to a Web site to increase ease of use. Given the validity issues noted above with regard to interpreting logs at face value, such data would wisely be triangulated with traditional approaches to usability testing. These approaches might include (a) observation of user interaction on Web sites in a usability laboratory or in a field setting, using video and audio taping of free-form and/or predetermined tasks, recording navigation patterns and user comments for observers, who would note user interactions and analyze their notes and the recorded data for insights; and (b) remote evaluation of Web sites, using online questionnaires and telephone interviews. As Kanerva, Keeker, Risden, Schuh, and Czerwinski (1997, n.p.) suggest, “The ultimate success of (software) is difficult to measure in tangible, reliable behaviors like task time or number of errors. In addition to traditional measures, researchers have to make strong use of natural observation and subjective questionnaires.” Whether to use one or more of these sources of use-based Web site data depends upon the particular research question(s) at hand. The following section offers examples, for the purpose of illustration, of how Web site design principles can or cannot be validated using each of three types of use-based data: Web server logs, client-side logs and/or usability testing.
Server log data: Analyze data to determine if users abandon a frames-based homepage frequently or in short durations. Since users can access a site from any number of locations within the site, path data should also be used to determine if the homepage is most frequently found as the beginning of a user path through a Web site.
Client-side log data: Analyze path, link and element data to determine if a frames based approach produces longer or more indirect paths than a non-frames based approach.
Usability testing data: Observe users using a frames versus non-frames based approach to determine which approach is more appropriate and easier to use.
II. Provide easy access to information that is most frequently and most often used by users (Nielsen, 1996).
Server log data: Analyze what pages users access most often. Analyze paths to determine if the most efficient access to important information is available. Look at common entry pages–does this indicate frequency of use and/or importance? Compare paths between most frequently and least frequently used to determine if a particular design impedes access to less frequently accessed pages. Also note that in the server logging product Accrue HitList, the use of requests to determine the popularity of a site is not appropriate because a page with many graphics will generate more requests than a page with fewer graphics. Therefore, the number of visits or accesses (the number of HTML page requests) will provide a more precise representation of activity (Accrue HitList).
Client-side log data: Analyze what page elements users access most often. Analyze paths, links and elements to determine if the most efficient access to important information is available.
Usability testing data: Observe what users use most often. Ask users to rank tasks and information needs by importance and frequency of use.
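The distinction drawn under principle II between raw requests and HTML page requests can be sketched briefly. This is an illustration only and not the method used by Accrue HitList; the list of requested paths is hypothetical, and treating paths that end in ".html", ".htm" or "/" as pages is an assumption that would need adjusting for any particular site.

```python
from collections import Counter
from urllib.parse import urlsplit

# Assumption: pages are identified by these endings; everything else
# (images, style sheets, and so on) is treated as an embedded object.
PAGE_EXTENSIONS = (".html", ".htm")


def is_page_request(requested_path):
    """Return True if the request looks like an HTML page rather than an inline object."""
    clean_path = urlsplit(requested_path).path.lower()
    return clean_path.endswith("/") or clean_path.endswith(PAGE_EXTENSIONS)


# Hypothetical request paths taken from a server log.
requested_paths = [
    "/index.html", "/logo.gif", "/banner.jpg", "/photo1.gif",
    "/contact.html", "/",
]

total_requests = len(requested_paths)
page_views = Counter(path for path in requested_paths if is_page_request(path))
total_page_views = sum(page_views.values())

print(f"requests: {total_requests}, page views: {total_page_views}")
```

In this hypothetical sample a graphics-heavy home page accounts for half of all requests, while each page was viewed only once, which is why page requests rather than raw requests are the more defensible measure of popularity.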
III. Design a Web site so that multiple types of users, emphasizing the most typical users, can access and make use of the site.
Server log data: The Agent Log provides data on the type of browser and operating system of the accessing user. According to Bertot et al. (1997), this type of data can provide information on what a user, or group of users, can access on a site (e.g., Java). If the majority of users cannot access content on the Web site because their browsers do not support the technology used to display the content, the site could be inaccessible to a large number of users. If it is found that a majority of the intended audience of a Web site cannot access content on that site, the site should be redesigned so that users can access the content based on technology that is supported by their browsers.
Usability testing data: Provides the best information regarding the most typical users and their tasks through audience definition and task analysis activities.
IV. Do not provide large files on a Web site that require long download times (Nielsen, 1996) nor make a Web site graphical at the expense of performance.
According to Nielsen (1996), human factors guidelines specify a 10 second maximum response time to mitigate the risk that users will lose interest in a Web site. Nielsen further states that 15 seconds may be acceptable given that users are accustomed to long download times on the Internet.
Server log data: According to Bertot et al. (1997), stopped transmission data from the server Error Log can indicate that there is a pattern to users stopping the download of large files.
Client-side log data: Analyze the types of links that users are typically using, i.e., are they text, graphic, or a combination?
Usability testing data: Spool et al. (1997) found no evidence that graphics helped users to retrieve information on a Web site. Spool et al. also found that “most users examined text links before considering image links” (p. 8). If there is a question of whether or not users prefer and are more successful with text versus graphic elements on a Web site, usability testing could include tasks and questions pertaining to text and graphic elements. If it is found that users prefer and are more successful with graphics elements, the size of graphic elements must be considered if they are to be included on a Web site.
Server log data: Analyze Error Log to determine if users are getting link errors due to pages that have been removed or moved. Look for common exit pages–are these linked with outdated information? In addition, according to Bertot et al. (1997), when changes are made to document links on a server, the Referrer Log may be used to contact frequent referrers so that user errors, generated by the selection of moved links, can be avoided.
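The link-error analysis described above can be sketched as follows. The sketch assumes that the server writes the combined log format, in which the status code and the referrer appear in the same entry; where separate Error and Referrer Logs are kept, the two would have to be matched by other means. The sample entries, host names, and URLs are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Combined log format: the Common Log Format followed by "referrer" "user-agent".
COMBINED_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def broken_links_by_referrer(log_lines):
    """Map each path that returned 404 to a tally of the referrers that linked to it."""
    broken = defaultdict(Counter)
    for line in log_lines:
        match = COMBINED_PATTERN.match(line)
        if match and match.group("status") == "404":
            broken[match.group("path")][match.group("referrer")] += 1
    return broken


# Hypothetical entries, invented for illustration.
sample_log = [
    '192.0.2.10 - - [10/Oct/2000:13:55:36 -0700] "GET /old/page.html HTTP/1.0" 404 209 '
    '"http://links.example.org/list.html" "Mozilla/4.0"',
    '192.0.2.77 - - [10/Oct/2000:14:10:02 -0700] "GET /old/page.html HTTP/1.0" 404 209 '
    '"http://links.example.org/list.html" "Mozilla/4.0"',
    '192.0.2.10 - - [10/Oct/2000:14:12:40 -0700] "GET /index.html HTTP/1.0" 200 2326 '
    '"-" "Mozilla/4.0"',
]

broken = broken_links_by_referrer(sample_log)
for path, referrers in broken.items():
    for referrer, count in referrers.most_common():
        print(f"{path} was requested {count} time(s) via {referrer}")
```

A tally of this kind identifies which referring pages send the most visitors to moved or removed documents, and therefore which referrers would be most worth contacting.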
VI. Do not open new browser windows (Nielsen, 1999).
New browser windows could be opened to display online help and/or to display additional information pertaining to a Web site.
Server log data: If the URL of the new browser window is different from the URL from which the new window was launched, the newly logged URL could be useful in determining if, for example, users select online help. However, if the URL of the new page is not different from the URL from which it was launched, this data could be misleading in a Web log because the data would appear as two accesses of the same Web page.
Client-side log data: Catledge and Pitkow (1995) found that the browser “Back” command accounted for 41% of all user interaction requests for documents. Nielsen (1999) reasoned that since the Back button is “the second most used feature” on the Web, users would be able to navigate among information without the use of new browser windows. Nielsen (1999) did not offer empirical data to support this claim.
Usability testing data: Nielsen (1999) found that users do not often notice when a new browser window is opened. This finding could be tested in a usability evaluation of a Web site. Users could be asked questions about content contained in launched new browser windows to both determine whether or not they noticed new browser windows and to determine the perceived value of the new content if users noticed new browser window launches.
Experimental Research and Web site Evaluation
A use-based approach to Web design should include testing throughout the lifetime of a Web site. The ways in which users interact with a Web site provide valuable data on the usefulness, value and appropriateness of Web design navigation elements and content. The previous sections of this paper discussed the importance of the Web, how users make use of Web sites, Web design guidelines and principles, and the sources of use-based Web data (Web server logs, client-side logs, and usability testing) that can be used to understand user interactions on Web sites. The discussion so far has focused primarily on descriptive, or “normative data” interpretations. That is, examination of hit frequencies, durations, paths, etc., has been discussed in terms of their (more or less) face-value indications about the normative patterns that accrue as users connect to any single Web site. Interpretations of such data, alone, can only be used with extreme caution regarding users' actual responses to any particular element of a Web design, as we have discussed; when these data are used normatively, they must be supplemented with a variety of other forms of information in order to ascertain users' true reactions (Kanerva et al., 1997). However, alternative approaches to the use of such data can reduce the need for triangulation. Though this alternative will require other, additional efforts in data collection, the results will potentially be more clear and compelling. This approach has received scarce attention in the literature, yet may be of great use in Web site development or refinement.
Collecting and analyzing data within pre-constructed experimental research designs overcomes most of the validity issues that go along with the use of server logs and analysis programs for normative interpretations. In this section we will discuss the use of server log data as dependent variables in parallel-site experimental research. The following section outlines how sources of use-based Web data can be used to evaluate Web site design options through experimental research at all phases of design and revision.
Experiments Through the Web
While experimental research using the Web to study a variety of non-Web topics is gaining popularity, experimentation on Web sites themselves (i.e., Web site characteristics as independent variables) has been slow to follow. Numerous sources discuss the application of the Web as a means to collect questionnaire data in psychological and sociological survey research and experiments (e.g., Coomber, 1997; Pettit, 1999; Reips, 1995). Advocates of this approach observe that experimental stimuli or plain questionnaires can be reproduced and distributed via the multimedia and textual capacities of the Web, with little or no degradation in subjects' responses compared to traditional methodologies (e.g., Krantz, Ballard, & Scher, 1997; Stanton, 1998). They also note that the use of the Web is vastly less expensive, and reaches a potentially far greater sample size, than stimulus-response or questionnaire research using traditional media (from telephones to paper-and-pencil questionnaires; Schmidt, 1997; Watt, 1999); certainly these cost comparisons would pertain to log analyses versus more labor-intensive, observational usability approaches, as well. However, these methods of research administration are obtrusive, that is, subjects know they are being queried, which may skew accurate reporting, in some cases even more so than traditional survey methods (Weisband & Kiesler, 1996). These approaches have overlooked the most readily available unobtrusive data that describe the reactions of Web users to Web pages themselves: Web browsing behavior recorded by server logs, which offer direct measures of behavior that can be compared to detect empirical differences between versions of Web sites.
Experiments About the Web: Server Logs As Data in Unobtrusive Experimental Designs
We propose that during prototype development and refinement, or in hypothesis tests about variables in Web design, investigators create parallel Web sites reflecting those variations in design in which the designers are interested. For instance, if it is held that slow download speed or intensive graphics discourage users, and Web designers need a direct indication of whether such variables affect users' click-throughs or stops, parallel sites could be constructed which vary these attributes. The log data can then be compared in order to test for significant differences between prototypes, rendering useful inferences about the direct effects of these design choices on specific user behaviors.
One way to facilitate the efficient deployment of such an experiment is to use a random redirect program on the home page of a site. A redirect script automatically sends the user to a different page. Thus, by including a redirect script on a home page, users can be sent to experimental pages transparently. By including a randomization script in the redirect routine, the home page can accomplish the random assignment of subjects to test conditions, which is the foundation of many experimental research designs. Users need not be given different URLs in order to be exposed to different prototypes and thus, users need not know that their reactions are solicited, preserving the unobtrusive nature of the research and accentuating data accuracy. A sample randomizing redirect script, allocating users to one of four prototypes, appears in Table 2.
Table 2. Randomizing redirect script
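The logic of a randomizing redirect can be sketched as follows. This is a minimal illustration written in Python, under the assumption that the redirect is issued by the server as an HTTP 302 response rather than by a script embedded in the home page; the four prototype URLs and the function names are hypothetical.

```python
import random
from collections import Counter

# Hypothetical URLs for four parallel prototypes of the same site.
PROTOTYPE_URLS = [
    "/prototype_a/index.html",
    "/prototype_b/index.html",
    "/prototype_c/index.html",
    "/prototype_d/index.html",
]


def assign_prototype(rng=random):
    """Randomly assign an arriving visitor to one prototype, with equal probability."""
    return rng.choice(PROTOTYPE_URLS)


def redirect_response(rng=random):
    """Build the status code and headers for a temporary redirect to the assigned prototype."""
    headers = {
        "Location": assign_prototype(rng),
        # Assumption: ask caches not to store the redirect, so that one
        # cached assignment is not reused for every visitor behind a proxy.
        "Cache-Control": "no-store",
    }
    return 302, headers


# Simulate 4,000 arrivals with a seeded generator to check the allocation.
simulation_rng = random.Random(0)
allocation = Counter(assign_prototype(simulation_rng) for _ in range(4000))
for url in PROTOTYPE_URLS:
    print(url, allocation[url])
```

Because assignment is made per request, a returning visitor may be sent to a different prototype than before; as discussed below, this complication affects all conditions equally.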
A further advantage that this approach offers over normative data interpretations is that the various interpretive weaknesses to which normative interpretations are prone do not apply in this context. For instance, it does not matter if 100 hits to experimental Site A come from 49 different users or 78 different users, because an equivalent hundred hits to experimental Site B has the same probability of profile distribution. Whatever frailties occur in the measurement of one test site are assumed to occur in another; random error is cancelled out in equivalent designs when random assignment of subjects is employed. One complication might be in assessing return visits, since a returning user may not actually be redirected to the same site as s/he saw previously. However, once again, the effect that this problem presents in assessing Site A is equal to that for Site B, and balances out. When tests for the effects of differences between Web pages are conducted, it matters less who is connecting to the site, rather than, among those who do connect, what navigations or other browsing behavior they perform.
Such approaches could be developed to employ inferential statistics to compare path lengths, duration, aborts, and almost any other feature of experimental Web sites that may be of interest to designers. To do so requires that more than one prototype is employed, and that the various versions differ only on key, identified variables. For example, a z-test might determine, among Web design A, B, C, or D, which has the longest visitor duration. Or a t-test might reveal, among prototypes with and without consistent graphics on each page in the site, whether a longer path, with more steps through a site, is significantly more or less frequently taken due to graphics. Such experimental techniques, while previously employed in usability labs with small numbers of subjects and scored by human observers, can give way to large-scale research and development efforts that would be appropriate to test on large numbers of selected subjects (e.g. university students, corporate employees), or on the Web user population at large.
A simple experiment was conducted in order to demonstrate the potential of server log data to detect behavioral responses to Web design differences. While many potential causes and effects might be assessed, a simple comparison of potential differences in users' time and level of page requests due to differences in web layout was operationalized for the purpose of demonstrating the feasibility of this kind of analysis. Details of the experiment are presented below. It is important to note, however, that the kinds of analyses we have recommended above are made difficult by the current state of web log data analysis software. Many such packages over-interpret server log data, offering, for instance, averages of user behaviors such as requests per page, or “top ten” most requested (or least requested) pages or paths. Such finessing of the data hides the raw frequencies from which not only means but, more critically, variances may be derived. Work is currently underway to develop a new software tool that will render web log data in a way that is more useful to statisticians.
Hypotheses. There are several basic Web site layouts available for design, and thinking is conflicted about which design facilitates or frustrates users. Optimally, Web design should encourage, among other things, users to avail themselves of the content of a site. A common design scheme intended to facilitate the easy access of content involves the use of frames: dividing the browser window so that links providing a table of contents appear in one side which, when clicked, call up alternating content pages in the other side. Frames can be useful for the easy presentation of multiple documents (Engelfriet, 1997) using a functional organizing system for keeping track of several layers of content in a way that renders their existence persistent, and minimizes the number of keystrokes needed to move between content views. This perspective suggests that frames may allow a user to visit more content pages within a similar period of time than s/he would spend on a site organized differently. On the other hand, Nielsen (1996) and others recommend that designers avoid the use of frames. Frames confuse users, according to Nielsen, and load slowly. Frames are often difficult to navigate in Web browsers designed for the visually-impaired (Accessible Web, 2001), although this can be overcome with embedded tags (McCathieNevile & Jacobs, 1999). If this is correct, users should be annoyed with frames and abandon a frames site more quickly than a similar site with some other organizational scheme.
Without frames, one may use a single, long page with all content on it, or a series of numerous, shorter linked pages that come up when clicked from a menu. One observational usability study claims to have found that fewer, longer pages may be “best” for users (User Interface Engineering, 1998), but not in what way. Yet other usability research found that users prefer “secondary navigation controls” such as a linked table of contents; such features made media usage easier to control and resulted in significantly less time to complete the tasks (Burton et al., 1999). Thus two nondirectional hypotheses were generated:
Ceteris paribus, a frames organization yields differences in (1) the amount of time users spend on a site, and (2) the number of content pages users request, compared to a Web site organized without frames.
Stimuli. Three versions of a single web site were constructed. In each version, the content of the site was identical; the sites differed only in their structural layout and linking style. The sites were based upon, and used the contents of, an actual university department's faculty biography pages. All versions were served on a Windows NT server operating under an “.edu” domain. Version 1 was a “pop up” version. The home page on this site was a menu featuring the names of all eighteen faculty members, with each name underscored and colored to indicate a link. When a user clicked on a name, a new window popped open. In each new window, a photograph of the respective faculty member appeared, along with a biographical description of that person's research and teaching interests, awards, and other information. While these pages were not standardized regarding content, their natural variability was preferred for the organic quality it lent to the research. A user could close the pop-up window, or ignore it and go on.
The second version was a typical “frames” layout. A left-hand pane listed all 18 faculty member names, while a right-hand pane was originally blank. By clicking on a name, the named person's biographical page loaded in the right-hand pane. Users needed to do nothing to remove one biographical page except to click on another name, which replaced the previous biographical page with another one.
The third version of the site was a single, long web page that featured consecutive biographical information and photo combinations, serially, one below the other, for each of the 18 faculty members. Users could navigate this page by scrolling down.
Since the primary data for the study were the web logs of user behavior, and since the research was authorized institutionally on the basis of complete anonymity, no demographic data were collected. Host domain data in the logs indicated that most users connected to the study from the university, while others appeared to do so by cable modem service, commercial service providers, or the corporate sites of some of the distance education students. Sixty participants completed the study. The records of 3 participants were removed from analysis since, upon inspection of the logfiles, it appeared that they had technical difficulties or meandered through several versions of the site before exiting.
Analyses. Several commercially available applications were examined for their ability to provide raw data suitable for frequency analysis and inferential statistical comparisons. The most useful of the packages sampled was Funnel Web Enterprise, by Quest Software (http://www.activeconcepts.com/). Among the data Funnel Web can provide is a listing for each client, or visit to the site, of the path for that visit, including the page URLs that the client requested and the exact date and time that the requests were processed. From these records, the number of seconds was calculated between the moment that each user moved from the home page and the moment at which s/he requested the questionnaire page. Additionally, for the pop-up version and the frames version, the number of content pages (i.e., biographies) that each user requested was also counted. The content request data are not available for the single, scrolling-page version, since all biographies were simultaneously presented on a single page hit.
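The two measures described above can be derived from a per-visit path listing along the following lines. This is a sketch under stated assumptions, not the procedure performed with Funnel Web: the page names, the "/bios/" prefix, the timestamps, and the function name `visit_measures` are hypothetical, the starting point is taken to be the request for the home page, and content pages are counted as requests rather than as distinct pages.

```python
from datetime import datetime


def visit_measures(requests, home_page="/index.html",
                   questionnaire_page="/questionnaire.html",
                   content_prefix="/bios/"):
    """Return (seconds on site, number of content page requests) for one visit.

    `requests` is a list of (timestamp, path) pairs in chronological order.
    """
    start_time = next(time for time, path in requests if path == home_page)
    end_time = next(time for time, path in requests if path == questionnaire_page)
    seconds_on_site = (end_time - start_time).total_seconds()
    content_requests = sum(1 for _, path in requests if path.startswith(content_prefix))
    return seconds_on_site, content_requests


# Hypothetical path listing for a single visit.
sample_visit = [
    (datetime(2001, 3, 5, 10, 0, 0), "/index.html"),
    (datetime(2001, 3, 5, 10, 0, 30), "/bios/faculty01.html"),
    (datetime(2001, 3, 5, 10, 1, 45), "/bios/faculty07.html"),
    (datetime(2001, 3, 5, 10, 3, 20), "/questionnaire.html"),
]

seconds_on_site, content_requests = visit_measures(sample_visit)
print(f"seconds on site: {seconds_on_site}, content pages requested: {content_requests}")
```

Keeping one row of raw values per visit, rather than site-wide averages, preserves the variances needed for the inferential tests reported below.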
Results. The amount of time users spent on each version of the web site was examined using a one-way analysis of variance test. Although the scrolling-page version had the greatest average raw number of seconds (M= 266.42, SD= 189.54), no significant differences were detected between this version and the pop-up (M= 197.89, SD= 179.96) or the frames version (M= 183.47, SD= 202.81), F (2, 54) =1.02, p= .37. Even very liberal post hoc LSD tests detected no differences between conditions, although the high variances may have hindered this analysis. Hypothesis 1 was not supported: Frames do not seem to affect the amount of time users spend on a site. Apparently users are not put off by the features of frames, at least not to the extent that it affects their browsing behavior.
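For readers who wish to see how the F ratio follows from the reported descriptive statistics, a minimal sketch is given below. It assumes 19 participants in each of the three conditions; the group sizes are not reported, but this split is consistent with the 57 retained participants and the reported degrees of freedom. The function name is hypothetical.

```python
def one_way_anova_from_summary(groups):
    """Compute the one-way ANOVA F ratio from (n, mean, sd) triples.

    Returns (F, df_between, df_within).
    """
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / total_n
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Reported means and standard deviations (seconds on site);
# n = 19 per condition is an assumption.
time_on_site = [
    (19, 266.42, 189.54),  # scrolling-page version
    (19, 197.89, 179.96),  # pop-up version
    (19, 183.47, 202.81),  # frames version
]

f_ratio, df_between, df_within = one_way_anova_from_summary(time_on_site)
print(f"F({df_between}, {df_within}) = {f_ratio:.2f}")
```

Under the stated assumption, the sketch yields an F ratio of approximately 1.02 on 2 and 54 degrees of freedom, in line with the reported result. The large within-group variances dominate the ratio, which is the point made above about the high variances.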
The number of content pages users called up was examined using only the data from the pop-up and the frames versions. The data indicated that the visitors to the frames site requested many times more content pages (M= 11.05, SD= 11.03) than those who perused the pop-up version (M= 3.47, SD= 3.19). A preliminary Levene's test for inequality of variances was significant, F (1, 36) = 17.23, p < .001. Therefore the hypothesis was tested using SPSS's independent samples t-test with equal variances not assumed, which demonstrated that the differences were significant, t (20.99) = 2.88, p < .009 (two-tailed). Hypothesis 2 was supported: Frames make a difference in the amount of content that users access, with frames encouraging more content requests, in an equivalent amount of time, than a non-frames layout.
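The unequal-variances t statistic and its fractional degrees of freedom can likewise be recovered from the reported means and standard deviations. The sketch below uses the standard Welch formulas and assumes 19 participants in each of the two conditions, which is not stated in the text but is consistent with the degrees of freedom of the Levene's test; the function name is hypothetical.

```python
import math


def welch_t_from_summary(n1, mean1, sd1, n2, mean2, sd2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom from summary statistics."""
    var_of_mean1 = sd1 ** 2 / n1
    var_of_mean2 = sd2 ** 2 / n2
    t_statistic = (mean1 - mean2) / math.sqrt(var_of_mean1 + var_of_mean2)
    degrees_of_freedom = (var_of_mean1 + var_of_mean2) ** 2 / (
        var_of_mean1 ** 2 / (n1 - 1) + var_of_mean2 ** 2 / (n2 - 1)
    )
    return t_statistic, degrees_of_freedom


# Reported content page requests: frames version versus pop-up version;
# n = 19 per condition is an assumption.
t_statistic, degrees_of_freedom = welch_t_from_summary(19, 11.05, 11.03, 19, 3.47, 3.19)
print(f"t({degrees_of_freedom:.2f}) = {t_statistic:.2f}")
```

Under the stated assumption, the result is approximately t(20.99) = 2.88, in line with the reported values. The reduction from 36 to about 21 degrees of freedom reflects how unequal the two variances are.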
As we have argued above, analysis of Web design and testing hypotheses about Web-borne variables may be facilitated and improved using organic, spontaneous, unobtrusive, and inexpensive data that speak directly to how people actually utilize Web sites, by analyzing server logs and client-side logs. While these data are notably imperfect, their weaknesses may be overcome in at least two distinct ways: by triangulating them with traditional usability testing, and/or by collecting them within the framework of experimental designs intended to test directly the differential effects of specific design options.
In most cases, no one source of user interaction information is as strong as when more than one source is combined. In some cases, however, only one method of collecting user interaction data is possible. Server administrators need to understand the meaning of server log analysis tool output (e.g., if the tool produces server “accesses” or server “hits”) (Bertot et al., 1997). Server administrators also need to understand the extent to which server log output is useful in determining what improvements should be made to a Web site to increase its value and ease of use. Web designers and usability professionals need to understand how to use the output of server log analysis tools in their design, usability testing, and analysis activities. In particular, designers and usability professionals need to understand what they can and cannot learn from server log data and further, how those data could be complemented by data from client-side analysis data, usability testing data, and other techniques. Web site designers must evaluate whether or not to use one or more sources of use-based Web data based on available resources, maturity of a Web site, the particular questions and issues that need to be addressed, and the existence of design options that may experimentally be tested.
In other cases, more detailed user interaction information may be gathered as more methods are combined. Server logging tools capture interactions at a page level, client-side logging tools capture interactions within a page at a page element level, and usability testing can reveal user experiences and impressions at page and page element levels. Thus the inclusion of usability data can validate and expand upon inferences based upon server and client-side logging tools alone. Catledge and Pitkow (1995) proposed a similar approach, stating that client-side logs provide more “concrete” data about Web user activity and that guidelines for Web page, site, and browser ease of use can be gleaned from log file data. A triangulation of usability testing, client-side logging, and server logging may provide the best opportunity to understand how users are interacting with a Web site, what tasks users are trying to accomplish, and what improvements should be made to increase ease of use and effectiveness.
Any of these approaches might constitute a considerable burden to professional designers. This is especially true if one contemplates triangulating data prior to publishing a new or revised site. However, many organizations put great stock not only in having a Web presence but also in analyzing its utility, especially those organizations that claim to advance new media or knowledge about it. High-tech companies and universities may find the use of these methods consistent with their missions. At the same time, these analyses need not take place all at once. Web designs that are based on user tasks, validated design principles, and use-based data collected over time will provide the most value to Web site users and, ultimately, to the owners of Web sites.
As noted by Nielsen (1999), more research is needed in the area of Web usability. Use-based techniques of gathering data on Web site designs will contribute to the understanding of how visitors use Web sites, and the analysis of use-based Web data will generate additional principles of Web design. The application of these kinds of analyses may reach further than the professional designer's toolkit. Log analyses may be useful tools for the fields of rhetoric, visual design, and others that have concerned themselves with techniques of effective Web-based communication. These approaches extend beyond simply testing one guideline against another: they allow for the validation or modification of various theories, the principles of which can be distilled into testable hypotheses in Web designs. Vast amounts of validating data are now available with which to assess rhetorical and aesthetic theories, evidence for which has heretofore resided only in speculative and axiomatic form. The use of these techniques may have further utility in teaching communication and design of new electronic media. Teachers need not justify design recommendations on the basis of intuitive appeal or literary precedent alone; instruction may avail itself of empirical precedents and original findings. The education of future technology workers will be enhanced by exposure to social scientific principles of analysis, and by a reinforcement of the often-repeated but seldom-demonstrated canon that effective design must consider users' responses.