The data (IRC scripts) for condition testing have to be collected over a sufficiently long period of time: long enough to reveal interesting patterns and to even out random fluctuations in participation by channel members. Although there are no hard rules on how long the observation period has to be, some field experience should be enough to make that decision. One day, or even a week, is clearly too short, because many IRC participants may get online only once a week. Our preliminary observation suggests that three to six months would be long enough to generate sufficient data to make a study meaningful.
Selection/Sampling of Channels
Online chatting is done on ISP-provided programs as well as on host systems of the Internet. The systems (IRC servers) typically operate in groups. Servers in the same group are connected to form a network. Participants connected to the same server or to different servers on the same network are able to communicate with each other, but not with those on a different network. Some networks are much more popular than others. A popular IRC network may have hundreds of channels, of which some are heavily populated while others are not. Note that, as we explained earlier, IRC channels may disappear or be re-created, but they do not change names or transmogrify.
Selective or random sampling of IRC channels can be done in two possible ways. One is to combine channels from existing IRC networks into one general pool for sampling. The other way is to first select IRC networks and then select channels on the chosen networks.
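The two sampling routes can be sketched as follows. This is only an illustration under stated assumptions: the network names, the channel lists (apart from #california), the function names, and the sample sizes are all hypothetical, and simple random sampling without replacement stands in for whatever selection rule a study actually adopts.

```python
import random

def pooled_sample(networks, k, rng):
    """Route 1: pool every channel from every network, then draw k channels."""
    pool = [(net, chan) for net, chans in networks.items() for chan in chans]
    return rng.sample(pool, k)

def two_stage_sample(networks, n_networks, k_per_network, rng):
    """Route 2: draw networks first, then draw channels within each one."""
    chosen = rng.sample(sorted(networks), n_networks)
    picked = []
    for net in chosen:
        chans = networks[net]
        # A small network may have fewer channels than requested.
        size = min(k_per_network, len(chans))
        picked.extend((net, chan) for chan in rng.sample(chans, size))
    return picked

# Hypothetical networks and channels, for illustration only.
networks = {
    "net-a": ["#california", "#music", "#chess"],
    "net-b": ["#news", "#films"],
    "net-c": ["#help"],
}

rng = random.Random(0)  # fixed seed so a run can be repeated
print(pooled_sample(networks, 3, rng))
print(two_stage_sample(networks, 2, 2, rng))
```

In the pooled route, a network with many channels contributes more candidates and is therefore more likely to be represented; the two-stage route gives each chosen network a similar share regardless of its size.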
Two possible methods may be used to observe activities on IRC channels. One is to intercept and script participants’ postings from the server side. The other is to do it from the client side by being present at chosen channels. We call the former the “server-side approach” and the latter the “client-side approach”. Regardless of which approach is taken, if interception is done at the network level, every line of unparsed message will have the following structure: “:trueGalfirstname.lastname@example.org PRIVMSG #california :hey guys!”, in the format of nickname, IP string, action keyword, channel destination, and posted message. Each line of message can be time stamped.
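As a minimal sketch of how such a line could be split into the five fields named above, the following assumes the usual IRC prefix layout, in which the nickname is separated from the rest of the prefix by "!". The sample line, the field names, and the function name are hypothetical and are not taken from the collected scripts.

```python
import re

# Assumed layout: ":nick!user@host COMMAND target :trailing text".
LINE_RE = re.compile(
    r"^:(?P<nick>[^!\s]+)!(?P<ip_string>\S+)\s+"
    r"(?P<action>[A-Z]+)\s+(?P<target>\S+)\s+:(?P<text>.*)$"
)

def parse_line(raw):
    """Return a dict with the five fields, or None if the line does not match."""
    match = LINE_RE.match(raw.rstrip("\r\n"))
    return match.groupdict() if match else None

# Hypothetical line: the nickname, user, and host are invented.
sample = ":somenick!user@host.example.net PRIVMSG #california :hey guys!"
print(parse_line(sample))
```

Lines that do not carry a nickname prefix, such as server notices, simply return None here; a real logger would need to decide how to treat them.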
The client-side approach can be implemented with a small program installed on a personal computer. The program can be configured to stay on several channels at the same time and log every line of messages coming from the IRC server into channel-specific files. However, this approach suffers from some major problems.
First, the observation cannot be truly unobtrusive. Even though the observation is absolutely passive and the “observer” does not contribute anything to the public forum, one can still argue that the very presence of the observer changes the group dynamics. This is true especially when the “observer” is singled out and approached by other participants. The researcher will have to face the dilemma of either responding to the inviting participant or refusing to be engaged in any conversation. Either way, the researcher's decision will affect the approaching individual.
Second, the scripting process may be interrupted either by technical difficulties or by human interference. The server may cut off the connection after a certain period of inactivity or when it detects a nickname collision. The monitoring program may get kicked out of, or even banned from, a channel when someone with power becomes tired of a constant “presence” which never says anything. In any case, the researcher should be aware of these problems and be prepared to deal with them.
Third, invited-only (private) channels also present a problem. The monitoring computer program probably will never get invited into a private channel. The researcher may get online in person, try to gain the trust of a channel owner/operator, and manage to get invited. Then, the researcher can either script the channel manually or try to smuggle the computer program into the channel under the same IP string. Although this method may not necessarily work with all invited-only channels, it at least provides a chance to get some idea of what is going on there.
Finally, monitoring private exchanges from the client side remains the biggest problem. It was noted on several occasions earlier that private exchanges among participants in a channel are important evidence of interpersonal interaction and that they should be included in the research. However, if the client-side approach is taken, it is impossible to intercept and script private messages, because private messages are sent directly from the server to the recipient. A possible remedy is to recruit voluntary participants to script their private exchanges for the research. While scripts of private exchanges gathered this way may reveal what is going on underneath the surface to some extent, they cannot provide a complete picture of private interaction in IRC.
Compared to the client-side approach, the server-side approach seems to offer much more control and flexibility. With a piece of software installed on the IRC server, all of the messages passing through the server – including private exchanges and messages meant for “invited-only” channels – can be captured. The process is seamless and the involved parties will know nothing about it. Although the server-side approach requires a great deal of assistance and co-operation from the system administrator and some system administrators may not be willing to help at all, the promise of highly complete and unobtrusive research data makes this approach extremely attractive.
Nevertheless, the decision about which approach should be used to collect research data cannot be made solely on the basis of data quality. In the following section, we examine the ethical aspect of this research and focus our discussion on ethical issues in collecting unobtrusive and naturalistic data on human interactions in IRC.
While researchers in the field of computer-mediated communication are deeply concerned about the ethical issues of conducting research on human interactions in cyberspace, they have difficulty reaching agreement on common ethical guidelines (Allen, 1996; Bohlefeld, 1996; Garton, Haythornthwaite, & Wellman, 1997; Herring, 1996; King, 1996; Paccagnella, 1997; Reid, 1996; Thomas, 1996a, 1996b; Waskul, 1996). Even if common ethical guidelines exist, following the guidelines to the letter does not automatically result in responsible and ethical research. It is not a matter of following codes, policy or procedure, but a matter of commitment to protect the participants in one's study from potential harm (Waskul, 1996). The individual researcher has the ultimate responsibility for keeping the best interests of the research participants in mind.
Recognizing that ethical issues may arise anywhere in the course of a research project, we can nevertheless divide the ethical concerns of this research into three parts: ethical issues in data collection, in data handling, and in the reporting of findings. It seems that ethical concerns in data handling and reporting of findings can be dealt with by following common practices in the field of CMC research. We can always stay on the cautious side by guarding the confidentiality of data unconditionally, by avoiding direct quotation of postings, and by not referring to any particular IRC participants in our writing.
In contrast, the data collection part of this research is much more problematic from the ethical perspective. In this research, we need to gather data on three main kinds of communication in IRC: public postings in public IRC channels, public postings in private IRC channels, and private (one-to-one) exchanges in IRC. The three kinds of data present different sets of ethical problems, and we examine each of them separately below.
The ProjectH Research Group's ethics policy (Paccagnella, 1997; Rafaeli, Sudweeks, Konstan, & Mabry, 1994) seems to take a reasonable position on the issue of how to study human communications in the public domain ethically. Although the ProjectH ethics policy was specifically designed for studying newsgroup postings, it can be adopted for this research and be used to guide our practice in collecting and managing postings in public IRC channels.
Conversations in publicly accessible IRC channels, like messages posted on newsgroups, are public acts deliberately intended for public consumption. Recording, analyzing, and reporting of such content, where individuals’ identities are shielded, is not subject to “Human Subject” constraints. It should not be necessary to inform the participants or to obtain explicit consent before collecting their public IRC messages for research purposes. Nor is it necessary for the researcher to take any more precautions in handling the collected data than in any study of real-life public activities.
The problem of how to observe participants’ interactions in a private (invited-only) channel seems to be more complicated. The degree of “perceived privacy” (Herring, 1996; King, 1996) in a private IRC channel is relative to the number of participants. When there are only two or three participants chatting in the channel, the situation becomes essentially equivalent to a private conversation. When the channel has a large number of participants, the degree of perceived privacy reaches its minimum. Since we are only interested in channels with a large number of participants, it should not be necessary to treat public postings in private channels much differently from those in public channels. However, message content and participant membership should be kept confidential, and the research report should not include any information that could be used to identify either individual participants or the private IRC channel.
One-to-one messaging in IRC, regardless of whether it is carried out in a public channel, in a private channel, or outside of any channel, is private communication. General research ethics requires that one should have informed consent from the research participants before recording their private communications or gathering personal information from them. While it is relatively easy to obtain informed consent in studies that rely only on a few voluntary informants, it is rather difficult in this research. The goal here is to investigate the overall interactivity level in an IRC channel sustained through a period of time, and we cannot possibly accomplish this by relying on only a few voluntary informants. Moreover, it is impossible to anticipate who is going to join or leave an IRC channel and who is going to “message” whom at any particular moment. One may suggest that we should ask every participant for informed consent as soon as he or she joins the channel, which is hardly practical, or periodically post an announcement to inform the participants about the recording. However, as noted by King (1996), the act of requesting informed consent, or even the compromise of merely informing participants about the recording, will be grossly disruptive to the group process and make the results of this research questionable.
It seems that one possible solution to this ethical dilemma is to record only who is sending one-to-one messages to whom, but not the actual content, without explicitly seeking informed consent from either the sender or the recipient. By not recording the actual content of private messages, we manage to protect the essence of private communication. Although the identities of individual participants and their acts of private messaging are recorded in the scripts, the potential of putting any participant at risk of harm as a consequence of this research is minimized. Once the scripts are compiled into quantitative data without human intervention, all of the related computer files should be destroyed and no permanent records should be kept. The generated data on occurrences and patterns of one-to-one communication in IRC will be sufficient for the purpose of this research, i.e., to statistically test for virtual community existence in IRC.
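A minimal sketch of this content-free recording might look like the following. The event tuples, nicknames, and function name are hypothetical; the only protocol fact relied on is that IRC channel names begin with "#" or "&", so any other target is treated as an individual recipient.

```python
from collections import Counter

def tally_private_pairs(events):
    """Count one-to-one messages per (sender, recipient) pair.

    Each event is a (sender, target, text) tuple. The text is never
    stored or inspected, so only the occurrence of a message survives.
    """
    counts = Counter()
    for sender, target, _text in events:
        if target.startswith(("#", "&")):
            continue  # a channel posting, not a one-to-one message
        counts[(sender, target)] += 1
    return counts

# Hypothetical events, for illustration only.
events = [
    ("ann", "bob", "text that is discarded"),
    ("ann", "#california", "a public posting"),
    ("ann", "bob", "more discarded text"),
    ("bob", "ann", "a reply"),
]
print(tally_private_pairs(events))
```

Only the resulting counts would need to be kept for analysis; the raw events, including the message text, can be discarded as soon as the tally is complete.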
However, one may argue that with the approach described above, even though the content of private messages is not recorded in scripting, the process of monitoring private communications without informed consent itself constitutes a violation of the participants’ perceived privacy. Our response to this argument is that the importance of complete and unobtrusive observation in this research outweighs the necessity of acquiring “informed consent”. Determination of whether a research approach is ethical should not be based on whether informed consent is sought from research participants, but rather on whether the researcher makes his or her best effort to protect the research participants from any harm, and on whether the participants may suffer from any potential damage as a result of the research. On such matters, however, there will continue to be debate and discussion.
Data Extraction and Message Coding
Once enough scripts are collected, a computer program can be employed to extract the necessary data and to compute mean co-appearance ratios for each participant based on different factors of combination. Although the bulk of data to be processed will be huge, a well-designed program running on a reasonably powerful machine should be able to handle it.
While IP string-nickname associations shall be extracted from the whole collection of scripts and mean co-appearance ratios may be computed on a channel-by-channel basis, analysis of interaction (i.e., message references and message targeting) does not have to include every line of script. Since such analysis has to be done manually, it would be terribly expensive and impractical to try to code every single piece of script. Instead, the analysis may be limited to a good number of segments randomly selected from the whole collection.
For every channel chosen for inclusion in the study, the continuum of scripts for the whole period of observation may have weak spots where the channel does not have much activity going on. The weak spots form natural divisions of the continuum and can be used to define segments. Once the continuum of scripts is divided into segments of manageable size, a good number of segments can be randomly selected for the channel. The resulting random samples specific to IRC channels can be either analyzed separately or combined into a larger pool of samples, depending on what specific research questions we ask.
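The segmentation and sampling step can be sketched as below, under stated assumptions: a weak spot is taken to be any silence longer than a chosen threshold, timestamps are plain numbers in seconds, and the threshold, function names, and sample values are all hypothetical. The sketch does not enforce any limit on segment size.

```python
import random

def split_at_gaps(timestamps, max_gap):
    """Split sorted timestamps into segments wherever silence exceeds max_gap."""
    segments, current = [], []
    for t in timestamps:
        if current and t - current[-1] > max_gap:
            segments.append(current)  # a weak spot closes the current segment
            current = []
        current.append(t)
    if current:
        segments.append(current)
    return segments

def sample_segments(segments, k, seed=None):
    """Randomly select up to k segments without replacement."""
    rng = random.Random(seed)
    return rng.sample(segments, min(k, len(segments)))

# Hypothetical message times in seconds, with two long silences.
stamps = [0, 5, 9, 400, 404, 410, 900, 903]
segments = split_at_gaps(stamps, max_gap=60)
print(segments)
print(sample_segments(segments, 2, seed=0))
```

Choosing the gap threshold is a judgment call: a small value yields many short segments, while a large value may leave segments too long to code by hand.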