Four epistemological views of information organization behavior on personal computers of information workers
Hjørland proposes a typology of four epistemological views in analyzing professional knowledge organization systems. In this study, we identified all four views as components of personal information organization in the current hierarchical folder structures on personal computers. The typology enabled us to synthesize the varieties and commonalities within the large number of particular information organizational practices observed both within and between individuals over time. The study demonstrated that the typology is a promising analytic descriptive framework and can help illuminate problems in current folder systems.
How people create folders is an important question in understanding how people organize information on personal computers. In analyzing the data collected in an exploratory study, we tried to understand how and when the participants created folders and built up their folder structures on personal computers. We found that we lacked a framework to integrate and make sense of the varieties and commonalities within the large number of particular folder creation practices observed both within and between individuals over time. The lack of such a framework might be one of the reasons that few studies have investigated this particular issue. Examination of the data at the epistemological level revealed that the elements of pragmatism, rationalism, empiricism, and historicism, which are the four views of Hjørland's typology and usually exist in general knowledge organization systems, also exist in the participants' personal information organization behavior on their personal computers.
After briefly introducing Hjørland's typology of pragmatism, rationalism, empiricism, and historicism, together with related studies on personal information organization behavior and current hierarchical folder systems, the following sections describe the four elements in participants' folder structures as well as the instances that do not conform to the typology. The problems in current folder systems illuminated by the typology are described and discussed after that.
Hjørland (2009) proposes applying a typology of four epistemological views as an analytical tool for examining knowledge organization systems. He defines the four views as follows:
Pragmatism is the ideal of basing knowledge on the analysis of goals, purposes, values, and consequences (p. 1526).
Rationalism is the ideal of basing knowledge on logics, principles, rules, and idealized models (p. 1524).
Empiricism is the ideal of basing knowledge on observations (and on inductions from a pool of observations) (p. 1523).
Historicism is the ideal of basing research on social contexts, on historical developments, and on the explication of researchers' pre-understanding (p. 1525).
These views are “idealizations,” as Hjørland (2003) notes in another paper, and they usually do not exist in pure forms. Instead, the four views are connected. For example, “any kind of pragmatism is limited by constraints set by the real world through empirical evidence” (Hjørland, 2003, p. 107), and pragmatism is “closely related to historicism by understanding that observations are contextual” (Hjørland, 2009, p. 1526), although pragmatism places more emphasis on purpose.
Several studies have used this typology in analyzing knowledge organization systems. For example, Dousa (2008) applies the typology to Julius Otto Kaiser's Systematic Indexing and finds that the epistemological positions in Kaiser's theory are hybrid in nature. It is therefore worth investigating whether this typology can also help make sense of grassroots information organization behavior on personal computers.
People organize information not just for ease of finding it. It could also be for reminding, understanding, or other reasons such as creating a legacy, sharing resources, confronting fears and anxieties, and identity construction (Kaye et al., 2006). Ravasio et al. (2004) found that people invest effort in organizing the hierarchical file system structure in order to engrave “the information's content and context into the system,” and provide “an overview in a single glance.” Jones further points out that people organize as a part of making sense of the information and possible use (Jones & Teevan, 2007).
On the other hand, many studies have found that people have cognitive difficulty in organizing and naming information, be it paper files, electronic files, bookmarks, or emails (Abrams, 1997; Boardman & Sasse, 2004; Malone, 1983; McKenzie & Cockburn, 2001). Specifically, Barreau and Nardi (1995) found that people especially have problems organizing ephemeral information, which is information needed for only a short time.
It has been found that many people do not spend much time organizing their personal information (Whittaker & Hirschberg, 2001). Similar to Malone's (1983) neat “filers” and messy “pilers,” users are categorized into “cleaners” and “keepers” (Gwizdka, 2004), or “pro-organizing” and “organizing neutral” (Boardman & Sasse, 2004). Whittaker and Hirschberg (2001), however, argue that “the distinction between filers and pilers [is] one of degree”: “all people file some information and pile some other information.”
Instead of fitting people into categories, Bondarenko and Janssen (2005) identified procedurally pre-structured “administrative” activities and unstructured “research” activities, which have a large impact on document management. For example, an “administrative” activity corresponds to fast document flow, while a “research” activity corresponds to slow document flow. They also noted that information workers have both types of activities in their work.
In a study investigating how participants describe and organize their documents in offices, Kwasnik (1991) found that form, use, topic, location, circumstance, and time are among the most important classificatory factors. Barreau (1995) observed the same pattern with digital documents. In a study investigating how people tell stories about their digital documents, Gonçalves and Jorge (2004) found that the dimensions most commonly used were time, place, co-author, purpose, subject, other documents, format, exchanges, tasks, storage, and contents.
For folders on personal computers, Khoo et al. (2007) found that the most common types of folder names were document type, organizational function/structure, and miscellaneous/temporary. A task- or project-based organization method has been recognized as an important need in information organization (Kaptelinin, 2003).
Henderson's (2009) study is one of the few that investigates the folder structure development behavior on personal computers. Henderson categorizes folder creation behavior according to temporal features: “in advance,” “just in time,” and “cleanup” (p. 78). The folder name analysis is based on categorization of folder names captured by a file system snapshot program. The categories of folder names include Genre, Task, Topic, and Time.
The current strict hierarchical folder structure on computers creates problems and inconveniences for classification (Kwasnik, 1991); for example, users cannot place a single item in multiple folders. However, recent studies have also identified advantages of current folder structures that were not previously emphasized. For example, folders can provide an “effective way to manage workflow” and clutter (Civan et al., 2008), and a stable folder structure provides a familiar environment for personal information organization (Boardman & Sasse, 2004).
Overall, prior studies demonstrate the complexity and subtlety of information organization behavior on personal computers, and our understanding of this behavior is still limited.
Two rounds of in-depth semi-structured interviews were conducted with six PhD students and six administrative staff in an academic environment. The interviews took place in front of the participants' computers, with a three-month interval between the two rounds. During each interview, the participants gave the investigator a guided tour of their main information organization systems. Although a set of broad, open-ended questions was used to help guide the conversations, the actual interviews were directed by what was observed and what the participants talked about in their primary information organization systems, which included mostly file folder systems and some email folder systems. The participants were asked to talk about how they created and organized particular folders and files, as well as difficulties they might have in re-accessing them. At the end of each interview, three or four folders were selected and disk scan commands (for Windows, Mac, and UNIX operating systems) were run to capture their file folder structures. At the second interview, several files or emails were randomly selected on each participant's computer, based on the disk scan data from the first interview, and the participant was asked to re-find them. Email folders were recorded with screenshots. During the three months between interviews, participants were also asked to report via email any experiences of information re-access difficulty.
These multiple instruments (interview, re-finding experiments, email, disk scan, screenshot) were designed to obtain rich data. The second interview followed a similar procedure to the first interview, although it included the re-access experiment and focused more on the new and changed parts of the folder structures. Two rounds of interviews allowed evolving issues to be captured and explained, complementing the data collected in the first interview. Multiple interviews are especially valuable in studies of personal information organization behavior because such behavior is often carried out without much explicit thought. The three-month interval gave interviewees time to notice and report via email their experiences of information re-access difficulty, which were then discussed during the second interviews.
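The disk scan commands themselves are not specified here. As a minimal sketch of how a folder structure could be captured in an operating-system-independent way, the following Python code walks a directory tree and records each folder's relative path, depth, and number of files. The function name `scan_folder_structure` and the example tree (built in a temporary directory) are hypothetical and serve only as an illustration; this is not the procedure used in the study.

```python
import os
import tempfile


def scan_folder_structure(root):
    """Return one (relative_path, depth, file_count) record per folder under root."""
    records = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sort in place so the traversal order is deterministic
        rel = os.path.relpath(dirpath, root)
        depth = 0 if rel == "." else rel.count(os.sep) + 1
        records.append((rel, depth, len(filenames)))
    return records


# Hypothetical example tree, created in a temporary directory for illustration.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "orders", "furniture"))
    os.makedirs(os.path.join(tmp, "teaching"))
    open(os.path.join(tmp, "orders", "furniture", "chairs.txt"), "w").close()
    snapshot = scan_folder_structure(tmp)

# Print an indented outline of the captured structure.
for rel, depth, n_files in snapshot:
    print("  " * depth + f"{rel} ({n_files} files)")
```

Two snapshots taken at different times with such a function could be compared to identify new and changed parts of a folder structure.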
Similarly, using the two groups of participants provided rich and varied data. Since “activity type has a large impact on document management” (Bondarenko & Janssen, 2005), it is important to note that the two groups are not exclusively doing “research” and “administrative” activities. Rather, members of each group do both kinds, but in varying degrees, giving an opportunity for a richer understanding of a continuum of multiple practices. Figure 1 shows the operating systems participants were using and the length of time they had been at this institution at the time of the first interview.
Each interview lasted about 1 to 2 hours. The interviews were audio recorded and transcribed. The total audio length of the 24 interviews is just over 34 hours. The transcripts were analyzed and coded in QSR NVivo 8.
With a small sample size, we are focusing on depth of understanding rather than making claims for breadth of coverage. The initial coding yielded 242 free nodes. Consequently we had a substantial number of information organizational activities to make sense of. We found that Hjørland's typology enabled us to gain a sense of recurrent themes in these individual and idiosyncratic behaviors.
In identifying elements of Hjørland's four views in information organization behavior on personal computers, we delineate the key features of each view (see Figure 2). These features are based on the definitions given in the Introduction section and are adapted to the context of information organization on personal computers. They serve as an operational tool for data analysis rather than as formal definitions.
For the pragmatist view, since the basic function of folders is to separate and group, to separate/group will not be seen as a pragmatic element (purpose) in this paper.
Since folder structures are personal forms of organization intended to support an individual's tasks and goals, they are pragmatic in nature. Thus, it is not surprising that there are many examples observed in the data consistent with the pragmatist view.
Most of the top-level folders under the home directories or drives are for a particular purpose, e.g. project or course folders for PhD student participants, and job tasks such as “employee reimbursement” and “awards,” as well as projects such as “TEI workshop” for administrative participants. A participant created a top-level folder of important files and folders for backup purpose: “this is the directory I have to backup no matter what.”
The pragmatic element can often be seen in how participants described what a particular folder is. When asked, they often used phrases such as “project or type of work,” “those are all things that I do in support of the …,” “I have a folder just for alumni news,” “all these are about the visit,” “anything involved in CCB (name of a center),” “save here for this purpose, so next time if I go for that purpose, that information will be there,” “for that (field exam), I have a directory,” “(a folder) … was where I was working with … as RA,” etc.
The pragmatic element is also seen in how participants re-find information items. When they were asked to re-find a file during re-access experiments, their first responses included “that would be related to this class,” “sounds likes something I did in …,” “that was for …,” etc. One participant explained that looking for items according to project or task was one way he found things: “it's usually an easy little memory device for me, oh I did that paper for this class, it's in that class folder.”
The purpose could come from re-accessing needs. For example, an administrative participant created a subfolder “furniture” under folder “orders” when she found that she started to have to re-find orders of furniture. The pragmatic element can also be indicated in the lack of organization of some folders. For example, a participant did not re-organize a folder which was deemed badly organized, because she did not need to access that folder very often.
In addition to the pragmatic element in folder creation, the study also observed cases in which participants had no particular purpose for an information item, and such items were often associated with organization and retrieval difficulties. This is more of a problem for the PhD student group, probably because they have more of this type of information. Examples include a note taken during a talk that has nothing to do with the student's research projects, and an article that a colleague sent and recommended reading. Two students talked about the difficulties in organizing and re-finding this type of file. As a solution, several participants in both groups used some type of catch-all container, e.g., miscellaneous folders. A PhD student uses people's names as such a container for files where the only common clue is that they are by or about the person. Another PhD student created a folder “talk” in order to have a place to put notes on a talk, even though there is only one file under it. An administrative participant did a similar thing with a file that he “didn't know where else to put it, so,” he created a folder for it. Some participants simply left the files scattered in different locations depending on their judgments at the moment, and possibly a bottom-up organization later on would play a role in getting them organized (see below subsection “Empiricist Element”).
The participants in both groups have more or less general folders at the top level, e.g., “academic,” “school,” “teaching,” “corporate interactions.” But it was more salient in the PhD student group probably because the administrative participants' jobs are more specifically task-oriented. For PhD student participants, as one of them said, these top level folders correspond to the “big chunks” in their information spaces: “…based on my activities. I have my own study, my dissertation, my teaching assistantship. In form of work, I have these two big chunks.” Another PhD student with 30 top level folders used several quick links (on a Mac) for current main parts, while another student with 10 top level folders said “I mostly think about it just in terms of file folders.” An administrative participant created a top level folder “teaching” and then a course name subfolder even though that is the only subfolder under “teaching” and she did not expect to teach other courses. Another administrative participant decided to organize emails according to people's names at the top level and then created project subfolders under a particular person's folder, because that is how she thinks and recalls these emails.
The rationalist element can also exist in subfolders. For example, a PhD student chose to organize his readings “based on how people talk about things in the literature.”
As the above examples illustrate, the rationalist element represents how people envision (or want to envision) their information spaces and make sense of them.
Empiricist approaches are based on bottom-up analysis. Although current folder systems do not provide a good mechanism for bottom-up organization, it was observed in this study that many participants accumulated files for a while and then created a subfolder when they realized that it was needed. This action reflects not only similarities between items, as noted above in the definition of empiricism, but also the number of items, their importance, access frequency, and so on. For example, an administrative participant said:
“it soon became clear that it's going to require a lot of communications. So after the first couple of dozen messages just stuck in here, I decided it deserves a subfolder.”
Similarly, a PhD student said:
“At some point, I'll probably have a bunch of stuff under this directory, then I'll create a directory for that and move a bunch of things to that directory.”
Another administrative participant would create a subfolder “when we are committed” that “we're going to do this.” It was also observed that a participant moved a subfolder to the top level when she found she used it frequently.
Because there is no systematic place to “wait,” such files sometimes were scattered in different folders and may not get collected when a subfolder was created later. For example, an administrative participant created a folder for two files about internships, without remembering that there was another file about internship she had received earlier but had left in the parent folder.
In many participants' home folders/drives (which include top level folders and files) and subfolders, there existed many individual files, potentially waiting for this bottom-up organization. They were mixed with other files that were intentionally left there, e.g., important ones, templates, more frequently used ones, and the ones put there for reminding purposes. This study observed that two students grouped certain files under their home directories between the two interviews. For example, a student created a “misc” folder at the top level and moved several files that were originally under the home folder into it.
The historicist view emphasizes context, situation and pre-understanding. It exists implicitly in many folders. A prominent phenomenon demonstrating this element is that folder names do not always carry a literal meaning.
For example, an administrative participant had “TEI WORKSHOP” and “publications” as two folders at the top level among many others. But the contents under them are all invoice vouchers with different clients. They are separate from another top level folder “invoice vouchers” because they have their own account and the means of payment processing is different from the others. Because a major task the participant was responsible for was various payments and money transactions, there is a “pre-understanding” behind these folders that they are invoice vouchers, even though it was not labeled in the folder or file names.
A PhD student participant had a folder “atlas” (a particular software name). But the content under the folder was her data files instead of the software. Another participant had a folder named with a person's name, but it was a folder for the project that she had been working on with the person, not a folder about that person or the person as the author. A different participant had an “articles” folder for articles that she might use for her dissertation.
This implicit “pre-understanding” may make perfect sense to the user, as a participant declared, “because that's something I used, so I know what it means,” but could cause problems when other people try to understand and interpret the folders.
Situation factors can change organization behavior. For example, an administrative participant had a big email folder containing 822 emails that were all obtained in a short period. No subfolders were created because “it was a project that was going so fast and furious …and I almost didn't have time really to think too hard about how best to organize.” In a casual talk after the data collection ended, a PhD student participant described a very similar case where he ended up with a folder containing many files and without subfolders for a fast growing project.
In addition, a folder originally created for a project could be extended to include a follow-up project. For example, a participant has a folder “defense” for files used for her dissertation defense, but it also includes files for depositing the dissertation after the defense. Another participant developed her course project into a research project, which extended the course folder to a project folder.
All these examples indicate that folder names have to be interpreted in the light of the user's particular understanding, context, and situation, and cannot be interpreted literally or strictly.
It is also worth noting that folder creation and structure are not fully controlled by users. They can be determined by system and software. For example, several participants had music files saved by iTunes at a particular “default” place.
Although a few organization behaviors observed in this study can be characterized by a single element as described above, many other behaviors seem to combine more than one element.
For example, a PhD participant had a folder with the name of the school. Later as he realized that he was going to graduate, he decided to split this school folder into two, creating another folder named “academic,” so that he could bring everything under the “academic” folder with him after he graduated and archive the other school folder. This example shows that the rationalist element was connected with a particular situation (historicist element) at a certain stage in the program, and was influenced by a specific purpose (pragmatist element) of “bringing with him after graduation.”
Another example is a special phenomenon of folder structure observed in the study. A participant had two folders at the top level as illustrated in Figure 3.
The second folder is for the meetings that the Dean had asked to arrange. Thus the top level “Meetings” folder is for the other meetings. These general-exceptional folder structures were observed in several participants' folders in both groups, and the reasons behind them identified in the data include priority, anticipated access frequency, and anticipated volume of items. It is worth noting that since this perception is hidden and mainly makes sense to the user, it could cause confusion to others. For example, a participant who was working on a shared drive for a project was not aware of an existing “meeting” subfolder under a top level folder “advisory committee” and thus put the files about the advisory committee meeting under a top level folder “meeting and agenda,” which caused confusion for the group members. Both pragmatist and rationalist elements can be seen in this type of general-exceptional folder structure.
Accumulated over Time
An element that seems to lie outside the typology and is unique to personal information organization systems is the time dimension. The folder structure accumulates over time and includes instances of the four elements accumulated at different times. For example, every participant in this study had folders tagged with “new,” “old,” “archive,” or “work in progress,” which indicate the time dimension in folder structures. Although these folders were mixed with current ones, participants used these specific folder naming mechanisms to separate them out.
Problems with Current Systems
Looking at participant behavior in this framework demonstrates problems and limitations due to functionality and interface constraints of current file organization systems.
Pragmatic view over time: when purpose changes
This study observed the pragmatic element changing over time, especially for the PhD student group. For example, a PhD student found that something he worked on for a class became a research area that he wanted to work on. But it was still under the class folder. The participant said: “I keep meaning to change it.” A similar situation happened to another PhD student and he was planning to copy the related files out to separate them.
Sometimes a change in the pragmatic element involves more than copying things out. It may lead to a need to retrieve items from a perspective different from that of the top-level folders. For example, two PhD students encountered a similar situation where they needed to access all the readings scattered across different courses and projects. This is difficult in current systems, and both participants ended up re-downloading many files. This may be seen as a need for a mechanism to support multiple classifications, but it is more a need for multiple access points. Using folder names and even file names as tags at retrieval time may help alleviate this problem in current folder systems.
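The idea of using folder and file names as tags at retrieval time can be illustrated with a minimal Python sketch. It treats every folder name on a file's path, plus the file name without its extension, as a tag, and builds an inverted index from tags to paths. The function names (`path_tags`, `build_tag_index`, `find`) and the example paths are hypothetical; this is one possible reading of the suggestion, not a mechanism evaluated in the study.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def path_tags(path):
    """Treat each folder name and the file stem as a lowercase tag."""
    p = PurePosixPath(path)
    parts = list(p.parts[:-1]) + [p.stem]
    return {part.lower() for part in parts if part != "/"}


def build_tag_index(paths):
    """Map every tag to the set of paths that carry it."""
    index = defaultdict(set)
    for path in paths:
        for tag in path_tags(path):
            index[tag].add(path)
    return index


def find(index, *tags):
    """Return the sorted paths that carry all of the given tags."""
    sets = [index.get(tag.lower(), set()) for tag in tags]
    return sorted(set.intersection(*sets)) if sets else []


# Hypothetical paths: readings scattered across a course and a project.
paths = [
    "courses/course_a/readings/paper1.pdf",
    "projects/project_b/readings/paper2.pdf",
    "projects/project_b/notes/meeting.txt",
]
index = build_tag_index(paths)
all_readings = find(index, "readings")
project_readings = find(index, "readings", "project_b")
```

Here `all_readings` gathers the readings from both the course and the project without moving or copying any file, which corresponds to an additional access point rather than a second classification.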
Rationalist view: the way we see things
The inability to classify an item into two folders is a major weakness in current systems. For example, a PhD student wanted to be able to look at his files from both a research perspective and a project perspective: “but those two things kind of overlap, and I haven't quite figured out what's the best way to overlap files that are related to a project but also related to my research agenda.” But this problem cannot be solved easily for editable files under folder systems, because it involves version tracking, which is missing in current systems.
Another limitation of current folder systems is that the strict top-down display hides information. As a complementary mechanism to a tree view, a “map” view should be provided to give an overview of the folders and access that is not limited by step-by-step navigation.
Empiricist view: mechanism of waiting for bottom-up
A better mechanism is needed in current systems to allow postponing creating folders, which means that items should be able to be grouped (with tags of document/event attributes) without being filed to a folder before a decision is made. Currently these files are scattered in folders mixed with others. As a PhD student commented on one of his folders: “it just confuses everything in here, because these are archive or reference directories, these are actual content directories, and this is a specific kind of content directory that I used a lot.”
Grouping can represent weak association (e.g., general genre, content, usage, etc.). For this purpose, some level of spatial organization can convey the weak and sometimes vague association between items or groups of items.
Historicist view over time: derivative relationship tracking and file naming
The study found that sometimes the folder or file names used to encode the context may not make sense later even though the participant tried “to be descriptive when I name the files.” As a PhD participant put it:
I tried to name things systematically, like with the date I took…like a few months later, you look at the name, it doesn't mean anything. It was so obvious when you did it, but it's really not.
Another PhD student also made similar comments.
A related problem is the file naming mechanism. As an important context encoding method, file names sometimes are controlled by other factors. For example, a participant had a file he worked on with another person who named it in a different way than this participant did; another participant changed a file she worked on to a different name when she sent it to someone else, “in order to make it more communicative for her so she knows this is mine.”
A function missing in current systems is version and derivative relationship tracking (Zhang & Twidale, 2009). Version derivation can be more complicated than a linear relationship. Context information, such as who sent a file to me, where I got or downloaded it, whom I sent it to, and what I did with it, is very useful for people retrieving and understanding their information items. Many of these details can be captured automatically by computers.
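To make the point about non-linear derivation concrete, the following Python sketch represents versions as nodes in a directed graph in which a file may be derived from more than one earlier file, and each node carries a small dictionary of context information. The class `FileVersion`, the function `ancestors`, the context keys, and the example file names are all hypothetical; the sketch does not reproduce any mechanism described by Zhang and Twidale (2009).

```python
from dataclasses import dataclass, field


@dataclass
class FileVersion:
    name: str
    context: dict = field(default_factory=dict)       # e.g. sender, source, action
    derived_from: list = field(default_factory=list)  # names of parent versions


def ancestors(versions, name):
    """Return every version that `name` was derived from, directly or indirectly."""
    seen = set()
    stack = list(versions[name].derived_from)
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(versions[parent].derived_from)
    return seen


# Hypothetical history: a draft is edited locally and, in parallel, by a
# colleague; the two branches are later merged into one file.
versions = {
    v.name: v
    for v in [
        FileVersion("draft_v1", {"action": "created"}),
        FileVersion("draft_v1_colleague", {"received_from": "colleague"}, ["draft_v1"]),
        FileVersion("draft_v2", {"action": "edited"}, ["draft_v1"]),
        FileVersion("merged", {"action": "merged"}, ["draft_v2", "draft_v1_colleague"]),
    ]
}
merged_history = ancestors(versions, "merged")
```

Because `merged` has two parents, its history is a graph rather than a single chain, which a simple sequence of version numbers in file names cannot express.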
As the first effort to explore the applicability of Hjørland's typology in helping make sense of personal information organization behavior on computers, this study identified all four views as components in the participants' folder structures, which suggests that personal information organization behaviors share similar epistemological ground with professional information organization systems. The study demonstrated that the typology is a promising analytic descriptive framework and can help illuminate problems in current folder systems.
It is also important to note that further studies are necessary to explore how the typology can be expanded with extra components and the interactions between components to better tackle the complexity and uniqueness of information organization behavior on personal computers.