The component plane analysis method, in conjunction with the SOM method, was used to investigate relevance within a group of health subjects. A transaction log from a public health portal was employed for the study. The subjects were extracted from a health subject directory in the portal, and the traversal activities among these subjects were analyzed. The focus theme ‘weight control’ was selected, and the subjects related to weight control were analyzed in the visual contexts of the subject component planes. The findings of this study can be used to optimize health subject directories and to understand users' medical information seeking behavior.
According to a report (Weight Control, 2008), 66 percent of adults in the U.S. are overweight. Obesity is one of the most important nutritional diseases in affluent countries, and it persists from childhood into adulthood. A healthy weight can help people maintain normal cholesterol, blood pressure, and blood sugar levels. It is believed that weight control decreases chronic disease risk, morbidity, and mortality. As a result, the weight control issue has been brought to the attention of the public. Applications of the component analysis method have been found in computer science and other fields (Denny, Williams, and Christen, 2007; Raivio, 2006), but they are rarely found in the library and information science literature. The objectives of this study are to use the component plane analysis approach to investigate weight control as a topic in a subject directory from a consumer health portal (HealthLink, 2008), to identify a group of subjects related to weight control, and to display the degree to which they are relevant to weight control in a visual component plane from the users' perspective. The findings of this study contribute to our understanding of health consumers' information seeking behavior on weight control, and can be used to optimize the content organization of weight control information and to enhance information retrieval.
2. Methods and Results
A component plane is closely related to the Self-Organizing Map (SOM) method. Every cell in the SOM display is associated with a weight vector. The weight vector plays a critical role both in labeling subject theme areas in the SOM display and in mapping the input objects onto the final SOM display. In our case, the number of attributes in the weight vector equals the number of subjects plus the topic. That is, each attribute in the weight vector represents a subject or the topic. In the component plane analysis for the weight control subject, the corresponding attribute is extracted from every weight vector and separated from the original SOM display to yield a weight control component plane whose size is the same as that of the SOM display. The component plane and the SOM display therefore share the same display structure. The difference between the derived component plane and the SOM display is that a cell in the component plane corresponds to a single attribute value, while a cell in the SOM display corresponds to a whole weight vector.
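The extraction step described above can be illustrated with a minimal sketch. It assumes the trained SOM is available as a nested list indexed by row and column, with one weight vector per cell; the function name `component_plane`, the grid size, and all numeric values are hypothetical and are not taken from the study.

```python
from typing import List

# Assumed layout: weights[row][col] is the weight vector of one SOM cell,
# with one attribute per subject. All numbers below are made up.
Grid = List[List[List[float]]]


def component_plane(weights: Grid, attr_index: int) -> List[List[float]]:
    """Return the plane formed by one attribute of every cell's weight vector."""
    return [[cell[attr_index] for cell in row] for row in weights]


# Hypothetical 2 x 3 SOM whose weight vectors have three attributes.
som_weights: Grid = [
    [[0.1, 0.9, 0.3], [0.2, 0.7, 0.4], [0.0, 0.2, 0.8]],
    [[0.5, 0.6, 0.1], [0.3, 0.1, 0.9], [0.4, 0.0, 0.6]],
]

# Attribute 1 stands in for the subject of interest (an assumption).
plane = component_plane(som_weights, attr_index=1)
```

The resulting plane has the same number of rows and columns as the SOM grid, but each cell now holds a single value rather than a vector.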
Since each cell in the component plane is associated with a single value, the cell value can be easily converted into a color such that a larger cell value corresponds to a lighter color and a smaller cell value corresponds to a darker color. After the color conversion is completed, the component plane can be partitioned by color. Because the component plane has the same display structure as the SOM space, the colored component plane can be merged with the SOM display onto which all the subjects are projected. As a result, the subject clusters produced in the SOM display can be analyzed in the context of the colored component plane.
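One possible way to carry out the color conversion is a linear min-max scaling to gray levels, sketched below. The scaling rule, the 0 to 255 range, and the name `to_gray_levels` are assumptions made for illustration; the paper does not state which color mapping was used.

```python
from typing import List


def to_gray_levels(plane: List[List[float]]) -> List[List[int]]:
    """Map cell values to gray levels: 0 is darkest, 255 is lightest."""
    values = [v for row in plane for v in row]
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # All cells are equal, so no contrast is possible; use mid-gray.
        return [[128 for _ in row] for row in plane]
    return [[round(255 * (v - lo) / span) for v in row] for row in plane]


# Hypothetical component plane values (not from the study).
example_plane = [
    [0.0, 0.2],
    [1.0, 0.6],
]
gray = to_gray_levels(example_plane)
```

Under this mapping the cell with the largest value receives the lightest gray and the cell with the smallest value the darkest, which matches the convention described above.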
The greater a cell value (or the lighter the cell color) in the component plane for the weight control subject, the more traversal activity there is from other subjects to the weight control subject. A subject cluster that is projected onto a cell with a high value (light color) tends to have more traversal activity to the specified subject than a subject cluster that is projected onto a cell with a low value (dark color). This means that the subjects located in a cell with a higher value in the component plane are more relevant to the specified subject than the subjects situated in a cell with a lower value. Following this principle, the subject clusters in the component plane can be ranked in terms of relevance to the specified subject based on the colors of the cells where they are located.
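The ranking principle can be sketched as a simple sort of subjects by the value of the cell they are projected onto. The subject names, cell positions, and plane values below are placeholders chosen for illustration and do not reproduce the study's results; `rank_subjects` is a hypothetical helper.

```python
from typing import Dict, List, Tuple


def rank_subjects(
    plane: List[List[float]],
    positions: Dict[str, Tuple[int, int]],
) -> List[Tuple[str, float]]:
    """Rank subjects by the component plane value of their cell, highest first."""
    scored = [(name, plane[r][c]) for name, (r, c) in positions.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical component plane for the specified subject.
plane = [
    [0.9, 0.4],
    [0.1, 0.6],
]

# Hypothetical (row, column) cell onto which each subject is projected.
positions = {
    "subject_a": (1, 0),
    "subject_b": (0, 0),
    "subject_c": (1, 1),
}

ranking = rank_subjects(plane, positions)
```

Subjects that fall in the same cell would receive the same value and therefore the same relevance level under this scheme.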
Both the component plane analysis and SOM analysis methods have been used for object clustering analysis, but they work in different ways. In the SOM method, objects that are projected onto the same cell or adjacent cells are regarded as relevant, while in the component plane analysis method, objects that are mapped onto cells with light colors (high cell values) are regarded as relevant. This implies that the SOM method cannot identify relevant objects in its visual space if the objects are not located in the same cell or neighboring cells. The primary advantages of the component plane analysis are that it allows users to focus on one specified attribute and its related objects, and that it can reveal information related to the specified attribute that the SOM analysis cannot (for instance, the subjects diabetes (11) and wellness lifestyle (46) in this study).
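The contrast between the two relevance criteria can be made concrete with a small sketch. It assumes a rectangular grid in which cells touching by an edge or corner count as adjacent, and a fixed cutoff on cell values; both are assumptions, since the paper does not specify the grid topology or any threshold. The function names and all values are hypothetical.

```python
from typing import List, Tuple

Cell = Tuple[int, int]


def som_neighbors(a: Cell, b: Cell) -> bool:
    """SOM criterion: same cell or adjacent cell on a rectangular grid (assumed)."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1])) <= 1


def plane_relevant(plane: List[List[float]], cell: Cell, threshold: float) -> bool:
    """Component plane criterion: the cell value reaches an assumed cutoff."""
    return plane[cell[0]][cell[1]] >= threshold


# Hypothetical component plane for the specified subject.
plane = [
    [0.9, 0.2, 0.1],
    [0.3, 0.1, 0.2],
    [0.1, 0.2, 0.8],
]

target_cell = (0, 0)   # cell holding the specified subject (assumed)
distant_cell = (2, 2)  # cell holding another subject, far away on the map

near_on_map = som_neighbors(target_cell, distant_cell)
relevant_in_plane = plane_relevant(plane, distant_cell, threshold=0.5)
```

In this made-up configuration the distant cell is not a neighbor of the target cell on the map, yet its component plane value is high, which is the kind of relationship the neighborhood criterion alone would miss.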
The investigated HealthLink Web log covers one year of data, from January 1, 2006 to December 31, 2006. The health subject directory includes the topic (the root of the directory), 47 subjects (the directory branches), and 2099 associated articles (the directory leaves). The final component plane figure and the subject relevance analysis results are shown in Fig. 1 and Table 1, respectively.