Commentary on Vermeulen H, Ubbink DT, Schreuder SM and Lubbers MJ (2007) Inter- and intra-observer (dis)agreement among nurses and doctors to classify colour and exudation of open surgical wounds according to the Red–Yellow–Black scheme. Journal of Clinical Nursing 16, 1270–1277 and Sugama J, Matsui Y, Sanada H, Konya C, Okuwa M and Kitagawa A (2007) A study of the efficiency and convenience of an advanced portable wound measurement system (VISITRAK™). Journal of Clinical Nursing 16, 1265–1269


This commentary focuses on two papers reporting evaluations of assessment tools used with chronic wounds. The main outcomes in this type of research are reliability and validity. Reliability is the degree to which repeated measurements of a stable characteristic produce similar results. This can involve repeated measurements performed by the same individual (intra-rater reliability) or by different individuals (inter-rater reliability) (Fletcher et al. 1996). Validity or accuracy can be defined as the extent to which the data measure what they are intended to measure. Determination of validity is usually accomplished by comparing the measurement against a reference standard. In fields where there is no accepted reference standard, assessment may focus on the level of agreement between competing methods (Bland & Altman 1986). Other relevant outcomes include feasibility (the practicality of use of an assessment tool) (Brazier et al. 1999), and the ability of data generated by the instrument to predict an eventual, clinically meaningful outcome, such as complete healing of the wound (Kantor & Margolis 2000).

Vermeulen et al. (2007) describe a cross-sectional study of the Red–Yellow–Black (RYB) scheme to assess colour and exudate in surgical wounds healing by secondary intention. Assessment was based on examination of 18 digital photographs of wounds, shown as a slide presentation. The inter-rater reliability of 63 nurses and 79 doctors was assessed within the two professional groups. The ratings of the two groups were compared with one another indirectly and also with those from an expert panel (considered as the reference standard). In addition, the intra-rater reliability of 14 nurses and 13 doctors, all of whom rated the wound photographs twice, was determined.

Inter-rater reliability among nurses and doctors was not significantly different, both groups achieving ‘good’ for assessment of wound colour, and ‘moderate’ for assessment of exudate, using recommended classifications of the Kappa statistic (Landis & Koch 1977). Similar agreement was seen when both clinician groups were compared with the expert panel. In terms of intra-rater reliability, both nurses and doctors scored ‘good’ for wound colour, whereas nurses scored ‘moderate’ and doctors ‘good’ for amount of exudate.
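To make the Kappa statistic concrete, the sketch below computes Cohen's kappa for two raters who classify the same wounds as red, yellow or black. It is an illustration only: the ratings are invented, the function name `cohen_kappa` is hypothetical, and the two-rater form is an assumption chosen for simplicity, since the multi-rater calculation used by Vermeulen et al. is not described in this commentary.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who each classified the same items.

    kappa = (observed agreement - chance agreement) / (1 - chance agreement)
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must rate the same non-empty set of items")

    n = len(ratings_a)

    # Proportion of items on which the two raters chose the same category.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Agreement expected by chance, from each rater's marginal category counts.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")

    return (observed - expected) / (1.0 - expected)


# Invented ratings of ten wound photographs (not data from the study).
rater_1 = ["red", "red", "yellow", "black", "yellow",
           "red", "black", "yellow", "red", "black"]
rater_2 = ["red", "yellow", "yellow", "black", "yellow",
           "red", "black", "red", "red", "black"]

kappa = cohen_kappa(rater_1, rater_2)
print(f"kappa = {kappa:.2f}")
```

In this invented example the raters agree on 8 of 10 photographs (observed agreement 0.80), but 0.34 of that agreement would be expected by chance alone, so kappa is lower than the raw percentage agreement suggests.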

It is noteworthy that the agreement between nurses and doctors was not assessed directly, but indirectly using a Mann–Whitney U-test. This type of indirect comparison will only allow consideration of a difference in the level of agreement but will not reveal the aspects of classification where the two groups may have disagreed. This information seems pertinent in light of the authors’ concluding comments about the importance of uniformity of practice among professionals locally.

The authors explored a possible relationship between agreement with the reference standard and observer expertise, using age as a proxy for the latter. No statistically significant association was observed for either nurses or doctors. However, if mature students were included in the sample, an increase in age may not necessarily equate to an increase in experience. A more meaningful analysis could involve a correlation between the number of years of clinical experience and agreement with the reference standard.

It is unclear how the amount of exudate was assessed from the wound photographs. The authors reported that this was more difficult to agree upon than wound colour. This may be because exudate is difficult to assess from photographs alone, and perhaps really requires direct examination of the wound. Previous studies have collected and quantified exudate by weighing dressings, applying topical negative pressure (Dealey et al. 2006) and aspirating wound fluids from underneath a transparent occlusive dressing (James et al. 2000).

The authors conclude that the RYB scheme appears to be reliable in daily clinical practice to classify wounds and guide treatment decisions, and they mention an ongoing study assessing agreement of such determinations. Clinical guidelines for the assessment of pressure ulcers recommend the inclusion of wound colour and exudate within a wider range of wound characteristics (including dimensions, location, grade, pain, odour, infection) (Royal College of Nursing 2005). Although the authors studied a different wound type, it seems reasonable to base clinical examination on a range of patient and wound characteristics as opposed to just one or two elements for which validity has not yet been established.

The second study focused on the measurement of surface area of pressure ulcers using Visitrak (a portable digital wound measurement device) (Sugama et al. 2007). A range of outcomes was considered, including intra-rater and inter-rater reliability (four nurses observing 10 wounds), concurrent validity (involving a comparison with digital planimetry as the reference standard using observations of 30 wounds) and an aspect of feasibility (comparison of the times taken for measurement for the two methods).

Intra-rater and inter-rater reliability were determined using the intra-class correlation coefficient and showed high correlations. Validity was assessed by generating the correlation coefficient between Visitrak and digital planimetry and showed a positive correlation between the two methods. The time taken to conduct measurements was statistically significantly shorter for Visitrak (median 54 vs. 126 seconds).

Measurement of wound size is of importance in both research and clinical practice. Many trials use change in wound surface area as a primary outcome if funding does not permit follow-up to complete healing for all recruited patients. Even well-resourced trials may assess change in size as a secondary outcome. Wound size is also important in clinical practice for monitoring purposes and may have a role in predicting eventual complete healing for some wound types (Kantor & Margolis 2000).

This study has highlighted some of the difficult aspects of evaluating wound measurement instruments, namely, the potential variation between observers in terms of defining the wound perimeter and the lack of consensus concerning the reference standard. This study used digital planimetry, and this has also been proposed as the reference standard by other researchers (Öien et al. 2002).

One aspect of evaluation that could have been handled differently is the method of analysis for agreement between Visitrak and digital planimetry. The authors generated the correlation between the two methods, which demonstrates the strength of the relationship between the two sets of measurements but does not summarise the level of agreement. A preferable approach is the limits of agreement method proposed by Bland and Altman (1986). This entails plotting the observed differences between the methods against the mean value of the two methods for each measurement. This allows examination of the possible relationship between the measurement error and the true value (Bland & Altman 1986). In light of this, the authors’ conclusions in relation to the concurrent validity of Visitrak may not be justified. A similar point has been raised in an earlier commentary (Smith & Truscott 2006).
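The distinction between correlation and agreement can be illustrated with a minimal sketch of the limits of agreement calculation. The figures are invented and are not data from Sugama et al.; the function names are hypothetical, and the 1.96 multiplier assumes that the differences between methods are approximately normally distributed. In this example one method reads consistently 20% higher than the other, so the correlation is perfect even though the two methods disagree on every wound.

```python
from math import sqrt
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def limits_of_agreement(method_a, method_b):
    """Return (bias, lower limit, upper limit) for paired measurements.

    bias   = mean of the differences (method_a - method_b)
    limits = bias +/- 1.96 * sample standard deviation of the differences
    """
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need at least two paired measurements")
    differences = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(differences)
    spread = 1.96 * stdev(differences)
    return bias, bias - spread, bias + spread


def bland_altman_points(method_a, method_b):
    """(mean of the two methods, difference) pairs, ready to be plotted."""
    return [((a + b) / 2, a - b) for a, b in zip(method_a, method_b)]


# Invented wound areas in cm^2 (not data from the study).
device = [2.0, 4.0, 6.0, 8.0, 10.0]
reference = [1.2 * area for area in device]  # reads 20% higher throughout

r = pearson_r(device, reference)
bias, lower, upper = limits_of_agreement(device, reference)
points = bland_altman_points(device, reference)

print(f"correlation r = {r:.3f}")
print(f"bias = {bias:.2f} cm^2, limits of agreement = {lower:.2f} to {upper:.2f} cm^2")
```

Plotting the pairs returned by `bland_altman_points` would also show the difference growing with wound size in this example, which is the kind of relationship between measurement error and true value that a correlation coefficient alone cannot reveal.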

Research methods for evaluating the performance of clinical assessment methods are at a much earlier stage of evolution than, for example, clinical trials, where there are now clear guidelines for principles, practice and reporting (Pocock 1983, Begg et al. 1996). These two wound management papers make a useful contribution to a challenging area of research in which investigators continue to develop methods and expertise in terms of evaluation, and clinicians continue to explore the best ways of using the information derived from such assessments.