Abstract

  1. Top of page
  2. Abstract
  3. Introduction
  4. Background
  5. Methodology
  6. Findings
  7. Discussion
  8. Conclusion and Future Research
  9. Acknowledgements
  10. References

At the beginning of a patient's visit to the Emergency Department (ED), a record is created containing the Chief Complaint and the Triage Note. The Triage Note attempts to capture the events and circumstances leading up to the decision to visit the ED. Events in the triage note are often not recorded in the order in which they occurred but rather in the order in which they are reported. In addition, patients typically do not use absolute terms to describe the timing of these events. Instead, they rely on relative phrases such as yesterday and this morning. Using a temporal information extraction system, we examine the variation in frequency of these relative temporal expressions over time of day. We then suggest interpretation rules for translating this ‘natural time’ used by people to the ‘logical time’ necessary for the automatic placement of the associated events in the proper sequence on a timeline. The research described here serves as a building block to support the automated creation of timelines with the goal of aiding clinicians and others in visualizing the patterns of events which led to the patient's visit to the Emergency Department.


Introduction

It is 2:00 a.m. and you go to the emergency department (ED) for abdominal pain. When the triage nurse asks you when the pain began and you respond “early last night”, does that mean a few hours ago or does it mean a whole day and a few hours ago? The first interpretation acknowledges 12:00 a.m. as the dividing point between days. The second interpretation seems to define a new day as starting “after one has slept”, thus implying that you have not yet slept. The correct interpretation, representing the duration of your pain, affects the immediate course of evaluation and treatment in the ED.1

Figure 1. Sample ED record including visit timestamp, chief complaint (CC) and triage note (TN).

Information gathered at triage is recorded in the ED patient record, including the chief complaint (CC) and triage note (TN) fields. The CC is a brief field, usually containing 1-2 concepts, while the TN provides greater detail. The TN can include the history of the present illness, a description of symptoms, or the manner in which an injury occurred. Together, the CC and TN (henceforth CC/TN) provide an initial snapshot of a patient's problem: the “story” behind their decision to come to the ED. Untangling the order in which events occurred can, however, be difficult. Events are often not recorded in the order in which they occurred, but rather in the order in which they are reported (e.g., in response to the nurse's questions). More importantly, the time of occurrence or duration of an event can be expressed in a wide variety of ways, including absolute and relative references. Absolute references employ calendar and/or clock terms to express time, e.g., “had surgery on October 2” or “at 3:30”. Relative references express the time an event occurred in relation to another event such as the current ED visit, e.g., “2 days ago” or “last night”. People generally have no problem interpreting the meaning behind both types of references, in part because they do not require exactness to understand the timing of an event. If, however, we want a computer to correctly interpret and reason with these temporal expressions, we need to define interpretation rules to place the associated events in their correct place on a timeline.

This paper describes the usage patterns of certain relative temporal expressions (TEs) commonly used in triage notes and outlines interpretation rules based on these patterns. It is part of a larger project whose long-term goal is the creation of a system that can automatically identify and extract distinct events in the CC/TN and place them in the appropriate place on a timeline of the events which occurred prior to the patient's visit to the ED. By creating a timeline, we hope to provide clinicians and researchers with a visual representation of a patient's situation. Potential uses of such a system include improving understanding of patterns of events that occur prior to the ED visit, thus 1) aiding clinicians in patient treatment and 2) informing public health practices.

Background

Zhou, et al. developed a model of temporal reasoning for events that are described in hospital discharge summaries (DS) (Zhou, et al. 2006; Hripcsak, et al. 2005). They sought to discover whether temporal information available in discharge summaries could be represented with temporal formalisms such that systems could be built based on “computationally feasible algorithms” which would allow inferences to be drawn. As part of this work, they constructed a temporal expression coding scheme (hereafter referred to as the Zhou model) (Hripcsak, et al. 2005, 56). The Zhou model includes 7 categories of temporal expressions (TEs):

Date and Time (“bike accident 03/15/07”, “bike accident last Sunday”)

Relative Date and time (“yesterday”, “3 days ago”)

Key Events (“after surgery”)

Duration (“abd. pain for last 3 days”)

Fuzzy time (“history of back pain”)

Recurring time (“taking Demerol 2x day for a week”, “legs hurt at night”)

Other events (“unknown last menstrual date”)

Five of these seven categories contain subcategories (the exceptions being ‘Other events’ and ‘Recurring time’).

While discharge summaries and triage notes have much in common, there are crucial differences, particularly when concerned with developing systems for extracting event and temporal information. A discharge summary is written in a narrative and linear style; in contrast, the language in a CC/TN is not standard text and often contains a variety of abbreviations, truncations, misspellings, and shorthand notation (Travers and Haas 2003). For example, “headache” may be expressed as HA, h/a, hdach or even hedache.

Therefore, the initial phase of the larger umbrella research project categorized a set of 598 TNs from North Carolina hospitals to determine how well the Zhou model fit (Haas, et al. 2008). One hundred twenty-four (124) of these records had no TEs. A total of 891 TEs were found, resulting in a mean of 1.87 per TN (for those TNs with TEs). The 7 main categories of the Zhou model were found to work well for TNs. In comparing results from the TN categorization with that of the DS, however, the frequency of usage differed dramatically.

The most common temporal category found in discharge summaries by Zhou, et al. (2006) was Date and Time, while the most common category for triage notes was Relative Date and Time. The Date and Time category includes not only explicit date (3/21/07) and time references (0900) but also named entities such as Sunday and October. As the name suggests, the Relative Date and Time category includes expressions where knowledge of the time of visit is required to know when the event occurred. Date and Time expressions accounted for 36.2% of the DS expressions but only 10.7% of the TN TEs. Relative Date and Time expressions, on the other hand, accounted for 38.5% of TN TEs but only 5.4% of DS TEs. This is not surprising considering the two sources. Discharge summaries are written at the conclusion of a hospital visit, drawing on the patient records generated during the visit. In essence, they are a detailed history of the visit, written by a physician for use by other clinicians in the patient's future care. TNs are notes of a face-to-face interview that include symptoms and incidents reported by the patient as well as the nurse's observations.

The ED interview participants share the same temporal context of the ED visit, making relative temporal expressions a natural form of expression. Haas et al. (2008) provide further discussion of these differences.

Zhou, et al. established four subcategories in the Relative Date and Time category; these are listed, with examples, in Table 1.

Table 1. Relative Date and Time subcategories

RDT subcategory                 Example
Yesterday, today, tomorrow      car accident today
Past / Next (time unit)         coughing worse over the past few days
(A period of time) Ago/later    fell 4 hours ago
In/within (a period of time)    changed packing 7-8 times in the last hour

We chose to focus on gaining a better understanding of the types and patterns of usage of RDT TEs for two reasons. First, it was the most frequently used type of TE in the CC/TN corpus. Second, placing RDT TEs on a timeline requires more complex interpretation rules than placing Date and Time TEs. The latter can be placed directly, but the former must take into account the time at which the expression was written, i.e., the time of the ED visit. Within the RDT category, the Yesterday, today, tomorrow (YTT) subcategory was the most frequent, accounting for 26.3% of all TEs and 69% of the RDTs. The YTT subcategory includes expressions referring to days and parts of days, such as yesterday and this morning. Can frequency of use or patterns of usage help us design interpretation rules, for example, to determine what “last night” should mean in the opening scenario? In particular, we sought to answer the following questions:

Which of the YTT concepts are the most common? Which level of granularity do people use to label events: day-size chunks (today, yesterday) or parts of day (this morning, this afternoon)?

Does the frequency with which YTT concepts are used vary by time of year? Records in the CC/TN corpus were selected from different times of year to capture any seasonal differences in ED visits. For example, there are more ED visits from patients with flu-like symptoms during the winter than the summer.

Does the frequency with which YTT concepts are used vary by time of day? If so, it may suggest boundaries for shifts in how the TEs should be interpreted, as illustrated in the opening scenario.

Once we have a better understanding of the usage pattern, we can begin to develop inference rules for determining when an event occurred in relation to the ED visit.

Methodology

The North Carolina Disease Event Tracking and Epidemiologic Collection Tool (NC DETECT, 2007) receives data daily from 105 of the 111 24/7 emergency departments in NC. During the initial phase of research, 598 triage notes were gathered from NC DETECT for two days of November 2006. Explicit temporal expressions were extracted and manually coded in accordance with the previously discussed modified version of the Zhou model. This became the gold standard for the creation of a system to automatically extract and tag TEs, TN-TIES (Irvine, 2008).

Triage Note – Temporal Information Extraction System (TN-TIES)

TN-TIES includes two major processing stages. The first is a rule-based partial parse (chunking) of the text; the second is a chunk classifier. Chunks are meaningful phrase-like units of the original triage note texts. The TN-TIES chunker is written in Python and includes some of the modules of the Natural Language Toolkit (NLTK). It is largely based upon punctuation and typical lexical indicators of topical shifts and phrase boundaries within the triage note genre of text. The chunks that contain temporal information (like the gold standard TEs) include both an explicit time reference and the related event (symptom or incident): “MVC around 1500 today”; “she woke up with difficulty breathing today”.

We tested the TN-TIES chunker against 20% of the original (598) set of manually annotated TEs. The chunker identified eighty-nine percent (89%) of the gold standard chunks perfectly. That is, the output chunks contained both an explicit time reference and the related event. The error rate was quite low: the chunker split six percent (6%) of the gold standard chunks into 2 pieces, and 5% of the resulting chunks contained 2 temporal expressions.

The second stage of TN-TIES employs binary classifiers for each of the following temporal classes: ‘Relative Date and Time’, ‘Date and Time’, ‘Duration’, ‘Key Events’, and ‘Fuzzy Time.’ The gold standard dataset was not large enough to construct classifiers for the ‘Other Event’ and the ‘Recurring Time’ classes. The five binary classifiers indicate whether or not a given chunk belongs in each of the temporal classes. In building the classifiers, we used a combination of automatic and manual approaches to feature selection. We extracted terms that appeared frequently in the target classes and expanded those lexical features to include abbreviations and morphological variations. For example, ‘yesterday’ appears frequently in chunks belonging to some of the target classes and, thus, was included in the feature list. Based on our experiences with the textual domain, we were aware that nurses also sometimes use ‘yest’ as an abbreviation for ‘yesterday.’ So, we included ‘yest’ as a positive indicator for the ‘yesterday’ feature. We also grouped some lexical items that signal similar references to time (e.g., the days of the week) into single features. Finally, the TN-TIES system uses regular expressions to match other important temporal features such as exact times, which are typically expressed in a 4-digit military format. The TN-TIES system employs Decision Tree and Naive Bayes classifiers informed by manually annotated training data and the feature set. The Decision Tree classifier for the ‘Relative Date and Time’ temporal class outperforms the Naive Bayes classifier, achieving 92% precision and 90% recall.
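The kind of feature extraction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the TN-TIES feature set: the names `YESTERDAY_FORMS`, `WEEKDAYS`, `MILITARY_TIME`, and `extract_features` are hypothetical, and the lexicons are deliberately tiny.

```python
import re

# Hypothetical lexicons; the actual TN-TIES feature lists are not given here.
YESTERDAY_FORMS = {"yesterday", "yest"}  # abbreviation counts as the same feature
WEEKDAYS = {"monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"}  # grouped into one feature

# Four digits forming a valid 24-hour clock time, e.g. 0730 or 1500.
MILITARY_TIME = re.compile(r"\b(?:[01]\d|2[0-3])[0-5]\d\b")


def extract_features(chunk):
    """Map one text chunk to a dict of binary lexical features."""
    tokens = re.findall(r"[a-z]+", chunk.lower())
    return {
        "has_yesterday": any(t in YESTERDAY_FORMS for t in tokens),
        "has_weekday": any(t in WEEKDAYS for t in tokens),
        "has_military_time": bool(MILITARY_TIME.search(chunk)),
    }


print(extract_features("MVC around 1500 today"))
print(extract_features("fell yest"))
```

A feature dictionary of this shape could be passed to a decision tree or Naive Bayes classifier. Note that the time pattern in this sketch would also match a four-digit year such as 2006, so a real system would need additional context checks.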

In order to create inference rules based on the language used in TNs, we need an understanding of the phrases and concepts most commonly employed. Our investigation of specific classes of TEs required a larger set of records than the original set of 598. A set of 2,598 records was drawn from the NC DETECT database for this purpose and then processed by TN-TIES to extract the RDTs. The records were drawn from four 2-day periods (November 2006, and February, April, and July 2007). As before, records include a timestamp, the chief complaint (CC) and triage note (TN). This gave us a total of 2,598 records: 799 from November, 600 from February, 597 from April, and 600 from July. We manually filtered the TN-TIES output to achieve 100% classification precision, as our goal for this portion of the research project was to understand how people use TEs, not to evaluate TN-TIES. This enabled us to focus our analysis on the 1,373 RDTs extracted.

Yesterday, today, tomorrow (YTT)

Visits to the ED usually occur because of an event in the immediate past, such as the day before or the day of the ED visit. Using regular expressions and manual examination, the RDT TEs output by TN-TIES were split into the following YTT concepts:

yesterday

today

this morning

this afternoon

last night

tonight

Note that ‘tomorrow’, although part of the subclass name used in the Zhou model, was not included because events referred to in the records are overwhelmingly about the past, occasionally about the present but almost never about the future. Indeed, only 3 of the TEs contained a reference to tomorrow, usually referring to a scheduled procedure or appointment.

Some of the YTT concepts are associated with multiple forms of expression. For instance, today am was classified as part of the this morning YTT concept and this evening was classified as tonight. The goal in classifying TEs is to be able to apply the most specific rule possible; in the former example, ‘am’ gives greater specificity than ‘today’ alone. For each YTT concept, the frequency of use per hour was extracted for each set of days in November, February, April, and July.
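The mapping from surface forms to YTT concepts, followed by a per-hour tally, might look like the sketch below. The patterns and the names `YTT_PATTERNS`, `classify_ytt`, and `count_by_hour` are assumptions made for illustration; the study itself combined regular expressions with manual examination, and its exact patterns are not reproduced here.

```python
import re
from collections import Counter

# Ordered from most to least specific, so that "today am" is labeled
# "this morning" rather than "today". Patterns are illustrative only.
YTT_PATTERNS = [
    ("this morning", re.compile(r"\b(?:this\s+(?:morning|am)|today\s+am)\b")),
    ("this afternoon", re.compile(r"\bthis\s+afternoon\b")),
    ("last night", re.compile(r"\blast\s+night\b")),
    ("tonight", re.compile(r"\b(?:tonight|this\s+evening)\b")),
    ("yesterday", re.compile(r"\byest(?:erday)?\b")),
    ("today", re.compile(r"\btoday\b")),
]


def classify_ytt(expression):
    """Return the first (most specific) YTT concept found, or None."""
    text = expression.lower()
    for concept, pattern in YTT_PATTERNS:
        if pattern.search(text):
            return concept
    return None


def count_by_hour(records):
    """Tally (concept, visit_hour) pairs from (visit_hour, expression) records."""
    counts = Counter()
    for hour, expression in records:
        concept = classify_ytt(expression)
        if concept is not None:
            counts[(concept, hour)] += 1
    return counts
```

Ordering the patterns by specificity is one simple way to honor the "most specific rule possible" goal; a fuller system would also need to handle misspellings and additional abbreviations.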

Findings

From the 2,598 CC/TNs, 1,373 RDT TEs were extracted. Of these, there were 1,004 in the YTT subcategory while the other three subcategories combined totaled only 369. Table 2 below gives the breakdown of RDTs for each month.

Table 2. Number of RDTs per month

                  Nov   Feb   Apr   Jul   Total
YTT               277   238   251   238    1004
All other RDTs    131    90    72    76     369
Total             412   331   324   316    1373

As mentioned earlier, records were selected from different times of the year in an attempt to capture any seasonal variation in TE usage. For example, since the sun rises much later in November than in July, one might anticipate this morning as conceptually beginning earlier in the day in July. However, there was no difference in frequency of use for different times of the year. Therefore, we only looked at and derived rules for the overall pattern of usage.

The most commonly used concept of the YTT subclass was today, followed by yesterday. Today accounted for 29% of the YTT expressions and 21% of the RDT expressions overall, while yesterday accounted for 23% and 17%, respectively. Interestingly, the concept of this morning was third (17% of the YTT expressions) but the corollary concept, this afternoon, was near the bottom, a mere 1% of the YTT concepts used. Seven point four percent (7.4%) of the YTT phrases included a time refinement. That is, they used the relative phrase plus an explicit time; e.g. “today 0730”.

Table 3. Frequency of usage for YTT expressions

YTT expression           Total   % YTT   % RDT
Today                      288     29%     21%
Yesterday                  234     23%     17%
This morning               169     17%     12%
Last night                 142     14%     10%
Tonight                     70      7%      5%
This afternoon              14      1%      1%
Yesterday morning           10      1%      1%

With time refinements       74    7.4%    5.4%

Figure 2 below shows the frequency of use of YTTs throughout the day in 3-hour intervals. One interesting trend to notice is the usage of tonight versus last night. Tonight sees the most usage from 20:00 – 05:00 while last night comes into play around 05:00. This suggests that people consider the night's ending as the time of awakening or the start of daylight (what Zhou & Hripcsak (2007) refer to as “natural time”). This is also supported by the increase in use of this morning starting around 05:00. However, automating the timeline placement of YTTs will require translating natural time to what they refer to as “logical time”, fixing a reasonable default time as the boundary between night and morning. (Although a fuzziness factor could be added if it is deemed clinically important.)

Another interesting trend is the use of this morning in relation to today. As can be seen in Figure 2, this morning is used more frequently from 05:00 to 12:00 than today. This could suggest that people place a stronger emphasis on dividing the morning from the whole day. Similarly, the negligible usage of this afternoon (1%) suggests that either the distinction of ‘afternoon’ is less important than ‘morning’ or that the more of the day that has passed at the time of the visit, the less precise people are in labeling the portion of the day in which the event occurred. Or, it could merely reflect the pattern of use in general language. According to the American National Corpus (http://americannationalcorpus.org/index.html), this morning is used 5 times as frequently as this afternoon.

Figure 2. Frequency of phrase usage per three hour interval

Discussion

In response to our research questions, we found that today and yesterday, which refer to day-sized temporal units, were the most frequently used YTT concepts. However, this morning and last night, which refer to parts of the day, were also commonly used. There were no differences in YTT usage at different times of the year. We did find differences in usage across times of day, which merit further discussion. The usage pattern of the various YTT concepts may provide some clues as to how people mentally divide up their subjective time. This is not only fascinating in its own right, but it also helps in the development of inference rules. To return to our original example, based on what we have learned here, we could consider the following hypotheses.

Interpret ‘last night’ to mean a whole day and a few hours ago because at 2 am, it is likely you consider tonight not to have ended yet.

In general, if someone uses the term tonight, interpret the event as having occurred between the most recent 17:00 and 05:00. If the visit occurs at some point during that time frame, then tonight becomes bounded by the time of the visit (e.g., given a visit time of 21:00 and “fell down the stairs this evening” in a TN, interpret the fall as probably occurring between 17:00 and 21:00).

If someone uses the term last night, instead of tonight, use the same time frame as tonight (17:00 to 05:00) but shift it 24 hours earlier.

Interpretation rules for this morning could follow a similar logic. This morning is generally interpreted to mean an event occurred between the most recent 05:00 and 11:59, or the time of the visit, whichever comes first. Therefore, if a patient comes to the ED at 02:00 on May 21 stating that their nose started bleeding that morning, we would interpret the start of the nosebleed as occurring on May 20, between 05:00 and 11:59, that is, roughly 14 to 21 hours prior to the ED visit.
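The hypothesized rules for tonight, last night, and this morning can be expressed as a minimal Python sketch. The function names are hypothetical, and the year 2007 in the usage example is arbitrary. One assumption goes beyond the text: the rules above describe last night only for visits that fall inside the 17:00 to 05:00 frame, so for daytime visits this sketch treats last night as the most recently completed night.

```python
from datetime import datetime, time, timedelta


def _most_recent(visit, clock):
    """Latest datetime at wall-clock time `clock` that is not after `visit`."""
    candidate = datetime.combine(visit.date(), clock)
    if candidate > visit:
        candidate -= timedelta(days=1)
    return candidate


def this_morning(visit):
    """Most recent 05:00-11:59 span, capped at the visit time."""
    start = _most_recent(visit, time(5, 0))
    end = start.replace(hour=11, minute=59)
    return start, min(end, visit)


def tonight(visit):
    """Most recent 17:00-05:00 span, capped at the visit time."""
    start = _most_recent(visit, time(17, 0))
    end = start + timedelta(hours=12)  # 05:00 on the following day
    return start, min(end, visit)


def last_night(visit):
    """Same frame as tonight, shifted back 24 hours during an ongoing night."""
    start = _most_recent(visit, time(17, 0))
    end = start + timedelta(hours=12)
    if visit < end:  # visit falls inside the current night
        start -= timedelta(days=1)
        end -= timedelta(days=1)
    # Assumption: for daytime visits the unshifted frame is already "last night".
    return start, end


visit = datetime(2007, 5, 21, 2, 0)  # 02:00 on May 21; the year is arbitrary
print(this_morning(visit))  # window on May 20, 05:00 to 11:59
print(last_night(visit))    # window from May 19 17:00 to May 20 05:00
```

Each function returns a (start, end) window rather than a single point, which leaves room for a later stage to narrow or widen the interval.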

Obviously, interpretation rules cannot be constructed based solely on frequency of use throughout the day, although our findings do suggest boundaries where shifts in interpretation may occur. We have obtained preliminary confirmation of our proposed interpretation rules from 3 domain experts (2 ED nurses and a member of the NC DETECT research team). Based on these initial sources of information, we are preparing materials for use in a larger validation study.

This research has studied only overall patterns of TE usage; we have not examined individual or even regional differences in how patients use temporal expressions, nor how triage nurses document and interpret what patients say during the triage interview. In the long term, interpretation rules must be validated for use in different regions. However, this research is not aimed at changing the triage interview process, or creating standards for how triage nurses should record the timing of events described by patients. Rather, the goal of this research is to work with triage note language “in the wild.”

One inherent limitation in working with triage notes revolves around the fallibility or imprecision of human memory. It is well known that human perception of time is fluid (which leads to imprecision about when an event occurred) and that people often do not remember events correctly (Friedman 1990). For example, if a patient says that her pain started “1 week ago”, it could mean exactly 7 days ago, but could also be as few as 5 days or as many as 9. The “truth”, or precise time, may not be learned at triage, and may never be known. According to the domain experts consulted in a preliminary study, vague knowledge of when an event occurred is often sufficient for treatment purposes and thus the nurse has no reason to probe for more specific information than that which the patient provides. We propose to deal with this, in part, by building in a fuzziness factor (Zhou, et al. 2006) so that we create a window of when an event likely began and ended.
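One simple way to picture such a window is a symmetric tolerance around the nominal time, as in the sketch below. This is an assumption for illustration only: the function name `fuzzy_window` and the symmetric tolerance are hypothetical, and the fuzziness factor of Zhou, et al. (2006) may be defined differently.

```python
from datetime import datetime, timedelta


def fuzzy_window(visit, offset, tolerance):
    """Return (earliest, latest) times for an event reported `offset` before the visit.

    `tolerance` widens the nominal time symmetrically; both values are timedeltas.
    """
    nominal = visit - offset
    return nominal - tolerance, nominal + tolerance


# "1 week ago" read as 7 days, give or take 2 days (i.e., 5 to 9 days ago).
visit = datetime(2007, 5, 21, 2, 0)  # hypothetical visit time
earliest, latest = fuzzy_window(visit, timedelta(days=7), timedelta(days=2))
print(earliest, latest)
```

The appropriate tolerance would likely depend on the unit used (hours, days, weeks) and would need to be set with input from domain experts.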

Conclusion and Future Research

The ability of a system to adequately represent the timeline of events leading to an ED visit will depend not only on an effective natural language processor but also on a domain-driven set of rules which can place the events in the correct sequence. Given that people most frequently use relative temporal expressions, it is important that we understand how people use them during the triage interview. We have shown that examining the frequency of use across times of day can provide one type of evidence.

In addition to the planned validation study, we are developing interpretation rules for other categories of TEs. The Key Event, another type of relative TE, is another challenging category. Instead of expressing the time of an event in relation to the triage interview, it places one event in relation to another, e.g., “fever started 2 days after surgery”. If we know when the surgery occurred, we can determine when the fever started. At the least, we can place them in the correct order, with surgery preceding fever.

Another concern revolves around the interdependency of the various events relevant to an ED visit. Take, for example, a triage note which states that the patient “fell and hurt my knee playing basketball last year and had knee surgery in October.” Our current scheme would create two separate events: “fell and hurt my knee playing basketball last year” and “had knee surgery in October.” The placement on the timeline of the fall would not take into account that it happened before the surgery. Unlike the previous example, which places the fever in direct relation with the surgery, it requires domain knowledge to understand that the fall most likely occurred prior to the surgery; it cannot be determined based merely on the text given. At this point, we are developing rules as if the events were independent of each other. We hope to address this in the future and incorporate any temporal constraints in the ordering of events. One possible approach is discussed at length in Zhou, et al. 2006.

A more immediate need is gaining an understanding of the usage patterns of the other subclasses of the RDT class as well as the other temporal classes. From this we can further develop interpretation rules which can ultimately be applied to TNs for the construction of ED visit timelines.

Acknowledgements

We would like to thank the North Carolina Division of Public Health and NC DETECT for access to sample ED records. We would also like to thank the other members of our research team, Jacob Kramer-Duffield and Eliah Hecht, for their invaluable input.

References

  • Friedman, W. (1990). About Time: Inventing the Fourth Dimension. The MIT Press.
  • Haas, S. W., Irvine, A. I., & Sullivan, T. C. (2008). Time as a function of genre: Use of temporal expressions in Emergency Department triage notes. In preparation.
  • Haas, S. W., Travers, D. A., Waller, A., & Kramer-Duffield, J. (2007). What is an event? Domain constraints for temporal analysis of chief complaints and triage notes. Poster session presented at the annual meeting of the American Society for Information Science and Technology, Milwaukee, WI.
  • Hripcsak, G., Zhou, L., Parsons, S., Das, A. K., & Johnson, S. B. (2005). Modeling electronic discharge summaries as a simple temporal constraint satisfaction problem. Journal of the American Medical Informatics Association, 12 (1), 55-63.
  • Irvine, A. (2008). Natural Language Processing and Temporal Information Extraction in Emergency Department Triage Notes. Masters, University of North Carolina at Chapel Hill. http://hdl.handle.net/1901/511
  • Natural Language Toolkit (NLTK). (2007). http://nltk.sourceforge.net
  • NC DETECT. (2007). North Carolina Disease Event Tracking and Epidemiologic Collection Tool. http://www.ncdetect.org/FAQs.html (Accessed January 15, 2008).
  • Travers, D. A., & Haas, S. W. (2003). Using nurses' natural language entries to build a concept-oriented terminology for patients' chief complaints in the emergency department. Journal of Biomedical Informatics, 36 (4-5), 260-270.
  • Zhou, L., & Hripcsak, G. (2007). Temporal reasoning with medical data: A review with emphasis on medical natural language processing. Journal of Biomedical Informatics, 40 (2), 183-202.
  • Zhou, L., Melton, G. B., Parsons, S., & Hripcsak, G. (2006). A temporal constraint structure for extracting temporal information from clinical narrative. Journal of Biomedical Informatics, 39 (4), 424-439.