Emergency Department Triage Scales and Their Components: A Systematic Review of the Scientific Evidence

Emergency department (ED) triage is used to identify patients' level of urgency and treat them based on their triage level. The global advancement of triage scales in the past two decades has generated considerable research on the validity and reliability of these scales. This systematic review aims to investigate the scientific evidence for published ED triage scales. The following questions are addressed: 1. Does assessment of individual vital signs or chief complaints affect mortality during the hospital stay or within 30 days after arrival at the ED? 2. What is the level of agreement between clinicians' triage decisions compared to each other or to a gold standard for each scale (reliability)? 3. How valid is each triage scale in predicting hospitalization and hospital mortality? A systematic search of the international literature published from 1966 through March 31, 2009 explored the British Nursing Index, Business Source Premier, CINAHL, Cochrane Library, EMBASE, and PubMed. Inclusion was limited to controlled studies of adult patients (≥15 years) visiting EDs for somatic reasons. Outcome variables were death in ED or hospital and need for hospitalization (validity). Methodological quality and clinical relevance of each study were rated as high, medium, or low. The results from the studies that met the inclusion criteria and quality standards were synthesized applying the internationally developed GRADE system. Each conclusion was then assessed as having strong, moderately strong, limited, or insufficient scientific evidence. If studies were not available, this was also noted. We found ED triage scales to be supported, at best, by limited and often insufficient evidence. The ability of the individual vital signs included in the different scales to predict outcome is seldom, if at all, studied in the ED setting. 
The scientific evidence to assess interrater agreement (reliability) was limited for one triage scale and insufficient or lacking for all other scales. Two of the scales yielded limited scientific evidence, and one scale yielded insufficient evidence, on which to assess the risk of early death or hospitalization in patients assigned to the two lowest triage levels on a 5-level scale (validity).


Introduction
Triage is a central task in an emergency department (ED). In this context, triage is viewed as the rating of patients' clinical urgency [1]. Rating is necessary to identify the order in which patients should be given care in an ED when demand is high. Triage is not needed if there is no queue for care. Triage scales aim to optimize the waiting time of patients according to the severity of their medical condition, so that the most severe symptoms are treated as quickly as necessary and the negative impact on prognosis of a prolonged delay before treatment is reduced. ED triage is a relatively modern phenomenon, introduced in the 1950s in the United States [2]. Triage is a complex decision-making process, and several triage scales have been designed as decision-support systems [3] to guide the triage nurse to a correct decision. Triage decisions may be based on both the patients' vital signs (respiratory rate, oxygen saturation in blood, heart rate, blood pressure, level of consciousness, and body temperature) and their chief complaints. Internationally, no consensus has been reached on the functions that should be measured. Apart from emergency care, triage may be used in other clinical activities, e.g. deciding on a certain investigation [4] or treatment [5].
Since the early 1990s, several countries have developed and introduced ED triage [6][7][8][9][10]. Development of triage scales in some countries has been influenced largely by the seminal work of FitzGerald [11], resulting in most of the triage scales developed in the 1990s and 2000s being designed as 5-level scales. Of these, the Australian Triage Scale (ATS), Canadian Emergency Department Triage and Acuity Scale (CTAS), Manchester Triage Scale (MTS), and Emergency Severity Index (ESI) have had the greatest influence on modern ED triage [12][13][14][15]. Other scales have not disseminated as widely around the globe, e.g. the Soterion Rapid Triage Scale (SRTS) from the United States and the 4-level Taiwan Triage System (TTS) [6,7,9,16,17]. Some countries, e.g. Australia, have a national mandatory triage scale while many European countries lack such standards [7,9].
Patients may have a life-threatening condition, but show normal vital signs. Hence, in triaging the patient it is important to consider information given by patients or accompanying persons regarding the patient's chief complaints or medical history, which can provide essential information about serious diseases. The chief complaints describe the incident or symptoms that caused the patient to seek care.
In 2005, a joint task force of the American College of Emergency Physicians and the Emergency Nurses Association published a review of the literature on ED triage scales. Based on expert consensus and available evidence, the task force supported adoption of a reliable 5-level triage scale, stating that either the CTAS or the ESI is a good choice for ED triage [18]. In 2002, a national survey conducted in Sweden identified the use of 37 different triage scales across the country. Further, some 30 EDs did not use any type of triage scale [19].
This systematic review aims to investigate the scientific evidence underlying published ED triage scales.

Objectives
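The following questions are addressed:
1. Does assessment of individual vital signs or chief complaints affect mortality during the hospital stay or within 30 days after arrival at the ED?
2. What is the level of agreement between clinicians' triage decisions compared to each other or to a gold standard for each scale (reliability)?
3. How valid is each triage scale in predicting hospitalization and hospital mortality?

Exclusion criteria for studies on reliability of triage scales
• Studies on interrater reproducibility are excluded in cases where any rater in the study had access to retrospective data only.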
Six experts from different professions and clinical specialties reviewed the studies, independently in groups of 2 or 3, for quality by using methods validated for internal validity, precision, and applicability (external validity) [20]. The methodological quality and clinical relevance of each study were graded as high, medium, or low. Results from the studies that met the inclusion criteria and quality standards were synthesized by applying the internationally developed GRADE system [21].
In accordance with GRADE, the following factors were considered in appraising the overall strength of the evidence: study quality, concordance/consistency, transferability/relevance, precision of data, risk of publication bias, effect size, and dose-response. In synthesizing the data, studies having low quality and relevance were included when studies of medium quality and relevance were not available. Based on the overall quality and relevance of the studies reviewed, each conclusion was rated as having strong, moderately strong, limited, or insufficient scientific evidence. If studies were not available, this was noted [21].
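The rating logic described above can be illustrated with a minimal sketch. It assumes that the four evidence levels map onto the ⊕ symbols used in the review's evidence tables, that a body of observational studies starts at "limited" (the usual GRADE starting point for observational designs), and that each serious concern lowers the rating by one step. The function names and the one-step-per-concern rule are illustrative assumptions, not the procedure the reviewers applied.

```python
# Evidence levels in ascending order, as named in this review.
LEVELS = ["insufficient", "limited", "moderately strong", "strong"]


def grade_rating(start: str, downgrades: int = 0, upgrades: int = 0) -> str:
    """Return the evidence level after applying downgrades and upgrades.

    Assumption: each factor (e.g. study quality, consistency, precision)
    shifts the rating by exactly one step, clamped to the scale's ends.
    """
    index = LEVELS.index(start) - downgrades + upgrades
    index = max(0, min(index, len(LEVELS) - 1))
    return LEVELS[index]


def grade_symbols(level: str) -> str:
    """Render a level as filled and empty circles (four positions)."""
    filled = LEVELS.index(level) + 1
    return "⊕" * filled + "○" * (len(LEVELS) - filled)


# Hypothetical example: observational studies (start at "limited")
# with one reduction for study quality.
rating = grade_rating("limited", downgrades=1)
print(rating, grade_symbols(rating))
```

Under these assumptions, a single reduction for study quality is enough to move observational evidence from limited to insufficient, which is why study quality weighs so heavily in the conclusions.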

Vital signs and chief complaints
Most of the studies that investigated associations between different vital signs or chief complaints and mortality after ED arrival were observational cohort studies based on selected, diagnosis-specific, patient groups.
All of the studies were found to have medium quality and relevance. Only a few studies included all patients (albeit limited to "medical" patients) that arrived at the ED, regardless of diagnosis. Hence, studies of patients classified under surgical disciplines were generally lacking. Several studies described compiled scales or indexes for appraising the severity level of the patient's conditions, but provided no information on the importance of specific vital signs or chief complaints. Hence, little or no evidence can be found on the association between specific vital signs or reasons for the ED visit and mortality in the group of general patients presenting in EDs.

Respiratory rate
Only a single study, which described the predictive importance of respiratory rate, fulfilled the inclusion criteria [22]. The study aimed to assess whether the Rapid Acute Physiology Score (RAPS) could be used to predict mortality in nonsurgical patients on ED arrival. It also aimed to study whether an advanced version of RAPS, i.e. the Rapid Emergency Medicine Score (REMS), could yield better predictive information [22].
RAPS was developed for prehospital care and involves assessing respiratory rate, pulse, blood pressure, and the Glasgow Coma Scale (GCS). REMS is based on RAPS, but also assesses oxygen saturation, body temperature, and age. In total, 11 751 patients were studied prospectively after arrival at the ED of a university hospital in Sweden. Respiratory rate was found to be a significant predictor of mortality during the hospital stay. A decrease of one step on the RAPS scale was found to nearly double the risk of mortality within 30 days (Table 1).

[Figure: literature search and selection flowchart. Abstracts identified through database searching: 4 185. Abstracts excluded by relevance: 4 096. Articles studied in full text: 89. Articles identified through other sources: 10. Articles excluded by relevance, study design, and insufficient eligibility: 95. Articles included in systematic review: 4 (high quality 0, medium quality 3, low quality 1).]

Oxygen saturation in blood
Two studies used RAPS and REMS to predict acute mortality after ED arrival and specifically studied the predictive importance of saturation [22,23]. Oxygen saturation was found to be one of the three variables, along with age and level of consciousness, that best predicted mortality during hospitalization.

Pulse
One study investigated the importance of assessing pulse in the ED as a means to predict mortality during the hospital stay. The study, which was conducted in Sweden [22], showed a significant association between the pulse on arrival to the ED and mortality during the hospital stay in a group of 11 751 patients receiving care for nonsurgical disorders. With a decrease of one step on the RAPS scale, 67% of the patients showed an increased risk of mortality within 30 days.

Level of consciousness
The Swedish study (described above) also investigated the association between acute mortality and the level of consciousness on arrival at the ED [22]. Another study used the same methods mentioned above, i.e. RAPS and REMS [23], to analyze 5583 patients that had called the emergency phone number and were classified as urgent. The study showed that level of consciousness was one of three variables (age and saturation being the other two) that best predicted mortality during the hospital stay. Another study analyzed 986 stroke patients on ED arrival. Impaired level of consciousness appeared to be the best predictor of mortality during the hospital stay [24].

Blood pressure and body temperature
The importance of blood pressure or body temperature in assessing the risk of acute mortality after ED arrival could not be supported by the included studies due to the lack of scientific evidence.

Figure 2 Results of literature search and selection process regarding reliability (10 articles) and validity (10 articles) of triage scales. One article studied both reliability and validity and was rated differently depending on the endpoint studied: low quality regarding reliability and medium quality regarding validity. Flowchart counts: abstracts identified through database searching, 2 776; abstracts excluded by relevance, 2 608; articles studied in full text, 168; articles identified through other sources, 1; articles excluded by relevance, study design, and insufficient eligibility, 149; articles included in systematic review, 20 (high quality 0, medium quality 9, low quality 11).
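The counts reported in the two selection flowcharts can be cross-checked with simple arithmetic. The sketch below is only a consistency check on the published numbers; the class and field names are illustrative, and labelling the 4-article flowchart as "first" (it appears with the vital signs results) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class SelectionFlow:
    """Counts taken from one literature-selection flowchart."""
    abstracts_identified: int
    abstracts_excluded: int
    other_sources: int
    articles_excluded: int
    quality_counts: dict  # e.g. {"high": 0, "medium": 3, "low": 1}

    def full_text(self) -> int:
        # Abstracts that survived relevance screening.
        return self.abstracts_identified - self.abstracts_excluded

    def included(self) -> int:
        # Full-text articles plus hand-searched ones, minus exclusions.
        return self.full_text() + self.other_sources - self.articles_excluded

    def is_consistent(self, reported_full_text: int, reported_included: int) -> bool:
        return (
            self.full_text() == reported_full_text
            and self.included() == reported_included
            and sum(self.quality_counts.values()) == reported_included
        )


first_flow = SelectionFlow(4185, 4096, 10, 95, {"high": 0, "medium": 3, "low": 1})
figure_2_flow = SelectionFlow(2776, 2608, 1, 149, {"high": 0, "medium": 9, "low": 11})

print(first_flow.is_consistent(reported_full_text=89, reported_included=4))
print(figure_2_flow.is_consistent(reported_full_text=168, reported_included=20))
```

Both flowcharts are internally consistent: the full-text counts, the included-article counts, and the quality breakdowns all add up.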

Chief complaints
Studies describing the association between different chief complaints and acute mortality were found to be lacking.

Age
Three of the studies described above showed that the higher the patient's age, the greater the risk of death within 30 days of hospital care following ED arrival [22][23][24]. The results showed an increase in mortality of 5% per year. Furthermore, one study showed that older patients (above 75 years of age) with symptoms of coronary heart disease had a greater risk of death within 30 days after arrival at the ED compared to younger patients with the same symptoms [25] (Table 1). Based on the studies described above, Table 2 summarizes assessments and comments regarding the level of scientific evidence.

Interrater agreement of triage scales (reliability)
All 11 articles that were found to answer the question concerning reliability of triage scales and met the defined inclusion criteria were observational studies. They addressed reliability of the ATS [26], CTAS (including eTriage) [19,27,28,29,30], MTS [31], SRTS [6], and two locally produced scales without names [8,32] (Table 3). Based on the quality review, 9 articles [6,8,19,26,27,28,29,30,31] were found to be of low and 1 [32] of medium quality. One article was excluded due to deficient quality resulting from high internal dropout [16]. Deficient external validity was the major reason for the low- and medium-quality ratings of the studies. Selection of patients and triage nurses were both found to be irrelevant or insufficiently described. Hence, 10 articles remained as a basis for the conclusions.
The scientific evidence was found to be insufficient to assess the reliability of ATS, CTAS, MTS, SRTS, and the Swiss scale (Table 4). However, limited scientific evidence was found in assessing the reproducibility of the Brillman scale (North America) as having moderate interrater agreement.

Validity of triage scales regarding acute mortality and hospital admission rates

Mortality
None of the studies reported mortality or hospital admission rates adjusted for age and gender (Table 5). Since previous studies have shown that age is one of the major predictors of hospital mortality [33,34], the scientific evidence was found to be insufficient to assess the validity of the triage scales ATS, CTAS, and the Medical Emergency Triage and Treatment System (METTS) (Table 6). However, safety as measured by hospital mortality in patients graded as low risk (triage levels 4-5/green-blue) by the triage systems may be regarded as one aspect of validity. When assessing the above-mentioned triage scales' level of validity as regards mortality at the lowest triage levels only (levels 4-5/green-blue), the quality and relevance of the studies were found to be moderate. Hence, the scientific evidence is limited.

Hospital admission rates in patients triaged as non-acute
Nine studies reported on admission rates for the ESI, ATS, and SRTS triage scales (Table 7). The studies showed admission rates ranging from 0.0% to 17.0% at level 5, the lowest triage level [6,16,35,36,37,38,39,40,41]. Mean patient age also varied between studies (30 to 47 years), as did hospital admission rates at triage level 4 (3%-33%): 18% to 33% for ATS, 6% to 10% for ESI, and 3% for SRTS. Seven of these studies were found to be of moderate and two of low quality and relevance, and the scientific evidence for validity of admission rates for patients in the lowest triage levels (levels 4-5/green-blue) was found to be limited (Table 8).

Discussion
Our systematic review shows that when adjudicated by standard criteria for study quality and scientific evidence, the triage scales used in EDs are supported, at best, by limited evidence. Often, the evidence is weaker and is rated no higher than insufficient by the GRADE criteria. The ability of the individual vital signs included in the different scales to predict outcome has seldom, or never, been studied in the ED setting. The scientific evidence for assessing interrater agreement (reproducibility) was limited for one triage scale (Brillman) whereas it was insufficient or lacking for all other scales. Two of the scales (CTAS and ATS) offered limited scientific evidence, and the scientific evidence for one scale (METTS) was insufficient to assess the risk of early death or hospitalization in patients assigned to the two lowest triage levels in 5-level scales; the studies showed the risk of death to be low, but a need for inpatient care was not excluded (about 5% hospital admission rate on average). Studies on validity of the triage scales across all levels, i.e. their ability to distinguish the urgency in patients assigned the five different levels, were generally of low quality. Consequently, evidence was insufficient to assess the validity of the scales.
As none of the studies reported on mortality rates adjusted for differences in age and gender between the triage levels, we could not evaluate the validity of the triage scales across all triage levels as regards the risk of early death. To estimate the safety of the scales, we studied early death among patients assigned to the lowest triage levels (green and blue/4-5). Two triage scales (ATS and CTAS) offered limited scientific evidence for assessing safety. In both scales, the patients assigned to the two lowest triage levels had a very low risk of dying within 24 hours after triage. Hence, in this respect, the scales are safe to use. Scientific evidence for METTS, the newly developed Swedish triage scale, was found to be insufficient to assess safety. Since the study recorded the risk of dying during the in-hospital stay, mortality was higher than in the studies on ATS and CTAS. In using the need of hospitalization as a measure of safety, the situation was found to be more complex. Again, none of the studies reported on hospital admission rates adjusted for age and gender, so we could not evaluate the validity of the triage scales across all triage levels. However, on average, about 5% (in some studies up to 17%) of patients in the lowest (4-5/green-blue) triage levels in ATS, ESI, and SRTS were reported to be admitted as inpatients. The variations were wide not only between different triage scales, but also between studies using the same scales. This indicates differences between the studies in (a) patient populations in the ED, (b) access to hospital beds, (c) hospital admission policies and traditions, and/or (d) inaccurate triage decisions (i.e. patients were rated as less urgent than their actual urgency).
No definitive conclusions could be drawn regarding which of the scales was the safest as measured by the need of hospitalization. Hence, we suggest that none of the scales be used in referral of patients in the lowest triage levels (4-5/green-blue), e.g. to primary care, without further medical examination in the ED. New diagnostic tests typically need to meet rigid criteria before they can be accepted for widespread use. These criteria include documentation on precision. For non-laboratory tests, interrater agreement (reliability) is a key precision issue. Our review shows that most triage scales present insufficient scientific evidence for assessing interrater agreement. The study designs used to estimate interrater agreement have often been suboptimal. Most of the studies are based on fictitious cases rather than on authentic patients in real-life settings. The value of the studies as regards interrater agreement is also compromised by the fact that the mean age of patients assessed has either been low (as low as 30 years) or unreported. The generalizability to real-life ED patients must therefore be questioned.

[Table fragment: [35,36], 0.03%-0.1%, Limited ⊕⊕○○; 8695 (1 study) [10], 0.5%, Insufficient ⊕○○○, reduction for study quality (-1). All the studies are observational.]
All 5-level triage scales present insufficient evidence on interrater variability. The few studies that have been published (most of low quality) have reported widely divergent interrater agreement, with kappa values ranging from 0.2 (slight agreement) to 0.9 (almost perfect). Only a single study [32] presented limited scientific evidence. This was a 4-grade scale reporting a kappa value of 0.45, a value usually considered to be in the moderate agreement range [42]. It is evident that inter-observer agreement in triage scales must be documented in greater detail, and, if low, actions must be taken to reduce variability.
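To make the kappa values above concrete, the following minimal sketch computes unweighted Cohen's kappa for two raters and labels the result with the conventional agreement bands (slight, fair, moderate, substantial, almost perfect). The ratings are invented for illustration, and the included studies may have used other variants (e.g. weighted kappa), so this should not be read as the method of any particular study.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same patients."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed proportion of patients given the same triage level.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's level frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[level] * freq_b[level] for level in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters assigned one identical level to everyone
    return (observed - expected) / (1.0 - expected)


def agreement_band(kappa):
    """Conventional verbal label for a kappa value."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial"), (1.00, "almost perfect")]:
        if kappa <= upper:
            return label
    raise ValueError("kappa cannot exceed 1")


# Hypothetical triage levels (1-5) assigned by two nurses to ten patients.
nurse_a = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]
nurse_b = [1, 2, 3, 3, 3, 4, 4, 4, 5, 4]
kappa = cohen_kappa(nurse_a, nurse_b)
print(round(kappa, 2), agreement_band(kappa))
```

In this invented example the nurses agree on 7 of 10 patients, but part of that agreement is expected by chance, which is why kappa is lower than the raw 70% agreement.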
The literature shows variations in the vital signs and chief complaints applied in triage scales. It is unclear whether the selected vital signs are the best at distinguishing different risk groups. Further, evidence supporting the selected thresholds for continuous variables is deficient. The inclusion criteria for this systematic literature review place considerable emphasis on relevance. Triage scales are intended to be used in EDs irrespective of specific symptoms or disease. Hence, only studies of unselected patient populations in ED settings were included, greatly limiting the number of studies on the ability of individual vital signs to predict outcome. Our literature search revealed that many more studies had been performed in intensive care units, or soon after hospital admission.
Regarding specific vital signs, limited scientific evidence supports the use of oxygen saturation and consciousness level as predictors of mortality early after triage. However, scientific evidence was found to be insufficient as regards respiration and pulse, blood pressure, and body temperature. Hence, it remains unclear whether the selected vital signs are the best ones to use in distinguishing different risk groups. Moderate scientific evidence indicated age as a predictor of mortality early after triage, yet most triage scales do not take age into account.
MTS and eCTAS include the chief complaint leading to the ED visit, but we did not find any studies that analyzed which of the chief complaints are important predictors of mortality early after triage. It appears likely that in the construction of triage scales, much of the information was deduced from studies performed in settings other than EDs.

Strengths and limitations
The strength of this review of the scientific literature on triage in the ED lies in its systematic approach. Our search for relevant literature has been meticulous; the quality of the included studies has been evaluated in a uniform manner; and the level of evidence has been summarized using the GRADE methodology developed under the auspices of the World Health Organization [21].
Our review is limited to ED triage in adult patients in somatic care. However, EDs are only part of a continuum of services for acutely ill and injured patients. Studies are also needed on other parts of the continuum of care, e.g. prehospital, psychiatric, and pediatric triage. Other limitations are ascribed to the volume and quality of the scientific literature available. Since all studies were observational, none of the evidence came from randomized controlled trials, the "gold standard" for evaluating new methods. As none of the studies met the standards for high quality, we included studies of low and moderate quality in our review in accordance with the creed in evidence-based medicine to use the best available scientific evidence. Low study quality affected the GRADE rating and was a reason why scientific evidence was rated as insufficient or limited for so many aspects of so many scales.

Conclusions
This systematic literature review reveals shortcomings in the scientific evidence on which presently available triage scales are based. Stronger scientific evidence is needed to determine which of the vital signs and chief complaints have the greatest prognostic value in triage.
Interrater agreement (reliability), validity, and safety of triage scales need to be investigated further, and head-to-head comparisons are needed to determine whether any of the scales have advantages over others.

Limitations
This review was confined to ED triage scales for adult ED patients with non-psychiatric illnesses or injuries. In the absence of an internationally agreed outcome measure for ED triage scale validity, the proxy variables hospital admission and mortality were used in the current study. These proxy variables have limitations with regard to ED triage scale validity, as the variables may be affected by events occurring after the triage assessment. Further, comparisons between ED triage scales need to be made with caution, as there may be contextual differences influencing the result.