The most commonly used disease severity scores are inappropriate for risk stratification of older emergency department sepsis patients: an observational multi-centre study

Background Sepsis recognition in older emergency department (ED) patients is difficult due to atypical symptom presentation. We therefore investigated whether the prognostic and discriminative performance of the five most commonly used disease severity scores were appropriate for risk stratification of older ED sepsis patients (≥70 years) compared to a younger control group (<70 years). Methods This was an observational multi-centre study using an existing database in which ED patients who were hospitalized with a suspected infection were prospectively included. Patients were stratified by age < 70 and ≥70 years. We assessed the association with in-hospital mortality (primary outcome) and the area under the curve (AUC) with receiver operator characteristics of the Predisposition, Infection, Response, Organ dysfunction (PIRO), quick Sequential Organ Failure Assessment (qSOFA), Mortality in ED Sepsis (MEDS), and the Modified and National Early Warning (MEWS and NEWS) scores. Results In-hospital mortality was 9.5% ((95%-CI); 7.4–11.5) in the 783 included older patients, and 4.6% (3.6–5.7) in the 1497 included younger patients. In contrast to younger patients, disease severity scores in older patients associated poorly with mortality. The AUCs of all disease severity scores were poor and ranged from 0.56 to 0.64 in older patients, significantly lower than the good AUC range from 0.72 to 0.86 in younger patients. The MEDS had the best AUC (0.64 (0.57–0.71)) in older patients. In older and younger patients, the newly proposed qSOFA score (Sepsis 3.0) had a lower AUC than the PIRO score (sepsis 2.0). Conclusion The prognostic and discriminative performance of the five most commonly used disease severity scores was poor and less useful for risk stratification of older ED sepsis patients. Electronic supplementary material The online version of this article (10.1186/s13049-017-0436-3) contains supplementary material, which is available to authorized users.


Background
Sepsis recognition in older emergency department (ED) patients is difficult due to atypical symptom presentation. Delayed recognition of sepsis can result in delayed or even absent initiation of adequate treatment which has been shown to increase mortality and health-care costs [1][2][3][4]. Several disease severity scores have been specifically developed for ED patients who are hospitalized with a suspected infection and are supposed to help in sepsis recognition and risk stratification [5][6][7][8][9][10][11][12][13]. It is unclear however whether these disease severity scores are appropriate for risk stratification of older patients, i.e. have a high enough discriminative performance to identify high and low risk older patients which is needed for adequate disposition to a ward or intensive care unit (ICU). For example, it has been shown that in older patients vital signs detect cardiac arrest less accurately compared with non-elderly patients, which has important implications for how they are used for identifying critically ill patients. Churpek et al. suggested that more accurate methods for risk stratification of older patients are necessary with regard to the early detection cardiac arrest [14]. It is therefore possible that the currently available disease severity scores which are used for risk stratification of ED patients with a suspected infection may also be inappropriate for older patients due to the often absent classical symptoms such as fever, tachycardia and hypoxemia. Because these symptoms are an integral part of all the regularly used disease severity scores, their sensitivity will decrease in older patients.
If the prognostic and discriminative performance of the currently available disease severity scores are inappropriate for risk stratification of older ED patients at risk for sepsis, we need to develop a risk stratification tool specifically for older patients. This is especially important for the ED because delayed sepsis recognition could result in an inappropriate disposition to a ward rather than an ICU, increasing hospital length of stay and mortality [15]. In addition, in older patients in whom an ICU admission is not deemed desirable anymore, adequate initial fluid resuscitation should preferably take place in a monitored place like the ED because treatment monitoring is more difficult on a normal ward.
The aim of this study was therefore to assess if the prognostic and discriminative performance of the five most commonly used disease severity scores were appropriate for risk stratification of older ED sepsis patients (≥70 years) compared to a younger control group (<70 years).

Study design and setting
This was an observational study using an existing database in which ED patients were and are still prospectively collected as part of an ongoing quality improvement program in 3 Dutch EDs: Leiden University Medical Centre (LUMC; tertiary care centre with~30,000 visits/year), Rijnstate Hospital (RH; urban care centre with~30,000 visits/year), and the Albert Schweitzer Hospital (ASZ; urban care centre with~25,000 visits/year). Patients were included from April 1st 2011 to February 1st 2016 in the LUMC, from March 1st 2012 to November 1st 2012 in the RH, and from September 1st 2015 to November 1st 2015 in the ASZ. After inclusion in the database, patients were stratified by age into an older (≥70 years) and younger (<70 years) group, as this is the cut-off which is also used in all Dutch government instated interventions for older people [16].
The study was approved by the medical ethics committee of the LUMC, who waived the need for individual informed consent as this was a pure observational study embedded in routine clinical care.

Selection of participants
All consecutive ED patients of 17 years and older with a suspected infection and Manchester triage category yellow, orange or red [17] who received intravenous antibiotics in the ED and were subsequently admitted to the hospital were included in the database. Triage categories blue and green were excluded in the quality improvement program because most of these patients were expected to be at very low risk for mortality or admission. Patients who appeared to have no infection according to the final hospital discharge letter were excluded.

Data collection
In all participating hospitals, the same "Surviving Sepsis Campaign-based" quality improvement program was used, in which a standard screening procedure was followed to optimize sepsis recognition, early ED resuscitation and disposition to an appropriate level of care. The quality improvement program is illustrated in Additional file 1 and has been described in detail elsewhere [18,19].
Demographic and co-morbidity data, relevant time points and dates, laboratory variables, triage categories and vital signs, time to antibiotics, type of antibiotics, amount and type of fluids (L), administered oxygen (L/ min), disposition and outcome variables were prospectively registered in the digital hospital information system Chipsoft Ezis (Chipsoft, Amsterdam, Netherlands) of each participating hospital. A medical student or registrar in emergency medicine subsequently transferred data from the electronic hospital information system to a web-based data collection file (PromiseBasic, Leiden, Netherlands, https://www.msbi.nl/promise/promise.aspx), which automatically calculated the Predisposition, Infection, Response and Organ failure (PIRO) score and the Mortality in ED Sepsis (MEDS) score. After the inclusion period, data of the three participating hospitals were transferred to one SPSS file (SPSS version 23.0, IBM, New York, USA).
Time to antibiotics was measured by subtraction of registration time at the ED desk from the registered time of antibiotic administration by the nurse. Time is zero was taken as the time at ED registration. The appropriateness of the initial dose of antibiotics administered in the ED was assessed in retrospect and is summarized in Additional file 2.
By means of an automated query in the digital hospital information system all ED patients who had been admitted with intravenous antibiotics were selected. Of these ED patients, we retrospectively investigated how many had been triaged as non-urgent but had been admitted with intravenous antibiotics. In this way, we could quantify the number of patients who had been missed by the screening procedure of the quality improvement program (which excluded non-urgent triage categories) because of atypical symptom presentation, which was expected to occur more often in older patients.

Disease severity scores
MEDS and PIRO scores are both a combination of age, comorbidities (predisposition factors) and acute physiology variables [7,9]. The quick Sequential Organ Failure Assessment (qSOFA) score is a newly developed score that screens for low blood pressure (SBP ≤ 100 mmHg), high respiratory rate (≥22 per min), and altered mental status (Glasgow coma scale < 15) [12,13]. The Modified Early Warning score (MEWS) incorporates temperature and urine production into the more common variables heart rate, systolic blood pressure, respiratory rate and altered mental status [10]. The National Early Warning Score (NEWS) does not use urine production, but instead incorporates arterial oxygen saturation and the use of supplemental oxygen [11]. The five scores have been originally developed for slightly different purposes. The MEDS and PIRO scores have been developed to predict in-hospital mortality in ED patients with a suspected infection and qSOFA and MEWS and NEWS to predict sepsis or clinical deterioration. However, this does not complicate the comparison of the prognostic and discriminative performance of these scores between older and younger patients.
All disease severity scores were calculated retrospectively so the treating physicians were not aware of the score at the time of ED presentation. Missing values were counted as normal, similar as in the APACHE score [20]. A patient was considered to have a "Do not resuscitate" (DNR) status if existing medical files already stated that the patient had a DNR code or when it was decided at the time of ED presentation or during hospital admission.

Outcome measures
The primary outcome measure was in-hospital mortality.
Secondary outcome measures were ICU or MCU admission, an unanticipated transfer to an ICU or MCU within 48 h after being admitted to a ward [19], and the composite outcome of in-hospital mortality, ICU or MCU admission, or unanticipated transfer to an ICU or MCU within 48 h.

Data analysis Descriptives
Data are displayed as percentages, means and standard deviation for normally distributed variables or as median with interquartile range for non-normally distributed variables. Independent T-tests were used to assess differences between groups when normally distributed and with Mann-Whitney-U test for non-normally distributed variables. Chi-square test was used for categorical variables.

Main analysis
The prognostic performance of all disease severity scores in both age groups was assessed by associating the aforementioned disease severity categories with in-hospital mortality.
We assessed the discriminative performance of each disease severity score in younger and older patients using a receiver operator characteristic (ROC) curve with area under the curve (AUC) analysis and inhospital mortality as outcome. We calculated the sensitivities, specificities, negative predictive values (NPV), and positive predictive values (PPV) using the optimal cut-off points of each ROC curve. This cut-off point was determined by the maximum sum of the sensitivity and specificity in the ROC curve. To appropriately evaluate the qSOFA score, the cut-off point as originally proposed by Seymour et al. (≥2) has also been included in the analysis [13].
The AUC, sensitivity, specificity, PPV and NPV were reported as mean (95%-confidence interval (CI)). We considered AUCs to be poor at 0.6 to 0.7, adequate at 0.7 to 0.8, good at 0.8 to 0.9, and excellent at 0.9 or higher [21]. Differences in AUC were considered to be significant if the mean of older patients was not included in the 95%-CI of the younger patients.
All data were analyzed using SPSS software (SPSS 23.0, IBM, New York, USA).

Sensitivity analyses
Differences between the AUCs between older and younger patients could be caused by the age per se or by differences in disease severity because we expected disease severity in older patients to be worse compared to younger patients. To investigate whether age or disease severity was responsible for the AUCs in older patients, we did two sensitivity analyses: First, we excluded patients with acute onset organ failure [22] in the older patients and compared the AUCs of the five most common disease severity scores with the AUCs including all older patients. Secondly, we excluded older patients with a DNR status from the group with older patients and compared the AUCs in this selection with the AUCs of all older patients, because we have previously shown that a DNR status is another predictor of mortality and consequently a sign of higher disease severity [18].
In a third sensitivity analysis, we assessed the impact on the AUCs of inclusion of the ED patients with nonurgent triage categories.
Finally, we performed sensitivity analyses to assess the impact of missing variables (with multiple imputation), type of hospital (urban or academic) and time of inclusion (first or second half of inclusion period) on the AUCs of older and younger ED patients. Figure 1 shows patient flow and inclusion criteria. A total of 2280 patients met the inclusion criteria and were enrolled in this study. Baseline characteristics are shown in Table 1. We included 1497 younger patients and 783 older patients. As expected, older patients had more comorbidities than younger patients and more often a DNR status. In line with the atypical symptom presentation in older patients, compared to younger patients, blood pressure was higher in older patients, while heart rate was lower in older patients. There were more patients with altered mental status in older patients. Table 2 shows the individual disease severity scores upon ED arrival. On average, older patients accumulate more points than younger patients. This trend persists not only in scores that include age and predisposition factors, i.e. PIRO and MEDS, but also in scores that consists only of acute physiology variables, i.e. qSOFA, MEWS and NEWS. Our primary outcome in-hospital mortality was significantly higher in older patients compared to younger patients, 9.5% (7.4-11.6) versus 4.6% (3.5-5.7), respectively, as shown in Table 3.

Results
We assessed the prognostic performance of each disease severity score by calculating the frequency of inhospital mortality for each disease severity category, as shown in Fig. 2. Figure 2a shows that in the total cohort, there is a gradual increase of in-hospital mortality with an increase of disease severity category. This association is more pronounced in younger patients, resulting in a stronger increase in mortality with increasing disease   severity compared to the total cohort. The association between disease severity and in-hospital mortality is almost absent in older patients.
In addition to the prognostic performance of the disease severity scores, we also assessed the discriminative performance of each disease severity score. Figure 3a shows the ROC curves of the total cohort, revealing a large variation among the AUCs of the five scores. In the total cohort, the MEDS performed best with an AUC of 0.80 (0.76-0.83), whereas the MEWS performed worst with an AUC of 0.63 (0.58-0.67). The AUCs of all disease severity scores were larger in younger patients, as shown in Fig. 3b. In younger patients, the MEDS also performed best with an AUC of 0.86 (0.82-0.91), whereas the MEWS performed worst with an AUC of 0.66 (0.60-0.73). Corresponding to the younger patients, MEDS performed best in older patients although the AUC of 0.64 (0.57-0.71) was significantly lower than the AUC in younger patients. In older patients the MEWS also performed worst, with an AUC of 0.56 (0.49-0.63). Table 4 shows the sensitivity, specificity, PPV and NPV using the optimal cutoff values of the ROC curves. Both the optimal qSOFA cutoff point that we found (≥1) and the cutoff value according to Seymour  The sensitivity analysis in Additional file 4 shows that exclusion of older patients with acute organ dysfunction from the older group decreases the AUCs of all disease severity scores from 0.56-0.64 to 0.43-0.59. Excluding patients with a DNR status did not affect the discriminative performance. Finally, 91 ED  NEWS, median (IQR) 6 (4-9) 6 (3-8) 8 (5-10) a = According to Dellinger [22]. Missing values are shown in parentheses for every variable IQR interquartile range, qSOFA quick sepsis-related organ dysfunction assessment score, PIRO predisposition, infection, response, and organ dysfunction score, MEDS mortality in emergency department sepsis score, MEWS modified early warning score, NEWS national early warning score patients had non-urgent triage categories but had been admitted with intravenous antibiotics. Twenty-nine (32%) of the 91 patients were older 70 years. In-hospital mortality of the older patients was 45%, higher than the 8% (P < 0.001) of the younger patients who had been admitted with a non-urgent triage category. Inclusion of the 91 ED patients with non-urgent triage categories did not affect the AUCs of Fig. 3. In Additional file 5, the sensitivity analyses to assess the impact of missing variables, type of hospital and time of inclusion showed that the AUCs of older patients were structurally lower than in younger patients.  In summary, all scores performed structurally worse in older compared to younger patients. Scores which take comorbidities into consideration (i.e. MEDS and PIRO) consistently outperform disease severity scores that are solely based on acute physiology variables (i.e. qSOFA, NEWS and MEWS).

Discussion
The main conclusion of the present study is that the most commonly used disease severity scores are less useful for risk stratification of older ED sepsis patients.
To the best of our knowledge, this is the first study directly comparing the prognostic and discriminative performance of the most commonly used disease severity scores in older and younger ED sepsis patients. In contrast to the findings of our study in ED sepsis patients, previous studies investigating the prognostic and discriminative performance of the MEWS in all older hospitalized patients concluded that the score is appropriate for risk stratification of in-hospital mortality. Cei et al. found a gradual increase of mortality in all disease severity categories, and Dundar et al. found an AUC of 0.89 [23,24]. The discrepancy with our results is probably explained by different inclusion criteria resulting in different study populations: First, in these studies, patients older than 65 years of age were included. Secondly, in these studies not only older sepsis patients were included but older patients with all diagnoses. Most importantly, Dundar et al. included also patients who were discharged home after their ED presentation leading to a larger range of disease severity directly explaining the higher AUC in this study. Churpek et al. compared the MEWS in all older and younger patients who were admitted to a ward and concluded that it was inaccurate for prediction of the risk for cardiac arrest in older patients [14]. Studies in older trauma patients have shown that physiological parameters of older patients should be interpreted differently than those of younger patients. In these studies it was hypothesized that for example a systolic blood pressure of 90 mmHg is inappropriate for tissue perfusion of older people, therefore leading to acute organ failure [25][26][27]. We hypothesize that the same may be true for older sepsis patients, explaining the worse performance of all disease severity scores in older compared to younger patients tested in the present study. We used a cut-off of 70 years, as this  Optimal cut-off points are shown in parentheses for every disease severity score. Outcomes are shown as mean with 95% confidence interval Abbreviations: qSOFA quick sepsis-related organ dysfunction assessment score, PIRO predisposition, infection, response, and organ dysfunction score, MEDS mortality in emergency department sepsis score, MEWS modified early warning score, NEWS national early warning score, AUC Area under the curve, PPV Positive predictive value, NPV Negative predictive value is the cut-off which is also used in all Dutch government instated interventions for older people and 70 years is the official threshold to define older patients in the Netherland and is also the age above which vital signs start to deviate from younger patients. Nevertheless, in the future it may be necessary to develop scores for more than two age groups, comparable to the different reference values for vital signs in the paediatric patient population. These hypotheses are confirmed by Smith et al. who showed that there are different target values of vital signs for each age group of hospitalized ED patients [28]. Our study has another important finding. Because of the lack of a golden standard in sepsis research, there is a sepsis definition conference approximately every 15 years, aiming to improve our understanding of sepsis and better define it. The PIRO score was the newly proposed classification in the 2001 sepsis definition conference, and had a better prognostic and discriminative performance than the systemic inflammatory response syndrome (SIRS) definition proposed by Roger Bone in 1991, which it replaced [5][6][7][8]. The advantage of PIRO over the unidimensional SIRS is that PIRO distinguishes the unmodifiable predisposition and infection variables from the acute physiology variables, enabling better decision making in clinical practice. Recent studies showed that qSOFA outperforms SIRS in patients with a suspected infection outside the ICU, but it is unclear whether it is better than the PIRO score [29][30][31][32]. Our study shows that qSOFA has a worse prognostic and discriminative performance than MEDS and PIRO in older as well as in younger patients who were hospitalized with a suspected infection, possibly because PIRO and MEDS take age and comorbidities into consideration, whereas the qSOFA (as well as the NEWS and MEWS) are solely based upon acute physiology parameters.
Interestingly, we found that the rather old MEDS score still has the best prognostic and discriminative performance compared to all the newly developed scores. Instead of continuously developing new disease severity scores we suggest that it is more important to test the various scores in a randomized controlled trial to assess clinical acceptance and applicability, and impact on relevant clinical outcomes. After all, disease severity scores should aid in decision making in clinical practise and improve outcome of individual patients. This would also be in line with the GRADE guidelines and methodology [33].
The findings of the present study have several implications. First, recognition and risk stratification of older ED sepsis patients cannot be guided by the currently existing disease severity scores, as is indicated by the low AUCs in older ED patients. This is reinforced by the observation that all the disease severity scores only associate with in-hospital in the highest disease severity category, whereas in this range clinical judgement is often sufficient [8]. Secondly, a new prediction model specifically for older patients should be developed, similar to the separate scoring systems used in paediatric patients. Finally, we believe it would be premature to replace the PIRO classification by qSOFA.
Although our study has its strengths such as the large unselected sample size, incorporation of both tertiary and urban care centres, and inclusion of comorbidities to assess the performance of MEDS and PIRO, there are several limitations.
First, an observational study is subject to errors of documentation and data entry although we believe the prospective screening and inclusion of all consecutive patients in our database which is still done daily in the participating hospitals, minimizes this issue. Secondly, because older patients had a higher disease severity with more acute organ failure, it is possible that the treating physician was triggered to treat older patients more aggressively than younger patients leading to lower AUCs in younger patients, as have been argued previously [8]. However, both groups received similar treatment in terms of time to antibiotics, fluid resuscitation and supplemental oxygen. More importantly, the sensitivity analyses excluding the sicker patients from the older group showed comparable AUCs with the unselected older group, strongly suggesting that the lower discriminative performance in older patients is explained by age and not by the observation that older presented to the ED more ill. Secondly, our database included altered mental status as a binominal variable which could have led to difficulties calculating the MEWS score, which follows the AVPU-system. In a sample of 200 patients with altered mental status we determined that the grand majority had a Glasgow Coma Scale equal or higher than 14, thus the assumption to assign one point for altered mental status in the MEWS score seems realistic. In our study, urine production was not registered in our database. We therefore had to approach this variable by categorizing the serum creatinine levels in correspondence with the Dellinger definitions of renal organ failure (64-104 = 0, 105-141 = 1, 142-177 = 2, ≥178 = 3) [22]. We assumed that this sufficiently approached the urine production values. Nevertheless small deviations from the original MEWS may be present.