Study design and setting
This was an observational study using an existing database in which ED patients have been, and continue to be, prospectively enrolled as part of an ongoing quality improvement program in 3 Dutch EDs: Leiden University Medical Centre (LUMC; tertiary care centre with ~30,000 visits/year), Rijnstate Hospital (RH; urban care centre with ~30,000 visits/year), and the Albert Schweitzer Hospital (ASZ; urban care centre with ~25,000 visits/year). Patients were included from April 1st 2011 to February 1st 2016 in the LUMC, from March 1st 2012 to November 1st 2012 in the RH, and from September 1st 2015 to November 1st 2015 in the ASZ. After inclusion in the database, patients were stratified by age into an older (≥70 years) and a younger (<70 years) group, as this cut-off is also used in all interventions for older people instituted by the Dutch government.
The study was approved by the medical ethics committee of the LUMC, which waived the need for individual informed consent as this was a purely observational study embedded in routine clinical care.
Selection of participants
All consecutive ED patients aged 17 years or older with a suspected infection and Manchester triage category yellow, orange or red who received intravenous antibiotics in the ED and were subsequently admitted to the hospital were included in the database. Triage categories blue and green were excluded from the quality improvement program because most of these patients were expected to be at very low risk of mortality or admission. Patients who appeared to have no infection according to the final hospital discharge letter were excluded.
In all participating hospitals, the same “Surviving Sepsis Campaign-based” quality improvement program was used, in which a standard screening procedure was followed to optimize sepsis recognition, early ED resuscitation and disposition to an appropriate level of care. The quality improvement program is illustrated in Additional file 1 and has been described in detail elsewhere [18, 19].
Demographic and co-morbidity data, relevant time points and dates, laboratory variables, triage categories and vital signs, time to antibiotics, type of antibiotics, amount and type of fluids (L), administered oxygen (L/min), disposition and outcome variables were prospectively registered in the digital hospital information system Chipsoft Ezis (Chipsoft, Amsterdam, Netherlands) of each participating hospital. A medical student or registrar in emergency medicine subsequently transferred data from the electronic hospital information system to a web-based data collection file (PromiseBasic, Leiden, Netherlands, https://www.msbi.nl/promise/promise.aspx), which automatically calculated the Predisposition, Infection, Response and Organ failure (PIRO) score and the Mortality in ED Sepsis (MEDS) score. After the inclusion period, data of the three participating hospitals were transferred to one SPSS file (SPSS version 23.0, IBM, New York, USA).
Time to antibiotics was calculated by subtracting the registration time at the ED desk from the time of antibiotic administration as registered by the nurse. Time zero was defined as the time of ED registration. The appropriateness of the initial dose of antibiotics administered in the ED was assessed in retrospect and is summarized in Additional file 2.
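The subtraction described above can be illustrated with a minimal Python sketch. The function name and the example timestamps are hypothetical and are not taken from the study database.

```python
from datetime import datetime


def time_to_antibiotics_minutes(registration: datetime, administration: datetime) -> float:
    """Minutes elapsed from ED registration (time zero) to antibiotic administration."""
    delta = administration - registration
    return delta.total_seconds() / 60.0


# Hypothetical example: registered at 14:05, antibiotics given at 15:20.
registered = datetime(2015, 9, 1, 14, 5)
administered = datetime(2015, 9, 1, 15, 20)
example_minutes = time_to_antibiotics_minutes(registered, administered)
```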
By means of an automated query in the digital hospital information system all ED patients who had been admitted with intravenous antibiotics were selected. Of these ED patients, we retrospectively investigated how many had been triaged as non-urgent but had been admitted with intravenous antibiotics. In this way, we could quantify the number of patients who had been missed by the screening procedure of the quality improvement program (which excluded non-urgent triage categories) because of atypical symptom presentation, which was expected to occur more often in older patients.
Disease severity scores
MEDS and PIRO scores are both a combination of age, comorbidities (predisposition factors) and acute physiology variables [7, 9]. The quick Sequential Organ Failure Assessment (qSOFA) score is a newly developed score that screens for low blood pressure (SBP ≤ 100 mmHg), high respiratory rate (≥22 per min), and altered mental status (Glasgow coma scale < 15) [12, 13]. The Modified Early Warning score (MEWS) incorporates temperature and urine production into the more common variables heart rate, systolic blood pressure, respiratory rate and altered mental status. The National Early Warning Score (NEWS) does not use urine production, but instead incorporates arterial oxygen saturation and the use of supplemental oxygen. The five scores were originally developed for slightly different purposes: the MEDS and PIRO scores to predict in-hospital mortality in ED patients with a suspected infection, and the qSOFA, MEWS and NEWS to predict sepsis or clinical deterioration. However, this does not complicate the comparison of the prognostic and discriminative performance of these scores between older and younger patients.
All disease severity scores were calculated retrospectively, so the treating physicians were not aware of the scores at the time of ED presentation. Missing values were counted as normal, as in the APACHE score. A patient was considered to have a “Do not resuscitate” (DNR) status if existing medical files already stated that the patient had a DNR code, or if a DNR status was decided upon at the time of ED presentation or during hospital admission.
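As an illustration of how a score is derived from its components, the following Python sketch computes the qSOFA score from the three criteria given above and counts a missing value as normal. The function and parameter names are hypothetical; the scores in this study were calculated in the data collection file and in SPSS, not with this code.

```python
from typing import Optional


def qsofa(sbp_mmhg: Optional[float],
          resp_rate_per_min: Optional[float],
          gcs: Optional[int]) -> int:
    """Return the qSOFA score (0 to 3).

    A value of None represents a missing measurement and is counted
    as normal, i.e. it contributes no point.
    """
    score = 0
    if sbp_mmhg is not None and sbp_mmhg <= 100:            # low blood pressure
        score += 1
    if resp_rate_per_min is not None and resp_rate_per_min >= 22:  # high respiratory rate
        score += 1
    if gcs is not None and gcs < 15:                        # altered mental status
        score += 1
    return score
```

For example, a patient with a systolic blood pressure of 95 mmHg, a respiratory rate of 24 per min and an unrecorded Glasgow coma scale would receive 2 points under these assumptions.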
The primary outcome measure was in-hospital mortality.
Secondary outcome measures were ICU or MCU admission, an unanticipated transfer to an ICU or MCU within 48 h after being admitted to a ward, and the composite outcome of in-hospital mortality, ICU or MCU admission, or unanticipated transfer to an ICU or MCU within 48 h.
Data are presented as percentages, as means with standard deviation for normally distributed variables, or as medians with interquartile range for non-normally distributed variables. Differences between groups were assessed with independent t-tests for normally distributed variables and with Mann-Whitney U tests for non-normally distributed variables. The chi-square test was used for categorical variables.
Each disease severity score was divided into 4 categories to allow comparison among the 5 individual scores: low (PIRO 0–6, qSOFA 0, MEDS 0–5, MEWS 0–3 and NEWS 0–3), moderate (PIRO 7–12, qSOFA 1, MEDS 6–9, MEWS 4–6 and NEWS 4–7), high (PIRO 13–18, qSOFA 2, MEDS 10–15, MEWS 7–9 and NEWS 8–11) and severe (PIRO ≥19, qSOFA 3, MEDS ≥16, MEWS ≥10 and NEWS ≥12). These values were chosen taking into account the individual score guidelines to best represent comparable disease severity categories.
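The category boundaries listed above can be expressed as a small lookup, sketched here in Python. The dictionary layout and function name are illustrative choices and not part of the original analysis.

```python
from bisect import bisect_right

# Lower bounds of the moderate, high and severe categories for each score,
# taken from the ranges stated in the text.
CATEGORY_LOWER_BOUNDS = {
    "PIRO": (7, 13, 19),
    "qSOFA": (1, 2, 3),
    "MEDS": (6, 10, 16),
    "MEWS": (4, 7, 10),
    "NEWS": (4, 8, 12),
}
CATEGORIES = ("low", "moderate", "high", "severe")


def severity_category(score_name: str, value: int) -> str:
    """Map a raw score value to one of the four disease severity categories."""
    bounds = CATEGORY_LOWER_BOUNDS[score_name]
    # bisect_right counts how many lower bounds are <= value.
    return CATEGORIES[bisect_right(bounds, value)]
```

With this lookup, a NEWS of 7 and a MEDS of 9 both fall in the moderate category, whereas a PIRO of 19 is classified as severe.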
The prognostic performance of all disease severity scores in both age groups was assessed by associating the aforementioned disease severity categories with in-hospital mortality.
We assessed the discriminative performance of each disease severity score in younger and older patients using a receiver operating characteristic (ROC) curve with area under the curve (AUC) analysis and in-hospital mortality as outcome. We calculated the sensitivities, specificities, negative predictive values (NPV), and positive predictive values (PPV) using the optimal cut-off point of each ROC curve. This cut-off point was determined by the maximum sum of the sensitivity and specificity in the ROC curve. To appropriately evaluate the qSOFA score, the cut-off point as originally proposed by Seymour et al. (≥2) was also included in the analysis.
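The selection of the optimal cut-off point can be sketched in Python as follows. This is an illustration only: the analysis itself was done in SPSS, all function names are hypothetical, the example data are invented, and the sketch assumes that a higher score predicts death and that the sample contains both survivors and non-survivors.

```python
from typing import Dict, Optional, Sequence


def metrics_at_cutoff(scores: Sequence[float], died: Sequence[bool],
                      cutoff: float) -> Dict[str, float]:
    """Sensitivity, specificity, PPV and NPV when 'score >= cutoff' predicts death."""
    pairs = list(zip(scores, died))
    tp = sum(1 for s, d in pairs if s >= cutoff and d)
    fp = sum(1 for s, d in pairs if s >= cutoff and not d)
    fn = sum(1 for s, d in pairs if s < cutoff and d)
    tn = sum(1 for s, d in pairs if s < cutoff and not d)

    def ratio(num: int, den: int) -> float:
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


def optimal_cutoff(scores: Sequence[float], died: Sequence[bool]) -> Optional[float]:
    """Cut-off with the maximum sum of sensitivity and specificity.

    On ties, the lowest cut-off is kept (an assumption of this sketch).
    """
    best_cutoff, best_sum = None, -1.0
    for cutoff in sorted(set(scores)):
        m = metrics_at_cutoff(scores, died, cutoff)
        total = m["sensitivity"] + m["specificity"]
        if total > best_sum:
            best_cutoff, best_sum = cutoff, total
    return best_cutoff


def auc(scores: Sequence[float], died: Sequence[bool]) -> float:
    """AUC as the fraction of (non-survivor, survivor) pairs in which the
    non-survivor has the higher score; ties count as one half."""
    dead = [s for s, d in zip(scores, died) if d]
    alive = [s for s, d in zip(scores, died) if not d]
    wins = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in dead for y in alive)
    return wins / (len(dead) * len(alive))


# Invented example: qSOFA-like scores for ten patients.
example_scores = [0, 1, 1, 2, 2, 3, 0, 1, 2, 3]
example_died = [False, False, False, True, False, True, False, False, True, True]
example_cutoff = optimal_cutoff(example_scores, example_died)
```

In the invented example, the cut-off of ≥2 gives the largest sum of sensitivity and specificity; `metrics_at_cutoff` can then be called with that cut-off, or with the fixed qSOFA cut-off of 2, to obtain the four performance measures.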
The AUC, sensitivity, specificity, PPV and NPV were reported as mean (95% confidence interval (CI)). We considered AUCs to be poor at 0.6 to 0.7, adequate at 0.7 to 0.8, good at 0.8 to 0.9, and excellent at 0.9 or higher. Differences in AUC were considered to be significant if the mean AUC of older patients was not included in the 95% CI of the AUC of younger patients.
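The interpretation of the AUC and the rule used to compare the age groups can be written down as a short Python sketch. The function names are hypothetical. Because the text does not specify how a value exactly on a boundary (for example 0.7) is classified, the sketch assumes that each lower bound is inclusive, and it labels AUCs below 0.6, which the text does not classify, as "unclassified".

```python
from typing import Tuple


def auc_label(auc_value: float) -> str:
    """Qualitative label for an AUC, assuming inclusive lower bounds."""
    if auc_value >= 0.9:
        return "excellent"
    if auc_value >= 0.8:
        return "good"
    if auc_value >= 0.7:
        return "adequate"
    if auc_value >= 0.6:
        return "poor"
    return "unclassified"  # below the ranges given in the text


def auc_differs(auc_older: float, ci_younger: Tuple[float, float]) -> bool:
    """True if the AUC of older patients lies outside the 95% CI of younger patients."""
    lower, upper = ci_younger
    return not (lower <= auc_older <= upper)
```

For instance, a hypothetical AUC of 0.68 in older patients compared with a 95% CI of 0.72 to 0.81 in younger patients would be considered a significant difference under this rule.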
All data were analyzed using SPSS software (SPSS 23.0, IBM, New York, USA).
Differences in AUCs between older and younger patients could be caused by age per se or by differences in disease severity, because we expected disease severity to be worse in older patients than in younger patients. To investigate whether age or disease severity was responsible for the AUCs in older patients, we performed two sensitivity analyses. First, we excluded older patients with acute onset organ failure and compared the AUCs of the five most common disease severity scores in this selection with the AUCs of all older patients. Second, we excluded older patients with a DNR status and compared the AUCs in this selection with the AUCs of all older patients, because we have previously shown that a DNR status is another predictor of mortality and consequently a sign of higher disease severity.
In a third sensitivity analysis, we assessed the impact on the AUCs of inclusion of the ED patients with non-urgent triage categories.
Finally, we performed sensitivity analyses to assess the impact of missing variables (with multiple imputation), type of hospital (urban or academic) and time of inclusion (first or second half of inclusion period) on the AUCs of older and younger ED patients.