Artificial intelligence algorithm to predict the need for critical care in prehospital emergency medical services

Background In emergency medical services (EMSs), accurately predicting the severity of a patient’s medical condition is important for the early identification of those who are vulnerable and at high risk. In this study, we developed and validated an artificial intelligence (AI) algorithm based on deep learning to predict the need for critical care during EMS. Methods We conducted a retrospective observational cohort study. The algorithm was established using development data from the Korean national emergency department information system, which were collected during visits in real time from 151 emergency departments (EDs). We validated the algorithm using EMS run sheets from two EDs. The study subjects comprised adult patients who visited EDs. The endpoint was critical care, and we used age, sex, chief complaint, symptom onset to arrival time, trauma, and initial vital signs as the predictor variables. Results The number of patients in the development data was 8,981,181, and the validation data comprised 2604 EMS run sheets from two hospitals. The area under the receiver operating characteristic curve of the algorithm for predicting the need for critical care was 0.867 (95% confidence interval, [0.864–0.871]). This result outperformed the Emergency Severity Index (0.839 [0.831–0.846]), Korean Triage and Acuity System (0.824 [0.815–0.832]), National Early Warning Score (0.741 [0.734–0.748]), and Modified Early Warning Score (0.696 [0.691–0.699]). Conclusions The AI algorithm accurately predicted the need for critical care of patients using information available during EMS and outperformed the conventional triage tools and early warning scores.


Introduction
An important objective of emergency medical services (EMSs) is to provide appropriate prehospital management and transfer to the relevant emergency department (ED) based on a patient's status [1]. Several prognosis prediction tools have been developed for EMS but are limited to specific situations, such as trauma [2]. Although some efforts have been made to apply existing ED triage tools and early warning scores to EMSs, these tools have so far performed unsatisfactorily [3].
In EMS, accurately predicting the need for critical care is important for the early identification of vulnerable and high-risk patients, and for deciding the most appropriate management during transfer [4]. If the patient is expected to require critical care, the EMS technician must bypass the nearest low-level ED and proceed to a high-level ED [5]. Accurate tools for predicting prognosis are also important for communication between the prehospital EMS technician and hospital medical staff, both to provide online medical direction and to prepare in-hospital management [6,7].
The goal of this study was to develop and validate an artificial intelligence (AI) algorithm based on deep learning that accurately predicts the need for critical care of patients in EMSs. Deep learning can overcome some limitations of conventional statistical methods and has recently achieved state-of-the-art performance in several domains, including medical imaging and outcome prediction [8-10]. To the best of our knowledge, this study is the first to predict severity in EMS using an AI algorithm.

Study design and setting
This was a multicenter, retrospective, unblinded cohort study in which the development data and the external validation data were kept entirely separate. To establish the AI algorithm, we used the Korean national emergency department information system (NEDIS), which collects all patient visits in real time from 151 EDs in Korea. To externally validate our model, EMS run sheets from patients who visited two EDs were used. Specifically, the EMS run sheets contain the information recorded when patients were contacted by an EMS. The run sheets were saved as electronic medical records. The sample size of the validation dataset was determined using an accurate algorithm in a previous study [11].
The data comprised age, sex, chief complaint, time from symptom onset to visit (or EMS contact), trauma, initial vital signs (systolic blood pressure, diastolic blood pressure, heart rate, respiratory rate, and body temperature), and mental status; these data were used as the predictor variables. The endpoint of this study was critical care (admission to intensive care unit). To stabilize training, the input variables were normalized using z-scores.
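The z-score normalization mentioned above can be sketched as follows. This is a minimal illustration, not the study's code: the function names and the heart-rate values are hypothetical, and we assume that the mean and standard deviation are estimated on the development data and then reused unchanged on the validation data, which the text does not state explicitly.

```python
import statistics


def fit_zscore(values):
    """Estimate the mean and (population) standard deviation of one variable."""
    return statistics.fmean(values), statistics.pstdev(values)


def apply_zscore(values, mean, sd):
    """Rescale values to zero mean and unit variance using fitted parameters."""
    if sd == 0:
        # A constant variable carries no information; map it to zero.
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]


# Hypothetical heart rates (beats per minute) standing in for development data.
dev_heart_rates = [60, 72, 88, 95, 110]
hr_mean, hr_sd = fit_zscore(dev_heart_rates)
dev_normalized = apply_zscore(dev_heart_rates, hr_mean, hr_sd)

# Assumption: validation data reuse the development-set parameters.
val_normalized = apply_zscore([130, 55], hr_mean, hr_sd)
```

Reusing the development-set parameters keeps the validation inputs on the same scale the network saw during training.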
The institutional review boards of Sejong General Hospital (2019-0212) and Mediplex Sejong Hospital (2019-049) approved this study protocol and waived the need for informed consent because of the impracticality and minimal harm involved.

Selection of participants
The study participants were adult patients (aged ≥18 years) who visited EDs. From the development data (NEDIS), we selected adult patients who visited EDs between January 2014 and December 2016. Moreover, we selected patients who visited two EDs using EMSs between September 2018 and February 2019 as the test data. We excluded subjects who were declared dead on arrival and those for whom data were missing, as shown in Fig. 1.

Development of AI algorithm based on deep learning
To establish our algorithm, development data (NEDIS) were utilized. To classify the presence of critical care needs, we used a feedforward network (5 hidden layers, 89 nodes, and batch normalization [12,13]) with a softmax classifier as the output layer. We applied a dropout rate of 0.5 at each layer for regularization and used a rectified linear unit as the activation function. The Adam optimizer was used for optimization, and the cross-entropy loss function was minimized under supervised learning. In addition, we used TensorFlow (the Google Brain Team, Mountain View, United States) as the backend [14]. The calibration plot and Brier score are described in a supplemental figure.
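To make the architecture concrete, the forward pass of such a network and its cross-entropy loss can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the TensorFlow implementation used in the study: we assume 89 nodes in each of the 5 hidden layers (the text does not say whether 89 is per layer or in total), an input of 11 numeric predictors, and randomly initialized weights. Dropout and batch normalization are omitted because this sketch covers inference only, and all function names are hypothetical.

```python
import math
import random


def relu(vector):
    """Rectified linear unit applied element-wise."""
    return [max(0.0, x) for x in vector]


def dense(vector, weights, bias):
    """Fully connected layer: one row of weights per output node."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def softmax(logits):
    """Convert logits to class probabilities (shifted for numerical stability)."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    """Negative log-likelihood of the true class."""
    return -math.log(max(probs[label], 1e-12))


def build_network(n_inputs, hidden_sizes, n_classes, seed=0):
    """Create randomly initialized (weights, bias) pairs for every layer."""
    rng = random.Random(seed)
    sizes = [n_inputs] + list(hidden_sizes) + [n_classes]
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        scale = math.sqrt(2.0 / n_in)  # He-style scaling (assumption)
        weights = [[rng.gauss(0.0, scale) for _ in range(n_in)]
                   for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def predict_proba(network, features):
    """Hidden layers use ReLU; the output layer uses softmax."""
    vector = list(features)
    for weights, bias in network[:-1]:
        vector = relu(dense(vector, weights, bias))
    weights, bias = network[-1]
    return softmax(dense(vector, weights, bias))


# Assumed shape: 11 z-scored predictors, 5 hidden layers of 89 nodes, 2 classes.
network = build_network(n_inputs=11, hidden_sizes=[89] * 5, n_classes=2)
example_patient = [1.2] + [0.0] * 10  # hypothetical normalized inputs
probs = predict_proba(network, example_patient)  # [P(non-critical), P(critical)]
loss = cross_entropy(probs, label=1)
```

In training, an optimizer such as Adam would adjust the weights to reduce this loss over the development data; with the untrained random weights used here the probabilities carry no clinical meaning.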

Performance test of AI algorithm and comparison with conventional methods
We compared the performance of the algorithm in terms of predicting critical care with those of the Emergency Severity Index (ESI), Korean Triage and Acuity System (KTAS), Modified Early Warning Score (MEWS), and National Early Warning Score (NEWS). The ESI is a globally used five-level ED triage algorithm, initially developed in 1999 [15,16]. It is based on the severity of patients' healthcare problems and the number of resources they are anticipated to require. KTAS was developed in 2012 based on the Canadian Triage and Acuity Scale and has been used nationwide as a triage tool in Korea since 2016 [11,17,18]. KTAS is a five-level ED triage algorithm that considers symptoms, pain, and physiological values. Three medical staff members with more than 5 years of experience in clinical practice in an ED participated in this study. They assigned the ESI and KTAS levels using information from the EMS run sheets for patients in the test data. Conflicting results were resolved by discussion.
MEWS is a widely used tool for predicting severity and deterioration, and is calculated using systolic blood pressure, heart rate, respiratory rate, body temperature, and mental status [19]. NEWS was developed in the United Kingdom. It is a popular aggregated scoring system that considers respiratory rate, oxygen saturation, temperature, systolic blood pressure, heart rate, and mental status [20]. The MEWS and NEWS scores have been well-validated and used globally. In previous studies, some efforts have been made to apply these early warning scores to EMSs [21]. We calculated the MEWS and NEWS scores based on information from the EMS run sheets. The EMS run sheets comprise data at the time of first contact of EMSs with each patient.
We validated the developed algorithm using test data that were kept entirely separate from the development data. The performance measures were the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and F1 score. The AUC is a frequently used metric that summarizes the receiver operating characteristic curve, which plots sensitivity against 1 - specificity [22]. Based on previous studies, we used levels 1-2, levels 1-2, points 3-14, and points 5-20 to predict critical care with the ESI, KTAS, MEWS, and NEWS, respectively [11,17-21]. When evaluating the continuous score predicted by the AI algorithm, we fixed the sensitivity at 0.8. Furthermore, we estimated the 95% confidence interval using bootstrapping (10,000 times resampling with replacement) [23]. We used the ROCR package in R (R Development Core Team, Vienna, Austria) for these analyses.
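The evaluation steps described above can be sketched as follows. This is a minimal Python illustration rather than the ROCR analysis used in the study. The labels and scores are made-up toy values, the function names are hypothetical, and the percentile bootstrap is an assumption, since the text does not specify which bootstrap interval was used.

```python
import math
import random


def auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def threshold_for_sensitivity(labels, scores, target=0.8):
    """Highest cut-off whose sensitivity is at least the target."""
    pos = sorted((s for y, s in zip(labels, scores) if y == 1), reverse=True)
    needed = math.ceil(target * len(pos))  # positives that must be flagged
    return pos[needed - 1]


def confusion_metrics(labels, scores, threshold):
    """Sensitivity, specificity, PPV, NPV and F1 at a given cut-off."""
    pairs = [(y, 1 if s >= threshold else 0) for y, s in zip(labels, scores)]
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp) if tp + fp else 0.0,
        "npv": tn / (tn + fn) if tn + fn else 0.0,
        "f1": 2 * tp / (2 * tp + fp + fn) if tp else 0.0,
    }


def bootstrap_auc_ci(labels, scores, n_resamples=10000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (resampling with replacement)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_resamples:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that lack one of the classes
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Toy data: 1 = critical care, 0 = no critical care; scores are model outputs.
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.35, 0.2,
          0.6, 0.4, 0.3, 0.3, 0.25, 0.2, 0.15, 0.1, 0.1, 0.05]

toy_auc = auc(labels, scores)
cutoff = threshold_for_sensitivity(labels, scores, target=0.8)
metrics = confusion_metrics(labels, scores, cutoff)
ci_low, ci_high = bootstrap_auc_ci(labels, scores, n_resamples=500)
```

Fixing the sensitivity first and then reading off the remaining measures puts the continuous AI score on the same footing as the level-based tools, which have predefined cut-offs.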

Combining the AI algorithm and conventional triage tools
With the aim of developing a high-performance algorithm, we combined the AI algorithm with conventional triage tools. This approach is known as an ensemble [24]. A major limitation of the ESI and KTAS, as reported by previous studies, is their reduced accuracy for patients at the middle level (level 3). We therefore applied the AI algorithm to patients at level 3 of the ESI and of the KTAS, and validated the performance of the two resulting ensemble models (AI+ESI and AI+KTAS). In these models, patients at levels 1 and 2 were predicted to be critical, while patients at levels 4 and 5 were predicted to be noncritical. The AI algorithm evaluated only the patients at level 3.
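The decision rule of the ensemble can be written in a few lines. The following is a minimal sketch: the function name and the cut-off value `ai_threshold` are hypothetical, since the text does not report the cut-off applied to the AI score for level 3 patients.

```python
def ensemble_predict(triage_level, ai_score, ai_threshold):
    """Return True if the patient is predicted to need critical care.

    triage_level: ESI or KTAS level assigned by the clinician (1 to 5).
    ai_score: continuous output of the AI algorithm for this patient.
    ai_threshold: assumed cut-off for the AI score (not given in the text).
    """
    if triage_level in (1, 2):
        return True   # high-acuity levels are always predicted critical
    if triage_level in (4, 5):
        return False  # low-acuity levels are always predicted noncritical
    if triage_level == 3:
        return ai_score >= ai_threshold  # only level 3 is deferred to the AI
    raise ValueError(f"triage level must be 1-5, got {triage_level!r}")


# Hypothetical patients: (triage level, AI score).
patients = [(1, 0.10), (3, 0.72), (3, 0.05), (5, 0.95)]
predictions = [ensemble_predict(level, score, ai_threshold=0.5)
               for level, score in patients]
```

The AI score is ignored outside level 3, so the ensemble changes the prediction only for the group in which the triage level alone is least informative.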

Results
In the development data, in total, 9,304,887 ED visits to 151 hospitals were included in the NEDIS. We excluded 323,696 visits because of the exclusion criteria: 44,815 were declared dead on arrival, while data were missing for 278,881 visits. No significant differences in the predictor variables were observed between the included study subjects and those excluded because of missing variables. Thus, the study subjects included 8,981,181 ED patients; 511,342 ended up in critical care (5.7%) and 125,219 (1.4%) died in hospital.
In the case of the test data, after excluding 124 patients (14 dead on arrival and 110 with missing data), validation of the AI algorithm in EMSs was performed using 2604 patients from two hospitals, of whom 319 required critical care (12.3%) and 30 died in hospital (1.2%). The baseline characteristics of the development and test data are shown in Table 1. The two datasets were mutually exclusive, and their characteristics differed significantly.
As shown in Fig. 2

Discussion
This study demonstrated that the AI algorithm accurately predicted the need for critical care in a prehospital EMS situation.
Predicting the need for critical care is important for selecting the destination ED and for providing the appropriate management during transfer [4,5]. In addition, tools for accurately predicting the prognosis and treatment are important for communication between the prehospital EMS technician and hospital medical staff [6,7]. However, most triage tools for prehospital situations were developed for trauma patients only, and there is no generalized tool that covers all EMS situations [2,25]. Although the conventional triage tools of EDs have been applied to predict the need for critical care in prehospital situations [21,26,27], they showed unsatisfactory performance in predicting prognosis.
The important finding of this study is that the predictive performance of the AI model based on deep learning is superior to those of the conventional triage tools and scoring systems. In addition, three ED medical staff members were involved in deciding the level of triage with EMS run sheets. Interestingly, the accuracy of the AI algorithm was better than the accuracy of the decision of the expert medical staff. The AI algorithm performs automatic calculations based on basic information and does not require expert judgment and medical experience.
Deep learning can achieve high performance without prior knowledge being used to train the model, which suggests that deep learning automatically learns the feature relationships among input variables. In our previous study, we developed an AI algorithm based on deep learning for predicting the critical care of patients in an ED [11]. From the previous study, we found that conventional statistical methods such as logistic regression may have difficulty in determining the relationship among input variables [10,28,29]. As a large number of input variables were utilized, the dimensionality of the input increased. With such methods, feature extraction by humans is therefore required, and effort must be made to determine the relationship between input variables. Meanwhile, deep learning includes feature learning, which allows the model to automatically learn the relationships and characteristics between input variables required to perform a task [30]. As shown in our previous studies, deep learning could be used to understand the connection between features and outperformed conventional and other machine learning methods [9,11,31].
Fig. 2 Receiver operating characteristics curve for predicting critical care. Legends: *AI: artificial intelligence; AUC: area under the receiver operating characteristics curve; CI: confidence interval; ESI: Emergency Severity Index; KTAS: Korean triage and acuity system; MEWS: modified early warning score; NEWS: national early warning score; NPV: negative predictive value; PPV: positive predictive value. †The alternative hypothesis for this p-value was that there is a difference between the artificial intelligence algorithm and the other predictive methods. ‡The alternative hypothesis for this p-value was that there is a difference between the ensemble model, combining artificial intelligence and the ESI, and the other predictive methods.
It is important to note that feature learning is not designed by humans in deep learning. As this process evolves automatically, it will be easier and more effective to identify intricate structures in high-dimensional data without information loss, and will result in end-to-end learning, which requires little engineering by humans. Finally, it can be easily and quickly applied to other tasks [30].
In addition, one of the well-known concepts in the use of deep learning is the importance of the amount of data. The accumulation of large amounts of data over decades has driven improvements in the performance of deep learning, and the performance of a deep learning model depends greatly on the amount of data available. In this study, we used the NEDIS data, which comprise millions of records. We believe that this amount of accumulated data is more suitable for deep learning than for other approaches. Moreover, we only used the initial vital signs of patients (assuming that it would be difficult to measure vital signs several times during transport). Because an LSTM is designed for sequential information, we considered a simple DNN model more suitable than an LSTM.
Preventing overfitting to a single hospital is an important issue, and it is crucial to verify whether the model was overfitted to a specific environment. Thus, the acquisition of external validation data is important. Wolpert explains this in the "No Free Lunch" theorem: an algorithm optimized for one situation cannot be expected to produce good results in other situations [32]. In this study, the development (ED) and test data (EMS) were kept entirely separate. Because the model was evaluated on an external dataset, the reported performance is less likely to reflect overfitting to one environment.
A major limitation of conventional triage tools is their low accuracy at the middle level [18,33,34]. As shown in Fig. 3, patients triaged at level 3 were a mixture of critical care and non-critical care patients. If the patients at level 3 can be distinguished, we expect that the accuracy of predicting the outcome will increase. Therefore, we applied the algorithm to patients at level 3.
In this study, we developed the ensemble algorithm to evaluate level 3 patients and confirmed a higher performance with the AI+ESI algorithm. It is interesting to note that the combination of the expert opinion (ESI level) and the AI algorithm exhibits a more accurate performance. These results may offer researchers in other medical fields an approach to similar problems. For example, in previous studies of urology, AI algorithms have been applied for the prediction of prostate biopsy results and the recurrence-free probability of bladder cancer [35]. In addition, the in-hospital and long-term mortalities of patients with cardiovascular diseases have been predicted using AI algorithms in several previous studies.
Our study has several limitations. First, deep learning is considered a black box. Although we can fit the AI algorithm based on deep learning, it is difficult to fully understand how the model predicts critical care. In addition, unlike traditional methods such as XGBoost or CatBoost, which can present uncertainty measures (e.g., 95% confidence interval), deep learning has greater difficulty in quantifying uncertainty. In this paper, we attempted to measure the uncertainty quantitatively as far as possible through bootstrapping [23]. Recent attempts have been made to explain deep learning and to measure its uncertainty, which will be our next area of study [36,37]. Second, as this study was conducted in only two hospitals in Korea, it is necessary to validate the model for patients in EMSs in larger populations or other countries.
We developed a high-performance algorithm by combining an AI algorithm and a conventional triage tool. Despite several limitations, deep learning has achieved a high predictive performance in several medical domains. Further, a deep learning algorithm can be developed more easily than a conventional machine learning method, because it requires little manual feature engineering. Based on our methodologies and results, other researchers can develop algorithms for their own groups of patients and situations. Additionally, medical researchers could investigate the applicability and future development of deep learning in various domains of medicine. For example, using this algorithm, the need for the critical care of patients could be predicted during EMS situations, and the destination hospital could be optimized by considering the predicted critical care needs and the hospital's situation (e.g., ED overcrowding, ICU capacity, and critical care availability). Moreover, the predictor variables in this algorithm were simple and could be obtained via a wearable device and from information provided by a patient or their family. Because of this, patients with severe underlying diseases could be monitored daily at home for their need for critical care, and could be referred to hospital earlier if they exhibit deterioration.

Conclusion
In this study, triage using an AI algorithm accurately predicted the need for critical care of patients using information available during EMS situations, and outperformed the conventional triage tools and early warning scores. The results show the potential of AI for EMSs as a useful and fast tool to identify vulnerable patients and support precise decision-making in daily practice.