This study demonstrated that the AI algorithm accurately predicted the need for critical care in a prehospital EMS situation.
Predicting the need for critical care is important for selecting the destination ED and for providing appropriate management during transfer [4, 5]. In addition, tools that accurately predict prognosis and treatment needs are important for communication between prehospital EMS technicians and hospital medical staff [6, 7]. However, most prehospital triage tools were developed for trauma patients only, and there is no generalized tool that covers all EMS situations [2, 25]. Although conventional ED triage tools have been applied to predict the need for critical care in prehospital situations [21, 26, 27], they showed unsatisfactory performance in predicting prognosis.
The important finding of this study is that the predictive performance of the AI model based on deep learning is superior to that of the conventional triage tools and scoring systems. In addition, three ED medical staff members assigned triage levels using EMS run sheets. Interestingly, the AI algorithm was more accurate than the decisions of these expert medical staff. The AI algorithm performs automatic calculations based on basic information and requires neither expert judgment nor medical experience.
Deep learning can achieve high performance without prior knowledge being used to train the model, which suggests that it automatically learns the relationships among input variables. In our previous study, we developed a deep-learning-based AI algorithm for predicting the need for critical care of patients in an ED [11]. In that study, we found that conventional statistical methods such as logistic regression may have difficulty in determining the relationships among input variables [10, 28, 29]. As a large number of input variables were used, the dimensionality of the input increased. With such methods, this means that humans must perform feature extraction and make an effort to determine the relationships between input variables.
In contrast, deep learning includes feature learning, which allows the model to automatically learn the relationships and characteristics between input variables required to perform a task [30]. As shown in our previous studies, deep learning could be used to understand the connection between features and outperformed conventional and other machine learning methods [9, 11, 31]. Importantly, in deep learning the features are not designed by humans. Because this process proceeds automatically, it is easier and more effective to identify intricate structures in high-dimensional data without information loss, and it enables end-to-end learning, which requires little engineering by humans. Finally, it can be easily and quickly applied to other tasks [30].
In addition, a well-known principle in deep learning is the importance of the amount of data. The accumulation of large volumes of data over decades has driven improvements in deep learning performance. Likewise, the performance of a deep-learning-based model depends greatly on the amount of data. In this study, we used the NEDIS data, which comprise millions of records. We believe that this amount of accumulated data is better suited to deep learning than to other approaches. Moreover, we used only the initial vital signs of patients, assuming that it would be difficult to measure vital signs several times during transport. We therefore considered a simple DNN model more suitable than an LSTM, because an LSTM relies on sequential information.
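The difference in input shape can be illustrated with a minimal sketch of a feed-forward pass. This is not the model used in this study: the layer sizes, the random weights, and the six normalized input values are hypothetical and serve only to show that a simple DNN consumes one fixed-length vector of initial measurements, with no time axis of the kind an LSTM would require.

```python
import math
import random


def dense(inputs, weights, biases):
    """One fully connected layer; weights holds one row per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    return [max(0.0, v) for v in values]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_risk(features, params):
    """Forward pass of a small feed-forward network on one feature vector."""
    hidden = relu(dense(features, params["w1"], params["b1"]))
    (logit,) = dense(hidden, params["w2"], params["b2"])
    return sigmoid(logit)


def random_params(n_in, n_hidden, seed=0):
    """Untrained, randomly initialized weights (illustration only)."""
    rng = random.Random(seed)
    return {
        "w1": [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "w2": [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]],
        "b2": [0.0],
    }


# Hypothetical, already-normalized initial measurements for one patient.
# A single vector is enough: no repeated measurements over time are needed.
patient = [0.62, 1.0, -0.4, 0.9, 0.3, -1.1]
params = random_params(n_in=len(patient), n_hidden=8)
risk = predict_risk(patient, params)
```

Because the weights are untrained, the value of `risk` carries no clinical meaning; the sketch only shows the data flow.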
Preventing overfitting to a single hospital is an important issue, and it is crucial to verify whether the model was overfitted to a specific environment. Thus, the acquisition of external validation data is important. Wolpert explains this in the “No Free Lunch” theorem: an algorithm optimized for one situation cannot be expected to produce good results in other situations [32]. In this study, the development data (ED) and test data (EMS) were mutually exclusive. More specifically, the model was evaluated on an external dataset, which may help avoid overfitting to one environment.
A major limitation of conventional triage tools is their low accuracy at the middle level [18, 33, 34]. As shown in Fig. 3, the level 3 triage population was a mix of critical care and non-critical care patients. If the patients at level 3 can be distinguished, we expect the accuracy of outcome prediction to increase. Therefore, we made an effort to apply the algorithm to patients at level 3.
In this study, we developed an ensemble algorithm to evaluate level 3 patients and confirmed higher performance with the AI+ESI algorithm. It is interesting to note that combining expert opinion (ESI level) with the AI algorithm yields more accurate predictions. These results may offer researchers in other medical fields an opportunity to address similar problems. For example, in previous urology studies, AI algorithms have been applied to predict prostate biopsy results and the recurrence-free probability of bladder cancer [35]. In addition, the in-hospital and long-term mortality of patients with cardiovascular diseases has been predicted using AI algorithms in several previous studies.
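One possible way to combine an expert triage level with a model output is sketched below. This is an illustrative assumption, not the combination rule used in this study, which is not restated here: the function name `combined_triage`, the 0.5 threshold, and the decision to trust the ESI level outside level 3 are all hypothetical.

```python
def combined_triage(esi_level, ai_risk, threshold=0.5):
    """Return True when the need for critical care is predicted.

    Hypothetical rule: the ESI level decides clear-cut cases, and the
    AI risk score is consulted only for the ambiguous middle level 3.
    """
    if esi_level not in (1, 2, 3, 4, 5):
        raise ValueError("ESI level must be an integer from 1 to 5")
    if not 0.0 <= ai_risk <= 1.0:
        raise ValueError("ai_risk must be a probability in [0, 1]")
    if esi_level <= 2:
        return True   # high-acuity levels: assumed to need critical care
    if esi_level >= 4:
        return False  # low-acuity levels: assumed not to need critical care
    return ai_risk >= threshold  # level 3: defer to the model output
```

Under this assumed rule, the model changes the prediction only for level 3 patients, which is where conventional tools separate the two groups least well.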
Our study has several limitations. First, deep learning is considered a black box. Although we can fit the deep-learning-based AI algorithm, it is difficult to fully understand how the model predicts the need for critical care. In addition, in contrast to traditional methods such as XGBoost or CatBoost, which can present uncertainty measures (e.g., 95% confidence intervals), the uncertainty of deep learning is more difficult to quantify. In this paper, we attempted to quantify the uncertainty as far as possible through bootstrapping [23]. Recent attempts have been made to explain deep learning and to measure its uncertainty, and these will be our next area of study [36, 37]. Second, as this study was conducted in only two hospitals in Korea, the model needs to be validated for EMS patients in larger populations and in other countries.
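As an illustration of how bootstrapping can attach an interval to a performance metric, the following minimal sketch computes a percentile bootstrap confidence interval for the area under the ROC curve. It is a generic example under stated assumptions, not the exact procedure of this study: the choice of metric, the number of resamples, the function names, and the toy data are all hypothetical.

```python
import random


def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auroc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Point estimate and percentile bootstrap interval for the AUROC."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        # Resample patients with replacement, keeping label and score paired.
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # AUROC is undefined when a resample has one class only
        stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return auroc(labels, scores), lower, upper


# Toy data (hypothetical): 1 = needed critical care, 0 = did not.
labels = [0, 0, 1, 1, 0, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.5, 0.6]
point, lower, upper = bootstrap_auroc_ci(labels, scores)
```

The pairwise AUROC is quadratic in the number of patients, so a rank-based implementation would be preferable for datasets of realistic size.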
We developed a high-performance algorithm by combining an AI algorithm and a conventional triage tool. Despite several limitations, deep learning has achieved high predictive performance in several medical domains. Further, a deep learning algorithm can be developed more easily than a conventional machine learning method. Based on our methodologies and results, other researchers can develop algorithms for their own groups of patients and situations. Additionally, medical researchers could investigate the applicability and future development of deep learning in various domains of medicine. For example, using this algorithm, patients' need for critical care could be predicted during EMS situations, and the destination hospital could be optimized by considering the predicted critical care needs and the hospital’s situation (e.g., ED overcrowding, ICU capacity, and critical care availability). Moreover, the predictor variables in this algorithm were simple and could be obtained from a wearable device and from information provided by a patient or their family. Because of this, patients with severe underlying diseases could be monitored daily at home for their need for critical care and referred to hospital earlier if they deteriorate.