Use of Machine Learning to Rapidly Predict Positivity to Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-COV-2) Using Basic Clinical Data

Objective: Reverse Transcription-Polymerase Chain Reaction (RT-PCR) for Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-COV-2) diagnosis currently takes a long time. A quicker and more efficient diagnostic tool in emergency departments could improve management during this global crisis. Our main goal was to assess the accuracy of artificial intelligence in forecasting the results of RT-PCR for SARS-COV-2, using basic information available in all emergency departments.
Methods: This is a retrospective study carried out between February 22 and March 16, 2020 in one of the main hospitals in Milan, Italy. We screened for eligibility all patients admitted with influenza-like symptoms and tested for SARS-COV-2. Patients under 12 years old and patients without a leukocyte formula performed in the emergency department (ED) were excluded. The input data for the artificial intelligence models consisted of a combination of clinical, radiological and routine laboratory data collected upon hospital admission.
Results: Among 199 patients included in the study (median [interquartile range] age 65 [46-78] years; 127 [63.8%] men), 124 [62.3%] tested positive for SARS-COV-2. The best Machine Learning System reached an accuracy of 91.4% with 94.1% sensitivity and 88.7% specificity. These results are the average of two testing experiments with the training-testing sequences A-B and B-A; subset A contained one hundred cases and subset B ninety-nine cases.
Conclusion: Our study suggests that properly trained artificial intelligence algorithms may be able to predict the results of RT-PCR for SARS-COV-2 using basic clinical data. If confirmed in a larger-scale study, this approach could have important clinical and organizational implications.


Introduction
A direct consequence of these limitations is the time spent by a large number of patients awaiting results in the emergency department before a decision can be taken as to where to admit them, e.g. to wards and intensive care units dedicated to COVID-19 patients, or to "non-infective" wards of the hospital (8)(9)(10)(11)(12). This is particularly troublesome for critically ill patients requiring immediate endotracheal intubation and mechanical ventilation. Specialized medical staff need to attend to these patients in emergency departments, where the health system is already under considerable pressure.
Finding an easy and fast method of predicting positivity or negativity to SARS-COV-2 would be of great clinical value. Algorithms have already been proposed using advanced imaging, e.g. chest computed tomography (CT) (13,14). However, not all hospitals or countries can carry out a CT scan on every patient.
The main goal of this study was to determine the accuracy of artificial intelligence in predicting the results of RT-PCR for SARS-COV-2 using basic clinical, radiological and routine laboratory data available in all emergency departments.
Our hypothesis was that Artificial Neural Networks (ANNs) and other Machine Learning Systems (MLS) would accurately predict the results of RT-PCR testing for SARS-COV-2, and that these systems could be applied in clinical practice in the future. The performance of different ANNs and MLS in distinguishing between patients testing positive or negative for SARS-COV-2 was analyzed, in order to identify the variables that carry the most relevant information.

Study design and selection of participants
This retrospective, single-centre study was approved by the Institutional Review Board of our hospital (N°3733). The requirement for informed consent was waived.
All patients admitted to the emergency department of our hospital between February 22 and March 16, 2020 were screened for eligibility.
Presenting symptoms compatible with COVID-19 (fever, sore throat, cough, dyspnoea, chest pain, headache, syncope, asthenia, arthralgia, diarrhoea, nausea and vomiting) constituted the inclusion criteria. Age < 12 years and absence of a leukocyte formula (defined as the percentages of the five types of leukocytes: neutrophils, lymphocytes, eosinophils, basophils and monocytes) evaluated in the emergency department constituted the exclusion criteria.

Data collection
Clinical data were retrieved from the Patient Data Management System of our hospital. Demographic and baseline data included age, gender, presence and type of comorbidities, and current medications. Each drug was assigned to a specific category.
In addition, we collected information regarding vital signs upon admission to the emergency department, presence and type of ventilatory support, routinely performed blood tests, major electrocardiographic characteristics (presence of sinus rhythm and ST abnormalities) and chest X-ray findings (presence of any type of parenchymal involvement, presence of pleural effusion).
The final results of the RT-PCR swab for SARS-COV-2 were recorded. If the result was negative in a symptomatic patient, hospital policy required a second swab after 48 hours, and these swabs were checked to confirm negativity. The complete list of collected variables is summarized in Table 1.

Statistical analysis and sample size
All data were tested for normality of distribution (Shapiro-Wilk test) and for equality of variance.
Normally distributed data were expressed as mean ± standard deviation, while non-normally distributed data were reported as median and interquartile range. Binary data were summarized as frequencies and percentages and compared using the Chi-square test. Continuous variables were compared using Student's t-test or the Wilcoxon rank-sum test, as appropriate.
Pearson's and Spearman's correlation coefficients were used to assess the correlation between the collected variables (continuous and nominal, respectively) and the results of RT-PCR for SARS-COV-2. A P-value lower than 0.05 was considered statistically significant. Analysis was performed with SigmaPlot v.12.0 (Systat Software Inc., USA).
The study was carried out on a convenience sample of 200 patients admitted to the emergency department and tested for SARS-COV-2.
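As an illustration of the Chi-square comparison of binary variables, the stdlib-only Python sketch below computes the 2×2 statistic, with or without Yates' continuity correction, and its p-value (for 1 degree of freedom, P(X > x) = erfc(√(x/2))). The COPD counts used (5 of 124 positive vs. 10 of 75 negative patients) are an assumption, back-calculated from the percentages reported in the Results; the actual analysis was run in SigmaPlot, whose settings are not reported here.

```python
import math


def chi_square_2x2(a, b, c, d, yates=True):
    """Chi-square test of independence for the 2x2 table [[a, b], [c, d]].

    Rows are the two groups (e.g. SARS-COV-2 positive / negative); columns are
    "characteristic present" / "characteristic absent". Returns (statistic, p).
    """
    n = a + b + c + d
    observed = [a, b, c, d]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    expected = [row_totals[r] * col_totals[k] / n for r in (0, 1) for k in (0, 1)]
    stat = 0.0
    for o, e in zip(observed, expected):
        diff = abs(o - e)
        if yates:
            # Yates' continuity correction, never pushing the difference below zero.
            diff = max(diff - 0.5, 0.0)
        stat += diff ** 2 / e
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p


# Assumed counts: COPD in 5 of 124 positive and 10 of 75 negative patients.
stat, p = chi_square_2x2(5, 119, 10, 65)
print(f"chi2 = {stat:.2f}, p = {p:.3f}")
```

With the continuity correction these assumed counts give p ≈ 0.03, in line with the value reported for COPD; without it the p-value is roughly half as large. `scipy.stats.chi2_contingency` applies Yates' correction by default to 2×2 tables.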

Machine Learning Methods
In order to predict the results of the RT-PCR for SARS-COV-2 from the 74 input variables under study (Table 1), different Machine Learning algorithms available in the WEKA data mining software (15)(16)(17) and in the Semeion Research Centre repository (Massimo Buscema, Deep Supervised ANNs, Semeion SW #12, version 33.0, 1999-2019) were trained. These classification tools were applied to predict RT-PCR results using the Training and Testing validation protocol, with the following steps:
1. Subdivision of the dataset into two sub-samples, A and B, each containing 50% of the records and an equal proportion of cases and controls (in our case, SARS-COV-2 positive and SARS-COV-2 negative patients). The two sub-samples were obtained by applying the TWIST algorithm (i.e. not by random extraction), in order to create two subsamples with similar probability density for all the input variables (see below). A homogeneity check was performed to confirm the substantial equivalence of the two subsets with regard to the distribution of variables. In the first run, A was used as Training Set and B as Testing Set.
2. Application of the ANN to the Training Set. In this phase, the ANN learned to associate the input variables with those indicated as targets.
3. After the training phase, the weights matrix produced by the algorithm was saved and frozen together with all parameters used for the training.
4. The Testing Set was then presented to a virgin twin ANN (same architecture and base parameters) loaded with the weights matrix of the trained ANN, acting as the final classifier. This operation was performed for all records, and the results (right or wrong classification) were not communicated to the classifier. This allowed the generalization ability of the trained ANN to be assessed.
5. In a second run, another virgin ANN was trained on subset B, used as the training subset, and then applied to subset A, used as the testing subset.
6. Therefore, the results refer to two sequences of the training-testing protocol: A-B and B-A.
Results were expressed in terms of sensitivity (correct classification of positive patients), specificity (correct classification of negative patients) and global accuracy (arithmetic mean of sensitivity and specificity). Overall results are expressed as the average of the two experiments.
This crossover procedure allows all records to be classified blindly by a trained algorithm, thus assessing the generalization capability of the model on records never seen before.
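The A-B/B-A protocol and its metrics can be sketched in a few lines of Python. This is a minimal illustration, not the Semeion or WEKA implementation: a hypothetical nearest-centroid classifier stands in for the ANN, labels are coded 1 = SARS-COV-2 positive and 0 = negative, and the toy subsets are invented for demonstration.

```python
from statistics import mean


def fit_nearest_centroid(X, y):
    """Hypothetical stand-in for an ANN: one mean vector (centroid) per class."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        model[label] = [mean(column) for column in zip(*rows)]
    return model  # treated as frozen once returned


def predict(model, x):
    # Class whose centroid is closest in squared Euclidean distance.
    return min(model, key=lambda label: sum((a - b) ** 2 for a, b in zip(x, model[label])))


def evaluate(model, X, y):
    """Return (sensitivity, specificity, global accuracy); 1 = positive, 0 = negative."""
    preds = [predict(model, x) for x in X]
    tp = sum(1 for p, t in zip(preds, y) if t == 1 and p == 1)
    tn = sum(1 for p, t in zip(preds, y) if t == 0 and p == 0)
    sensitivity = tp / y.count(1)
    specificity = tn / y.count(0)
    return sensitivity, specificity, (sensitivity + specificity) / 2


def crossover(subset_a, subset_b):
    """Train on A / test on B, then a fresh model trains on B / tests on A; average."""
    runs = []
    for train, test in ((subset_a, subset_b), (subset_b, subset_a)):
        model = fit_nearest_centroid(*train)  # new, untrained model for each run
        runs.append(evaluate(model, *test))   # test labels are used only for scoring
    return tuple(mean(run[i] for run in runs) for i in range(3))


# Invented toy subsets: (feature vectors, labels).
A = ([[0.1, 0.2], [0.9, 1.0], [0.2, 0.1], [1.1, 0.8]], [0, 1, 0, 1])
B = ([[0.0, 0.3], [1.0, 0.9], [0.3, 0.2], [0.8, 1.2]], [0, 1, 0, 1])
print(crossover(A, B))  # (1.0, 1.0, 1.0) on this separable toy data
```

Each run fits a fresh model, so nothing from the testing subset reaches training, and global accuracy is computed as the arithmetic mean of sensitivity and specificity, as defined above.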
The machine learning algorithms developed at the University of Waikato, New Zealand, and available in the WEKA data mining software are listed in Table 2 (18)(19)(20)(21)(22)(23)(24), while two ANNs (Self Momentum Back Propagation and Sine Net) (25,26) were implemented in the "Supervised ANNs Software" developed at the Semeion Research Center in Rome, Italy (Buscema; Supervised ANNs. Semeion software #12, version 33.0). Table 3 shows the main features of the Semeion Machine Learning Systems.
However, since noisy input attributes can sometimes hide the small amount of meaningful information embedded in other attributes, a pruning procedure was used as a pre-processing step to eliminate noisy variables before the main outcome prediction. To this end, an input selection algorithm named TWIST (Training With Input Selection and Testing) was applied (34,35).

TWIST algorithm
As previously shown (36), TWIST is a complex algorithm that searches for the best distribution of the global dataset into two optimally balanced subsets containing a minimum number of input features useful for optimal pattern recognition. TWIST is an evolutionary algorithm based on work on Genetic Doping Systems (17), and it has already been applied to medical data with very promising results (36)(37)(38)(39)(40).
A detailed description of the algorithm is available in the Online supplement.
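TWIST itself is described in the Online Supplement and is not reproduced here. To convey only the idea of an evolutionary search for two balanced subsets, the Python sketch below runs a simple (1+1) evolutionary loop: starting from a class-stratified 50/50 split, it repeatedly swaps one same-class record between A and B and keeps the swap if a distribution-mismatch score does not increase. The score (sum of standardized differences in variable means), the mutation operator and all parameter values are assumptions chosen for illustration; unlike TWIST, this sketch performs no input selection and uses no population of classifiers.

```python
import random
from statistics import mean, pstdev


def mismatch(data, idx_a, idx_b):
    """Sum over variables of |mean in A - mean in B| / overall SD (lower = more similar)."""
    total = 0.0
    for j in range(len(data[0])):
        column = [row[j] for row in data]
        sd = pstdev(column) or 1.0  # guard against constant variables
        total += abs(mean(column[i] for i in idx_a) - mean(column[i] for i in idx_b)) / sd
    return total


def evolve_split(data, labels, generations=2000, seed=0):
    """(1+1) evolutionary search for two class-stratified, similar halves."""
    rng = random.Random(seed)
    idx_a, idx_b = [], []
    for label in sorted(set(labels)):
        members = [i for i, t in enumerate(labels) if t == label]
        rng.shuffle(members)
        half = len(members) // 2
        idx_a += members[:half]
        idx_b += members[half:]
    best = mismatch(data, idx_a, idx_b)
    for _ in range(generations):
        # Mutation: swap a record of A with a same-class record of B, which
        # preserves subset sizes and class proportions.
        i = rng.randrange(len(idx_a))
        same_class = [k for k, j in enumerate(idx_b) if labels[j] == labels[idx_a[i]]]
        k = rng.choice(same_class)
        idx_a[i], idx_b[k] = idx_b[k], idx_a[i]
        score = mismatch(data, idx_a, idx_b)
        if score <= best:
            best = score  # keep the offspring
        else:
            idx_a[i], idx_b[k] = idx_b[k], idx_a[i]  # revert to the parent
    return idx_a, idx_b, best


# Toy usage with random data (illustrative only).
toy_rng = random.Random(1)
toy = [[toy_rng.gauss(0, 1), toy_rng.gauss(5, 2)] for _ in range(20)]
subset_a, subset_b, score = evolve_split(toy, [0] * 12 + [1] * 8, generations=300)
```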

5-fold cross-validation
In addition, a 5-fold cross-validation protocol was applied (41), in order to also analyze the data with a standard and popular approach. The dataset was randomly split into 5 groups (folds) with a similar number of subjects. Each group was used once as a hold-out (validation) dataset, while the remaining groups were used as training data. The model fitted on the training set was evaluated on the validation set. Five different models were thus created for each machine learning system employed, and each model provided an evaluation score. The performance of the 5 models was summarized as mean sensitivity, specificity, overall accuracy and balanced accuracy.
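A minimal Python sketch of the 5-fold protocol follows. The `fit_and_score` callback is hypothetical and stands for training any of the machine learning systems on the training indices and returning its metrics on the held-out fold; the shuffling seed is arbitrary.

```python
import random
from statistics import mean


def k_fold_indices(n, k=5, seed=0):
    """Randomly partition record indices 0..n-1 into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def cross_validate(n, fit_and_score, k=5, seed=0):
    """Hold out each fold once and train on the remaining k-1 folds.

    fit_and_score(train_idx, val_idx) is a hypothetical caller-supplied function
    that trains a model and returns a tuple of metrics on the validation fold,
    e.g. (sensitivity, specificity, accuracy). Returns the mean of each metric.
    """
    folds = k_fold_indices(n, k, seed)
    scores = []
    for f, val_idx in enumerate(folds):
        train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
        scores.append(fit_and_score(train_idx, val_idx))
    return tuple(mean(s[m] for s in scores) for m in range(len(scores[0])))


print([len(fold) for fold in k_fold_indices(199)])  # [40, 40, 40, 40, 39]
```

With the 199 patients of this study, the random folds would contain 40, 40, 40, 40 and 39 records; unlike the TWIST split, nothing forces the folds to have similar variable distributions.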

Results
Three hundred forty-seven patients fulfilled the inclusion criteria; 148 patients met the exclusion criteria (9 patients <12 years, 139 patients without leukocyte formula), leaving 199 patients for the analysis.
Population description and classic statistics
Table 4 summarizes the main characteristics of the overall study population (n=199) and of the two subgroups, i.e. patients who tested positive (n=124) and negative (n=75) to SARS-COV-2.
Table 4. Characteristics of the study population in three groups: overall population, positive for SARS-COV-2 and negative for SARS-COV-2.
Median age in the overall population was 65 [46-78] years. The most common comorbidities were hypertension (42.7%) and diabetes (14.6%), with similar prevalence between patients with and without SARS-COV-2. A lower prevalence of chronic obstructive pulmonary disease (COPD) was observed in patients with COVID-19 (4.0% vs. 13.3%, p=.03).
Regarding current medications, positive patients had lower chronic use of anti-epileptics and of drugs for psychiatric disorders compared to patients who tested negative (0.8% vs. 12.0%, p=.002 and 0.8% vs. 8.0%, p=.02, respectively). Furthermore, administration of antibiotics before access to the emergency department was more frequent in the positive subgroup (29.0% vs. 14.7%, p=.03).
Several clinical findings were significantly different between the two study populations (Table 3). In particular, SARS-COV-2 positive patients had a slightly higher external body temperature (37.8 ± 0.8 vs. 37.2 ± 1.0 °C, p<.001), a lower prevalence of pleural effusion at chest X-ray (5.7% vs. 17.3%, p=.01) and significant differences in the complete blood count and leukocyte formula (Table 3).
Table 5 summarizes the positive and negative linear correlation indices between the descriptive variables and the results of RT-PCR for SARS-COV-2.
Main results: Prediction of the PCR Outcome with Machine Learning Algorithms
The TWIST system selected 42 of the 74 original variables. Selected variables are marked with an asterisk in Table 5. A global dataset of 42 input and 2 target attributes was thus generated. Thereafter, two optimal subsets were created in order to apply the training and testing procedure described above. Table 6 shows the results obtained by applying an array of machine learning systems to the variables selected by the TWIST system. These results are the average of two training-testing procedures (A-B and B-A sequences). Detailed predictive results for each experiment (A-B and B-A) are available in Table S1 in the Online Supplement. The best machine learning system reached an accuracy of 91.4% with 94.1% sensitivity (correct prediction of positivity to SARS-COV-2) and 88.7% specificity (correct prediction of negativity to SARS-COV-2).
Table 7 shows the results obtained by applying a selected number of machine learning systems to the variables selected by the TWIST system. These results are the average of five training-testing procedures of a K-fold cross-validation protocol. Detailed predictive results for each experiment are available in Table S2 in the Online Supplement.
Table 7. Predictive results with the 5-fold protocol, using Semeion (*) and WEKA (**) machine learning systems.

Discussion
We studied 199 adult patients admitted to the emergency department of the largest hospital in Milan, Lombardy, with symptoms compatible with COVID-19 during the first 3 weeks of the SARS-COV-2 outbreak in Italy. In the present manuscript, we describe this population and highlight the differences between patients who actually tested positive for SARS-COV-2 and those who did not. Since the outbreak, a few attempts have been made to apply artificial intelligence to rapidly predict positivity/negativity to SARS-COV-2, mostly using CT imaging and laboratory results collected in Chinese populations (42)(43)(44). Here, we present what is, to our knowledge, the first European attempt, with promising results, to apply artificial intelligence to rapidly predict positivity/negativity to SARS-COV-2 using only basic clinical data, available in the vast majority of emergency departments worldwide. The wide application of this decision support tool could have a major clinical and organizational impact during the current pandemic.
In our study, several differences were observed between the two study groups (Table 3). However, none of these, alone or in combination, so far allows a clear differentiation between patients with COVID-19 and patients with other diseases with a similar clinical presentation. Our data underline the key finding of Coronavirus-induced alterations in the white blood cell differential count (Table 3). On the one hand, in contrast to other reports (45)(46)(47)(48), we did not observe marked lymphocytopenia, possibly because of the early stage of the viral disease. On the other hand, other subtypes, such as eosinophils, might play a key role in COVID-19 (49).
When applying artificial intelligence to our dataset, in particular ANNs and MLS, we were able to predict the results of RT-PCR with high sensitivity and specificity (Table 6).
Artificial Neural Networks make predictions by modelling the relationships between variables, including nonlinear relationships (36,50,51). These systems initially learn from a set of data with a known solution (training). Thereafter, the networks, inspired by the analytical processes of the human brain, are able to reconstruct imprecise rules that may underlie a complex dataset (testing). Machine learning systems, and ANNs in particular, analyse real-world data very efficiently. The internal validity of their assessment is provided by strict validation protocols that are seldom used in classical statistics (50,52,53).
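To make the training and testing idea concrete, here is a minimal pure-Python sketch of a feed-forward network with one hidden layer, trained by back-propagation with a momentum term. It is a generic textbook network, not Semeion's Self Momentum Back Propagation or Sine Net, and the layer size, learning rate, momentum and epoch count are arbitrary assumptions. The XOR toy problem is used because no single linear rule separates its classes.

```python
import math
import random


def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to avoid math.exp overflow
    return 1.0 / (1.0 + math.exp(-z))


def forward(W1, W2, x):
    """Return the input and hidden activations (bias terms appended) and the output."""
    xb = list(x) + [1.0]
    hb = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in W1] + [1.0]
    return xb, hb, sigmoid(sum(w * v for w, v in zip(W2, hb)))


def train_mlp(X, y, hidden=4, lr=0.5, momentum=0.9, epochs=2000, seed=0):
    """Full-batch back-propagation with momentum on squared error."""
    rng = random.Random(seed)
    n_in = len(X[0])
    W1 = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(hidden)]
    W2 = [rng.uniform(-1, 1) for _ in range(hidden + 1)]
    V1 = [[0.0] * (n_in + 1) for _ in range(hidden)]  # velocities (momentum)
    V2 = [0.0] * (hidden + 1)
    for _ in range(epochs):
        G1 = [[0.0] * (n_in + 1) for _ in range(hidden)]
        G2 = [0.0] * (hidden + 1)
        for x, t in zip(X, y):
            xb, hb, o = forward(W1, W2, x)
            d_out = (o - t) * o * (1 - o)  # error signal at the output unit
            for j in range(hidden + 1):
                G2[j] += d_out * hb[j]
            for j in range(hidden):
                d_hid = d_out * W2[j] * hb[j] * (1 - hb[j])  # back-propagated signal
                for i in range(n_in + 1):
                    G1[j][i] += d_hid * xb[i]
        for j in range(hidden + 1):
            V2[j] = momentum * V2[j] - lr * G2[j] / len(X)
            W2[j] += V2[j]
        for j in range(hidden):
            for i in range(n_in + 1):
                V1[j][i] = momentum * V1[j][i] - lr * G1[j][i] / len(X)
                W1[j][i] += V1[j][i]
    return W1, W2


def mse(W1, W2, X, y):
    return sum((forward(W1, W2, x)[2] - t) ** 2 for x, t in zip(X, y)) / len(X)


X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 1, 1, 0]  # XOR: the classes cannot be separated by a single linear rule
W1, W2 = train_mlp(X, y)
print([round(forward(W1, W2, x)[2], 2) for x in X])
```

Once training ends, the returned weights can be frozen and reused by `forward` on unseen records, mirroring the training and testing phases described above.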
In the present manuscript, it was possible to predict with reasonable accuracy the status of being positive or negative to SARS-COV-2 based on 42 simple variables. This was achieved using the TWIST algorithm, which is currently not as popular as other techniques, such as K-fold cross-validation, boosting and others. Nevertheless, it has been used extensively in the past 15 years in different contexts (54)(55)(56). Its limited diffusion is partly due to the fact that TWIST is very complex to program, as it includes two evolutionary algorithms that work together, managing a large population of ANNs, kNN and Naive Bayes algorithms. TWIST therefore needs to be programmed in C to run sufficiently fast; given its complexity and running time, it is not well suited to implementation in Python, R or similar languages.
With the TWIST system, the best machine learning system reached a global accuracy of 91.4%, with 94.1% sensitivity (correct prediction of positivity to SARS-COV-2) and 88.7% specificity (correct prediction of negativity to SARS-COV-2). Considering the eight best machine learning systems, the average performance was: sensitivity of 91.8%, specificity of 89.6% and global accuracy of 90.8%.
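Since global accuracy is defined in the Methods as the arithmetic mean of sensitivity and specificity, the reported figures for the best system can be checked directly:

$$\text{Global accuracy} = \frac{\text{Sensitivity} + \text{Specificity}}{2} = \frac{94.1\% + 88.7\%}{2} = 91.4\%$$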
Comparing the two testing procedures (A-B and B-A) described in the Machine Learning Methods section, the differences in predictive values between the two experiments are small, suggesting that overfitting of the model is unlikely (57).
In order to also analyze our dataset with a more popular and widely applied procedure, we applied a 5-fold cross-validation protocol using a selected number of machine learning systems (Table 7). With this type of analysis, the best machine learning system obtained an overall accuracy of 87.7%, with a sensitivity and specificity of 89.2% and 86.2%, respectively. The global average performance was: sensitivity of 87.6%, specificity of 75.4% and global accuracy of 81.5%.
Compared with these results, the same machine learning systems using the A-B/B-A training-testing protocol (Table 6) obtained slightly better predictive results, reasonably related to the optimal splitting of the records, with an average performance of 89.1% sensitivity, 82.2% specificity and 85.7% global accuracy. The high variance of the results obtained with the K-fold protocol, and the low variance of the same results with the TWIST protocol, suggest that the K-fold protocol is strongly affected by polarization with this kind of data.
This is the reason why we chose to rely on an optimized distribution of records into training and testing subsets, rather than on a random allocation. Nevertheless, a standard K-fold cross-validation, a widely available procedure, was also able to predict the results of SARS-COV-2 RT-PCR accurately.
It is useful to analyse the variables selected by AI, as they are likely to carry specific clinical information. As mentioned above, the white blood cells and their differential count are very informative. Other authors have applied AI to the diagnosis of SARS-COV-2. Rao et al. applied an AI framework to a mobile phone-based survey, based exclusively on pre-hospital clinical symptoms and demographic characteristics, to assess the probability of SARS-COV-2 infection (58). Three different research groups tried to predict positivity to SARS-COV-2 using, among other variables, CT scans (14,42,59). Li et al. analysed chest CT scans via deep learning to differentiate SARS-COV-2 induced viral pneumonia from other lung diseases (60). Two other research groups developed free online applications from machine learning models using only laboratory test results (43,44).
Our model differs significantly from those mentioned above. First, it relies on basic clinical information, available in almost every emergency department and quickly obtainable for every patient at hospital admission. For this reason, we decided to include chest X-ray rather than CT in our model. Indeed, although CT is more sensitive in identifying alterations typical of viral pneumonia (13), not every patient with suspected SARS-COV-2 infection will have access to a CT scan. Second, our study is the first one analysing data from a European country. While there is no evidence so far, it is possible that different ethnicities show slightly different responses to viral invasion.
This study has several limitations. First, it is a retrospective, single-center study. Second, the sample size is limited. Third, some fundamental clinical data, such as arterial blood gas analysis, were not available for all patients and were therefore not included in the model. Given the typical profound hypoxemia of patients with COVID-19, it is conceivable that adding these variables could further improve the accuracy of the system. Finally, while no definitive data are available regarding the accuracy of RT-PCR testing for SARS-COV-2 (61), several studies have described a certain percentage of false negative results (5-7). However, every negative result was re-tested after 48 hours, which should reduce the risk of false negative results.

Possible clinical and organizational implications
Facing a highly contagious viral outbreak requires a complex effort in terms of political, economic, social and health systems re-organization (62,63).
A fundamental aspect is to define a clear management protocol to separate infected from non-infected patients, i.e., those admitted for other clinical conditions. Indeed, it is of paramount importance to set up clearly separate pathways in order to avoid the spread of viral infection within the hospital. A quick and reliable system to identify SARS-COV-2 infected patients is therefore fundamental. Currently, the gold standard for diagnosis is an RT-PCR assay detecting the SARS-COV-2 genome (9). This type of molecular assay has several limitations. During the first month of the outbreak in Italy, the processing of samples became more efficient, theoretically reducing the technical time needed to obtain the result. Despite this, the problem of delayed diagnosis still exists. This is due to the limited availability of RT-PCR machines, given the high demand during the outbreak. Furthermore, laboratories of referral hospitals, such as ours, also analyse samples arriving from smaller hospitals not equipped for SARS-COV-2 testing. Finally, RT-PCR is not a perfect test, and false negative results have been described, even in the presence of strong clinical suspicion of COVID-19 (5-7).
For these reasons, applying AI as a rapid decision support tool for the diagnosis of COVID-19, and thus speeding up the sorting of infected from non-infected patients, would be of great clinical help. Indeed, simple online software fed with basic clinical data, easily obtainable in almost every emergency department, could apply trained ANNs to predict the RT-PCR result with high accuracy. The results obtained from such software should, of course, be integrated with the available clinical data.
The application of AI to clinical practice is still limited by its complexity and by the limited in-hospital availability of technical infrastructure and support. This could be particularly troublesome for small centres with limited resources. Decision support software integrating the information contained in the present manuscript could ideally retrieve data directly from the electronic patient management system. Otherwise, data could be entered manually into online software, which, however, significantly increases the risk of errors. Active support from, and collaboration with, the local information technology infrastructure is therefore fundamental to integrating AI into clinical practice in the future.
Finally, it is conceivable that the information obtained from the present study might also be useful after the end of the current pandemic. Indeed, it is possible that SARS-COV-2 will become a seasonal virus, in which case early identification would be a key factor in reducing the risk of further epidemic outbreaks.
In summary, our study suggests that basic clinical data might be sufficient for properly trained ANNs and MLS algorithms to predict positivity and negativity to SARS-COV-2 with good accuracy. If confirmed, this could have important clinical and organizational implications. Indeed, while not directly changing the treatment of COVID-19 patients, it could reduce the time patients spend unnecessarily in the emergency department, reduce the workload of intensive care staff and, finally, reduce the risk of healthcare systems collapsing. This study was approved by the Institutional Review Board of our hospital (N°3733). Mandatory informed consent was waived, given the complete anonymization of the data.

Consent for publication
Mandatory informed consent was waived.

Availability of data and materials
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Competing interests
The authors declare that they have no competing interests.

Funding
No funding.

Authors' contributions
All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by TL, MF, RG, FV, HG, EG, AM, AZ, SB, MB. The first draft of the manuscript was written by TL, MF, EG, FV, RG, AM and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Supplementary Files
This is a list of supplementary files associated with this preprint: OnlineSupplement.docx