The number of calls to EMCC showed a significant spike centered on March 14, the day before the closure of all non-essential public places in France and three days before lockdown. The main reasons for calling were flu-like symptoms (cough and/or fever); the spike was followed 14 days later by a peak in ER admissions, in calls for chest pain and in calls for stress and anxiety. Calls for road traffic crashes, malaise with loss of consciousness, non-voluntary injuries and alcohol intoxication fell by 59%, 38%, 24% and 23%, respectively, during lockdown.
Probably the most interesting finding of our study is the delay between the rise in calls for flu-like symptoms (mainly cough and fever) and the rise in ER visits for suspected COVID-19. The call curve began to rise 20 days before the increase in ER visits and, by the 16th day of this rise, had exceeded any level recorded in the past 5 years. One could hypothesize that the peak of calls recorded around March 14 was due to the concern, if not anxiety, caused by the French President's televised announcement of the closure of public places on that day. However, in a more affected part of the country, the Ile-de-France region, the peak was reached much earlier (10 days earlier) [13], suggesting that most of the calls we recorded were motivated more by symptoms than by concerns raised by the authorities' communication. Further, a spike in calls for stress and anxiety was measured 14 days later. EMCC call content is therefore probably the most predictive early indicator of the start of the epidemic, as recently shown by Riou and colleagues, who found in the Ile-de-France region a strong correlation between calls regarding suspected COVID-19 and the number of patients in intensive care, with a delay of 23 days [14]. This is why it is being considered for monitoring potential relapses in the epidemic [15]. Finally, while the number of calls for flu-like symptoms proved to be an early and relevant signal, its intensity was probably increased by the authorities' request that people contact the EMCC instead of going directly to the ER.
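As an illustration of how such a delay between two daily series can be quantified, the sketch below scans candidate lags and keeps the one that maximizes the Pearson correlation between the leading series (calls) and the lagging series (ER visits). This is not the procedure used in this study or in [14]; the function names `pearson` and `best_lag` and the two short series are hypothetical and synthetic, chosen only to make the example self-contained.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def best_lag(leading, lagging, max_lag):
    """Return (lag, r): the lag in days at which `leading` best predicts `lagging`.

    For each candidate lag, leading[t] is paired with lagging[t + lag].
    """
    best = (0, float("-inf"))
    for lag in range(max_lag + 1):
        x = leading[: len(leading) - lag]
        y = lagging[lag:]
        n = min(len(x), len(y))
        if n < 3:  # too few overlapping days to correlate
            break
        r = pearson(x[:n], y[:n])
        if r > best[1]:
            best = (lag, r)
    return best


# Synthetic daily counts (NOT study data): the same bump, shifted by 5 days.
calls = [0] * 10 + [1, 3, 6, 3, 1] + [0] * 15
er_visits = [0] * 15 + [1, 3, 6, 3, 1] + [0] * 10

lag, r = best_lag(calls, er_visits, max_lag=10)
print(f"best lag: {lag} days, r = {r:.3f}")
```

On real surveillance data the series would typically be smoothed and detrended first, and the correlation at the selected lag should be read with caution because neighboring days are not independent.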
An important difference between the work of Riou and colleagues and this study is that our process was agnostic to the COVID-19 epidemic and to lockdown, because the automatic classification used models trained on reports from previous years. The procedure therefore remains independent of the epidemic and lockdown context, which would have influenced human coding. The signal thus obtained depends less on the context and is more likely to be an indicator of the actual public health situation. More generally, the added value of using an automatic classification procedure based on a natural language processing model is that it frees us from the context in which the reported events are coded. For this reason, we did not use the diagnoses coded at the time of the call to observe trends. In addition, these diagnoses were absent from one-third of the reports.
In the context of the COVID-19 epidemic, several research teams have used a similar approach, attempting to investigate the internet or social media to build early indicators of the epidemic [16, 17]. However, no such signal could be found from a Google keyword search [18], as the peak for cough, fever, coronavirus or COVID-19 was not reached until the week of 15–21 March.
Contrary to what was observed in Paris [19], no increase in calls related to cardiac arrest was observed in our study. This observation supports the hypothesis that the transient twofold increase in out-of-hospital cardiac arrests observed in Paris and its suburbs could be due to COVID-19 infections and to pandemic-related health system issues in heavily impacted regions. This was clearly not the case in the Gironde department, where the EMCC and intensive care units were never overwhelmed.
A very significant decrease in calls for malaise with loss of consciousness, and to a lesser extent for strokes, was observed, starting one week before lockdown. This paralleled the sudden drop in ER visits that was observed in many countries that issued a statewide stay-at-home order [20], raising concerns that patients who needed medical care were not presenting to the hospitals and, for example, that stroke patients arrived too late to receive tissue plasminogen activator. The actual overall public health impact of this phenomenon will have to be carefully assessed when we have enough hindsight to appreciate its medium-term health consequences.
The decrease in calls associated with interpersonal violence and alcohol intoxication is less surprising, as it is probably due to the reduction in social interactions. Interestingly, the figures returned to normal levels by the end of the lockdown period. Early on, concerns were raised about the risk of domestic violence as a result of lockdown [21]. This was not confirmed here by calls to EMCC. Although, unfortunately, not all domestic violence is reported to EMCC, this is an interesting result because most statistics used during lockdown to estimate the incidence of intimate partner violence were derived from police reports, and not all violent events reported to EMCC are reported to the police.
To produce results in a time frame compatible with the health emergency related to the recent lifting of lockdown measures, we used as training data the samples from 2016 to 2018 for which a diagnosis was coded during the call by the medical assistant in charge of handling it. The ideal procedure would have been to code this training sample manually, as was done for a sample of 39,907 reports from 2019, which was retained in this work as a validation sample. Our previous work has shown that it takes about a thousand different examples to maximize the performance of the model [9]. This would have meant manually coding more than 100,000 notes, a task that was out of reach in a short period of time. The performance of the GPT-2 model measured on the manually coded validation sample was, however, very high and allowed us to derive a reason for call for all reports, including the 22% with a missing value for the diagnosis.
Some limitations need to be acknowledged. First, although we have shown that an AI-based natural language model performs well in classifying free-text clinical reports, a small proportion of reports remain misclassified. Because our aim here was to provide prevalence estimates, we adjusted the decision cutpoints so that precision equals recall, that is, so that false positives and false negatives balance out in the counts. The bootstrap analysis showed that this was a very reliable strategy. Second, not all calls are handled by the EMCC: a proportion of them remain unanswered, and this proportion increases during peak periods. It is therefore likely that around March 14 the number of attempted calls was higher than the number handled. Finally, the study was done in Gironde, a department with a reportedly low rate of SARS-CoV-2 infection compared with the Ile-de-France and north-east regions of France. However, lockdown and fear of the epidemic affected all French people, and the Gironde EMCC is the third largest in France in terms of the number of calls received, which made it possible to build up a sufficiently large database.
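The cutpoint adjustment mentioned above can be sketched as follows. Precision, TP/(TP+FP), and recall, TP/(TP+FN), coincide exactly when FP = FN, in which case the number of reports predicted positive equals the number of truly positive reports, so the classifier's count is an unbiased prevalence estimate on the validation sample. The code below is a minimal sketch under that reading, not the study's implementation: the function names, the toy scores and labels, and the bootstrap summary (ratio of predicted to true positive counts) are all assumptions made for illustration.

```python
import random


def balanced_cutpoint(scores, labels):
    """Return a cutpoint at which precision equals recall on validation data.

    With distinct scores, taking the k-th highest score (k = number of true
    positives) makes the predicted-positive count equal to k, hence FP == FN.
    """
    k = sum(labels)
    if k == 0:
        raise ValueError("no positive examples in the validation sample")
    return sorted(scores, reverse=True)[k - 1]


def precision_recall(scores, labels, cut):
    """Precision and recall when reports with score >= cut are called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= cut and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= cut and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < cut and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def bootstrap_count_ratio(scores, labels, cut, n_boot=200, seed=0):
    """Resample the validation set with replacement; return predicted/true count ratios."""
    rng = random.Random(seed)
    n = len(scores)
    ratios = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        predicted = sum(1 for i in idx if scores[i] >= cut)
        actual = sum(labels[i] for i in idx)
        if actual:  # skip resamples that contain no true positive
            ratios.append(predicted / actual)
    return ratios


# Toy validation data (NOT study data): model scores and manual labels.
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 1, 0, 0, 1, 0, 0]

cut = balanced_cutpoint(scores, labels)
precision, recall = precision_recall(scores, labels, cut)
ratios = bootstrap_count_ratio(scores, labels, cut)
print(cut, precision, recall, sum(ratios) / len(ratios))
```

A ratio distribution concentrated around 1 across resamples would indicate that the count of predicted positives tracks the true count closely, which is the property needed for prevalence estimation.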