The Skåne Emergency Medicine (SEM) cohort



In the European Union alone, more than 100 million people present to the emergency department (ED) each year, and this number has increased steadily by 2–3% per year. Better patient management decisions have the potential to reduce ED crowding, the number of diagnostic tests, the use of inpatient beds, and healthcare costs.


We have established the Skåne Emergency Medicine (SEM) cohort for developing clinical decision support systems (CDSS) based on artificial intelligence or machine learning as well as traditional statistical methods. The SEM cohort consists of 325 539 unselected unique patients with 630 275 visits from January 1st, 2017 to December 31st, 2018 at eight EDs in Region Skåne in southern Sweden. For each ED patient visit, data are available on sociodemographics, previous diseases and current medication, as well as the chief complaint, test results, disposition and the outcome in the form of subsequent diagnoses, treatments, healthcare costs and mortality within a follow-up period of at least 30 days, and up to 3 years.


The SEM cohort provides a platform for CDSS research, and we welcome collaboration. In addition, SEM’s large amount of real-world patient data with almost complete short-term follow-up will allow research in epidemiology, patient management, diagnostics, prognostics, ED crowding, resource allocation, and social medicine.


All over the world, emergency departments (EDs) are struggling with an increasing inflow of patients, particularly elderly patients with complex pathology that is difficult to assess due to simultaneous chronic diseases, risk factors and/or polypharmacy [1, 2]. ED clinicians need to make fast and accurate risk estimates, and optimal management from the start is crucial for good patient outcomes. At the same time, the amount of available clinical information in electronic medical records is increasing, as is the total body of medical knowledge. Often the ED physician can no longer grasp and process all available information, making it impossible for an individual clinician to provide the theoretically best possible care.

Artificial intelligence (AI) and machine learning (ML) are now developing fast, and most industries will likely be fundamentally changed by AI in the coming years [3]. In medicine, AI and ML provide new possibilities when applied to extensive electronic health records and registers [4]. The most impressive advances have occurred in radiology and pathology, where ML accuracy of image classifications now exceeds that of humans [5]. In emergency medicine, AI/ML-driven decision support tools have the potential to improve diagnostic accuracy [5], alleviate ED crowding [6, 7], and decrease the use of inpatient beds and healthcare costs [8]. The Swedish National Board of Health and Welfare has therefore emphasized the great potential of AI/ML in emergency medicine [9]. So far, however, there have been few AI/ML studies in the ED setting, and practically no implementation in routine ED care. The creation of ML-based decision support for ED use requires large amounts of high-quality clinical data, preferably from representative unselected ED patients in routine care.

In the present paper we describe the rationale for, and construction of, the Skåne Emergency Medicine (SEM) cohort and outline possible studies. The SEM cohort is a recently established data platform for developing clinical decision support systems (CDSS) based on traditional statistical methods or AI/ML, to be used in ED triage or later in the management of specific patient conditions. Specific aims include the prediction of diagnoses, critical interventions (e.g. defibrillation in cardiac arrest, thrombolysis in stroke) or inpatient care within 30 days of the ED visit, and mortality up to 1 year after the ED visit. We also describe the process of building the SEM dataset with careful consideration of ethics, data protection, and bias. With the SEM cohort, we hope to create CDSS that can be tested in randomized trials in routine emergency care.


The formation of the SEM cohort was an initiative within the Artificially Intelligent use of Registers at Lund University (AIR Lund) research environment [10], which is a multidisciplinary collaboration between Lund University (Emergency medicine, Internal medicine, Epidemiology and biostatistics, Computational biology, Technology and society/ethics, and Law), Halmstad University (Information technology), and the Swedish health care regions Skåne and Halland.


Skåne is Sweden’s southernmost region and has some 1.4 million inhabitants. Healthcare is publicly financed with a small copayment at every visit. Patients in Region Skåne almost always go to the nearest ED, and in general do not seek care outside the region. The SEM cohort includes data from patients presenting at eight general EDs in Skåne from January 1st, 2017 to December 31st, 2018. The characteristics of these EDs are described in Table 1. Five EDs are open 24/7/365 (Skåne University Hospital in Lund and Malmö, Helsingborg general hospital, Kristianstad central hospital and Ystad hospital) and three EDs are open during office hours (Landskrona, Trelleborg and Hässleholm hospitals). Very few patients at these EDs have psychiatric disorders or problems related to obstetrics/gynecology or ophthalmology, and there are few pediatric patients without orthopedic problems, since there are specialized EDs for these patients in the region. Table 1 shows that the yearly ED census ranges from 5 000 (Landskrona) to 80 000 (Malmö) patient cases, and that admission rates to in-hospital care range from 20% (Helsingborg) to 32% (Hässleholm and Landskrona). All EDs use the rapid emergency triage and treatment system (RETTS [11]), which includes five priority levels: Highest priority 1 (Red); Priority 2 (Orange); Priority 3 (Yellow); Lowest priority 4 (Green); and Priority primary care (Blue). The RETTS set of chief complaints is thus common to all EDs in the SEM cohort. All EDs have similar access to patient testing, and clinical guidelines are generally the same in the entire region.

Table 1 Characteristics of the EDs included in the SEM cohort, after Welch et al. [21]. *Trauma level according to the American College of Surgeons [22]. EM, emergency medicine; ENT, ear, nose and throat; Ob/Gyn, obstetrics/gynecology

During and after the data collection period, patients were informed in writing, via public advertising on a website, of the purpose and structure of the SEM cohort and of their right to decline participation at any time, for any reason, by contacting a research nurse or the first author at Lund. The creation of the SEM cohort and its use for AI/ML research and cross-sectional analyses has been approved by the Swedish Ethical Review Authority (Dnr 2019–05783), and by Region Skåne (KVB 302-19). There is no approval for commercial use of the data.

Data collection

During the study period, all patients at the eight EDs were included in the SEM cohort by default via identification in the common ED patient log system (Patientliggaren™, Tietoevry [12]), and data from the other registers (below) were then linked by each patient’s unique Swedish identification (ID) number, which is universally used in Swedish healthcare and all government registers. After collection and linkage, all data were pseudonymized with patient study ID numbers and kept on secure servers behind firewalls at Lund University where access is logged. The key between personal and study IDs is kept separately on a Region Skåne server with standard healthcare data security.
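The linkage and pseudonymization steps described above can be illustrated with a minimal Python sketch. It is not the pipeline used for SEM: the function name `pseudonymize`, the field name `personal_id`, the `S-` study ID prefix and the example records are all hypothetical, and a real implementation would also need secure storage and access logging.

```python
import secrets


def pseudonymize(records_by_source, id_field="personal_id"):
    """Link records across sources by personal ID and swap it for a study ID.

    Returns (linked, key). `linked` maps study ID -> {source: [records]};
    `key` maps personal ID -> study ID and should be stored separately.
    All names here are illustrative assumptions, not the SEM implementation.
    """
    key = {}      # personal ID -> study ID (kept apart from the research data)
    linked = {}   # study ID -> records grouped by source register
    for source, records in records_by_source.items():
        for record in records:
            personal_id = record[id_field]
            if personal_id not in key:
                # Random study ID that cannot be derived from the personal ID
                key[personal_id] = "S-" + secrets.token_hex(8)
            study_id = key[personal_id]
            # Copy every field except the personal identifier
            cleaned = {k: v for k, v in record.items() if k != id_field}
            linked.setdefault(study_id, {}).setdefault(source, []).append(cleaned)
    return linked, key


# Entirely fictional example records
sources = {
    "ed_log": [
        {"personal_id": "A", "arrival": "2017-03-01T10:15", "chief_complaint": "Chest pain"},
        {"personal_id": "B", "arrival": "2017-03-02T08:00", "chief_complaint": "Abdominal pain"},
    ],
    "death_register": [
        {"personal_id": "A", "date_of_death": "2017-03-20"},
    ],
}
linked, key = pseudonymize(sources)
```

In this sketch only `linked` would be handed to researchers, while `key` stays with the data controller, mirroring the separation between the research servers and the key described in the text.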

The data sources include healthcare databases and registers with complete national or regional coverage, which should ensure close to complete data on all patient visits. As much as possible, we used well-described high-quality data sources (see e.g. references [13,14,15,16]) to collect the SEM data in order to decrease bias and data errors. The amount of missing data varies across the sources but is generally very low. Data variables were chosen based on importance in the emergency care process as well as availability in the source registers. The collected data were the same as used in clinical care, and there was no major change in data labelling during 2017–2018. The SEM cohort was not designed with a specific CDSS or study in mind, but the size of the cohort (below) and the number of variables and data included were chosen to ensure sufficient statistical power for most CDSS research projects.

Data from the source registers were kept in their exported form with no deletion or curation, and software scripts are used to extract data to form tailor-made new datasets for each specific research project. Data curation or deletion will generally take place in each CDSS project, and only as needed in the original SEM cohort data.
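As a rough illustration of such an extraction script, the Python sketch below builds a project-specific dataset from uncurated source rows without modifying them. The function `extract_dataset`, the variable names and the example rows are hypothetical and are not taken from the actual SEM scripts.

```python
def extract_dataset(source_rows, include, variables):
    """Build a project dataset; the source rows themselves are left untouched.

    `include` is a predicate selecting visits, `variables` the columns to keep.
    """
    return [
        {name: row.get(name) for name in variables}
        for row in source_rows
        if include(row)
    ]


# Fictional source rows (variable names are assumptions)
rows = [
    {"study_id": "S1", "age": 67, "chief_complaint": "Chest pain", "triage_priority": 2, "admitted": True},
    {"study_id": "S2", "age": 15, "chief_complaint": "Hand injury", "triage_priority": 4, "admitted": False},
    {"study_id": "S3", "age": 45, "chief_complaint": "Chest pain", "triage_priority": 3, "admitted": False},
]

# A hypothetical chest pain project that needs only three variables
chest_pain = extract_dataset(
    rows,
    include=lambda r: r["chief_complaint"] == "Chest pain",
    variables=["study_id", "age", "admitted"],
)
```

Any curation, such as removing implausible values, would then be applied to `chest_pain` inside the project rather than to `rows`.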

As shown in Table 2, the available data for each patient visit include the patient’s baseline data, data on the ED visit, and outcomes during a follow-up period of 30 days up to three years after the ED visit: diagnoses, ED returns, hospital admissions, death, and healthcare costs. In total, the SEM data include several hundred variables for each patient, and many more that can be calculated from the original variables, such as ED crowding or boarding data, return visits, and mortality at different times after ED arrival. Detailed variable lists are available on reasonable request.
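Two of the derived variables mentioned above, mortality at a given time after ED arrival and return visits, could be calculated along the following lines. This is a sketch under stated assumptions: the field names, the 72-hour return window and the example visits are hypothetical, and the definitions used in actual SEM projects may differ.

```python
from datetime import date, datetime, timedelta


def died_within(arrival, death_date, days):
    """True if the patient died 0..days days after the ED arrival date."""
    if death_date is None:  # no death recorded during follow-up
        return False
    elapsed = (death_date - arrival.date()).days
    return 0 <= elapsed <= days


def flag_return_visits(visits, window=timedelta(hours=72)):
    """Return indices of visits starting within `window` of the same
    patient's previous ED departure (the 72 h default is an assumption)."""
    last_departure = {}
    returns = set()
    # Process visits in chronological order of arrival
    for i in sorted(range(len(visits)), key=lambda i: visits[i]["arrival"]):
        visit = visits[i]
        previous = last_departure.get(visit["study_id"])
        if previous is not None and visit["arrival"] - previous <= window:
            returns.add(i)
        last_departure[visit["study_id"]] = visit["departure"]
    return returns


# Fictional visits
visits = [
    {"study_id": "S1", "arrival": datetime(2017, 1, 1, 10, 0), "departure": datetime(2017, 1, 1, 13, 26)},
    {"study_id": "S1", "arrival": datetime(2017, 1, 3, 9, 0), "departure": datetime(2017, 1, 3, 11, 0)},
    {"study_id": "S2", "arrival": datetime(2017, 1, 2, 12, 0), "departure": datetime(2017, 1, 2, 14, 0)},
    {"study_id": "S1", "arrival": datetime(2017, 2, 10, 8, 0), "departure": datetime(2017, 2, 10, 9, 0)},
]
return_idx = flag_return_visits(visits)  # only the second visit qualifies
dead_30 = died_within(visits[0]["arrival"], date(2017, 1, 20), 30)
```

Changing `days` or `window` gives the same variables at other time horizons, which is the sense in which many more variables can be calculated from the original ones.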

Table 2 Available data for each patient visit in the SEM cohort

The SEM cohort is thus mainly based on register data and does not include free text information such as the patient’s detailed symptom history, findings at the physical examination, reasons for decisions and preliminary assessments. Also missing are the initial ED vital signs (blood oxygen saturation, respiratory rate, pulse rate, blood pressure, consciousness level and body temperature) and pharmacological treatment in the ED, since these data are primarily recorded on paper in the region. However, all this missing information can be obtained as needed by manual review of the individual patient records. As for diagnostic tests, ECG data are available as the raw signal, amplitude/interval measurements as well as the machine interpretation, and imaging and functional test data are available as the free text results. The images are not part of the SEM cohort data but can be obtained in specific projects.

Basic cohort characteristics

The SEM cohort is briefly described in Table 3 and includes 325 539 unique patients with 630 275 ED visits during 2017 and 2018. Fewer than five patients declined participation, which makes the cohort almost 100% complete. The mean age was 55 years, 49% were male and 23.5% of all patients arrived by ambulance. The most common triage category was 3 (Yellow), and 15.0% of the patients had no registered triage category, mostly due to immediate referral from the ED to external primary care or self-care. Of the patients, 11% had previous diagnoses of diabetes, 10% of cancer, 8% of pulmonary disease, and 1.7% suffered from dementia.

Table 3 Baseline patient characteristics and management in the SEM cohort. Std, standard deviation. *Among the unique patients

Table 4 shows that the most common chief complaint in SEM was abdominal pain, followed by chest pain, dyspnea, hand injury and unspecific disorder. (The term “unspecific disorder” is used when the triage nurse is unable to classify the patient’s problem using the more specific terms in the system.) Some 9% of all visits had no registered chief complaint, again mostly because of immediate referral to primary care or self-care. The median time to doctor was 70 min and the median length of stay was 206 min. In 24% of all ED visits the patient was admitted to in-hospital care.

Table 4 Twenty most common chief complaints in the SEM cohort, according to the RETTS system [11]. The term “Unspecific disorder” is used when the triage nurse is unable to classify the patient’s problem using the more specific terms in the system

As can be seen in Table 5, the most common discharge diagnoses were bacterial pneumonia, cerebrovascular incident, and acute myocardial infarction. Mortality was 0.2% in the ED, 0.9% within 7 days, and 2.2% within 30 days.

Table 5 Selected discharge diagnoses from the ED or from in-hospital care directly following the ED visit, in the SEM cohort


In addition to CDSS development, SEM’s large amount of real-world ED patient data with almost complete follow-up will allow research in many fields of emergency medicine: epidemiology, patient management, diagnostics, prognostics, ED crowding, resource allocation, and social medicine. Some of these studies may need supplementary ethics approval. The SEM cohort is currently being used to analyze cases of missed acute aortic syndrome, to predict venous thromboembolism, to map characteristics and outcomes in patients with dizziness or head trauma, and to evaluate emergency care for adult patients with congenital heart disease.

Studies of the epidemiology of ED patients may be beneficial for public health surveillance, resource planning, evaluating healthcare delivery and for facilitating research, e.g. sample size calculations for prospective studies. Epidemiological information supports clinical evidence-based decision-making and enables the ED to organize according to the needs of the population. The SEM cohort includes almost all patients presenting at eight EDs in southern Sweden during two years, and it should therefore be possible to obtain reasonably accurate and generalizable data on chief complaints and underlying disease states in the entire population as well as in subgroups based on age, sex, comorbidities or sociodemographics. Also, diurnal, weekly, and seasonal variations may be described.
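A description of diurnal, weekly and seasonal variation can start from simple counts of arrival timestamps, as in the following Python sketch. The function name `arrival_patterns` and the timestamps are hypothetical and serve only as an illustration.

```python
from collections import Counter
from datetime import datetime


def arrival_patterns(arrivals):
    """Count ED arrivals per hour of day, ISO weekday (1 = Monday) and month."""
    by_hour = Counter(a.hour for a in arrivals)
    by_weekday = Counter(a.isoweekday() for a in arrivals)
    by_month = Counter(a.month for a in arrivals)
    return by_hour, by_weekday, by_month


# Fictional arrival times
arrivals = [
    datetime(2017, 1, 2, 9, 30),   # a Monday morning
    datetime(2017, 1, 2, 9, 45),
    datetime(2017, 7, 8, 22, 0),   # a Saturday evening
]
by_hour, by_weekday, by_month = arrival_patterns(arrivals)
```

The same counts can be restricted to subgroups, for example by age or chief complaint, before comparison.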

ED patient management and its impact on outcomes may be studied in the SEM cohort by analyzing e.g. waiting times, length of ED stay, admissions to intensive care, as well as patients who left without being seen by a physician or who returned to the ED. These analyses may also be made in the absence or presence of ED crowding. As mentioned, pharmaceutical treatment at the ED is not immediately available but can be extracted for all patients from the digitized (scanned) ED patient paper records.
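One possible proxy for crowding, sketched below in Python, is the number of other patients already present in the ED when a given patient arrives, calculated from arrival and departure time stamps. This is only one of several possible crowding measures and is an assumption here, not a definition taken from the SEM cohort; the example times are fictional.

```python
from datetime import datetime


def occupancy_on_arrival(visits):
    """For each (arrival, departure) pair, count the other visits in progress
    at that arrival time. Quadratic in the number of visits, so suitable only
    for small examples; large datasets would need a sweep over sorted events."""
    counts = []
    for i, (arrival, _) in enumerate(visits):
        present = sum(
            1
            for j, (other_arrival, other_departure) in enumerate(visits)
            if j != i and other_arrival <= arrival < other_departure
        )
        counts.append(present)
    return counts


# Fictional visits on a single day
day = datetime(2017, 5, 1)
visits = [
    (day.replace(hour=10), day.replace(hour=13)),
    (day.replace(hour=11), day.replace(hour=12)),
    (day.replace(hour=12, minute=30), day.replace(hour=15)),
    (day.replace(hour=16), day.replace(hour=17)),
]
occupancy = occupancy_on_arrival(visits)
```

Visits could then be grouped by occupancy level to compare, for instance, waiting times with and without crowding.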

The SEM cohort allows analysis of the accuracy of diagnostic and functional testing by comparing pre-test probability with short or medium-term outcomes such as diagnoses or death.
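For a binary test result and a binary outcome, one simple form of such an analysis is a cross-tabulation that yields sensitivity, specificity and predictive values. The Python sketch below assumes each visit has already been reduced to a (test positive, outcome present) pair; the function name and the counts are fictional.

```python
def diagnostic_accuracy(pairs):
    """Compute accuracy measures from (test_positive, outcome_present) pairs."""
    tp = fp = fn = tn = 0
    for test_positive, outcome_present in pairs:
        if test_positive and outcome_present:
            tp += 1
        elif test_positive:
            fp += 1
        elif outcome_present:
            fn += 1
        else:
            tn += 1

    def ratio(numerator, denominator):
        # None signals that the measure is undefined for this sample
        return numerator / denominator if denominator else None

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Fictional counts: 8 true positives, 2 false negatives,
# 10 false positives and 80 true negatives
pairs = (
    [(True, True)] * 8
    + [(False, True)] * 2
    + [(True, False)] * 10
    + [(False, False)] * 80
)
accuracy = diagnostic_accuracy(pairs)
```

The outcome in each pair could be, for example, a specific diagnosis or death within 30 days of the ED visit.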

The utilization and costs of diagnostic testing, hospital admission and care at specific wards for each patient, up to 30 days after the ED visit, can be used to analyze resource use in all patients and in specific subgroups. Also, the SEM cohort may be used to evaluate ED care and acute healthcare consumption in different socioeconomic and demographic groups, as well as inequalities and possible discrimination.

Strengths and limitations

SEM includes real-world clinical data from consecutive patients presenting to eight different EDs over two years. The large number of patient visits, variables, and clinical events should be sufficient for most analyses of interest. Data were collected in regular care and there are several general advantages of using routine care data when building CDSS. Firstly, it provides access to large amounts of data from a diverse and unselected patient population, which is crucial for developing CDSS that work across different patient demographics. Secondly, routine care data may be immediately available, reducing the cost and time required to collect data. Finally, routine care data collection will often allow simple tracking of patient outcomes and evaluation of the effectiveness of the CDSS, especially in a country with comprehensive healthcare databases like Sweden. In the future, it may be possible to use native, uncurated electronic health records directly for medical research [17]. Another strength of the multimodal SEM cohort is its potential utility in developing CDSS that provide relative risks of multiple diagnoses, in contrast to algorithms based on a single type of input and output (e.g. radiology algorithms detecting cancer), and current clinical decision support tools which often serve merely as rule-out tests, e.g. the PERC rule for pulmonary embolism.

SEM includes data from ED patient visits in one Swedish region, and the data may therefore not be generalizable to other populations or healthcare settings. There are few patients in the SEM cohort with problems related to psychiatry, obstetrics/gynecology, and ophthalmology, as well as few pediatric patients without orthopedic problems. Some clinical variables are missing or less readily available in SEM, e.g. free text imaging results that require manual review, and this will of course prevent or complicate the creation of some types of CDSS, as well as some data disaggregation. Missing data in SEM are rare, but there may of course be errors in the data, which can lead to biased or inaccurate CDSS. Since SEM data were registered as part of regular care, bias may also arise from different patient evaluation and management based on previous clinical findings (verification bias) or based on patients’ ethnic or socioeconomic background. Also, historical bias will exist in any clinical database, i.e. when the data no longer accurately reflect a new healthcare reality.

Several variables in the SEM database were originally manually entered or determined subjectively, such as time stamps in the ED and discharge diagnoses, and may therefore contain errors or bias. Diagnoses might also have been registered several times for the same care episode. Bias or errors in the training data will cause a high risk of bias in the final CDSS, but the size and impact of the problem will vary between CDSS. The optimal approach to the potential problem with bias is therefore best determined for each use case and CDSS. Before clinical implementation, any CDSS based on SEM data should be carefully reviewed and prospectively tested in a clinical trial in the specific healthcare setting.

On the other hand, it should be noted that if a CDSS is intended to operate in real time with standard register data as input, it is preferable that the underlying ML model is developed using this type of data rather than curated data that do not reflect the “dirty” truth of day-to-day operations. With sufficiently large training data, current ML algorithms can cope with a fair amount of noise and navigate between varying levels of noise in different types of input data.

In addition to algorithm quality, several barriers to successful implementation and use of AI/ML-based CDSS must be considered: IT problems, low model transparency (black box algorithms), proprietary code, lack of trust and knowledge among physicians and decision-makers, legal framework (oversight, malpractice issues) and ethical issues, integrity risks and financial challenges [18,19,20]. However, the size and implications of these barriers will vary in different use cases.

In conclusion, the SEM cohort provides a platform for collaborative CDSS research. SEM’s large amount of real-world patient data with almost complete follow-up will also allow research in epidemiology, patient management, diagnostics, prognostics, ED crowding, resource allocation, and social medicine.

SEM cohort access

So far, collaborations have been established with other research groups at Lund and Halmstad Universities in Sweden. We welcome initiatives on international collaborative projects using the SEM cohort. Anonymized parts of the SEM database will be available for sharing on reasonable request, as will detailed variable lists. Please contact the corresponding author via email (

Data availability

Anonymized parts of the SEM database will be available for sharing on reasonable request. Please send an email to



Abbreviations

AI: Artificial intelligence
CDSS: Clinical decision support system
ED: Emergency department
ML: Machine learning
RETTS: Rapid emergency triage and treatment system
SEM cohort: Skåne emergency medicine cohort


  1. Muth C, Blom JW, Smith SM, et al. Evidence supporting the best clinical management of patients with multimorbidity and polypharmacy: a systematic guideline review and expert consensus. J Intern Med. 2019;285(3):272–88.

  2. Socialstyrelsen. Väntetider och patientflöden på akutmottagningar. 2015:1–80. (

  3. Lynch S. Andrew Ng: Why AI Is the New Electricity. Stanford Graduate School of Business. (

  4. Obermeyer Z, Emanuel EJ. Predicting the Future - Big Data, Machine Learning, and Clinical Medicine. N Engl J Med. 2016;375(13):1216–9.

  5. Tang X. The role of artificial intelligence in medical imaging research. BJR Open. 2020;2(1):20190031.

  6. Harrou F, Dairi A, Kadri F, Sun Y. Forecasting emergency department overcrowding: a deep learning framework. Chaos Solitons Fractals. 2020;139:110247.

  7. Sudarshan VK, Brabrand M, Range TM, Wiil UK. Performance evaluation of Emergency Department patient arrivals forecasting models by including meteorological and calendar information: a comparative study. Comput Biol Med. 2021;135:104541.

  8. NHS using AI to reduce ‘avoidable’ hospital admissions this winter. The Independent; Nov 14, 2023.

  9. Socialstyrelsen. Digitala vårdtjänster och artificiell intelligens i hälso- och sjukvården. Swedish National Board of Health and Welfare (Socialstyrelsen); 2019.

  10. AIR Lund – Artificially Intelligent use of Registers at Lund University. Lund University. (

  11. Predicare. The RETTS system. (

  12. TietoEvry. Patientliggaren. (

  13. Statistics Sweden. (

  14. Ludvigsson JF, Andersson E, Ekbom A, et al. External review and validation of the Swedish national inpatient register. BMC Public Health. 2011;11:450.

  15. Wettermark B, Hammar N, Fored CM, et al. The new Swedish prescribed Drug Register–opportunities for pharmacoepidemiological research and experience from the first six months. Pharmacoepidemiol Drug Saf. 2007;16(7):726–35.

  16. Brooke HL, Talback M, Hornblad J, et al. The Swedish cause of death register. Eur J Epidemiol. 2017;32(9):765–73.

  17. Khan MS, Usman MS, Talha KM, et al. Leveraging electronic health records to streamline the conduct of cardiovascular clinical trials. Eur Heart J. 2023.

  18. Stewart J, Lu J, Goudie A, et al. Applications of machine learning to undifferentiated chest pain in the emergency department: a systematic review. PLoS ONE. 2021;16(8):e0252612.

  19. Panch T, Mattie H, Celi LA. The inconvenient truth about AI in healthcare. NPJ Digit Med. 2019;2:77.

  20. Goodman KE, Rodman AM, Morgan DJ. Preparing Physicians for the clinical algorithm era. N Engl J Med. 2023;389(6):483–7.

  21. Welch S, Augustine J, Camargo CA Jr, Reese C. Emergency department performance measures and benchmarking summit. Acad Emerg Med. 2006;13(10):1074–80.

  22. American College of Surgeons Committee on Trauma. Resources for optimal care of the injured patient. The American College of Surgeons; 2006.

  23. National Prescribed Drug Register. National Board of Health and Welfare. Jan 14, 2022. (

  24. National Patient Register, Swedish National Board of Health and Welfare. (

  25. National Cause of Death Register. National Board of Health and Welfare. (

  26. Letterstal A, Ekelund U, Castren M, Lindmarker P, Safwenberg U, Kurland L. [SVAR–a unique Swedish emergency registry]. Lakartidningen. 2010;107(43):2659–60.

  27. Ekelund U, Kurland L, Eklund F, et al. Patient throughput times and inflow patterns in Swedish emergency departments. A basis for ANSWER, A National SWedish Emergency Registry. Scand J Trauma Resusc Emerg Med. 2011;19:37.

  28. The Swedish Emergency Care Registry. SVAR. (

  29. Siemens Healthineers. (

  30. GE Healthcare. (

  31. SECTRA. (


We are grateful for the excellent help and project coordination by Cecilia Åkesson Kotsaris, and for the invaluable data management by Paul Söderholm, both at Region Skåne. We also thank the patients for their participation, and the research assistants and the emergency department staff in Region Skåne for their kind help.


The study was supported by an ALF research grant at Skåne University Hospital and by a grant from Region Skåne. This study was part of the AIR Lund (Artificially Intelligent use of Registers at Lund University) research environment and received funding from the Swedish Research Council (VR; grant no. 2019-00198). There was no industry involvement. Funding organizations had no role in the planning, design, or conduct of the study, collection, analysis or interpretation of data, or preparation, review or approval of the manuscript.

Open access funding provided by Lund University.

Author information



UE, BO, and OM conceived the cohort, were responsible for the ethics approval, and generated funding together with JB. JB, MO, JLF and POC provided expert opinion in the design of the cohort and the database. AB led the data management together with AN and made the general data analyses. UE and AB drafted the manuscript. All authors critically revised and approved the final manuscript and meet the criteria for authorship established by the ICMJE.

Corresponding author

Correspondence to Ulf Ekelund.

Ethics declarations

Ethics approval and consent to participate

The creation of the SEM cohort and its use for AI/ML research and cross-sectional analyses has been approved by the Swedish Ethical Review Authority (Dnr 2019–05783) and Region Skåne (KVB 302-19). There is no approval for commercial use of the data. All included patients had access to written information on the SEM cohort and its purpose, and had the possibility to decline participation at any time, for any reason.

Consent for publication

Not applicable.

Competing interests

None of the authors declare competing interests.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Ekelund, U., Ohlsson, B., Melander, O. et al. The skåne emergency medicine (SEM) cohort. Scand J Trauma Resusc Emerg Med 32, 37 (2024).
