  • Original research
  • Open access

The development of emergency medical services benefit score: a European Delphi study



The helicopter emergency medical services (HEMS) Benefit Score (HBS) is a nine-level scoring system developed to evaluate the benefits of HEMS missions. The HBS has been in clinical use for two decades in its original form. Advances in prehospital care, however, have produced demand for a revision of the HBS. Therefore, we developed the emergency medical services (EMS) Benefit Score (EBS) based on the former HBS. As reflected by its name, the aim of the EBS is to measure the benefits that the whole EMS system produces for patients.


This is a four-round, web-based, international Delphi consensus study with a consensus definition made by experts from seven European countries. Participants reviewed items of the revised HBS on a 5-point Likert scale. A content validity index (CVI) was calculated, and agreement was defined as a CVI of at least 70%. Of the 29 participating experts, 18 were prehospital expert panellists and 11 were in-hospital commentary board members.


The first Delphi round resulted in 1248 intervention examples divided into ten diagnostic categories. After removing overlapping examples, 413 interventions were included in the second Delphi round, which resulted in 38 examples divided into HBS categories 3–8. In the third Delphi round, these resulted in 37 prehospital interventions, which were included as exemplar interventions in the revised version of the score. In the fourth and final Delphi round, the expert panel was given an opportunity to accept or comment on the revised scoring system.


The former HBS was revised using the Delphi method, and the EBS was developed to better represent its underlying purpose. The EBS includes 37 exemplar prehospital interventions to guide its clinical use.

Trial registration The study permission was requested and granted by Turku University Hospital (decision number TP2/010/18).


Evaluating the potential benefits of emergency medical services (EMS) missions is crucial to allocate EMS resources purposefully and to focus the dispatching of advanced-level prehospital units to missions where patients are likely to benefit from their advanced skills. Due to the multifaceted nature of prehospital missions, the benefits of prehospital care are difficult to evaluate [1, 2], and the benefits of advanced prehospital care are continuously subject to debate [3]. Existing scoring systems, such as the National Advisory Committee for Aeronautics (NACA) severity score [4], estimate the severity of a patient's injury or illness. They focus on the severity of the incident and on patient characteristics and do not consider the impact of prehospital care, which limits their use in benchmarking and benefit assessments.

The helicopter emergency medical services (HEMS) Benefit Score (HBS) is a nine-level scoring system developed to evaluate the benefits of HEMS missions in the 1990s in Finland [5]. Each category is defined by a written description along with exemplar interventions which can be used to guide the scorer's choice of category. The highest HBS score is reserved for the most advanced prehospital interventions, but the idea is to evaluate the benefit produced by the whole EMS system, not only HEMS units. The scoring system has been used in the Finnish HEMS units since 1997, originally to follow the benefit of the HEMS launched at that time, but nowadays also to compare individual national HEMS units and to collect data for administration purposes.

Despite the everyday use of the HBS in Finnish HEMS for over two decades, its validity has not been studied at all, and its reliability has been studied only recently [5, 6]. In these studies, the inter-rater reliability of the HBS varied from poor to substantial or almost perfect, and the mean differences between raters and reference values were substantial [5, 6]. As the scoring is guided by exemplar interventions, it can be argued that reliability could be improved by more detailed and comprehensive examples. Additionally, it has been suggested that the exemplar interventions should be updated to meet current treatment guidelines [5].



The aim of this study is to develop a score to measure the benefits of prehospital interventions to a single patient. The score is based on the HBS, but the old exemplar interventions are replaced by more relevant examples. The purpose of these updated examples is to cover the most common prehospital mission types and to make evaluating the effectiveness of prehospital treatments easier and more accurate. Because this evaluation tool is appropriate for the whole EMS system, the score is renamed the EMS Benefit Score (EBS).

Design and setting

This is a four-round, web-based, international Delphi study using expert panel consensus. The technique involves a panel of experts who are asked to complete a series of questionnaires focusing on their opinions, predictions and judgements on a topic of interest. The Delphi technique is widely used in health research to obtain consensus in serial surveys, which are referred to as “rounds”. Key elements of the technique are (1) expert participants, (2) anonymity and individuality, and (3) a summary of results of the former round at the start of each round [7, 8]. The data collection, Delphi rounds and data analysis of the current study were performed from 3.12.2018 to 19.11.2020. A pilot study was performed prior to the actual study to evaluate the study setting. The pilot study participants consisted of Finnish and Danish prehospital physicians who did not participate in the planning of the study or in the actual study.

The work of the expert panel and the commentary board was carried out in four Delphi rounds as follows:

  1. Each expert panellist was asked to list both common and rare examples of prehospital treatments and interventions and, based on their current knowledge and personal experience, to place them into HBS categories 3–8 as comprehensively as possible. The examples were listed in subsections based on ten complaint-based diagnosis groups: “acute neurology excluding stroke”, “breathing difficulties”, “cardiac arrest”, “chest pain”, “infection”, “obstetrics including child birth”, “other”, “psychiatry including intoxication”, “stroke” and “trauma”. These diagnosis groups are recommended in prehospital reporting [9]. The answers were collected anonymously into an electronic data sheet by a data-collection officer, who did not participate in the example selection but gathered the suggestions in a common table. HBS categories 0–2 were excluded from the study because they are used for scoring when a prehospital intervention is deemed unnecessary or the patient was not met. A commentary board commented on the data gathered from the first Delphi round in the diagnosis groups related to their individual specialties. These comments were shown to the expert panel in the second Delphi round to help them rate the examples on a 5-point Likert scale. Identical suggestions from the first round were combined and overlapping examples removed for the second Delphi round.

  2. The examples from the first Delphi round, together with the commentary board’s opinions, were set in a table and sent back to the panellists, who were asked to rate each item on a 5-point Likert scale from 1 (strongly disagree) to 5 (strongly agree). A content validity index (CVI) was calculated for each example, and at least 70% of the experts were required to assign a suggested example a high-agreement score (4 or 5) for it to be included in the third Delphi round. Overlapping examples were then removed.

  3. In the third Delphi round, the remaining examples were listed in their suggested HBS categories. The expert panellists were asked to assign each of these remaining examples one of the following labels: “Accept”, “Delete” or “Relocate to EBS category number __”. An acceptance rate of 70% or more was required to assign an example to a category. The examples with acceptance rates below 70% were either deleted or relocated to the category with the most “Relocate” suggestions, whichever option had the higher percentage.

  4. In the final Delphi round, the EBS was revealed to the prehospital expert panellists, who were offered an opportunity to comment on it or accept it in that form.
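The third-round rule described above can be sketched in code. This is only an illustration, not the study's actual procedure or software: the vote encoding (`"accept"`, `"delete"`, `"relocate:<category>"`) and the function name are hypothetical, the "higher percentage" comparison is assumed to be between all "Delete" votes and all "Relocate" votes, and ties are assumed to resolve to deletion because the text does not specify a tie rule.

```python
from collections import Counter
from typing import List, Optional, Tuple

ACCEPT_THRESHOLD = 0.70  # from the text: acceptance rate of 70% or more


def round_three_decision(votes: List[str]) -> Tuple[str, Optional[int]]:
    """Return ("accept" | "delete" | "relocate", target category or None).

    Each vote is "accept", "delete" or "relocate:<category>" (hypothetical
    encoding chosen for this sketch).
    """
    if not votes:
        raise ValueError("at least one vote is required")

    accept_share = sum(v == "accept" for v in votes) / len(votes)
    if accept_share >= ACCEPT_THRESHOLD:
        return ("accept", None)

    delete_votes = sum(v == "delete" for v in votes)
    # Count the suggested target category of every relocation vote.
    targets = Counter(
        int(v.split(":", 1)[1]) for v in votes if v.startswith("relocate:")
    )
    relocate_votes = sum(targets.values())

    # Assumption: compare total "Delete" votes with total "Relocate" votes;
    # a tie falls back to deletion.
    if relocate_votes > delete_votes:
        category, _ = targets.most_common(1)[0]
        return ("relocate", category)
    return ("delete", None)
```

With 18 panellists, for example, 13 "Accept" votes (about 72%) keep an example in its category, whereas 10 "Accept" votes send it to the delete-or-relocate comparison.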

In addition to these Delphi rounds, each phase included an opportunity for free comments on the exemplar interventions and category descriptions.


Two expert groups were formed for the study: a prehospital expert panel and a separate commentary board. Experts were recruited with open letters: the prehospital expert panel via the European Prehospital Research Alliance (EUPHOREA) and the commentary board via national Finnish specialty societies. The participants were selected based on their individual clinical and scientific experience. The prehospital expert panel ultimately included 18 prehospital physicians from Scandinavia and Northern Europe, and the commentary board included 11 Finnish in-hospital physicians from seven specialties. The total number of study experts was 29. Table 1 presents the characteristics of the 18 prehospital expert panellists. Physicians from intensive care, traumatology, cardiology, neurology, neurosurgery, paediatrics and obstetrics were recruited for the commentary board. Members of the commentary board were recruited to give an in-hospital viewpoint, and therefore they did not have prior or current prehospital experience.

Table 1 Characteristics of the 18 prehospital expert panellists

Statistical methods

This study used the Delphi method and expert consensus. Data handling and collection were performed using Webropol 3.0 by the Webropol Group. A 5-point Likert scale was used in the second Delphi round, and a CVI was calculated for the collected data by Webropol 3.0. Agreement was defined as at least 70% of the experts rating a suggested example with a high-agreement score (4 or 5) [10].
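The agreement criterion can be illustrated with a short calculation. The sketch below is not the Webropol implementation; it assumes that the item-level CVI is the proportion of experts who rated an example 4 or 5, which matches the definition of agreement given above. The function names and the example data are hypothetical.

```python
from typing import Dict, List

AGREEMENT_THRESHOLD = 0.70  # from the text: at least 70% of experts rate 4 or 5


def item_cvi(ratings: List[int]) -> float:
    """Proportion of Likert ratings (1-5) that are high-agreement scores (4 or 5)."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must be on the 5-point Likert scale")
    return sum(1 for r in ratings if r >= 4) / len(ratings)


def retained_examples(all_ratings: Dict[str, List[int]]) -> List[str]:
    """Names of the examples whose CVI reaches the agreement threshold."""
    return [
        name
        for name, ratings in all_ratings.items()
        if item_cvi(ratings) >= AGREEMENT_THRESHOLD
    ]


# Hypothetical ratings from 18 panellists for two made-up examples.
ratings = {
    "example A": [5] * 9 + [4] * 4 + [2] * 5,  # 13 of 18 high-agreement ratings
    "example B": [4] * 12 + [3] * 6,           # 12 of 18 high-agreement ratings
}
print(retained_examples(ratings))
```

With 18 panellists, an example therefore needs at least 13 ratings of 4 or 5 to pass, since 12 of 18 is roughly 67%.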


By Finnish law, no ethical approval was needed for this study because no patients or personal data were involved. The study permission was requested and granted by Turku University Hospital (decision number TP2/010/18). The study subjects participated voluntarily. The Standards for Reporting Qualitative Research (SRQR) guidelines by the EQUATOR network were followed in reporting the study.

Patient and public involvement

No patients were involved.


The first Delphi round resulted in 1248 examples from 18 expert panellists divided into HBS categories 3–8 in ten complaint-based subsections. Seven of the responders gave free comments; each Delphi round included sections for free written comments. Figure 1 describes the course of the Delphi rounds, and Additional files 1 and 2 present the materials of the second and third Delphi rounds.

Fig. 1 The course of the Delphi rounds in the study

Table 2 presents the final form of the scoring system, and additional materials present the expert panellists’ free comments. The definitions of the score categories were kept in their original forms, and no free comment was related to the content of these written definitions. In the fourth Delphi round, one participant suggested moving “Administration of tranexamic acid” from EBS 4 to EBS 6 based on current scientific evidence, and this change was made.

Table 2 The EBS


In this study, we updated the HEMS Benefit Score by using the Delphi method to meet the current needs of prehospital emergency care. The structure of nine-level numerical scoring categories, inherited from the original HBS, remained intact, but the exemplar interventions in each category were completely revised. With this revision, the scoring system was expanded from HEMS usage to cover all prehospital emergency care, including non-HEMS units, and to better meet present-day needs. The renamed score, EBS, better represents the fundamental features of this scoring system and encourages non-HEMS units to utilise it in their practice.

The EBS focuses on interventions that are performed prehospitally and considers the impact of these interventions on treated patients. In this way, the EBS aims to evaluate the true benefit of EMS for individual patients. In contrast, other scores and classifications used in prehospital settings, such as the American Society of Anesthesiologists Physical Status Classification System (ASA-PS) or NACA [5, 6, 9], describe patient background characteristics and acute clinical status. These scores do not evaluate the influence of prehospital care and were not originally built or implemented for prehospital use, so their reliability in prehospital settings is questionable [6].

The revised scoring examples are expected to improve the selection of the correct benefit category. After each EMS mission, the EMS personnel responsible for documenting the mission choose a suitable benefit category depending on the individual mission circumstances. Even though the revised examples represent the consensus opinion of the experts and give guidance for the benefit category selection, the scoring is ultimately based on the subjective judgement of the person doing the documentation. This is because the revised examples are not comprehensive, even if they are versatile. Additionally, it is justifiable to deviate from the score suggested by the exemplar interventions if, for example, the patient has benefited from several interventions or fast air transport or if, on the other hand, the interventions performed have been unnecessary or ineffectual. Despite the subjective nature of the EBS, it can serve as a valuable tool for gathering information on one aspect of prehospital missions, as the effectiveness of prehospital emergency care is a highly complex whole and a fully inclusive scoring system for this purpose does not exist.

During the Delphi process, the benefit category examples were revised, but the numerical scoring categories remained intact, as it was judged unreasonable to evaluate the number of categories during the same process. These numerical categories were originally developed based on practical experience, so they lack a scientific basis, and the categories or their number might be inappropriate. This issue must be taken into account in future studies, which should assess whether the categories need revision.

To evaluate the effectiveness of prehospital care, various quality indicators and measurement protocols have been launched [1, 11, 12, 13], but few studies have focused on their implementation or outcomes. A single scoring system does not solve the absence of process control in EMS systems, but combined with other measures, the EBS can support internal quality improvement. For example, EMS unit dispatch codes and criteria can be compared with the assigned EBS values, that is, with the benefit that EMS produced for prehospitally treated patients as interpreted by the treating clinician. Beyond accurately dispatching the proper level and number of EMS units, however, EMS system coverage and the geographic locating of units remain challenges [14, 15]. The type and number of missions historically presented in the areas under observation are important aspects in locating EMS units and bases. With the EBS, additional information on regional missions can be gathered. However, far-reaching conclusions based on the EBS are not justified until its reliability and validity have been studied in various settings.

Strengths and limitations

The international expert panel improved the EBS’s generalisability. Despite variations in EMS systems between countries, the EBS evaluates the potential advantages for prehospital patients regardless of the level of the treating EMS unit, the only exception being the highest EBS category, which is reserved for treatments usually offered by only advanced-level units.

The Delphi technique in this study enabled a panel of 18 experienced panellists to express their opinions freely and impersonally, guided by the opinions of 11 in-hospital experts from seven specialties. This method limits dominance by eminent, eloquent or highly opinionated individuals in their respective fields of expertise [7, 8], and the panel moderator is less likely to bias the work of the panel. The Delphi method gives panellists substantial time to express their ideas, reflect on their answers and make changes, and it avoids geographical constraints. On the other hand, the Delphi method itself is vulnerable to a loose definition of an expert, and biases might influence participant selection. The method is also dependent on questionnaire design [7, 8].

A major limitation of this study is that there are limited data on the impact of several prehospital interventions, such as prehospital airway management [16, 17]. An intervention may or may not be life-saving, depending on context. However, in the absence of thorough research-based data on the impact of different interventions, a consensus opinion of experts is meaningful. In addition, no evidence currently exists of paramedics’ ability to predict mortality.


The EBS is based on the subjective opinion of the attending prehospital clinician. To make the scoring system less dependent on individual variation, the renewed exemplar interventions in each EBS category support the selection of the appropriate category. The revised EBS can be used to benchmark different types of units, which enables quality control and supports the development of EMS efficiency. The assigned EBS values can be compared with in-hospital interventions and patient outcomes to evaluate the adequacy of prehospital care. For example, a person unconscious due to alleged alcohol intoxication is given EBS 2 on paramedic evaluation but needs rapid sequence intubation upon arrival in the emergency department. In such a case, the EBS could be used to detect the discrepancy and to study why it occurred, thereby serving system quality control. Moreover, if patients with a low EBS receive intensive care or emergency procedures in hospital, this should raise questions about the quality of the prehospital evaluation of the patients’ condition. Finally, this scoring system can be used to categorise prehospital interventions in clinical studies on EMS performance and to gather more data on where and in which types of missions patients are likely to benefit most. In the future, the EBS could optimally be linked to the care patients receive in hospital and to their later level of performance. However, further reliability and validity studies are needed before wide-scale implementation.
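The quality-control comparison described above could be sketched as a simple screening step. The sketch is purely illustrative and does not describe an existing registry or tool: the record structure, the field names and the cut-off for a "low" EBS (here EBS 2 or below, chosen only to match the intoxication example) are all assumptions.

```python
from dataclasses import dataclass
from typing import List

LOW_EBS_MAX = 2  # assumption: treat EBS 0-2 as "low" for screening purposes


@dataclass
class Mission:
    """Hypothetical linked prehospital and in-hospital record."""
    mission_id: str
    ebs: int                          # EBS assigned by the prehospital clinician
    emergency_procedure_in_hospital: bool
    admitted_to_intensive_care: bool


def flag_for_review(missions: List[Mission]) -> List[str]:
    """Return IDs of missions with a low EBS but intensive in-hospital care."""
    return [
        m.mission_id
        for m in missions
        if m.ebs <= LOW_EBS_MAX
        and (m.emergency_procedure_in_hospital or m.admitted_to_intensive_care)
    ]


# Made-up records; "M2" mirrors the intoxication example in the text.
missions = [
    Mission("M1", ebs=6, emergency_procedure_in_hospital=True, admitted_to_intensive_care=True),
    Mission("M2", ebs=2, emergency_procedure_in_hospital=True, admitted_to_intensive_care=False),
    Mission("M3", ebs=1, emergency_procedure_in_hospital=False, admitted_to_intensive_care=False),
]
print(flag_for_review(missions))
```

A flagged mission is only a prompt for case review, not evidence of inadequate care, because the reliability and validity of the EBS have not yet been established.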


Using the Delphi method, the new scoring system, the EBS, was formed by a panel of experienced experts from across Northern Europe. We recommend implementing the EBS in every EMS system as part of routine reporting.

Availability of data and materials

The datasets analysed in this study are available from the corresponding author upon reasonable request.



EBS: EMS Benefit Score

EMS: Emergency medical service

EUPHOREA: European Prehospital Research Alliance

HBS: HEMS Benefit Score

HEMS: Helicopter emergency medical service

P-EMS: Physician-staffed emergency medical service

SRQR: Standards for Reporting Qualitative Research


  1. Murphy A, Wakai A, Walsh C, et al. Development of key performance indicators for prehospital emergency care. Emerg Med J. 2016;33:286–92.

  2. Saviluoto A, Björkman J, Olkinuora A, et al. The first seven years of nationally organized helicopter emergency medical services in Finland—the data from quality registry. Scand J Trauma Resusc Emerg Med. 2020;28(1):66.

  3. McLean SA, Maio RF, Spaite DW, et al. Emergency medical services outcomes research: evaluating the effectiveness of prehospital care. Prehosp Emerg Care. 2009;6(sup 2):S52–6.

  4. Raatiniemi L, Mikkelsen K, Fredriksen K, et al. Do pre-hospital anaesthesiologists reliably predict mortality using the NACA severity score? A retrospective cohort study. Acta Anaesthesiol Scand. 2013;57(10):1253–9.

  5. Raatiniemi L, Liisanantti J, Tommila M, et al. Evaluating helicopter emergency medical missions: a reliability study of the HEMS benefit and NACA scores. Acta Anaesthesiol Scand. 2017;61:557–65.

  6. Heino A, Laukkanen-Nevala P, Raatiniemi L, et al. Reliability of prehospital patient classification in helicopter emergency medical service missions. BMC Emerg Med. 2020;20(1):42.

  7. Polit D, Beck C. Nursing research—principles and methods. Lippincott, Williams & Wilkins/Wolters Kluwer; 2004.

  8. Diamond I, Grant R, Feldman B, et al. Defining consensus: a systematic review recommends methodologic criteria for reporting of Delphi studies. J Clin Epidemiol. 2014;67:401–9.

  9. Kruger AJ, Lockey D, Kurola J, et al. A consensus-based template for documenting and reporting in physician-staffed pre-hospital services. Scand J Trauma Resusc Emerg Med. 2011;19:71.

  10. Polit D, Beck C, Owen S, et al. Focus on research methods. Is the CVI an acceptable indicator of content validity? Appraisal and recommendations. Res Nurs Health. 2007;30:459–67.

  11. Cairns CB, Garrison HG, Hedges JR, et al. Development of new methods to assess the outcomes of emergency care. Acad Emerg Med. 1998;5:157–61.

  12. Haugland H, Rehn M, Klepstad P, et al. Developing quality indicators for physician-staffed emergency medical services: a consensus process. Scand J Trauma Resusc Emerg Med. 2017;25:14.

  13. Rehn M, Krüger AJ. Quality improvement in pre-hospital critical care: increased value through research and publication. Scand J Trauma Resusc Emerg Med. 2014;22:34.

  14. Pappinen J, Laukkanen-Nevala P, Mäntyselkä P, et al. Development and implementation of a geographical area categorisation method with targeted performance indicators for nationwide EMS in Finland. Scand J Trauma Resusc Emerg Med. 2018;26:41.

  15. Røislien J, Van den Berg PL, Lindner T, et al. Comparing population and incident data for optimal air ambulance base locations in Norway. Scand J Trauma Resusc Emerg Med. 2018;26:42.

  16. Fullerton JN, Roberts KJ, Wyse M. Should non-anaesthetists perform pre-hospital rapid sequence induction? An observational study. Emerg Med J. 2011;28(5):428–31.

  17. Van der Velden MWA, Ringburg AN, Bergs EA, et al. Prehospital interventions: time wasted or time saved? An observational cohort study management in initial trauma care. Emerg Med J. 2008;25(7):444–9.


First, the authors acknowledge Dr Janne Reitala for developing the original HEMS Benefit Score and for his support of the current study. Second, the authors acknowledge the EUPHOREA network for assistance in recruiting the prehospital expert panel. Third, the authors thank Mr Jarmo Määttänen, RN, for his outstanding contribution to the development of the data collection platform. Fourth, the authors thank Professor Erika Frischknecht Christensen, Dr Morten Føhrby Overgaard and the FinnHEMS 20 – Turku physicians for their participation in the pilot study.

Collaborators in the pre-hospital expert panel: Andreas Krüger, The Norwegian Air Ambulance Foundation, Norwegian University of Science and Technology, St. Olavs University Hospital, Trondheim, Norway; Fabrice Dami, Emergency Department, Lausanne, Switzerland; Didier Moens, Liège University Hospital and University of Liège, Emergency Department, Belgium; Espen Fevang, The Norwegian Air Ambulance Foundation and Department of Anesthesiology and Intensive Care, Stavanger University Hospital, Stavanger, Norway; Heini Harve-Rytsälä, Emergency Medicine and Services, University of Helsinki and Helsinki University Hospital, Finland; Helena Jäntti, Center for Prehospital Emergency Care, Kuopio University Hospital, Finland; Jouni Nurmi, Research and Development Unit, FinnHEMS Ltd, Vantaa, Emergency Medicine Services, Helsinki University Hospital, and Department of Emergency Medicine, University of Helsinki, Finland; Kristin Tønsager, The Norwegian Air Ambulance Foundation and Department of Anesthesiology and Intensive Care, Stavanger University Hospital, Stavanger, Norway; Leif Rognås, Danish Air Ambulance, Denmark; Marius Rehn, Department of Research and Development, Norwegian Air Ambulance Foundation, Air Ambulance Department, Division of Prehospital Services, Oslo University Hospital, Faculty of Health Sciences, University of Stavanger, Norway; Patrick Schober, Department of Anesthesiology, Amsterdam University Medical Center, Vrije Universiteit Amsterdam, The Netherlands; Per P. Bredmose, Air Ambulance Department, Division of Prehospital Services, Oslo University Hospital, Norway; Peter Martin Hansen, Danish Air Ambulance, Region of Central Denmark, Aarhus, Denmark; Peter Temesvari, Hungarian Air Ambulance, Hungary; Søren Mikkelsen, The Prehospital Research Unit, Region of Southern Denmark, Odense University Hospital, Denmark; Thomas W. Lindner, Regional Centre for Emergency Medical Research and Development in Western Norway (RAKOS), Quality and Patient Safety EMS, Stavanger University Hospital, Norway; Troels Martin Hansen, Danish Air Ambulance, Denmark; and Vesa Lund, Department of Prehospital and Emergency Care, Satakunta District Hospital, Finland.

Collaborators in the commentary board: Anna Nikula, New Children's Hospital, Helsinki University Hospital and University of Helsinki, Finland; Anne-Mari Kantanen, Kuopio University Hospital Neurocenter, Department of Neurology, Finland; Antti E Lindgren, Department of Neurosurgery, Kuopio University Hospital and University of Eastern Finland, Kuopio, Finland; Heli Salmi, Department of Anaesthesia and Intensive Care, New Children's Hospital, University of Helsinki and Helsinki University Hospital, Finland; Karri Kirjasuo, Department of Orthopedics and Traumatology, Turku University Hospital and University of Turku, Finland; Marjut Varpula, Cardiology, Heart and Lung Center, University of Helsinki, Helsinki University Hospital, Helsinki, Finland; Matti Reinikainen, Department of Anaesthesiology and Intensive Care, Kuopio University Hospital and University of Eastern Finland, Kuopio, Finland; Nanneli Paalasmaa, Department of Obstetrics and Gynecology, Turku University Hospital, Finland; Outi Peltoniemi, Department of Children and Adolescents, Oulu University Hospital, Finland; Teemu Luoto, Department of Neurosurgery, Tampere University Hospital and Tampere University, Finland; and Ville Jalkanen, Department of Intensive Care, Tampere University, Tampere University Hospital, Finland.


No funding was received for this study.

Author information


AH, LR, TI, MM, JL and MT contributed to the conception and design of the study. AH, LR, TI, MM, JL and MT contributed to acquiring data and substantially contributed to the drafting and revision of the manuscript. AH, LR, TI, MM, JL and MT contributed to the analysis of the data and approved the final manuscript. AH, LR, TI, MM, JL and MT agreed to be personally accountable for their own contributions and to ensure that questions related to the accuracy or integrity of any part of the work, even those in which a certain author was not personally involved, were appropriately investigated and resolved, the resolution of which would thereafter be documented in the literature. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Anssi Heino.

Ethics declarations

Ethics approval and consent to participate

By Finnish law, no ethical approval was needed for this study because no patients or personal data were involved. The study permission was requested and granted by Turku University Hospital (decision number TP2/010/18). The study subjects participated voluntarily.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information


Collaboration Group: Participants listed in acknowledgements

Supplementary Information

Additional file 1.

The material of second Delphi round: specialty comments, and Likert-Scale distribution of intervention examples given by prehospital experts.

Additional file 2.

The material of third Delphi round.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article


Cite this article

Heino, A., Raatiniemi, L., Iirola, T. et al. The development of emergency medical services benefit score: a European Delphi study. Scand J Trauma Resusc Emerg Med 29, 151 (2021).
