  • Original research
  • Open access

The role of a checklist for assessing the quality of basic life support performance: an observational cohort study



Training lay rescuers in Basic Life Support (BLS) is essential to improve bystander cardiopulmonary resuscitation (CPR) rates; in addition, simple methods are needed to provide feedback on CPR performance. This study evaluated whether a simple observational checklist can be used by BLS instructors to adequately measure the quality of BLS performance as an alternative to other feedback devices.


The BLS performances of 152 first-year medical students (aged 21.4 ± 3.9 years) were recorded on video, and objective data regarding the quality of the BLS were documented using Laerdal PC SkillReporting software. The performances were categorized according to quality. Ten BLS instructors observed the videos and completed a ten-point checklist based on the Cardiff Test of BLS (version 3.1) to assess the performances. The validity of the checklist was reviewed using interrater reliability as well as by comparing the checklist-based results with objective performance data.


Matching the checklist-based evaluation with the objective performance data revealed high levels of agreement for very good (82%) and overall insufficient (75%) performances. Regarding the checklist-based evaluation, interrater reliability depended on the checklist item; thus, some items were more easily identified correctly than others. The highest and lowest levels of agreement were observed for the items “undressed torso” and “complete release between compressions” (mean joint probability of agreement 95% and 67%, respectively).


The observational checklist adequately distinguished sufficient from insufficient BLS performances and offered an assessment of items not covered by SkillReporting software, such as the initial assessment or undressing the chest. Although it was less useful for grading intermediate performance groups, the checklist may overall be a useful rating tool in BLS training if objective feedback devices are not available, for example, due to large groups of participants or limited training time.


Worldwide, about 17 million people per year die as a result of cardiovascular diseases [1]. In 25% of these cases, patients experience sudden cardiac death [2]. The high number of patients dying from sudden cardiac death as well as the low success rates of cardiopulmonary resuscitation (CPR) make sudden cardiac death a persisting predicament in patient care and public health [3,4,5].

In order to improve the currently insufficient implementation of CPR measures in out-of-hospital cardiac arrest (OHCA) patients in most countries, it is essential that lay rescuers receive efficient and extensive Basic Life Support (BLS) training [6]. Bystander CPR is crucial to improving survival rates and neurological outcome in OHCA [7,8,9]. Furthermore, lay people trained in BLS are more willing to perform CPR in emergencies [7, 10]. As part of its latest guidelines, the European Resuscitation Council (ERC) recommends providing BLS training to every member of a community [7]. A study conducted in the USA in 2013 revealed that financial factors are a main barrier to learning CPR in low-income environments and showed the necessity for low-cost or free BLS training in order to increase the number of lay people capable of performing CPR [11]. In addition, the implementation of BLS training may be hampered by limited resources, for example, limited funds for CPR courses at schools, which underscores the importance of inexpensive but efficient learning strategies [12].

Several studies have estimated the cost-effectiveness of extensive CPR training for laypersons since economic factors must be considered within health care systems, but further research is needed [13,14,15]. As feedback is an essential part of BLS training, several devices are available to assess CPR performance [16, 17]. For example, directive or audio feedback devices are recommended within the current ERC guidelines to improve the ability to perform CPR [7]. As such high-fidelity devices may not be available in low-income environments or financially weak surroundings, a simpler method of assessment is needed to provide feedback on CPR performance in these settings.

In 1999, Graham et al. tested a scoring system based on simple observation as an inexpensive but effective method to assess CPR performance. The results suggested that an observation-based scoring system is an objective method to reflect the ability to perform BLS [18].

The present study used a simple ten-point checklist modified from the Cardiff Test of BLS [19] to assess BLS performance recorded on video. The objective was to evaluate whether the checklist is a sufficient rating tool and an alternative to SkillReporting software for CPR quality measurement using BLS training manikins.


The data used for this study were acquired within the emergency medicine course for first-year medical students during the first three weeks of their curriculum at the Medical School of RWTH in Aachen, Germany in 2013. In total, 278 first-year medical students were potential participants in the study.

Clinical experts in the field of emergency medicine and medical education designed the checklist based on the Cardiff test of BLS (version 3.1). The Cardiff test lists ten points associated with the quality of CPR based on the established ERC Guidelines.

The study was approved by the ethics committee of the Medical School of RWTH Aachen, Germany (EK- 100/12).


Each participant was confronted with the same standardized scenario. They were expected to resuscitate a collapsed person represented by a BLS manikin (Resusci Anne™, Laerdal, Stavanger, Norway). None of the participants had received BLS training during their medical studies up to this point. Each student performance was recorded on video, and performance data were obtained using Laerdal PC SkillReporting System Software (Version 2.4.1, Laerdal, Stavanger, Norway). The students were guided following a structured protocol and every student received exactly the same instructions. The scenario started the same way every time, as follows:

The participant was asked to enter a room in which a BLS manikin was lying on the floor with a zippered jacket covering the torso. No information about the scenario was provided in advance. The standardized text was read by the course instructor: “Imagine you are witnessing a person collapsing right in front of you. The manikin represents this person. There is no one else nearby. Please take all measures you would take if the manikin was a real person. Keep going until you receive a signal to stop.”

The performance was terminated 120 s after the first external chest compression (ECC). If the participant did not perform CPR, the scenario was stopped after 90 s. No further instructions were provided during the performance.

Measurement and data acquisition

The performance data collected during the assessment were listed in tabular form. The following three measuring criteria were identified as congruent with the ERC guidelines [7] and used in this study to determine the quality of CPR:

  • ≥ 60% correct compression depth

  • Average compression rate of 100–120 min⁻¹

  • ≥ 60% compressions with complete release

Based on the collected data, the participant performances were assigned to categories according to how many of these criteria were met. The categories were color-coded and referred to as a “traffic light classification.” In addition to the three traffic light colors, a black category was defined for those who did not meet any of the criteria:

  • Green: all three criteria were met

  • Yellow: two of the three criteria were met

  • Red: one of the criteria was met

  • Black: none of the criteria were met
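The classification rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `Performance`, its field names, and the function names are hypothetical and do not come from the Laerdal software or the study's own analysis, and both ends of the 100–120 min⁻¹ range are assumed to be inclusive.

```python
from dataclasses import dataclass


@dataclass
class Performance:
    """Hypothetical container for the three measured values of one participant."""
    pct_correct_depth: float      # % of compressions with correct depth
    mean_rate_per_min: float      # average compression rate (per minute)
    pct_complete_release: float   # % of compressions with complete release


def criteria_met(p: Performance) -> list[bool]:
    """Evaluate the three quality criteria listed in the text."""
    return [
        p.pct_correct_depth >= 60.0,
        100.0 <= p.mean_rate_per_min <= 120.0,  # assumed inclusive bounds
        p.pct_complete_release >= 60.0,
    ]


def traffic_light(p: Performance) -> str:
    """Map the number of fulfilled criteria to a color category."""
    n_met = sum(criteria_met(p))
    return {3: "green", 2: "yellow", 1: "red", 0: "black"}[n_met]
```

For example, a performance with 80% correct depth, a mean rate of 130 per minute, and 90% complete release fulfils two criteria and would fall into the yellow category.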

In order to compare the results regarding the quality of ECC assessed by either the Laerdal PC SkillReporting Software or by the checklist-based evaluation, the participant performances were recorded on video from the time that they entered the scenario until they were signaled to stop. Ten experienced BLS instructors were invited to rate the performance of every participant using the checklist. The raters were asked to use a nominal scale (1 = yes, 2 = no) to rate the criteria. The checklist rating criteria were defined as follows:

  • Undressed torso

  • Adequate minimum no-flow time (no longer than 2 s for two rescue breaths)

  • Correct hand position

  • Correct compression depth

  • Correct compression rate

  • Complete release between compressions

  • Arms kept straight

  • Vertical direction of compressions

  • No delay to start CPR

  • Compression-ventilation ratio of 30:2

The same standardized conditions were applied to the raters and the rating process. As a requirement, all raters had to be BLS instructors. The raters were instructed to observe each video for at least one minute and to evaluate the performance by means of the checklist. Soon after data collection, the video rating took place. None of the observers were involved in the training of the medical students whose performances were assessed since instructors assessing their own students reportedly tend to overestimate their competences [19]. The raters were informed in advance that the elements of the checklist were self-explanatory and no questions were answered during the evaluation process.

The Laerdal Resusci Anne with SkillReporting System Software assessed the following five items: correct hand position, correct compression depth and rate, complete release between compressions, and minimum no-flow time. After the study, all data were exported from the software.

At the time the study was performed, the current ERC Guidelines recommended an average compression rate of 100–120 min⁻¹, a compression depth of at least 50 mm, and complete chest recoil after each compression [20].

Statistical methods

The interrater reliability was investigated for every item on the checklist by means of joint probability of agreement (in %) as well as Light’s Kappa (multi-rater version of Cohen’s kappa) in order to determine the agreement between the raters as a quality feature of the checklist as a rating tool. An average Kappa across all rater pairs was determined for every item of the checklist (mean Light’s kappa).
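To make the two agreement measures concrete, the following is a minimal sketch of how they could be computed for one checklist item, using only the Python standard library. It is not the authors' procedure (the study used SPSS); the function names and the data layout (one list of 1 = yes / 2 = no codes per rater) are assumptions made for illustration. Light's kappa is taken here, as in the text, as the mean of Cohen's kappa over all rater pairs.

```python
from itertools import combinations


def joint_agreement(a, b):
    """Proportion of performances on which two raters gave the same code."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Cohen's kappa for two raters: (p_o - p_e) / (1 - p_e)."""
    n = len(a)
    p_o = joint_agreement(a, b)
    categories = set(a) | set(b)
    # Chance agreement from each rater's marginal category frequencies.
    p_e = sum((a.count(c) / n) * (b.count(c) / n) for c in categories)
    if p_e == 1.0:
        # Kappa is undefined when both raters used a single identical code
        # throughout; returning 1.0 is a convention assumed for this sketch.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def mean_over_pairs(ratings, stat):
    """Average a pairwise statistic over all rater pairs (Light's approach)."""
    pairs = list(combinations(ratings, 2))
    return sum(stat(a, b) for a, b in pairs) / len(pairs)


# Hypothetical codes from three raters for four performances on one item.
ratings = [[1, 1, 2, 2], [1, 1, 2, 2], [1, 2, 2, 2]]
mean_agreement = mean_over_pairs(ratings, joint_agreement)
light_kappa = mean_over_pairs(ratings, cohen_kappa)
```

Reporting both values is informative because the joint probability of agreement does not correct for agreement expected by chance, whereas kappa does.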

In order to examine the validity of the checklist items, the results of the checklist-based evaluation were compared to the performance data assessed by the Laerdal PC SkillReporting Software also using the joint probability of agreement and Light’s Kappa.

Sensitivity and specificity (in %) were calculated for the “correct compression rate,” “correct compression depth,” and “complete release between compressions” criteria. For sensitivity calculations, the number of performances correctly detected by the raters as matching the criteria was set as the “true positives.” To identify the true positive rate (sensitivity), the proportion of true positives was calculated among all performances that were classified as correct by the Laerdal PC SkillReporting System. Correspondingly, the specificity or true negative rate was defined as the proportion of performances not matching the criteria, according to the software, which were correctly identified as such by the raters.
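The definitions above can be written out as a short calculation. The sketch below treats the software classification as the reference and one rater's judgment as the test; the function name and the boolean encoding (True = criterion fulfilled) are assumptions for illustration, not part of the original analysis.

```python
def sensitivity_specificity(reference, rated):
    """Return (sensitivity, specificity) of rater judgments against a reference.

    reference: software classification per performance (True = criterion met)
    rated:     rater judgment per performance (True = criterion met)
    """
    pairs = list(zip(reference, rated))
    tp = sum(ref and obs for ref, obs in pairs)              # correct, rated correct
    fn = sum(ref and not obs for ref, obs in pairs)          # correct, rated incorrect
    tn = sum((not ref) and (not obs) for ref, obs in pairs)  # incorrect, rated incorrect
    fp = sum((not ref) and obs for ref, obs in pairs)        # incorrect, rated correct
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical data for five performances.
software = [True, True, True, False, False]
rater = [True, True, False, False, True]
sens, spec = sensitivity_specificity(software, rater)
```

In this toy example, two of the three performances that the reference classifies as correct are recognized by the rater, and one of the two incorrect performances is recognized as incorrect.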

To compare the results of performance data and checklist-based evaluation in terms of the traffic light categories, the “correct compression rate”, “correct compression depth” and “complete release between compressions” checklist criteria were also used to assign the performance to one of the traffic light categories. For one of the criteria to apply, the mean checklist value across all raters for that item had to be less than 1.5 as a nominal scale was used to evaluate the performance (1 = yes, 2 = no). Using the classification by traffic lights for both performance data and checklist-based evaluation, it was possible to identify the number of performances that were assigned to the same traffic light category by both methods.
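The aggregation rule described here (a criterion counts as fulfilled when the mean rater code is below 1.5) and the subsequent category comparison can be sketched as follows. Function names and the example data are hypothetical; the sketch assumes that a mean of exactly 1.5 does not count as fulfilled, which follows from the "less than 1.5" wording.

```python
from statistics import mean


def criterion_applies(rater_codes):
    """A criterion applies if the mean code (1 = yes, 2 = no) is below 1.5."""
    return mean(rater_codes) < 1.5


def checklist_category(depth_codes, rate_codes, release_codes):
    """Assign a traffic light category from the three checklist items."""
    n_met = sum(
        criterion_applies(codes)
        for codes in (depth_codes, rate_codes, release_codes)
    )
    return ("black", "red", "yellow", "green")[n_met]


def share_same_category(software_cats, checklist_cats, category):
    """Among performances the software put in `category`, return the
    proportion that the checklist placed in the same category."""
    matched = [c for s, c in zip(software_cats, checklist_cats) if s == category]
    return sum(c == category for c in matched) / len(matched)


# Hypothetical codes from three raters for one performance.
example = checklist_category([1, 1, 2], [1, 2, 2], [1, 1, 1])
```

Here the depth item (mean 1.33) and the release item (mean 1.0) apply, while the rate item (mean 1.67) does not, so the example performance is assigned to the yellow category.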

All statistical analyses were performed using IBM SPSS Statistics for Windows and Mac, version 23.0 (Armonk, NY: IBM Corp.).


Study population

Of 278 potential participants, 152 were included in the study. All participants were first-year medical students at the medical faculty of RWTH Aachen University and had no relevant medical experience prior to their studies. Their mean age was 21.4 ± 3.9 years (range: 17–39). Among the participants, 67% were female, 26% were male, and 7% did not report their sex. One hundred and twenty-six subjects were excluded due to missing performance data, written consent, or video data.

Observed endpoints

Performance data

The distribution of participants across the traffic light categories showed that only a small number of students achieved an overall adequate CPR performance by fulfilling all criteria (n = 11). The yellow category (fulfilling two out of three criteria) consisted of 52 (34.5%) participants. The largest group was the red category (n = 79; 52.3%), consisting of participants whose performance matched only one criterion. Within the red group, 77 of 79 students achieved a complete release between compressions, while the remaining two showed sufficient compression depth but a complete release in < 60% of compressions. The black group (none of the criteria) contained nine participants (Table 1).

Table 1 Performance data according to traffic light category

Interrater reliability

There were considerable differences in the interrater reliability between the checklist items. While the items “undressed torso” (mean joint probability of agreement 94.9%; mean Kappa 0.866) and “compression-ventilation ratio of 30:2” (mean joint probability of agreement 85.3%; mean Kappa 0.630) were rated consistently across all raters, the item “complete release between compressions” (mean joint probability of agreement 67.2%; mean Kappa 0.295) showed great variation (Table 2).

Table 2 Interrater reliability for all subjects (n = 152)

Matching rater evaluations and performance data

Comparison of the checklist-based evaluation by the raters with the performance data obtained by the Laerdal PC SkillReporting software revealed differences between items in the mean Light’s Kappa. In contrast, the joint probabilities of agreement (%) between raters and software were similar across the items (Table 3).

Table 3 Agreement between performance data and checklist-based evaluation for all subjects (n = 152)

Across all categories, the item “correct compression rate” showed the highest agreement between performance data and checklist-based evaluation (mean joint probability of agreement 72.6%; mean Kappa 0.41). The largest range was observed for the item “complete release between compressions” (mean joint probability of agreement 67.7%; range 47.3–82.7%).

The sensitivity and specificity of the different checklist items were also highest for the item “correct compression rate”, while the item “complete release between compressions” had the lowest sensitivity and specificity. Generally, the sensitivity was slightly higher than the specificity for all items (Table 3).

Regarding the item “compression rate”, a compression rate lower than 100 min⁻¹ was more often correctly identified as wrong (sensitivity mean: 90.0%; range: 77.1–97.1%) than a compression rate higher than 120 min⁻¹ (sensitivity mean: 38.4%; range: 17.1–87.8%).

Concerning the traffic light classification, out of all performances defined as “green” by the performance data (n = 11), 81.8% (n = 9) of the performances were also assigned to the green category using the checklist-based evaluation. In terms of the black category, 75.0% (n = 6) of the participants were allocated correctly using the checklist data.

In contrast, within the yellow category (n = 52), only 50.0% (n = 26) matched that category according to the checklist-based evaluation data. Within the red category (n = 79), the result was even lower (35.4%, n = 28) (Table 4).

Table 4 Distributions of traffic light categories by checklist-based evaluation within traffic light categories of performance data


This observational cohort study evaluated whether an observational checklist was an adequate assessment tool for BLS instructors to estimate the quality of a CPR performance.

The main result was that the use of the observational checklist appropriately distinguished between overall good and overall insufficient performances. This was demonstrated by the allocation of the participants to the green and the black categories based on the checklist in accordance with the objective performance data-based distribution. Regarding all adequate CPR performances (as defined by the skill reporter), 81.8% were identified as such by the checklist-based evaluation. In contrast, the low agreement between the performance data-based and the checklist-based allocation regarding the yellow and the red categories suggests that the use of the checklist is not suitable to differentiate between mediocre performances.

The study further indicated that crucial elements of CPR, such as minimum delay to start CPR, correct compression-ventilation ratio, and undressing the torso, were accurately assessable by simple observation, as shown by the high interrater reliability. These aspects cannot be recorded by skill reporter systems. In contrast, the low interrater reliability for complete release between compressions suggests that this item is not easily identified accurately by simple observation and benefits from SkillReporting software.

Furthermore, the comparison of the sensitivity and specificity suggests that correct performance was easier for the raters to identify, whereas incorrect performance was more difficult to detect.

Graham et al. also suggested that a simple scoring system is a valid method to assess CPR performances. Students were evaluated based on a 10-point checklist and were assigned penalty points when an element was performed incorrectly. The scoring system differentiated between minor, moderate, and serious errors by the number of penalty points. The participants were assigned only to “pass” or “fail” categories without distinguishing the quality of CPR. Their study presented observed and performance data but, unlike the present study, did not compare their results to objectively obtained data from SkillReporting software [18].

In a more recent study, Kim et al. also used a checklist-based evaluation to assess BLS performances in medical students. The checklist consisted of 11 items representing the BLS algorithm, such as initial patient assessment and calling for help, as well as performing CPR. Compression-ventilation ratio and correct hand position were independent items, whereas compression rate and depth were combined into a single item. The participants were assessed as “correct” or “incorrect” for each item and graded on a scale from 1 to 5 for the whole performance. Within their study, the assessment by BLS instructors was compared to self-assessment by the students, both using the same checklist. Interestingly, the analysis showed no significant differences between tutor and self-assessments [21].

Whether the checklist used within our study could also be used for adequate self-assessment by medical students or laypersons is a topic for further study. Additionally, the influence of the implied setting on the applicability of the checklist, for example, in different study populations, requires further investigation.

If the checklist-based evaluation was used to assess real cases of CPR in OHCA, it could be interesting to investigate whether the raters would evaluate performances differently if they were aware of the patient outcomes.

Another point of interest is how the raters are influenced while evaluating a CPR performance by means of the checklist. It is possible that a good performance on most checklist items leads the rater to be more lenient toward an inaccurate performance on other items. Conversely, an altogether poor performance could bias the rater to evaluate each criterion more negatively.

A low-tech feedback device such as the checklist used in the current study might be useful in the implementation of CPR training for large groups such as school classes, where high-fidelity manikins might not be available, for example, due to limited funds. Training schoolchildren in CPR is a highly effective method to improve bystander CPR and patient outcome in OHCA [22,23,24].


Because the performances were recorded from only one perspective, some of the video data could not be assessed by the raters. This limits the use of the checklist for evaluating performances, but only when the rater is unable to observe the performance directly.

Most of the performances were inadequate because untrained laypersons were observed in this study. With mainly negative performances, false positive evaluations carry more weight than false negative ones. For this reason, both sensitivity and specificity were calculated.

In terms of the traffic light categories, compression rates outside the range of 100–120 min⁻¹ were classified as wrong based on the ERC guidelines. Thus, a compression rate of 121 min⁻¹ was valued the same as a rate of 0. Although both performances fail the predefined criterion, they cannot have equally negative effects on patient outcome. In this particular case, the checklist might allow users to distinguish between the two since it is slightly less precise and accepts performances with compression rates very close to the recommended range while still detecting inadequate compression rates with a high specificity.


A simple observational checklist can be used to assess BLS quality and identify sufficient and insufficient performances. In order to provide more detailed feedback concerning CPR, skill feedback devices may be useful in addition to the checklist. The checklist is a valuable assessment tool if high-tech feedback devices are not available or practical, for example, due to high numbers of participants in training groups or limited training time.



BLS: Basic life support

CPR: Cardiopulmonary resuscitation

ECC: External chest compression

OHCA: Out-of-hospital cardiac arrest


  1. World Health Organization. Cardiovascular diseases (CVDs). Fact sheet No317, September 2012. 2012. Accessed 16 May 2018.

  2. Priori SG, Blomström-Lundqvist C, Mazzanti A, et al. 2015 ESC Guidelines for the management of patients with ventricular arrhythmias and the prevention of sudden cardiac death. Eur Heart J. 2015;36:2793–867.

  3. Zheng ZJ, Croft JB, Giles WH, Mensah GA. Sudden cardiac death in the United States, 1989 to 1998. Circulation. 2001;104:2158–63.

  4. Ornato JP, Becker LB, Weisfeldt ML, Wright BA. Cardiac arrest and resuscitation. Circulation. 2010;122:1876–9.

  5. Hasselqvist-Ax I, Riva G, Herlitz J, et al. Early cardiopulmonary resuscitation in out-of-hospital cardiac arrest. N Engl J Med. 2015;372:2307–15.

  6. Kanstad BK, Nilsen SA, Fredriksen K. CPR knowledge and attitude to performing bystander CPR among secondary school students in Norway. Resuscitation. 2011;82:1053–9.

  7. Greif R, Lockey AS, Conaghan P, et al. European resuscitation council guidelines for resuscitation 2015. Section 10. Education and implementation of resuscitation. Resuscitation. 2015;95:288–301.

  8. Perkins GD, Jacobs IG, Nadkarni VM, et al. Cardiac arrest and cardiopulmonary resuscitation outcome reports: update of the Utstein resuscitation registry templates for out-of-hospital cardiac arrest. Resuscitation. 2015;96:328–40.

  9. Kragholm K, Wissenberg M, Mortensen RN, et al. Bystander efforts and 1-year outcomes in out-of-hospital cardiac arrest. N Engl J Med. 2017;376:1737–47.

  10. Sopka S, Biermann H, Rossaint R, et al. Resuscitation training in small-group setting--gender matters. Scand J Trauma Resusc Emerg Med. 2013;21:30.

  11. Sasson C, Haukoos JS, Bond C, et al. Barriers and facilitators to learning and performing cardiopulmonary resuscitation in neighborhoods with low bystander cardiopulmonary resuscitation prevalence and high rates of cardiac arrest in Columbus, OH. Circ Cardiovasc Qual Outcomes. 2013;6:550–8.

  12. Van Raemdonck V, Monsieurs KG, Aerenhouts D, De Martelaer K. Teaching basic life support: a prospective randomized study on low-cost training strategies in secondary schools. Eur J Emerg Med. 2014;21:284–90.

  13. Bouland AJ, Risko N, Lawner BJ, et al. The price of a helping hand: modeling the outcomes and costs of bystander CPR. Prehospital Emerg Care. 2015;19:524–34.

  14. Friesen J, Patterson D, Munjal K. Cardiopulmonary resuscitation in resource-limited health systems-considerations for training and delivery. Prehosp Disaster Med. 2015;30:97–101.

  15. Nichol G, Huszti E, Birnbaum A, Mahoney B, Weisfeldt M, Travers A, et al. Cost-effectiveness of lay responder defibrillation for out-of-hospital cardiac arrest. Ann Emerg Med. 2009;54:226–35.

  16. Zapletal B, Greif R, Stumpf D, et al. Comparing three CPR feedback devices and standard BLS in a single rescuer scenario: a randomised simulation study. Resuscitation. 2014;85:560–6.

  17. Sopka S, Biermann H, Rossaint R, et al. Evaluation of a newly developed media-supported 4-step approach for basic life support training. Scand J Trauma Resusc Emerg Med. 2012;20:37.

  18. Graham CA, Lewis NF. A scoring system for the assessment of basic life support ability. Resuscitation. 2000;43:111–4.

  19. Whitfield RH, Newcombe RG, Woollard M. Reliability of the Cardiff test of basic life support and automated external defibrillation version 3.1. Resuscitation. 2003;59:291–314.

  20. Nolan JP, Soar J, Zideman DA, et al. European resuscitation council guidelines for resuscitation 2010 section 1. Executive summary. Resuscitation. 2010;81:1219–76.

  21. Kim SJ, Choi SH, Lee SW, Hong YS, Cho H. The analysis of self and tutor assessment in the skill of basic life support (BLS) and endotracheal intubation: focused on the discrepancy in assessment. Resuscitation. 2011;82:743–8.

  22. Böttiger BW, Semeraro F, Wingen S. “Kids save lives”: educating schoolchildren in cardiopulmonary resuscitation is a civic duty that needs support for implementation. J Am Heart Assoc. 2017;6(3):e005738.

  23. Plant N, Taylor K. How best to teach CPR to schoolchildren: a systematic review. Resuscitation. 2013;84:415–21.

  24. Abelairas-Gómez C, Rodríguez-Núñez A, Casillas-Cabana M, Romo-Pérez V, Barcala-Furelos R. Schoolchildren as life savers: at what age do they become strong enough? Resuscitation. 2014;85:814–9.


We thank the first-year students of the medical faculty at RWTH Aachen University, Aachen, Germany, and the ten BLS instructors for participating in this study.


This is an author-initiated observational cohort study. No funds were received.

Availability of data and materials

The datasets used and analyzed during the current study are available from the corresponding author upon reasonable request.

Author information



JvD, SS, and SB developed the study concept and design. SS and RR had the primary idea for this study. Data were collected by JvD. SS wrote the ethical approval. JvD was responsible for the data acquisition, analysis, and interpretation. LH supervised the statistical analysis. JvD drafted the article. SS, LV, SB, and RR provided input. HS, LV, and SS revised the article critically for important intellectual content. HS coordinated the submission procedure. All authors read and approved the final version of the manuscript. All authors agreed to be accountable for all aspects of the work.

Corresponding author

Correspondence to Hanna Schröder.

Ethics declarations

Ethics approval and consent to participate

The study was approved by the local ethics committee of the medical faculty at RWTH Aachen University (No. EK- 100/12). All participants and raters provided written consent to contribute to this study.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

van Dawen, J., Vogt, L., Schröder, H. et al. The role of a checklist for assessing the quality of basic life support performance: an observational cohort study. Scand J Trauma Resusc Emerg Med 26, 96 (2018).
