Study design and ethics statement
This prospective study was designed to analyse the impact of written peer assessment, based on a quantitative questionnaire, on the lecturers’ performance in a lecture series in emergency medicine.
As stated by the ethics board of the medical faculty of J.W. Goethe University Hospital, Frankfurt, Germany, ethical approval was not required for this study. Research on educational methods is required under the German regulations on the licence to practice medicine and is supported by the medical faculty.
Participants
The study participants were physicians from different disciplines who, in their role as medical teachers, serve as lecturers in the lecture series on emergency medicine for undergraduate medical students at Johann Wolfgang Goethe University, Frankfurt/Main, Germany.
Data were obtained from all lecturers regarding age, years of lecturing experience, and training in medical education (e.g. instructor training). Prior to the beginning of the study, all participants provided written informed consent to take part and to be videotaped during their lectures.
Study protocol
The analysed lecture series is part of the obligatory curriculum of emergency medicine for undergraduate medical students at Frankfurt Medical School. The emergency medicine curriculum consists of a longitudinally structured program with educational units in nearly all semesters of the four years of clinical studies within the six-year program, a structure designed to regularly reinforce and deepen the basic theoretical and practical skills during clinical training [17],[18].
The interdisciplinary lecture series is scheduled for 3rd-year undergraduate medical students and takes place once per year over an 8-week period from January to March. During this period, the lectures are scheduled twice per week. The lectures cover the main cardinal symptoms of in-hospital as well as out-of-hospital emergency medicine, together with their algorithm-based treatment and management. Furthermore, topics such as teamwork, the management of human resources, and medical errors are integrated. Depending on the extent of the topic, a single lecture lasts 45 minutes (n = 10) or 90 minutes (n = 11), resulting in a total of 21 lectures. Four of the 90-minute lectures are conducted jointly by two lecturers in an interdisciplinary approach.
Student attendance at the lectures is optional. However, the lecture series ends with an obligatory 20-item multiple-choice examination; passing the examination is a prerequisite for participating in the further emergency medicine curriculum.
Measurement
The study measurements took place from January to March 2011 (lecture series 1) and from January to March 2012 (lecture series 2). Two months before the second lecture series, all participating lecturers received standardised written peer feedback on their lecturing performance. For the peer feedback, two cameras videotaped each lecture. A fixed camera at the back of the lecture hall captured both the slides and the lecturer in the auditorium. The second camera focused directly on the lecturer to capture gestures and facial expressions. The lecturer’s speech was recorded via a microphone connected to the lecture hall camera.
Each lecture was transcribed into a timeline covering the timing of the different sections of each lecture (e.g. introduction and presentation of learning objectives) as well as the existence and duration of interactive parts (e.g. a question and answer section).
In the second step, each lecture was viewed independently by two peer reviewers using a standardised assessment instrument to provide written documentation and feedback. The video review room was equipped with a large TV screen that could display the recordings from both cameras simultaneously on a split screen with optimised sound.
The assessment instrument was based on the criteria defined in existing literature regarding effective lecturing behaviours, skills, and characteristics [1],[6],[7],[9],[12],[19]-[21] and the validated peer assessment instrument for lectures reported by Newman et al. [14],[22]. The 21-item instrument is divided into three categories: content/structure (10 items), visualisation (5 items), delivery (6 items) (Figures 1, 2, 3).
Each item was rated on a 5-point Likert scale (from 5 = excellent demonstration to 1 = does not demonstrate/present/poor), with descriptive benchmarks for the excellent (5), adequate (3) and poor (1) performance levels [14],[22]. Furthermore, areas of strength were noted, and suggestions for improving weaknesses in lecturing performance were made.
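For illustration only, the structure of the instrument and its rating scale could be represented as in the following minimal Python sketch; the individual item names are hypothetical placeholders, and only the category sizes and benchmark anchors are taken from the description above (this is not the software used in the study).

```python
# Illustrative sketch only: one possible representation of the 21-item instrument.
# Item names are hypothetical placeholders; category sizes and benchmark anchors
# follow the description in the text.
INSTRUMENT = {
    "content/structure": [f"content_item_{i}" for i in range(1, 11)],        # 10 items
    "visualisation":     [f"visualisation_item_{i}" for i in range(1, 6)],   # 5 items
    "delivery":          [f"delivery_item_{i}" for i in range(1, 7)],        # 6 items
}

# 5-point Likert scale with descriptive benchmarks at levels 5, 3 and 1.
BENCHMARKS = {
    5: "excellent demonstration",
    3: "adequate demonstration",
    1: "does not demonstrate/present/poor",
}

assert sum(len(items) for items in INSTRUMENT.values()) == 21
```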
All four reviewers were physicians with training in emergency medicine and specific didactic training (a postgraduate Master of Medical Education (MME) degree or current enrolment in an MME program). They were therefore already acquainted with the assessment instrument, having used it to assess fellow students’ presentations during their postgraduate studies. For this study, all raters received an additional 3-hour training session in which they watched several 15-minute excerpts of previous lectures. They shared their scores and discussed the observed behaviours that had led them to choose a particular performance score for each assessment item. Proper rater training is crucial for reducing variability in the instrument’s inter-rater agreement by increasing the accuracy and consistency of performance assessment ratings [14],[22]. During the training, the raters learned to avoid common rater errors (e.g. the halo effect and central tendency) and discussed behaviours indicative of each performance dimension until consensus was reached [21],[23]. Each lecture was reviewed by two raters. The ratings were analysed as described in the ‘Data analysis’ section.
At the end of each lecture in emergency medicine, the students were asked, on a voluntary basis, to evaluate the lecture with a 3-item questionnaire (overall lecture quality, didactics, and delivery/presentation) using a 5-point Likert scale. These evaluations were used to analyse changes in the students’ ratings of the lecturers.
In November 2011, two months prior to the beginning of the next lecture series, each lecturer participating in this study received a copy of the lecture observation schedule, the assessment instrument with the raters’ written feedback, and the students’ evaluations.
Each lecture in the second series was recorded as described for the first part of the study, and the reviewer training, review process, and student evaluations were repeated as described above.
Data analysis
The statistical analysis was performed using Microsoft Excel for the epidemiological data and the evaluations, and SPSS 17 for the checklist results. After a Gaussian distribution of the data was verified, values are presented as mean ± standard deviation. The kappa coefficient was computed to determine inter-rater reliability. Differences in scores between the two groups (no didactic training versus didactic training) were analysed using Student’s t-test for independent samples. Differences between the ratings before and after the intervention were analysed using Student’s t-test for dependent samples.
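As a minimal sketch of the analyses described above (not the Excel/SPSS procedures actually used in the study), the same statistics could be computed in Python; all array values below are hypothetical examples, not study data.

```python
# Minimal sketch of the described analyses, assuming item ratings and lecture
# scores are available as arrays. All values are hypothetical examples.
import numpy as np
from scipy import stats
from sklearn.metrics import cohen_kappa_score

# Inter-rater reliability: kappa between the two raters' item scores (1-5 Likert).
rater_a = np.array([5, 4, 3, 5, 2, 4, 4, 3])
rater_b = np.array([5, 4, 4, 5, 2, 3, 4, 3])
kappa = cohen_kappa_score(rater_a, rater_b)

# Descriptive statistics: mean ± standard deviation (after checking for normality).
post_scores = np.array([3.6, 3.9, 3.4, 4.0, 3.5])
mean, sd = post_scores.mean(), post_scores.std(ddof=1)

# Group comparison (no didactic training vs. didactic training):
# Student's t-test for independent samples.
untrained = np.array([3.1, 3.4, 2.9, 3.6])
trained = np.array([3.8, 4.1, 3.9, 4.3])
t_ind, p_ind = stats.ttest_ind(untrained, trained)

# Pre/post comparison for the same lecturers (before vs. after the feedback):
# Student's t-test for dependent (paired) samples.
pre_scores = np.array([3.2, 3.5, 3.0, 3.7, 3.4])
t_dep, p_dep = stats.ttest_rel(pre_scores, post_scores)

print(f"kappa={kappa:.2f}, mean={mean:.2f}±{sd:.2f}, "
      f"independent t p={p_ind:.3f}, paired t p={p_dep:.3f}")
```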