
Measurement of perceptions of educational environment in evidence-based medicine
Anne-Marie Bergh1, Jackie Grimbeek1, Win May2, A Metin Gülmezoglu3, Khalid S Khan4, Regina Kulier3, Robert C Pattinson1

1Medical Research Council Unit for Maternal and Infant Health Care Strategies, University of Pretoria, Pretoria, South Africa
2Department of Medical Education, Keck School of Medicine, University of Southern California, Los Angeles, California, USA
3World Health Organization, Geneva, Switzerland
4Women's Health Research Unit, The Blizard Institute, Barts and The London School of Medicine, Queen Mary, University of London, London, UK

Correspondence to: Dr Anne-Marie Bergh, Medical Research Council Unit for Maternal and Infant Health Care Strategies, University of Pretoria, Private Bag X323, Arcadia, Pretoria 0007, South Africa; anne-marie.bergh{at}


In recent years, there has been a renewed interest in measuring perceptions regarding different aspects of the medical educational environment. A reliable tool was developed for measuring perceptions of the educational environment as it relates to evidence-based medicine as part of a multicountry randomised controlled trial to evaluate the effectiveness of a clinically integrated evidence-based medicine course. Participants from 10 specialties completed the questionnaire. A working dataset of 518 observations was available. Two independent subsets of data were created for conducting an exploratory factor analysis (n=244) and a confirmatory factor analysis (n=274), respectively. The exploratory factor analysis yielded five definitive 67-item instruments, with five to nine dimensions; all resulted in acceptable explanations of the total variance (range 56.6–65.9%). In the confirmatory factor analysis phase, all goodness-of-fit measures were acceptable for all models (root mean square error of approximation ≤0.047; comparative fit index ≥0.980; normed χ² ≤1.647; Bentler-Bonett normed fit index ≥0.951). The authors selected the factorisation with seven dimensions (factor-7 instrument) as the most useful on pragmatic grounds and named it Evidence-Based Medicine Educational Environment Measure 67 (EBMEEM-67). Cronbach's α for subscales ranged between 0.81 and 0.93. The subscales are: ‘Knowledge and learning materials’; ‘Learner support’; ‘General relationships and support’; ‘Institutional focus on EBM’; ‘Education, training and supervision’; ‘EBM application opportunities’; and ‘Affirmation of EBM environment’. The EBMEEM-67 can be a useful diagnostic and benchmarking tool for evaluating residents' perceptions of the environment in which evidence-based medicine education takes place.

  • Medical Education & Training
  • Obstetrics
  • Gynaecology


The authors wish to thank all the trial and non-trial participants and the investigators of the trial for their continued support. The editorial support by Barbara English from the office of the Deputy Dean: Research of the University of Pretoria's Faculty of Health Sciences is acknowledged with appreciation.



Educational environment in medical education can be described as the context in which clinical staff and students teach and learn.1 This environment has also been associated with educational or learning climate2–5 and educational culture.5 According to Genn,4,5 climate is a manifestation of the concept of environment. In the past 20 years, several publications have appeared on the measurement of students’ perceptions of various types of medical educational environments. Besides more general descriptions of the medical education environment,4–7 environments have also been delimited to specific situations such as the operating theatre,8–10 general practice training11 and undergraduate12,13 and postgraduate3,14,15 training. An implicit aim of all education in healthcare settings is to produce an environment conducive to advanced and in-service learning.

Measuring students’ or doctors’ perceptions of the medical educational environment has a long history. Some of the older, widely used instruments are the Learning Environment Questionnaire (LEQ) and the Medical School Learning Environment Survey (MSLES).7 According to Schönrock-Adema et al,14 the lack of consensus about which concepts to measure may be explained by the absence of a common theoretical framework. They propose a framework with three broad domains for developing medical educational environment measures: goal orientation (content and aims of education in relation to personal development); relationships (open and friendly atmosphere and affiliation); and organisation/regulation (system maintenance and change). An instrument that has in the past 15 years been applied in a variety of settings across the world and that also caught our attention is the so-called Dundee Ready Education Environment Measure (DREEM).12 The DREEM sparked the development and validation of other tools measuring more specific postgraduate educational environments.8,10,11,15–22

The measurement of the educational environment was a secondary outcome of a randomised controlled trial conducted between March 2009 and November 2011 to evaluate the effectiveness of a clinically integrated evidence-based medicine (EBM) course for obstetrics and gynaecology residents.23 Ethical approval for the validation of the tool was received as part of the trial protocol. One of the points of departure in the trial was that the application of EBM was also influenced by the workplace climate or environment. Any teaching in EBM should therefore also be measured in terms of its ability to facilitate evidence-based practice in the broader clinical environment, going beyond imparting knowledge and skills. This paper reports on the development and validation of such a tool, which was administered before and after the intervention.


As the measurement of educational environment in the EBM education trial would be based on the perceptions of participants in postgraduate education, a survey design was considered appropriate for measuring attitudes and opinions.24 The study design comprised two phases. The first was to develop a draft instrument for measuring residents’ perceptions of their educational environment as it related to EBM, whereas the second focused on the validation of this tool as a secondary outcome measure in the EBM education trial. The development and validation process is depicted in figure 1.

Figure 1

Development and testing of the instrument.

Development of scales and items for a draft questionnaire

Two investigators (WM and A-MB) reviewed the literature on the development and validation of educational environment tools to identify and adapt potentially useful scales and items. They formulated provisional themes for taking the process forward, namely perceptions of: (1) learning opportunities; (2) self-learning (‘EBM competence’); (3) availability of learning resources; (4) teachers and teaching, supervision and support (‘EBM specific’); (5) EBM practice (‘EBM atmosphere’); and (6) general atmosphere. These themes correspond to some extent with the three domains proposed by Schönrock-Adema et al14 to measure the medical educational environment. Themes 2 and 5 relate to goal orientation; theme 6 to relationships; and themes 1, 3 and 4 to organisation/regulation.

The manuals of the e-course of the randomised education trial were then studied and more items that would give feedback on the actual EBM practice of an institution were generated. Trial investigators also contributed further inputs and comments. Two senior consultants involved in the teaching of EBM in obstetrics and gynaecology and in internal medicine at the University of Pretoria, South Africa were then requested to comment on the scope of the scales and items. The result was a preliminary instrument with 62 items. These were presented randomly in a questionnaire given for completion to five registrars (residents) in obstetrics and gynaecology and three in internal medicine at the University of Pretoria. An item-by-item discussion checked for any ambiguities or unclear statements and ranked the items according to importance. In this process, some items were dropped and others were split into more than one item to enhance clarity, yielding the second preliminary instrument with 73 items. These items were submitted to a group of five residents from obstetrics and gynaecology, three chief residents from internal medicine, two paediatric pulmonary fellows, two programme directors and one assistant programme director at the Keck School of Medicine of the University of Southern California. The 73 items of the previous version remained, with some further refinement in the wording. The English version of the third preliminary instrument was then presented at a working group of the EBM education trial23 investigators for further discussion. Eleven more items were added, of which 8 pertained to access, use and usefulness of the WHO Reproductive Health Library (RHL)1—amounting to 84 items. It was agreed that the RHL-related items would be used for trial purposes only and not during the data analysis exercise for developing an instrument.

An electronic version and a paper-based version of the questionnaire were generated to encourage greater participation in the validation process. Respondents could participate anonymously and participants with difficulty in accessing the Internet could complete the paper-based version. For the purposes of the trial, experts familiar with the EBM environment also translated the questionnaire into French, Spanish and Portuguese. One of the challenges in the development of the instrument was the different terminologies used in different countries. Eventually, we decided on the following as alternatives in the instrument: ‘registrars’/‘residents’ and ‘faculty’/‘consultants’. For the purposes of this paper, registrars and residents will be called ‘residents’ and consultants will be called ‘faculty’. The final instrument contains both sets of terminologies for readers wishing to adapt the tool according to their context (see online Supplement 1).

Study participants in the administration of the preliminary instrument

Using the factor analysis rule of thumb of 10 participants per item,25 760 participants were needed for this study and more participants had to be recruited beyond the trial candidates. To be able to generate a tool for use wider than obstetrics and gynaecology (the field of specialisation targeted in the randomised education trial23), we also recruited ‘non-trial’ participants from other specialties, namely anaesthesiology, otolaryngology, family medicine, general surgery, internal medicine, neurology, paediatrics, psychiatry and radiology. Trial participants came from Argentina, Brazil, the Democratic Republic of the Congo, India, the Philippines, South Africa and Thailand; only their preintervention data were used. Non-trial data came from India, the Netherlands, the Philippines, South Africa and the UK. There were participants from all years of specialisation (years 1–6). Table 1 provides a summary of respondents according to country, specialty and year of specialisation.

Table 1

Summary of respondents included in the analysis

Non-trial participants were recruited through three initiatives—a web-based questionnaire administered at the WHO (n=13), a paper-based initiative in the UK (n=34) and a paper-based initiative in South Africa (n=106) (December 2008–May 2009). The rest of the responses came from the trial dataset (n=410; March 2009–November 2010). The final set of raw data had 563 observations.

Preparation of data

After administering the preliminary instrument to trial and non-trial participants, the data were consolidated and scores were allocated as follows:

Strongly disagree=0; disagree=1; neutral=2; agree=3 and strongly agree=4.

Scores for items formulated in the negative were reversed and those pertaining to the use of the RHL excluded. The remaining 76 items were kept for analysis and the development of the instrument.
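As a minimal illustration of the scoring scheme just described (a sketch with hypothetical names, not the authors' code), the five-point responses can be mapped to 0–4 and negatively phrased items reverse-scored:

```python
# Likert scoring as described above: 0-4, with negative items reversed.
SCORE = {"strongly disagree": 0, "disagree": 1, "neutral": 2,
         "agree": 3, "strongly agree": 4}
MAX_SCORE = 4

def score_response(response: str, negative_item: bool = False) -> int:
    """Map a Likert response to 0-4; reverse the score for negatively
    phrased items so that higher always means a more positive perception."""
    s = SCORE[response.lower()]
    return MAX_SCORE - s if negative_item else s
```

Reverse-scoring keeps all 76 retained items pointing in the same direction before the factor analyses.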

A binomial analysis26 was applied to calculate the threshold for the minimum number of item responses that constitutes significantly more than 50% of items per questionnaire; participants responding to 46 or more of the 76 items were retained. This resulted in the removal of 45 cases, leaving 518 observations in the final dataset. Data imputation was conducted in three consecutive phases to create a complete dataset:

  1. On a detailed level ((country) by (year of specialisation) by (specialty)), missing data were imputed by using the mode of the available data.

  2. On a second level, the imputation used the mode of the available data for the combinations ((country) by (year of specialisation)).

  3. On a more general level, all data still missing after step 2 were imputed by using the mode of the data available at the (country) level as substitution.
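The binomial retention threshold and the three-phase mode imputation can be sketched as follows (an illustrative Python sketch with hypothetical function and field names; the authors' actual procedure may differ in detail):

```python
from collections import Counter
from math import comb

def min_items_threshold(n_items: int, alpha: float = 0.05) -> int:
    """Smallest k such that P(X >= k) < alpha for X ~ Binomial(n_items, 0.5),
    i.e. the minimum number of answered items that is significantly
    more than half of the questionnaire."""
    total = 2 ** n_items
    tail = 0
    for k in range(n_items, -1, -1):
        tail += comb(n_items, k)
        if tail / total >= alpha:
            return k + 1
    return 0

def impute_by_mode(records, key_levels, item):
    """Fill missing values of `item` in place, using the mode of the
    available data within progressively coarser groupings (phases 1-3)."""
    for keys in key_levels:
        groups = {}
        for r in records:
            if r.get(item) is not None:
                grp = groups.setdefault(tuple(r[k] for k in keys), Counter())
                grp[r[item]] += 1
        for r in records:
            if r.get(item) is None:
                g = groups.get(tuple(r[k] for k in keys))
                if g:
                    r[item] = g.most_common(1)[0][0]
    return records

# The three phases described above, from most detailed to most general:
LEVELS = [("country", "year", "specialty"),
          ("country", "year"),
          ("country",)]
```

With 76 items, `min_items_threshold(76)` reproduces the cut-off of 46 items reported above.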

Outline of the data analysis process

Two independent datasets were created from the raw dataset of 518 observations through stratification according to (country) by (year of specialisation) by (specialty). Random systematic sampling resulted in 244 and 274 observations for an exploratory factor analysis (EFA) and a confirmatory factor analysis (CFA),27 respectively. As the data were measured on a Likert scale, polychoric correlations were calculated separately for the EFA and CFA datasets.28
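The stratified split into independent EFA and CFA subsets can be illustrated as follows (a simplified sketch that uses per-stratum random allocation in place of the authors' random systematic sampling; names are hypothetical):

```python
import random

def stratified_split(records, strata_keys, frac=0.5, seed=1):
    """Split records into two independent subsets, stratified by the given
    keys so that both subsets mirror the country/year/specialty mix."""
    strata = {}
    for r in records:
        strata.setdefault(tuple(r[k] for k in strata_keys), []).append(r)
    rng = random.Random(seed)
    first, second = [], []
    for members in strata.values():
        members = members[:]          # do not reorder the caller's data
        rng.shuffle(members)
        cut = round(len(members) * frac)
        first.extend(members[:cut])
        second.extend(members[cut:])
    return first, second
```

Splitting within each stratum keeps both subsets representative, so the CFA is tested on data with the same composition as, but independent of, the EFA data.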

The EFA dataset served as a basis to identify possible factor-constructs, for which Cronbach's α was calculated to determine internal consistency and to drop items not correlating highly enough with the dimension under consideration. The factor models identified during the EFA and item analysis phases were then subjected to a structural equation modelling (SEM) analysis during the CFA, to identify items that should be removed from a dimension or reallocated to another dimension.

Descriptive statistics, polychoric correlations, EFA, Cronbach's α and generalisability coefficients were calculated using SAS V.9.3 software.29 EQS V.6.130 was mainly used for the CFA. Robust statistics (Satorra and Bentler31 from EQS) were used in place of the statistics based on normal theory, because of the large multivariate kurtosis as measured by the normalised estimate of Mardia's coefficient of kurtosis.32


Exploratory factor analysis

We applied oblique varimax rotation27 to the EFA dataset. Factor-constructs with 5, 6, 7, 8 and 9 dimensions were identified as models with practical application value (henceforth called ‘factor-5 model’, ‘factor-6 model’, etc), all of which presented with an acceptable explanation of the total variance (range 56.6–65.9%). The preliminary subscales (division of items) had only a few factor loadings below 0.50; the vast majority were above 0.50 and some even above 0.90. In a further investigation of all the factor models, items with too small a loading (<0.45) or not fitting logically under any dimension were removed.

Cronbach's α was calculated per dimension on the remaining items for each factor, which led to the further removal of items. At this stage, all factor models had 67 items, with Cronbach's α ranging between 0.76 and 0.96, which is well above the suggested rule of thumb value of 0.70.25,27 Details of Cronbach's α are provided in online Supplement 2. There was a generally decreasing trend in mean Cronbach's α from the factor-5 to factor-9 models (0.882, 0.868, 0.870, 0.855 and 0.841).
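Cronbach's α for a subscale follows the standard formula α = k/(k−1) · (1 − Σ item variances / variance of the total score). A self-contained sketch (not the authors' SAS code):

```python
def cronbach_alpha(items):
    """Cronbach's alpha for a subscale; `items` is a list of columns,
    one list of respondent scores per item."""
    k, n = len(items), len(items[0])

    def var(xs):  # sample variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    totals = [sum(col[i] for col in items) for i in range(n)]
    return k / (k - 1) * (1 - sum(var(col) for col in items) / var(totals))
```

Items whose removal raises a subscale's α are candidates for dropping, which is the item-analysis step described above.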

Confirmatory factor analysis

All five factor models (5–9) identified during the EFA phase were investigated by application of SEM during the CFA phase. Tests for goodness-of-fit25,27,33 and other aspects of the model resulted in very acceptable values (see table 2). The normed χ² was calculated at about 1.6 for the factor-5 to factor-8 models and at 1.3 for the factor-9 model, which can be regarded as a satisfactory fit. The Bentler-Bonett normed fit index (BBNFI) and the comparative fit index (CFI) were both relatively high. The estimated value and 90% CI of the root mean square error of approximation were also within the accepted limits (<0.05). Therefore, the goodness-of-fit measures were all acceptable. The maximum absolute standardised residuals observed were 0.43, 0.50, 0.35, 0.37 and 0.28 for the factor-5 to factor-9 models, respectively, with the residuals in general following a bell-shaped distribution. Using robust statistics, all items of all models loaded significantly on the dimension to which they were allocated (p≤0.05).
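The fit indices of this kind can be computed from the χ² statistics of the fitted model and of the independence (null) model using the standard textbook formulas (this sketch uses illustrative inputs, not output from the authors' EQS runs):

```python
from math import sqrt

def fit_indices(chi2, df, chi2_null, df_null, n):
    """Normed chi-square, RMSEA, CFI and Bentler-Bonett NFI from the
    chi-square of the fitted model and of the independence model."""
    d = max(chi2 - df, 0.0)            # noncentrality of fitted model
    d_null = max(chi2_null - df_null, 0.0)
    return {
        "normed_chi2": chi2 / df,
        "rmsea": sqrt(d / (df * (n - 1))),
        "cfi": 1.0 - d / (max(d, d_null) or 1.0),
        "nfi": 1.0 - chi2 / chi2_null,
    }
```

Conventional cut-offs match those used in the text: normed χ² below about 2–3, RMSEA below 0.05 and CFI/NFI above about 0.95 indicate a satisfactory fit.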

Table 2

Goodness-of-fit summary for SEM measurement models

The Lagrange multiplier test for reallocating items from one dimension to another was applied simultaneously with the Wald test for dropping parameters. For all dimensions of all models, some items were reallocated, but none could be removed. Cronbach's α at this stage varied between 0.75 and 0.96.

The fitted models explained the data well, indicating that the EFA procedure followed by CFA modelling was successful in developing a measurement model based on the available data.

Labelling of dimensions

Dimensions were named differently across the different models. The researchers formulated the label names to reflect the content of the items included under each label. A summary of dimension labels is given in table 3. Three dimension labels (A–C in table 3) feature in all models: ‘General relationships and support’, ‘EBM application opportunities’ and ‘Affirmation of EBM environment’. There was some difficulty in finding a construct for appropriately naming the last dimension above, as all (except one) of the negatively phrased items clustered together here. This dimension was therefore interpreted as an affirmation of a respondent's perception of the EBM environment.

Table 3

Dimension labels and number of items for the different factor models

One dimension (D) appears in four of the five models (‘Education, training and supervision’) and three (E, I, M) in three models (‘Institutional focus on EBM’, ‘Knowledge and learning materials’ and ‘Resources and communication’). Two dimensions, ‘Teachers’ (F) (2 models) and ‘Learner support’ (G) (1 model), are related to the dimension ‘Education, training and supervision’. The dimensions that were more fragmented or where items were grouped in different ways in different models relate to EBM knowledge, learning materials and resources, and communication (dimension labels H–M).


To the best of our knowledge, this is the first study to report on the validation of an educational environment or learning climate questionnaire that yielded more than one acceptable instrument for use and where the choice of model to propose for general use was dictated by pragmatic considerations. The goodness-of-fit results for all five SEMs were very satisfactory and no single model fitted the data markedly ‘better’ than the others (table 2). The factor-9 model may be considered the best statistically, but dimensions labelled as ‘Communication and resources’ and ‘Access to learning resources’ in this model have a semantic overlap and the division appears artificial (table 3). The same goes for the dimensions ‘Teachers’ and ‘Education, training and supervision’.

The factor-7 and factor-8 models appear to be the favoured models from a practical point of view, because of a more balanced distribution of items across the different dimensions. If the dimension labels of the factor-7 and factor-8 models are compared, the labels of the factor-7 model appear to be more ‘neatly’ divided, whereas there is some semantic overlap between three dimensions in the factor-8 model, namely ‘Knowledge and learning materials’, ‘Access to learning materials and teachers’ and ‘Resources and communication’. The factor-5 and factor-6 models have too many items in some of the dimensions.

We therefore propose a tool to measure the education environment based on the factor-7 model, which also had a slightly improved Cronbach's α mean value (0.870) compared to the factor-6 model (0.868). Following the naming of other instruments, the 67-item tool is called the Evidence-Based Medicine Educational Environment Measure 67 (EBMEEM-67) and has the following subscales:

  1. Knowledge and learning materials (8 items)

  2. Learner support (10 items)

  3. General relationships and support (8 items)

  4. Institutional focus on EBM (14 items)

  5. Education, training and supervision (9 items)

  6. EBM application opportunities (12 items)

  7. Affirmation of EBM environment (6 items)

Table 4 contains a summary of the subscales with their items. A user-friendly format of the complete tool and the instructions for use are attached as an online supplementary file (Supplement 1).

Table 4

Subscales and items per subscale for the EBMEEM-67 tool (CFA data, n=274)

Closer investigation of the subscales and individual items revealed a large degree of correspondence with the three-domain framework of Schönrock-Adema et al.14 Table 5 gives an overview of the similarities.

Table 5

Comparison of an existing theoretical framework with the subscales and items of the EBMEEM-67 instrument

Compared to other instruments measuring some aspect of the medical educational environment, the EBMEEM-67 has a higher internal consistency for its subscales, with Cronbach's α ranging from 0.81 to 0.93. The psychometric properties of the instruments reported in the literature are summarised in an online supplementary file. The internal consistency of the instrument supports the rigorous process followed in the development and validation of the tool. Items were developed and refined by means of a review of the existing tools and reviews by residents, field experts and the trial study group. The study used the EFA and CFA procedures, with all results for both phases being statistically acceptable and at least significant at a 5% level, where applicable. The EFA was accomplished by factor analysis followed by a varimax oblique rotation to enhance the interpretation of the different dimensions. Cronbach's α demonstrated internal consistency. SEM was successfully applied to the CFA data. Polychoric correlations formed the basis of the correlation matrices for the EFA and CFA analyses. The results of a generalisability study34,35 for the EBMEEM-67 showed that mean absolute and relative coefficients of above 0.80 may be expected, which confirmed its effectiveness in measuring the study populations.

Potential strengths and limitations of the study

Respondents were recruited from nine different countries spanning 10 different specialties and 6 years of study, which strengthens the generalisability of the instrument. The wider application of the instrument was confirmed by the generalisability study. Our study fell somewhat short of the general rule requiring 5–10 respondents per item to successfully apply an EFA or CFA procedure.25 Calculating polychoric correlations and using robust statistics during the CFA phase somewhat rectified the lack of a larger sample. No statistical explanation could be found for virtually all the negatively formulated items loading together on the seventh subscale. King36 cites examples of studies with a similar tendency. It also appears as if subscale 7 may contain a few items (eg, 6 and 57) that do not fit well into the theoretical framework proposed by Schönrock-Adema et al.14 Lastly, the sample was biased towards a very large representation of participants from obstetrics and gynaecology.


A tool, the EBMEEM-67, was successfully developed for soliciting the perceptions of residents on their EBM educational environment. This tool can be recommended for use, especially with residents in obstetrics and gynaecology. The EBMEEM-67 can be applied within one institution or department or across sites for benchmarking purposes, cross-sectionally or longitudinally. Cross-sectional investigations can be undertaken within one department across all years of study or by comparing results from different departments or from different training sites at a particular point in time. Longitudinally, the EBMEEM-67 can be used for before-and-after comparison in EBM education intervention studies, or for following the same cohort of residents over the different years of study. Administering the EBMEEM-67 in combination with other tools measuring the educational or learning environments, such as the PHEEM15 or D-RECT,3 could serve a useful diagnostic purpose.

The number of items remaining in the EBMEEM-67 is quite high for use in settings where time and motivation to complete an instrument of this nature are of the essence. A statistical process of reducing the number of items is currently underway. It is proposed that a thorough generalisability and decision (G&D) study34,35 be conducted during this follow-up. Further analyses should also be carried out to interpret the results of our study in relation to the theory behind the measurement of the education environment.14







  • Funding The randomised controlled trial to evaluate the effectiveness of a clinically integrated evidence-based medicine course was funded by the UNDP/UNFPA/WHO/World Bank Special Programme of Research, Development and Research Training in Human Reproduction, Department of Reproductive Health and Research, WHO.

  • Competing interests A technical report on the validation process of the five definitive instruments was submitted to JAMA as part of their requirements for publishing the results of the e-learning trial. Some of the data for this study were derived from that report. JG was remunerated by the WHO and the MRC Unit for Maternal and Infant Health Care Strategies for statistical services.