Measurement of perceptions of educational environment in evidence-based medicine
================================================================================

* Anne-Marie Bergh
* Jackie Grimbeek
* Win May
* A Metin Gülmezoglu
* Khalid S Khan
* Regina Kulier
* Robert C Pattinson

## Introduction

Educational environment in medical education can be described as the context in which clinical staff and students teach and learn.1 This environment has also been associated with educational or learning climate2–5 and educational culture.5 According to Genn,4,5 climate is a manifestation of the concept of environment.

In the past 20 years, several publications have appeared on the measurement of students' perceptions of various types of medical educational environments. Besides more general descriptions of the medical education environment,4–7 measures have also been tailored to specific settings such as the operating theatre,8–10 general practice training11 and undergraduate12,13 and postgraduate3,14,15 training. An implicit aim of all education in healthcare settings is to produce an environment conducive to advanced and in-service learning.

Measuring students' or doctors' perceptions of the medical educational environment has a long history. Some of the older, widely used instruments are the Learning Environment Questionnaire (LEQ) and the Medical School Learning Environment Survey (MSLES).7 According to Schönrock-Adema *et al*,14 the lack of consensus about which concepts to measure may be explained by the absence of a common theoretical framework. They propose a framework with three broad domains for developing medical educational environment measures: goal orientation (content and aims of education in relation to personal development); relationships (open and friendly atmosphere and affiliation); and organisation/regulation (system maintenance and change).

An instrument that has been applied in a variety of settings across the world in the past 15 years, and that also caught our attention, is the Dundee Ready Education Environment Measure (DREEM).12 The DREEM sparked the development and validation of other tools measuring more specific postgraduate educational environments.8,10,11,15–22

The measurement of the educational environment was a secondary outcome of a randomised controlled trial conducted between March 2009 and November 2011 to evaluate the effectiveness of a clinically integrated *EBM* course for obstetrics and gynaecology residents.23 Ethical approval for the validation of the tool was received as part of the trial protocol. A point of departure in the trial was that the application of *EBM* is also influenced by the workplace climate or environment. Any teaching in *EBM* should therefore also be measured in terms of its ability to facilitate evidence-based practice in the broader clinical environment, going beyond imparting knowledge and skills. This paper reports on the development and validation of such a tool, which was administered before and after the intervention.

## Method

As the measurement of the educational environment in the *EBM* education trial would be based on the perceptions of participants in postgraduate education, a survey design was considered appropriate for measuring attitudes and opinions.24 The study design comprised two phases.
The first was to develop a draft instrument for measuring residents' perceptions of their educational environment as it related to *EBM*, and the second focused on the validation of this tool as a secondary outcome measure in the *EBM* education trial. The development and validation process is depicted in figure 1.

Figure 1 Development and testing of the instrument.

### Development of scales and items for a draft questionnaire

Two investigators (WM and A-MB) reviewed the literature on the development and validation of educational environment tools to identify and adapt potentially useful scales and items. They formulated provisional themes for taking the process forward, namely perceptions of: (1) learning opportunities, (2) self-learning ('EBM competence'), (3) availability of learning resources, (4) teachers and teaching, supervision and support ('EBM specific'), (5) *EBM* practice ('EBM atmosphere') and (6) general atmosphere. These themes correspond to some extent with the three domains proposed by Schönrock-Adema *et al*14 for measuring the medical educational environment: themes 2 and 5 relate to goal orientation, theme 6 to relationships, and themes 1, 3 and 4 to organisation/regulation.

The manuals of the e-course used in the randomised education trial were then studied, and more items were generated to give feedback on an institution's actual *EBM* practice. Trial investigators provided further inputs and comments. Two senior consultants involved in the teaching of *EBM* in obstetrics and gynaecology and in internal medicine at the University of Pretoria, South Africa, were then requested to comment on the scope of the scales and items. The result was a preliminary instrument with 62 items. These were presented in random order in a questionnaire given for completion to five registrars (residents) in obstetrics and gynaecology and three in internal medicine at the University of Pretoria. An item-by-item discussion checked for ambiguities or unclear statements and ranked the items according to importance. In this process, some items were dropped and others were split into more than one item to enhance clarity, yielding the second preliminary instrument with 73 items. These items were submitted to a group of five residents from obstetrics and gynaecology, three chief residents from internal medicine, two paediatric pulmonary fellows, two programme directors and one assistant programme director at the Keck School of Medicine of the University of Southern California. The 73 items of the previous version remained, with some further refinement in the wording.

The English version of the third preliminary instrument was then presented at a working group of the *EBM* education trial23 investigators for further discussion. Eleven more items were added, of which eight pertained to access to, use of and usefulness of the WHO Reproductive Health Library (RHL),1 bringing the total to 84 items. It was agreed that the RHL-related items would be used for trial purposes only and not during the data analysis exercise for developing an instrument.

An electronic version and a paper-based version of the questionnaire were generated to encourage greater participation in the validation process. Respondents could participate anonymously, and participants with difficulty in accessing the Internet could complete the paper-based version.
For the purposes of the trial, experts familiar with the *EBM* environment also translated the questionnaire into French, Spanish and Portuguese.

One of the challenges in the development of the instrument was the different terminologies used in different countries. Eventually, we decided on the following alternatives in the instrument: 'registrars'/'residents' and 'faculty'/'consultants'. For the purposes of this paper, registrars and residents are called 'residents' and consultants are called 'faculty'. The final instrument contains both sets of terminologies for readers wishing to adapt the tool to their context (see online Supplement 1).

### Study participants in the administration of the preliminary instrument

Using the factor analysis rule of thumb of 10 participants per item,25 a total of 760 participants (10 × 76 items) was needed for this study, and more participants had to be recruited beyond the trial candidates. To generate a tool for use beyond obstetrics and gynaecology (the field of specialisation targeted in the randomised education trial23), we also recruited 'non-trial' participants from other specialties, namely anaesthesiology, otolaryngology, family medicine, general surgery, internal medicine, neurology, paediatrics, psychiatry and radiology. Trial participants came from Argentina, Brazil, the Democratic Republic of the Congo, India, the Philippines, South Africa and Thailand; only their preintervention data were used. Non-trial data came from India, the Netherlands, the Philippines, South Africa and the UK. There were participants from all specialty years, between years 1 and 6. Table 1 provides a summary of respondents according to country, specialty and year of study specialisation.

Table 1 Summary of respondents included in the analysis

Non-trial participants were recruited through three initiatives: a web-based questionnaire administered at the WHO (n=13), a paper-based initiative in the UK (n=34) and a paper-based initiative in South Africa (n=106) (December 2008–May 2009). The rest of the responses came from the trial dataset (n=410; March 2009–November 2010). The final set of raw data had 563 observations.

### Preparation of data

After administering the preliminary instrument to trial and non-trial participants, the data were consolidated and scores were allocated as follows: strongly disagree=0, disagree=1, neutral=2, agree=3 and strongly agree=4. Scores for items formulated in the negative were reversed, and items pertaining to the use of the RHL were excluded. The remaining 76 items were kept for the analysis and the development of the instrument. A binomial analysis26 was used to calculate the threshold for the minimum number of item responses required for a questionnaire to have significantly more than 50% of items answered; participants responding to 46 or more of the 76 items were retained. This resulted in the removal of 45 cases, leaving 518 observations in the final dataset.

Data imputation was conducted in three consecutive phases to create a complete dataset (sketches of this and the preceding preparation steps follow the list):

1. On the most detailed level ((*country*) by (*year of specialisation*) by (*specialty*)), missing data were imputed using the *mode* of the available data.
2. On a second level, the imputation used the *mode* of the available data for the combinations ((*country*) by (*year of specialisation*)).
3. On the most general level, all data still missing after phase 2 were imputed using the *mode* of the data available at the (*country*) level.
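The scoring and reversal step described above is mechanical to reproduce. A minimal sketch in Python, assuming responses are held in a pandas data frame of text labels; the function and column names are hypothetical, not from the study:

```python
import pandas as pd

# Scoring used in the study: strongly disagree=0 ... strongly agree=4
SCORES = {"strongly disagree": 0, "disagree": 1, "neutral": 2,
          "agree": 3, "strongly agree": 4}

def score_items(responses: pd.DataFrame, negative_items: list) -> pd.DataFrame:
    """Map Likert labels to 0-4 scores, then reverse negatively worded items."""
    scored = responses.apply(lambda col: col.str.lower().map(SCORES))
    # Reversing (4 - score) makes all items point in the same direction
    scored[negative_items] = 4 - scored[negative_items]
    return scored
```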
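The binomial retention rule can also be recomputed directly. Under the null hypothesis that each of the 76 items is answered with probability 0.5, the smallest response count whose one-sided tail probability falls below 5% is 46, matching the cut-off reported above. A sketch, assuming SciPy is available:

```python
from scipy.stats import binom

def min_items_threshold(n_items: int = 76, p: float = 0.5,
                        alpha: float = 0.05) -> int:
    """Smallest k with P(X >= k) < alpha for X ~ Binomial(n_items, p)."""
    for k in range(n_items + 1):
        if binom.sf(k - 1, n_items, p) < alpha:  # sf(k-1) = P(X >= k)
            return k
    raise ValueError("no threshold below alpha")

print(min_items_threshold())  # 46: respondents with >= 46 answers retained
```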
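The three imputation phases amount to a back-off scheme over successively coarser groupings. A minimal pandas sketch under the same assumptions (a hypothetical frame with `country`, `year` and `specialty` columns):

```python
import pandas as pd

LEVELS = [["country", "year", "specialty"],  # phase 1: most detailed
          ["country", "year"],               # phase 2
          ["country"]]                       # phase 3: most general

def impute_by_mode(df: pd.DataFrame, item_cols: list) -> pd.DataFrame:
    """Fill missing item scores with the group mode, coarsening the
    grouping until as few missing values as possible remain."""
    out = df.copy()
    for keys in LEVELS:
        for col in item_cols:
            group_mode = out.groupby(keys)[col].transform(
                lambda s: s.mode().iloc[0] if not s.mode().empty else pd.NA)
            out[col] = out[col].fillna(group_mode)
    return out
```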
### Outline of the data analysis process

Two independent datasets were created from the raw dataset of 518 observations through stratification according to (*country*) by (*year of specialisation*) by (*specialty*). Random systematic sampling resulted in 244 and 274 observations for an exploratory factor analysis (EFA) and a confirmatory factor analysis (CFA),27 respectively. As the data were measured on a Likert scale, polychoric correlations were calculated separately for the EFA and CFA datasets.28 The EFA dataset served as a basis to identify possible factor-constructs, for which Cronbach's α was calculated to determine internal consistency and to drop items that did not correlate highly enough with the dimension under consideration. The factor models identified during the EFA and item analysis phases were subjected to a structural equation modelling (SEM) analysis during the CFA to identify items that should be removed from a dimension or reallocated to another.

Descriptive statistics, polychoric correlations, EFA, Cronbach's α and generalisability coefficients were calculated using SAS V.9.3 software.29 EQS V.6.130 was mainly used for the CFA. Robust statistics (Satorra and Bentler31 in EQS) were used in place of statistics based on normal theory because of the large multivariate kurtosis, as measured by the normalised estimate of Mardia's coefficient of kurtosis.32

## Results

### Exploratory factor analysis

We applied oblique varimax rotation27 to the EFA dataset. Factor-constructs with 5, 6, 7, 8 and 9 dimensions were identified as models with practical application value (henceforth called 'factor-5 model', 'factor-6 model', etc), all of which explained an acceptable proportion of the total variance (range 56.6–65.9%). The preliminary subscales (division of items) had only a few factor loadings below 0.50; the vast majority were above 0.50 and some even above 0.90. In a further investigation of all the factor models, items with too small a loading (<0.45) or not fitting logically under any dimension were removed. Cronbach's α was calculated per dimension on the remaining items for each factor, which led to the further removal of items. At this stage, all factor models had 67 items, with Cronbach's α ranging between 0.76 and 0.96, well above the suggested rule-of-thumb value of 0.70.25,27 Details of Cronbach's α are provided in online Supplement 2. There was a generally decreasing trend in mean Cronbach's α from the factor-5 to the factor-9 model (0.882, 0.868, 0.870, 0.855 and 0.841).

### Confirmatory factor analysis

All five factor models (5–9) identified during the EFA phase were investigated by application of SEM during the CFA phase. Tests for goodness of fit25,27,33 and other aspects of the models resulted in very acceptable values (see table 2). The normed χ² was calculated at about 1.6 for the factor-5 to factor-8 models and at 1.3 for the factor-9 model, which can be regarded as a satisfactory fit. The Bentler-Bonett normed fit index (BBNFI) and the comparative fit index (CFI) were both relatively high. The estimated value and 90% CI of the root mean square error of approximation (RMSEA) were also within the accepted limits (<0.05). Therefore, the goodness-of-fit measures were all acceptable.
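For readers who want to relate the summaries in table 2 to their definitions, these indices follow standard formulas. A minimal sketch; the numerical example is illustrative and not taken from the study:

```python
import math

def fit_indices(chi2, df, n, chi2_null=None, df_null=None):
    """Standard SEM fit summaries: normed chi-square, RMSEA and (if a
    baseline independence model is supplied) the comparative fit index."""
    normed = chi2 / df  # values around 1-2 are usually read as satisfactory
    rmsea = math.sqrt(max(chi2 - df, 0) / (df * (n - 1)))  # <0.05 = close fit
    out = {"normed_chi2": normed, "rmsea": rmsea}
    if chi2_null is not None and df_null is not None:
        # CFI compares the misfit of the target model with the baseline model
        out["cfi"] = 1 - max(chi2 - df, 0) / max(chi2_null - df_null,
                                                 chi2 - df, 1e-9)
    return out

# Illustrative only: fit_indices(chi2=3100, df=1938, n=274)
# gives normed chi-square ~1.6 and RMSEA ~0.047
```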
The maximum absolute standardised residuals observed were 0.43, 0.50, 0.35, 0.37 and 0.28 for the factor-5 to factor-9 models, respectively, with the residuals in general following a bell-shaped distribution. Using robust statistics, all items of all models loaded significantly on the dimension to which they were allocated (p≤0.05).

Table 2 Goodness-of-fit summary for SEM measurement models

The Lagrange multiplier test for reallocating items from one dimension to another was applied simultaneously with the Wald test for dropping parameters. For all dimensions of all models, some items were reallocated, but none could be removed. Cronbach's α at this stage varied between 0.75 and 0.96. The fitted models explained the data to a very good degree, indicating that the EFA procedure followed by CFA modelling was successful in developing a measurement model based on the available data.

### Labelling of dimensions

Dimensions were named differently across the different models. The researchers formulated the labels to reflect the content of the items included under each dimension. A summary of dimension labels is given in table 3. Three dimension labels (A–C in table 3) feature in all models: 'General relationships and support', 'EBM application opportunities' and 'Affirmation of EBM environment'. There was some difficulty in finding a construct for appropriately naming the last of these dimensions, as all but one of the negatively phrased items clustered together here. This dimension was therefore interpreted as an affirmation of a respondent's perception of the *EBM* environment.

Table 3 Dimension labels and number of items for the different factor models

One dimension (D) appears in four of the five models ('Education, training and supervision') and three (E, I, M) in three models ('Institutional focus on EBM', 'Knowledge and learning materials' and 'Resources and communication'). Two dimensions, 'Teachers' (F) (two models) and 'Learner support' (G) (one model), are related to the dimension 'Education, training and supervision'. The dimensions that were more fragmented, or where items were grouped in different ways in different models, relate to *EBM* knowledge, learning materials and resources, and communication (dimension labels H–M).

## Discussion

To the best of our knowledge, this is the first study to report on the validation of an educational environment or learning climate questionnaire that yielded more than one acceptable instrument for use and where the choice of the model to propose for general use was dictated by pragmatic considerations. The goodness-of-fit results for all five SEMs were very satisfactory, and no single model fitted the data markedly 'better' than the others (table 2). The factor-9 model may be considered the best statistically, but the dimensions labelled 'Communication and resources' and 'Access to learning resources' in this model have a semantic overlap, and the division appears artificial (table 3). The same applies to the dimensions 'Teachers' and 'Education, training and supervision'. The factor-7 and factor-8 models appear to be the favoured models from a practical point of view, because of a more balanced distribution of items across the different dimensions.
If the dimension labels of the factor-7 and factor-8 models are compared, the labels of the factor-7 model appear to be more 'neatly' divided, whereas there is some semantic overlap between three dimensions in the factor-8 model, namely 'Knowledge and learning materials', 'Access to learning materials and teachers' and 'Resources and communication'. The factor-5 and factor-6 models have too many items in some of the dimensions. We therefore propose a tool to measure the educational environment based on the factor-7 model, which also had a slightly higher mean Cronbach's α (0.870) than the factor-6 model (0.868). Following the naming of other instruments, the 67-item tool is called the Evidence-Based Medicine Educational Environment Measure 67 (EBMEEM-67) and has the following subscales:

1. Knowledge and learning materials (8 items)
2. Learner support (10 items)
3. General relationships and support (8 items)
4. Institutional focus on *EBM* (14 items)
5. Education, training and supervision (9 items)
6. *EBM* application opportunities (12 items)
7. Affirmation of *EBM* environment (6 items)

Table 4 contains a summary of the subscales with their items. A user-friendly format of the complete tool and the instructions for use are attached as an online supplementary file (Supplement 1).

Table 4 Subscales and items per subscale for the EBMEEM-67 tool (CFA data, n=274)

Closer investigation of the subscales and individual items revealed a large degree of correspondence with the three-domain framework of Schönrock-Adema *et al*.14 Table 5 gives an overview of the similarities.

Table 5 Comparison of an existing theoretical framework with the subscales and items of the EBMEEM-67 instrument

Compared with other instruments measuring some aspect of the medical educational environment, the EBMEEM-67 has a higher internal consistency for its subscales, with Cronbach's α ranging from 0.81 to 0.93. The psychometric properties of the instruments reported in the literature are summarised in an online supplementary file.

The internal consistency of the instrument reflects the rigorous process followed in its development and validation. Items were developed and refined through a review of existing tools and through reviews by residents, field experts and the trial study group. The study used EFA and CFA procedures, with all results for both phases being statistically acceptable and, where applicable, significant at the 5% level. The EFA was accomplished by factor analysis followed by a varimax oblique rotation to enhance the interpretation of the different dimensions. Cronbach's α demonstrated internal consistency. SEM was successfully applied to the CFA data. Polychoric correlations formed the basis of the correlation matrices for the EFA and CFA analyses. The results of a generalisability study34,35 for the EBMEEM-67 showed that mean absolute and relative coefficients above 0.80 may be expected, which confirmed its effectiveness in measuring the study populations.

### Potential strengths and limitations of the study

Respondents were recruited from nine different countries, spanning 10 different specialties and 6 years of study, which strengthens the generalisability of the instrument. The wider application of the instrument was confirmed by the generalisability study.
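The relative and absolute coefficients referred to above come from generalisability theory.34,35 A minimal sketch for a one-facet, persons-by-items crossed design, estimating variance components from mean squares; the score matrix is hypothetical, and for this design the relative coefficient coincides with Cronbach's α:

```python
import numpy as np

def g_coefficients(X: np.ndarray) -> dict:
    """Relative and absolute generalisability coefficients for a complete
    persons x items score matrix (one-facet crossed p x i design)."""
    n_p, n_i = X.shape
    grand = X.mean()
    ss_p = n_i * ((X.mean(axis=1) - grand) ** 2).sum()
    ss_i = n_p * ((X.mean(axis=0) - grand) ** 2).sum()
    ss_res = ((X - grand) ** 2).sum() - ss_p - ss_i
    ms_p, ms_i = ss_p / (n_p - 1), ss_i / (n_i - 1)
    ms_res = ss_res / ((n_p - 1) * (n_i - 1))
    # variance components from expected mean squares
    var_p = max((ms_p - ms_res) / n_i, 0.0)    # persons (object of measurement)
    var_i = max((ms_i - ms_res) / n_p, 0.0)    # items
    var_res = ms_res                            # interaction + error
    relative = var_p / (var_p + var_res / n_i)  # equals Cronbach's alpha here
    absolute = var_p / (var_p + (var_i + var_res) / n_i)
    return {"relative": relative, "absolute": absolute}
```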
Our study fell somewhat short of the general rule requiring 5–10 respondents per item to successfully apply an EFA or CFA procedure.25 Calculating polychoric correlations and using robust statistics during the CFA phase partly compensated for the smaller sample. No statistical explanation could be found for virtually all the negatively formulated items loading together on the seventh subscale. King36 cites examples of studies with a similar tendency. It also appears that subscale 7 may contain a few items (eg, 6 and 57) that do not fit well into the theoretical framework proposed by Schönrock-Adema *et al*.14 Lastly, the sample was biased towards a very large representation of participants from obstetrics and gynaecology.

## Conclusion

A tool, the EBMEEM-67, was successfully developed for eliciting residents' perceptions of their *EBM* educational environment. This tool can be recommended for use, especially with residents in obstetrics and gynaecology. The EBMEEM-67 can be applied within one institution or department or across sites for benchmarking purposes, cross-sectionally or longitudinally. Cross-sectional investigations can be undertaken within one department across all years of study, or by comparing results from different departments or different training sites at a particular point in time. Longitudinally, the EBMEEM-67 can be used for before-and-after comparisons in *EBM* education intervention studies, or for following the same cohort of residents over the different years of study. Administering the EBMEEM-67 in combination with other tools measuring the educational or learning environment, such as the PHEEM15 or the D-RECT,3 could serve a useful diagnostic purpose.

The number of items remaining in the EBMEEM-67 is quite high for use in settings where time and motivation to complete an instrument of this nature are limited. A statistical process of reducing the number of items is currently under way. It is proposed that a thorough generalisability and decision (G&D) study34,35 be conducted during this follow-up. Further analyses should also be carried out to interpret the results of our study in relation to the theory behind the measurement of the educational environment.14

## Acknowledgments

The authors wish to thank all the trial and non-trial participants and the investigators of the trial for their continued support. The editorial support of Barbara English from the office of the Deputy Dean: Research of the University of Pretoria's Faculty of Health Sciences is acknowledged with appreciation.

## Footnotes

* **Funding** The randomised controlled trial to evaluate the effectiveness of a clinically integrated evidence-based medicine course was funded by the UNDP/UNFPA/WHO/World Bank Special Programme of Research, Development and Research Training in Human Reproduction, Department of Reproductive Health and Research, WHO.
* **Competing interests** A technical report on the validation process of the five definitive instruments was submitted to JAMA as part of their requirements for publishing the results of the e-learning trial. Some of the data for this study were derived from that report. JG was remunerated by the WHO and the MRC Unit for Maternal and Infant Health Care Strategies for statistical services.

## References

1. Kulier R, Khan KS, Gülmezoglu AM, et al. A cluster randomized controlled trial to evaluate the effectiveness of the clinically integrated RHL evidence-based medicine course. Reprod Health 2010;7:8.
2. Roff S, McAleer S. What is education climate? Med Teach 2001;23:333–4.
3. Boor K, Van der Vleuten C, Teunissen P, et al. Development and analysis of D-RECT, an instrument measuring residents' learning climate. Med Teach 2011;33:820–7.
4. Genn JM. AMEE Medical Education Guide No. 23 (Part 1): curriculum, environment, climate, quality and change in medical education—a unifying perspective. Med Teach 2001;23:337–44.
5. Genn JM. AMEE Medical Education Guide No. 23 (Part 2): curriculum, environment, climate, quality and change in medical education—a unifying perspective. Med Teach 2001;23:345–54.
6. Soemantri D, Herrera C, Riquelme A. Measuring the educational environment in health professions studies: a systematic review. Med Teach 2010;32:947–52.
7. Stewart TJ. Learning environments in medical education. Med Teach 2006;28:387–9.
8. Holt MC, Roff S. Development and validation of the Anaesthetic Theatre Educational Environment Measure (ATEEM). Med Teach 2004;26:553–8.
9. Cassar K. Development of an instrument to measure the surgical operating theatre learning environment as perceived by basic surgical trainees. Med Teach 2004;26:260–4.
10. Kanashiro J, McAleer S, Roff S. Assessing the educational environment in the operating room—a measure of resident perception at one Canadian institution. Surgery 2006;139:150–8.
11. Mulrooney A. Development of an instrument to measure the practice vocational training environment in Ireland. Med Teach 2005;27:338–42.
12. Roff S, McAleer S, Harden R, et al. Development and validation of the Dundee Ready Education Environment Measure (DREEM). Med Teach 1997;19:295–9.
13. Roff S. The Dundee Ready Educational Environment Measure (DREEM)—a generic instrument for measuring students' perceptions of undergraduate health professions curricula. Med Teach 2005;27:322–5.
14. Schönrock-Adema J, Bouwkamp-Timmer T, Van Hell EA, et al. Key elements in assessing the educational environment: where is the theory? Adv Health Sci Educ 2012;17:727–42.
15. Roff S, McAleer S, Skinner A. Development and validation of an instrument to measure the postgraduate clinical learning and teaching educational environment for hospital-based junior doctors in the UK. Med Teach 2005;27:326–31.
16. Clapham M, Wall D, Batchelor A. Educational environment in intensive care medicine—use of Postgraduate Hospital Educational Environment Measure (PHEEM). Med Teach 2007;29:e184–91.
17. Aspegren K, Bastholt L, Bested KM, et al. Validation of the PHEEM instrument in a Danish hospital. Med Teach 2007;29:504–6.
18. Malling B, Mortensen LS, Scherpbier AJJ, et al. Educational climate seems unrelated to leadership skills of clinical consultants responsible for postgraduate medical education in clinical departments. BMC Med Educ 2012;10:62.
19. Schönrock-Adema J, Heijne-Penninga M, Van Hell EA, et al. Necessary steps in factor analysis: enhancing validation studies of educational instruments. The PHEEM applied to clerks as an example. Med Teach 2008;31:e226–32.
20. Riquelme A, Herrera C, Aranis C, et al.
Psychometric analyses and internal consistency of the PHEEM questionnaire to measure the clinical learning environment in the clerkship of a Medical School in Chile. Med Teach 2009;31:e221–5.
21. Wall D, Clapham M, Riquelme A, et al. Is PHEEM a multi-dimensional instrument? An international perspective. Med Teach 2009;31:e521–7.
22. Nagraj S, Wall D, Jones E. Can STEEM be used to measure the educational environment within the operating theatre for undergraduate medical students? Med Teach 2006;28:642–7.
23. Kulier R, Gülmezoglu AM, Zamora J, et al. Effectiveness of a clinically integrated e-learning course in evidence-based medicine for reproductive health training: a randomized controlled trial. JAMA 2012;308:2218–25.
24. Black TR. Doing quantitative research in the social sciences. An integrated approach to research design, measurement and statistics. London: Sage, 1999.
25. Hair JF, Anderson RE, Tatham RL, et al. Multivariate data analysis. 5th edn. Upper Saddle River, NJ: Prentice-Hall, 1998.
26. Steyn AGW, Smit CF, Du Toit SHC, et al. Modern statistics in practice. 5th edn. Pretoria: JL van Schaik, 1994.
27. Hatcher L. A step-by-step approach to using SAS for factor analysis and structural equation modeling. Cary, NC: SAS Institute, 1994.
28. Drasgow F. Polychoric and polyserial correlations. In: Kotz S, Johnson NL, eds. Encyclopedia of statistical sciences. Vol 7. New York: John Wiley & Sons, 1986:68–74.
29. SAS Institute, Inc. SAS/STAT® 9.3 user's guide. Cary, NC: SAS Institute, 2011.
30. EQS 6.1 for Windows. Multivariate Software, Inc.
31. Satorra A, Bentler PM. Scaling corrections for chi-square statistics in covariance structure analysis. In: American Statistical Association 1988 proceedings of the Business and Economics Section. Alexandria, VA: American Statistical Association, 1988:308–13.
32. Mardia KV. Measures of multivariate skewness and kurtosis with applications. Biometrika 1970;57:519–30.
33. Kline RB. Principles and practice of structural equation modeling. 2nd edn. New York: The Guilford Press, 2005:143–4, 321–2.
34. Shavelson RJ, Webb NM. Generalizability theory: a primer. Thousand Oaks, CA: Sage, 1991.
35. Bloch R, Norman G.
Generalizability theory for the perplexed: a practical introduction and guide: AMEE Guide No. 68. Med Teach 2012;34:960–92.
36. King CV. Factor analysis and negatively worded items. Populus. http://csi.ufs.ac.za/resres/files/King.pdf (accessed 12 Feb 2014).