Component 1 seemed like the baseline characteristics of a good lecturer – being friendly, accessible, fair, offering support, an all-round professional who can explain the work. It accounted for 19% of the variance in the data. Component 2 looked like the things that you have to learn to become a good lecturer. It is not only about knowing your field; you also need to know a bit about formulating outcomes, using action verbs, putting it all into a study guide and then actually using it. This explained 15% of the variance. Component 3 looked like it was about enthusiasm and adding value – engaging with the students. It explained another 15% of the variance. The remaining half or so of the variance was not explained by these three constructs.
Now, students don't really like doing these lecturer evaluations for all their courses. After the second or third one they realise that it takes up valuable beer-drinking time. And as with many of these "customer satisfaction" type surveys, they don't really see their inputs making a difference. Unless you are repeating the course next year, you would not know if the lecturer took the low score for providing quick feedback to heart.
On the lecturers' side, the process does not seem to provide that much useful information or action. If it is going badly in a course, the School director knows about it long before the lecturer evaluation is done. Low scores on a few items might get a mention at the meeting where next year's teaching load is assigned. If you are really keen, scoring low on "basing assessment on learning outcomes" might get you to enroll in the next assessment training course on campus, but then you also do have some articles to write. At best (and worst) the aggregate lecturer evaluation score matters for the top-20% bonus at the end of the year.
Which brings me to my point: if we are capturing half of what makes for a good lecturer and the result is not particularly useful to students or lecturers, shouldn't we replace the whole system with a simple vote by text message: rate your lecturer with a score out of 100?
This has some support in what has been written about so-called "thin slicing": In his book Blink, Malcolm Gladwell writes about people’s ability to 'thin slice' – to judge what is good or important from a narrow period of experience. He goes on to argue that having too much information can interfere with the accuracy of a judgement.
To get an idea of the difference between our questionnaire score and a 'thin-sliced' score, we asked our students for both this semester. The results are as follows. For the analysis I have 869 usable responses from undergraduates in 5 modules for 9 lecturers.
The average score from the questionnaire was 85.7% and the standard deviation 13.3. The figure shows the distribution between the courses.
The average of the 'thin slice' scores out of 100 was 82% with a standard deviation of 17. The correlation coefficient between the two sets of scores was .741 and it was significant at the 5% level. Controlling for the different modules, the partial correlation is slightly lower at .731. A paired-samples t-test showed that the difference between the means is statistically significant.
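For anyone who wants to try the same comparison on their own class, here is a minimal Python sketch of the two main calculations: the correlation coefficient and the paired-samples t statistic. The six pairs of scores are made up for illustration and are not the survey data, and the function names `pearson_r` and `paired_t` are hypothetical helpers rather than anything from the statistics package used for the analysis above.

```python
import math
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def paired_t(x, y):
    """t statistic and degrees of freedom for a paired-samples t-test."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    # Mean difference divided by its standard error.
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Made-up scores for six hypothetical students (NOT the survey data).
questionnaire = [88, 92, 70, 85, 95, 64]
thin_slice = [85, 90, 55, 80, 98, 40]

r = pearson_r(questionnaire, thin_slice)
t, df = paired_t(questionnaire, thin_slice)
print(f"r = {r:.3f}, t = {t:.2f} on {df} degrees of freedom")
```

The t statistic still has to be compared against a t distribution with the given degrees of freedom to get a p-value. The partial correlation that controls for module is left out here because it needs the module labels as an extra input.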
It seems that the questionnaire is useful to the extent that it ameliorates a general dislike of the lecturer. For 'thin slice' scores between 0 and 50%, the average questionnaire score was 63%. Having considered all 25 elements of the evaluation, it turns out that the lecturer is not as bad as the initial 'thin slice' score out of 100 indicated. This effect gets smaller at higher scores: for 'thin slice' scores between 71 and 80%, the average questionnaire score was 83%.
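The banded comparison above can be sketched in a few lines of Python. This is only an illustration: the pairs of scores are invented, the helper name `mean_by_band` is hypothetical, and only the 0 to 50 and 71 to 80 bands are taken from the text. The other two bands are assumptions added to cover the full range.

```python
from collections import defaultdict
from statistics import mean


def mean_by_band(pairs, bands):
    """Average questionnaire score within each 'thin slice' band.

    pairs: iterable of (thin_slice_score, questionnaire_score)
    bands: list of (low, high) tuples, both ends inclusive
    """
    grouped = defaultdict(list)
    for thin, quest in pairs:
        for low, high in bands:
            if low <= thin <= high:
                grouped[(low, high)].append(quest)
                break  # each response belongs to one band only
    return {band: mean(scores) for band, scores in grouped.items()}


# (0, 50) and (71, 80) come from the text; (51, 70) and (81, 100) are assumed.
bands = [(0, 50), (51, 70), (71, 80), (81, 100)]

# Invented (thin slice, questionnaire) pairs, NOT the survey data.
responses = [(40, 60), (50, 70), (65, 78), (75, 80), (80, 86), (95, 96)]

for band, avg in sorted(mean_by_band(responses, bands).items()):
    print(f"thin slice {band[0]}-{band[1]}: mean questionnaire score {avg}")
```

A gap between the band and the mean questionnaire score that shrinks as the bands go up would match the pattern described above.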
Thus, for all its shortcomings, it does seem that the questionnaire serves a purpose. Next we'll have to try a five-point scale and see if we can measure more accurately!