Wednesday, January 14, 2009

Performance related pay for teachers: a dialectic

Thesis: Teachers go mad mental crazy whenever you suggest putting them on performance related pay.

Antithesis: On the other hand, education is actually the performance-measuringest business you ever did see. At least a quarter of the workload of the average teacher is performance measurement. I would guess that your average teacher carries out more assessments of someone else's performance than any other category of worker, supervisory or non-supervisory, anywhere in the world.

Synthesis: It is surely nonsense, therefore, that teachers' performance can't be measured, or that the flaws in student performance evaluations can't be corrected for. If it's not possible to measure how well a class of students have been taught, then this undermines the whole logical basis of the system.

Hypothesis: Teachers object to performance-related pay so strenuously precisely because they know how arbitrary, biased and unfair their own assessments of their students are. Apparently it's possible to like totally game the student evaluation system by blatant pandering, easy grading and a bit of showbiz. I wonder how many undeserved A-grades have been handed out to good-looking students who sit near the front, ask one sycophantic question per class in a loud voice, and feign interest in the professor's fuck-dull book?

Prosthesis: It might be right that:

"If the administration at A&M were serious about improving classroom performance, they'd invest quite a bit more money in pedagogical training for their graduate students; hiring more professors and reducing class sizes; offering release-time for professors to design new courses; and so on and so forth"


Sounds good to me (although the word "pedagogy" always makes me think of dodgy priests somehow). On the other hand, if the administration at A&M did make such a big investment in good-teacher-talking-stuff, then you'd certainly hope that they might follow it up with a bit of a look to see if it worked or not.

Photosynthesis: And if they were doing that, it wouldn't seem like the daftest idea to bung a couple of quid to the bloke whose teaching turned out to be the best, just to encourage the rest.

94 comments:

  1. "totally game the student evaluation system by blatant pandering, easy grading and a bit of showbiz"

    But given that most student evals are anonymous, that they come at the end of the term and that most students don't repeat profs much it's just one big implicit non credible promise (you pander, gimme an A and I'll give you a good eval after you can't do anything bad to me).

    I've got a good bit of anecdotal evidence from grad school to support this. Some TAs who were bad teachers (basically cuz they didn't give a fuck about the students, whom they regarded as morons and an annoying distraction from thesis writing and being brilliant) and who knew they were bad teachers tried to pad their evals precisely through pandering and easy grading (in grad school, student evals mostly determined whether or not you got your pick of classes to TA or, in extreme cases, if your funding would continue. They also helped marginally on the job market). As a result they got comments from students, which I've seen often enough, along the lines of "horrible teacher who tried to bribe us with easy grades and extra credit".

    So while I think there's a good bit of arbitrariness in both teachers grading students and students grading teachers I think there's enough substance there for it to have a well deserved place. I support both, contra those who go mad mental crazy.

  2. Oh, and incidentally one of these bad TAs was my roommate who was an aspiring game theorist and who should have been able to freakin' solve for the correct equilibrium.

  3. just one big implicit non credible promise

    yes, I did wonder about this. Surely if it was a matter of that the result would simply be instant grade hyperinflation and we would be left with an equilibrium in which teachers no longer gave assessments of the performance of students in any meaningful sense, while students had no incentive to offer anything other than an honest assessment of teachers. I'd call this the best of both worlds, although I am realistic about the prospects of getting any teachers on my side.

    If I was a teacher handing out grades for quid pro quo, btw, I would almost certainly reason that the currency was better spent on a definite promise of sexual intercourse than on a remote chance of a $10,000 bonus.

  4. Photosynthesis. And if they were doing that, it wouldn't seem like the daftest idea to bung a couple of quid to the bloke whose teaching turned out to be the best, just to encourage the rest.
    Not the daftest idea if the administration didn't care about getting an accurate assessment (*). Unless we're assuming that the teachers wouldn't respond to the incentive to game the system, which is not impossible I guess, though it contradicts economic dogma on REM and so will give Notsneaky palpitations.

    There's a fairly large body of work on performance related pay. Broadly speaking it's hard to get right, quite easy to get wrong and it can cause all kinds of unexpected problems, particularly for group morale in organisations. Even for salesmen it's surprisingly easy to get it wrong, and there it's far easier to come up with a meaningful assessment of sales performance.

    Incidentally, while education is measured in all kinds of ways, nobody can really agree on what the figures mean, or what the best figures to use are. Which is hardly surprising, given that nobody can really agree on what the purpose of education is (other than a vague feeling that it should be economically productive, for which see Alison Wolf's rather jaundiced comments - or somehow create good citizens, whatever that might be). Not really an ideal environment for performance related pay.

    (*) I'm assuming you mean by "assessment" half arsed use of figures that are compared to a previous year, and used to justify managerial changes, government policy, educational change. Which is how educational policies/techniques seem to be mostly tested.

  5. "Surely if it was a matter of that the result would simply be instant grade hyperinflation"

    Wait, how come? If I know that the students' implicit promises to gimme a good eval conditional on me giving them all A's and pandering is not credible, then I have no incentive to actually fulfill my part of the implicit bargain, since it's not much of a bargain. I might as well grade them on what I think they deserve.

    Add to that the fact that most people don't appreciate being manipulated (this includes being bribed. Bribes only work if they are high enough in value to offset the basic discomfort associated with doing something you wouldn't normally do) and you actually got an incentive to punish would be bribers (since they can't credibly promise to deliver anyway).

    The upshot is that it's precisely the fact that you CAN'T game the system that leads some teachers to oppose performance related pay and the like. If you could game it, why would you care? You'd game it and the performance related pay wouldn't be performance related anyway.

  6. When I was a graduate student at the University of (let's say) Winnemac, I used to get horrible teaching evaluations, having an attitude rather like the one YouNotSneaky describes. Then I decided to change that, became a better teacher, and my evaluations improved but were still mediocre. Eventually I realized that the trick was to make the students feel good about me, by making them feel good about themselves, and my evals went through the roof (seriously, one semester the kids got together and bought me a fancy pen-set, which I still have), where I am happy to say they have stayed. Of course I can no longer cast myself as the good cop to the professor's (unwitting) bad cop, but there are other ways to win them over, with no discernible impact on how much they actually learn.

  7. Anon, sure, there's a psychological/social element to it. Basically what you're saying is that if you're good at making people like you then they are more likely to say that they like you when they evaluate you.
    The thing is, it's not that easy to make people like you (despite you making it seem that way) unless you do your job - which in this case means learning them something.

    I think there's still a pretty strong relationship between ... well, at least the effort that is put into teaching and the evaluation of the teacher, just like there's a strong relationship between the effort put into learning something on the part of the student and the grade at the end of the course (I rely on time series data from myself as evidence for the latter and believe me, there's enough variance there to estimate that sucker pretty precisely). Of course more ideally you'd like the strong relationship to be between effort and teaching ability and effort and amount learned but to the extent those are correlated...

    On a related note I'm also of the opinion that the pedagogical value of the entertainment aspect of teaching is under appreciated. You gotta have good jokes and interesting stories to go along with the theory and numbers. It's sort of like Daniel's writings. They're appreciated and even persuasive not just because he's all smart and shit but also cuz he's pretty damn funny.

    Ok, enough pandering.

  8. At least a quarter of the workload of the average teacher is performance measurement

    You do have a statistical basis for this claim? "Marking" won't do: only part of what occurs in the marking process is actual performance measurement.

  9. Yesss! Let a hundred flowers blossom and a hundred schools of thought contend! Parents may love me, but not as much as Chairman Mao!

  10. Even assuming you could come up with a reliable measure of teacher performance, performance related pay where the bonus makes up a small fraction of the total salary does nothing, or very little, to boost performance in my experience.

    A bonus scheme that just pisses about at the margins serves very little purpose and in some cases has a negative impact on performance. The difference between a bonus of £500 or £1,000 for teachers would do nothing to change teacher behaviours over a year, I'm sure.

    Also, teachers can be an obtuse bunch, I can imagine some (probably those with least to gain) willfully working against a performance related pay system.

  11. Dsquared, you've made a huge logical jump in your synthesis: just because it's possible to measure student performance in great detail, you assume it's possible to measure teacher performance as easily.

    It's fairly easy to find the fastest 1500m runner in the world. Finding the best 1500m coach could be rather tricky.

  12. Ajay,

    Don't schools now measure "added value", rather than just school grades? My understanding is that this bases teacher performance on improvement from their starting point. So, moving a student from an F to a C is better than moving a student from a B to an A.

    Tom

  13. Finding the best 1500m coach could be rather tricky

    Yeah, but I doubt it's really that much more difficult than growing a human ear on the back of a mouse, which we can already do.

    Also, we don't need to find the best teachers with any exactitude at all; we don't even need a metric as accurate as exam results are for finding the best student. We just need a rough-and-ready indicator that's somewhat correlated and robust to the most blatant gaming. People solve these problems all the time in the real world you know, and the objective isn't to give everyone a precise reward for their intrinsic merit; it's just to incentivise more effort and better performance, which is a lot easier and a lot less demanding of specific information.

    (I made this point once before with respect to performance-related pay for policemen. Someone objected "why should the chief of police in Bristol be penalised just because a crack gang suddenly moves in?", to which the reply is that the chief doesn't have a right to his bonus, the purpose of PRP isn't to reward intrinsic merit and that the sudden presence of a crack gang and consequent need for more effort is exactly the sort of thing that the bonus system's meant to respond to. Or closer to home, the global financial crisis wasn't my fault, but it's still going to affect my income).

  14. But Daniel you work in a culture where bonus is a huge chunk of pay and so materially affects performance. Lots of bonus schemes are window dressing that are too small to materially change behaviours. There's no way you could implement a "real" PRP scheme for teachers, it just wouldn't fly.

  15. bonus is a huge chunk of pay and so materially affects performance.

    I'm not sure that the second part of this follows from the first and even less sure that if it does follow, it affects it in a good way.

    The trouble with this whole discussion is that it's one where there's a problem but not a solution. We know there are bad teachers, but we really, really don't know how to identify them specifically. If we could, I think we would likely have done so by now: teachers might in principle not like bonus and performance schemes but they'd like them a lot more if they put extra money in their pockets. But (as far as I can see) there's no getting round problems such as the most difficult pupil being the one whose results it is hardest to improve.

    I think Cian's comments above are hard to ignore and I also think that the more we have performance measurement, the more this sort of thing is likely to happen. What to do? If I were running an education service (or seeking to justify it to the nation) I'd certainly want it to be easier to sack poor teachers and reward good ones, and hence be able to identify both more easily. But suppose it's really not to be done?

  16. "We know there are bad teachers"

    Do we? The DCSF and the teaching unions barely acknowledge this.

  17. We know there are bad teachers, but we really, really don't know how to identify them specifically

    What is it about headmasters that makes them unable to do what nearly all other types of manager do, and provide objectives-based assessments of their staff on an annual basis? I would bet quids that at least 80% of headmasters in the land would be able to identify the stars and lemons among their staff, and that in the vast (>75%) majority of schools, the ones they would identify as the stars and lemons would be the same teachers that all the other teachers, and the pupils, identified as the stars and lemons.

    Teachers really hate this idea, btw, and in my experience always say that it's bound to end up only rewarding sycophants and yes-men. The point of the post was to poke a little fun at this reaction, by suggesting that there's more than a little projection involved. (If I was in a more charitable frame of mind, I'd suggest that there's also an officer-class effect here; the heads and heads of department don't want to grade their NCOs for fear of undermining their status in the eyes of the children. But as I say, if we can invent microwavable garlic bread, this one ought to be soluble).

  18. My understanding is that this bases teacher performance on improvement from their starting point. So, moving a student from an F to a C is better than moving a student from a B to an A.

    1) There's another assumption that would be tricky to back up - how do we know it isn't easier to turn an F student into a C student than a B student into an A student? Low hanging fruit and so on. After all, it's a lot easier to turn a 16 second 100m runner into a 15 second one than to turn an 11 second runner into a 10 second one.

    2) PRP could undermine the public-service ethos which makes teachers (like, say, doctors) do so much unpaid overtime. As noted above, the financial industry is probably the one where bonuses and performance-related pay are commonest, and not coincidentally it has just been shown to be full of short-sighted bonus-obsessed destructive deceitful idiots. It's difficult to imagine what a school in the same disastrous shape as (say) Wachovia or Merrill Lynch would look like, but it would probably make Edward Tilghman Middle look like Harrow.

  19. Should also add that you've linked to a post about university teaching and used it as a springboard to discuss school teaching.

  20. I also have to confess that entering children for GCSEs early and allowing them a retake on any that they fail sounds like a really good idea to me, assuming that what we care about is that they reach the highest level of achievement they can, and that we measure that using exams. It doesn't seem like a failure to me at all; it looks more like the sort of thing that the system was designed to promote.

    In particular, Alan Smithers seems to have it backwards here:

    We're in danger of producing a set of statistics that no longer accurately reflect pupils' progression but the work the schools can do to improve their scores

    Surely the change in the statistics makes them more of a measure of the pupils' progression (in so far as that progression is measured by the exam system, and if it isn't then we need much more fundamental reform) and less reflective of the quality of the teachers. And surely producing better results with the same quality of teachers has to be the basis of any productivity improvement in education at all.

  21. University teaching and school teaching are very different things, so you can't really compare them. People generally become university teachers because they want to do research, so there's a motivation problem from the outset, which maybe should be addressed with bonuses. Secondly, for schools there is a standardised curriculum, whereas not so much in universities, so it's very hard to compare the quality of teaching at more than quite a gross level.

    Bankers are extrinsically motivated, whereas teachers, police chiefs, etc. will be intrinsically motivated. Your own experience at Credit Suisse is not terribly relevant. Bonuses don't seem to be incentivising for intrinsically motivated people, but can be quite demotivating, making their application problematic at best. This is an active research topic, so I suggest that you argue against the literature, rather than from your own prejudices, biases and anecdotes.

    What is it about headmasters that makes them unable to do what nearly all other types of manager do, and provide objectives-based assessments of their staff on an annual basis?

    I'm sure they can, with about the same level of effectiveness as other managers. It's easy to identify the outliers (though whether those perceived as stars are actually stars is another matter). It's the remaining 99.9% that's the problem. It's not that headmasters are worse, it's simply that this is something that managers are not generally able to do in a reliable or robust fashion, unless there are clear and unambiguous objectives to be met by all staff which are directly comparable (telesales being the best example). Yes there are jobs where this is true, but they're in the minority.

    If you want an example, take computer programming. Should be an easy area to identify quality, right? You have actual tangible "stuff" to compare. In practice it's almost impossible. Identifying stars is fairly easy (though quite often stars are probably best fired, as they can't work properly in teams and so cause more problems than they solve) as there are so few of them, but everyone else? What metric do you use? Lines of code - well, better programmers write shorter programs. Who solved the toughest problems? Well, the manager probably lacks the skills to really make that call. Who caused the most bugs? Not always easy to say (programming being a team effort), and anyway this is often related to who was working on the toughest stuff. The manager's perception... well, do you really want to incentivise the best office politician? And the better programmers are far more motivated by the technical challenges than by money. So while they won't say no to bonuses, they're probably not necessary so long as you're paying a competitive wage.

  22. Nonsense, grading the teachers is easy; I'd done it a million times, back in the day. There are four grades: good, fair, one with serious errors, and running dog of capitalism.

    And believe me, you don't want to come out as a running dog of capitalism, no Sir. That's the incentive, right there.

  23. What is it about headmasters that makes them unable to do what nearly all other types of manager do, and provide objectives-based assessments of their staff on an annual basis?

    The practical difficulties involved in observing and evaluating the work of their staff. They can have good hunches - and as Cian says, they can work out who the very worst and very best likely are - but beyond that they don't have the tools for the job.

    An awful lot of your argument seems to me to be based on inference, so it might be a good idea to look at the argument-from-inference that if anybody had thought up a generally reliable scheme to do this, we'd have it already. After all, why would we not? Because teachers wouldn't like it? Successive governments have spent nearly thirty years doing what teachers don't like. So if they don't do this apparently useful thing, that you're sure must be possible.....

  24. On the generalities of PRP... There's lots of places out there in the world who "do education" and some do it better than others.

    What do we learn from them? Be it a highly praised university, or the schools in Finland, it seems that you don't get better teachers through performance management bonus schemes, you get them by providing a mix of pay, benefits and social status that attracts high-aptitude people.

    What this in turn suggests is that rather than "attitude" (which is the basic factor that PRP claims to affect) quality may be much more an issue of "aptitude."

    As such, if you want real improvement you'd be better off investing in improving "aptitude" (training, etc.?) than "attitude."

    The relative contributions of attitude and aptitude likely vary from job to job. But in the ones where aptitude predominates, PRP will likely have fewer useful effects.

  25. entering children for GCSEs early and allowing them a retake on any that they fail sounds like a really good idea to me, assuming that what we care about is that they reach the highest level of achievement they can

    There's a useful concept I've come across that seems relevant here - it's called 'gaming the system'. You can read more about it in a number of places, including earlier comments to this thread.

    The real problems start when the kids know that that's what you're doing. At the university where I teach, we used to have a discretionary 'borderline' system, where somebody whose scores fell just below (or just above) a grade cutoff point could, quite legally, have their mark adjusted upward (or, theoretically, downward). That system's now been replaced by a system which makes some adjustments to the final result but does so automatically - no marker discretion. Coincidentally, the proportion of students claiming 'special circumstances' (which allows discretionary adjustments) has gone up from 5-10% to 40-50%.

    We've also had students asking if they can retake an exam. Not because of special circumstances, just because they were disappointed with the mark and they didn't think they'd done as well as they could.

    It's all Goodhart. If exams are going to be used as measures, you need to be sure you're not measuring

    a) the pupil's ability to take exams
    b) the pupil's ability to respond well to coaching in how to take exams
    c) the teacher's desire for improved results as compared to last year
    d) the government's desire for improved results

    At least, you need to guard against introducing changes that make it less likely you're measuring what you want to measure.

  26. unless there are clear and unambiguous objectives to be met by all staff which are directly comparable (telesales being the best example)

    Disputatum! What's clearer and less ambiguous than an exam result?

    And if we're agreed on telesales, I have to point out that telephone sales prospects are not all created equal either, nor is advertising in one magazine just as easy to sell as advertising in another.

  27. What's clearer and less ambiguous than an exam result?

    The part played by the teacher's labour in producing it. It measures the pupil, to a degree: it measures the teacher, rather worse. It may well measure the teacher in comparison to other teachers very poorly indeed.

  28. Disputatum! What's clearer and less ambiguous than an exam result?

    It rather depends upon the question you're seeking to answer, doesn't it.
    1) As a measure of how well a student can do on a particular examination it is pretty accurate.
    2) As a measure of how well the student understands the course material it is less accurate, as this result will also be influenced by exam technique, the degree to which the exam can be crammed for, etc. Obviously in choosing a particular examination style we will inevitably favour some students over others. Change the style and some students will do better, while others do worse. Clearly then exams are not as reliable a measurement of educational attainment as we pretend.
    3) As a measure of how successful a student will be in the workplace it will effectively filter no-hopers, but otherwise will be a fairly weak measure (again Alison Wolf has some good analysis of this).

    As a measure of teaching ability...well:
    1) How do we control for the quality of intake? Sure, we have previous SAT results, which give you a coarse indicator, but value added, while better, is not nearly as accurate as you'd need it to be.
    2) Is it a teacher's fault if his students are lazy and don't revise, or do homework?
    3) How do we control for the socio-economic status of the kids?
    4) How do we control for some kids having better educated parents who help them?
    5) How do we control for some kids having private tutors?
    6) Maybe some teachers are better at teaching particular ability ranges than others. Does this matter, should we control for this?

  29. "Is it a teacher's fault if his students are lazy and don't revise, or do homework?"

    I would say that at the college level in about 3/4 of cases the answer to that is yes.

    At high school level students' potentially adverse backgrounds (not that they're lazy they just might have more important stuff than your class on their mind) play a bigger role so the variance will be greater. But at high schools attended by middle class students it's still about 3/4

    (of the remainder 1/8 still got other issues and 1/8 are just plain lazy bastards).

  30. As a measure of teaching ability...

    Hang on, five minutes ago we agreed that advertising column inches sold was a valid measure for telesales clerks; in that case, we were (correctly) interested in providing incentives to get results, not in objectively measuring telesales ability.

    The point of a bonus scheme is to provide a positive reinforcement that's closely related enough to effort to make it worth the employee's while to expend more effort. It's not to reward intrinsic merit.

  31. Daniel - people don't go into telesales for the love of the job. You have to motivate them somehow, and column inches sold is probably the most effective way of doing it (though you can get perverse effects when sales forces have interdependencies). Teachers don't, for the most part, do it for the money.

    A lot of work has been carried out looking at the effect of bonuses on intrinsically motivated employees, and it's minimal at best. At worst it can transform behaviour in all kinds of negative ways. Now maybe you think there is something wrong with this research, or that you know better, but given you are completely ignoring actual work in favour of your own hypotheticals...

    Back to teachers. What do you want them expending more effort on? Okay, let's take an example. Better GCSE results. Is that a good thing? Why? Well, why don't we make them easier, and that way we'll improve the GCSE results.
    Don't like that solution? So you want some way of improving results.
    Well, we could focus on borderline cases. B/Cs. Let's say there's a 50/50 chance that they'll get a B or a C each time they take the exam. Well, if we want to improve the school's results, we'll just enter those for GCSEs as many times as we can and take the highest result. Statistically we're onto a winner there. Results improve, the kids are obviously learning better (albeit rather more stressed than previous generations).
    But what has actually improved? Are the kids better, have they learned more? Well no, we just improved the odds of them getting a higher grade. Society hasn't benefited, we might as well have just implemented grade inflation.
    Unlikely. They might do better because they have a higher grade, but we'd get the same effect with grade inflation, or changing how we report grades.
    But hang on, what about the other kids? More resources for borderline kids means fewer resources for them. In terms of GCSE results they haven't been harmed. So if you think that schools are simply factories for producing GCSE widgets, then everything is fine. Maybe that is your argument, but I think you'll find it's a minority opinion.
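
To put rough numbers on the retake example in this comment, here is a minimal sketch. It assumes each sitting is an independent 50/50 draw between a B and a C (the commenter's hypothetical; real resits are unlikely to be independent) and that only the best grade counts. The function name `p_best_is_b` is made up for illustration.

```python
def p_best_is_b(attempts: int, p_b: float = 0.5) -> float:
    """Chance that at least one of `attempts` independent sittings yields a B."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    # The best grade is a C only if every single sitting comes out a C.
    return 1.0 - (1.0 - p_b) ** attempts


# Assumed scenario: the same borderline pupil entered one to four times.
for n in range(1, 5):
    print(f"{n} sitting(s): P(best grade is B) = {p_best_is_b(n):.4f}")
```

Under those assumptions the chance of a recorded B goes from 50% with one sitting to roughly 94% with four, with no change in what the pupil actually knows.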

  32. Only literature I could find doesn't actually bear out this theory about intrinsically motivated employees.

  33. How do we control for some kids having better educated parents who help them?

    "The working class child in the school system does not know that she or he is taking part in a competition designed by and for children from classes above. The working class child will take the defeat, as documented again and again in the studies of education, as a personal failure. It will be one more of the long row of signals that build up to the stable pattern of accepting rewards not so good as those at the top of society."
    - Nils Christie, "The ideal victim"

    Christie was writing in the early 80s, since when I think this has got worse rather than better. My daughter's homework assignments quite openly assume parental assistance ("ask a grownup about their childhood"/"read to a grownup for ten minutes"/"practise division and explain the technique to a grownup"...). There was one period when every week's homework seemed to include an open-ended, self-directed research project (e.g. "Find out a few facts about gemstones"). The kids in the class were seven at the time. That to me is pure social sorting:
    Parent A: "Let's have a look in the encyclopedia, shall we?"
    Parent B: "I don't know, it's your homework."

    (I also second Anon's invocation of Goodhart's Law, wrt using exam results both as a measure and as a target.)

  34. I have to point out that Charles Goodhart did not actually use Goodhart's Law as a reason to give up on monetary policy entirely.

  35. Daniel,

    Are you not suffering from something of a confirmation bias here, regarding your faith in PRP schemes as a means of incentivizing performance? PRP in the finance sector is a long established norm that people buy into when they choose the industry. Current blip aside (or maybe not), PRP has probably worked well to incentivize performance.

    As Cian says, however, people go into teaching for manifold reasons other than money and would very likely be resistant to any formal and overt PRP scheme, especially one that just affects salary at the margins. Teachers feel measured enough, without the additional burden of PRP measurement.

    It's also worth noting that schools already have a system of rewarding good teachers, by the back door, giving favoured teachers "more responsibility" with which comes more pay. This additional responsibility is very often conferred on good performers, who, as you state, will be well known to the school leadership team.

  36. ...providing incentives to get results, not in objectively measuring...

    Right, want to go faster - step on that pedal; and that's all there is to it. What can possibly go wrong?

  37. Anon: Perhaps, but the study I found above does suggest that these effects aren't as big as you think. Also, if schools are able to reward good teachers informally with better jobs, why wouldn't it make sense to just hand out a chunk of cash?

  38. D^2 - as the Anon who invoked Goodhart's law to begin with said, the point is not that we should never set targets based on measures, just that we should be aware of their tendency to become meaningless as a result, & sceptical about introducing changes which lead to greater reliance on measure-based targets.

  39. Well there's "sceptical" and there's "dogmatically opposed". The evidence of the Bristol study (which as far as I can tell is all we have) is that the limited introduction of PRP into British secondary schools did work. (I also turned up a really good-looking, large-sample metastudy by John Hattie of Auckland University, and among other things he seems to think that PRP for teachers would be a good idea - not his top priority but a good idea. Interestingly, he puts a lot of weight on pupil evaluations and self-assessments.) So I don't think that the existing literature supports Cian's view.

  40. Oh, I found the fucking comment box. I got around to spamming portfolio.com's market movers blog because he linked to you yadda yadda.

    Anyway, I ended up blogging about it because there was no fucking comment box in the post-view page. Here's my fucking reply:

    http://dayvancowboy.org/?p=176

    (I'm trying to do a Harold Pinter schtick here. I wonder if it worked.)

  41. why wouldn't it make sense to just hand out a chunk of cash?

    Is it your intention that the payments should be public or private?

  42. I am personally very much in favour of having everyone's bonus publicly available, but the chances of getting this agreed in banking, let alone education, are microscopic, so private.

  43. I don't know about mad mental crazy but, presuming that we are having to do performance measurements constantly (and thus professionally and with lots of experience at it and, I hope, some reasonable skill), it's easy to see how we might take umbrage at proposals to evaluate our performance based on untrained, inexperienced people who often have different values and nothing riding on the evaluation.

    I suspect, sadly, that there's far too much arbitrariness in grading, but, y'know, people do try things to mitigate it (e.g., blind grading). I've been fascinated by the UK exam system (since coming here in 2006). You have to turn in your exams months before the exam date. The exam is checked by another member of faculty. The exams are moderated by other people. The exams are anonymous. The graded exams are moderated by a member of staff. There's a review board and a bunch of stuff I still don't quite know about. It's amazing and impressive.

    Perhaps this is idiosyncratic to the University of Manchester, but whatever, it's pretty amazing and really makes it hard to play favorites.

  44. BTW, telesales prospects can be evenly distributed fairly easily. Schoolchildren cannot, not without all-comprehensive education and a truly gigantic bussing exercise.

    Mind you, I spent part of yesterday evening arguing with a French drag queen that allowing schools to choose teachers, rather than having the Ministry of Education allocate all the teachers, didn't actually constitute Anglo-Saxon capitalist consumer imperialism, and further that centralised administration in general was not necessarily or even often an egalitarian force. God knows what he'd have said about performance related pay!

  45. I actually meant social psychology and business/management literature on incentives, rather than their effect on teachers. I don't think there's enough data on the effects of incentives on teaching, and many of the implementations are so flawed (or politicised) that they're pretty useless (but hey, that's educational policy for you).

    I'm suddenly really busy, so this is a bit flyby. However that Bristol study is a workshop paper, which means at best it will have had a very cursory peer review. Having skimmed the paper I can see several potential problems with it, which may be there:
    1) It's purely quantitative, and the authors seem to have made very little attempt to work out what the data actually means using qualitative data, and it is filled with assumptions that are not backed up by any data/citations.
    2) As with so many implementations of these schemes, the British one was flawed. They have made an honest attempt to deal with these flaws, but I'm not sure (and would require more time than I currently have to be sure) that they have successfully done so. The British scheme seems to have been seen by many headmasters as a way for them to increase the salaries of favoured staff, partly due to the strange way in which it was implemented. This may have affected how it worked on the ground, and it would be nice if they addressed this.
    3) We don't know what the teachers in the scheme did differently, and whether it is sustainable. Maybe all this demonstrates is that if you shift the emphasis of the job (which for years has been pulling teachers away from teaching towards admin), you get different outcomes. In other words this simply had an organisational effect where the incentives were incidental. The authors actually admit this in the paper, but then kind of wave it away as irrelevant (because?).
    4) There isn't a control group here. Would results have improved anyway? We don't know. Yes there are two groups, but we're effectively comparing experienced teachers to inexperienced teachers. It's quite possible that the incentive scheme was incidental.
    5) There are plenty of other causal factors and changes also happening during that year (admin changes, policy changes, curriculum changes, etc, etc) making it very hard to pin down a single causal factor here. I don't think they've adequately shown in this paper that these other factors would not have influenced the results.
    6) The Maths results got worse? This seems peculiar, and begs some questions as to what was really going on here.

  46. Also, if schools are able to reward good teachers informally with better jobs, why wouldn't it make sense to just hand out a chunk of cash?

    Um, because teachers, like a lot of people, are motivated more by the quality of the work that they're doing than the salary they're paid, so long as the salary is seen as reasonable.
    I used to work in the city, so I know where you're coming from. But not everyone's like that.

  47. Frankly, given current circs, the notion that some two bit city boy is telling the rest of the world how to incentivize and reward its teaching staff is ridiculous, at best, and frankly pretty fucking offensive, at worst.

    Sure, it's an interesting debate to be had, but it's probably best had by people who actually have some skin in the game, and not those in a game where the PRP system has been, in no small part, responsible for the near collapse of the world economic system.

  48. Thank you for your comment, anonymous. I value your input.

    ("I value your input" is about the rudest thing it's possible to say in the modern City, after all the professionals and Americans moved in.)

  49. Teachers would invent some convoluted assessment that would take a two in-service to understand which would prove they are all above average and in need of a pay raise. You must understand that modern day teacher education revolves around being a proficiency bureaucrat rather than an educator.

    When you deal with contemporary education you must take into consideration the law of unintended consequences.

    Danny L. McDaniel
    Lafayette, Indiana

  50. Both Anons have more of a point than you want to admit.

  51. To be honest I don't think the "two bit city boy" anon does. When you divide through by the ridiculous hyperbole, and the unsupported (and factually untrue) assertion that I don't have any "skin in the game", there's not any point there, except a claim that performance related pay has caused "the near collapse of the world economic system". Which I don't think is true. What's frankly insulting and not a little bit ridiculous is for someone to display such naked envy of the fact that other people get bonuses, combined with a flat-out refusal to consider a performance-related pay scheme for themselves.

    And at the end of the day, everyone seems to be ignoring the John Hattie metastudy, which can't be brushed off in the way that Cian does the Bristol one (it's a metastudy of 800 underlying studies all over the world). This really does look like Luddism to me. I may be ignoring unintended or unforeseen consequences, but all you lot are ignoring the intended and foreseeable consequences. That stuff about performance-related pay risking the dread plague o'er the land of children being given extra chances to put in their best performance on exams was terribly weak.

    (btw, I love the way that I have to supply reams of empirical studies to support every single point, but everyone else is allowed to just randomly assert that teachers would not only attempt to game the system, but would do so to such an extent that it had no benefits at all.)

  52. "I value your input" is about the rudest thing it's possible to say in the modern City, after all the professionals and Americans moved in

    But surely the latter wouldn't understand irony?

    I think you do need to take up Cian's point about motivation, as well as, perhaps, mine about the surprising practical absence of such schemes given their apparent utility. Believe me, most teachers* think both that they're pretty good and a fair proportion of their colleagues aren't (nothing unusual in that, but that may be the point) and would have no intrinsic objection to a scheme which rewarded them and not the other lead-swinging wasters who clutter up the staff room. However, they suspect - with a great deal of rational and practical evidence, some of it coming under the rubric "Ed Fucking Balls" - that what this would mean is a lot of jam for whoever got to teach the sixth form in the easier schools and a lot of kicks for whoever got to "teach" 3 Set 4 at Bash Street. Yes, you can try to mitigate that by looking at improvements in scores rather than scores per se, but even then, trying to do it in relation to individual teachers is deeply problematic because you are not each starting out with the same material. Nor, indeed, with material whose level - and hence level of improvement - can be accurately measured. Bear in mind that you don't have them sit their GCSEs and then two years later sit them again so you can see the improvement, it's not like comparing company profits year-on-year (which I appreciate from experience is, itself, not so simple).

    It can't be said enough, by the way, that the public sector is not the same as the private sector. It does different things, in different ways, for different reasons, and by and large it does the things it does because the private sector does not want to do them or does not do them very well. Of course there's a measure of crossover and of course we're talking about the same basic human material, but nevertheless I am deeply suspicious of attempts to compare the two directly and to hoick the motivations and practices of one into the other. There's not going to be a smooth fit and where there's not a smooth fit, what you get is a bodge.

    Oh yeah. I left my last job in the UK because of the stress involved in trying to deal with lazy and disinterested colleagues and managers whose main function involved evading their responsibilities. Now I don't know whether the university sector counts as public or private (I think of it as public, but Vice-Chancellors will tell you different) but I can't think of any bonus scheme that would have remedied the situation, not least because I can't see how performance would have been measured in order to enable such a scheme to operate. However, I personally think that even after four years of that I would have stayed in the job had I been allowed to select other members of staff to be dragged the length of the Fulham Palace Road behind a cart. It is possible that this may be the way forward.

    [* my mother was a teacher and I am marrying into a family of teachers - hence I'm not short of acquaintance in this field]

  53. "What's frankly insulting and not a little bit ridiculous is for someone to display such naked envy"

    Nice "gotcha", as you would say. There's no envy, just amazement of the bare faced gall of a City worker trying to lecture the rest of the world about how to incentivize and reward its staff. To use your words, bankers tend to go "mad, mental, crazy" when anyone suggests they're overpaid, poorly incentivized to deliver long term performance and appalling judges of risk.

    It's not just me who thinks the City compensation culture was/is a crock of shit. To quote Michael Lewis:

    "At the bottom of the modern financial markets are the incentives that people who manage money have been allowed to create for themselves by investors who continue to place too much faith in their own wisdom. Our allocators of capital, when they make huge sums of money, are allowed to keep a huge chunk of the winnings; if they lose a huge sum of money, they walk away debt free - and create another hedge fund."

    Surely you can appreciate the incongruity of a banker posting about teachers' pay? I'd rather hear what you've got to say about the market failures in the City, but of course, your lips are sealed. The Crooked Timber "I'm compromised, my hands are tied" post wasn't enough explanation for me and, I'm sure, other readers of yours.

  54. It occurred to me this morning, that given DD's long history of advocating antinomianism, that this is merely a roundabout way of giving children practical lessons in the virtues and applications of hypocrisy.

  55. mine about the surprising practical absence of such schemes given their apparent utility.

    hmmm, the UK has just introduced one, piecemeal and in what doesn't look like a particularly sensible manner. They're not totally unknown in teaching, but they tend to encounter ferocious pushback from the employees, and since those employees are typically well-unionised, articulate and political, one can see why they don't get introduced.

    If I had my druthers, I'd bring them in on a per-school basis; I'd give the headmaster a budget based simply on a headcount of staff and let him divide it up how he liked (subject only to the normal law of the land on misappropriation, discrimination and nepotism). I just don't believe that headmasters, or even university heads of department, are as corrupt and cronyist as seems to be suggested. Since the purpose of the scheme is to encourage improvement, not to reward intrinsic merit, I don't see any need to get into complicated value-added accounting schemes or indeed link the thing to exam results at all.

    just amazement of the bare faced gall of a City worker trying to lecture the rest of the world about how to incentivize and reward its staff.

    Tell the truth, Anonymous; during the bull market between 2001 and 2007, were you loudly praising the City and saying that people like me ought to have our advice sought out on any question under the sun? Or is this just perhaps a teeny little bit of ad hominem, tacked onto a pre-existing view that you obviously feel passionate about, although apparently not sufficiently so to make a single argument about on the merits?

    The Crooked Timber "I'm compromised, my hands are tied" post wasn't enough explanation for me and, I'm sure, other readers of yours.

    Thank you; I value your input.

  56. "During the bull market between 2001 and 2007, were you loudly praising the City and saying that people like me ought to have our advice sought out on any question under the sun?"

    Heh, hark at you. No, no I wasn't, I'm afraid, sorry to disappoint you. But your comment is a useful insight into your eagerness to proselytize on all and sundry.

    I've nothing against the City per se. Indeed, I'm entirely open to the idea that flexible and liquid capital markets are good things and the banking industry requires very bright people who will demand high salaries, for what is, I'm sure, a pretty stressful job.

    Is it an ad hominem? Not really, I just think the piece would have been more intellectually honest if it had been framed along the lines of "Lessons learnt from PRP in the City and how they can be applied to other industries". But, of course, you were never going to do that.

    And you are still yet to answer the basic question as to why it's OK for you to preach from the sidelines to other industries about how they incentivize their staff, when the industry you work in (and let's be honest, the wider global economy) is on its knees, in part, because banks didn't incentivize their staff properly. There's no escaping the dissonance of this.

    I'm sure I'll get a load of bluster or a "I value your input" style response, which will be nicely written, with perhaps even a witty bon mot, but I'd really rather have a straight answer, if that's OK?

  57. they tend to encounter ferocious pushback from the employees, and since those employees are typically well-unionised, articulate and political, one can see why they don't get introduced

    Well yes, that's the "sectional interests" argument: I think teachers find it annoying because it tends to neglect the possibility that they actually know what they're talking about on this one. (It really pisses people off to have it assumed that they're doing something purely out of desire to evade scrutiny or hard work or what you will when in fact, it is possibly knowledge and understanding of the problems involved which shapes their views.) It also neglects the reality that however well-unionised teachers may be (and having four different unions doesn't actually tend to help the workforce in this respect, to be honest) they do not in fact strike very often and there is no reason to think, given the number of unpopular-with-teachers initiatives that have actually been introduced, that this one is on the shelf because the unions wouldn't wear it.

    I'd give the headmaster a budget based simply on a headcount of staff and let him divide it up how he liked

    I very much doubt that headmasters and headmistresses would thank you for this, and not particularly because they're scared of offending their staff. It would cause them a great deal more trouble than it would be worth.

  58. I'd really rather have a straight answer, if that's OK?

    Make a point (you can follow the example of ejh and Cian, who are making points) and you'll get one. As it is, you're just a) claiming that I have no standing to write about the subject and b) complaining about my tone. To which the straight answers are a) "Yes I do, it's my blog" and b) see a). Further responses to your comments are unlikely until you comply with this polite request.

    I think teachers find it annoying because it tends to neglect the possibility that they actually know what they're talking about on this one

    I'm not neglecting this possibility; but (and you know how much I hate repeating myself) it's not actually what the evidence says. John Hattie's work doesn't suggest that PRP would be a cure-all (although his main contention is that pupil self-assessments of how much they're learning are the best measures of achievement, which is not particularly popular either), but it does say, in the view of the man himself, that it shouldn't be rejected out of hand. And that's based on a significant body of work that can't be brushed off. There has to be a fine line trodden between respecting the tacit knowledge of people in a field, and just taking at face value their rationalisations for existing practices.

    I have a similar relationship with the evidence-based medicine literature, in which field I tend toward the belief that things have swung too far in the direction of scientific management and away from practitioners' knowledge. But on this issue, I really don't think it's the managerial side that's being dogmatic.

    It would cause them a great deal more trouble than it would be worth

    Depends on what you consider a smallish but definitely measurable improvement in standards to be worth. This could be right or could be wrong, but frankly my view is that measuring and managing the performance of employees is the very essence of management, and if this is "too much trouble" for headteachers, that's actually pretty poor. It was suggested above that schools already have performance-related *promotion*, which presumably isn't entirely hassle-free.

  59. It's not that the judging employees bit would be too much trouble - it's that the subsequent resignations (not necessarily immediate, but pissing people off does tend to have this effect in the long run) would quite likely rather outweigh the retentions that the scheme is presumably supposed to assist. Now if you think it's just the bottom 10% who would be pissed off enough to leave, then fine, but I don't think it will - I think it'd be the middle people, who could find jobs elsewhere and who you do not, in fact, want to lose. It's precisely these people who tend to leave the profession as it is and it's for this reason (among others) that people with knowledge of the field are liable to think that PRP schemes are likely to make things worse.

    It was suggested above that schools already have performance-related *promotion*, which presumably isn't entirely hassle-free.

    Nope, but you've got a lot more scope in that field: you can appoint from outside, for instance. Besides, at the end of the day, everybody accepts that if there's a post to be filled then somebody has to fill it. Nobody accepts that there has to be a bonus scheme. This is a gigantic difference.

    a smallish but definitely measurable improvement in standards

    I'll allow myself a large dose of scepticism about that "definitely".

  60. I'll allow myself a large dose of scepticism about that "definitely".

    have you read the research paper linked above? If you have, fair enough, but scepticism on these things needs to be informed scepticism.

    On the subject of retention - is there any evidence that this has actually happened in Northern Ireland, where something like the system I suggested has apparently been in place since 2003? (It apparently had decent union support locally, possibly related to the fact that 80% of those applying for it got it. The NAS/UWT guy who was quoted in the report I read made the pretty decent point that pre- the Threshold scheme, classroom teachers tended to reach the top of the scale then top out, unable to get further salary increments without taking on administrative responsibilities which they didn't want.)

  61. "Make a point (you can follow the example of ejh and Cian, who are making points) and you'll get one."

    Great body swerve. I can only assume you are either embarrassed or your self awareness gene has gone on vacation. You don't strike me as the type to get embarrassed, so I'll assume the latter.

    I'd also really like to see the shrift the average banker would give to a school teacher who saw fit to comment on pay and conditions in the City, no matter how well informed. You'd need a big microscope to see it.

  62. Anonymous, you're not actually making a point, so I can't really respond to it. Nevertheless, I value your input, although at a somewhat diminishing rate with each repetition.

    EJH: further to the above, it's clear that they could have introduced the extra three "upper" payscale grades in Northern Ireland simply as additional increments to the salary scale and without a performance hurdle to reach them. Just to get clear here - you presumably think that this would have been a better thing to do?

  63. the dread plague o'er the land of children being given extra chances to put in their best performance on exams

    I maintain that this is a really, really bad idea, on straightforward is-it-a-measure-is-it-a-target-or-what grounds. You teach, you stop teaching, you test, how well have they done? It's a simple and universally comprehensible measure, which fundamentally doesn't need any dicking around with. (They'll only ask for improvements next year anyway, until you deliver them and they tell you testing standards are falling.) Apart from anything else, under a generalised test/retest regime teaching in between test and retest would be explicitly geared to What's On The Test, which is no kind of education in anything except passing tests.

    H'mph, I say. (But I make a point first.)

  64. Actually, the more I think about it, the only thing I can find wrong about the multiple test-entry idea is that it's apparently being done selectively. I'd be in favour of every kid being given two or three chances at the exams, best score wins. I suspect that this might not be practical in time or stress terms though.

    They'll only ask for improvements next year anyway, until you deliver them and they tell you testing standards are falling

    long term readers of my output will be aware that my enthusiasm for PRP comes in the context of a long history of giving short shrift to this (admittedly ubiquitous) kind of bollocks.

    Reading around the literature on Actually Existing PRP (and by the way, can we have some recognition that such a thing does exist please), I find an abstract of a paper suggesting that one effect of bringing in the Threshold regime is that it's mightily strengthened the relationship between teachers and their unions, as the teachers seek to get informed of their rights with respect to their PRP.

  65. I'd be in favour of every kid being given two or three chances at the exams, best score wins.

    Why? I'd be in favour of as much time as possible being spent teaching the subject and as little as possible teaching to the test. It's not as if saying you're going to teach the subject means there's no room for improvement in how you do it.

  66. Fair enough point, but given that teaching to the test is more or less what we're given to deal with (pending massive changes to the education system), I think a generous regime of retakes seems like the fairest way to compensate for this.

  67. City good, City bad. Yawn.

    I got the shotgun. You got the briefcase. It's all in the game, though, right?

    I say: bring 'em teachers into The Game!

  68. About this John Hattie paper... where's the link?

  69. They're not totally unknown in teaching, but they tend to encounter ferocious pushback from the employees, and since those employees are typically well-unionised, articulate and political, one can see why they don't get introduced.

    I second Justin's point. Teachers have been against many of the changes of the last 20 years (sometimes with good reason, sometimes not) and have failed to prevent any of them. They have also been in favour of a recent proposal to reform 6th form education, which had impressive academic support, and yet the government chose not to implement it. What is so unique about Performance Related Pay?

    I'd give the headmaster a budget based simply on a headcount of staff and let him divide it up how he liked (subject only to the normal law of the land on misappropriation, discrimination and nepotism). I just don't believe that headmasters, or even university heads of department, are as corrupt and cronyist as seems to be suggested. Since the purpose of the scheme is to encourage improvement, not to reward intrinsic merit, I don't see any need to get into complicated value-added accounting schemes or indeed link the thing to exam results at all.

    Well yes, why not indeed. Why not give headmasters more control over what they pay their teachers, and give schools with poorer pupils more money than ones that take middle class kids. If the headmaster gets it wrong, well he's the one ultimately responsible for the school and so long as there is a way to make him responsible (which there decidedly isn't in the University system), this could work. Difficult to get it right, and difficult to get the balance right between accountability, responsibility and measuring performance in a meaningful and lightweight fashion (currently we measure it in a fairly meaningless, and very heavyweight fashion - though it is starting to improve).

  70. Fair enough point, but given that teaching to the test is more or less what we're given to deal with (pending massive changes to the education system), I think a generous regime of retakes seems like the fairest way to compensate for this.

    But what's the point? All you've achieved is grade inflation. The kids aren't getting higher grades because they've learnt more, or are smarter. How does society gain? What exactly has this achieved? I might be misunderstanding your argument, but you seem to think that if GCSE grades are higher, this is a good thing regardless. Why?

  71. Well, it's a good thing for the kids. I just think that if we're going to use exams to measure educational achievement (and if you have a better way then great, but I might look at repurposing some arguments about "never been done before" from above), then best of three goes is a fairer way of doing the measurement than one-time-counts-for-all.

  72. (and the benefit to society is that it's not occasionally deprived of the talents of someone who was actually bright enough to pass the exam, but who happened to be having a shitty week the week the exams were on).

  73. Meanwhile, the promotion of rote-learning and generalised grade inflation costs society inasmuch as it loses the capacity to distinguish the bright from the bright-ish, and in particular the bright student who gets it from the efficient exam crammer.

  74. But if you're worried about grade inflation (I'm not, not all that much), you can deal with that by making the exam or the marking scheme harder, and in any case a couple of years out of school, the bright student who gets it will distinguish himself from the efficient crammer anyway, as long as he isn't denied a chance by, say, a bad hay fever or being dumped by his girlfriend or something.

  75. Here, another namealike for you

  76. (Ah, bugger, I see you have him already - it was below the bottom of my screen. I do apologise. As you were.)

  77. You seem to have fallen into the trap of trying to build a perfect system. All forms of education assessment are unfair. Exams are unfair on people who are unable to cram (which is largely useless in everyday life) and people who do not perform well under pressure; coursework addresses those problems, but is unfair on poorer kids (or those who can't buy an essay on the internet). There are always winners and losers when people tinker with the assessment system.

    (and the benefit to society is that it's not occasionally deprived of the talents of someone who was actually bright enough to pass the exam, but who happened to be having a shitty week the week the exams were on).

    You can't build a system that will be fair to everyone. It's impossible. A system that marginally decreases unfairness for some students (while at the same time giving other students higher grades than they deserve) at considerable cost (exams cost money) and resources (the time devoted to taking multiple exams, cramming, etc) and almost certainly reduces the time spent on actual education - seems like a poorly designed one. The best that you can do is balance fairness against resources. I don't think you've done a very good job here.
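
    The trade-off in that paragraph can be put in rough numbers. The sketch below is only an illustration: the 10% figures are assumptions rather than data, the attempts are assumed to be independent, and `pass_chance` is a hypothetical helper, not anything taken from an exam board.

```python
# Illustrative only: the probabilities below are assumptions, not data.

def pass_chance(single_try: float, attempts: int) -> float:
    """Chance of passing at least once in `attempts` independent tries."""
    return 1 - (1 - single_try) ** attempts

# Hypothetical student who deserves to pass but has a 10% chance of a bad week.
deserving_once = pass_chance(0.90, 1)
deserving_best_of_3 = pass_chance(0.90, 3)

# Hypothetical student who does not deserve to pass but has a 10% chance of a lucky paper.
lucky_once = pass_chance(0.10, 1)
lucky_best_of_3 = pass_chance(0.10, 3)

print(f"deserving pass rate:   {deserving_once:.3f} -> {deserving_best_of_3:.3f}")
print(f"undeserving pass rate: {lucky_once:.3f} -> {lucky_best_of_3:.3f}")
```

    Under these assumed figures, best-of-three cuts the deserving student's chance of missing out from 10% to 0.1%, and raises the undeserving student's chance of passing from 10% to about 27%. Which effect matters more depends on numbers nobody in this thread has supplied.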

    Anyway, there are far simpler solutions to this problem. One method is simply to move to a modular system with exams through the year. It may not be the optimum solution, but it has the benefit of being simpler, less stressful and cheaper; while also (at A-levels) solving a couple of other problems as well.

  78. "I'd be in favour of as much time as possible being spent teaching the subject and as little as possible teaching to the test."

    I've never really understood this objection. For the most part you should design tests to test the things you want students to be learning about a subject, right? So teaching 'to the test' will involve some fairly large amount of learning about the subject if you design the test well.

    Will the test be a perfect mirror of learning? Probably not. But will it be better than the current "how the hell can anyone possibly know if students are learning" approach currently advocated? I would strongly suspect, yes.

  79. I've never really understood this objection.

    Do you work in education?

  80. So teaching 'to the test' will involve some fairly large amount of learning about the subject if you design the test well.

    All we have to do is design the test well. Genius. Job done.

    I think what most irritates me about these kinds of discussions is the assumption by people with almost no experience that it is easy.

  81. I can't see anything wrong at all with generous retakes, and a lot right with it. It makes no sense to say: "If you didn't get it right the first time, you can't be similarly rewarded for getting it right the second or third time." The issue surely ought to be whether the kid has (finally) got it right or not, not how many goes it took. (And I write as someone who did well out of a certain animal cunning for exam-taking in a one-strike-and-you're-out regime.)

  82. What if the kid scores worse and worse every time, like that poor bastard in 334 - should the best grade count or the last one?

  83. cramming (which is largely useless in everyday life)

    I disagree. Cramming is the rapid reading, understanding and (short-term) memorization of facts. This is massively useful for many professionals: barristers, press secretaries, cabinet ministers, auditors, forensic accountants, consultants, stockbrokers, newsreaders, ghostwriters, middle managers. Anybody who ever has to digest a complex set of facts and then present them clearly. I think it's probably a useful skill for bloggers too.

  84. "Do you work in education?"

    No, but I've taken a lot of tests, in lots of disciplines, including the 'hard to test' disciplines like English Lit, and found that the tests really did reflect the learning in the class to a large degree.

    "All we have to do is design the test well. Genius. Job done.

    I think what most irritates me about these kinds of discussions is the assumption by people with almost no experience that it is easy."

    This kind of response would deserve a 'Fail' in a test.

    First you might want to note that I didn't say anything about it being 'easy'. It might not be 'easy'. But the entire history of schooling suggests that it is 'possible'. As d-squared suggests it is awfully convenient that teachers seem to think that the only thing they can't use tests to measure is whether or not they are succeeding.

    Look, if you believe that learning can't be tested, you are going against pretty much the entire actual history of education in the last 200 or so years. You also have to come up with a very interesting version of 'learning' to largely divorce it from even moderately well designed testing.

    If you want to argue that generalized testing isn't great at making fine distinctions between students (like the difference between the top 0.1% performer and the top 0.05% performer) I suspect you may be correct. If you want to argue that some people don't test super well so that they are testing somewhat below their actual learning level, I'm willing to agree with that too.

    But you can't cling to either for the purposes of the discussion. Testing is perfectly adequate to detect the differences in learning that we want out of the average student on a year to year basis. And for basic literacy and math skills, teaching the test is closely tied to teaching the skills themselves. For broader analysis, testing may not be *as tightly* correlated to the learning. A careful reader might note that is not the same as saying *poorly* or *not* correlated. But we aren't even there.

    We aren't at the point where all the basic, easily testable skills are being mastered by nearly all the students. When we get there I'd be thrilled to revisit the utility of testing at the highest levels of learning. I'm highly skeptical of the idea that we just can't test and track the acquisition of basic skills over time. That strikes me as a self-serving delusion that teachers don't even really believe. If they did, they wouldn't test students so often.

  85. if you believe that learning can't be tested

    Strawman.

    We aren't at the point where all the basic, easily testable skills are being mastered by nearly all the students.

    Red herring.

    Getting back to your previous comment, the reason neither Cian nor I took it seriously is that it's trivially true: a test which doesn't test what you're trying to teach is a bad test. However, that doesn't address the difference between teaching the subject and teaching to the test.

    Here's an illustration, from a field fairly close to mine. Teaching the subject: teach the law on advertising in historical perspective, discussing the difference between an 'advertisement' and an 'invitation to treat', and spending some time on the Carbolic Smoke Ball case. Test this knowledge by setting a hypothetical case in which an advertiser made promises they didn't intend to keep, and asking how they would decide it and why.

    Teaching to the test: tell them all about Carbolic Smoke Ball - and when you've finished telling them about Carbolic Smoke Ball, tell them about it again.

    Both methods will get good marks on the test, but the second will get a higher mean and a narrower spread - and hence be less valid as a measure of how much people have actually learnt.
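
    A toy version of that last claim, with made-up numbers: the understanding levels and both marking models below are assumptions chosen for illustration, not measurements of any real class.

```python
from statistics import mean, pstdev

# Hypothetical understanding of the subject for five students, on a 0-1 scale.
understanding = [0.2, 0.4, 0.5, 0.7, 0.9]

# Assumed marking model when the subject is taught: marks track understanding closely.
subject_marks = [40 + 60 * u for u in understanding]

# Assumed marking model after drilling the one tested case: everyone banks
# most of the marks, and understanding only moves the last few.
drilled_marks = [80 + 20 * u for u in understanding]

for label, marks in (("subject", subject_marks), ("drilled", drilled_marks)):
    print(f"{label}: mean {mean(marks):.1f}, spread {pstdev(marks):.1f}")
```

    In this toy model the gap between the weakest and strongest student shrinks from 42 marks to 14 while the mean goes up, so any fixed amount of marking noise blurs the differences in understanding that much more.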

  86. "if you believe that learning can't be tested

    Strawman.

    We aren't at the point where all the basic, easily testable skills are being mastered by nearly all the students.

    Red herring."

    How is that a red herring? Is it in fact the case that the easily testable skills are being mastered by nearly all of the students? Isn't the fact that such learning does not appear to be taking place an enormous part of the reason why we are having the discussion in the first place?

    As for your example, please note that in both cases the learning is testable.

    This seems to reinforce my point that good learning is in fact testable. Your objection doesn't appear to be the "it isn't measurable" objection used by teachers unions in the US. You seem to have some other objection. That is progress of some sort, as many seem to believe that it just isn't testable at all (that isn't a strawman, despite the fact that the view is IMO ridiculous).

    To step ahead in the argument I will speculate about your actual objection. If I'm wrong, please tell me.

    I suspect you object that while things are testable, creating the test provides an incentive to teach the test instead of teaching the course. This is the teacher-side of the common objection that many students try to learn the test instead of learning the text. Teachers historically deal with it by making the test broader and mixing it up a bit. That type of solution seems available in the test design phase (see for example, your first example...)

    Further I think it makes the mistake of comparing only the negative possible effects of the test creation. It is important to recognize the negative possible effects of introducing such a test. But the objection would be much stronger if we were in a situation where we believed that most of the students in question were currently learning the subject matter the way you want them to learn it.

    But in actuality, a lot of them are either already just learning the tests that they do take (so introducing a test of the teachers doesn't change their position) or aren't EVEN learning enough for the test (in which case their position is improved).

    Now you can, if you want, argue that the number of students already just learning for their tests plus the number who aren't learning isn't a big number, or isn't a large enough number compared to the vast number who are learning the way you want to be worth risking the negative side of testing procedure.

    But I'd like you to affirmatively say things like that rather than just focus on an idealized situation.

  87. How is that a red herring? Is it in fact the case that the easily testable skills are being mastered by nearly all of the students?

    No, it is in fact the case that we haven't been talking about testing basic numeracy and literacy skills up until the point when you decided your argument would be stronger if we were. Hence, red herring.

    Now you can, if you want, argue that the number of students already just learning for their tests plus the number who aren't learning isn't a big number, or isn't a large enough number compared to the vast number who are learning the way you want to be worth risking the negative side of testing procedure.

    I have no idea what, specifically, you're talking about, and I suspect you don't either.

    Rewind. We have tests and no one is saying we shouldn't have tests; the option of not having any tests is not on the table. My point - the specific point you objected to - was that introducing more tests, and specifically introducing repeat tests, was likely to result in more teaching-to-the-test and less teaching-the-subject, and that this was a bad thing. I don't really see anything in your comment that counters either of these points, except perhaps your suggestion that lots of kids probably aren't learning-the-subject as it is - but in that case presumably what we need is more teaching-the-subject, not less.

  88. Two more. Introducing more tests means less actual time for teaching (because tests take up time). Introducing more tests means fewer actual resources for teaching (tests cost money to take, and also absorb teaching time in extra admin). Repeat tests mean fewer resources and less money for teaching.

  89. First you might want to note that I didn't say anything about it being 'easy'. It might not be 'easy'.

    I'm an engineer, I build things (in a fluffy human factors/industrial designer kind of way via academia - but the point stands). When people tell me that all we have to do is to build a good solution my instinct is to punch them hard. Designing "good" tests is very difficult, and there's a huge literature on the various problems, compromises. There's an additional literature on the problem of non-transferable skills - that is students who can pass tests, but are unable to use those skills in everyday life (which some would argue is the purpose of education); and how we can better design tests to address this problem. There is no silver bullet; implementation is hard.

    But the entire history of schooling suggests that it is 'possible'.

    And this is the point where I would fail your essay. How do you know this? Possible for who? Everyone? The top 15%? How do you know what contribution testing made to education, or what the various outcomes of this education were? By what criteria have you decided that education is succeeding? What is it succeeding at? Clearly your education didn't teach you how to think very critically, but then that is a hard thing to test for in an exam.
    Does the fact that schools push students successfully through exams mean the kids are learning? Discuss. Maybe it does, but it's hardly the truism you're suggesting. It needs, what's the word, an argument.

    As d-squared suggests it is awfully convenient that teachers seem to think that the only thing they can't use tests to measure is whether or not they are succeeding.

    I don't see why. It's pretty hard to use tests, statistics to measure whether most things are succeeding when they involve people. But I forgot, you're a theoretician rather than a doer. It's hard to find causation in complex social environments - and extremely difficult to find it with any granularity at the individual level, particularly when those individuals have agency. This is true of teachers just as it is of doctors, policemen, etc. You can do quite a lot with tests and statistics, and nobody is suggesting that they should be done away with. But the idea that you can use them as a means to "grade" teachers is a fantasy.
    I have no problem with measurements, or testing. I just wish that people would use them acknowledging their error rate, and the distortions that are introduced into any system when you start measuring it.

    Look, if you believe that learning can't be tested, you are going against pretty much the entire actual history of education in the last 200 or so years. You also have to come up with a very interesting version of 'learning' to largely divorce it from even moderately well designed testing.

    Another fail, unless we're excluding informal testing. It's largely how learning has happened for centuries. Still is in some areas. Some would argue that testing is an artefact of increased bureaucratisation and the increasing abstraction of education from practical skills. Not sure I'd necessarily agree with them, but the argument has been made pretty strongly.
    Oh and a fail for confusing informal and formal testing. There's probably a few more fails to come (there are quite a few categories of tests).

    Testing is perfectly adequate to detect the differences in learning that we want out of the average student on a year to year basis.
    This strong and convincing argument is based upon?

    And for basic literacy and math skills, teaching the test is closely tied to teaching the skills themselves.
    Actually they're not. It's a big problem, students who can pass the tests are unable to use the skills in everyday life, or on real problems. And not infrequently the inverse (particularly true for maths). These are the easy things to test, you say?

    We aren't at the point where all the basic, easily testable skills are being mastered by nearly all the students.

    So let's get this right. Your argument is that students are failing 'easy' tests, so we need (more?) tests?

    When we get there I'd be thrilled to revist the utility of testing at the highest levels of learning.

    What relevance does the failure of the bottom X% of students have to the education of the top X%? And actually we were talking about GCSEs, and I'm guessing those students retaking are around the C/D level, which means that they have the basic skills. So basically you don't know what you're talking about. Which would be fine if you were a little more polite.

    I'm highly skeptical of the idea that we just can't test and track the acquisition of basic skills over time. That strikes me as a self-serving delusion that teachers don't even really believe. If they did, they wouldn't test students so often.

    Why are you telling us this? Nobody has made this argument.

    I suspect you object that while things are testable, creating the test provides an incentive to teach the test instead of teaching the course. This is the teacher-side of the common objection that many students try to learn the test instead of learning the text. Teachers historically deal with it by making the test broader and mixing it up a bit. That type of solution seems available in the test design phase (see for example, your first example...)

    Teachers aren't responsible for the design of GCSEs. Exam boards are.

    Further I think it makes the mistake of comparing only the negative possible effects of the test creation. It is important to recognize the negative possible effects of introducing such a test. But the objection would be much stronger if we were in a situation where we believed that most of the students in question were currently learning the subject matter the way you want them to learn it.

    Given that we've had formal testing at the 16+ level in the UK for at least 60 years, the relevance of this point escapes me.

  90. "Designing "good" tests is very difficult, and there's a huge literature on the various problems, compromises. "

    Yes, I'm well aware. Which is not the same as saying we shouldn't do it, right?

    "There's an additional literature on the problem of non-transferable skills - that is students who can pass tests, but are unable to use those skills in everyday life (which some would argue is the purpose of education); and how we can better design tests to address this problem. There is no silver bullet; implementation is hard."

    Which, again, is not the same as saying that we shouldn't do it. Because you are aware that teachers, through their unions, are pretty much saying that we shouldn't do it, right? If you want to say that the problem is difficult and requires a lot of time and thought, I'm right there with you.

    "I don't see why. It's pretty hard to use tests, statistics to measure whether most things are succeeding when they involve people. But I forgot, you're a theoretician rather than a doer. It's hard to find causation in complex social environments - and extremely difficult to find it with any granularity at the individual level, particularly when those individuals have agency."

    You're getting deeply muddled here. I claim that teachers use tests to measure the learning in their students all the time. Students are people. So comments like "It's pretty hard to use tests, statistics to measure whether most things are succeeding when they involve people." as an explanation for why we can't judge teachers (who are people) but can judge students (who are people) can't be quite right. You then veer into the causation issue (maybe we can't tell if the outcome is due to teaching). That is of course possible in a one time test with one student. But you can track students over years. You can track multiple students. If you don't think you can tease generally useful trends out of that, I suspect your knowledge of testing and statistics isn't as good as you are representing on the internet.


    "Testing is perfectly adequate to detect the differences in learning that we want out of the average student on a year to year basis.
    This strong and convincing argument is based upon?"

    It is based on the fact that one of the main legitimate complaints about testing is that it isn't great on super-fine levels of discrimination especially at the high levels. This is one of those facts that you would expect someone knowledgeable in the subject to know. There is all sorts of literature on the fact that the SAT for example is a pretty good predictor of college success at the middle levels (it discriminates well between the 30% and 40% student, the 60% and 70%, the 70% and 80%, but not the 96% and 97%, nor the 15% and 20% level test takers.) This is a legitimate criticism of such tests so far as it goes. But as I said, it doesn't go very far when you want to know how most of the students are doing since by definition most of the students aren't in the tail-end extremes.

    "And for basic literacy and math skills, teaching the test is closely tied to teaching the skills themselves.

    Actually they're not. It's a big problem, students who can pass the tests are unable to use the skills in everyday life, or on real problems. And not infrequently the inverse (particularly true for maths). These are the easy things to test, you say?"

    Now I'm beginning to think you are arguing in bad faith. Literacy is the skill of being able to read and comprehend the reading. That is very testable at the middle range of testing that we are talking about. And you are going to have to be specific about the types of 'real life' math skills for non-engineers that you think can't be duplicated in a test. You can have a test about making change, sticking to a budget, or figuring out how much interest you get after a year. I've seen them.

    "We aren't at the point where all the basic, easily testable skills are being mastered by nearly all the students.

    So let's get this right. Your argument is that students are failing 'easy' tests, so we need (more?) tests?"

    Huh? I'm arguing that we have tests that we currently believe test student learning to some level of precision (not perfect but welcome to the world). Many students aren't performing adequately on these tests. The fact that they aren't performing adequately suggests a failure on the part of the student to learn, a failure on the part of the teacher to teach, or some combination of both. A one time test with just one student wouldn't be able to discriminate between those. A number of tests over time, with many students as they pass through teachers, might show patterns. Patterns like: "Teacher A has students who on average enter with low skills and exit with higher ones while Teacher B has students who generally don't improve over a year's stay". Or: "Teacher C has students who improve in math but nothing else, perhaps we should encourage him to change techniques in non-math areas but leave him alone in math". Or: "Teacher D seems to be really good with students who already have many skills, but students who are a little behind fall further behind". If we try to use it to say that Teacher A is exactly 1.2% better than Teacher B, I agree that we are asking more of the test than its precision allows. But I would be surprised if the broad trends aren't reflected in even a test with only fair design.
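
    As a rough sketch of what looking for such patterns could involve: the records below are invented purely for illustration, `average_gains` is a hypothetical helper, and nothing here controls for the other factors that affect students.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (teacher, subject, score entering the year, score leaving it).
records = [
    ("A", "math", 40, 62), ("A", "reading", 42, 60), ("A", "math", 35, 58),
    ("B", "math", 55, 56), ("B", "reading", 50, 49), ("B", "math", 60, 61),
    ("C", "math", 45, 66), ("C", "reading", 48, 49), ("C", "math", 50, 70),
]

def average_gains(rows):
    """Average (exit - entry) score per (teacher, subject) pair."""
    gains = defaultdict(list)
    for teacher, subject, entry, exit_score in rows:
        gains[(teacher, subject)].append(exit_score - entry)
    return {key: mean(values) for key, values in gains.items()}

gains = average_gains(records)
for (teacher, subject), gain in sorted(gains.items()):
    print(f"Teacher {teacher}, {subject}: average gain {gain:+.1f}")
```

    With these made-up figures, Teacher A's students gain in both subjects, Teacher B's barely move, and Teacher C's gain in math only. A handful of students per teacher is far too few to support fine comparisons, so this only shows the kind of broad trend being described.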

    So given my actual argument, on to your objections. It may be possible to tweak the tests we are already giving to get that outcome. In fact I would strongly suspect it is likely as the general trend of students learning under a particular teacher is exactly what we want to know about. No 'further' tests required. No extra time taken away from the classroom that isn't already taken away from the classroom.

    "I'm highly skeptical of the idea that we just can't test and track the acquisition of basic skills over time. That strikes me as a self-serving delusion that teachers don't even really believe. If they did, they wouldn't test students so often.

    Why are you telling us this? Nobody has made this argument."

    Umm, you kind of have. The measure of a teacher's effectiveness on average is his ability to have students leave with more learning than they came in with. You argue that is untestable or nearly impossible to test. But if you think you can test student learning, you can also track that over time, from teacher to teacher. Individually, it wouldn't be wise to judge a teacher on a single student's performance. Maybe that one student had a bad year. Maybe that one student doesn't want to learn. But trends are likely to show up over time. If one student doesn't want to learn, he won't learn much under any teacher. The students won't all have bad years all the time under one teacher.

    So either you believe that students can't really be tested for performance well (and some people really believe that) or you think there is some problem with analyzing the data for trends like that. You have argued on the 'testing sucks' side of things. But if you think that testing for students is generally possible, you are barking up the wrong tree from a testing sucks point of view in this argument. You ought to be arguing about why student performance on average over a number of years shouldn't reflect on a teacher.

    "Teachers aren't responsible for the design of GCSEs. Exam boards are."

    So what? First, they have plenty of input into that through various means. Second, that has nothing to do with the fact that they use tests to measure students all the time. This suggests that the difficulty isn't in the impossibility of using tests to measure student performance.

    "Further I think it makes the mistake of comparing only the negative possible effects of the test creation. It is important to recognize the negative possible effects of introducing such a test. But the objection would be much stronger if we were in a situation where we believed that most of the students in question were currently learning the subject matter the way you want them to learn it.

    Given that we've had formal testing at the 16+ level in the UK for at least 60 years, the relevance of this point escapes me."

    I think you aren't following the thread of the conversation. It is rather long, so I guess that is understandable. I've seen a number of arguments employed which suggest that introducing testing which applies to teachers could have certain negative effects. The part you quote is where I am attempting to say that such an objection would have more force if we believed that there was already adequate success in students acquiring the skills we want them to acquire. And maybe you believe that generally students are doing just fine. That is a perfectly acceptable argument, though not one that anyone seems to be employing here.

    But if you agree that they are not, on average, learning enough, the objection that teaching to the test will become prevalent holds less force, as current students (under this view) aren't learning the test, nor are they learning in an adequate way which just happens to escape the test.

    I thought we were arguing with the presupposition that students aren't doing adequately now.

    A) They aren't performing well enough on the tests we already ask them to take.

    B) They aren't performing poorly on the tests while mastering the subjects anyway, such that we shouldn't worry about it.

    If you believe that they are performing adequately on the tests, or that they aren't performing well on the tests but that they are learning adequately anyway, please just say so. That would represent totally different objections from the ones I thought we were talking about.

  91. I thought we were arguing with the presupposition that students aren't doing adequately now.

    I must confess that I don't understand why Sebastian is claiming this to be an obvious factual truth; as far as I can see it's not.

  92. "I thought we were arguing with the presupposition that students aren't doing adequately now.

    I must confess that I don't understand why Sebastian is claiming this to be an obvious factual truth; as far as I can see it's not."

    I'm not claiming it as an obvious factual truth. I thought it was one of the underlying, unstated assumptions and I wanted to raise it explicitly to make sure that it was.

    It obviously shapes the discussion. If you believe that in general most students are learning most of what we want them to learn, arguments in the "this will rock the boat" vein have lots of force. If you believe that they generally aren't "this will rock the boat" has force (as rocking the boat doesn't have to make even bad things better) but much less than the case where everything is going swimmingly.

    Now I took it to be an unstated and generally agreed assumption because I would expect the argument to go something like this if it weren't: "We already have good teaching. Trying to measure teacher performance is likely to screw up the already existing good teaching. Therefore we should be cautious about doing it". Most of the arguments didn't seem to be like that. Most of them seemed to be more like "Testing for this is impossible or nearly impossible, therefore why bother". But there have been recent hints about the first argument, so I thought I would see if that was what Phil and cian are saying.

    Interestingly I'm much more agnostic about whether or not adequate learning is actually taking place than I am about the alleged difficulty of tying it to teachers.

  93. I thought I would see if that was what Phil and cian are saying

    What Phil was saying wasn't either of those things, but simply that there's a difference between teaching the subject and teaching to the test, and that more time doing B would mean less time doing A. Your response to that was "I've never really understood this objection." I hope I've clarified it now.

  94. Note to self: Do not post when running a fever, can make you cranky and boring.

    Sebastian, I was simply trying to make the point that formal testing (not the same as informal testing) is an art rather than a science, and all testing is flawed. That's not a reason not to use tests, but a reason to be cautious about how one uses the results. There's also a common misconception (which Daniel seems to have) that the purpose of education is to produce exam certificates. Well, in some ways it is, but at a wider social and economic level we care less about whether somebody got an A or a C than whether they can do the job. There's a terrible temptation for politicians to focus on test results because they're easily measurable, rather than the quality of education, which is a vague unquantifiable thing that nobody can really agree on. And when you start focusing on test results, you start focusing on improving test scores. Well, there are some easy ways to get there, but do we really want a generation educated in crammers?

    As for using student test results as a proxy for teacher performance, given the vast range of other factors that also affect student performance... I think we should rate teachers, but we should be realistic about how accurately we can do so, its limitations, what we can achieve and what the costs of doing so are likely to be. I'm making a pragmatic and practical argument, rather than a deeply philosophical one. The idea that you could somehow measure, with high accuracy and objectively, individual performance of teachers strikes me as improbable, but clearly I'm not the stats god that you are.

    Anyway, I don't really understand the arguments that you're trying to make and I think you've read a lot into my and Phil's comments that wasn't there. I get the impression that you're American, so bear in mind that the educational landscape in the UK is completely different and the arguments are different.
