Well, it’s Final Exam week for me, so tests have naturally been on my mind. In teaching undergrads (especially pre-med students), I’ve gotten many anxious questions about exam grading and how the exams might be curved. While I have no control over how grades are curved or scaled for my students, I’ve been thinking about the many different ways to accomplish this task, both traditional ones and some more off-the-wall ideas of my own.

First, the traditional ones:

**What You See is What You Get**

The scale is set ahead of time, and the percentage of the points that you get is your score, plain and simple, no haggling. While this seems to leave no wiggle room for the grader, the grader will no doubt consciously or unconsciously scale the grades as he goes along, particularly if there’s partial credit to be awarded (and there almost always is). In addition, the grader may find himself choosing the letter grade that a solution deserves and then adjusting the points to fit that grade.

**Normalizing to Unity**

Take the highest score and add the points needed to bring it to 100%. Then add the same number of points to everyone else’s score. Strictly speaking, a true normalization would divide every score by the highest score to get the new percentage, but adding points is more beneficial to the students and is what is usually done. The good thing about this is that it has some ring of fairness to it: students can’t complain about unfairness, because at least one student got an A.
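Here is a minimal Python sketch of both versions, to make the difference concrete. The function name `normalize_to_top` and its `target` and `additive` parameters are illustrative choices, not an established grading API; passing `target=99` gives the “No One’s Perfect” variant described further down.

```python
def normalize_to_top(scores, target=100.0, additive=True):
    """Curve scores so the highest one lands exactly on `target` (percent)."""
    top = max(scores)
    if additive:
        # Add the same number of points to everyone.
        shift = target - top
        return [s + shift for s in scores]
    # True normalization: multiply every score by the same factor.
    factor = target / top
    return [s * factor for s in scores]


scores = [80, 64, 52, 40]
print(normalize_to_top(scores))                  # [100.0, 84.0, 72.0, 60.0]
print(normalize_to_top(scores, additive=False))  # [100.0, 80.0, 65.0, 50.0]
```

On the sample scores, every student below the top does at least as well under the additive version as under scaling, which is why adding points is the popular choice.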

However, this approach can lead to civil unrest in the case of a “curve-breaker,” a student who scores significantly higher than the rest of the class, thus reducing the benefit of the curve. There are ways around this: either curve the outlier’s score above 100, or, if that offends you, just give that student less of a curve than anyone else. If they complain about getting “only” 100%, then tough noogies.

**Normalizing the Average**

Instead of rescaling the top score, rescale the average score to whatever you’ve decided the average grade to be, usually a B or a C. This can again be accomplished by either addition or multiplication, and faces the same potential problem of putting students above 100%. Also, what if the students score better than expected? Are you really going to scale them down to a C?
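A similar sketch for rescaling the average, again with hypothetical names. The target average of 75% is an arbitrary stand-in for “a B” or “a C”, and the optional `cap` shows one possible way of handling curved scores that land above 100%.

```python
def normalize_to_average(scores, target_mean=75.0, additive=True, cap=None):
    """Curve scores so the class average lands on `target_mean` (percent)."""
    avg = sum(scores) / len(scores)
    if additive:
        curved = [s + (target_mean - avg) for s in scores]
    else:
        curved = [s * target_mean / avg for s in scores]
    if cap is not None:
        curved = [min(c, cap) for c in curved]  # clip anything above the cap
    return curved


print(normalize_to_average([98, 60, 52]))             # [103.0, 65.0, 57.0]
print(normalize_to_average([98, 60, 52], cap=100.0))  # [100.0, 65.0, 57.0]
print(normalize_to_average([95, 85, 75]))             # [85.0, 75.0, 65.0]
```

The last line is the awkward case: a class that averaged 85% gets curved down by 10 points.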

**The Good ’Ol Bell Curve**

Find the average and the standard deviation. Everybody within one sigma of the average gets a C. Between one and two sigma above or below is a B or a D, and beyond two sigma is an A or an F, as appropriate. I don’t know if this is really done much, and I always thought it was unfair, since I was never in a class where the Central Limit Theorem really applied. However, in a huge state school like mine, scores might actually approach a normal distribution in the large lecture classes.

But do we really want to give 68% of our students a grade of C? I don’t think C means average to most people anymore, so if you were to use this one, you probably would want to make the average correspond to a different grade.
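A sketch of the one-sigma/two-sigma scheme in Python, assuming the population standard deviation (`statistics.pstdev`) and that a score sitting exactly on a boundary gets the more central grade; both are my choices, not part of any standard. The helper `fraction_within` uses the normal CDF to show where the 68% comes from.

```python
from math import erf, sqrt
from statistics import mean, pstdev


def bell_curve_grades(scores):
    """|z| <= 1 is a C, 1 < |z| <= 2 is a B or D, beyond two sigma is an A or F."""
    mu = mean(scores)
    sigma = pstdev(scores)  # population standard deviation (an assumption)
    grades = []
    for s in scores:
        z = (s - mu) / sigma if sigma > 0 else 0.0
        if z > 2:
            grades.append("A")
        elif z > 1:
            grades.append("B")
        elif z >= -1:
            grades.append("C")
        elif z >= -2:
            grades.append("D")
        else:
            grades.append("F")
    return grades


def fraction_within(k):
    """Share of a normal distribution lying within k sigma of the mean."""
    return erf(k / sqrt(2))


scores = [40, 60, 60, 60, 60, 60, 60, 60, 60, 80]
print(bell_curve_grades(scores))
```

`fraction_within(1)` is about 0.683 and `fraction_within(2)` about 0.954, so in a truly normal class roughly 68% get a C, about 27% get a B or a D, and under 5% get an A or an F.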

And now, for some more creative ideas….

**Take the Square Root**

Take the score as a fraction between 0 and 1, then take the square root. This will shift everyone up, but some more than others. Students near the extremes of 0 and 1 get the least benefit, and students at 0.25 get the greatest benefit (their score jumps to 0.5). However, if you write the test such that everyone gets above 50%, then you are squishing the scores together closer to the top. I’m not sure what tangible benefit this would have, other than helping out the poorer students.
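A quick numerical check of the square-root curve. The boost a student gets is √x − x, whose derivative 1/(2√x) − 1 vanishes at x = 1/4, so the biggest bump goes to a raw score of 25%. The names `sqrt_curve` and `boost` are just illustrative.

```python
from math import sqrt


def sqrt_curve(fraction):
    """Curve a score given as a fraction in [0, 1] by taking its square root."""
    return sqrt(fraction)


def boost(fraction):
    """How much the square-root curve adds to a raw fraction."""
    return sqrt_curve(fraction) - fraction


# Scan raw scores 0%, 1%, ..., 100% and find where the boost peaks.
best = max(range(101), key=lambda pct: boost(pct / 100))
print(best, round(boost(best / 100), 3))  # 25 0.25
```

A raw 50% becomes about 71% and a raw 81% becomes 90%, which is the squishing toward the top described above.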

**Weight Your Own Grade**

Let the students put a star next to the question that they are most confident about, and count that question double, renormalizing appropriately. In most things, it’s good to get the right answer, but it’s worth a lot more to get the right answer and know for certain that it is right. Otherwise, you’re just guessing and asking someone else to check your work, which is significantly less valuable.

This scheme rewards the best students, the ones who arrive at the correct answer confidently. You can also make the weighting optional, so that less confident students don’t have to star any of the questions.

There are other possible variations of this: you can have the students rank their confidence in all the questions on a scale and weight appropriately, you can allow them to star more than one, etc.
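A minimal Python sketch of starring as a weight, assuming per-question point totals and a weight of 2 for a starred question (1 otherwise); `weighted_score` is a hypothetical helper. Dividing by the weighted point total is the “renormalizing appropriately” step, and a confidence scale is just a different list of weights.

```python
def weighted_score(earned, possible, weights=None):
    """Percentage score with a weight per question (1 = normal, 2 = starred)."""
    if weights is None:
        weights = [1] * len(possible)
    got = sum(w * e for w, e in zip(weights, earned))
    total = sum(w * p for w, p in zip(weights, possible))
    return 100 * got / total  # renormalize by the weighted point total


earned = [10, 5, 8, 7]  # points earned on four 10-point questions
possible = [10, 10, 10, 10]
print(weighted_score(earned, possible))                # 75.0
print(weighted_score(earned, possible, [2, 1, 1, 1]))  # 80.0 (starred a perfect answer)
print(weighted_score(earned, possible, [1, 2, 1, 1]))  # 70.0 (starred a shaky one)
```

Starring the question you aced raises a 75% to 80%, while starring the one you half-botched drops it to 70%, so misplaced confidence costs you.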

**Partial-Credit Multiple Choice**

Many multiple choice tests have some wrong answers that are plausible (off by a small factor, for example), and others that show that the student has no understanding of what he or she is doing (saying that the Mississippi River is 2 inches long). It should be fairly simple to devise a method for giving partial credit for reasonable but wrong answers, and it might make multiple choice exams more viable in higher levels of education.
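One simple way to sketch this is to give every option its own credit in the answer key. The key below, including the half credit for a near miss, is invented purely for illustration.

```python
# Hypothetical answer key: the credit each option earns on each question.
ANSWER_KEY = {
    1: {"A": 1.0, "B": 0.5, "C": 0.0, "D": 0.0},   # B is off by a small factor
    2: {"A": 0.0, "B": 0.0, "C": 1.0, "D": 0.25},  # D is a reasonable slip
}


def partial_credit_score(responses, key):
    """Percentage score; blank or unknown answers earn zero credit."""
    earned = sum(key[q].get(responses.get(q), 0.0) for q in key)
    return 100 * earned / len(key)


print(partial_credit_score({1: "B", 2: "C"}, ANSWER_KEY))  # 75.0
```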

**“No One’s Perfect”**

Normalize to 99%, even if someone would otherwise have a perfect score. I actually had a professor in undergrad who did this; he just shrugged and said, “Hey, no one’s perfect.”

**Oral Exam to Earn Back Your Points**

The same professor also had an oral part to our exams, where we had to come into his office, having studied the problems that we got wrong on the exam, and he would ask a question about one of them. We had to answer on the chalkboard, and he would give back points as appropriate.

**The 0% Gamble**

A multiple-choice final, graded normally. But if you get EVERY question WRONG, you get an A for the entire semester. How many students would take this gambit? Would it really be as difficult as it sounds? The world may never know…

**The Time-Saver**

Give everyone a B+. See if anyone notices or complains. Try the same thing with an A+. Compare notes.

**The Stair Method**

Take the box of papers to the top of a stairwell, and toss them in the air. The papers that land at the top are A’s, the papers at the bottom are F’s, and scale the ones in between appropriately.

Good luck to everyone on exams! And to my fellow TAs, have fun grading. If you have any other good (or interesting) ideas about how to grade tests, leave them in the comments.

For more thoughts on grading, see here and here.

UPDATE:

Another opinion on curving grades here.

It is problematic that under the standard deviation method 68% of the grades will be C, but part of that comes from the very wide band for C. Assuming, for instance, grades with a mean of 60 and an SD of 10, C students would get grades between 50 and 70, while D students would be between 40 and 50 and B students between 70 and 80. So not only do most students cluster around 60, but the window for C is twice as big as the window for B or D.

Perhaps a better solution would be to have C students be between 0.5 SD below and 0.5 SD above the mean, B students 0.5 to 1.5 SD above the mean, D students 0.5 to 1.5 SD below the mean, etc.
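To see how much this narrows the C band, here is a short Python calculation of the share of a normal distribution in each band, assuming A is everything above +1.5 SD and F everything below −1.5 SD (my reading of the “etc.”); `BANDS` and `band_shares` are illustrative names.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(z / sqrt(2)))


# One-sigma-wide bands centered on the mean (bounds in SD units).
BANDS = {
    "A": (1.5, float("inf")),
    "B": (0.5, 1.5),
    "C": (-0.5, 0.5),
    "D": (-1.5, -0.5),
    "F": (float("-inf"), -1.5),
}


def band_shares(bands):
    """Fraction of a normal distribution falling inside each band."""
    return {g: normal_cdf(hi) - normal_cdf(lo) for g, (lo, hi) in bands.items()}


for grade, share in band_shares(BANDS).items():
    print(f"{grade}: {share:.1%}")
```

That works out to roughly 38% C, 24% each for B and D, and 7% each for A and F, compared with 68% C under the one-sigma bands.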

Another option would be quintiles: give the bottom 20% an F, the next 20% a D, and so on.
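Quintile grading only needs a ranking. A Python sketch, with the hypothetical helper `quintile_grades`, that assumes ties are broken by the order in which scores are listed (a real policy would want to handle ties more carefully):

```python
def quintile_grades(scores):
    """Bottom 20% get an F, the next 20% a D, ..., the top 20% an A."""
    letters = "FDCBA"
    n = len(scores)
    # Indices of the scores from lowest to highest; ties keep their listed order.
    order = sorted(range(n), key=lambda i: scores[i])
    grades = [""] * n
    for rank, i in enumerate(order):
        grades[i] = letters[rank * 5 // n]
    return grades


print(quintile_grades([55, 90, 72, 38, 81]))  # ['D', 'A', 'C', 'F', 'B']
```

When the class size is not a multiple of five, `rank * 5 // n` still splits it into five groups that are as close to 20% as possible.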

Ha! Nice post. Writes he who so far has avoided grading.

Blaise raises the obvious point about normalizing with the normal. Of course, you could set the cut-offs to be anywhere along the curve. And the square root idea is really just doing this, but in an obscure way.

The national exams in England and Wales used to be graded like this. At that level I think it might make sense, but for a single class perhaps less so, as there will be variation between classes (e.g. one year happens to be particularly stupid).

I’ve used a few of those. Most often I will add enough points to bring the top grade up to 95% if it wasn’t already. If some of the low grades are very low I will apply a linear function to the grades that raises the lowest grade to 60% and the highest to 95%. I very much like the square root curve, though… and may try it out on this semester’s finals if they need some help.
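Both of these curves are easy to sketch in Python; the function names are hypothetical, and the 60%/95% anchors are the ones mentioned in the comment.

```python
def bump_top_to(scores, target=95.0):
    """Add points so the top score reaches `target`; no change if it already does."""
    shift = max(0.0, target - max(scores))
    return [s + shift for s in scores]


def linear_curve(scores, new_low=60.0, new_high=95.0):
    """Linearly map the lowest score to `new_low` and the highest to `new_high`."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [new_high for _ in scores]  # everyone tied: nothing to stretch
    return [new_low + (new_high - new_low) * (s - lo) / (hi - lo) for s in scores]


print(bump_top_to([80, 70]))       # [95.0, 85.0]
print(linear_curve([30, 50, 80]))  # [60.0, 74.0, 95.0]
```

The linear map also pulls a top score above 95% down to 95%, so it only makes sense in the situation described, where the low end is what needs help.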

In using curves one has to be careful. Are you sure the students comprise a normal distribution? Would Business/English Lit/Anthropology majors, who presumably aren’t highly represented in the class, do as well as pre-med students if they were all required to take it?

I always thought the stairs method was the other way around. The papers that fall the farthest are assumed to be the heaviest (due to more writing or extra pages). They obviously put in more work, so they get the A.

Chris,

Yeah, that makes sense. I just always thought that the best scores should be near the top, both literally and figuratively. The other advantage of the stair method is that you can adjust the median just by throwing with more or less force.

From someone who has never had to grade, and therefore doesn’t care how much work it would be…

You could try a “weight your own grade” twist to really mess with their heads… you get more points for being confident in your correct answers… but only if that *wasn’t* the most popular starred question.

That is, if you’re right you get more points for being confident… but not if it’s an easy question that everybody agrees on. Something about the value of confidence when one is going against popular opinion.
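A rough Python sketch of this twist, building on the starring idea: the starred question counts double unless it was the most popular star in the class. The function name, the tie rule (every question tied for the most stars counts as “popular”), and the weight of 2 are all assumptions.

```python
from collections import Counter


def contrarian_score(earned, possible, star, class_stars):
    """Percentage score where a starred question counts double, but only if it
    was not among the most-starred questions in the class."""
    counts = Counter(q for q in class_stars if q is not None)
    top = max(counts.values(), default=0)
    popular = {q for q, c in counts.items() if c == top}
    weights = [1] * len(possible)
    if star is not None and star not in popular:
        weights[star] = 2
    got = sum(w * e for w, e in zip(weights, earned))
    total = sum(w * p for w, p in zip(weights, possible))
    return 100 * got / total


class_stars = [0, 0, 0, 2, 1]  # question 0 is the crowd favorite
earned, possible = [10, 5, 10, 5], [10, 10, 10, 10]
print(contrarian_score(earned, possible, 0, class_stars))  # 75.0
print(contrarian_score(earned, possible, 2, class_stars))  # 80.0
```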

Maybe for a quiz it would be manageable.

I wonder if it would make a difference if the students knew it was going to work this way beforehand, and could talk over strategies?

I think that any prof who on principle normalizes to 99% should make it their policy that if a factual error is found in the exam, then it becomes renormalized to 100%.