KPIs in Education: Rank versus Raw Numbers
Educators have spent decades trying to agree on key performance indicators for public schools, or even on whether KPIs can be used at all. One entrepreneur who envisioned a business intelligence-like service for schools thinks he's figured it out.
- By Ted Cuzzillo
- February 20, 2008
When Steve Rees began work on a business intelligence-like system to rank schools, he looked around to see how others had learned to read key performance indicators. Soon he felt as if he'd arrived at a crime scene where no one would talk.
For decades, the mere idea of standard metrics has flown in the face of long-standing traditions -- in particular, methods of teacher evaluation. The result has been a culture clash between "sensate, feeling types and the new, racetrack-bettor types," said Rees. The conflict should sound familiar to anyone in business who has grappled with performance metrics.
Today, his company, San Francisco-based School Wise Press, offers school administrators a simple bargain: at a good price, the company provides all school-records disclosure mandated by "No Child Left Behind" and other laws, plus smart reports that reveal each school's relative vital signs.
Simple disclosure was easy, but comparisons were difficult. School districts had traditionally regarded themselves as statistical islands, and test scores reflected widely varied assumptions. "I was not about to proceed, cocksure and full of certainty," Rees said, "into a field where a lot of good people who had done smart work before had failed to get people to use" new methods of measuring.
"I went to conference after conference," said the data-savvy veteran of magazine publishing management. "There was almost nothing to be shared."
He admits that "what I learned was there were damn good reasons" for the apparent failure. He found, for example, a general lack of skill to deal with quantitative measurement. Teachers balked at systems they could see would produce invalid results, which could taint their schools' reputation and limit their own salaries. "This is hard-ball stuff," said Rees.
Eventually, he did find insight and successes, such as from one notable statistician. William Sanders had begun work in 1982 on education metrics while a professor of statistics at the University of Tennessee. On his way to class one day, he read a newspaper article on why metrics couldn't be used to judge teachers. He recalls thinking, "There may be reasons, but those are not good reasons." He now heads the SAS education value-added assessment program based in North Carolina.
The system he developed starts with one question: how much has each student progressed? A student who starts at a low level and progresses well is more impressive than one who starts high and progresses poorly, no matter how each one's final standing compares with the other's. The system tests and tracks each student in four subjects over every grade. It uses all data available on each child, no matter how sparse, to surmount the "selection bias" that can occur when low-scoring students have more fractured records than those who score high.
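The growth-versus-level distinction can be sketched in a few lines. This is not Sanders' model, just a minimal illustration with hypothetical students and scores: ranked by raw final score one student looks better, ranked by progress the other does.

```python
# Hypothetical records: start-of-year and end-of-year test scores.
students = {
    "A": {"start": 35, "end": 60},  # starts low, progresses well
    "B": {"start": 80, "end": 84},  # starts high, progresses little
}

def growth(record):
    """How much the student progressed, regardless of final level."""
    return record["end"] - record["start"]

# Ranked by raw final score, B looks better; ranked by growth, A does.
by_level = sorted(students, key=lambda s: students[s]["end"], reverse=True)
by_growth = sorted(students, key=lambda s: growth(students[s]), reverse=True)

print(by_level)   # ['B', 'A']
print(by_growth)  # ['A', 'B']
```

The same comparison applied to raw scores alone would reward schools for the students they enroll rather than for what they teach, which is the bias a growth measure is meant to remove.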
The system also dampens errors in measurement. "When any one kid takes any one test on any one day, there's a huge error of measurement," Rees said. To overcome that error, the system runs "massive multivariate longitudinal analysis," often putting a "humongous" load on the processor, to exploit the relationship among all test scores for all students over all subjects and grades.
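Why pooling many scores dampens single-test error can be shown with a toy simulation (again, not the actual multivariate model; the ability value and noise level are made up): averaging over many noisy measurements shrinks the error of the estimate roughly as one over the square root of the number of tests.

```python
import random
import statistics

random.seed(0)  # make the toy example repeatable
true_ability = 70.0

def noisy_test():
    # One kid, one test, one day: a large error of measurement.
    return true_ability + random.gauss(0, 10)

one_score = noisy_test()
many_scores = [noisy_test() for _ in range(100)]
pooled = statistics.mean(many_scores)

# The pooled estimate is almost always far closer to the true value
# than any single test score.
print(abs(one_score - true_ability), abs(pooled - true_ability))
```

Sanders' system goes much further, exploiting correlations across subjects and grades rather than simple averaging, but the statistical intuition is the same.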
Transience is another issue. "Kids do not stay put," he said. "Kids move, kids get sick. Kids from some neighborhoods move more than from other neighborhoods." If a system can't handle the fractured records, results will be distorted and trust will deteriorate. Ultimately, after all the processing, the system produces scores that educators can trust.
Critics complain that standardized measurement isn't necessary. Real educators know just by observation whether a teacher is effective, they say. Sanders responds that in about two-thirds of cases that's true, but in many cases it's not.
He cites two extremes: at one, a classroom that dazzles, with activities, colors, and interested, involved kids -- who prove in tests that they're not learning much. At the other is the teacher who "doesn't give a hoot what the principal thinks of her, and she closes the door and kids learn like you wouldn't believe."
To those who question whether standardized testing is necessary, he responds, "Compared to what?" If judgments of teachers' and kids' progress are left to subjective judgment, there's a risk of political bias and favoritism. Quick observation can fail to properly assess the whole picture.
Sanders said he's under constant pressure to simplify his system. "People, under the banner of transparency, argue that it can be simpler, but they don't realize the assumptions that they're sweeping under the rug."
Simpler means unreliable, he said, which would produce results that would not be trusted for long. "I've spent a quarter of a century working on this damn stuff," he said. "People think that this is simple and that they can do this on an Excel spreadsheet."
He fears that if states and others begin using simplistic, unreliable methodology, it won't be long before people begin to see through it. "They'll say, 'We tried that value-added stuff and it doesn't work.'"
Though Rees could not base his methodology on Sanders' (Rees' system uses school-level data while Sanders' uses student-level data), the influence is strong. "He enabled us to avoid misusing testing data and to aim for proper interpretations of change over time," wrote Rees in an e-mail.
In his research, Rees said, "I learned that the relative rank of things is sometimes more important than the absolute values." Ranking gives mere numbers meaning, whether it decides which baseball teams get to play in a division or orders U.S. News and World Report's popular college list. "People regard them as useful handles."
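The rank-over-raw-numbers idea reduces to a small transformation. A minimal sketch, with hypothetical school names and scores: a raw score of 698 means little on its own, but "third of four in the comparison group" is a handle people can grasp.

```python
# Hypothetical composite scores for schools in one comparison group.
scores = {"Adams": 712, "Birch": 645, "Cedar": 698, "Dewey": 731}

# Turn raw numbers into an ordered list, the way a league standing
# or a college ranking does.
ranked = sorted(scores, key=scores.get, reverse=True)

for place, school in enumerate(ranked, start=1):
    print(f"{place}. {school} ({scores[school]})")
```

The substance of Rees' product is choosing the right comparison group, not the sort itself; ranking a school against dissimilar schools would recreate the distrust he set out to avoid.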
If lists work, he concluded, why not make intelligent lists for specific customers for specific uses?
School Wise Press has already seen the effects of its comparative ranking. One client with 30 schools in California had a group of middle schools on the federal watch list. When the superintendent pushed for improvement, the principals roared back, "You give me the kids down the street and I'll give you the results like down the street."
In fact, the teachers and principals weren't teaching to standards set by the state. With School Wise rankings, the superintendent was able to show that those schools ranked at the bottom by every available measure in a county with roughly similar students and teachers.
"The clustering of those troubled schools at the bottom of a number of lists," said Rees, "was enough evidence to let the superintendent compel the principals to own the problem."
Still, he is cautious. "I've been careful not to be too brash or too quick," he said. So far, School Wise Press has been rolled out in pilot releases. "Time and place and circumstance matter immensely in giving the user control," he said. "It's just critical."