Berkeley Students Break Computer Sorting Record
By Robert Sanders, Public Affairs
One of the key benchmarks in the world of commercial computing is how fast you can sort data -- alphabetically, numerically or in other ways.
Berkeley undergraduates Joshua Coates and Spencer Low, along with graduate student Philip Buonadonna, recently broke the world record. Working with Berkeley's Millennium Project, they completed a standard benchmark sort, the Datamation sort, in 1.18 seconds -- less than half the time of the previous record holder, another campus group.
Compaq is rumored to have recently broken the previous record, too, but the new Berkeley time is even faster.
A trophy is given to the champion sorter each spring at the top meeting for database geeks, the SIGMOD meeting, or Special Interest Group on Management of Data of the Association for Computing Machinery. That trophy will go to Berkeley if no one tops its record before this year's conference to be held May 31 in Philadelphia.
Sorting is one of the most common operations done by computers. Banks, credit card companies and government agencies all rely on a fast sorting mechanism to keep data flowing and information available to their systems. In fact, it is estimated that one-third of all processing or CPU (central processing unit) time in the world is spent on sorting data.
"In commercial computing, that's where the money is at," said David Culler, professor of computer science and head of the Millennium Project, which sponsored the students' work.
The feat was achieved using a cluster of 16 off-the-shelf PCs, each with two Pentium processors and two hard disks, running the Windows NT 4.0 operating system. The computers were networked together using an experimental protocol developed by Intel, Microsoft and Compaq.
"It's really something that you can do this with commodity PCs -- they give you more bang for the buck," Coates said.
Most computer sorting today uses high-end machines, though clusters of computers are becoming more popular. Now most database companies produce software for clusters, said Culler.
"This shows that clusters are continuing to play an attractive role in sorting," he said. "The attractiveness of clusters is going to change the industry."
The Datamation sort has been used for more than a decade to determine the fastest sorter. It involves reading one million, 100-byte records off a disk, sorting them and then writing the data back to disk. This amount of data is about 100 Megabytes, or an estimated 20 million English words.
The previous record, 2.41 seconds, was set by the NOW (Network of Workstations) group at Berkeley in 1996. Recently, Compaq allegedly surpassed that Datamation sort record at 2.2 seconds.
In December Berkeley students cut that time nearly in half. When first proposed as a test of sorting systems in the mid 1980s, the sort took about 980 seconds.
The main goal of the Millennium Project is to demonstrate that you can take a large number of mass-market, commodity PCs and networks, harness them together with some special software, and get a powerful supercomputer for less than the current prices of supercomputers. Funded by Intel, maker of Pentium processors, its Web home page is now.cs.berkeley.edu/Millennium/.