In today's computers, moving data to and from main memory consumes so much time and energy that
microprocessors1 have their own small, high-speed memory banks, known as "caches," which store frequently used data. Traditionally, managing the caches has required fairly simple algorithms that can be hard-wired into the chips. In the 21st century, however, in order to meet consumers' expectations for
steadily2 increasing computational power, chipmakers have had to begin equipping their chips with more and more cores, or processing units. And as cores
proliferate3(增殖,扩散), cache management becomes much more difficult.
Daniel Sanchez, an assistant professor in MIT's Department of Electrical Engineering and Computer Science, believes that it's time to turn cache management over to software. This week, at the International Conference on Parallel Architectures and
Compilation4 Techniques, Sanchez and his student Nathan Beckmann presented a new system,
dubbed5 Jigsaw6, that monitors the computations being performed by a multicore chip and manages cache memory accordingly.
In experiments simulating the execution of hundreds of applications on 16- and 64-core chips, Sanchez and Beckmann found that Jigsaw could speed up execution by an average of 18 percent -- with more than twofold improvements in some cases -- while actually reducing energy consumption by as much as 72 percent. And Sanchez believes that the performance improvements offered by Jigsaw should only increase as the number of cores does.
Location, location, location
In most multicore chips, each core has several small, private caches. But there's also what's known as a last-level cache, which is shared by all the cores. "That cache is on the order of 40 to 60 percent of the chip," Sanchez says. "It is a significant fraction of the area because it's so crucial to performance. If we didn't have that cache, some applications would be an order of magnitude slower."
Physically7, the last-level cache is broken into separate memory banks and distributed across the chip; for any given core, accessing the nearest bank takes less time and consumes less energy than accessing those farther away. But because the last-level cache is shared by all the cores, most chips assign data to the banks
randomly8.
Jigsaw, by contrast, monitors which cores are accessing which data most frequently and, on the fly, calculates the most efficient assignment of data to cache banks. For instance, data being used exclusively by a single core is stored near that core, whereas data that all the cores are accessing with equal frequency is stored near the center of the chip, minimizing the average distance it has to travel.
Jigsaw also varies the amount of cache space
allocated9 to each type of data, depending on how it's accessed. Data that is reused frequently receives more space than data that is accessed infrequently or only once.
In principle,
optimizing10 cache space allocations requires evaluating how the chip as a whole will perform given every possible allocation of cache space to all the computations being performed on all the cores. That calculation would be prohibitively time-consuming, but by ignoring some particularly
convoluted11 scenarios12 that are extremely unlikely to arise in practice, Sanchez and Beckmann were able to develop an approximate
optimization13 algorithm that runs
efficiently14 even as the number of cores and the different types of data increases dramatically.