Cache Memory

Computers store data in a hierarchy of progressively faster memories. When applications start, data and instructions are moved from the slow hard disk into main memory (dynamic RAM, or DRAM), where the CPU can get them more quickly. DRAM acts as a cache for the disk.

Levels Upon Levels

Although DRAM is faster than the disk, it's still pokey. So data that's needed more often is moved up to the next faster memory, called the Level 2 (L2) cache. This may be located on a separate, high-speed static RAM chip next to the CPU, but new CPUs usually incorporate the L2 cache directly on the processor chip.


At the highest level, the most frequently used information - say, the instructions in a loop that executes repeatedly - is stored directly in a special section of the processor chip, called the Level 1 (L1) cache. This is the fastest memory of all.
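
Why does a loop cache so well? The short C example below (ours, not from the QuickStudy; the grid name and size are arbitrary) sums the same array two ways. The first loop marches through memory in order, so every cache line the hardware fetches gets fully used; the second skips across memory and wastes most of each line, causing many more misses.

#include <stdio.h>

#define N 512

/* Illustrative sketch: the same data, touched two different ways. */
static int grid[N][N];

static long sum_row_major(void) {
    long sum = 0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            sum += grid[i][j];          /* sequential: cache-friendly */
    return sum;
}

static long sum_col_major(void) {
    long sum = 0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            sum += grid[i][j];          /* strided: far more misses */
    return sum;
}

int main(void) {
    printf("%ld %ld\n", sum_row_major(), sum_col_major());
    return 0;
}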

Intel Corp.'s Pentium III processor has 32KB of L1 cache on the processor chip and either 256KB of L2 on-chip or 512KB of L2 off-chip. The L2 cache on the CPU chip can be accessed four times faster than if it were on a separate chip.

When the processor needs to execute an instruction, it looks first in its own data registers. If the needed data isn't there, it goes to the L1 cache and then to the L2 cache. If the data isn't in any cache, the CPU calls out to the main RAM. It might not even be there, in which case the system has to retrieve it from the disk.
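
That search order is simple enough to model in a few lines of C. The sketch below is a toy illustration; the latency figures are invented stand-ins, not measured numbers.

#include <stdio.h>

/* Toy model of the search order described above. Each level either
 * holds the data or passes the request down the hierarchy. */
struct level {
    const char *name;
    int latency;    /* assumed cycles to check this level */
    int has_data;   /* 1 if the requested data lives here */
};

int main(void) {
    struct level hierarchy[] = {
        { "registers",      1, 0 },
        { "L1 cache",       3, 0 },
        { "L2 cache",      10, 1 },  /* found here, in this run */
        { "main memory",  100, 1 },
        { "disk",     5000000, 1 },
    };
    int cost = 0;
    for (int i = 0; i < 5; i++) {
        cost += hierarchy[i].latency;   /* pay to look at each level */
        if (hierarchy[i].has_data) {
            printf("found in %s after %d cycles\n",
                   hierarchy[i].name, cost);
            break;
        }
    }
    return 0;
}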

When the CPU finds data in one of its cache locations, it's called a "hit"; failure to find it is a "miss." Every miss introduces a delay, or latency, as the processor tries a slower level. In a well-designed system with software algorithms that prefetch data before it's requested, the hit rate can reach 90%.

For high-end processors, it can take one to three clock cycles to fetch information from L1, while the CPU waits and does nothing. It takes six to 12 cycles to get data from an on-chip L2, and dozens or even hundreds of cycles for an off-chip L2.
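
Those figures make it possible to estimate the average cost of a memory access. The C sketch below uses midpoints of the cycle counts above and the 90% hit rate mentioned earlier; the 100-cycle main-memory cost is an assumption picked from the "dozens or even hundreds" range.

#include <stdio.h>

/* Back-of-the-envelope average memory access time. */
int main(void) {
    double l1_time  = 2.0;    /* "one to three clock cycles" */
    double l2_time  = 9.0;    /* "six to 12 cycles", on-chip L2 */
    double mem_time = 100.0;  /* assumed main-memory latency */
    double l1_hit = 0.90, l2_hit = 0.90;  /* hit rates, per the article */

    double avg = l1_time
               + (1.0 - l1_hit) * (l2_time
               + (1.0 - l2_hit) * mem_time);
    printf("average access: %.2f cycles\n", avg);  /* prints 3.90 */
    return 0;
}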

Caches are more important in servers than in desktop PCs because servers handle so much processor-to-memory traffic generated by client transactions. Intel turned a 50-MHz, 80486-based PC into a server in 1991 by adding a 50-MHz cache to the processor chip. Although the bus connecting processor and memory ran at only 25 MHz, the cache let many programs run entirely within the 486 chip at 50 MHz.

This hierarchical arrangement of memory helps bridge a widening gap between processor speeds, which are increasing at roughly 50% per year, and DRAM access rates, which are climbing at only 5% per year. As this performance mismatch grows, hardware makers will add a third and possibly fourth level of cache memory, says John Shen, a professor of electrical and computer engineering at Carnegie Mellon University in Pittsburgh.
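
How fast does that mismatch grow? A quick C calculation, assuming the two rates hold steady:

#include <stdio.h>
#include <math.h>

/* Processor speeds growing ~50% a year against DRAM access rates
 * growing ~5% a year means the gap compounds at 1.50/1.05, or
 * roughly 1.43x, per year. */
int main(void) {
    for (int years = 0; years <= 10; years += 2)
        printf("after %2d years: gap x%.1f\n",
               years, pow(1.50 / 1.05, years));
    return 0;
}

At those rates, the gap widens more than 30-fold in a decade.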

Indeed, later this year, Intel will introduce Level 3 (L3) cache in its 64-bit server processors, called Itanium. The 2MB or 4MB cache will connect to the processor over a bus that runs as fast as the processor - 800 MHz.

IBM is also developing its own L3 cache for 32- and 64-bit Intel-based Netfinity servers. At first, it will be placed on the memory controller chip and will be available toward the end of next year, says Tom Bradicich, director of Netfinity architecture and technology.

IBM's L3 will be a system-level cache available to the server's four to 16 processors. Intel's L3 can help only the processor to which it's attached, but IBM says its L3 can improve throughput for the whole system. Bradicich says IBM's L3 also will aid high-availability computing for e-commerce by enabling main memory swap-outs and upgrades as the system is running.

Bigger Isn't Necessarily Better

The frequency of cache misses can be reduced by making caches bigger. But big caches draw a lot of power, generate a lot of heat and reduce the yield of good chips in manufacturing, Shen says.

One way around these difficulties may be to move the cache-management logic from hardware to software. "The compiler could potentially analyze program behavior and generate instructions to move data up and down the memory hierarchy," Shen says.

Software-managed caches are currently confined to research labs. Potential obstacles include the need to rewrite compilers and recompile legacy code for every new CPU generation, Shen says.
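
A modest taste of software-directed data movement is already available to programmers. The C sketch below uses the __builtin_prefetch hint supported by the GCC and Clang compilers; it is only a hint to the hardware, well short of the fully software-managed caches Shen describes, but it shows a program steering data up the hierarchy. The 16-element look-ahead distance is an arbitrary, illustrative choice.

#include <stddef.h>

/* Ask the hardware to start pulling future elements toward the
 * caches while the current ones are being summed. */
long sum_with_prefetch(const long *a, size_t n) {
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + 16 < n)
            __builtin_prefetch(&a[i + 16], 0, 1);  /* 0 = read, 1 = low reuse */
        sum += a[i];
    }
    return sum;
}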

Where's my data?
When the CPU needs data, it first looks in its own data registers. If the data isn't there, the CPU looks to see if it's in the nearby Level 1 cache. If that fails, it's off to the Level 2 cache. If it's nowhere in cache, the CPU looks in main memory. Not there? The CPU gets it from disk. All the while, the clock is ticking, and the CPU is sitting there waiting.


