Memory Hierarchy
Chip multiprocessors (CMPs) have emerged as the de facto processor architecture, and the number of cores per chip has increased rapidly in recent years. Because of the large gap between on-chip and off-chip memory access times, modern CMPs employ large, multilevel on-chip cache hierarchies. A typical multicore CMP consists of tens or hundreds of mesh-connected processor tiles, each of which contains a CPU core, a private L1 cache, a router, and an L2 cache slice. The L2 slices of all tiles collectively form a large level-2 cache that can be organized as either private or shared. Private L2 caches provide low on-chip access latency but waste cache capacity, since the same data may be replicated in several slices; moreover, the space overhead of keeping the private L2 caches coherent is prohibitively large. Shared L2 caches use cache capacity more effectively and hence achieve a lower off-chip miss rate. However, because the distances between requesting cores and shared L2 slices in the mesh interconnect are nonuniform, on-chip cache access latencies vary greatly and, due to wire delays, can be very large. Extensive research on such non-uniform cache architectures (NUCA) has been reported in the literature. A number of innovative cache architectures have been developed in this lab, including: