DDM - A Cache-Only Memory Architecture

Erik Hagersten, Anders Landin, and Seif Haridi
Swedish Institute of Computer Science
Multiprocessors providing a shared memory view to the programmer are
typically implemented as such - with a shared memory. We introduce
an architecture with large caches to reduce latency and network load.
Because all system memory resides in the caches, a minimum number of network
accesses are needed. Still, it presents a shared-memory view to the programmer.
Single bus. Shared-memory systems based on a single bus have some tens of
processors, each one with a local cache, and typically suffer from bus saturation.
A cache-coherence protocol in each cache snoops the traffic on the common bus
and prevents inconsistencies in cache contents.¹ Computers manufactured by
Sequent and Encore use this kind of architecture. Because it provides a uniform
access time to the whole shared memory, it is called a uniform memory architecture
(UMA). The contention for the common memory and the common bus limits the
scalability of UMAs.
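The snooping described above can be sketched as a small state machine per cache line. The following is a minimal, hypothetical illustration using the classic MSI (Modified/Shared/Invalid) invalidation scheme, one well-known single-bus coherence protocol of this kind; the class and message names (`BusRd`, `BusRdX`) are illustrative conventions, not taken from any particular machine in the text.

```python
# Minimal sketch of bus snooping with an MSI-style invalidation protocol.
# Every cache observes (snoops) all bus transactions and updates its own
# line state, which is what keeps the copies consistent.

class Bus:
    def __init__(self):
        self.caches = []

    def broadcast(self, op, addr, sender):
        # All caches except the requester snoop the transaction.
        for c in self.caches:
            if c is not sender:
                c.snoop(op, addr)

class Cache:
    def __init__(self, bus):
        self.state = {}            # addr -> 'M' or 'S'; absence means Invalid
        self.bus = bus
        bus.caches.append(self)

    def read(self, addr):
        if self.state.get(addr) in ('M', 'S'):
            return                 # hit: no bus traffic needed
        self.bus.broadcast('BusRd', addr, self)
        self.state[addr] = 'S'     # obtain a shared copy

    def write(self, addr):
        if self.state.get(addr) == 'M':
            return                 # already exclusive
        self.bus.broadcast('BusRdX', addr, self)  # invalidate other copies
        self.state[addr] = 'M'

    def snoop(self, op, addr):
        st = self.state.get(addr)
        if st is None:
            return
        if op == 'BusRdX':
            del self.state[addr]   # another cache takes exclusive ownership
        elif op == 'BusRd' and st == 'M':
            self.state[addr] = 'S' # downgrade; dirty data would be written back
```

For example, after one cache writes a line and another reads it, both end up holding it in the shared state; a subsequent write by either invalidates the other's copy:

```python
bus = Bus()
c0, c1 = Cache(bus), Cache(bus)
c0.write(0x40)     # c0 holds the line Modified
c1.read(0x40)      # snoop downgrades c0 to Shared
c1.write(0x40)     # snoop invalidates c0's copy
```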
Distributed. Computers such as the BBN Butterfly and the IBM RP3 use an
architecture with distributed shared memory, known as a nonuniform memory
architecture (NUMA). Each processor node contains a portion of the shared
memory, so access times to different parts of the shared address space can vary.
NUMAs often use networks other than a single bus, and the network delay can
vary to different nodes. The earlier NUMAs did not have coherent caches and left
the problem of maintaining coherence to the programmer. Today, researchers are
striving toward coherent NUMAs with directory-based cache-coherence proto-
cols.² By statically partitioning the work and data, programmers can optimize
programs for NUMAs. A partitioning that enables processors to make most of
their accesses to their part of the shared memory achieves a better scalability than