跳转至

Memory Hierachy

Introduction

Memory

内存层次:

  • Register
  • Cache
  • Memory
  • Storage

存储技术:

  • Mechanical Memory
  • Electronic Memory
    • SRAM
    • DRAM
      • SDRAM
      • DDR
    • GDRAM
      • GDDR
    • HBM
    • EPPROM
      • NAND
      • NOR
  • Optical Memory

Cache Concept

Cache: a safe place for hiding or storing things. (现在也不安全)

  • Cache Hit/Miss: When the processor can/cannot find a requested data item in the cache

    Cache Miss 会带来额外的开销:由 Latency, Bandwith 决定。

  • Cache Block/Line: A fixed-size collection of data containing the requested word, retrieved from the main memory and placed into the cache.

  • Cache Locality:

    • Temporal locality: need the requested word again soon

      访问过这个数据,之后很可能再次访问这个数据。

    • Spatial locality: likely need other data in the block soon

      访问了这个位置,之后很可能访问下一个位置。

36 terms of Cache

Four Questions for Cache Designers

这部分内容可见计组笔记

Caching is a general concept used in processors, operating systems, file systems, and applications.

  • Q1: Where can a block be placed in the upper level/main memory? (Block placement)
    • Fully Associative, Set Associative, Direct Mapped
  • Q2: How is a block found if it is in the upper level/main memory? (Block identification)
    • Tag/Block
  • Q3: Which block should be replaced on a Cache/main memory miss? (Block replacement)
    • Random, LRU,FIFO
  • Q4: What happens on a write? (Write strategy)
    • Write Back or Write Through (with Write Buffer)

Q1: Block Placement

  • Direct mapped

    一个块在 cache 中有一个固定的位置(通常通过取模得到)。

  • Fully associative

    块可以放在 cache 里的任意位置。(不好找)

  • Set associative

    • 块可以在一个组里的任何位置,组里可以放若干个块。
    • 直接映射相当于一路组相联,全相联相当于 n 路组相联(n 是 cache 的块数)

一般情况,\(n\leq 4\)

Q2: Block Identification

Q3: Block Replacement

  • Random replacement - randomly pick any block
  • Least-Recently Used (LRU) - pick the block in the set which was least recently accessed

    需要额外的位数来记录访问的时间。一般我们用的是近似的 LRU。

  • First In, First Out (FIFO) - Choose a block from the set which was first came into the cache

Strategy of Block Replacement

Suppose:

  • Cache block size is 3, and access sequence is shown as follows.

    2, 3, 2, 1, 5, 2, 4, 5, 3, 4

  • FIFO, LRU and OPT are used to simulate the use and replacement of cache block. (OPT 是一种理想情况,用来衡量算法性能)

    • FIFO

    • LRU

    • OPT

Hit rate is related to the replacement algorithm, the access sequence, the cache block size.

Stack replacement algorithm

有些算法随着 N 增大命中率非下降,有些算法随着 N 增大命中率反而会下降。
我们把随着 N 增大命中率非下降的算法称为 stack replacement algorithm。

\(B_t(n)\) represents the set of access sequences contained in a cache block of size \(n\) at time \(t\).

  • \(B_t(n)\) is the subset of \(B_t(n+1)\).

LRU replacement algorithm is a stack replacement algorithm, while FIFO is not.
For LRU algorithm, the hit ratio always increases with the increase of cache block.

Using LRU

用栈来模拟 LRU,栈顶是最近访问的,栈底是最久未访问的,每次要替换的时候,替换栈底的元素。通过下面的图可以快速看到栈大小为 n 时的命中率。

LRU Implementation - Comparison Pair Method

如何只通过门和触发器来实现 LRU 算法?—— Comparison Pair Method

  • Basic idea

    Let each cache block be combined in pairs, use a comparison pair flip-flop to record the order in which the two cache blocks have been accessed in the comparison pair, and then use a gate circuit to combine the state of each comparison pair flip-flop, you can find the block to be replaced according to the LRU algorithm.

    让任何两个 cache 块之间两两结对,用一个触发器的状态来代表这两个块的先后访问顺序(比如 1 表示 A 刚被访问,0 表示 B 刚被访问)。通过门电路对触发器的状态进行逻辑组合,找到最久未被访问的块。

Comparison Pair Method

这里有 3 个 cache blocks A, B, C。那么我们需要 3 个触发器来记录之间的状态。假设 \(T_{AB}=1\) 表示 A 被更近访问,\(T_{AC}, T_{BC}\) 同理。

  • Hardware usage analysis

    假设有 p 个 cache blocks, 我们需要 \(C_p^2=p\cdot (p-1)/2\) 个触发器。
    \(p\) 超过 8 时,需要的触发器过多,这个算法就不适用了。

Q4: Write Strategy

  • Write Hit

    • Write Through:直接写回到内存。

      写到内存的时间较长,这个过程需要 Write Stall,或者使用 Write Buffer

    • Write Back:在 Cache 中写,同时通过一个额外的 dirty bit 表示这个块已经被修改。

  • Write Miss

    • Write Allocate:将要写的块先读到 Cache 中,再写。
    • Write Around:直接写到内存。
  • In general, write-back caches use write-allocate , and write-through caches use write-around.
Example

Memory System Performance

这部分也可见计组笔记

How to improve

  • Reduce the miss penalty
  • Reduce the miss rate
  • Reduce the time to hit in the cache
  • Reduce the miss penalty and miss rate via parallelism

Virtual Memory

物理内存有限,虚拟内存让用户体验到一个抽象的更大的内存。

  • Why virtual memory?

    可以让进程使用不连续的物理内存空间(虚拟地址上是连续的);更好地隔离不同进程。

  • virtual-physical address translation

  • memory protection/sharing among multi-program

Virtual Memory = Main Memory + Secondary Storage

  • Virtual Memory Allocation

    • Paged virtual memory

      page: fixed-size block

    • Segmented virtual memory

      segment: variable-size block

Paging vs Segmentation

分页式的易于实现,方便替换。现在常用段页式结合,或者纯页式。

How virtual memory works?

Cache 的四个问题在虚拟内存中都有对应。

  • Q1. Where can a block be placed in main memory?

    缺失代价很高,因此我们采用全相联的方式,以降低 miss rate。

  • Q2. How is a block found if it is in main memory?

    虚拟地址分两部分,偏移量和页号。页号是页表的索引。

  • Q3. Which block should be replaced on a virtual memory miss?

    Least Recently Used (LRU) block, with use/reference bit.

  • Q4. What happens on a write?

    Write-back strategy, with diry bit.

Page Table

  • Page tables are often large

    e.g. 32-bit virtual address, 4KB pages, 4 bytes per page table entry.
    page table size: \((2^{32}/2^{12}) \times 2^2 = 2^{22}\) bytes = \(4\) MB

  • Logically two memory accesses for data access:

    • one to obtain the physical address from page table;
    • one to get the data from the physical address;

正常来说页表需要两次内存访问,访问效率低下,因此我们需要 cache page table,即 TLB。

Translation lookaside buffer (TLB)

  • tag: portions of the virtual address (VPN);
  • data: a physical page frame number (PPN), protection field, valid bit, use bit, dirty bit;

Example

发送 tag (VPN) 尝试匹配,并看访问类型是否违规。如果匹配成功,就把对应的 PPN 送到 Mux,将偏移量加上 PPN 得到物理地址。

Page Size Selection

  • Pros of larger page size

    • Smaller page table, less memory (or other resources used for the memory map);

      页更少,所以页表更小。

    • Larger cache with fast cache hit;

      页更大,所以 cache 命中的时间更短(因为我们需要遍历的页更少)。

    • Transferring larger pages to or from secondary storage is more efficient than transferring smaller pages;

      一次搬运更多的数据,所以更高效,小页可能需要搬运多次。

    • Map more memory, reduce the number of TLB misses;

      TLB miss 次数更少。

  • Pros of smaller page size

    • Conserve storage

      When a contiguous region of virtual memory is not equal in size to a multiple of the page size, a small page size results in less wasted storage.

      减少对内存的使用,内部碎片更少。

Use both: multiple page sizes

Address Translation

Summary

Summary

  • Memory hierarchy

    • From single level to multi level
    • Evaluate the performance parameters of the storage system (average price per bit C; hit rate H; average memory access time T)
  • Cache basic knowledge

    • Mapping rules
    • Access method
    • Replacement algorithm
    • Write strategy
    • Cache performance analysis
  • Virtual Memory (the influence of memory organization structure on Cache failure rate)

评论