Physical Storage Systems¶
Abstract
- Classification of Physical Storage Media
- Storage Hierarchy
- Magnetic Disks
- Disk Interface Standards
- Performance Measures of Disks
- Optimization of Disk-Block Access
- Flash Storage & SSD
- Storage Class Memory(NVM)
Storage Hierarchy¶
- volatile storage
 Loses contents when power is switched off.
- non-volatile storage
 Contents persist even when power is switched off.
Storage media are compared mainly by speed, cost, and reliability.
 
 Moving from the top of the hierarchy to the bottom, storage devices become slower, cheaper, and larger.
- primary storage: Fastest media but volatile (cache, main memory).
- secondary storage: next level in hierarchy, non-volatile, moderately fast access time
 also called on-line storage
- tertiary storage: lowest level in hierarchy, non-volatile, slow access time
 also called off-line storage
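 commonly used for backups
NVM (non-volatile memory) is accessed the same way as main memory: it is byte-addressable, and it retains its data when power is lost.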
Magnetic Disks¶
 
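 A disk has upwards of a hundred thousand tracks, and each track in turn has upwards of a thousand sectors (the sector is the smallest unit of data exchanged between the computer and the disk).
The arm assembly performs the seek: all read-write heads move together to find the track that holds the data.
Data transfer begins only once the target sector has rotated under the read-write head.
The tracks at the same position on every platter form a cylinder. A large file is best stored within a single cylinder, so that the heads can read and write it without further seeks.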
- Read-write head
- Surface of platter divided into circular tracks
- Each track is divided into sectors
- To read/write a sector
 - disk arm swings to position head on right track
 - platter spins continually; data is read/written as sector passes under head
 
- Cylinder i consists of the ith track of all the platters
- Disk controller: interfaces between the computer system and the disk drive hardware.
Performance Measures of Disks¶
- Access time: the time it takes from when a read or write request is issued to when data transfer begins. Consists of:
 - Seek time: time it takes to reposition the arm over the correct track.
  - Average seek time is ½ the worst-case seek time.
  - 4 to 10 milliseconds on typical disks
 - Rotational latency: time it takes for the sector to be accessed to appear under the head.
  - Average latency is ½ of the worst-case latency.
  - A full rotation takes 4 to 11 milliseconds on typical disks (15000 down to 5400 r.p.m.), so the average latency is roughly 2 to 5.5 milliseconds.
- Data-transfer rate: the rate at which data can be retrieved from or stored to the disk.
Transfers between disk and memory happen in units of blocks. Even to access a single byte, the whole 4 KB block that contains that byte has to be read in.
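The block-granularity rule can be sketched with a little arithmetic. This is only an illustration: the function name `block_range` is hypothetical, and the 4 KB block size is the figure used in these notes, not a universal constant.

```python
BLOCK_SIZE = 4096  # 4 KB, the block size used in these notes (an assumption)


def block_range(byte_offset: int, block_size: int = BLOCK_SIZE) -> tuple[int, int, int]:
    """Return (block number, first byte offset, last byte offset) of the
    block that must be transferred in order to access `byte_offset`."""
    if byte_offset < 0:
        raise ValueError("byte_offset must be non-negative")
    block_no = byte_offset // block_size   # which block holds the byte
    start = block_no * block_size          # first byte of that block
    return block_no, start, start + block_size - 1


# Accessing the single byte at offset 10000 pulls in all of block 2.
print(block_range(10_000))  # (2, 8192, 12287)
```

So a one-byte access costs the same transfer as reading all 4096 bytes of its block.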
- Disk block is a logical unit for storage allocation and retrieval
 - Smaller blocks: more transfers from disk
 - Larger blocks: more space wasted due to partially filled blocks
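- Sequential access pattern
 Successive requests are for successive blocks, so a disk seek is needed only for the first block.
- Random access pattern
 Slow, because each access may require its own seek; we want as many accesses as possible to be sequential.
 One approach is to record pending modifications in a log and apply them later, replacing random accesses with sequential ones where possible.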
- I/O operations per second (IOPS)
 Number of random block reads that a disk can support per second.
- Mean time to failure (MTTF)
 The average time the disk is expected to run continuously without any failure.
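The measures above combine with simple arithmetic. The sketch below is a rough estimate, not a vendor formula: the function names are hypothetical, and the 4 ms seek time and 100 MB/s transfer rate in the example are assumed figures chosen for illustration.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency: half of one full rotation, in milliseconds."""
    full_rotation_ms = 60_000.0 / rpm  # 60,000 ms per minute / rotations per minute
    return full_rotation_ms / 2.0


def avg_access_time_ms(avg_seek_ms: float, rpm: float) -> float:
    """Access time = seek time + rotational latency (both averages)."""
    return avg_seek_ms + avg_rotational_latency_ms(rpm)


def random_iops(avg_seek_ms: float, rpm: float,
                block_bytes: int, transfer_mb_per_s: float) -> float:
    """Rough number of random block reads per second for one disk."""
    transfer_ms = block_bytes / (transfer_mb_per_s * 1_000_000) * 1000.0
    return 1000.0 / (avg_access_time_ms(avg_seek_ms, rpm) + transfer_ms)


print(avg_rotational_latency_ms(15_000))   # 2.0
print(avg_rotational_latency_ms(5_400))    # about 5.6 ms
print(random_iops(4.0, 15_000, 4096, 100.0))
```

Under these assumptions the access time dominates the transfer time for a 4 KB block, which is why random access yields only on the order of a hundred or two reads per second on a magnetic disk.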
Optimization of Disk-Block Access¶
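- Buffering: in-memory buffer to cache disk blocks
 Discarding a block right after it is read would be wasteful, so it is kept in the buffer; if it is needed again later, it does not have to be read from disk again.
- Read-ahead (Prefetch): Read extra blocks from a track in anticipation that they will be requested soon
 When a block is read, the system predicts that the neighboring blocks will also be accessed and fetches them into memory together. Prefetching needs a sound basis; otherwise useless data occupies the cache.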
- Disk-arm-scheduling algorithms re-order block requests so that disk arm movement is minimized
 - elevator algorithm
   
- File organization
 - Allocate blocks of a file in as contiguous a manner as possible
  Space that is allocated in advance comes out contiguous.
 - Files may get fragmented
  - Sequential access to a fragmented file results in increased disk arm movement
 - Some systems have utilities to defragment the file system, in order to speed up file access
- Nonvolatile write buffers
 Speed up disk writes by writing blocks to a non-volatile RAM buffer immediately.
 The data to be written first goes to a fast non-volatile buffer, such as NVM. The program above can then continue executing, and the buffer writes the data back to disk at a convenient time.
- Log disk
 A disk devoted to writing a sequential log of block updates.
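The elevator algorithm mentioned in the list above can be sketched as follows. This is a minimal illustration, not any particular controller's implementation: the function names and the request list are hypothetical, and the arm here reverses at the last pending request rather than travelling to the edge of the disk.

```python
def elevator_order(requests, head, direction="up"):
    """Order pending track requests the way an elevator-style scheduler
    would serve them: sweep in one direction, then reverse."""
    if direction not in ("up", "down"):
        raise ValueError("direction must be 'up' or 'down'")
    if direction == "up":
        ahead = sorted(r for r in requests if r >= head)
        behind = sorted((r for r in requests if r < head), reverse=True)
    else:
        ahead = sorted((r for r in requests if r <= head), reverse=True)
        behind = sorted(r for r in requests if r > head)
    return ahead + behind


def arm_travel(order, head):
    """Total number of tracks the arm crosses when serving `order`."""
    total = 0
    for track in order:
        total += abs(track - head)
        head = track
    return total


pending = [98, 183, 37, 122, 14, 124, 65, 67]   # made-up track numbers
order = elevator_order(pending, head=53)
print(order)                        # [65, 67, 98, 122, 124, 183, 37, 14]
print(arm_travel(order, 53))        # 299
print(arm_travel(pending, 53))      # 640 when served in arrival order
```

For this request list, re-ordering cuts arm movement from 640 tracks to 299, which is the point of disk-arm scheduling.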
Flash Storage¶
- NAND flash
 - requires page-at-a-time read (page: 512 bytes to 4 KB)
  Sequential and random reads cost about the same.
 - Page can only be written once
  Like a blackboard: once data has been written, the old contents must be erased before writing again.
- SSD (Solid State Disks)
 Use standard block-oriented disk interfaces, but store data on multiple flash storage devices internally
 
 A possible problem: if the same few blocks are written and erased over and over, they wear out quickly.
- Remapping of logical page addresses to physical page addresses avoids waiting for erase
- Flash translation table tracks mapping
 - also stored in a label field of flash page
 - remapping carried out by flash translation layer
- wear leveling
 distributes erase operations evenly across physical blocks
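Remapping and wear leveling can be seen working together in a toy model. This is a heavily simplified sketch under stated assumptions: the class `TinyFTL` is hypothetical, each physical page is treated as its own erase unit (real flash erases whole blocks of many pages), and the policy of always picking the least-erased free page is just one simple way to level wear.

```python
class TinyFTL:
    """Toy flash translation layer (assumption: one page per erase unit)."""

    def __init__(self, num_physical_pages: int):
        self.data = [None] * num_physical_pages      # page contents
        self.erase_count = [0] * num_physical_pages  # wear per page
        self.free = set(range(num_physical_pages))   # erased, writable pages
        self.stale = set()                           # obsolete, awaiting erase
        self.table = {}                              # logical -> physical page

    def _erase_stale(self):
        """Erase every obsolete page so that it can be written again."""
        for page in self.stale:
            self.data[page] = None
            self.erase_count[page] += 1
        self.free |= self.stale
        self.stale = set()

    def write(self, logical: int, value) -> None:
        """Write to a fresh page and remap, instead of erasing in place."""
        if not self.free:
            self._erase_stale()
        if not self.free:
            raise RuntimeError("flash is full")
        # Wear leveling: prefer the free page with the fewest erases.
        target = min(self.free, key=lambda p: (self.erase_count[p], p))
        self.free.remove(target)
        self.data[target] = value
        old = self.table.get(logical)
        if old is not None:
            self.stale.add(old)       # old copy becomes garbage
        self.table[logical] = target  # update the translation table

    def read(self, logical: int):
        return self.data[self.table[logical]]


ftl = TinyFTL(num_physical_pages=4)
for i in range(100):
    ftl.write(0, i)                   # hammer a single logical page
print(ftl.read(0), ftl.erase_count)
```

Even though only logical page 0 is ever written, the erases end up spread across all four physical pages rather than concentrated on one.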
 
 Persistence refers to whether the original data is retained when power is lost.