Fundamentals of Computer Design¶
Introduction¶
Von Neumann Structure
Classes of Computers
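- Desktop computers
  PC: personal computers
- Server computers
  Higher processing speed and larger capacity (used for redundancy and backup)
- Embedded computers
  Devices on which third-party applications cannot be freely installed and whose software is integrated with the system are called embedded (a definition that does not fit every real-world case)
- Personal Mobile Devices
  e.g. phones, iPad
- Supercomputer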
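Classification by Flynn
Computers are classified by their instruction streams and data streams.
- SISD
  Single instruction stream, single data stream, e.g. early single-core PCs
- SIMD
  A single instruction operates on multiple data streams (e.g. vector data), which lends itself to pipelining
- MISD
  Multiple instruction streams, single data stream; does not exist in practice
- MIMD
  Multiple instruction streams, multiple data streams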
Performance
- Algorithm
- Programming language, compiler, architecture
- Processor and memory system
- I/O system (including OS)
Summary
Following the way data is used, computers are developing in three directions:
- Speed up processing (parallel)
- Speed up transmission (accuracy)
- Increase storage capacity and speed up storage (reliability)
Performance¶
Many factors affect performance: architecture, hardware implementation, compiler, OS, ...
We need to be able to define a measure of performance.
- Single users on a PC -> a minimization of response time
- Large data -> a maximization of throughput
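To measure performance, we use two metrics, response time and throughput:
- Latency (response time)
  The time from the start of an event to its completion
- Throughput (bandwidth)
  The amount of work completed in a given period of time
See the computer organization notes for this part.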
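The difference between the two metrics can be illustrated with a small sketch. The function name `latency_and_throughput` and the task timings are hypothetical, chosen only for illustration; the sketch assumes throughput is measured over the window from the first start to the last completion.

```python
def latency_and_throughput(start_times, end_times):
    """Return (mean latency per task, throughput in tasks per unit time)."""
    # Latency: time from the start of each task to its completion.
    latencies = [end - start for start, end in zip(start_times, end_times)]
    mean_latency = sum(latencies) / len(latencies)
    # Throughput: tasks completed divided by the total elapsed time.
    elapsed = max(end_times) - min(start_times)
    throughput = len(latencies) / elapsed
    return mean_latency, throughput


# Hypothetical workload: four overlapping tasks, each taking 4 s,
# with a new task starting every second.
starts = [0.0, 1.0, 2.0, 3.0]
ends = [4.0, 5.0, 6.0, 7.0]
mean_latency, throughput = latency_and_throughput(starts, ends)
print(f"mean latency = {mean_latency} s, throughput = {throughput:.3f} tasks/s")
```

Because the tasks overlap, throughput is higher than one task per 4 s even though each individual task still has a latency of 4 s.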
The main goal of architecture improvement is to improve the performance of the system.
Technology Trend¶
The improvement of computer architecture
- Improvement of input / output
- The development of memory organization structure
- Two directions of instruction set development
- CISC / RISC
- Parallel processing technology
  Parallelism at different levels and granularities
Quantitative approaches¶
CPU Performance¶
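- CPU execution time = CPU clock cycles * clock cycle time = CPU clock cycles / clock rate
- IC: Instruction Count, the number of instructions
- CPI: Cycles Per Instruction, the number of clock cycles per instruction
  - Determined by the CPU hardware
  - Different instructions can have different CPIs; the average CPI depends on the instruction mix
  - CPI = CPU clock cycles / IC
- CPU execution time = IC * CPI / clock rate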
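The formulas above can be checked with a short calculation. This is a minimal sketch: the function names, the instruction mix, the instruction count, and the clock rate are all made-up values for illustration, not measurements of any real processor.

```python
def average_cpi(mix):
    """Average CPI for an instruction mix.

    mix: list of (fraction_of_instructions, cpi) pairs whose fractions sum to 1.
    """
    return sum(fraction * cpi for fraction, cpi in mix)


def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU execution time = IC * CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz


# Hypothetical mix: 50% ALU ops (CPI 1), 30% loads/stores (CPI 2),
# 20% branches (CPI 4).
mix = [(0.5, 1), (0.3, 2), (0.2, 4)]
cpi = average_cpi(mix)                       # about 1.9
seconds = cpu_time(1_000_000_000, cpi, 2e9)  # 1e9 instructions at 2 GHz
print(f"average CPI = {cpi:.2f}, CPU time = {seconds:.3f} s")
```

Changing only the mix (for example, fewer branches) lowers the average CPI and therefore the execution time, even with the same IC and clock rate.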
Amdahl's Law¶
Amdahl's Law: the performance improvement to be gained from using some faster mode of execution is limited by the fraction of the time the faster mode can be used.
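When improving system performance, the achievable gain is limited by the fraction of the run time that the improved part accounts for.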
\(T_{improved}=\dfrac{T_{affected}}{\text{improvement factor}}+T_{unaffected}\)
Make the common case fast!
Amdahl's Law is also used to analyze feasibility.
- Speedup
  \[ \begin{aligned} \text{Speedup} & =\dfrac{\text{Performance for entire task}_\text{using Enhancement}}{\text{Performance for entire task}_\text{without Enhancement}}\\ & = \dfrac{\text{Total Execution Time}_\text{without Enhancement}}{\text{Total Execution Time}_\text{using Enhancement}} \end{aligned} \]
  Speedup \(Sp\) = performance after improvement / performance before improvement = time before improvement / time after improvement
- Execution time
  \(T_{new} = T_{old}\times \left((1-f)+\dfrac{f}{Sp}\right)\)
  \(f\) is the fraction of the run time taken by the improved part
- \(Sp_\text{overall} = \dfrac{T_{old}}{T_{new}} = \dfrac{1}{(1-f)+\dfrac{f}{Sp}}\)
  - Here \(Sp\) is the speedup of the optimized part, \(Sp_\text{overall}\) is the overall speedup, and \(f\) is the fraction of the run time taken by the optimized part
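As a quick numerical check of the overall speedup formula, here is a minimal sketch. The function name `overall_speedup` and the sample values of \(f\) and \(Sp\) are assumptions made for illustration.

```python
def overall_speedup(f, sp):
    """Amdahl's Law: overall speedup when a fraction f of the run time
    is accelerated by a factor sp."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be a fraction between 0 and 1")
    if sp <= 0:
        raise ValueError("sp must be positive")
    return 1.0 / ((1.0 - f) + f / sp)


# Hypothetical case: 40% of the run time is made 10x faster.
print(f"{overall_speedup(0.4, 10):.4f}")    # about 1.5625

# Even with an enormous speedup of that part, the overall gain
# stays below 1 / (1 - f), which is about 1.67 here.
print(f"{overall_speedup(0.4, 1e12):.4f}")
```

The second call shows why the law is useful for feasibility analysis: if a target overall speedup exceeds \(1/(1-f)\), no improvement to that part alone can reach it.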
Great Architecture Ideas¶
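- Moore's Law
  - The number of transistors on an integrated circuit doubles every 18-24 months
- Use abstraction to simplify design
- Make the common case fast
- Improve performance via parallelism
  - There are many levels of parallelism, such as instruction-level parallelism and process-level parallelism
- Improve performance via pipelining
  - Split a task into stages so that different stages of multiple tasks proceed at the same time
  - Typically used to increase instruction throughput
- Improve performance via prediction
- Use a memory hierarchy
  - Keep the most frequently accessed data at the higher levels, where access is faster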
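The pipelining idea in the list above can be made concrete with a cycle count. This is a sketch under strong assumptions: an ideal pipeline in which every stage takes exactly one cycle and there are no stalls or hazards. The function names and the 5-stage, 100-task figures are illustrative only.

```python
def unpipelined_cycles(n_tasks, n_stages):
    """Each task runs all of its stages before the next task starts."""
    return n_tasks * n_stages


def pipelined_cycles(n_tasks, n_stages):
    """Ideal pipeline: the first task takes n_stages cycles to finish,
    then one further task completes every cycle."""
    if n_tasks == 0:
        return 0
    return n_stages + (n_tasks - 1)


n_tasks, n_stages = 100, 5
before = unpipelined_cycles(n_tasks, n_stages)   # 500 cycles
after = pipelined_cycles(n_tasks, n_stages)      # 104 cycles
print(f"speedup = {before / after:.2f}")
```

Note that the latency of a single task is unchanged (still `n_stages` cycles); the gain comes from throughput, which approaches one task per cycle as the number of tasks grows.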
ISA¶
- Instruction Set Architecture
Instruction Set Design Issues
- Where are operands stored?
  registers, memory, stack, accumulator
- How many explicit operands are there? (Classification of ISAs)
  0, 1, 2, or 3
- How is the operand location specified? (Addressing Modes)
  register, immediate, indirect, ...
- What type & size of operands are supported? (Data Representation)
  byte, int, float, double, string, vector, ...
- What operations are supported? (Types of Instructions)
  add, sub, mul, move, compare, ...
Basic Principles
- Compatibility
- Versatility
- High efficiency
- Security
ISA Classification Basis¶
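The classification is mainly based on where operands are fetched from, where results are stored, and the rules of computation.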
- stack
  - Operands are implicit, on top of the stack: the first operand is removed from the stack and the second operand is replaced by the result.
- accumulator
  - One implicit operand: the accumulator; one explicit operand: mem location
  - Accumulator is both an implicit input operand and a result
- register
  - Register-memory architecture
    Any instruction can access memory
  - Load-store architecture
    Only load/store instructions access memory; all other instructions operate on registers
GPR Classification¶
A+B
More: try the same with \(D=A*B-(A+C*B)\)
GPRs are fast, but having too many GPRs also wastes resources and degrades performance (e.g. the cost of locating the right register)
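To make the ISA classes concrete, the sketch below computes `C = A + B` on three toy machines: stack, accumulator, and load-store. Everything here is hypothetical: the mnemonics (`push`, `pop`, `load`, `store`, `add`), the register names `R1` to `R3`, and the tiny interpreters are assumptions made for illustration and do not correspond to any real instruction set.

```python
def run_stack(program, mem):
    """Stack machine: operands are implicit, on top of the stack."""
    stack = []
    for op, *args in program:
        if op == "push":
            stack.append(mem[args[0]])
        elif op == "pop":
            mem[args[0]] = stack.pop()
        elif op == "add":
            first = stack.pop()     # first operand is removed
            stack[-1] += first      # second operand is replaced by the result
        else:
            raise ValueError(f"unknown op: {op}")
    return mem


def run_accumulator(program, mem):
    """Accumulator machine: one implicit operand (acc), one memory operand."""
    acc = 0
    for op, addr in program:
        if op == "load":
            acc = mem[addr]
        elif op == "add":
            acc += mem[addr]
        elif op == "store":
            mem[addr] = acc
        else:
            raise ValueError(f"unknown op: {op}")
    return mem


def run_load_store(program, mem):
    """Load-store machine: only load/store touch memory."""
    regs = {}
    for op, *args in program:
        if op == "load":            # load Rd, addr
            regs[args[0]] = mem[args[1]]
        elif op == "store":         # store Rs, addr
            mem[args[1]] = regs[args[0]]
        elif op == "add":           # add Rd, Rs1, Rs2
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        else:
            raise ValueError(f"unknown op: {op}")
    return mem


STACK_PROG = [("push", "A"), ("push", "B"), ("add",), ("pop", "C")]
ACC_PROG = [("load", "A"), ("add", "B"), ("store", "C")]
LOAD_STORE_PROG = [("load", "R1", "A"), ("load", "R2", "B"),
                   ("add", "R3", "R1", "R2"), ("store", "R3", "C")]

for run, prog in [(run_stack, STACK_PROG),
                  (run_accumulator, ACC_PROG),
                  (run_load_store, LOAD_STORE_PROG)]:
    print(run.__name__, run(prog, {"A": 2, "B": 3})["C"])
```

The programs differ in how many operands each instruction names explicitly: none for the stack `add`, one for the accumulator `add`, and three for the load-store `add`. A register-memory machine would sit in between, allowing one source of `add` to be a memory location.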