Instructions: Language of the Computer¶
Introduction¶
- Language of the machine
- Instructions (Statement)
- Instruction Set (Syntax)
- Design goal
- Maximize performance
同样资源的情况下性能大 - Minimize cost
同样性能的情况下成本低 - Reduce design time
指令集简单,易于理解
- Maximize performance
- 我们使用的是 RISC-V 架构
Instruction Characteristics
data:image/s3,"s3://crabby-images/8c663/8c66342de6a436d532481243b02fba2662652e7d" alt=""
指令集基本的结构:Operation 操作; Operand 操作数
- 不同指令集,指令的编码可以不同。如 000 表示加法,这也叫指令的 Encoding.
- 操作数位宽可以不同,可以是立即数/寄存器/内存。
冯诺依曼架构: 指令由数字方式被存储,可以进行读写。??
Operation¶
-
Every computer must be able to perform arithmetic.
- Only one operation per instruction
- Exactly 3 variables e.g.
add a, b, c
即 \(a\leftarrow b+c\)
注意结果放在第一个位置,这样易于解码
-
Design Principle 1 - Simplicity favors regularity.
Operands of the Computer Hardware¶
Register Operands¶
- Arithmetic instructions use register operands.
-
RISC-V has a \(32\times 64\)-bit register file
- Use for frequently accessed data
- 32-bit data is called a word. 64-bit is called a doubleword.
- we call them
x0
tox31
-
Design Principle 2 - Smaller is faster.
寄存器不是越多越好,多了之后访问慢。Name Register Name Usage Preserved or call? x0 0 The constant value 0 n.a. x1(ra) 1 Return address(link register) yes x2(sp) 2 Stack pointer yes x3(gp) 3 Global pointer yes x4(tp) 4 Thread pointer yes x5-x7 5-7 Temporaries no x8-x9 8-9 Saved yes x10-x17 10-17 Arguments/results no x18-x27 18-27 Saved yes x28-x31 28-31 Temporaries noW 为什么寄存器
x0
一直为 0Make the common fast. 因为经常有 0 参与计算,将其存在一个寄存器中,便于计算。
Memory Operands¶
- Data transfer instructions
- Load: Load values from memory to register
- Store: Store result from register to memory; store doubleword
- Memory is byte addressed.
-
RISC-V is Little Endian
Little vs Big Endian
小端:低位放在地址较小处;大端相反
RISC-V dose not require words to be aligned in memory
words align: 一个字是 4 字节,我们要求字的起始地址一定要是 4 的倍数。Memory Alignment
- 不对齐的好处是省空间
Memory Operand Example
(默认数组是双字的, h inx21
, base address of A in x22
)翻译为 RISC-V 代码得到 地址是以 byte 为单位,所以要偏移 \(8\times 8=64\) bytes.
load
和 store
是唯二可以访问存储器的指令。
Registers vs. Memory¶
- Registers are faster to access than memory
- Operating on memory data requires loads and stores
- Compiler must use registers for variables as much as
possible
编译器尽量使用寄存器存变量。只有在寄存器不够用时,才会把不太用的值放回内存。
Constant or Immediate Operands¶
Immediate: Other method for adding constant
- Avoids the load instruction
- Offer versions of the instruction
e.g.addi x22, x22, 4
- Design Principle 3 - Make the common case fast.
Summary
data:image/s3,"s3://crabby-images/8fad6/8fad6fb3f7ecbe6977d7bed7ddfa2ef997ff18ae" alt=""
- 为什么内存是 \(2^{61}\) 个 doublewords?
可以表示的地址有这么多,因为我们以 64 位寄存器为基址,可以表示的双字就是 \(2^{64}/2^3=2^{61}\) (这里 \(2^3\) 表示 8 个字节,即双字). 即我们的load
指令可以访问的范围有这么大。
Signed and Unsigned Number¶
Representing Instructions in the Computer¶
- All information in computer consists of binary bits.
- Instructions are encoded in binary
called machine code (机器码) - Mapping registers into numbers
0 for registerx0
, 31 for registerx31
. e.t.c. - RISC-V instructions
32 位指令编码。所有指令都是规则化的,即一部分是 opcode, 一部分是 operands 等等。
Summary of RISC-V architecture
data:image/s3,"s3://crabby-images/768c9/768c9ef3cfe63e66331e886366e81a32698ef7ab" alt=""
R-format¶
data:image/s3,"s3://crabby-images/f770b/f770bd489068ca90aaf3bddef5fcf247bffc0bbf" alt=""
- opcode: operaion code
- rd: destination register number
- funct3: 3-bit function code(additional opcode)
例如,我们加法减法可以做成一个 opcode, 然后利用 funct 进行选择。 - rs1/rs2: the first/second source register number
- funct7: 7-bit function code(additional opcode)
All instructions in RISC-V have the same length
Design Principle 4 - Good design demands good compromises
I-format¶
data:image/s3,"s3://crabby-images/20b28/20b28b025ae3aceafc038ebe9f326dff74e1a8e2" alt=""
- Immediate arithmetic and load instructions
e.g.addi
,ld
- rs1: source or base address register number
- immediate: constant operand, or offset added to base
address
将 rs2, funct7 合并了,得到 12 位立即数
S-format¶
data:image/s3,"s3://crabby-images/42f2a/42f2a9e1cdb4b4ea1d8f6dac759232af45c349d3" alt=""
- rs1: base address register number
- rs2: source opearand register number
- immediate:
Split so that rs1 and rs2 fields always in the same place.
Example
data:image/s3,"s3://crabby-images/11f42/11f42995ea8160e49fac1cb820aa0a5a3285f791" alt=""
data:image/s3,"s3://crabby-images/a7c12/a7c12028a1151a4005df2ad7febab38526f882be" alt=""
Stored Program Computer
data:image/s3,"s3://crabby-images/64746/647462d6b3eef6d2f77bacbb22b4e12b14208454" alt=""
- Instructions represented in binary, like data.
- Instructions and data stored in memory.
- Programs can operate on programs. e.g. compiplers, linkers.
- Binary compatibility allows compiled programs to work on different computers
Logical Operations¶
Operation | C | Java | RISC-V |
---|---|---|---|
Shift left | << | << | slli |
Shift right | >> | >>> | srli |
Bit-by-by AND | & | & | and, andi |
Bit-by-by OR | | | | | or, ori |
Bit-by-by XOR | ^ | ^ | xor, xori |
Bit-by-by NOT | ~ | ~ | - |
bitwise NOT 要用异或实现(与全 F 异或)
Shift¶
data:image/s3,"s3://crabby-images/ac796/ac796fd7b33f88532e5f71263f9693690e23976a" alt=""
- I 型指令
- 为什么还有
funct6
移位不需要这么多立即数,只要六位 (\(2^6=64\)) 即可。 - 左移 i 位相当于乘 \(2^i\), 右移 i 位相当于除 \(2^i\).
AND¶
data:image/s3,"s3://crabby-images/d12b7/d12b7bc23cad1d853cd90d31d8200fc0b8fe40be" alt=""
OR¶
data:image/s3,"s3://crabby-images/71a57/71a5788d56a2e2af354b793b3bec8564592305d3" alt=""
XOR¶
data:image/s3,"s3://crabby-images/3bd3a/3bd3a4cf06d136fd58403ef88d8ebd689e5833b1" alt=""
Instructions for making decisions¶
Branch instructions¶
beq reg1, reg2, Label
相等则跳转bne reg1, reg2, Label
不相等则跳转
Example
data:image/s3,"s3://crabby-images/bfb7d/bfb7d637a45de58d9f5aae37b7ae6dcca5c28bdf" alt=""
store 的立即数是作为数据的地址, beq 的立即数是作为运算的地址(加到 PC 上)因此二者的指令类型不同。
跳转的范围有限制,因为立即数只有 12 位。(PC 相对寻址,以当前程序位置为基准前后跳)
Loop
data:image/s3,"s3://crabby-images/be9ad/be9ad2fc296eea2c89c687092bb4e125aff0cc4f" alt=""
slt instruction¶
set on if less than.
slt x2, x3, x4 # x2=1 if x3 < x4
R 型指令
Example
data:image/s3,"s3://crabby-images/17066/17066d7a5fb7e781995d8f9aac3c72f5b931a782" alt=""
More Conditional Operations¶
blt rs1, rs2, L1
若rs1<rs2
则跳到 L1bge rs1, rs2, L1
若rs1>=rs2
则跳到 L1
Signed vs. Unsigned¶
默认是有符号数进行比较
- Signed comparison:
blt
,bge
- Unsigned comparison:
bltu
,bgeu
Case/Switch¶
Compiling a switch using jump address table
data:image/s3,"s3://crabby-images/6b547/6b547f19a1730a95f25882b7bab8ab5e5863aa1b" alt=""
data:image/s3,"s3://crabby-images/1c26a/1c26ae57c6ef396e28ae6936c868b36dfebaaa0c" alt=""
data:image/s3,"s3://crabby-images/11655/11655c283187ee4fb82d7fba15a59dc008fa703b" alt=""
\(x_6\) 是跳转表的基地址,\(x_7\leftarrow x_6+8*k\)
jalr x1, 0(x7)
把下一条指令的地址 PC+4放入
x1寄存器,随后跳向
[x7] + 0的地方。
这里我们
jalr x0, ...因为我们不能改变
x0` 寄存器,所以这里仅用作占位,并没有实际存储返回地址。
I 型指令
A basic block is a sequence of instructions with
- No embedded branches (except at end)
- No branch targets (except at beginning)
Supporting Procedures in Computer Hardware¶
Procedure/function --- be used to structure programs
为了完成特定任务。易于理解,可以复用。
调用函数的步骤
- Place Parameters in a place where the procedure can access them (in registers
x10~x17
)
传参 - Transfer control to the procedure
控制权给子程序 - Acquire the storage resources needed for the procedure
- Perform the desired task
- Place the result value in a place where the calling program can access it
- Return control to the point of origin (address in
x1
)
Procedure Call Instructions¶
- Procedure call: jump and link
jal x1, ProcedureLabel
- Address of following instruction put in
x1
- Jumps to target address
- Address of following instruction put in
- Procedure return: jump and link register
jalr x0, 0(x1)
- Like jal, but jumps to
0 + address in x1
- Use
x0
as rd (x0
cannot be changed) - Can also be used for computed jump
- Like jal, but jumps to
不能用 jal
跳回来,跳进函数的地址的是固定的, Label 一定。但是跳回来的地址不一定,要用 x1
存储才能跳回。
Using More Registers¶
- Registers for procedure calling
x10~x17
(a0~a7
): eight argument registers to pass parameters or return values
用来传参的x1
: one return address register to return to origin point
- Stack:Ideal data structure for spilling registers
- Push, pop
- Stack pointer (
sp
):x2
指向最栈顶,即最后一个有效数据所在的位置
- Stack grow from higher address to lower address
- Push:
sp = sp - 8
- Pop:
sp = sp + 8
- Push:
Compiling a leaf procedure
data:image/s3,"s3://crabby-images/56fa6/56fa6086f679c5503746a16298cb48bad5676f31" alt=""
data:image/s3,"s3://crabby-images/086a7/086a76093abe34b425c7eb9c4af9c4afb27337a8" alt=""
Name | Register Name | Usage | Preserved or call? |
---|---|---|---|
x0(zero) | 0 | The constant value 0 | n.a. |
x1(ra) | 1 | Return address(link register) | yes |
x2(sp) | 2 | Stack pointer | yes |
x3(gp) | 3 | Global pointer | yes |
x4(tp) | 4 | Thread pointer | yes |
x5-x7(t0-t2) | 5-7 | Temporaries | no |
x8(s0/fp) | 8 | Saved/frame pointer | yes |
x9(s1) | 9 | Saved | yes |
x10-x17(a0-a7) | 10-17 | Arguments/results | no |
x18-x27(s2-s11) | 18-27 | Saved | yes |
x28-x31(t3-t6) | 28-31 | Temporaries | no |
PC | - | Auipc(Add Upper Immediate to PC) | yes |
t0~t6
临时寄存器,不需要在函数中保存s0~s11
saved registers
标有 Preserved 表明我们需要在函数开始时保存该寄存器的值,并在离开函数前恢复寄存器的值。
data:image/s3,"s3://crabby-images/b1bcc/b1bcc5f2719fcbabbd75b0ea001dd3995b207d5b" alt=""
Nested Procedure¶
data:image/s3,"s3://crabby-images/addcb/addcb14772590a4dc210a05115b87660ffacb033" alt=""
Example
data:image/s3,"s3://crabby-images/225d9/225d9dd3ce257020d8a86a8c806665cbdc655a89" alt=""
data:image/s3,"s3://crabby-images/a3afb/a3afbca4e38cc96f3dade2e065cbb0b2fbd4a536" alt=""
Preserved or not
data:image/s3,"s3://crabby-images/588a5/588a5871153f0d8862357d70da865d54c858d7db" alt=""
寄存器一般靠堆栈保存, sp 靠加减保存。
Local Data on the Stack
data:image/s3,"s3://crabby-images/41ddf/41ddf35bb3d424ba14629a03bfe8822f63da183d" alt=""
Communicating with People¶
- Byte-encoded character sets
e.g. ASCII, Latin-1 - Unicode: 32-bit character set
e.g. UTF-8, UTF-16
编码中有不同长度的数据,因此也需要有不同长度的 load 和 store.
-
Load byte/halfword/word: Sign extend to 64 bits in rd
我们的寄存器是 64 位的,因此需要扩充。lb rd, offset(rs1)
lh rd, offset(rs1)
lw rd, offset(rs1)
ld rd, offset(rs1)
Example
同样是取 A[4] 的值,不同的数据类型 offset 不同。
char
为 4,short int
为 8,int
为 16. -
Load byte/halfword/word unsigned: 0 extend to 64 bits in rd
lbu rd, offset(rs1)
lhu rd, offset(rs1)
lwu rd, offset(rs1)
- Store byte/halfword/word: Store rightmost 8/16/32 bits
sb rs2, offset(rs1)
sh rs2, offset(rs1)
sw rs2, offset(rs1)
存储就不需要考虑扩充问题,我们不做运算,只是把对应部分放到对应位置。
offset 可以是 3. 因为 RISC-V 是可以不对齐的。(实际上 sh offset 一般是 2 的倍数, sw 是 4 的倍数)
Compiling a string copy procedure
data:image/s3,"s3://crabby-images/69d38/69d38ffd4a65cfdd89b568a9e9b047809ca5f7d4" alt=""
data:image/s3,"s3://crabby-images/6fe7e/6fe7efa711bb90422fd1a25f4406499230b107d7" alt=""
i 不应该分配给 s3, 分配给一个临时寄存器,就可以不用堆栈保存 s3 了。
对于一个 leaf procedure(不再调用其他 procedure) 编译器要尽可能用完所有的临时寄存器,再去用其他的寄存器。
为什么强调 leaf procedure? - 因为对于非 leaf 的函数,可能临时变量会被调用后的函数改变,这样就不能继续用了。
RISC-V Addressing for 32-Bit Immediate and Addresses¶
Wide Bit Immediate addressing¶
如何将一个寄存器初始化为一个任意的立即数。
lui reg, imm
可以把 20 位的常数放到寄存器中。(U-type)
data:image/s3,"s3://crabby-images/59879/5987926a813e71de2b625c671e8d35214b621d3f" alt=""
data:image/s3,"s3://crabby-images/27db9/27db91b41f69f2208feb7862db15037ea4d44a66" alt=""
注意这里,我们会把立即数放入寄存器的 [31:12] 位,低位会填充为 0.
Loading a 32-bit constant
data:image/s3,"s3://crabby-images/37aea/37aeabe9e1b688b1936f4438c3dc6311841aa7c3" alt=""
我们最终想放入寄存器的值是 32 位常数 0x003D0
. 先利用 lui
将高 20 位 976 放入寄存器中,随后利用加法指令加上 低 12 位,即 2304.
Branch Addressing¶
SB-type
data:image/s3,"s3://crabby-images/59b26/59b269062f03cfe2a01196d578ed36cc86630eee" alt=""
- PC-relative addressing
\(Target\ address = PC + Branch\ offset = PC + immediate \times 2\)
这里低位会默认补 0. 这样可以把地址范围扩大一倍。
Jump Addressing¶
UJ-type(只有 jal
指令)
20-bit immediate for larger range, 低位默认补 0, 故实际表示立即数为 [20:0] 共 21 位。
data:image/s3,"s3://crabby-images/99bb8/99bb8d350c0605c7d88fcb4a3ce44d29983c9102" alt=""
Example
data:image/s3,"s3://crabby-images/a349e/a349e729aae617c109f2127931c6df498f740935" alt=""
data:image/s3,"s3://crabby-images/e762d/e762d2ca75e841b27cee050e6e525b64fe6edf41" alt=""
RISC-V 直接用 PC 算 offset, 而非 PC+4.
- All RISC-V instructions are 4 bytes long
- PC-relative addressing refers to the number of halfwords
While branch target is far away
* Inserts an unconditional jump to target
Invert the condition so that the branch decides whether to skip the jump.
Branching far away
data:image/s3,"s3://crabby-images/d4963/d4963860149f026dbf93ef9851d186b57765bcf9" alt=""
RISC-V Addressing Summary¶
寻址方式是指令集的核心区别。
- 立即数寻址
addi x5, x6, 4
- 寄存器寻址
add x5, x6, x7
- 基址寻址
ld x5,100(x6)
- PC 相对寻址
beq x5,x6,L1
Example
data:image/s3,"s3://crabby-images/dc1fa/dc1fac7faa9a0a9cac0d92b43d1b4dfbd261e3ed" alt=""
RISC-V Disassembly¶
把机器码翻译为汇编指令。
- opcode
先看 opcode, 确定是哪类指令,随后就可以进行具体划分了。
Example
data:image/s3,"s3://crabby-images/75d0d/75d0d992dfef0b06bae5061cf91805a6b2882a89" alt=""
Synchronization in RISC-V¶
- Two processors sharing an area of memory
- P1 writes, then P2 reads
- Data race if P1 and P2 don’t synchronize
- Result depends of order of accesses
- Hardware support required
- Atomic read/write memory operation
- No other access to the location allowed between the read and write
- Could be a single instruction
- e.g. atomic swap* of register ↔ memory
- Or an atomic pair of instructions
Load reserved: lr.d rd,(rs1)
把地址 rs1 的值放到寄存器 rd 中,同时
Store conditional: sc.d rd,(rs1),rs2
把寄存器 rs2 的值放入地址 rs1.
如果成功那么 rd 里面是 0. 如果上条指令 load 后,这个地方的值被改变了,那么就失败了,返回 0.
atomic swap
data:image/s3,"s3://crabby-images/a20e3/a20e30d6f031ce487d41ec5d8c996b0ad82bbca0" alt=""
lock
data:image/s3,"s3://crabby-images/4da30/4da30f7e641c75df6af29b2bee948001e8fc9b2f" alt=""
地址 x20 放的是锁,如果锁为 0, 说明我们现在可以存入数据,则我们获得锁随后存入,并释放锁。否则需要等锁释放了才能存。
Translating and starting a program¶
data:image/s3,"s3://crabby-images/1b260/1b260e579cf0bce11a62a19413c592e4f0b29f99" alt=""
Producing an Object Module¶
Provides information for building a complete program from the pieces(Header).
data:image/s3,"s3://crabby-images/0409c/0409cec0e9245a9a9a71380b8c2b8e3846d52222" alt=""
- Text segment: translated instructions
- Static data segment: data allocated for the life of the program
- Relocation info: for contents that depend on absolute location of loaded program
- Symbol table: global definitions and external refs
- Debug info: for associating with source cod
Link¶
Object modules(including library routine) \(\rightarrow\) executable program
- Place code and data modules symbolically in memory
- Determine the addresses of data and instruction labels
- Patch both the internal and external references (Address of invoke)
Loading a Program¶
Load from image file on disk into memory
- Read header to determine segment sizes
- Create virtual address space
- Copy text and initialized data into memory
Or set page table entries so they can be faulted in - Set up arguments on stack
- Initialize registers (including sp, fp, gp)
- Jump to startup routine
Copies arguments to x10, … and calls main pWhen main returns, do exit syscall
Dynamic Linking¶
Only link/load library procedure when it is called.
静态链接已经编入文件了,动态链接是在运行时链接,可以用到最新的代码
- Requires procedure code to be relocatable
- Avoids image bloat caused by static linking of all (transitively) referenced libraries
- Automatically picks up new library versions
Arrays versus Pointers¶
指针是可以改变的,但是数组首地址不能改变,因此翻译成汇编的结果也有所不同。
Clearing an Array
data:image/s3,"s3://crabby-images/733d8/733d86b1eb73434ce3fdbaf8d374db02eb1c886c" alt=""
Real Stuff: MIPS Instructions¶
MIPS: commercial predecessor to RISC-V
- Similar basic set of instructions
- 32-bit instructions
- 32 general purpose registers, register 0 is always 0
- 32 floating-point registers
- Memory accessed only by load/store instructions
- Consistent use of addressing modes for all data sizes
Different conditional branches
- For <, <=, >, >=
- RISC-V:
blt, bge, bltu, bgeu
- MIPS:
slt, sltu
(set less than, result is 0 or 1) - Then use
beq
, bne to complete the branch
data:image/s3,"s3://crabby-images/99025/990253fa4add7fdf8d93474a6a4fc74a635cd570" alt=""
Real Stuff: The Intel x86 ISA¶
Other RISC-V Instructions¶
- Base integer instructions (RV64I)
- Those previously described, plus
auipc rd, immed
// rd = (imm<<12) + pc- follow by
jalr
(adds 12-bit immed) for long jump slt, sltu, slti, sltui
: set less than (like MIPS)addw, subw, addiw
: 32-bit add/subsllw, srlw, srlw, slliw, srliw, sraiw
: 32-bit shift
- 32-bit variant: RV32I
registers are 32-bits wide, 32-bit operations
RV64I 是最基本的。编译器需要知道当前执行是基于多少位的处理器。
Instruction Set Extensions
- M: integer multiply, divide, remainder
- A: atomic memory operations
- F: single-precision floating point
- D: double-precision floating point
- C: compressed instructions
16 位的指令,用于低成本产品(嵌入式)
Fallacies (谬误)
- Powerful instruction \(\Rightarrow\) higher performance
- Use assembly code for high performance
- Backward compatibility \(\Rightarrow\) instruction set doesn’t change
Pifalls (陷阱)
- Sequential words are not at sequential addresses (应该 +4)
- Keeping a pointer to an automatic variable after procedure returns