x86-64 ISA
x86-64 Instruction Set Architecture
gcc -Og -S code.c
gcc -Og -c code.c
objdump -d code.o
Registers
| 8 bytes | 4 bytes | 2 bytes | 1 byte | Purpose |
|---|---|---|---|---|
| %rax | %eax | %ax | %al | Return value |
| %rbx | %ebx | %bx | %bl | Callee-saved |
| %rcx | %ecx | %cx | %cl | 4th argument |
| %rdx | %edx | %dx | %dl | 3rd argument |
| %rsi | %esi | %si | %sil | 2nd argument |
| %rdi | %edi | %di | %dil | 1st argument |
| %rbp | %ebp | %bp | %bpl | Callee-saved |
| %rsp | %esp | %sp | %spl | Stack pointer |
| %r8 | %r8d | %r8w | %r8b | 5th argument |
| %r9 | %r9d | %r9w | %r9b | 6th argument |
| %r10 | %r10d | %r10w | %r10b | Caller-saved |
| %r11 | %r11d | %r11w | %r11b | Caller-saved |
| %r12 | %r12d | %r12w | %r12b | Callee-saved |
| %r13 | %r13d | %r13w | %r13b | Callee-saved |
| %r14 | %r14d | %r14w | %r14b | Callee-saved |
| %r15 | %r15d | %r15w | %r15b | Callee-saved |
The register names follow several historical conventions:
- The original 8086 had eight 16-bit registers, %ax through %sp.
- IA32 expanded these registers to 32 bits, %eax through %esp.
- x86-64 further expanded them to 64 bits, %rax through %rsp, and added eight more registers, %r8 through %r15.
Operand Specifiers
| Type | Form | Operand value | Name |
|---|---|---|---|
| Immediate | $Imm | Imm | Immediate |
| Register | r_a | R[r_a] | Register |
| Memory | Imm | M[Imm] | Absolute |
| Memory | (r_a) | M[R[r_a]] | Indirect |
| Memory | Imm(r_b) | M[Imm + R[r_b]] | Base + displacement |
| Memory | (r_b, r_i) | M[R[r_b] + R[r_i]] | Indexed |
| Memory | Imm(r_b, r_i) | M[Imm + R[r_b] + R[r_i]] | Indexed |
| Memory | (, r_i, s) | M[R[r_i] · s] | Scaled indexed |
| Memory | Imm(, r_i, s) | M[Imm + R[r_i] · s] | Scaled indexed |
| Memory | (r_b, r_i, s) | M[R[r_b] + R[r_i] · s] | Scaled indexed |
| Memory | Imm(r_b, r_i, s) | M[Imm + R[r_b] + R[r_i] · s] | Scaled indexed |
The scaling factor s can only be 1, 2, 4, or 8.
In memory operands, the base register r_b and the index register r_i must be 8-byte registers. %rsp cannot be used as an index register.
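As a rough illustration of the table above, the most general form Imm(r_b, r_i, s) can be modelled in C. The function name effective_address and its parameters are hypothetical, chosen only to mirror the table's symbols; the sketch assumes the caller passes a legal scale factor.

```c
#include <stdint.h>

/* Hypothetical helper (not part of any real API): models the address
 * named by the operand Imm(r_b, r_i, s), i.e. Imm + R[r_b] + R[r_i] * s.
 * Assumption: s is one of 1, 2, 4, 8; this sketch does not check it. */
uint64_t effective_address(int64_t imm, uint64_t rb, uint64_t ri, uint64_t s)
{
    /* Unsigned arithmetic wraps modulo 2^64, like 64-bit address arithmetic. */
    return (uint64_t)imm + rb + ri * s;
}
```

With R[%rax] = 0x100 and R[%rdx] = 0x3, the operand (%rax, %rdx, 8) corresponds to effective_address(0, 0x100, 0x3, 8), which is 0x118. The simpler forms in the table fall out by passing 0 for the missing parts.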
Data Movement Instructions (mov)
Simple data movement instructions
- MOV S, D $\rightarrow$ D ← S - Move
  - movb - Move byte
  - movw - Move word
  - movl - Move double word
  - movq - Move quad word
Source operand can be immediate, register, or memory. Destination operand can be register or memory.
mov does not support memory-to-memory moves. Copying a value from one memory location to another takes two instructions: one to load the value into a register, and one to store it to the destination.
In most cases, mov updates only the destination bytes named by the instruction and leaves the remaining bytes of the register or memory unchanged. The one exception is movl with a register destination, which also sets the upper 4 bytes of the register to 0. This follows the x86-64 convention that any instruction generating a 32-bit value for a register also zeroes the high-order 32 bits of that register.
- movabsq I, R $\rightarrow$ R ← I - Move absolute quad word

The regular movq accepts only an immediate that can be represented as a 32-bit two's-complement number; the value is sign-extended to 64 bits. To load an arbitrary 64-bit immediate, use movabsq, whose destination must be a register.
movabsq $0x0011223344556677, %rax # %rax = 0011223344556677
movb $-1, %al # %rax = 00112233445566FF
movw $-1, %ax # %rax = 001122334455FFFF
movl $-1, %eax # %rax = 00000000FFFFFFFF
movq $-1, %rax # %rax = FFFFFFFFFFFFFFFF
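The register values in the comments above can be reproduced with a small C model of partial register writes. This is only a sketch: the helper names write_byte, write_word, write_long, and write_quad are made up for illustration, and a uint64_t stands in for %rax.

```c
#include <stdint.h>

/* Hypothetical helpers modelling how each mov variant updates a
 * 64-bit register when the destination is a register. */

/* movb: replace the low byte, keep the upper 7 bytes. */
uint64_t write_byte(uint64_t reg, uint8_t v)
{
    return (reg & ~UINT64_C(0xFF)) | v;
}

/* movw: replace the low 2 bytes, keep the upper 6 bytes. */
uint64_t write_word(uint64_t reg, uint16_t v)
{
    return (reg & ~UINT64_C(0xFFFF)) | v;
}

/* movl: replace the low 4 bytes AND zero the upper 4 bytes. */
uint64_t write_long(uint64_t reg, uint32_t v)
{
    (void)reg;              /* the old contents do not survive */
    return (uint64_t)v;
}

/* movq: replace all 8 bytes. */
uint64_t write_quad(uint64_t reg, uint64_t v)
{
    (void)reg;
    return v;
}
```

Starting from 0x0011223344556677 and applying write_byte, write_word, write_long, and write_quad in turn with an all-ones value follows the same sequence as the listing.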
Zero-extending data movement instructions
- MOVZ S, R $\rightarrow$ R ← ZeroExtend(S) - Move with zero extension
  - movzbw - Move zero-extended byte to word
  - movzbl - Move zero-extended byte to double word
  - movzwl - Move zero-extended word to double word
  - movzbq - Move zero-extended byte to quad word
  - movzwq - Move zero-extended word to quad word

Source operand can be register or memory, but not immediate. Destination operand must be a register.
movzlq does not exist, because movl already zero-extends the source to 64 bits when the destination is a register.
Sign-extending data movement instructions
- MOVS S, R $\rightarrow$ R ← SignExtend(S) - Move with sign extension
  - movsbw - Move sign-extended byte to word
  - movsbl - Move sign-extended byte to double word
  - movswl - Move sign-extended word to double word
  - movsbq - Move sign-extended byte to quad word
  - movswq - Move sign-extended word to quad word
  - movslq - Move sign-extended double word to quad word
- cltq (no operand) $\rightarrow$ %rax ← SignExtend(%eax) - Sign-extend %eax to %rax (equivalent to movslq %eax, %rax)
movabsq $0x0011223344556677, %rax # %rax = 0011223344556677
movb $0xAA, %dl # %dl = AA
movb %dl, %al # %rax = 00112233445566AA
movsbq %dl, %rax # %rax = FFFFFFFFFFFFFFAA
movzbq %dl, %rax # %rax = 00000000000000AA
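In C, the same two kinds of extension appear as conversions from a narrower type to a wider one: unsigned source types zero-extend and signed source types sign-extend. The sketch below uses hypothetical helper names and assumes the usual two's-complement behaviour of GCC and Clang when converting an out-of-range value to a signed type.

```c
#include <stdint.h>

/* Hypothetical helpers; each comment names the instruction the
 * conversion corresponds to. */

/* Like movzbq: the upper 56 bits become 0. */
uint64_t zero_extend_byte(uint8_t b)
{
    return (uint64_t)b;
}

/* Like movsbq: the upper 56 bits become copies of bit 7.
 * Assumption: converting to int8_t wraps (true for GCC and Clang). */
uint64_t sign_extend_byte(uint8_t b)
{
    return (uint64_t)(int64_t)(int8_t)b;
}

/* Like movslq (or cltq when the value is in %eax). */
uint64_t sign_extend_long(uint32_t l)
{
    return (uint64_t)(int64_t)(int32_t)l;
}
```

For the byte 0xAA used in the listing, zero_extend_byte gives 0x00000000000000AA and sign_extend_byte gives 0xFFFFFFFFFFFFFFAA.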
Pushing and Popping Stack Data
%rsp is the stack pointer, which holds the address of the top of the stack. The stack grows downward, so pushing data onto the stack decreases %rsp, and popping data from the stack increases %rsp.
- pushq S $\rightarrow$ R[%rsp] ← R[%rsp] - 8; M[R[%rsp]] ← S - Push quad word
- popq D $\rightarrow$ D ← M[R[%rsp]]; R[%rsp] ← R[%rsp] + 8 - Pop quad word
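The two-step effects of pushq and popq can be traced with a toy model. Everything here is an assumption made for illustration: the struct machine type, the 64-byte memory, and the function names are hypothetical, and the sketch does no bounds checking.

```c
#include <stdint.h>
#include <string.h>

enum { STACK_BYTES = 64 };

/* Toy machine: a tiny byte-addressed memory and a stack pointer. */
struct machine {
    uint8_t  mem[STACK_BYTES]; /* valid addresses are 0..STACK_BYTES-1 */
    uint64_t rsp;              /* address of the current top of stack */
};

/* Start with an empty stack: rsp points just past the end of memory. */
void machine_init(struct machine *m)
{
    memset(m->mem, 0, sizeof m->mem);
    m->rsp = STACK_BYTES;
}

/* pushq S: first decrement rsp by 8, then store S at the new top. */
void pushq(struct machine *m, uint64_t s)
{
    m->rsp -= 8;
    memcpy(&m->mem[m->rsp], &s, 8);
}

/* popq D: first read the top of the stack, then increment rsp by 8. */
uint64_t popq(struct machine *m)
{
    uint64_t d;
    memcpy(&d, &m->mem[m->rsp], 8);
    m->rsp += 8;
    return d;
}
```

Pushing two values and popping twice returns them in reverse order and leaves rsp where it started, which is the last-in, first-out behaviour the stack relies on.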
Arithmetic and Logical Operations
Load Effective Address
- leaq S, R $\rightarrow$ R ← &S - Load effective address
It has the form of an instruction that reads from memory to a register, but it does not read from memory. Instead, it computes the effective address of the source operand and stores it in the destination register.
Example: if R[%rdx] = x, then leaq 7(%rdx, %rdx, 4), %rax will set %rax to 7 + x + 4x = 7 + 5x.
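The destination operand must be a register, and the source operand must have the form of a memory reference; an immediate or a plain register is not allowed as the source.
leaq always produces a 64-bit result. A 32-bit form, leal, also exists: it performs the same computation but keeps only the low 32 bits of the result, and compilers commonly use it for int arithmetic.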
Compilers often find clever uses of leaq that have nothing to do with effective address computations.
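As an illustration of such uses, here are two small C functions whose arithmetic fits the Imm(r_b, r_i, s) pattern. The function names are hypothetical, and the assembly in the comments is only one plausible compiler output under -Og or -O1; actual output depends on the compiler and its flags.

```c
/* 7 + 5x has the shape Imm + R + R*4, so a compiler may emit a single
 *     leaq 7(%rdi,%rdi,4), %rax
 * instead of a multiply followed by an add. */
long seven_plus_5x(long x)
{
    return 7 + 5 * x;
}

/* x + 4y + 12z can be built from three leaq instructions, e.g.
 *     leaq (%rdi,%rsi,4), %rax    # x + 4*y
 *     leaq (%rdx,%rdx,2), %rdx    # z + 2*z = 3*z
 *     leaq (%rax,%rdx,4), %rax    # (x + 4*y) + 4*(3*z)
 * (a possible sequence, not a guaranteed one). */
long scale(long x, long y, long z)
{
    return x + 4 * y + 12 * z;
}
```

Compiling these with gcc -Og -S and reading the output is a quick way to see which form a given compiler version actually chooses.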
Unary and Binary Operations
| Instruction | Effect | Description |
|---|---|---|
| INC D | $D \leftarrow D + 1$ | Increment |
| DEC D | $D \leftarrow D - 1$ | Decrement |
| NEG D | $D \leftarrow -D$ | Negate |
| NOT D | $D \leftarrow \sim D$ | Bitwise NOT |
| ADD S, D | $D \leftarrow D + S$ | Add |
| SUB S, D | $D \leftarrow D - S$ | Subtract |
| IMUL S, D | $D \leftarrow D \times S$ | Integer multiply |
| XOR S, D | $D \leftarrow D \oplus S$ | Bitwise XOR |
| OR S, D | $D \leftarrow D \mid S$ | Bitwise OR |
| AND S, D | $D \leftarrow D \& S$ | Bitwise AND |
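The operand of a unary instruction can be a register or a memory location, but not an immediate.
For binary instructions, the source operand can be an immediate, a register, or a memory location, and the destination operand can be a register or a memory location, but the two operands cannot both be memory locations. The exception is imul, whose two-operand form requires a register destination.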
When the destination operand is a memory location, the processor will first read the value from memory, perform the operation, and then write the result back to memory.
Shift Operations
| Instruction | Effect | Description |
|---|---|---|
| SAL S, D | $D \leftarrow D << S$ | Arithmetic left shift |
| SHL S, D | $D \leftarrow D << S$ | Logical left shift (same as SAL) |
| SAR S, D | $D \leftarrow D >>_A S$ | Arithmetic right shift |
| SHR S, D | $D \leftarrow D >>_L S$ | Logical right shift |
The shift amount S can be an immediate value or the value in the single-byte %cl register. The destination operand D can be a register or a memory location, but not an immediate value.
A 1-byte shift amount could encode shifts of up to 255, but the processor masks the count before shifting: only the low-order 5 bits are used for 8-, 16-, and 32-bit operands, and only the low-order 6 bits for 64-bit operands.
- For example, when %cl = 0xFF, sall shifts by 31 and salq shifts by 63. salb and salw also use a count of 31, which shifts every original bit out of an 8- or 16-bit operand.
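The masking of the shift count can be sketched in C for the 32-bit and 64-bit cases. In C itself, shifting by an amount greater than or equal to the operand width is undefined, so the helpers below (hypothetical names, modelled on the instruction names) apply the mask explicitly. The arithmetic right shift assumes that >> on a negative signed value copies the sign bit, which holds for GCC and Clang but is implementation-defined in the C standard.

```c
#include <stdint.h>

/* sall: 32-bit left shift, count taken from the low 5 bits of cl. */
uint32_t sall(uint32_t d, uint8_t cl)
{
    return d << (cl & 31);
}

/* salq: 64-bit left shift, count taken from the low 6 bits of cl. */
uint64_t salq(uint64_t d, uint8_t cl)
{
    return d << (cl & 63);
}

/* shrq: 64-bit logical right shift (fills with zeros). */
uint64_t shrq(uint64_t d, uint8_t cl)
{
    return d >> (cl & 63);
}

/* sarq: 64-bit arithmetic right shift (fills with the sign bit).
 * Assumption: the compiler implements signed >> arithmetically. */
int64_t sarq(int64_t d, uint8_t cl)
{
    return d >> (cl & 63);
}
```

With cl = 0xFF, sall(1, cl) moves the single set bit to position 31 and salq(1, cl) moves it to position 63, while a count of 32 passed to sall behaves like a count of 0.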
The table above has no two-operand instruction for division or modulus. Division is instead handled by the one-operand idiv and div instructions, for signed and unsigned division respectively, which are covered in the next section.
Oct word (16 bytes)
The x86-64 instruction set provides limited support for operations involving 128-bit (16-byte) numbers.
| Instruction | Effect | Description |
|---|---|---|
| imulq S | R[%rdx]:R[%rax] ← S * R[%rax] | signed full multiply |
| mulq S | R[%rdx]:R[%rax] ← S * R[%rax] | unsigned full multiply |
| cqto | R[%rdx]:R[%rax] ← SignExtend(R[%rax]) | convert to oct-word |
| idivq S | R[%rax] ← R[%rdx]:R[%rax] / S; R[%rdx] ← R[%rdx]:R[%rax] % S | signed divide |
| divq S | R[%rax] ← R[%rdx]:R[%rax] / S; R[%rdx] ← R[%rdx]:R[%rax] % S | unsigned divide |
imulq has two different forms. The one-operand form shown here is the only imulq form that multiplies two 64-bit numbers into a full 128-bit result; the two- and three-operand forms produce only a 64-bit result.
idiv has only the one-operand form; unlike imul, it has no two-operand variant.
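From C, these instructions are usually reached through ordinary arithmetic on wide types. The sketch below relies on unsigned __int128, a GCC and Clang extension on 64-bit targets rather than standard C, and uses hypothetical function names. Compilers typically implement the first function with mulq and the second with cqto followed by idivq, though that is not guaranteed.

```c
#include <stdint.h>

/* Assumption: the compiler provides the __int128 extension. */
typedef unsigned __int128 uint128_t;

/* Full 64x64 -> 128-bit unsigned multiply, split into the two halves
 * that mulq would leave in %rdx (high) and %rax (low). */
void umul_full(uint64_t x, uint64_t y, uint64_t *hi, uint64_t *lo)
{
    uint128_t p = (uint128_t)x * y;
    *hi = (uint64_t)(p >> 64);
    *lo = (uint64_t)p;
}

/* Signed quotient and remainder in one go, the pair that idivq leaves
 * in %rax (quotient) and %rdx (remainder).
 * Precondition for this sketch: y != 0 and not (x == INT64_MIN && y == -1). */
void sdiv_rem(int64_t x, int64_t y, int64_t *q, int64_t *r)
{
    *q = x / y;   /* C division truncates toward zero, as idivq does */
    *r = x % y;   /* the remainder takes the sign of the dividend */
}
```

For example, multiplying 2^64 - 1 by 2 gives a high half of 1 and a low half of 2^64 - 2, and dividing -7 by 2 gives quotient -3 and remainder -1.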
Control Flow Instructions
Condition Codes
In addition to the integer registers, the CPU maintains a set of single-bit condition code registers describing attributes of the most recent arithmetic or logical operation.
- CF: Carry flag. The most recent operation generated a carry out of the most significant bit. Used to detect overflow for unsigned operations.
- ZF: Zero flag. The most recent operation yielded zero.
- SF: Sign flag. The most recent operation yielded a negative value.
- OF: Overflow flag. The most recent operation caused a two’s-complement overflow, either negative or positive.
For example, suppose we used one of the add instructions to perform t = a+b, where a, b, and t are integers. Then the condition codes would be set according to the following C expressions:
| Code | C expression | Description |
|---|---|---|
| CF | (unsigned) t < (unsigned) a | Unsigned overflow |
| ZF | (t == 0) | Zero |
| SF | (t < 0) | Negative |
| OF | (a<0 == b<0) && (t<0 != a<0) | Signed overflow |
The leaq instruction does not alter any condition codes, since it is intended to be used in address computations.
Otherwise, all of the listed Arithmetic and Logical instructions cause the condition codes to be set.
For the logical operations, such as xor, CF = 0 and OF = 0. For the shift operations, CF is set to the last bit shifted out, while OF = 0.
The inc and dec instructions set OF and ZF, but they leave CF unchanged.
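The C expressions in the table for t = a + b can be collected into one function. This is a sketch with hypothetical names (struct flags, add_flags). It performs the addition in unsigned arithmetic because signed overflow is undefined in C, and it assumes the conversion back to int64_t wraps, as it does with GCC and Clang.

```c
#include <stdint.h>

/* One int per condition code; each is 0 or 1. */
struct flags {
    int cf, zf, sf, of;
};

/* Computes the condition codes an add instruction would set for a + b. */
struct flags add_flags(int64_t a, int64_t b)
{
    uint64_t ut = (uint64_t)a + (uint64_t)b; /* wraps modulo 2^64 */
    int64_t  t  = (int64_t)ut;               /* same bits, viewed as signed */
    struct flags f;

    f.cf = ut < (uint64_t)a;                 /* unsigned overflow */
    f.zf = (t == 0);                         /* zero */
    f.sf = (t < 0);                          /* negative */
    f.of = ((a < 0) == (b < 0)) && ((t < 0) != (a < 0)); /* signed overflow */
    return f;
}
```

Adding 1 to the largest positive value sets SF and OF but not CF, while adding 1 to -1 sets CF and ZF but not OF, which shows that the carry and overflow flags answer different questions.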
CMP and TEST instructions
- CMP S, D sets flags based on D - S - Compare
  - cmpb, cmpw, cmpl, cmpq
- TEST S, D sets flags based on S & D - Test
  - testb, testw, testl, testq

Neither cmp nor test stores the result of its operation; each only sets the condition codes based on that result.
The cmp operands are listed in reverse order: to compare a with b, write cmp b, a, which sets the condition codes based on a - b.
For TEST, typically the same operand is repeated (e.g., testq %rax,%rax to see whether %rax is negative, zero, or positive). Or one of the operands is a mask indicating which bits should be tested.
Accessing the Condition Codes
There are three common ways of using the condition codes:
- Set byte on condition instructions
- Conditional jump instructions
- Conditional move instructions
Set byte on condition instructions (SET instructions)
A set instruction has either one of the low-order single-byte register elements or a single-byte memory location as its destination.
To generate a 32-bit or 64-bit result, we must also use a movzb instruction to zero-extend the result to the desired size.
- SET D $\rightarrow$ D ← 1 if condition is true, else D ← 0 - Set byte on condition
| Instruction | Synonym | Effect | Set condition |
|---|---|---|---|
| sete | setz | D ← ZF | Equal / zero |
| setne | setnz | D ← ~ZF | Not equal / not zero |
| sets | | D ← SF | Negative |
| setns | | D ← ~SF | Not negative |
| setg | setnle | D ← ~(SF ^ OF) & ~ZF | Greater |
| setge | setnl | D ← ~(SF ^ OF) | Greater or equal |
| setl | setnge | D ← SF ^ OF | Less |
| setle | setng | D ← (SF ^ OF) \| ZF | Less or equal |
| seta | setnbe | D ← ~CF & ~ZF | Above |
| setae | setnb | D ← ~CF | Above or equal |
Example: int comp(data_t a, data_t b), with a in %rdi and b in %rsi
comp:
cmpq %rsi, %rdi # Compare a with b by computing a - b
setl %al # %al = 1 if a < b, else 0
movzbl %al, %eax # Zero-extend the 8-bit result to 32 bits
ret # Return the 0/1 result
The same instruction can have several names. Compilers and disassemblers make arbitrary choices of which name to use.
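For reference, C source along the following lines could produce the comp listing. This is a sketch that assumes data_t is long, which matches the 64-bit cmpq in the listing. The unsigned variant, with the hypothetical name comp_unsigned, is added to show that the same C operator maps to a different condition: a compiler would be expected to pick a below test (setb) instead of setl.

```c
/* Assumption: data_t is long, so the comparison is a signed 64-bit one. */
typedef long data_t;

/* Signed a < b: typically cmpq %rsi, %rdi followed by setl %al. */
int comp(data_t a, data_t b)
{
    return a < b;
}

/* Unsigned a < b: same cmpq, but an unsigned condition such as setb. */
int comp_unsigned(unsigned long a, unsigned long b)
{
    return a < b;
}
```

The difference shows up for operands with the top bit set: -1 is less than 1 as a signed value, but the same bit pattern is the largest unsigned value, so comp_unsigned reports that it is not below 1.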
Conditional jump instructions (J instructions)
| Instruction | Synonym | Jump condition | Description |
|---|---|---|---|
| jmp Label | | 1 | Direct jump |
| jmp *Operand | | 1 | Indirect jump |
| je Label | jz | ZF | Equal / zero |
| jne Label | jnz | ~ZF | Not equal / not zero |
| js Label | | SF | Negative |
| jns Label | | ~SF | Nonnegative |
| jg Label | jnle | ~(SF ^ OF) & ~ZF | Greater (signed >) |
| jge Label | jnl | ~(SF ^ OF) | Greater or equal (signed >=) |
| jl Label | jnge | SF ^ OF | Less (signed <) |
| jle Label | jng | (SF ^ OF) \| ZF | Less or equal (signed <=) |
| ja Label | jnbe | ~CF & ~ZF | Above (unsigned >) |
| jae Label | jnb | ~CF | Above or equal (unsigned >=) |
| jb Label | jnae | CF | Below (unsigned <) |
| jbe Label | jna | CF \| ZF | Below or equal (unsigned <=) |
- Direct jump: where jump target is specified as a label in the code. The assembler will compute the offset from the jump instruction to the target label and encode it in the instruction.
- Indirect jump: where jump target is specified as a register or memory location. The processor will read the target address from the specified register or memory location at runtime and jump to that address.
Example:
jmp *%rax # Jump to the address stored in %rax
jmp *(%rax) # Jump to the address stored at the memory location pointed to by %rax
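The conditional jumps in the table are what a compiler uses to implement if statements. One way to see the correspondence is to rewrite a C function in a "goto" style that mirrors the control flow of the assembly. The functions below are an illustrative sketch with hypothetical names, and the instructions named in the comments are what a compiler might plausibly emit, not a guaranteed output.

```c
/* Ordinary structured version. */
long absdiff(long x, long y)
{
    long result;
    if (x < y)
        result = y - x;
    else
        result = x - y;
    return result;
}

/* Same logic with explicit jumps, in the shape of the assembly:
 * a compare, a conditional jump over the "then" part, and an
 * unconditional jump past the "else" part. */
long absdiff_goto(long x, long y)
{
    long result;
    if (x >= y)          /* e.g. cmpq %rsi, %rdi ; jge x_ge_y */
        goto x_ge_y;
    result = y - x;      /* reached only when x < y */
    goto done;           /* e.g. jmp done */
x_ge_y:
    result = x - y;
done:
    return result;
}
```

Both functions return the same value for every input; the second simply makes each jump visible in the source.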
Conditional move instructions (CMOV instructions)
Similar to conditional jump instructions, but instead of jumping to a different location, they conditionally move a value from the source operand to the destination register based on the condition codes.
Destination operand must be a register, while source operand can be a register or memory location.
The source and destination values can be 16, 32, or 64 bits long, but not 8 bits. The assembler infers the operand length from the name of the destination register, so a single instruction name per condition (for example, cmovge) covers all operand sizes, with no size suffix required.
cmov is not always faster than branching, because it may compute both expressions and discard one, which is wasteful if either expression is expensive.
Compilers like GCC are conservative: they usually use cmov only when both expressions are very cheap, since they cannot reliably know whether the branch will be predictable at runtime.
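The shape of code that suits a conditional move can be written out in C: compute both candidate results, then select one. The sketch below uses hypothetical function names. The second function is a case where the transformation would be invalid, because evaluating both arms would dereference a null pointer.

```c
#include <stddef.h>

/* Both results are computed up front; the final assignment is a pure
 * selection, which a compiler may implement with a conditional move
 * (for example cmovge) instead of a branch. */
long absdiff_cmov(long x, long y)
{
    long rval  = y - x;      /* result if x <  y */
    long eval  = x - y;      /* result if x >= y */
    long ntest = x >= y;
    if (ntest)
        rval = eval;         /* selection only, no side effects */
    return rval;
}

/* Not a candidate for a conditional move: *xp must be evaluated only
 * when xp is non-null, so the compiler has to keep a real branch. */
long read_or_zero(const long *xp)
{
    return xp ? *xp : 0;
}
```

This is why both expressions being cheap and free of side effects matters: the conditional-move form always pays for both computations.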
Switch statements
When a switch statement has several cases whose values are close together, the compiler can implement it with a jump table rather than a long chain of if-else branches. This reduces the cost of selecting a case from $O(N)$ comparisons to a single $O(1)$ table lookup.
The same mechanism can be expressed directly in C with a GCC extension called Labels as Values: &&label yields the address of a code label, and goto *ptr jumps to the address held in ptr.
// 1. Array of void pointers (the jump table)
static void *jt[7] = {
&&loc_A, &&loc_def, &&loc_B,
&&loc_C, &&loc_D, &&loc_def, &&loc_D
};
// 2. The computed goto (Indirect Jump)
goto *jt[index];
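The fragment above can be completed into a full function. This is only a sketch: it needs GCC or Clang because of the Labels as Values extension, the function name switch_impl is hypothetical, and the case bodies are invented for illustration since the fragment does not show them. Only the table layout (entries 0 to 6 and which entries share a label) is taken from the fragment.

```c
/* Dispatches on index through a table of code addresses.
 * Case bodies are made up for this sketch. */
long switch_impl(long x, unsigned long index)
{
    /* Jump table: one code address per index value 0..6. */
    static void *jt[7] = {
        &&loc_A, &&loc_def, &&loc_B,
        &&loc_C, &&loc_D, &&loc_def, &&loc_D
    };
    long val = x;

    if (index > 6)          /* out of range: use the default case */
        goto loc_def;
    goto *jt[index];        /* computed goto (indirect jump) */

loc_A:                      /* index 0 */
    val = x * 13;
    goto done;
loc_B:                      /* index 2: falls through into loc_C */
    val = x + 10;
loc_C:                      /* index 3 */
    val = val + 11;
    goto done;
loc_D:                      /* indices 4 and 6 share one block */
    val = x * x;
    goto done;
loc_def:                    /* indices 1 and 5, and out-of-range values */
    val = 0;
done:
    return val;
}
```

Duplicate table entries (loc_D, loc_def) are how several case values share one block of code, and the missing goto after loc_B models a case that falls through into the next one.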
The Assembly Level (x86-64)
The table lookup itself takes only a few lines of assembly, using the Imm(, r_i, s) scaled-indexed addressing mode with the table's address as the displacement.
# Assume %rsi holds our computed index
cmpq $6, %rsi # Compare index to the largest valid table index (6)
ja .loc_def # 'ja' (Jump Above) catches > 6 AND negative numbers!
jmp *.L4(,%rsi,8) # Indirect jump to Base(.L4) + (Index * 8 bytes)
- The Scale Factor is 8: In this listing each table entry is a 64-bit code address, and 64 bits = 8 bytes, so entry i sits at .L4 + 8·i. (Position-independent code may instead store 4-byte relative offsets, in which case the lookup sequence differs.)
- The * is Mandatory: jmp *.L4(...) is an indirect jump. The * marks the operand as the location that holds the target address, so the CPU reads the entry from the table and jumps to the address stored there, rather than treating the table itself as the jump target.
- The Unsigned Bounds Trick: The bounds check uses ja (unsigned greater than). If an index is negative, its two’s complement representation looks like a very large positive number when interpreted as unsigned. This single comparison therefore catches both indices above 6 and negative indices, routing both to the default case.
- Index Biasing: If your switch cases start at 100 (e.g., case 100:, case 101:), the compiler will insert a subq $100, %reg before the table lookup to shift the index range so that it starts at 0.
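The bias and the unsigned bounds check combine into a very small amount of C. The function below is a hypothetical helper written for this note, assuming case values 100 through 106. It returns the table index for an in-range value and -1 where the generated code would jump to the default case.

```c
/* Maps a switch value n to a jump-table index.
 * Assumption for this sketch: the cases span 100..106 (7 entries). */
int table_index(long n)
{
    /* Bias: shift the case range 100..106 down to 0..6.
     * Done in unsigned arithmetic so that values below 100 wrap
     * around to very large numbers instead of going negative. */
    unsigned long index = (unsigned long)n - 100;

    /* One unsigned comparison (ja in the assembly) rejects both
     * n > 106 and n < 100. */
    if (index > 6)
        return -1;          /* default case */
    return (int)index;
}
```

For n = 99 the subtraction wraps to 2^64 - 1, which is far above 6, so the single comparison sends it to the default case just as it does for n = 107.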