x86-64 Instruction Set Architecture

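gcc -Og -S code.c    # Compile to assembly only (writes code.s)
gcc -Og -c code.c    # Compile and assemble to an object file (writes code.o)
objdump -d code.o    # Disassemble the object file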

Registers

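| 8 bytes | 4 bytes | 2 bytes | 1 byte | Purpose |
| --- | --- | --- | --- | --- |
| %rax | %eax | %ax | %al | Return value |
| %rbx | %ebx | %bx | %bl | Callee-saved |
| %rcx | %ecx | %cx | %cl | 4th argument |
| %rdx | %edx | %dx | %dl | 3rd argument |
| %rsi | %esi | %si | %sil | 2nd argument |
| %rdi | %edi | %di | %dil | 1st argument |
| %rbp | %ebp | %bp | %bpl | Callee-saved |
| %rsp | %esp | %sp | %spl | Stack pointer |
| %r8 | %r8d | %r8w | %r8b | 5th argument |
| %r9 | %r9d | %r9w | %r9b | 6th argument |
| %r10 | %r10d | %r10w | %r10b | Caller-saved |
| %r11 | %r11d | %r11w | %r11b | Caller-saved |
| %r12 | %r12d | %r12w | %r12b | Callee-saved |
| %r13 | %r13d | %r13w | %r13b | Callee-saved |
| %r14 | %r14d | %r14w | %r14b | Callee-saved |
| %r15 | %r15d | %r15w | %r15b | Callee-saved |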

The names follow several historical conventions:

  • The original 8086 had eight 16-bit registers, %ax through %sp.
  • IA32 extended these registers to 32 bits, %eax through %esp.
  • x86-64 extended them again to 64 bits, %rax through %rsp, and added eight more registers, %r8 through %r15.

Operand Specifiers

| Type | Form | Operand value | Name |
| --- | --- | --- | --- |
| Immediate | $Imm | Imm | Immediate |
| Register | r_a | R[r_a] | Register |
| Memory | Imm | M[Imm] | Absolute |
| Memory | (r_a) | M[R[r_a]] | Indirect |
| Memory | Imm(r_b) | M[Imm + R[r_b]] | Base + displacement |
| Memory | (r_b, r_i) | M[R[r_b] + R[r_i]] | Indexed |
| Memory | Imm(r_b, r_i) | M[Imm + R[r_b] + R[r_i]] | Indexed |
| Memory | (, r_i, s) | M[R[r_i] · s] | Scaled indexed |
| Memory | Imm(, r_i, s) | M[Imm + R[r_i] · s] | Scaled indexed |
| Memory | (r_b, r_i, s) | M[R[r_b] + R[r_i] · s] | Scaled indexed |
| Memory | Imm(r_b, r_i, s) | M[Imm + R[r_b] + R[r_i] · s] | Scaled indexed |

The scaling factor s can only be 1, 2, 4, or 8.

In memory operands, the base register r_b and the index register r_i must be 8-byte registers. %rsp cannot be used as an index register.
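As a rough illustration of the most general form, Imm(r_b, r_i, s), the address arithmetic can be modeled in C. The function name effective_address and its parameters are hypothetical, chosen only for this sketch; it computes the address and does not perform the memory access M[...] itself.

```c
#include <stdint.h>

/* Model of the address computed by Imm(r_b, r_i, s): Imm + R[r_b] + R[r_i] * s.
   Returning 0 for a scale the hardware cannot encode is a convention of this sketch only. */
uint64_t effective_address(int64_t imm, uint64_t base, uint64_t index, unsigned scale) {
    if (scale != 1 && scale != 2 && scale != 4 && scale != 8)
        return 0;
    return (uint64_t)imm + base + index * scale;
}
```

For example, with R[%rdx] = 0x100 and R[%rcx] = 3, the operand 9(%rdx,%rcx,4) corresponds to effective_address(9, 0x100, 3, 4), which is address 0x115.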

Data Movement Instructions (mov)

Simple data movement instructions

  • MOV S, D $\rightarrow$ D ← S - Move
    • movb - Move byte
    • movw - Move word
    • movl - Move double word
    • movq - Move quad word

Source operand can be immediate, register, or memory. Destination operand can be register or memory.

mov does not support memory-to-memory moves, but we can use a register as an intermediate (requires 2 instructions).

In most cases, mov updates only the register bytes or memory bytes named by the destination operand and leaves all other bytes unchanged. The one exception is movl with a register destination, which also sets the upper 4 bytes of the register to 0. This follows a general x86-64 convention: any instruction that produces a 32-bit result in a register zeroes the upper 4 bytes of that register. A commonly cited motivation is that the new value then never depends on the old upper half of the register, while existing 32-bit code keeps its meaning.

  • movabsq I, R $\rightarrow$ R ← I — Move absolute quad word

The regular movq accepts only immediates that can be represented as 32-bit two's-complement numbers; the value is sign-extended to 64 bits before it is written to the destination. To load an arbitrary 64-bit immediate into a register, use movabsq instead.

movabsq $0x0011223344556677, %rax    # %rax = 0011223344556677
movb    $-1, %al                     # %rax = 00112233445566FF
movw    $-1, %ax                     # %rax = 001122334455FFFF
movl    $-1, %eax                    # %rax = 00000000FFFFFFFF
movq    $-1, %rax                    # %rax = FFFFFFFFFFFFFFFF
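The effect of each move size on a 64-bit register can be mimicked in C with masks. This is only a sketch: the helper names write_b, write_w, write_l, and write_q are hypothetical, and each one returns what the full register would hold after the corresponding mov.

```c
#include <stdint.h>

/* Each helper returns the new 64-bit register contents after the write. */
uint64_t write_b(uint64_t reg, uint8_t v)  { return (reg & ~0xFFULL)   | v; }  /* movb: low byte only */
uint64_t write_w(uint64_t reg, uint16_t v) { return (reg & ~0xFFFFULL) | v; }  /* movw: low 2 bytes only */
uint64_t write_l(uint64_t reg, uint32_t v) { (void)reg; return v; }            /* movl: upper 4 bytes zeroed */
uint64_t write_q(uint64_t reg, uint64_t v) { (void)reg; return v; }            /* movq: all 8 bytes replaced */
```

Starting from 0x0011223344556677, these helpers reproduce the register values in the comments of the assembly listing above.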

Zero-extending data movement instructions

  • MOVZ S, R $\rightarrow$ R ← ZeroExtend(S) - Move with zero extension
    • movzbw - Move zero-extended byte to word
    • movzbl - Move zero-extended byte to double word
    • movzwl - Move zero-extended word to double word
    • movzbq - Move zero-extended byte to quad word
    • movzwq - Move zero-extended word to quad word

Source operand can be register or memory, but not immediate. Destination operand must be a register.

movzlq does not exist, because movl already zero-extends the source to 64 bits when the destination is a register.

Sign-extending data movement instructions

  • MOVS S, R $\rightarrow$ R ← SignExtend(S) - Move with sign extension
    • movsbw - Move sign-extended byte to word
    • movsbl - Move sign-extended byte to double word
    • movswl - Move sign-extended word to double word
    • movsbq - Move sign-extended byte to quad word
    • movswq - Move sign-extended word to quad word
    • movslq - Move sign-extended double word to quad word
  • cltq (no operand) $\rightarrow$ %rax ← SignExtend(%eax) - sign-extend %eax to %rax (equivalent to movslq %eax, %rax)

movabsq $0x0011223344556677, %rax    # %rax = 0011223344556677
movb    $0xAA, %dl                   # %dl  = AA
movb    %dl, %al                     # %rax = 00112233445566AA
movsbq  %dl, %rax                    # %rax = FFFFFFFFFFFFFFAA
movzbq  %dl, %rax                    # %rax = 00000000000000AA
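In C, the choice between zero and sign extension is driven by the signedness of the source type. The sketch below uses hypothetical helper names; the cast from an out-of-range value to int8_t is implementation-defined in standard C, and the comments assume the wraparound behavior of GCC and Clang.

```c
#include <stdint.h>

/* movzbq analogue: widening an unsigned type fills the upper bits with 0. */
uint64_t zero_extend_byte(uint8_t b) { return (uint64_t)b; }

/* movsbq analogue: reinterpret the byte as signed first, so the sign bit is copied upward. */
uint64_t sign_extend_byte(uint8_t b) { return (uint64_t)(int64_t)(int8_t)b; }

/* movslq / cltq analogue: 32-bit signed value widened to 64 bits. */
int64_t sign_extend_long(int32_t x) { return (int64_t)x; }
```

With the byte 0xAA from the listing above, the first helper gives 0x00000000000000AA and the second gives 0xFFFFFFFFFFFFFFAA.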

Pushing and Popping Stack Data

%rsp is the stack pointer; it holds the address of the top element of the stack. The stack grows toward lower addresses, so pushing data decrements %rsp and popping data increments it.

  • pushq S $\rightarrow$ R[%rsp] ← R[%rsp] - 8; M[R[%rsp]] ← S - Push quad word
  • popq D $\rightarrow$ D ← M[R[%rsp]]; R[%rsp] ← R[%rsp] + 8 - Pop quad word
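The two-step behavior of pushq and popq can be modeled with a toy stack in C. Everything here (toy_stack, toy_init, toy_pushq, toy_popq, the 64-byte size) is hypothetical and exists only for illustration; array offsets stand in for addresses and there is no overflow checking.

```c
#include <stdint.h>
#include <string.h>

#define STACK_BYTES 64

typedef struct {
    uint8_t  mem[STACK_BYTES];  /* toy memory; offsets stand in for addresses */
    uint64_t rsp;               /* offset of the current top of stack */
} toy_stack;

void toy_init(toy_stack *s) {
    memset(s->mem, 0, sizeof s->mem);
    s->rsp = STACK_BYTES;       /* empty stack: %rsp starts at the high end */
}

/* pushq S: decrement %rsp by 8, then store S at the new top. */
void toy_pushq(toy_stack *s, uint64_t value) {
    s->rsp -= 8;
    memcpy(&s->mem[s->rsp], &value, 8);
}

/* popq D: load from the top, then increment %rsp by 8. */
uint64_t toy_popq(toy_stack *s) {
    uint64_t value;
    memcpy(&value, &s->mem[s->rsp], 8);
    s->rsp += 8;
    return value;
}
```

Pushing two values and popping twice returns them in reverse order and restores the original stack pointer.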

Arithmetic and Logical Operations

Load Effective Address

  • leaq S, R $\rightarrow$ R ← &S - Load effective address

It has the form of an instruction that reads from memory to a register, but it does not read from memory. Instead, it computes the effective address of the source operand and stores it in the destination register.

Example: if R[%rdx] = x, then leaq 7(%rdx, %rdx, 4), %rax will set %rax to 7 + x + 4x = 7 + 5x.

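The destination operand must be a register, and the source operand must be a memory operand (an address expression); an immediate or register source is not allowed.

Although the computed address is 64 bits wide, the destination does not have to be: leal (and the rarely used leaw) keep only the low 32 (or 16) bits of the result, and compilers routinely use leal for int arithmetic. In 64-bit code that manipulates pointers and long values, leaq is the usual form.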

Compilers often find clever uses of leaq that have nothing to do with effective address computations.
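As an illustration of such arithmetic uses, the small C functions below are the kind of code for which a compiler may choose leaq. The function names are hypothetical, and the instructions in the comments are what an optimizing compiler plausibly emits, not a guarantee.

```c
/* With x in %rdi, a compiler may emit: leaq 7(%rdi,%rdi,4), %rax */
long scale5_plus7(long x) { return 7 + 5 * x; }

/* x + 4*y fits a single address expression: leaq (%rdi,%rsi,4), %rax */
long add_scaled(long x, long y) { return x + 4 * y; }
```

Compiling with gcc -Og -S and reading the resulting assembly is an easy way to check what a given compiler version actually produces.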

Unary and Binary Operations

| Instruction | Effect | Description |
| --- | --- | --- |
| INC D | $D \leftarrow D + 1$ | Increment |
| DEC D | $D \leftarrow D - 1$ | Decrement |
| NEG D | $D \leftarrow -D$ | Negate |
| NOT D | $D \leftarrow \sim D$ | Bitwise NOT |
| ADD S, D | $D \leftarrow D + S$ | Add |
| SUB S, D | $D \leftarrow D - S$ | Subtract |
| IMUL S, D | $D \leftarrow D \times S$ | Integer multiply |
| XOR S, D | $D \leftarrow D \oplus S$ | Bitwise XOR |
| OR S, D | $D \leftarrow D \mid S$ | Bitwise OR |
| AND S, D | $D \leftarrow D \& S$ | Bitwise AND |

  • Unary operand can be either register or memory, but not immediate.

  • Binary source operand can be immediate, register, or memory. Binary destination operand can be either register or memory. Operands cannot both be memory locations.

When the destination operand is a memory location, the processor will first read the value from memory, perform the operation, and then write the result back to memory.
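To connect these instructions to source code, here is a short C function whose statements map roughly one-to-one onto the operations in the table. The function name arith and the instruction choices in the comments are illustrative assumptions; the actual output depends on the compiler and optimization level.

```c
long arith(long x, long y, long z) {
    long t1 = x ^ y;            /* xorq */
    long t2 = z * 48;           /* often leaq plus a shift rather than imulq */
    long t3 = t1 & 0x0F0F0F0F;  /* and with an immediate mask */
    long t4 = t2 - t3;          /* subq */
    return t4;
}
```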

Shift Operations

| Instruction | Effect | Description |
| --- | --- | --- |
| SAL S, D | $D \leftarrow D << S$ | Arithmetic left shift |
| SHL S, D | $D \leftarrow D << S$ | Logical left shift (same as SAL) |
| SAR S, D | $D \leftarrow D >>_A S$ | Arithmetic right shift |
| SHR S, D | $D \leftarrow D >>_L S$ | Logical right shift |

The shift amount S can be an immediate value or the value in the single-byte %cl register. The destination operand D can be a register or a memory location, but not an immediate value.

Because %cl is a single byte, it can encode shift amounts up to 255, but the processor uses only the low-order bits of the count: the low 5 bits for 8-, 16-, and 32-bit operands, and the low 6 bits for 64-bit operands.

  • For example, when %cl = 0xFF, salb, salw, and sall all use a count of 31 (which shifts every bit out of an 8- or 16-bit operand, leaving 0), and salq shifts by 63.
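The count masking for 32-bit and 64-bit operands can be imitated in C by masking the count explicitly (shifting by the full width or more is undefined behavior in C, so the mask is required, not optional). The helpers below are named after the instructions purely for illustration; right-shifting a negative signed value is implementation-defined in C, and the sarq comment assumes the arithmetic behavior of GCC and Clang.

```c
#include <stdint.h>

uint32_t sall(uint32_t d, uint8_t cl) { return d << (cl & 31); }  /* 32-bit: low 5 bits of count */
uint64_t salq(uint64_t d, uint8_t cl) { return d << (cl & 63); }  /* 64-bit: low 6 bits of count */
uint64_t shrq(uint64_t d, uint8_t cl) { return d >> (cl & 63); }  /* logical: fills with 0 */
int64_t  sarq(int64_t d, uint8_t cl)  { return d >> (cl & 63); }  /* arithmetic on GCC/Clang: copies the sign bit */
```

With a count of 0xFF, sall shifts by 31 and salq by 63; shrq and sarq differ only in what they shift in at the top.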

There is no two-operand division or modulus instruction analogous to ADD or IMUL. Division is handled by the one-operand idiv (signed) and div (unsigned) instructions, which produce both quotient and remainder and are covered in the next section.

Oct word (16 bytes)

The x86-64 instruction set provides limited support for operations involving 128-bit (16-byte) numbers.

| Instruction | Effect | Description |
| --- | --- | --- |
| imulq S | R[%rdx]:R[%rax] ← S × R[%rax] | Signed full multiply |
| mulq S | R[%rdx]:R[%rax] ← S × R[%rax] | Unsigned full multiply |
| cqto | R[%rdx]:R[%rax] ← SignExtend(R[%rax]) | Convert to oct word |
| idivq S | R[%rax] ← R[%rdx]:R[%rax] / S; R[%rdx] ← R[%rdx]:R[%rax] % S | Signed divide |
| divq S | R[%rax] ← R[%rdx]:R[%rax] / S; R[%rdx] ← R[%rdx]:R[%rax] % S | Unsigned divide |

imulq comes in two kinds of forms. The one-operand form shown here is the only way to multiply two signed 64-bit numbers and obtain the full 128-bit product (mulq does the same for unsigned values). The two- and three-operand forms of imulq produce only a 64-bit result.

idivq, unlike imulq, has only the one-operand form.
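From C, these instructions are usually reached through 128-bit integer types and ordinary division. The sketch below relies on unsigned __int128, which is a GCC/Clang extension rather than standard C; the names uint128_t, full_mul, and quot_rem are hypothetical, and the instruction choices in the comments are what such compilers typically emit rather than a guarantee.

```c
#include <stdint.h>

typedef unsigned __int128 uint128_t;  /* GCC/Clang extension, not standard C */

/* Full 64x64 -> 128-bit unsigned product; typically compiles to a one-operand mulq. */
void full_mul(uint64_t x, uint64_t y, uint64_t *hi, uint64_t *lo) {
    uint128_t p = (uint128_t)x * y;
    *hi = (uint64_t)(p >> 64);   /* the half that lands in %rdx */
    *lo = (uint64_t)p;           /* the half that lands in %rax */
}

/* Signed quotient and remainder; typically compiles to cqto followed by a single idivq. */
void quot_rem(long x, long y, long *q, long *r) {
    *q = x / y;
    *r = x % y;
}
```

Because idivq yields the quotient in %rax and the remainder in %rdx at the same time, a compiler can usually serve both x / y and x % y with one division.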

Control Flow Instructions

Condition Codes

In addition to the integer registers, the CPU maintains a set of single-bit condition code registers describing attributes of the most recent arithmetic or logical operation.

  • CF: Carry flag. The most recent operation generated a carry out of the most significant bit. Used to detect overflow for unsigned operations.
  • ZF: Zero flag. The most recent operation yielded zero.
  • SF: Sign flag. The most recent operation yielded a negative value.
  • OF: Overflow flag. The most recent operation caused a two’s-complement overflow—either negative or positive.

For example, suppose we used one of the add instructions to perform t = a+b, where a, b, and t are integers. Then the condition codes would be set according to the following C expressions:

| Code | C expression | Description |
| --- | --- | --- |
| CF | (unsigned) t < (unsigned) a | Unsigned overflow |
| ZF | (t == 0) | Zero |
| SF | (t < 0) | Negative |
| OF | (a<0 == b<0) && (t<0 != a<0) | Signed overflow |

The leaq instruction does not alter any condition codes, since it is intended to be used in address computations.

Otherwise, all of the listed Arithmetic and Logical instructions cause the condition codes to be set.

For the logical operations, such as xor, CF = 0 and OF = 0. For the shift operations, CF is set to the last bit shifted out, while OF = 0.

The inc and dec instructions set OF and ZF, but they leave CF unchanged.
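The C expressions given for t = a + b can be turned into a small function that reports the four flags. This is a sketch with hypothetical names (flags_t, add_flags); it performs the addition in unsigned arithmetic because signed overflow is undefined behavior in C, and it assumes the usual two's-complement wraparound when converting the sum back to int32_t.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool cf, zf, sf, of; } flags_t;

/* Flags an add instruction would produce for the 32-bit sum t = a + b. */
flags_t add_flags(int32_t a, int32_t b) {
    uint32_t ut = (uint32_t)a + (uint32_t)b;  /* wraps modulo 2^32, like the hardware */
    int32_t  t  = (int32_t)ut;
    flags_t f;
    f.cf = ut < (uint32_t)a;                               /* carry out: unsigned overflow */
    f.zf = (t == 0);                                       /* result is zero */
    f.sf = (t < 0);                                        /* result is negative */
    f.of = ((a < 0) == (b < 0)) && ((t < 0) != (a < 0));   /* signed overflow */
    return f;
}
```

For instance, adding 1 to INT32_MAX sets SF and OF but not CF, while adding 1 to -1 sets CF and ZF but not OF.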

CMP and TEST instructions

  • CMP S, D set flags based on D - S - Compare
    • cmpb, cmpw, cmpl, cmpq
  • TEST S, D set flags based on S & D - Test
    • testb, testw, testl, testq

Neither cmp nor test stores the result of its operation; both only set the condition codes based on that result.

The cmp operands are listed in reverse order: to compare a with b, write cmp b, a, which sets the condition codes based on a - b.

For TEST, typically the same operand is repeated (e.g., testq %rax,%rax to see whether %rax is negative, zero, or positive). Or one of the operands is a mask indicating which bits should be tested.
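A few C predicates show where cmp and test typically appear. The function names are hypothetical and the instructions in the comments are plausible compiler choices, not guaranteed output.

```c
/* testq %rdi, %rdi sets ZF and SF from x itself, which is enough to classify its sign. */
int sign_of(long x) { return (x > 0) - (x < 0); }

/* test with a mask: a nonzero result (ZF = 0) means at least one masked bit is set. */
int any_bit_set(unsigned long x, unsigned long mask) { return (x & mask) != 0; }

/* cmpq %rsi, %rdi computes a - b for its flags only; ZF = 1 means a == b. */
int is_equal(long a, long b) { return a == b; }
```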

Accessing the Condition Codes

There are three common ways of using the condition codes:

  1. Set byte on condition instructions
  2. Conditional jump instructions
  3. Conditional move instructions

Set byte on condition instructions (SET instructions)

A set instruction has either one of the low-order single-byte register elements or a single-byte memory location as its destination.

To generate a 32-bit or 64-bit result, we must also use a movzb instruction to zero-extend the result to the desired size.

  • SET D $\rightarrow$ D ← 1 if condition is true, else D ← 0 - Set byte on condition

| Instruction | Synonym | Effect | Set condition |
| --- | --- | --- | --- |
| sete | setz | D ← ZF | Equal / zero |
| setne | setnz | D ← ~ZF | Not equal / not zero |
| sets | | D ← SF | Negative |
| setns | | D ← ~SF | Not negative |
| setg | setnle | D ← ~(SF ^ OF) & ~ZF | Greater |
| setge | setnl | D ← ~(SF ^ OF) | Greater or equal |
| setl | setnge | D ← SF ^ OF | Less |
| setle | setng | D ← (SF ^ OF) \| ZF | Less or equal |
| seta | setnbe | D ← ~CF & ~ZF | Above |
| setae | setnb | D ← ~CF | Above or equal |

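Example: int comp(data_t a, data_t b), with a in %rdi and b in %rsi. The use of cmpq and setl below implies that data_t is a 64-bit signed type such as long.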

comp:
    cmpq %rsi, %rdi     # Compare a with b by computing a - b
    setl %al            # %al = 1 if a < b, else 0
    movzbl %al, %eax    # Zero-extend the 8-bit result to 32 bits
    ret                 # Return the 0/1 result

Many of these instructions have more than one name (for example, setg and setnle are the same instruction). Compilers and disassemblers choose arbitrarily among the synonyms.
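Which set instruction is used depends on the signedness of the C operands, not on the comparison operator alone. In the sketch below (hypothetical function names), both functions use the same < operator; the instructions in the comments are what a compiler plausibly emits.

```c
/* Signed comparison: expect cmpq followed by setl. */
int less_signed(long a, long b) { return a < b; }

/* Unsigned comparison of the same bit patterns: expect cmpq followed by setb. */
int less_unsigned(unsigned long a, unsigned long b) { return a < b; }
```

The bit pattern of -1 is less than 0 when read as signed, but is the largest possible value when read as unsigned, so the two functions disagree on that input.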

Conditional jump instructions (J instructions)

| Instruction | Synonym | Jump condition | Description |
| --- | --- | --- | --- |
| jmp Label | | 1 | Direct jump |
| jmp *Operand | | 1 | Indirect jump |
| je Label | jz | ZF | Equal / zero |
| jne Label | jnz | ~ZF | Not equal / not zero |
| js Label | | SF | Negative |
| jns Label | | ~SF | Nonnegative |
| jg Label | jnle | ~(SF ^ OF) & ~ZF | Greater (signed >) |
| jge Label | jnl | ~(SF ^ OF) | Greater or equal (signed >=) |
| jl Label | jnge | SF ^ OF | Less (signed <) |
| jle Label | jng | (SF ^ OF) \| ZF | Less or equal (signed <=) |
| ja Label | jnbe | ~CF & ~ZF | Above (unsigned >) |
| jae Label | jnb | ~CF | Above or equal (unsigned >=) |
| jb Label | jnae | CF | Below (unsigned <) |
| jbe Label | jna | CF \| ZF | Below or equal (unsigned <=) |

  • Direct jump: where jump target is specified as a label in the code. The assembler will compute the offset from the jump instruction to the target label and encode it in the instruction.
  • Indirect jump: where jump target is specified as a register or memory location. The processor will read the target address from the specified register or memory location at runtime and jump to that address.

Example:

jmp *%rax    # Jump to the address stored in %rax
jmp *(%rax)   # Jump to the address stored at the memory location pointed to by %rax
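C's goto statement mirrors the control flow that conditional jumps produce, so rewriting an if-else with goto gives a feel for the generated assembly. The function absdiff_goto below is a hypothetical example, and the instructions named in the comments are plausible compiler output rather than a guarantee.

```c
/* Absolute difference written in "goto style" to mirror cmp + conditional jump. */
long absdiff_goto(long x, long y) {
    long result;
    if (x >= y)            /* cmpq %rsi, %rdi ; jge x_ge_y */
        goto x_ge_y;
    result = y - x;        /* fall-through path: x < y */
    return result;
x_ge_y:
    result = x - y;        /* jump target: x >= y */
    return result;
}
```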

Conditional move instructions (CMOV instructions)

Similar to conditional jump instructions, but instead of jumping to a different location, they conditionally move a value from the source operand to the destination register based on the condition codes.

Destination operand must be a register, while source operand can be a register or memory location.

The source and destination can be 16, 32, or 64 bits wide, but not 8 bits. The assembler infers the operand size from the destination register name, so no size suffix is needed: the same mnemonic (for example cmovge) serves all three widths. Note that the trailing letter in a name such as cmovl is a condition (less), not a size suffix.

cmov is not always faster than branching, because it may compute both expressions and discard one, which is wasteful if either expression is expensive.
Compilers like GCC are conservative: they usually use cmov only when both expressions are very cheap, since they cannot reliably know whether the branch will be predictable at runtime.
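The difference between the two strategies can be written out in C. In this sketch (hypothetical function names), the second function spells out what a conditional-move translation does: it evaluates both candidate results and then selects one. Whether a compiler actually emits a cmov for either function depends on the compiler and flags.

```c
/* Branching form: only one of the two subtractions is evaluated. */
long absdiff_branch(long x, long y) {
    if (x < y)
        return y - x;
    return x - y;
}

/* Conditional-move form: both results are computed, then one is selected. */
long absdiff_select(long x, long y) {
    long rval = y - x;
    long eval = x - y;
    if (x >= y)
        rval = eval;       /* candidate for cmovge */
    return rval;
}
```

The second form is only valid when evaluating both expressions is harmless. Something like p ? *p : 0 cannot be handled this way, because dereferencing p must not happen when p is null.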

Switch statements

When a switch statement has multiple cases with values that are close together, the compiler optimizes it by generating a Jump Table rather than a long chain of if-else branches. This reduces the time complexity of the branch from $O(N)$ to $O(1)$.

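The same mechanism can be written by hand in C using a GCC extension called Labels as Values: && takes the address of a code label, and goto * jumps to the address held in a pointer. The compiler does not need this extension to build a jump table for a switch, but it makes the mechanism visible in source code.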

// 1. Array of void pointers (the jump table)
static void *jt[7] = {
    &&loc_A, &&loc_def, &&loc_B, 
    &&loc_C, &&loc_D, &&loc_def, &&loc_D
};

// 2. The computed goto (Indirect Jump)
goto *jt[index]; 
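To see the fragment above in context, here is a complete function built around the same table. It is only a sketch: it needs GCC or Clang because of the labels-as-values extension, and the function name switch_sketch, the case values 100 to 106, and the case bodies are all hypothetical. It subtracts 100 so that the case values map onto table indices 0 to 6.

```c
/* Hand-written jump table for a switch over n with cases 100, 102, 103, 104, 106. */
long switch_sketch(long x, long n) {
    static void *jt[7] = {
        &&loc_A, &&loc_def, &&loc_B,
        &&loc_C, &&loc_D, &&loc_def, &&loc_D
    };
    unsigned long index = (unsigned long)n - 100;  /* bias so that case 100 is entry 0 */
    long val;

    if (index > 6)          /* unsigned compare also rejects n < 100 */
        goto loc_def;
    goto *jt[index];        /* computed goto: the indirect jump */

loc_A:   val = x * 13; goto done;   /* case 100 */
loc_B:   x += 10;                   /* case 102: falls through into case 103 */
loc_C:   val = x + 11; goto done;   /* case 103 */
loc_D:   val = x * x;  goto done;   /* cases 104 and 106 share one entry target */
loc_def: val = 0;                   /* default, also used for the gaps 101 and 105 */
done:
    return val;
}
```

Gaps in the case values (101 and 105 here) simply point at the default label, which is why a jump table works best when the case values are dense.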

The Assembly Level (x86-64)

The actual memory traversal is handled beautifully in a few lines of assembly using the Offset(Base, Index, Scale) addressing mode.

# Assume %rsi holds our computed index
cmpq    $6, %rsi          # Compare index to the largest valid table index (6)
ja      .loc_def          # 'ja' (Jump Above) catches > 6 AND negative numbers!
jmp     *.L4(,%rsi,8)     # Indirect jump to Base(.L4) + (Index * 8 bytes)

  • The Scale Factor is 8: Each table entry is a 64-bit (8-byte) code address, so entry i lives at .L4 + 8·i. Position-independent builds may instead store 4-byte offsets relative to the table, in which case the scale is 4 and the offset is added to the table address before jumping.
  • The * is Mandatory: jmp *.L4(...) is an indirect jump: the CPU reads the 8-byte address stored in the selected table entry and jumps there. A direct jump to .L4 would instead transfer control into the table itself and try to execute its entries as code.
  • The Unsigned Bounds Trick: The bounds check uses ja (unsigned greater than). If an index is negative, its two’s complement binary representation looks like a massive positive number when evaluated as unsigned. This single instruction cleverly catches both overflow and underflow, routing both to the default case.
  • Index Biasing: If your switch cases start at 100 (e.g., case 100:, case 101:), the compiler will inject a subq $100, %reg before the table lookup to shift the index back to 0.
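The unsigned bounds trick is easy to reproduce in C. The helper name index_in_table is hypothetical; the point of the sketch is that one unsigned comparison replaces the pair of checks index >= 0 and index <= max.

```c
/* One unsigned comparison covers both index < 0 and index > max. */
int index_in_table(long index, unsigned long max) {
    return (unsigned long)index <= max;
}
```

A negative index converts to a very large unsigned value, so it fails the same test that rejects indices past the end of the table.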
This post is licensed under CC BY 4.0 by the author.