x86-64 ISA

x86-64 Instruction Set Architecture

Posted Feb 23, 2026 Updated Jun 2, 2026

By marvinthang



  
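gcc -Og -S code.c     # compile to assembly: writes code.s
gcc -Og -c code.c     # compile and assemble: writes code.o
objdump -d code.o     # disassemble the object file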

Registers

8 bytes	4 bytes	2 bytes	1 byte	Purpose
%rax	%eax	%ax	%al	Return value
%rbx	%ebx	%bx	%bl	Callee-saved
%rcx	%ecx	%cx	%cl	4th argument
%rdx	%edx	%dx	%dl	3rd argument
%rsi	%esi	%si	%sil	2nd argument
%rdi	%edi	%di	%dil	1st argument
%rbp	%ebp	%bp	%bpl	Callee-saved
%rsp	%esp	%sp	%spl	Stack pointer
%r8	%r8d	%r8w	%r8b	5th argument
%r9	%r9d	%r9w	%r9b	6th argument
%r10	%r10d	%r10w	%r10b	Caller-saved
%r11	%r11d	%r11w	%r11b	Caller-saved
%r12	%r12d	%r12w	%r12b	Callee-saved
%r13	%r13d	%r13w	%r13b	Callee-saved
%r14	%r14d	%r14w	%r14b	Callee-saved
%r15	%r15d	%r15w	%r15b	Callee-saved

The names reflect several layers of history:

The original 8086 had eight 16-bit registers, %ax through %sp in the table above.
IA32 extended them to 32 bits, %eax through %esp.
x86-64 extended them again to 64 bits, %rax through %rsp, and added eight new registers, %r8 through %r15.

Operand Specifiers

Type	Form	Operand value	Name
Immediate	`$Imm`	`Imm`	Immediate
Register	`r_a`	`R[r_a]`	Register
Memory	`Imm`	`M[Imm]`	Absolute
Memory	`(r_a)`	`M[R[r_a]]`	Indirect
Memory	`Imm(r_b)`	`M[Imm + R[r_b]]`	Base + displacement
Memory	`(r_b, r_i)`	`M[R[r_b] + R[r_i]]`	Indexed
Memory	`Imm(r_b, r_i)`	`M[Imm + R[r_b] + R[r_i]]`	Indexed
Memory	`(, r_i, s)`	`M[R[r_i] · s]`	Scaled indexed
Memory	`Imm(, r_i, s)`	`M[Imm + R[r_i] · s]`	Scaled indexed
Memory	`(r_b, r_i, s)`	`M[R[r_b] + R[r_i] · s]`	Scaled indexed
Memory	`Imm(r_b, r_i, s)`	`M[Imm + R[r_b] + R[r_i] · s]`	Scaled indexed

The scaling factor s can only be 1, 2, 4, or 8.

In memory operands, the base register r_b and the index register r_i must be 8-byte registers. %rsp cannot be used as an index register.
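The most general form in the table, Imm(r_b, r_i, s), can be written out as plain arithmetic. The following is a minimal sketch; the function name `effective_address` and its parameter names are made up for illustration, and register contents are passed in as ordinary integers.

```c
#include <stdint.h>

/* Effective address of Imm(r_b, r_i, s): Imm + R[r_b] + R[r_i] * s.
 * Unsigned arithmetic wraps modulo 2^64, like address computation in hardware.
 * scale is expected to be 1, 2, 4, or 8. */
uint64_t effective_address(int64_t imm, uint64_t base, uint64_t index, unsigned scale)
{
    return (uint64_t)imm + base + index * (uint64_t)scale;
}
```

With R[%rdx] = 0x1000 and R[%rcx] = 3, the operand 0x10(%rdx, %rcx, 8) corresponds to `effective_address(0x10, 0x1000, 3, 8)`, which is 0x1028. The simpler forms fall out by passing 0 for the missing parts.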

Data Movement Instructions (mov)

Simple data movement instructions

MOV S, D $\rightarrow$ D ← S - Move
- movb - Move byte
- movw - Move word
- movl - Move double word
- movq - Move quad word

Source operand can be immediate, register, or memory. Destination operand can be register or memory.

mov does not support memory-to-memory moves, but we can use a register as an intermediate (requires 2 instructions).

In most cases, mov updates only the destination bytes selected by its operand size and leaves the other bytes unchanged. The one exception is movl with a register destination, which also sets the register's upper 4 bytes to 0. This follows the general x86-64 rule that any instruction producing a 32-bit result in a register zeroes the upper 4 bytes of that register.

movabsq I, R $\rightarrow$ R ← I — Move absolute quad word

The regular movq accepts only an immediate that can be represented as a 32-bit two's-complement value, which is then sign-extended to 64 bits. To load an arbitrary 64-bit immediate, use movabsq, whose destination must be a register.

  
movabsq $0x0011223344556677, %rax    # %rax = 0011223344556677
movb    $-1, %al                     # %rax = 00112233445566FF
movw    $-1, %ax                     # %rax = 001122334455FFFF
movl    $-1, %eax                    # %rax = 00000000FFFFFFFF
movq    $-1, %rax                    # %rax = FFFFFFFFFFFFFFFF
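The partial-register behaviour in the example above can be modelled in C by treating the register as a 64-bit integer. This is only a sketch: the helper names `mov_b`, `mov_w`, `mov_l`, and `mov_q` are hypothetical, and each returns the new register value instead of modifying real hardware state.

```c
#include <stdint.h>

/* Each helper returns the 64-bit register value after the move. */

/* movb: replaces the low byte, keeps the upper 7 bytes */
uint64_t mov_b(uint64_t reg, uint8_t v)  { return (reg & ~0xFFULL) | v; }

/* movw: replaces the low 2 bytes, keeps the upper 6 bytes */
uint64_t mov_w(uint64_t reg, uint16_t v) { return (reg & ~0xFFFFULL) | v; }

/* movl: writes the low 4 bytes and zeroes the upper 4 bytes */
uint64_t mov_l(uint64_t reg, uint32_t v) { (void)reg; return v; }

/* movq: replaces all 8 bytes */
uint64_t mov_q(uint64_t reg, uint64_t v) { (void)reg; return v; }
```

Starting from 0x0011223344556677, `mov_b(reg, 0xFF)` gives 0x00112233445566FF while `mov_l(reg, 0xFFFFFFFF)` gives 0x00000000FFFFFFFF, matching the movb and movl lines of the assembly example.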

Zero-extending data movement instructions

MOVZ S, R $\rightarrow$ R ← ZeroExtend(S) - Move with zero extension
- movzbw - Move zero-extended byte to word
- movzbl - Move zero-extended byte to double word
- movzwl - Move zero-extended word to double word
- movzbq - Move zero-extended byte to quad word
- movzwq - Move zero-extended word to quad word

Source operand can be register or memory, but not immediate. Destination operand must be a register.

movzlq does not exist, because movl already zero-extends the source to 64 bits when the destination is a register.

Sign-extending data movement instructions

MOVS S, R $\rightarrow$ R ← SignExtend(S) - Move with sign extension
- movsbw - Move sign-extended byte to word
- movsbl - Move sign-extended byte to double word
- movswl - Move sign-extended word to double word
- movsbq - Move sign-extended byte to quad word
- movswq - Move sign-extended word to quad word
- movslq - Move sign-extended double word to quad word
cltq (no operand) $\rightarrow$ %rax ← SignExtend(%eax) - sign-extend %eax to %rax (equivalent to movslq %eax, %rax)

  
movabsq $0x0011223344556677, %rax    # %rax = 0011223344556677
movb    $0xAA, %dl                   # %dl  = AA
movb    %dl, %al                     # %rax = 00112233445566AA
movsbq  %dl, %rax                    # %rax = FFFFFFFFFFFFFFAA
movzbq  %dl, %rax                    # %rax = 00000000000000AA
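In C, these instructions are what widening casts usually compile to: converting from an unsigned type zero-extends, and converting from a signed type sign-extends. A small sketch follows, with hypothetical function names; the cast from `uint8_t` to `int8_t` assumes the usual two's-complement wraparound that GCC and Clang provide.

```c
#include <stdint.h>

/* movzbq: widen an unsigned byte; the upper 56 bits become 0 */
uint64_t zero_extend_byte(uint8_t b) { return (uint64_t)b; }

/* movsbq: widen a signed byte; the upper 56 bits copy the sign bit */
uint64_t sign_extend_byte(uint8_t b) { return (uint64_t)(int64_t)(int8_t)b; }

/* movslq (or cltq when the value is in %eax): widen a signed 32-bit value */
int64_t sign_extend_long(int32_t x) { return (int64_t)x; }
```

For the byte 0xAA used in the example above, `zero_extend_byte` yields 0x00000000000000AA and `sign_extend_byte` yields 0xFFFFFFFFFFFFFFAA.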

Pushing and Popping Stack Data

%rsp is the stack pointer; it holds the address of the top element of the stack. The stack grows toward lower addresses, so pushing data decreases %rsp and popping data increases it.

pushq S $\rightarrow$ R[%rsp] ← R[%rsp] - 8; M[R[%rsp]] ← S - Push quad word
popq D $\rightarrow$ D ← M[R[%rsp]]; R[%rsp] ← R[%rsp] + 8 - Pop quad word
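The two-step effect of each instruction can be traced with a toy model. The sketch below is an assumption-laden illustration, not real stack handling: `machine_t`, `MEM_SIZE`, and the helper functions are invented names, memory is a small byte array, and there is no overflow or underflow checking.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical toy machine: a small byte-addressed memory plus a stack pointer. */
enum { MEM_SIZE = 256 };

typedef struct {
    uint8_t  mem[MEM_SIZE];
    uint64_t rsp;              /* index into mem; MEM_SIZE means the stack is empty */
} machine_t;

void machine_init(machine_t *m)
{
    memset(m->mem, 0, sizeof m->mem);
    m->rsp = MEM_SIZE;
}

/* pushq S: R[%rsp] <- R[%rsp] - 8; M[R[%rsp]] <- S */
void pushq(machine_t *m, uint64_t s)
{
    m->rsp -= 8;
    memcpy(&m->mem[m->rsp], &s, 8);
}

/* popq D: D <- M[R[%rsp]]; R[%rsp] <- R[%rsp] + 8 */
uint64_t popq(machine_t *m)
{
    uint64_t d;
    memcpy(&d, &m->mem[m->rsp], 8);
    m->rsp += 8;
    return d;
}
```

Pushing 0x123 and then 0x456 lowers `rsp` by 16; the first `popq` returns 0x456 and the second returns 0x123, leaving `rsp` back at its starting value.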

Arithmetic and Logical Operations

Load Effective Address

leaq S, R $\rightarrow$ R ← &S - Load effective address

It has the form of an instruction that reads from memory to a register, but it does not read from memory. Instead, it computes the effective address of the source operand and stores it in the destination register.

Example: if R[%rdx] = x, then leaq 7(%rdx, %rdx, 4), %rax will set %rax to 7 + x + 4x = 7 + 5x.

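The destination operand must be a register, and the source must be written in memory-operand form; an immediate or a plain register is not allowed as the source.

leaq always produces a 64-bit result. 16- and 32-bit variants (leaw, leal) also exist and keep only the low bits of the computed address; compilers commonly use leal for 32-bit arithmetic. There is no byte-sized variant.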

Compilers often find clever uses of leaq that have nothing to do with effective address computations.
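A typical case is simple arithmetic of the form a·x + b with a small multiplier. The function below is a hypothetical example (the name `scale5_plus7` is made up); the instruction named in the comment is what GCC commonly produces for it, but the exact output depends on the compiler version and flags.

```c
/* With x in %rdi, a compiler may implement the whole body as
 *     leaq 7(%rdi,%rdi,4), %rax
 * because 7 + x + 4*x equals 5*x + 7. No memory is read. */
long scale5_plus7(long x)
{
    return 5 * x + 7;
}
```

Compiling this with `gcc -Og -S` is a quick way to check what your own toolchain emits.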

Unary and Binary Operations

Instruction	Effect	Description
INC D	$D \leftarrow D + 1$	Increment
DEC D	$D \leftarrow D - 1$	Decrement
NEG D	$D \leftarrow -D$	Negate
NOT D	$D \leftarrow \sim D$	Bitwise NOT
ADD S, D	$D \leftarrow D + S$	Add
SUB S, D	$D \leftarrow D - S$	Subtract
IMUL S, D	$D \leftarrow D \times S$	Integer multiply
XOR S, D	$D \leftarrow D \oplus S$	Bitwise XOR
OR S, D	$D \leftarrow D \mid S$	Bitwise OR
AND S, D	$D \leftarrow D \& S$	Bitwise AND

Unary operand can be either register or memory, but not immediate.
Binary source operand can be immediate, register, or memory. Binary destination operand can be either register or memory, except for imul, whose destination must be a register. Operands cannot both be memory locations.

When the destination operand is a memory location, the processor will first read the value from memory, perform the operation, and then write the result back to memory.

Shift Operations

Instruction	Effect	Description
SAL S, D	$D \leftarrow D << S$	Arithmetic left shift
SHL S, D	$D \leftarrow D << S$	Logical left shift (same as SAL)
SAR S, D	$D \leftarrow D >>_A S$	Arithmetic right shift
SHR S, D	$D \leftarrow D >>_L S$	Logical right shift

The shift amount S can be an immediate value or the value in the single-byte %cl register. The destination operand D can be a register or a memory location, but not an immediate value.

A 1-byte shift amount could encode shifts of up to 255, but the hardware uses only the low-order bits of the count: the low 5 bits for 8-, 16-, and 32-bit operands, and the low 6 bits for 64-bit operands.

For example, when %cl = 0xFF, salb, salw, and sall shift by 31 (which shifts every bit out of a byte or word operand), and salq shifts by 63.
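For 32- and 64-bit operands the count masking can be mimicked in C. This matters because C itself leaves a shift by the full width or more undefined, whereas the instruction simply ignores the upper bits of %cl. The helper names `sall` and `salq` below just mirror the mnemonics and are not library functions.

```c
#include <stdint.h>

/* sall %cl, D on a 32-bit operand: only the low 5 bits of %cl are used */
uint32_t sall(uint32_t d, uint8_t cl) { return d << (cl & 31); }

/* salq %cl, D on a 64-bit operand: only the low 6 bits of %cl are used */
uint64_t salq(uint64_t d, uint8_t cl) { return d << (cl & 63); }
```

With a count of 0xFF, `sall(1, 0xFF)` shifts by 31 and `salq(1, 0xFF)` shifts by 63. A count of 32 on a 32-bit operand masks to 0, so `sall(1, 32)` returns 1 unchanged.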

There is no two-operand DIV or MOD instruction in this group. Division and modulus are handled by the one-operand idiv (signed) and div (unsigned) instructions, covered in the next section.

Oct word (16 bytes)

The x86-64 instruction set provides limited support for operations involving 128-bit (16-byte) numbers.

Instruction	Effect	Description
`imulq S`	`R[%rdx]:R[%rax] ← S * R[%rax]`	signed full multiply
`mulq S`	`R[%rdx]:R[%rax] ← S * R[%rax]`	unsigned full multiply
`cqto`	`R[%rdx]:R[%rax] ← SignExtend(R[%rax])`	convert to oct-word
`idivq S`	`R[%rax] ← R[%rdx]:R[%rax] / S` `R[%rdx] ← R[%rdx]:R[%rax] % S`	signed divide
`divq S`	`R[%rax] ← R[%rdx]:R[%rax] / S` `R[%rdx] ← R[%rdx]:R[%rax] % S`	unsigned divide

imulq has two forms. The one-operand form shown here is the only one that multiplies two 64-bit numbers into a full 128-bit result; the two- and three-operand forms produce only a 64-bit result.

idivq and divq have only this one-operand form; unlike imul, they have no two-operand variant.
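From C, these instructions are usually reached through a 128-bit integer type or through a combined divide and remainder. The sketch below assumes GCC or Clang on x86-64 Linux: `unsigned __int128` is a compiler extension, `long` is taken to be 64 bits, and the function names are invented for illustration.

```c
#include <stdint.h>

/* Full 64 x 64 -> 128-bit unsigned product, the job of one-operand mulq.
 * unsigned __int128 is a GCC/Clang extension, not standard C. */
void full_umul(uint64_t x, uint64_t y, uint64_t *hi, uint64_t *lo)
{
    unsigned __int128 p = (unsigned __int128)x * y;
    *hi = (uint64_t)(p >> 64);   /* the half that mulq leaves in %rdx */
    *lo = (uint64_t)p;           /* the half that mulq leaves in %rax */
}

/* Quotient and remainder of the same operands. A compiler can get both
 * from one cqto + idivq pair: quotient in %rax, remainder in %rdx. */
void remdiv(long x, long y, long *qp, long *rp)
{
    *qp = x / y;
    *rp = x % y;
}
```

Multiplying 0xFFFFFFFFFFFFFFFF by 2 gives a high half of 1 and a low half of 0xFFFFFFFFFFFFFFFE. Dividing -7 by 2 gives quotient -3 and remainder -1, since signed division truncates toward zero.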

Control Flow Instructions

Condition Codes

In addition to the integer registers, the CPU maintains a set of single-bit condition code registers describing attributes of the most recent arithmetic or logical operation.

CF: Carry flag. The most recent operation generated a carry out of the most significant bit. Used to detect overflow for unsigned operations.
ZF: Zero flag. The most recent operation yielded zero.
SF: Sign flag. The most recent operation yielded a negative value.
OF: Overflow flag. The most recent operation caused a two’s-complement overflow—either negative or positive.

For example, suppose we used one of the add instructions to perform t = a+b, where a, b, and t are integers. Then the condition codes would be set according to the following C expressions:

Code	C expression	Description
CF	`(unsigned) t < (unsigned) a`	Unsigned overflow
ZF	`(t == 0)`	Zero
SF	`(t < 0)`	Negative
OF	`(a<0 == b<0) && (t<0 != a<0)`	Signed overflow

The leaq instruction does not alter any condition codes, since it is intended to be used in address computations.

Otherwise, all of the listed Arithmetic and Logical instructions cause the condition codes to be set.

For the logical operations, such as xor, CF = 0 and OF = 0. For the shift operations, CF is set to the last bit shifted out, while OF = 0.

The inc and dec instructions set OF and ZF, but they leave CF unchanged.
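The C expressions from the table for t = a+b can be collected into one function that returns all four flags. This is a sketch for 64-bit operands only; `flags_t` and `add_flags` are hypothetical names, and the addition is done on unsigned values so that wraparound is well defined in C.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool cf, zf, sf, of; } flags_t;

/* Condition codes after a 64-bit add computing t = a + b. */
flags_t add_flags(int64_t a, int64_t b)
{
    /* Add as unsigned so the wraparound matches the hardware result. */
    int64_t t = (int64_t)((uint64_t)a + (uint64_t)b);
    flags_t f;
    f.cf = (uint64_t)t < (uint64_t)a;                      /* unsigned overflow */
    f.zf = (t == 0);                                       /* zero */
    f.sf = (t < 0);                                        /* negative */
    f.of = ((a < 0) == (b < 0)) && ((t < 0) != (a < 0));   /* signed overflow */
    return f;
}
```

Adding 1 to INT64_MAX sets OF and SF but not CF, while adding 1 to -1 sets CF and ZF but not OF: the same bit pattern can overflow as unsigned without overflowing as signed, and vice versa.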

CMP and TEST instructions

CMP S, D set flags based on D - S - Compare
- cmpb, cmpw, cmpl, cmpq
TEST S, D set flags based on S & D - Test
- testb, testw, testl, testq

Neither cmp nor test stores the result of its operation; they only set the condition codes.

The cmp operands are listed in reverse order: to compare a with b, write cmp b, a, which sets the condition codes based on a - b.

For TEST, typically the same operand is repeated (e.g., testq %rax,%rax to see whether %rax is negative, zero, or positive). Or one of the operands is a mask indicating which bits should be tested.

Accessing the Condition Codes

There are three common ways of using the condition codes:

Set byte on condition instructions
Conditional jump instructions
Conditional move instructions

Set byte on condition instructions (`SET` instructions)

A set instruction has either one of the low-order single-byte register elements or a single-byte memory location as its destination.

To generate a 32-bit or 64-bit result, we must also use a movzb instruction to zero-extend the result to the desired size.

SET D $\rightarrow$ D ← 1 if condition is true, else D ← 0 - Set byte on condition

Instruction	Synonym	Effect	Set condition
`sete`	`setz`	`D ← ZF`	Equal / zero
`setne`	`setnz`	`D ← ~ZF`	Not equal / not zero
`sets`		`D ← SF`	Negative
`setns`		`D ← ~SF`	Not negative
`setg`	`setnle`	`D ← ~(SF ^ OF) & ~ZF`	Greater
`setge`	`setnl`	`D ← ~(SF ^ OF)`	Greater or equal
`setl`	`setnge`	`D ← SF ^ OF`	Less
`setle`	`setng`	`D ← (SF ^ OF) \| ZF`	Less or equal
`seta`	`setnbe`	`D ← ~CF & ~ZF`	Above
`setae`	`setnb`	`D ← ~CF`	Above or equal
`setb`	`setnae`	`D ← CF`	Below
`setbe`	`setna`	`D ← CF \| ZF`	Below or equal

Example: int comp(data_t a, data_t b), returning a < b, with a in %rdi and b in %rsi (data_t is a 64-bit signed type here)

  
comp:
    cmpq %rsi, %rdi     # Compare a with b by computing a - b
    setl %al            # %al = 1 if a < b, else 0
    movzbl %al, %eax    # Zero-extend the 8-bit result to 32 bits
    ret                 # Return the 0/1 result
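Why setl tests SF ^ OF rather than SF alone can be checked with a small simulation of the cmpq step. The function name `setl_after_cmp` is hypothetical, and the subtraction is done on unsigned values so the wrapped result is well defined in C.

```c
#include <stdbool.h>
#include <stdint.h>

/* Result of setl after cmpq b, a: signed a < b, decoded as SF ^ OF. */
bool setl_after_cmp(int64_t a, int64_t b)
{
    int64_t t = (int64_t)((uint64_t)a - (uint64_t)b);         /* wrapped a - b */
    bool sf = (t < 0);                                        /* sign of the wrapped result */
    bool of = ((a < 0) != (b < 0)) && ((t < 0) != (a < 0));   /* signed overflow of a - b */
    return sf ^ of;
}
```

When a - b does not overflow, OF is 0 and the sign of the result alone decides. When it does overflow, the sign is wrong and OF flips it back: for a = INT64_MIN and b = 1 the wrapped difference is positive, yet the function still reports a < b.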

The same instruction can have several names. Compilers and disassemblers choose arbitrarily among them.

Conditional jump instructions (`J` instructions)

Instruction	Synonym	Jump condition	Description
`jmp Label`		`1`	Direct jump
`jmp *Operand`		`1`	Indirect jump
`je Label`	`jz`	`ZF`	Equal / zero
`jne Label`	`jnz`	`~ZF`	Not equal / not zero
`js Label`		`SF`	Negative
`jns Label`		`~SF`	Nonnegative
`jg Label`	`jnle`	`~(SF ^ OF) & ~ZF`	Greater (signed `>`)
`jge Label`	`jnl`	`~(SF ^ OF)`	Greater or equal (signed `>=`)
`jl Label`	`jnge`	`SF ^ OF`	Less (signed `<`)
`jle Label`	`jng`	`(SF ^ OF) \| ZF`	Less or equal (signed `<=`)
`ja Label`	`jnbe`	`~CF & ~ZF`	Above (unsigned `>`)
`jae Label`	`jnb`	`~CF`	Above or equal (unsigned `>=`)
`jb Label`	`jnae`	`CF`	Below (unsigned `<`)
`jbe Label`	`jna`	`CF \| ZF`	Below or equal (unsigned `<=`)

Direct jump: where jump target is specified as a label in the code. The assembler will compute the offset from the jump instruction to the target label and encode it in the instruction.
Indirect jump: where jump target is specified as a register or memory location. The processor will read the target address from the specified register or memory location at runtime and jump to that address.

Example:

  
jmp *%rax    # Jump to the address stored in %rax
jmp *(%rax)   # Jump to the address stored at the memory location pointed to by %rax

Conditional move instructions (`CMOV` instructions)

Similar to conditional jump instructions, but instead of jumping to a different location, they conditionally move a value from the source operand to the destination register based on the condition codes.

Destination operand must be a register, while source operand can be a register or memory location.

The source and destination can be 16, 32, or 64 bits wide, but not 8 bits. The assembler infers the operand size from the destination register name, so each condition has a single mnemonic (for example, cmovge) that works for all sizes, with no size suffix needed.

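cmov is not always faster than branching: both expressions are evaluated and one result is discarded, which is wasteful if either expression is expensive. It is also invalid when either expression has a side effect or may fault, for example by dereferencing a pointer that could be null.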
Compilers like GCC are conservative: they usually use cmov only when both expressions are very cheap, since they cannot reliably know whether the branch will be predictable at runtime.
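The difference between the two strategies can be written out in C. In this sketch the function names are made up, and whether a compiler actually emits cmov for the second version depends on the compiler and optimization level.

```c
/* Branching version: only one of the two subtractions is executed. */
long absdiff_branch(long x, long y)
{
    if (x < y)
        return y - x;
    return x - y;
}

/* Conditional-move style: compute both candidates, then select one.
 * The final selection is the part a compiler may implement with cmov. */
long absdiff_select(long x, long y)
{
    long lt = y - x;            /* result if x < y */
    long ge = x - y;            /* result otherwise */
    return (x < y) ? lt : ge;
}
```

Both functions return the same value for every input. The second form is safe here only because both subtractions are cheap and free of side effects.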

Switch statements

When a switch statement has multiple cases with values that are close together, the compiler optimizes it by generating a Jump Table rather than a long chain of if-else branches. This reduces the time complexity of the branch from $O(N)$ to $O(1)$.

The same mechanism can be written by hand in C with a GCC extension called Labels as Values: && takes the address of a code label, and goto * jumps to an address stored in a pointer.

  
// 1. Array of void pointers (the jump table)
static void *jt[7] = {
    &&loc_A, &&loc_def, &&loc_B, 
    &&loc_C, &&loc_D, &&loc_def, &&loc_D
};

// 2. The computed goto (Indirect Jump)
goto *jt[index]; 
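The fragment above leaves out the labels and the bounds check. A complete function built around the same table might look like the following sketch. It needs GCC or Clang, since `&&label` and `goto *` are not standard C; the function name `dispatch` and the case bodies are placeholders chosen only for illustration.

```c
/* Requires GCC or Clang: &&label and goto * are extensions.
 * The case bodies are arbitrary placeholders. */
long dispatch(long x, unsigned long index)
{
    static void *jt[7] = {
        &&loc_A, &&loc_def, &&loc_B,
        &&loc_C, &&loc_D, &&loc_def, &&loc_D
    };

    if (index > 6)          /* outside the table: default case */
        goto loc_def;
    goto *jt[index];        /* indirect jump through the table */

loc_A:                      /* index 0 */
    return x * 13;
loc_B:                      /* index 2: falls through into loc_C */
    x += 10;
loc_C:                      /* index 3 */
    return x + 11;
loc_D:                      /* index 4 and 6 share one target */
    return x * x;
loc_def:                    /* index 1, 5, or out of range */
    return 0;
}
```

Missing cases (1 and 5) and duplicate cases (4 and 6) cost nothing extra: they are just repeated entries in the table.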

The Assembly Level (x86-64)

The table lookup and the jump take only a few instructions, using the Imm(, r_i, s) scaled-indexed addressing mode from the operand table.

  
# Assume %rsi holds our computed index
cmpq    $6, %rsi          # Compare index with the largest valid index (6)
ja      .loc_def          # 'ja' (Jump Above) catches > 6 AND negative numbers!
jmp     *.L4(,%rsi,8)     # Indirect jump to Base(.L4) + (Index * 8 bytes)

The Scale Factor is 8 Here: Each table entry is a 64-bit (8-byte) absolute address, so entry i sits at .L4 + 8·i. Position-independent builds typically store 4-byte offsets in the table instead and use a scale of 4.
The * is Mandatory: It marks an indirect jump, whose operand names the location that holds the target address. jmp .L4 would jump to the table itself and execute its bytes as code; jmp *.L4(,%rsi,8) reads the address stored in the table entry and jumps there.
The Unsigned Bounds Trick: The bounds check uses ja (unsigned greater than). A negative index, read as an unsigned number, looks like a huge positive value. A single comparison therefore catches both indices that are too large and indices that are negative, routing both to the default case.
Index Biasing: If your switch cases start at 100 (e.g., case 100:, case 101:), the compiler will inject a subq $100, %reg before the table lookup to shift the index back to 0.
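The bias and the unsigned comparison combine into a single range check. The sketch below assumes a hypothetical switch whose cases run from 100 to 106; the function name `in_table_range` is made up for illustration.

```c
#include <stdbool.h>

/* True when n selects a slot of the 7-entry table, i.e. 100 <= n <= 106.
 * Subtracting the bias in unsigned arithmetic makes any n below 100 wrap
 * to a huge value, so one unsigned comparison rejects both ends. */
bool in_table_range(long n)
{
    unsigned long index = (unsigned long)n - 100UL;   /* like subq $100, %reg */
    return index <= 6;                                /* like cmpq $6 followed by ja to the default */
}
```

For n = 99 the biased index wraps around to the largest unsigned value and fails the check, exactly like n = 107, which gives 7.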


This post is licensed under CC BY 4.0 by the author.