Obsess over every detail. Ask why it works. Ask why it isn’t built another way.

REPEATABLE INSTRUCTIONS!

push    rdi
sub     rsp, 0x90

lea     rax, [rsp + 0x10]
mov     rdi, rax

xor     eax, eax
mov     ecx, 0x80
rep     stosb      ; new instruction 

mov     dword [rsp], 0x05EAF00D

mov     eax, 2
imul    rax, rax, 1

movzx   ecx, word [rsp]

mov     word [rsp + rax + 0x10], cx

mov     eax, 2
imul    rax, rax, 1

movzx   eax, word [rsp + rax + 0x10]

add     rsp, 0x90
pop     rdi
ret

REP STOS - THE MEMORY PAINT BUCKET TOOL

REP STOS stands for REPeat STOre String. STOS is one of several string instructions that accept the “rep” prefix, which repeats that single instruction multiple times. Every rep operation uses rcx/ecx/cx (rcx in 64-bit code) as a counter that determines how many times the instruction runs. Each iteration decrements the counter by 1, and once the counter reaches 0, execution continues with the next instruction.

short main() 
{
    short a;
    int b[6];
    long long c;

    a = 0xbabe;
    c = 0xba1b0ab1edb100d;
    b[1] = a;
    b[4] = b[1] + c;

    return b[4];
}

What does it do?

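Each STOS stores 1, 2, 4, or 8 bytes (the value in AL, AX, EAX, or RAX) into the memory pointed to by RDI.

After each store, RDI automatically advances by that many bytes to point to the next position (assuming the direction flag is clear, which is the normal case)!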

Syntactically:

mov rdi, [address]     ; RDI = where to start writing
mov rax, [value]       ; RAX/EAX/AX/AL = what value to write
mov rcx, [count]       ; RCX/ECX/CX = how many times to repeat
rep stosb              ; Fill memory! (byte-sized)

Real example - Let’s break down this code:

push    rdi
sub     rsp, 0x90

; rax = &var_88
lea     rax, [rsp + 0x10]
mov     rdi, rax

; memset(var_88, 0, 0x80)
xor     eax, eax
mov     ecx, 0x80
rep     stosb

; *(dword*)rsp = 0x05EAF00D
mov     dword [rsp], 0x05EAF00D

; rax = 2
mov     eax, 2
imul    rax, rax, 1

; ecx = *(word*)rsp
movzx   ecx, word [rsp]

; *(word*)(rsp + rax + 0x10) = cx
mov     word [rsp + rax + 0x10], cx

; rax = 2
mov     eax, 2
imul    rax, rax, 1

; eax = *(word*)(rsp + rax + 0x10)
movzx   eax, word [rsp + rax + 0x10]
add     rsp, 0x90
pop     rdi
ret

Step 1: Setup the stack

push    rdi              ; Save RDI (rep stosb overwrites it; callee-saved in the Microsoft x64 convention)
sub     rsp, 0x90        ; Allocate 144 bytes on the stack

Step 2: Get address and prepare to fill memory

lea     rax, [rsp + 0x10]  ; RAX = address of buffer (starting at rsp+0x10)
mov     rdi, rax           ; RDI = destination address for STOS

Step 3: THE REP STOS MAGIC HAPPENS HERE!

xor     eax, eax          ; EAX = 0 (this is what we'll fill with)
mov     ecx, 0x80         ; ECX = 128 (how many bytes to fill)
rep     stosb             ; Fill 128 bytes with 0!

What rep stosb does:

  1. Store AL (which is 0) at [RDI]
  2. RDI++ (move to next byte)
  3. RCX-- (one less byte to fill; writing ECX earlier zeroed the upper half of RCX)
  4. Repeat until RCX = 0

So this is basically doing: memset(buffer, 0, 128) in one instruction!
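
The four steps above can be modeled in C. This is only a sketch of the semantics, not how the CPU implements it: the function name `rep_stosb_model` and its parameters are made up for illustration, and the direction flag is assumed to be clear.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of "rep stosb" with DF = 0.
 * rdi: destination pointer, al: fill byte, rcx: repeat count.
 * Returns the final value of RDI (one past the last byte written). */
uint8_t *rep_stosb_model(uint8_t *rdi, uint8_t al, size_t rcx)
{
    while (rcx != 0) {   /* REP checks the counter before each iteration */
        *rdi = al;       /* 1. store AL at [RDI] */
        rdi++;           /* 2. RDI++ */
        rcx--;           /* 3. RCX-- */
    }                    /* 4. repeat until RCX == 0 */
    return rdi;
}
```

Calling `rep_stosb_model(buffer, 0, 128)` has the same effect on memory as the rep stosb in Step 3. Note that a count of 0 writes nothing at all.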

Step 4: Store a value at the beginning of the stack

mov     dword [rsp], 0x05EAF00D  ; Put 0x05EAF00D at rsp

Memory now looks like:

[rsp]      = 0x05EAF00D  (4 bytes)
[rsp+0x10] = 0x00000000  (128 bytes of zeros from rep stosb)

Step 5: Copy a word (2 bytes) from rsp to buffer

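mov     eax, 2                        ; EAX = 2 (most likely the element size, e.g. sizeof(short))
imul    rax, rax, 1                   ; RAX = 2 * 1 = 2 (element size * index 1 = byte offset 2)
movzx   ecx, word [rsp]               ; ECX = lower 2 bytes of 0x05EAF00D = 0xF00D
mov     word [rsp + rax + 0x10], cx   ; bytes 2 and 3 of the buffer = 0xF00D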

So now our buffer that was all zeros has 0xF00D stored at position 2:

buffer[0] = 0x00
buffer[1] = 0x00
buffer[2] = 0x0D  (lower byte of 0xF00D)
buffer[3] = 0xF0  (upper byte of 0xF00D)
buffer[4] = 0x00
...

Step 6: Read that value back

mov     eax, 2                        ; EAX = 2
imul    rax, rax, 1                   ; RAX = 2
movzx   eax, word [rsp + rax + 0x10] ; EAX = buffer[2] = 0xF00D (return value!)

Step 7: Cleanup

add     rsp, 0x90    ; Deallocate stack space
pop     rdi          ; Restore RDI
ret                  ; Return (EAX = 0xF00D)
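
Putting the seven steps together, the function behaves roughly like the C below. This is a guess at source that could produce such a listing, not the original code: the names `reconstructed`, `a`, and `b` are invented, and unsigned types are used so that the truncation to 16 bits is well defined.

```c
#include <string.h>

/* Hypothetical reconstruction of the disassembled function. */
unsigned short reconstructed(void)
{
    unsigned int a;            /* the dword at [rsp] */
    unsigned short b[64];      /* the 0x80-byte buffer at [rsp + 0x10] */

    memset(b, 0, sizeof b);    /* rep stosb: 128 zero bytes */
    a = 0x05EAF00D;            /* mov dword [rsp], 0x05EAF00D */
    b[1] = (unsigned short)a;  /* low word 0xF00D lands at byte offset 2 */
    return b[1];               /* movzx eax, word [rsp + 2 + 0x10] */
}
```

With 2-byte elements, index 1 sits at byte offset 2 * 1 = 2, which would explain the `mov eax, 2` / `imul rax, rax, 1` pair in Steps 5 and 6.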

Why is REP STOS fast?

Instead of writing a loop like:

for(int i = 0; i < 128; i++) {
    buffer[i] = 0;
}

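REP STOS does it all in ONE instruction. The CPU runs the loop internally, so there are no separate compare, increment, and branch instructions to fetch and decode on every iteration, and many modern x86 processors have a fast-string optimization that stores in chunks larger than one element. For small fills the startup cost can outweigh that, so it is not automatically faster than a well-optimized loop.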
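
To experiment with this from C, one option is to wrap rep stosb in inline assembly. The sketch below assumes GCC or Clang on x86-64 and falls back to memset everywhere else. `fill_bytes` is a made-up name, and the code relies on the calling convention's guarantee that DF is clear on function entry.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: fill `count` bytes at `dst` with `value`. */
static void fill_bytes(void *dst, unsigned char value, size_t count)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    /* "+D" binds dst to RDI, "+c" binds count to RCX, "a" puts value in AL.
     * Both RDI and RCX are modified by the instruction, hence the "+". */
    __asm__ volatile ("rep stosb"
                      : "+D"(dst), "+c"(count)
                      : "a"(value)
                      : "memory");
#else
    /* Assumption: on other compilers or CPUs, plain memset stands in. */
    memset(dst, value, count);
#endif
}
```

Timing this against a plain loop and against memset for different sizes is a quick way to see where the single instruction wins on a particular machine.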

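Common use cases: zero-initializing local buffers and arrays (as in the example above) and inlined memset-style fills.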

Quick reference:

rep stosb    ; Fill RCX bytes with AL
rep stosw    ; Fill RCX words (2 bytes) with AX
rep stosd    ; Fill RCX dwords (4 bytes) with EAX
rep stosq    ; Fill RCX qwords (8 bytes) with RAX

Remember: RDI = destination, RAX/EAX/AX/AL = value to fill, RCX/ECX/CX = count!
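
One detail in the quick reference is easy to miss: RCX counts elements, not bytes. The C model below sketches that for all four widths, assuming DF is clear. `rep_stos_model` and its `width` parameter are invented for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of rep stosb/stosw/stosd/stosq with DF = 0.
 * width is the element size in bytes: 1, 2, 4, or 8.
 * Returns the number of bytes written (rcx * width). */
size_t rep_stos_model(uint8_t *rdi, uint64_t rax, size_t rcx, size_t width)
{
    size_t written = 0;
    while (rcx != 0) {
        /* Store the low `width` bytes of RAX, least significant byte
         * first, because x86 is little-endian. */
        for (size_t i = 0; i < width; i++)
            rdi[i] = (uint8_t)(rax >> (8 * i));
        rdi += width;      /* RDI advances by the element size */
        written += width;
        rcx--;             /* RCX counts elements, not bytes */
    }
    return written;
}
```

So a rep stosd with RCX = 2 writes 8 bytes, and a rep stosq with the same count writes 16.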

REP MOVS - Repeat Move Data String to String

MOVS is another instruction that accepts the rep prefix to repeat it multiple times. It is an instruction in its own right and can be used without the rep prefix, in which case it copies a single element. Like every rep operation, it uses rcx/ecx/cx as the counter: each iteration decrements it, and once it reaches 0, execution continues with the next instruction.

Unlike MOV, MOVS can move memory to memory, but only from the address in RSI (the source) to the address in RDI (the destination). Both registers advance after each element.

Quick reference:

rep movsb    ; Copy RCX bytes from [RSI] to [RDI]
rep movsw    ; Copy RCX words (2 bytes) from [RSI] to [RDI]
rep movsd    ; Copy RCX dwords (4 bytes) from [RSI] to [RDI]
rep movsq    ; Copy RCX qwords (8 bytes) from [RSI] to [RDI]

Remember: RSI = source, RDI = destination, RCX/ECX/CX = count!
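
A C model makes the copy order explicit. This sketch assumes DF is clear, and `rep_movsb_model` is a made-up name. It also shows one consequence of copying strictly front to back: when the destination overlaps the source just above it, the first byte gets smeared across the range, which is not what memmove would do.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of "rep movsb" with DF = 0. */
void rep_movsb_model(uint8_t *rdi, const uint8_t *rsi, size_t rcx)
{
    while (rcx != 0) {
        *rdi++ = *rsi++;   /* copy one byte, then advance both pointers */
        rcx--;             /* one less byte to copy */
    }
}
```

For example, with a buffer holding "ABCDE", copying 4 bytes from `buf` to `buf + 1` leaves "AAAAA" behind, because each byte read was just written by the previous iteration.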

DF (Direction Flag)

The direction flag (DF) controls which way the string instructions walk through memory, and that applies to rep stos as well as rep movs. DF is a control flag in RFLAGS: when it is 0, RSI and RDI are incremented after each element; when it is 1, they are decremented, so the copy runs backwards toward lower addresses.

Why this matters for security: If an attacker can control DF and start it copying backwards when the programmer is expecting it to copy forward, that can lead to some memory corruption!

How to control DF:

cld    ; Clear Direction Flag (forward: increment RSI/RDI)
std    ; Set Direction Flag (backward: decrement RSI/RDI)

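Always make sure you know which direction you’re going! Most of the time you want cld to copy forward. Both the System V and Microsoft x64 calling conventions expect DF to be clear on function entry and return, so if you ever use std, follow it with cld before you return or call anything else.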
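
To see why a flipped DF is dangerous, here is a small C model of rep stosb that takes the flag as a parameter. It is a sketch only: `rep_stosb_df_model` and the `df` argument are invented for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of "rep stosb" that honors the direction flag.
 * df == 0 models cld (RDI++), df != 0 models std (RDI--).
 * Returns the final value of RDI. */
uint8_t *rep_stosb_df_model(uint8_t *rdi, uint8_t al, size_t rcx, int df)
{
    while (rcx != 0) {
        *rdi = al;                      /* store AL at [RDI] */
        rdi = df ? rdi - 1 : rdi + 1;   /* step down or up by one byte */
        rcx--;
    }
    return rdi;
}
```

With `df` set, a fill that starts at `buf + 4` with a count of 3 writes `buf[4]`, `buf[3]`, and `buf[2]`. If the code was written expecting a forward fill, everything below the start address is memory it never meant to touch.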