Symbolic Execution with Angr - part 2

Table of Contents

Overcoming path explosion, injecting bitvectors into symbolic registers using angr

Introduction #

In the previous blog, we looked into what Symbolic Execution is and how we can utilize the Angr framework in finding execution paths to reach desirable states in programs. In this blog, we will look at the limitations that Symbolic Execution imposes and how we can use Angr to efficiently navigate through these limitations. We shall also introduce symbolic registers and how to inject symbols directly into registers.

Path Explosion #

In our previous example, we had a simple program with only 2 possible execution paths, one printing out “good job” while the other “try again”.

In a more practical case with a complex program with very many states, execution paths grow exponentially. In the case of unbounded loops in the program, we could even have infinite execution paths making it unfeasible to analyze all these paths. This is the problem of path explosion

Let us see how we can overcome path explosion in the second binary example using Angr

02_angr_find_condition #

This is the 3rd binary file in the exercise

➜  02_angr_find_condition git:(master) ✗ ls
02_angr_find_condition.c.jinja  generate.py  __init__.py  scaffold02.py
➜  02_angr_find_condition git:(master) ✗

Run the binary to see its functionality.

➜  dist git:(master) ✗ ./02_angr_find_condition
Enter the password: asdasd
Try again.

It is similar to the previous binary that we looked at. Let us disassemble the main function of the binary in Cutter to get more information.

Well, it’s a bloody mess.

There are tonnes of branches leading to several ‘Try again’ and ‘Good job’ states.

It would be tedious noting down the addresses of all Good job states. Instead, we can tell Angr to avoid or consider a state based on the output that the state gives.

There is a scaffold.py file in each binary file’s directory that contains skeleton code that will guide you in your solution. We will slightly modify the skeleton code to come up with our solution.

Solution #

Import necessary libraries & set up an Angr project.

import angr
import sys

def main():
    path_to_binary = '/opt/angr_ctf/dist/02_angr_find_condition'
    project = angr.Project(path_to_binary)
    print("[+] Loaded the binary to angr")
    initial_state = project.factory.entry_state()
    simulation = project.factory.simgr(initial_state)

Building out the logic

    def successful(state): #1
        stdout_output = state.posix.dumps(sys.stdout.fileno()) #2
        if b'Good Job.' in stdout_output: #3
            return True

    def abort(state):
        stdout_output = state.posix.dumps(sys.stdout.fileno())
        if b'Try again.' in stdout_output:
            return False

The 2 functions defined - successful and abort, carry out our program’s logic. They simply check if you have found the state that you are looking for.

We begin by defining our function which is going to take in a state#1. We then take what is printed to stdout and assign it to the stdout_output variable #2. The output of state.posix.dumps is a python bytes object, so we use b’ ‘ to capture the string ‘Good job’ from the stdout_output variable and return True if found #3.

We do the same for the abort function where we check for the ‘Try again’ string.

Exploring code paths


simulation.explore(find=successful, avoid=abort)

We can then tell Angr what states to explore and what to avoid exploring. simulation.explore has 2 arguments, find and avoid. They take in an address or a list of addresses. Our program is going to start exploring based on those functions that accept a state and return whether it matches or not

House-keeping

    if simulation.found:
        solution_state = simulation.found[0]
        solution = solution_state.posix.dumps(sys.stdin.fileno())
        print("[+] Solution found! : {}".format(solution.decode('utf-8')))
    else:
        raise Exception("Could not find the solution")

if __name__ == "__main__":
    main()

We finish up by printing the solution once the simulation manager finds the desired state.

Our final script:


import angr
import sys

def main():
    path_to_binary = '/opt/angr_ctf/dist/02_angr_find_condition'
    project = angr.Project(path_to_binary)
    print("[+] Loaded the binary to angr")
    initial_state = project.factory.entry_state()
    simulation = project.factory.simgr(initial_state)

    # define a function that checks if you have found the state you are looking for

    def successful(state):
        # dump what the binary has printed out into a string
        stdout_output = state.posix.dumps(sys.stdout.fileno())
        if b'Good Job.' in stdout_output:
            return True

    def abort(state):
        stdout_output = state.posix.dumps(sys.stdout.fileno())
        if b'Try again.' in stdout_output:
            return False

    simulation.explore(find=successful, avoid=abort)

    if simulation.found:
        solution_state = simulation.found[0]
        solution = solution_state.posix.dumps(sys.stdin.fileno())
        print("[+] Success! Solution is: {}".format(solution.decode('utf-8')))

    else:
        raise Exception("Could not find the solution")


if __name__ == "__main__":
    main()

Run the code:

We get our solution :)

03_angr_symbolic_registers #

Let’s crank up the difficulty and see more Angr functionality! We proceed to disassemble our binary.

Various functions are of interest. We can begin by inspecting the main() function. Cutter gives us a nice decompiled output that can help us understand better what our program does under the hood.

Program analysis #

First, we can let us get a general overview of what happens in the main function to understand the functionality of the program

There are 3 variables that get initialized in the main function. Technically 4, but we ignore int32_t var_4h as it is used for other procedures out of scope in this tutorial.

Immediately after initializing, there is a call to get_user_input() function which we will have a look at shortly. Our input is stored in the 3 variables which are placed in registers eax, ebx, edx and the complex_function functions are used to perform some operations on these variables.

Let us have a look at the get_user_input() function

In the scanf function, we can see these 3 badboys - “%x %x %x”, which means that we are taking 3 inputs as hexadecimal numbers. These are the values moved to the registers we noted earlier.

So how are we going to approach our exploit? Let us walk through step by step

Import necessary libraries & set up an Angr project

import angr
import sys
import claripy

def main():
    path_to_binary = '/opt/angr_ctf/dist/02_angr_find_condition'
    project = angr.Project(path_to_binary)
    print("[+] Loaded the binary to angr")

We have an additional library claripy different from before. We will see its usage later.

Initialize the starting address.

In the previous examples, we had Angr start at the beginning of the program. But due to Angr’s functionality, format strings are not handled well and we need to skip the call to scanf that specified the %x format strings to pick our hex input.

Right after the get_user_input(), we pick the address of the next instruction

which is 0x08048980

start_address = 0x08048980
initial_state = project.factory.blank_state(addr=start_address)

Note that we used blank_state() instead of entry_state(). This effectively creates a new address that will be the starting point for Angr

Symbol injection into registers.

Remember the 3 registers that we noted earlier (eax, ebx, edx)? We are going to replace the value inside the register (which we don’t know) with our symbol, by injecting into it. Before doing so, let us look at a key concept: Bitvectors - simply put, it is an array of bits.

Our registers are 32 bits in length, so our variables are going to be just as much long, represented as bitvectors.

This will look as follows:

    size = 32
    pass1 = claripy.BVS('pass1', size)
    pass2 = claripy.BVS('pass2', size)
    pass3 = claripy.BVS('pass3', size)

Note that we now make use of the claripy library to create our bitvectors which uses the BVS method(bitvector symbol).

Let us now take our symbols and inject them into the registers. We are going to set the registers based on the starting address initial_state, which essentially marks the registers.

    initial_state.regs.eax = pass1
    initial_state.regs.ebx = pass2
    initial_state.regs.edx = pass3

    simulation = project.factory.simgr(initial_state)

House-keeping

Our routine functions for finding the states we need and should avoid.

    def successful(state):
        stdout_output = state.posix.dumps(sys.stdout.fileno())
        if b'Good Job.' in stdout_output:
            return True;

    def abort(state):
        stdout_output = state.posix.dumps(sys.stdout.fileno())
        if b'Try again.' in stdout_output:
            return True

    simulation.explore(find=is_successful, avoid=should_abort)

Prepare the output

There is a possibility of getting multiple solutions. solution_state.solver.eval ensures that we pick one solution for each symbol and also, formats it into a hexadecimal value

   if simulation.found:
        solution_state = simulation.found[0]
        pass1 = format(solution_state.solver.eval(pass1), 'x')
        pass2 = format(solution_state.solver.eval(pass2), 'x')
        pass3 = format(solution_state.solver.eval(pass3), 'x')

        solution = pass1 + " " + pass2 + " " + pass3 + " "
        print("[+] Success! Solution is: {}".format(solution))

    else:
        raise Exception("Could not find the solution")

if __name__ == "__main__":
    main()

Solution #

Our final script

import angr
import sys
import claripy

def main():
    path_to_binary = '/home/saudi/Desktop/CTF/angrchalls/angr_ctf/dist/03_angr_symbolic_registers'
    project = angr.Project(path_to_binary)
    print("[+] Loaded the binary to angr")
    base_address = 0x8048980
    initial_state = project.factory.blank_state(addr=base_address)

    size = 32
    pass1 = claripy.BVS('pass1', size)
    pass2 = claripy.BVS('pass2', size)
    pass3 = claripy.BVS('pass3', size)


    initial_state.regs.eax = pass1
    initial_state.regs.ebx = pass2
    initial_state.regs.edx = pass3

    simulation = project.factory.simgr(initial_state)

    def successful(state):
        stdout_output = state.posix.dumps(sys.stdout.fileno())
        if b'Good Job.' in stdout_output:
            return True;

    def abort(state):
        stdout_output = state.posix.dumps(sys.stdout.fileno())
        if b'Try again.' in stdout_output:
            return True

    simulation.explore(find=successful, avoid=abort)
    if simulation.found:
        solution_state = simulation.found[0]
        pass1 = format(solution_state.solver.eval(pass1), 'x')
        pass2 = format(solution_state.solver.eval(pass2), 'x')
        pass3 = format(solution_state.solver.eval(pass3), 'x')

        solution = pass1 + " " + pass2 + " " + pass3 + " "
        print("[+] Success! Solution is: {}".format(solution))

    else:
        raise Exception("Could not find the solution")

if __name__ == "__main__":
    main()

Our solution :)

We come to the end of the tutorial having covered some more advanced concepts. See you in the next where I take you through hell :)