Running arm64 code on your Intel Mac 🖥 using Unicorn emulator

“Unicorn is a lightweight multi-platform, multi-architecture CPU emulator framework”, as its official website puts it. How is it useful? I’ve used it to trace and analyze heavily obfuscated, deeply nested code in iOS arm64 binaries, so it can be a very handy tool for dynamic code analysis. You can run code compiled for an architecture that differs from your host machine and see the results right away.

Demo app

Here is a very basic app I’ve made for this demo. It asks the user for a key and compares it with a pre-defined XOR-encrypted key. If they match, it prints “Success!”; otherwise it prints “Wrong key.”

mbp:~ ./demo
Enter key:
AAAAAAAAAA
Wrong key.

The source code:

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

#define KEY_LEN 11

const char enc_key[] = { 0x32, 0x24, 0x22, 0x33, 0x24, 0x35, 0x1e, 0x2a, 0x24, 0x38, 0x41 }; // "secret_key" xor 0x41

int check_key(char *key) {
    char dec_key[KEY_LEN];
    for (int i=0; i<KEY_LEN; i++) {
        dec_key[i] = enc_key[i] ^ 0x41;
    }
    return strcmp(dec_key, key);
}

int main(int argc, char* argv[]) {
    printf("Enter key:\n");
    char key[KEY_LEN];
    scanf("%10s", key);
    if (check_key(key) == 0) {
        printf("Success!\n");
    } else {
        printf("Wrong key.\n");
    }
    return 0;
}
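
For readers who prefer to see the check outside of C, here is a rough Python model of check_key. The names mirror the C source, but this is only a sketch: strcmp is simplified to comparing the bytes before the first NUL, and the function returns a bool rather than strcmp's integer result.

```python
# Rough Python model of check_key from the C source above.
ENC_KEY = bytes([0x32, 0x24, 0x22, 0x33, 0x24, 0x35, 0x1E, 0x2A, 0x24, 0x38, 0x41])
XOR_BYTE = 0x41


def decode_key(enc: bytes = ENC_KEY) -> bytes:
    """XOR every byte with 0x41, like the loop in check_key."""
    return bytes(b ^ XOR_BYTE for b in enc)


def check_key(user_input: bytes) -> bool:
    """Simplified strcmp: compare only the bytes before the first NUL."""
    expected = decode_key().split(b"\x00", 1)[0]
    return user_input.split(b"\x00", 1)[0] == expected


if __name__ == "__main__":
    print(decode_key())          # b'secret_key\x00'
    print(check_key(b"A" * 10))  # False
```

The last encrypted byte, 0x41, decodes to 0x00, which is what gives dec_key its NUL terminator.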

To showcase the power of emulation, I will compile it as an arm64 binary using the iOS SDK. My host machine is an x86_64 Intel Mac. Xcode is needed for compilation. (In reality, the target platform, such as iOS, doesn’t matter much because we are emulating the CPU and not the whole platform with a binary loader, dynamic linker, etc. In theory, though, the calling convention in the generated assembly may differ from platform to platform.)

mbp:~ clang demo.c -o demo -arch arm64 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS.sdk -fno-stack-protector

I’ve added the -fno-stack-protector option, which disables stack canaries, just to make this demo a bit easier.

If everything went right, the result is a fully functional iOS arm64 binary:

mbp:~ file demo
demo: Mach-O 64-bit executable arm64

Some assembly

Here is the disassembly of the check_key function (as seen by objdump):

mbp:~ objdump --disassemble-symbols=_check_key demo

0000000100007e78 <_check_key>:
100007e78: sub  sp, sp, #48
100007e7c: stp  x29, x30, [sp, #32]
100007e80: add  x29, sp, #32
100007e84: stur x0, [x29, #-8]
100007e88: str  wzr, [sp, #8]
100007e8c: ldr  w8, [sp, #8]
100007e90: subs w8, w8, #11
100007e94: b.ge 0x100007ed0 <_check_key+0x58>
100007e98: ldrsw    x9, [sp, #8]
100007e9c: adrp x8, 0x100007000 <_check_key+0x24>
100007ea0: add  x8, x8, #3972
100007ea4: ldrsb    w8, [x8, x9]
100007ea8: mov  w9, #65
100007eac: eor  w8, w8, w9
100007eb0: ldrsw    x10, [sp, #8]
100007eb4: add  x9, sp, #13
100007eb8: add  x9, x9, x10
100007ebc: strb w8, [x9]
100007ec0: ldr  w8, [sp, #8]
100007ec4: add  w8, w8, #1
100007ec8: str  w8, [sp, #8]
100007ecc: b    0x100007e8c <_check_key+0x14>
100007ed0: ldur x1, [x29, #-8]
100007ed4: add  x0, sp, #13
100007ed8: bl   0x100007f78 <_strcmp+0x100007f78>
100007edc: ldp  x29, x30, [sp, #32]
100007ee0: add  sp, sp, #48
100007ee4: ret

Instead of doing static analysis, we will emulate this piece of code to get the decrypted value of enc_key: the secret key that user input is compared against.
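
For contrast, the static route would start from the adrp/add pair at 0x100007e9c, which builds the address the loop reads from. Here is a small sketch of that arithmetic. The helper names are illustrative, and it assumes a thin (single-architecture) image loaded at 0x100000000 with file offset 0 mapped to that address, which is the same assumption the emulator script later in this post relies on.

```python
# Static view of where the loop reads its encrypted bytes from.
IMAGE_BASE = 0x1_0000_0000   # assumed load address of the Mach-O image
ADRP_PAGE = 0x1_0000_7000    # target of `adrp x8, 0x100007000`
ADD_IMM = 3972               # immediate of `add x8, x8, #3972`
KEY_LEN = 11


def enc_key_address() -> int:
    """Address held in x8 after the adrp/add pair."""
    return ADRP_PAGE + ADD_IMM


def enc_key_file_offset() -> int:
    """Offset into the file, assuming offset 0 maps to IMAGE_BASE."""
    return enc_key_address() - IMAGE_BASE


def read_enc_key(image: bytes) -> bytes:
    """Slice the KEY_LEN encrypted bytes out of a raw image."""
    off = enc_key_file_offset()
    return image[off:off + KEY_LEN]


if __name__ == "__main__":
    print(hex(enc_key_address()))      # 0x100007f84
    print(hex(enc_key_file_offset()))  # 0x7f84
```

Passing the contents of the demo binary to read_enc_key should then return the encrypted bytes, provided the layout assumption holds for your build.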

If I were using a debugger, I would typically put a breakpoint at address 0x100007ed8, the strcmp call that actually performs the string comparison, and inspect the registers. But here we are analyzing a binary built for a different target architecture, so we can’t run or debug it directly.

We know strcmp takes two arguments. According to the arm64 calling convention, the first 8 arguments are passed in registers x0-x7.

Right before the strcmp call there are two instructions. ldur x1, [x29, #-8] loads into x1 the value stored at the address in x29 minus 8. add x0, sp, #13 adds 13 to sp (the stack pointer) and stores the result in x0. Matching these against the source, x0 should hold the address of dec_key (a local buffer on the stack) and x1 the address of key (the argument saved at the start of the function by stur x0, [x29, #-8]).
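
The offsets in the function prologue pin down where everything lives on the stack. The sketch below redoes that arithmetic for a given stack pointer at function entry. The function and key names are illustrative, and the sample value 0x900021000 is the stack top that the emulator script further down uses.

```python
# Frame layout of check_key, derived from the disassembly above.
def frame_layout(sp_at_entry: int) -> dict:
    sp = sp_at_entry - 48          # sub sp, sp, #48
    x29 = sp + 32                  # add x29, sp, #32
    return {
        "loop_counter_i": sp + 8,  # str wzr, [sp, #8]
        "dec_key": sp + 13,        # add x0, sp, #13
        "saved_key_ptr": x29 - 8,  # stur x0, [x29, #-8]
        "saved_x29_x30": sp + 32,  # stp x29, x30, [sp, #32]
    }


if __name__ == "__main__":
    for name, addr in frame_layout(0x9_0002_1000).items():
        print(f"{name}: {addr:#x}")
```

With that entry value, dec_key lands at 0x900020fdd, which is the x0 value in the emulator output at the end of this post.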

Let’s run this piece of code in an emulator and dump the contents of x0 and x1 right before the strcmp call. We stop before the call because we are not loading the C runtime library into the emulator, so the strcmp stub does not point to a real function and cannot work. Making it work would require re-binding function stubs, which is out of the scope of this post.

Emulator

Create a new virtual environment, install all the dependencies using pip:

mbp:~ python3 -m venv .venv/ && source .venv/bin/activate
(.venv) mbp:~ pip install unicorn capstone hexdump

Capstone is a multi-architecture disassembly framework. I will use it to disassemble and log instructions on the fly.

Here is the fully working emulator code. Let’s review it part by part.

#!/usr/bin/env python3

from hexdump import hexdump
from unicorn import *
from unicorn.arm64_const import *
from capstone import *

# 1
BASE_ADDR = 0x1_0000_0000 # base address
BASE_SIZE = 100 * 1024 # enough memory to fit the binary image

HEAP_ADDR = 0x5_0000_0000 # arbitrary address
HEAP_SIZE = 0x21_000 # some default heap size

STACK_ADDR = 0x9_0000_0000 # arbitrary address
STACK_SIZE = 0x21_000 # some default stack size
STACK_TOP = STACK_ADDR + STACK_SIZE # stack grows downwards

# 6
def hook_code(uc, address, size, user_data):
    code = BINARY[address-BASE_ADDR:address-BASE_ADDR+size]
    for i in md.disasm(code, address):
        print("0x%x:\t%s\t%s" % (i.address, i.mnemonic, i.op_str))
        # stop emulation when function returns
        if i.mnemonic == "ret":
            uc.emu_stop()
    return True


try:
    # 2
    print("[+] Init")
    md = Cs(CS_ARCH_ARM64, CS_MODE_ARM)
    mu = Uc(UC_ARCH_ARM64, UC_MODE_ARM)

    # 3
    print("[+] Create memory segments")
    mu.mem_map(BASE_ADDR, BASE_SIZE)
    mu.mem_map(STACK_ADDR, STACK_SIZE)
    mu.mem_map(HEAP_ADDR, HEAP_SIZE)

    # 4
    print("[+] Load and map binary")
    BINARY = open("./demo", "rb").read()
    mu.mem_write(BASE_ADDR, BINARY)

    # 5
    print("[+] Add hooks")
    mu.hook_add(UC_HOOK_CODE, hook_code)

    # 7
    print("[+] Setup stack pointer")
    mu.reg_write(UC_ARM64_REG_SP, STACK_TOP)

    # 8
    # write our input to heap
    mu.mem_write(HEAP_ADDR, b"A" * 10)
    mu.reg_write(UC_ARM64_REG_X0, HEAP_ADDR)

    # 9
    print("[+] Start emulation")
    start_addr = 0x1_0000_7e78 # check_key
    end_addr = 0x1_0000_7ed8 # strcmp
    mu.emu_start(start_addr, end_addr)

    # 10
    # print x0 and x1 values
    print("[+] x0: 0x%x" % (mu.reg_read(UC_ARM64_REG_X0)))
    hexdump(mu.mem_read(mu.reg_read(UC_ARM64_REG_X0), 16))

    print("[+] x1: 0x%x" % (mu.reg_read(UC_ARM64_REG_X1)))
    hexdump(mu.mem_read(mu.reg_read(UC_ARM64_REG_X1), 16))  

    print("[+] Done")
except UcError as err:
    print("[E] %s" % err)

Let’s break this down.

1. Here, I set up the addresses of the basic memory segments used in the emulation. BASE_ADDR is the address where our binary will be loaded. BASE_SIZE should be large enough to hold the entire binary. HEAP_ADDR and STACK_ADDR are the heap and stack addresses, each with an arbitrary size of 0x21000. If we ever exhaust heap or stack memory during emulation (and probably crash), we can always increase these values and restart. Unicorn is a CPU emulator: it will not grow our stack or heap dynamically. That’s the job of the OS.
2. Initialize the Unicorn and Capstone engines with the *_ARCH_ARM64 architecture and *_MODE_ARM mode.
3. Create our three memory segments: main binary, heap, and stack, with their corresponding sizes.
4. Read our compiled arm64 demo binary and write it into the mapped memory at BASE_ADDR.
5. Set up the hook. Here I’m using UC_HOOK_CODE to hook each instruction, then disassemble and print it in the hook_code function. There are multiple hooks available: memory read/write hooks, a CPU interrupt hook (I’ve used this one to trace syscalls), etc.
6. Our hook function, which disassembles code using Capstone. It also checks if we have reached a ret instruction. At that point we can probably stop the emulation, which is helpful if we are only interested in a single function.
7. Set the initial value of the stack pointer, which should point to the top of the stack because the stack grows downwards.
8. Our check_key function takes a single argument, which is passed through the x0 register. Here we simulate user input by writing AAAAAAAAAA (10 * A) into the heap and placing a pointer to the start of the heap into x0.
9. Start emulation. 0x100007e78 is the address where check_key starts and where we want the emulation to begin. 0x100007ed8 is the address of the strcmp call, where we want the emulation to end. Unicorn stops when it reaches that address, before executing the instruction there.
10. After emulation ends, we read the addresses held in x0 and x1 and dump the memory they point to.
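
About the segment sizes from the first step: as far as I know, Unicorn’s mem_map expects both the address and the size to be multiples of 4 KiB and rejects other values, which is why the sizes above are round numbers. Here is a hedged sketch of deriving the mapping size from the file length instead of hard-coding it. align_up, image_map_size and PAGE_SIZE are hypothetical helpers, not part of Unicorn.

```python
PAGE_SIZE = 0x1000  # 4 KiB granularity assumed for mem_map


def align_up(value: int, alignment: int = PAGE_SIZE) -> int:
    """Round value up to the next multiple of alignment (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)


def image_map_size(binary: bytes) -> int:
    """Smallest page-aligned size that fits the whole file."""
    return align_up(len(binary))


if __name__ == "__main__":
    print(hex(align_up(100 * 1024)))         # already aligned: 0x19000
    print(hex(image_map_size(b"\x00" * 5)))  # 0x1000
```

In the script, image_map_size(BINARY) could stand in for BASE_SIZE, at the cost of reading the file before mapping memory.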

Output

Here we can see a successful run of the emulator, with our secret_key value dumped to the console!

(.venv) mbp:~ ./demo_emu.py
[+] Init
[+] Map memory
[+] Load and map binary
[+] Add hooks
[+] Setup stack pointer
[+] Starting at: 0x100007e78
[+] x0: 0x900020fdd
00000000: 73 65 63 72 65 74 5F 6B  65 79 00 00 00 00 00 05  secret_key......
[+] x1: 0x500000000
00000000: 41 41 41 41 41 41 41 41  41 41 00 00 00 00 00 00  AAAAAAAAAA......
[+] Done
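
The dump is 16 raw bytes, so the string itself is whatever precedes the first NUL. Here is a small sketch of pulling it out. The bytes are copied from the x0 dump above, and c_string is a hypothetical helper.

```python
def c_string(raw: bytes) -> bytes:
    """Return the bytes before the first NUL terminator."""
    return raw.split(b"\x00", 1)[0]


# The 16 bytes shown for x0 in the output above.
X0_DUMP = bytes.fromhex("73 65 63 72 65 74 5F 6B 65 79 00 00 00 00 00 05")

if __name__ == "__main__":
    print(c_string(X0_DUMP).decode("ascii"))  # secret_key
```

The trailing 05 is not part of the key. dec_key is 11 bytes long, so the dump runs into the next stack slot, which holds the saved key pointer 0x500000000 stored by stur x0, [x29, #-8].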
