Running arm64 code on your Intel Mac 🖥 using Unicorn emulator
“Unicorn is a lightweight multi-platform, multi-architecture CPU emulator framework”, according to the official website. How is it useful? I’ve used it to trace and analyze heavily obfuscated and deeply nested code in iOS arm64 binaries, so it can be a very nice tool for dynamic code analysis. You can run code compiled for an architecture that differs from your host computer’s and instantly see the results.
Demo app
Here is a very basic app I’ve made for this demo. As you can see, it asks the user for a key and compares it with a pre-defined XOR-encrypted key. If they match, we have a “Success” message printed or a “Wrong key” message otherwise.
mbp:~ ./demo
Enter key:
AAAAAAAAAA
Wrong key.
The source code:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#define KEY_LEN 11
const char enc_key[] = { 0x32, 0x24, 0x22, 0x33, 0x24, 0x35, 0x1e, 0x2a, 0x24, 0x38, 0x41 }; // "secret_key" xor 0x41
int check_key(char *key) {
    char dec_key[KEY_LEN];
    for (int i=0; i<KEY_LEN; i++) {
        dec_key[i] = enc_key[i] ^ 0x41;
    }
    return strcmp(dec_key, key);
}
int main(int argc, char* argv[]) {
    printf("Enter key:\n");
    char key[KEY_LEN];
    scanf("%10s", key);
    if (check_key(key) == 0) {
        printf("Success!\n");
    } else {
        printf("Wrong key.\n");
    }
    return 0;
}
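For comparison, here is what the static-analysis route looks like in this toy case. The sketch below mirrors the loop in check_key in Python; the names ENC_KEY and xor_decode are introduced here for illustration and are not part of the demo. In a real, obfuscated binary the decoding logic is rarely this easy to lift by hand, which is where emulation pays off.

```python
# Bytes copied from enc_key in the C source above.
ENC_KEY = bytes([0x32, 0x24, 0x22, 0x33, 0x24, 0x35, 0x1e, 0x2a, 0x24, 0x38, 0x41])


def xor_decode(data: bytes, key: int = 0x41) -> bytes:
    """XOR every byte with a single-byte key, like the loop in check_key."""
    return bytes(b ^ key for b in data)


decoded = xor_decode(ENC_KEY)
# The last encoded byte (0x41) decodes to the terminating NUL.
print(decoded.rstrip(b"\x00").decode())  # secret_key
```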
To showcase the power of emulation, I will compile it as an arm64 binary using the iOS SDK. My host machine is an x86_64 Intel Mac. Xcode is needed for compilation. (In reality, the target platform such as iOS doesn’t matter much, because we are emulating the CPU and not the whole platform with a binary loader, dynamic linker, etc. But in theory, the calling convention in the generated assembly may differ from platform to platform.)
mbp:~ clang demo.c -o demo -arch arm64 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS.sdk -fno-stack-protector
I’ve added the -fno-stack-protector option, which disables stack canaries, just to make this demo a bit easier.
If everything went right, the result is a fully functional iOS arm64 binary:
mbp:~ file demo
demo: Mach-O 64-bit executable arm64
Some assembly
Here is the disassembly of the check_key function (as seen by objdump):
mbp:~ objdump --disassemble-symbols=_check_key demo
0000000100007e78 <_check_key>:
100007e78: sub sp, sp, #48
100007e7c: stp x29, x30, [sp, #32]
100007e80: add x29, sp, #32
100007e84: stur x0, [x29, #-8]
100007e88: str wzr, [sp, #8]
100007e8c: ldr w8, [sp, #8]
100007e90: subs w8, w8, #11
100007e94: b.ge 0x100007ed0 <_check_key+0x58>
100007e98: ldrsw x9, [sp, #8]
100007e9c: adrp x8, 0x100007000 <_check_key+0x24>
100007ea0: add x8, x8, #3972
100007ea4: ldrsb w8, [x8, x9]
100007ea8: mov w9, #65
100007eac: eor w8, w8, w9
100007eb0: ldrsw x10, [sp, #8]
100007eb4: add x9, sp, #13
100007eb8: add x9, x9, x10
100007ebc: strb w8, [x9]
100007ec0: ldr w8, [sp, #8]
100007ec4: add w8, w8, #1
100007ec8: str w8, [sp, #8]
100007ecc: b 0x100007e8c <_check_key+0x14>
100007ed0: ldur x1, [x29, #-8]
100007ed4: add x0, sp, #13
100007ed8: bl 0x100007f78 <_strcmp+0x100007f78>
100007edc: ldp x29, x30, [sp, #32]
100007ee0: add sp, sp, #48
100007ee4: ret
We will emulate this piece of code, instead of doing static analysis, to recover the decrypted value of enc_key: the secret key that user input is compared against.
If I were using a debugger, I would typically put a breakpoint at address 0x100007ed8, the strcmp call that actually performs the string comparison, and inspect the registers. But here we are analyzing a binary built for a different target architecture, so we can’t run or debug it directly.
We know strcmp takes two arguments. According to the arm64 calling convention, the first 8 arguments are passed in registers x0-x7.
Right before the strcmp call there are two instructions. ldur x1, [x29, #-8] loads into x1 the value stored at the address in x29 minus 8. add x0, sp, #13 adds 13 to the sp (stack pointer) value and stores the result in x0. According to the calling convention, these are the two strcmp arguments: x0 holds the address of dec_key and x1 holds the key pointer from the source code above.
Let’s run this piece of code in an emulator and dump the contents of x0 and x1 right before the strcmp call. We will not be loading the C runtime library into our emulator, so the strcmp call will not point to the real function and will not work. Making it work would require re-binding function stubs, which is out of the scope of this post.
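Since the real strcmp is not available inside the emulator, the comparison can be reproduced on the host once the two buffers are dumped. The following is only a rough sketch of that idea: read_c_string and strcmp_like are hypothetical helpers written for this post, not Unicorn APIs, and the two byte strings stand in for memory read from x0 and x1.

```python
def read_c_string(memory: bytes, offset: int = 0) -> bytes:
    """Return the bytes from offset up to, but not including, the first NUL."""
    end = memory.find(b"\x00", offset)
    if end == -1:
        end = len(memory)
    return memory[offset:end]


def strcmp_like(a: bytes, b: bytes) -> int:
    """Mimic strcmp: 0 when equal, otherwise the difference of the first
    pair of bytes that differ (the terminating NUL counts as 0)."""
    sa = read_c_string(a) + b"\x00"
    sb = read_c_string(b) + b"\x00"
    for x, y in zip(sa, sb):
        if x != y:
            return x - y
    return 0


# Stand-ins for the memory dumped at x0 (dec_key) and x1 (user input).
dumped_x0 = b"secret_key\x00\x00\x00\x00\x00\x05"
dumped_x1 = b"AAAAAAAAAA\x00\x00\x00\x00\x00\x00"
print(strcmp_like(dumped_x0, dumped_x1))  # 50 (non-zero, so "Wrong key.")
```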
Emulator
Create a new virtual environment and install all the dependencies using pip:
mbp:~ python3 -m venv .venv/ && source .venv/bin/activate
(.venv) mbp:~ pip install unicorn capstone hexdump
Capstone is a multi-architecture disassembly framework. I will use it to disassemble and log instructions on the fly.
Here is the fully working emulator code. Let’s review it part by part.
#!/usr/bin/env python3
from hexdump import hexdump
from unicorn import *
from unicorn.arm64_const import *
from capstone import *
# 1
BASE_ADDR = 0x1_0000_0000 # base address
BASE_SIZE = 100 * 1024 # enough memory to fit the binary image
HEAP_ADDR = 0x5_0000_0000 # arbitrary address
HEAP_SIZE = 0x21_000 # some default heap size
STACK_ADDR = 0x9_0000_0000 # arbitrary address
STACK_SIZE = 0x21_000 # some default stack size
STACK_TOP = STACK_ADDR + STACK_SIZE # stack grows downwards
# 6
def hook_code(uc, address, size, user_data):
    # slice the bytes of the current instruction out of the loaded image
    code = BINARY[address-BASE_ADDR:address-BASE_ADDR+size]
    for i in md.disasm(code, address):
        print("0x%x:\t%s\t%s" % (i.address, i.mnemonic, i.op_str))
        # stop emulation when function returns
        if i.mnemonic == "ret":
            uc.emu_stop()
    return True
try:
    # 2
    print("[+] Init")
    md = Cs(CS_ARCH_ARM64, CS_MODE_ARM) # Capstone takes its own CS_* constants
    mu = Uc(UC_ARCH_ARM64, UC_MODE_ARM)
    # 3
    print("[+] Create memory segments")
    mu.mem_map(BASE_ADDR, BASE_SIZE)
    mu.mem_map(STACK_ADDR, STACK_SIZE)
    mu.mem_map(HEAP_ADDR, HEAP_SIZE)
    # 4
    print("[+] Load and map binary")
    BINARY = open("./demo", "rb").read()
    mu.mem_write(BASE_ADDR, BINARY)
    # 5
    print("[+] Add hooks")
    mu.hook_add(UC_HOOK_CODE, hook_code)
    # 7
    print("[+] Setup stack pointer")
    mu.reg_write(UC_ARM64_REG_SP, STACK_TOP)
    # 8
    # write our input to heap
    mu.mem_write(HEAP_ADDR, b"A" * 10)
    mu.reg_write(UC_ARM64_REG_X0, HEAP_ADDR)
    # 9
    print("[+] Start emulation")
    start_addr = 0x1_0000_7e78 # check_key
    end_addr = 0x1_0000_7ed8 # strcmp
    mu.emu_start(start_addr, end_addr)
    # 10
    # print x0 and x1 values
    print("[+] x0: 0x%x" % (mu.reg_read(UC_ARM64_REG_X0)))
    hexdump(mu.mem_read(mu.reg_read(UC_ARM64_REG_X0), 16))
    print("[+] x1: 0x%x" % (mu.reg_read(UC_ARM64_REG_X1)))
    hexdump(mu.mem_read(mu.reg_read(UC_ARM64_REG_X1), 16))
    print("[+] Done")
except UcError as err:
    print("[E] %s" % err)
Let’s break this down.
1. Here, I set up the addresses of the basic memory segments we will use in emulation.
   - BASE_ADDR: the address where our binary will be loaded.
   - BASE_SIZE: should be enough to hold the entire binary.
   - HEAP_ADDR and STACK_ADDR: heap and stack addresses, each with an arbitrary size of 0x21000.
   If we ever exhaust heap or stack memory during emulation (and probably crash), we can always increase these values and restart emulation. Unicorn is a CPU emulator. It will not grow our stack or heap dynamically. That’s the job of the OS.
2. Initialize the Unicorn and Capstone engines with the *_ARCH_ARM64 architecture and *_MODE_ARM mode.
3. Create our three memory segments: main binary, heap, and stack, with their corresponding sizes.
4. Read our compiled arm64 demo binary and write it into the mapped memory at BASE_ADDR.
5. Set up a hook. Here I’m using UC_HOOK_CODE to hook each instruction, then disassemble and print it in the hook_code function. There are multiple hooks available: memory read/write hooks, a CPU interrupt hook (I’ve used this one to trace syscalls), etc.
6. Our hook function disassembles code using Capstone and also checks whether we have reached a ret instruction. At that point we can stop emulation, which is helpful when we are interested in emulating a single function.
7. Set the initial value of the stack pointer, which should point to the top of the stack, as the stack grows downwards.
8. Our check_key function takes a single argument, which is passed through the x0 register. Here we simulate user input by writing AAAAAAAAAA (10 * A) into the heap and placing a pointer to the start of the heap into x0.
9. Start emulation. 0x100007e78 is the address where check_key starts and where we want to start the emulation. 0x100007ed8 is the address of the strcmp call, where we want our emulation to end. Unicorn stops before executing the instruction at the end address.
10. After emulation ends, we read the pointers held in x0 and x1 and dump the memory at the corresponding addresses.
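One practical detail when increasing the heap or stack size as suggested above: Unicorn’s mem_map expects both the address and the size to be multiples of 4 KB and reports an argument error otherwise. A small helper, sketched here with the hypothetical names PAGE_SIZE and align_up, rounds any size up to the next page boundary before mapping. The sizes used in the script already satisfy this.

```python
PAGE_SIZE = 0x1000  # 4 KB, the granularity mem_map expects


def align_up(value: int, alignment: int = PAGE_SIZE) -> int:
    """Round value up to the next multiple of alignment (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)


# Sizes from the emulator script are already page-aligned.
print(hex(align_up(100 * 1024)))  # 0x19000
print(hex(align_up(0x21_000)))    # 0x21000
# An arbitrary size, e.g. the length of a binary read from disk, is rounded up.
print(hex(align_up(0x21_001)))    # 0x22000
```

With this in place, BASE_SIZE could be derived from the file length (align_up(len(BINARY))) instead of being hard-coded, assuming the binary is read before the segments are mapped.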
Output
Here we can see a successful run of the emulator, with our secret_key value dumped to the console!
(.venv) mbp:~ ./demo_emu.py
[+] Init
[+] Map memory
[+] Load and map binary
[+] Add hooks
[+] Setup stack pointer
[+] Starting at: 0x100007e78
[+] x0: 0x900020fdd
00000000: 73 65 63 72 65 74 5F 6B 65 79 00 00 00 00 00 05 secret_key......
[+] x1: 0x500000000
00000000: 41 41 41 41 41 41 41 41 41 41 00 00 00 00 00 00 AAAAAAAAAA......
[+] Done
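The x0 value in this output can be checked by hand against the disassembly. The sketch below redoes the stack-frame arithmetic in Python; the constant names are introduced here for illustration, while the numbers come from the emulator script and from the check_key prologue. It also suggests where the trailing 05 in the first hexdump comes from: dec_key occupies 11 bytes starting at sp + 13, and right after it, at sp + 24, sits the saved key pointer (0x500000000) in little-endian byte order.

```python
import struct

STACK_TOP = 0x9_0000_0000 + 0x21_000   # STACK_ADDR + STACK_SIZE from the script
HEAP_ADDR = 0x5_0000_0000              # where the fake user input lives

FRAME_SIZE = 48        # sub sp, sp, #48
DEC_KEY_OFFSET = 13    # add x0, sp, #13
SAVED_KEY_OFFSET = 24  # x29 = sp + 32, key pointer stored at [x29, #-8]
KEY_LEN = 11

sp = STACK_TOP - FRAME_SIZE
dec_key_addr = sp + DEC_KEY_OFFSET
print(hex(dec_key_addr))  # 0x900020fdd

# dec_key ends exactly where the saved key pointer begins.
assert DEC_KEY_OFFSET + KEY_LEN == SAVED_KEY_OFFSET

# Rebuild the 16 bytes dumped at x0: the decoded key, then the first
# five bytes of the little-endian pointer that follows it on the stack.
saved_pointer = struct.pack("<Q", HEAP_ADDR)
expected_dump = b"secret_key\x00" + saved_pointer[:5]
print(expected_dump.hex(" "))
# 73 65 63 72 65 74 5f 6b 65 79 00 00 00 00 00 05
```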