MalwareTech’s vm1 and ransomeware1 Challenges with Radare2
It’s been a few weeks since the last post, but it’s finally time to finish vm1 and ransomeware1, two of MalwareTech’s beginner reverse engineering challenges. Check out the first part of this series if you missed it.
If you’re completely new to Radare2 but have a background in reverse engineering, I suggest first going through my CMU Binary Bomb series, as I will gloss over many basic commands which that series covers.
As always, you should work with these files in an isolated environment designed for reverse engineering. While all of MalwareTech’s challenges appear free of any malicious activity, isolation minimizes your own risk.
vm1
vm1 introduces the concept of code obfuscation using a simple virtual machine. From MalwareTech’s site:
vm1.exe implements a simple 8-bit virtual machine (VM) to try and stop reverse engineers from retrieving the flag. The VM’s RAM contains the encrypted flag and some bytecode to decrypt it.
Download vm1 here.
The archive contains ram.bin and vm1.exe. Open vm1.exe with r2, analyze functions, and enter Graph mode with VV.
Much of this should be familiar if you’ve done the other challenges. Some space on the heap is allocated with HeapAlloc, and it returns a pointer (0x40423c) which memcpy uses as a destination buffer.
Next the function calls fcn.004022e0. Seek there and use Graph mode’s minigraph view to get a better look. If you’re not already in Graph mode, enter it with VV, then cycle to minigraph view by pressing p.
At a glance, this looks like a while loop with only one path that exits the loop. It’s worth noting that before the loop begins, var_1h is set to 0. Let’s analyze the loop itself.
The loop begins by setting eax to 1, then checks whether eax equals 0, exiting if it does. Since that check can never succeed, this is likely a while (1) loop, which only ends when something inside it breaks out.
Looking to the end of the function, you can see a call to fcn.00402270. If its return value is 0, the current function returns, ending the loop. Otherwise, it loops again.
We can’t ignore the main body of the loop anymore, so let’s finally focus on that disassembly.
Math and Arguments
This loop isn’t too complicated if you break it down into chunks. There are three similar chunks of code, each assigning values to variables, then pushing those variables onto the stack as function arguments before calling fcn.00402270.
Combining a few steps, this block essentially sets var_10h to the value at 0x40423c + var_1h + 0xff.
Next var_1h is incremented by 1, and var_ch is set to the value at 0x40423c + var_1h + 0xff.
Once again, var_1h is incremented by 1, and this time var_8h is set to the value at 0x40423c + var_1h + 0xff.
Finally, var_1h is incremented one more time before fcn.00402270 is called with the variables that were just set up:
fcn.00402270(var_10h, var_ch, var_8h)
This code essentially indexes 0xff + counter into the buffer that was copied with memcpy, reading three consecutive bytes per iteration. Now for the fun part, disassembling fcn.00402270!
Virtual Machines
At a glance, fcn.00402270 checks var_4h to see if it’s equal to 1, 2, or 3. If it’s none of those numbers, the function exits with a return value of 0 (which causes the caller to return).
If var_4h equals 1, ecx is set to 0x40423c + arg_ch, then arg_10h is copied into that address.
If the value of var_4h is 2, the value at 0x40423c + arg_ch is moved into the address 0x404240.
Finally, if the value of var_4h is 3, the value at 0x40423c + arg_ch is xor’d with the value at 0x404240, and the result is moved back into 0x40423c + arg_ch. Simple… right?
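To pin down those three branches, along with the fetch loop in fcn.004022e0 that feeds them, here is a minimal Python sketch. It assumes the whole RAM buffer at 0x40423c can be modeled as one bytearray, that bytecode starts at offset 0xff as described above, and that the byte at 0x404240 (a `state["offset"]` entry here) starts at 0. The names `vm_step`, `run_vm`, and `CODE_BASE` are hypothetical; this is a reading of the disassembly, not MalwareTech’s source.

```python
CODE_BASE = 0xFF  # bytecode starts 0xff bytes into the RAM buffer


def vm_step(ram, opcode, arg2, arg3, state):
    """Mirror fcn.00402270: return 1 to keep running, 0 to stop."""
    if opcode == 1:
        ram[arg2] = arg3               # *arg2 = arg3
    elif opcode == 2:
        state["offset"] = ram[arg2]    # byte at 0x404240 = *arg2
    elif opcode == 3:
        ram[arg2] ^= state["offset"]   # *arg2 ^= byte at 0x404240
    else:
        return 0                       # unknown opcode: the caller returns
    return 1


def run_vm(ram):
    """Mirror the fetch loop in fcn.004022e0 and return the modified RAM."""
    state = {"offset": 0}  # assumption: 0x404240 starts zeroed
    counter = 0            # var_1h, set to 0 before the loop
    while True:
        opcode = ram[CODE_BASE + counter]
        arg2 = ram[CODE_BASE + counter + 1]
        arg3 = ram[CODE_BASE + counter + 2]
        counter += 3
        if vm_step(ram, opcode, arg2, arg3, state) == 0:
            return ram
```

Assuming ram.bin is a dump of that same buffer, `run_vm(bytearray(open("ram.bin", "rb").read()))` should leave the decrypted flag in the first 0xff bytes.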
Name Your Variables
Since this function is executing instructions as a virtual machine, it’s safe to assume the check on var_4h determines which action the VM takes in that cycle. We will rename var_4h to opcode.
Let’s also rename arg_8h, arg_ch, and arg_10h (our function’s arguments) to arg1, arg2, and arg3 using afvn. The syntax is:
afvn <NEWNAME> <OLDNAME>
Use afvd to view a function’s variables.
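If you end up repeating these renames (for example after re-analyzing the binary), they can be scripted. This sketch assumes the third-party r2pipe package (`r2pipe.open()`, `.cmd()`, `.quit()`); `rename_vars`, `main`, and `VM_RENAMES` are hypothetical names. It only sends the same `s` and `afvn` commands you would type by hand, and nothing runs until you call `main()`.

```python
VM_RENAMES = {
    "opcode": "var_4h",
    "arg1": "arg_8h",
    "arg2": "arg_ch",
    "arg3": "arg_10h",
}


def rename_vars(send, function, renames):
    """Seek to `function`, then send one `afvn NEW OLD` per rename.

    `send` is any callable that runs an r2 command, e.g. r2pipe's `r2.cmd`.
    Returns the list of commands that were sent.
    """
    commands = [f"s {function}"]
    commands += [f"afvn {new} {old}" for new, old in renames.items()]
    for command in commands:
        send(command)
    return commands


def main(path="vm1.exe"):
    import r2pipe  # third-party: pip install r2pipe (needs radare2 installed)

    r2 = r2pipe.open(path)
    r2.cmd("aaa")  # analyze functions first
    rename_vars(r2.cmd, "fcn.00402270", VM_RENAMES)
    print(r2.cmd("afv"))  # list the function's variables under their new names
    r2.quit()
```

Passing `r2.cmd` as a plain callable keeps `rename_vars` independent of r2pipe, so it can be checked without radare2 installed.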
The opcode comparisons we worked through earlier should now be slightly more legible, and some patterns emerge. Whenever arg2 is used, it’s first added to 0x40423c, then used as a pointer into memory. Instead of writing that out every time, I’ll just refer to the value at 0x40423c + arg2 as *arg2. There’s also a new memory address, 0x404240, which can have data moved into it or be used in the xor. We’ll call that offset.
Let’s strip down the opcode == 1 operation:
if opcode == 1:
    *arg2 = arg3
This sets the value at arg2 (0x40423c + arg2) to arg3.
if opcode == 2:
    offset = *arg2
This sets offset to the value pointed to by arg2.
if opcode == 3:
    *arg2 ^= offset
This sets the value pointed to by arg2 to that same value xor’d with offset. Let’s use this knowledge to write a script which reads from ram.bin and runs its bytecode.
Final Script
First, we need to read from the ram.bin file. We’ll read it, then create two arrays: one for the code segment at base address + 0xff, and one for the data (everything before it), which contains the flag.
```python
with open("ram.bin", "rb") as f:
    mem_dump = f.read()

code = list(mem_dump[0xff:])  # bytecode at base + 0xff
data = list(mem_dump[:0xff])  # encrypted flag
```
Next we need to set up a loop, which will:
- Assign instructions based on the code section
- Increment through the code section
- Implement the opcode instructions outlined in pseudocode
```python
index = 0
offset = 0
while True:
    opcode = code[index]
    arg2 = code[index+1]
    arg3 = code[index+2]
    # TODO: implement the opcode handlers
    index += 3
```
Here’s the final script with some fancy printing.
```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import sys
import time

with open("ram.bin", "rb") as f:
    mem_dump = f.read()

code = list(mem_dump[0xff:])  # bytecode starts at base + 0xff
data = list(mem_dump[:0xff])  # encrypted flag

index = 0
offset = 0  # the byte at 0x404240
while True:
    opcode = code[index]
    arg2 = code[index+1]
    arg3 = code[index+2]

    # Show the current instruction and the printable part of the data
    valid_letters = "".join([chr(x) for x in data if chr(x).isprintable()])
    sys.stdout.write(
        "\rop: {:03} arg2: {:03} arg3: {:03} offset: {:03}\n{}".format(
            opcode, arg2, arg3, offset, valid_letters[:32]))
    sys.stdout.flush()
    time.sleep(0.1)

    if opcode == 1:
        data[arg2] = arg3
    elif opcode == 2:
        offset = data[arg2]
    elif opcode == 3:
        data[arg2] ^= offset
    else:
        break  # any other opcode halts the VM, like fcn.00402270 returning 0
    index += 3

print()  # finish the last status line
```
ransomeware1
Download ransomeware1 here.
This challenge requires a bit of reversing and a little thinking outside the box. After extracting the archive, you’ll see the ransomeware1.exe_ binary, flag.txt_encrypted, and some images from the Sample Pictures folder. Let’s start with the binary.
Starting from the entry point at 0x00401150, the first function call is to section..text. Seek to it to find the de facto main function. This function is a bit complex, so it may be easier to look through it in Graph mode with VV, going chunk by chunk.
First snprintf is called with a few arguments. This function composes a string from a format string and additional arguments. Its prototype is:
int snprintf(char *s, size_t n, const char *format, ...);
The first argument is the buffer that receives the final string. The second is the buffer size: at most n - 1 characters are written, followed by a terminating null byte. The third is the format string, and the remaining arguments are the values substituted into it.
char *s = var_108h
size_t n = 0x104
const char *format = str.s_encrypted
... = [hTemplateFile]
Note: r2 displays the format string without the %s in Graph mode, but you can view the actual value with ps @ str.s_encrypted and see that it is in fact a format string.
The string in var_108h will therefore be the string at hTemplateFile with _encrypted appended to the end.
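As a quick sanity check on that reading, the snprintf call can be mimicked in Python. This sketch assumes ASCII file names (so characters and bytes line up), and `encrypted_name` is a hypothetical helper. Note that 0x104 is 260, the same value as Windows’ MAX_PATH.

```python
BUF_SIZE = 0x104  # size argument passed to snprintf (260, same as MAX_PATH)


def encrypted_name(original, size=BUF_SIZE):
    """Mimic snprintf(var_108h, size, "%s_encrypted", original).

    snprintf writes at most size - 1 characters plus a terminating NUL,
    so overly long names are silently truncated.
    """
    return ("%s_encrypted" % original)[: size - 1]


print(encrypted_name("flag.txt"))  # flag.txt_encrypted
```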
Next CreateFileA is called twice, and each return value is stored in a variable. CreateFileA’s prototype is:
HANDLE CreateFileA(
LPCSTR lpFileName,
DWORD dwDesiredAccess,
DWORD dwShareMode,
LPSECURITY_ATTRIBUTES lpSecurityAttributes,
DWORD dwCreationDisposition,
DWORD dwFlagsAndAttributes,
HANDLE hTemplateFile
);
The first call to CreateFileA looks something like:
lpOverlapped = CreateFileA(hTemplateFile, 0x80000000, 0, 0, 3, 0x80, 0)
The second call to CreateFileA is very similar:
hObject = CreateFileA(var_108h, 0x40000000, 0, 0, 2, 0x80, 0)
The main difference is that the first call uses the original hTemplateFile value, while the second uses var_108h, the buffer holding the string generated with snprintf. The access masks differ too: the first file is opened with 0x80000000 (GENERIC_READ) and the second with 0x40000000 (GENERIC_WRITE). Both calls return a file handle, stored in lpOverlapped and hObject respectively. It’s safe to assume these handles are used for reading the original file and writing an encrypted version.
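The remaining raw numbers in those two calls map to Win32 constants as well. The values below are the standard ones documented for CreateFileA; `describe_createfile` is just a hypothetical helper for reading disassembly, not part of any Windows API.

```python
ACCESS = {0x80000000: "GENERIC_READ", 0x40000000: "GENERIC_WRITE"}
DISPOSITION = {1: "CREATE_NEW", 2: "CREATE_ALWAYS", 3: "OPEN_EXISTING",
               4: "OPEN_ALWAYS", 5: "TRUNCATE_EXISTING"}
ATTRIBUTES = {0x80: "FILE_ATTRIBUTE_NORMAL"}


def describe_createfile(access, share, disposition, flags):
    """Translate raw dwDesiredAccess, dwShareMode, dwCreationDisposition and
    dwFlagsAndAttributes values from the disassembly into readable names."""
    access_names = [name for bit, name in ACCESS.items() if access & bit]
    return ", ".join([
        "|".join(access_names) or hex(access),
        f"share={share}",
        DISPOSITION.get(disposition, hex(disposition)),
        ATTRIBUTES.get(flags, hex(flags)),
    ])


print(describe_createfile(0x80000000, 0, 3, 0x80))  # first call (original file)
print(describe_createfile(0x40000000, 0, 2, 0x80))  # second call (encrypted copy)
```

So the first call opens an existing file for reading (OPEN_EXISTING), and the second creates the output file for writing, overwriting it if it already exists (CREATE_ALWAYS). A share mode of 0 means the file cannot be opened again by anyone else until the handle is closed.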
The next part of the function consists of a few loops. From Graph mode (VV), press p to cycle between views until you reach minigraph view.
Cycle back to the default view with p, and seek to 0x401071.
This block sets up arguments, then calls ReadFile:
BOOL ReadFile(
HANDLE hFile,
LPVOID lpBuffer,
DWORD nNumberOfBytesToRead,
LPDWORD lpNumberOfBytesRead,
LPOVERLAPPED lpOverlapped
);
Easy enough. The function call looks like:
ReadFile(lpOverlapped, lpNumberOfBytesWritten, 0x1000, nNumberOfBytesToWrite, 0)
That’s… confusing. r2 automatically named some variables, but they are a bit misleading. Let’s fix that by renaming them. Remember, the syntax is:
afvn <NEWNAME> <OLDNAME>
First of all, we know that snprintf put the value of hTemplateFile, with _encrypted appended, into var_108h. Rename that to encryptedFileName. We know lpOverlapped is a handle to the original file, so we will rename it hOriginalFile. hObject is a handle to the file with _encrypted appended to the end, so let’s rename that hEncryptedFile.
Next let’s fix lpNumberOfBytesWritten and nNumberOfBytesToWrite. In the ReadFile call, lpNumberOfBytesWritten sits in the lpBuffer position, so it’s really the buffer that receives the file’s contents. Let’s rename that buffer. nNumberOfBytesToWrite sits in the lpNumberOfBytesRead position, so it receives the number of bytes actually read. Let’s rename that bytesRead.
There are only two other variables in this function which need naming: var_111ch and arg_ch. Looking at the next few blocks, var_111ch is set to 0, used in some calculations, and then incremented. It’s safe to say this is a loop counter, so let’s rename it counter, and worry about arg_ch later since its value is currently unknown.
Now the first block’s call to ReadFile makes a little more sense. The function is being called with:
ReadFile(hOriginalFile, buffer, 0x1000, bytesRead, 0);
Back to the Loop
After the call to ReadFile, the return value in eax is compared to 1 (TRUE); if the read failed, the program exits. If it succeeded, counter is set to 0 and a small loop runs until counter equals bytesRead. This appears to be an encryption loop using xor. After each iteration, counter is incremented by 1.
Breaking this down, eax is set to the value of counter, ecx is set to 0x20, then eax is divided by ecx. The div instruction puts the quotient in eax and the remainder in edx. Next eax is overwritten with the value of arg_ch, suggesting the only reason for the div operation was to put the remainder (counter modulo 0x20) into edx.
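In Python terms, the div sequence is just divmod. This sketch assumes edx was zeroed before the div, as compilers normally arrange for unsigned division, so the dividend is simply counter; `key_index` is a hypothetical name. Because 0x20 is a power of two, the remainder is also equal to counter & 0x1f.

```python
KEY_LEN = 0x20  # the divisor loaded into ecx


def key_index(counter):
    """Return what ends up in edx after `div ecx` with eax = counter."""
    quotient, remainder = divmod(counter, KEY_LEN)  # (eax, edx); quotient is unused
    return remainder


for counter in (0, 31, 32, 33, 0x1005):
    print(hex(counter), "->", key_index(counter))
```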
Next arg_ch + edx is used as an address into arg_ch (that is, arg_ch indexed by the remainder), and the byte there is placed into ecx. Then edx is set to the value of counter and used as an index to load the value at ebp + edx - 0x1118 into eax. Next eax and ecx are xor’d together, and the lower byte of eax (al) is stored at ebp + ecx - 0x1118.
You might be asking yourself how 0x1118 relates to all this. If you look at the function’s prologue, you’ll see that ebp - 0x1118 is the same address as our buffer variable, which holds the data read by the ReadFile call.
Let’s try to make sense of all this by writing out simplified pseudocode for the xor loop, now that we know ebp - 0x1118 is the buffer address:
ecx = byte[arg_ch + counter % 0x20]
eax = byte[buffer + counter]
eax = eax ^ ecx
byte[buffer + counter] = al
As you can see, counter is used to index into the entire buffer, as well as into arg_ch, but because of the modulo, only the first 32 bytes of arg_ch are ever used. The entire loop in Pythonic pseudocode looks something like:
for counter in range(len(buffer)):
    buffer[counter] = buffer[counter] ^ arg_ch[counter % 0x20]
We have almost everything we need to decrypt the flag. We know that running the same xor loop over encrypted data will decrypt it, and we have an encrypted flag in desperate need of decryption. What we don’t have is arg_ch, which was used to xor the flag... but we can get it.
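That self-inverse property is easy to check: (b ^ k) ^ k == b for any bytes b and k, so the same loop both encrypts and decrypts. Here is a minimal runnable version of the pseudocode above, using a made-up placeholder key rather than the challenge’s; `xor_with_key` is a hypothetical name.

```python
def xor_with_key(data, key):
    """buffer[counter] ^= key[counter % len(key)] for every byte, as in the binary's loop."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


demo_key = bytes(range(0x20))  # placeholder 32-byte key, for illustration only
plaintext = b"Hello from ransomeware1!" * 3

ciphertext = xor_with_key(plaintext, demo_key)
assert ciphertext != plaintext
assert xor_with_key(ciphertext, demo_key) == plaintext  # running it twice undoes it
```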
Known Plaintext Attack
Despite seeming irrelevant at first, the Sample Pictures images included with the challenge are finally useful. We don’t know the key used to xor them, but these are generic images that ship with Windows 7, so we can easily find the originals.
Once again, since xor-encrypted data can be decrypted by running the same xor function over it, all we need to do is xor the encrypted image with the original, and we should get the key.
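Written out per byte, with c_i the encrypted image byte, p_i the original byte, and k the 32-byte key (notation introduced here), the recovery relies only on xor being associative and commutative, together with x ⊕ x = 0 and 0 ⊕ x = x:

```latex
c_i = p_i \oplus k_{i \bmod 32}
\quad\Longrightarrow\quad
c_i \oplus p_i = (p_i \oplus p_i) \oplus k_{i \bmod 32} = 0 \oplus k_{i \bmod 32} = k_{i \bmod 32}
```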
I’m feeling lazy, so I’ll do it in Python using the Chrysanthemum.jpg_encrypted image and the original Chrysanthemum.jpg, which I pulled from a Windows 7 machine. You can find a copy here, but if that 404s, try Googling the md5sum: 076e3caed758a1c18c91a0e9cae3368f
This is a simple 1:1 xor of two files of the same size, so the script is pretty straightforward.
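```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# XOR the encrypted image with the original to recover the key stream.
with open("Chrysanthemum.jpg_encrypted", "rb") as f:
    enc = f.read()
with open("Chrysanthemum.jpg", "rb") as f:
    org = f.read()

# Work with bytes, not str: chr(...).encode() would UTF-8 encode any
# value >= 0x80 as two bytes and corrupt the key.
encryption_key = bytes(e ^ o for e, o in zip(enc, org))

with open("encryption_key", "wb") as f:
    f.write(encryption_key)
```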
Great, now we have the key value. Looking at the key in hexdump, it seems to be 32 bytes long, repeating over and over.
That makes sense, since the xor loop uses counter modulo 32 as the index into the key. Let’s use xxd and sed to take one iteration of the key and output it as Python-friendly hex.
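If you’d rather skip xxd and sed, the same step can be done in Python. This sketch assumes the encryption_key file holds the raw xor’d bytes; `extract_key`, `as_python_literal`, and `main` are hypothetical helpers. It also checks that the stream really repeats every 0x20 bytes before trusting the first 32.

```python
def extract_key(stream, period=0x20):
    """Return one period of a repeating key stream, checking that it repeats."""
    key = stream[:period]
    for i, b in enumerate(stream):
        if b != key[i % period]:
            raise ValueError(f"stream does not repeat every {period} bytes (offset {i})")
    return key


def as_python_literal(key):
    """Format bytes as a b"\\x.." literal ready to paste into a script."""
    return 'b"' + "".join(f"\\x{b:02x}" for b in key) + '"'


def main(path="encryption_key"):
    with open(path, "rb") as f:
        print(as_python_literal(extract_key(f.read())))
```

The periodicity check matters if your copy of the original image differs from the one on the challenge author’s machine: a mismatch shows up as an error with an offset instead of a silently wrong key.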
We have everything we need: a xor loop, the key, and an encrypted file. Let’s finish this.
Decrypting the Flag
All we have to do is read in the encrypted flag, loop through it applying the xor
function we laid out earlier in pseudocode, then write the result to a file. Since you’ve made it this far, I’m guessing I don’t need to break down the details any further, so here’s my solution.
```python
#!/usr/bin/env python3
# One 32-byte period of the recovered key.
key = (b"\x6e\xf7\x9b\x0e\x45\x55\x5e\x95\x96\xfe\xab\x75"
       b"\x80\xb4\x0e\x40\x3d\xf5\xa7\x1b\xed\xd5\x5b\x80"
       b"\xa9\xd3\x8d\x2c\xb8\x0a\x40\x0f")

with open("flag.txt_encrypted", "rb") as f:
    encrypted = f.read()

# Same xor loop as the binary: byte i is xor'd with key[i % 0x20].
decrypted = bytes(b ^ key[i % 0x20] for i, b in enumerate(encrypted))

with open("flag.txt", "wb") as f:
    f.write(decrypted + b"\n")
```
Oof, that was a long one. Once again, kudos to MalwareTech for the fun challenges, and stay tuned for more thinly veiled excuses to reverse engineer things with r2.