It’s been a few weeks since the last post, but it’s finally time to finish vm1 and ransomeware1, two of MalwareTech’s beginner reverse engineering challenges. Check out the first part of this series if you missed it.

If you’re completely new to Radare2 but have a background in reverse engineering, I suggest first going through my CMU Binary Bomb series, as I will gloss over many basic commands which that series covers.

As always, work with these files in an isolated environment designed for reverse engineering. MalwareTech’s challenges appear free of malicious activity, but isolation keeps your own risk to a minimum.

vm1

vm1 introduces the concept of code obfuscation using a simple virtual machine. From MalwareTech’s site:

vm1.exe implements a simple 8-bit virtual machine (VM) to try and stop reverse engineers from retrieving the flag. The VM’s RAM contains the encrypted flag and some bytecode to decrypt it.

Download vm1 here.

The archive contains ram.bin and vm1.exe. Open vm1.exe with r2, analyze functions, and enter Graph mode with VV.

You can turn off comments in Graph mode with the single quote key (').

Much of this should be familiar if you’ve done the other challenges. Some space on the heap is allocated with HeapAlloc, which returns a pointer (stored at 0x40423c) that memcpy uses as its destination buffer.

Next the function calls fcn.004022e0. Seek there and use Graph mode’s minigraph view to get a better look. If you’re not already in Graph mode, enter it with VV, then cycle to minigraph view by pressing p.

At a glance, this looks like a while loop with only one path that exits the loop. It’s worth noting that before the loop itself begins, var_1h is set to 0. Let’s analyze the loop itself.

The loop begins by setting eax to 1, then checks if eax equals 0, exiting if true. This is likely a While (1) loop, which always loops.

Looking to the end of the function, you can see another function call to fcn.00402270. If the return value is 0, the current function returns and exits the loop. If not, it loops.

We can’t ignore the main body of the loop anymore, so let’s finally focus on that disassembly.

Math and Arguments

This loop isn’t too complicated if you break it down into chunks. There are three similar chunks of code, each assigning values to variables, then pushing those variables onto the stack as function arguments before calling fcn.00402270.

Combining a few steps, this block essentially sets var_10h to the value at 0x40423c + var_1h + 0xff.

Next var_1h is incremented by 1, and var_ch is set to the value at 0x40423c + var_1h + 0xff.

Once again, var_1h is incremented by 1, and this time var_8h is set to the value at 0x40423c + var_1h + 0xff.

Finally, var_1h is incremented one more time before fcn.00402270 is called with the variables that were just set up.

fcn.00402270(var_10h, var_ch, var_8h)

This code essentially indexes 0xFF + counter into the buffer that was copied with memcpy. Now for the fun part, disassembling fcn.00402270!
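
The fetch step described above can be sketched in Python. This is only an illustration: `fetch`, `CODE_BASE`, and the synthetic `ram` are names I made up, not symbols from the binary, and I'm assuming the three bytes are read back to back exactly as the disassembly suggests.

```python
CODE_BASE = 0xFF  # offset of the bytecode inside the VM's RAM, per the disassembly


def fetch(ram, counter):
    """Return the three bytes at ram[0xff + counter] and the advanced counter.

    Mirrors the three reads into var_10h, var_ch and var_8h, each of which is
    followed by an increment of var_1h.
    """
    base = CODE_BASE + counter
    triple = (ram[base], ram[base + 1], ram[base + 2])
    return triple, counter + 3


# Synthetic RAM for illustration: 0xff data bytes followed by one instruction.
ram = bytes(0xFF) + bytes([1, 5, 0x41])
instruction, counter = fetch(ram, 0)
print(instruction, counter)
```

The returned triple corresponds to the three arguments passed to fcn.00402270.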

Virtual Machines

At a glance, fcn.00402270 checks var_4h to see if it’s equal to 1, 2, or 3. If it’s none of those numbers, the function exits with a return value of 0 (which causes the calling function to return).

If var_4h equals 1, ecx is set to 0x40423c + arg_ch, then arg_10h is copied into that address.

If the value of var_4h is 2, the value at 0x40423c + arg_ch is moved into the address 0x404240.

Finally, if the value of var_4h is 3, the value at 0x40423c + arg_ch is xor’d with the value at 0x404240. The result is then moved back to 0x40423c + arg_ch. Simple… right?

Name Your Variables

Since this function is executing instructions as a virtual machine, it’s safe to assume the check to see the value of var_4h determines what action the VM takes in that cycle. We will rename var_4h to opcode.

Let’s also rename arg_8h, arg_ch, and arg_10h to arg1, arg2, and arg3 (our function's arguments) using afvn. The syntax is:

afvn <NEWNAME> <OLDNAME>

Use afvd to view a function’s variables.

The opcode comparisons we worked through earlier should now be slightly more legible. You’ll also notice some patterns. When arg2 is used, it’s first being added to 0x40423c, then used as a pointer into memory. Instead of writing that out every time, I’ll just refer to 0x40423c + arg2 as *arg2. There’s also a new memory address 0x404240, which can have data moved into it, or used in the xor. We’ll call that offset.

Let’s strip down the opcode equals 1 operation:

if opcode == 1:
    *arg2 = arg3

This sets the value at arg2 (0x40423c + arg2) to arg3.

if opcode == 2:
    offset = *arg2

This sets offset to the value pointed to by arg2.

if opcode == 3:
    *arg2 = *arg2 ^ offset

This sets the value pointed to by arg2 to itself xor’d with the value of offset. Let’s use this knowledge to write a script which reads from ram.bin and runs the bytecode to decrypt the flag.

Final Script

First, we need to read from the ram.bin file. We’ll read it, then create two arrays, one for the code segment at base address+0xff and one for the data which contains the flag.

with open("ram.bin", "rb") as f:
    mem_dump = f.read()
code = list(mem_dump[0xff:])
data = list(mem_dump[:0xff])

Next we need to set up a loop, which will:

  1. Assign instructions based on the code section
  2. Increment through the code section
  3. Implement the opcode instructions outlined in pseudocode

index = 0
offset = 0
while True:
    opcode = code[index]
    arg2 = code[index+1]
    arg3 = code[index+2]
    # TODO - Implement opcode stuff
    index += 3

Here’s the final script with some fancy printing.

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import time
import sys

with open("ram.bin", "rb") as f:
    mem_dump = f.read()
code = list(mem_dump[0xff:])
data = list(mem_dump[:0xff])

index = 0
offset = 0
while True:
    opcode = code[index]
    arg2 = code[index+1]
    arg3 = code[index+2]
    valid_letters = "".join([chr(x) for x in data if chr(x).isprintable()])
    sys.stdout.write(
        "\rop: {:03} arg2: {:03} arg3: {:03} offset: {:03}\n{}".format(
            opcode, arg2, arg3, offset, valid_letters[:32]))
    time.sleep(0.1)
    if opcode == 1:
        data[arg2] = arg3
    elif opcode == 2:
        offset = data[arg2]
    elif opcode == 3:
        data[arg2] ^= offset
    else:
        break
    index += 3
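
If you’d rather skip the animation, the same logic can be wrapped in a function and sanity-checked against hand-written bytecode before pointing it at ram.bin. This is a rough sketch rather than a definitive emulator: `run_vm` and the toy program are my own inventions, and I’m assuming the VM treats RAM as one flat array, so the sketch works on a single copy instead of separate code and data lists.

```python
def run_vm(ram, code_base=0xFF):
    """Emulate the three-opcode VM on a copy of ram and return the data segment."""
    mem = bytearray(ram)   # work on a copy, like the heap buffer filled by memcpy
    offset = 0             # stands in for the single value kept at 0x404240
    pc = code_base         # bytecode starts at ram + 0xff
    while pc + 2 < len(mem):
        opcode, arg2, arg3 = mem[pc:pc + 3]
        if opcode == 1:
            mem[arg2] = arg3
        elif opcode == 2:
            offset = mem[arg2]
        elif opcode == 3:
            mem[arg2] ^= offset
        else:
            break          # any other opcode stops the VM
        pc += 3
    return bytes(mem[:code_base])


# Toy program (not from the challenge): store 0x20, load it, xor it into data[0].
program = bytes([1, 1, 0x20,   # mem[1] = 0x20
                 2, 1, 0x00,   # offset = mem[1]
                 3, 0, 0x00,   # mem[0] ^= offset
                 0, 0, 0x00])  # unknown opcode: halt
toy_ram = bytes([ord("f")]) + bytes(0xFE) + program
result = run_vm(toy_ram)
print(result[:2])
```

To try it on the real challenge, pass the contents of ram.bin to `run_vm` and print the printable bytes at the start of the result.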

ransomeware1

Download ransomeware1 here.

This challenge requires a bit of reversing, and a little thinking outside of the box. After extracting the archive, you’ll see the ransomeware1.exe_ binary, a flag.txt_encrypted, and some images from the Sample Pictures folder. Let’s start with the binary.

Starting from the entry point at 0x00401150, the first function call is to section..text. Seek to it for the de facto main function. This function is a bit complex, so it may be easier to look through it in graph mode with VV, going chunk by chunk.

First snprintf is called with a few arguments. This function composes a string based on a format string and additional arguments. Its prototype is:

int snprintf ( char * s, size_t n, const char * format, ... );

The first argument is a buffer which holds the final string. The second argument is the maximum number of bytes to write. The third is a format string, and the final argument is the value substituted into the format string.

char * s = var_108h
size_t = 0x104
const char * = str.s_encrypted
arg = [hTemplateFile]

Note: r2 displays the format string without the %s in graph mode, but you can view the actual value with ps @ str.s_encrypted, and see it is in fact a format string.

The value of var_108h will be the value at hTemplateFile, with _encrypted appended to the end.
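
As a quick illustration of what that call produces, here is a rough Python equivalent. It assumes the format string is "%s_encrypted" (which is what the ps output and the behaviour above imply) and standard C snprintf truncation rules; `encrypted_name` is a made-up helper name, not something from the binary.

```python
def encrypted_name(path, size=0x104):
    """Mimic snprintf(buf, size, "%s_encrypted", path) under standard C rules."""
    formatted = ("%s_encrypted" % path).encode()
    # Standard snprintf stores at most size - 1 bytes, then a terminating NUL.
    return formatted[:size - 1]


print(encrypted_name("flag.txt"))
```

So a file named flag.txt ends up with an output name of flag.txt_encrypted, matching the file shipped with the challenge.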

Next CreateFileA is called twice and the return value is assigned to a variable. CreateFileA’s prototype is:

HANDLE CreateFileA(
LPCSTR lpFileName,
DWORD dwDesiredAccess,
DWORD dwShareMode,
LPSECURITY_ATTRIBUTES lpSecurityAttributes,
DWORD dwCreationDisposition,
DWORD dwFlagsAndAttributes,
HANDLE hTemplateFile
);

The first call to CreateFileA looks something like:

lpOverlapped = CreateFileA(hTemplateFile, 0x80000000, 0, 0, 3, 0x80, 0)

The second call to CreateFileA is very similar:

hObject = CreateFileA(var_108h, 0x40000000, 0, 0, 2, 0x80, 0)

The main difference is the first function uses the original hTemplateFile value, while the second uses var_108h which is the address holding the string generated with snprintf. Both return a handle to a file, stored in lpOverlapped and hObject. It’s safe to assume these handles are used for reading in the original file, and writing an encrypted version.
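
The magic numbers in those two calls map to well-known Windows SDK constants, which backs up the read/write guess. The lookup below is just a small decoding aid I put together; `describe` is a hypothetical helper, and only the constant values themselves come from the Windows headers.

```python
# Constant values as defined in the Windows SDK headers.
ACCESS = {0x80000000: "GENERIC_READ", 0x40000000: "GENERIC_WRITE"}
DISPOSITION = {1: "CREATE_NEW", 2: "CREATE_ALWAYS", 3: "OPEN_EXISTING",
               4: "OPEN_ALWAYS", 5: "TRUNCATE_EXISTING"}
ATTRIBUTES = {0x80: "FILE_ATTRIBUTE_NORMAL"}


def describe(access, disposition, attributes):
    """Translate the numeric CreateFileA arguments into their symbolic names."""
    return (ACCESS.get(access, hex(access)),
            DISPOSITION.get(disposition, str(disposition)),
            ATTRIBUTES.get(attributes, hex(attributes)))


first_call = describe(0x80000000, 3, 0x80)
second_call = describe(0x40000000, 2, 0x80)
print(first_call)
print(second_call)
```

The first call opens an existing file for reading, and the second creates (or overwrites) a file for writing.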

The next part of the function consists of a few loops. From Graph mode (VV) , press p to cycle between views until you reach minigraph view.

Cycle back to default view with p, and seek to 0x401071.

This block sets up arguments, then calls ReadFile.

BOOL ReadFile(
HANDLE hFile,
LPVOID lpBuffer,
DWORD nNumberOfBytesToRead,
LPDWORD lpNumberOfBytesRead,
LPOVERLAPPED lpOverlapped
);

Easy enough. The function call looks like:

ReadFile(lpOverlapped, lpNumberOfBytesWritten, 0x1000, nNumberOfBytesToWrite, 0)

That’s… confusing. r2 automatically named some variables, but they are a bit misleading. Let’s fix that by renaming them. Remember, the syntax is:

afvn <NEWNAME> <OLDNAME>

First of all, we know that snprintf put the value of hTemplateFile, with _encrypted appended, into var_108h. Rename that to encryptedFileName. We know lpOverlapped is a handle to the original file, so we will rename it hOriginalFile. hObject is a handle to the file with _encrypted appended to the end, so let’s rename that hEncryptedFile.

Next let’s fix lpNumberOfBytesWritten and nNumberOfBytesToWrite. When called with ReadFile, the value of lpNumberOfBytesWritten seems to be used as a buffer. Let’s rename that buffer. nNumberOfBytesToWrite holds the value of total bytes read. Let’s rename that bytesRead.

There are only two other variables in this function which need naming, var_111ch and arg_ch. Looking to the next few blocks, var_111ch is set to 0, used in some calculations, and then incremented. It’s safe to say this is the counter for a loop. Let’s rename that to be counter, and worry about arg_ch later since its value is currently unknown.

Now the first block’s call to ReadFile makes a little more sense. The function is being called with:

ReadFile(hOriginalFile, buffer, 0x1000, bytesRead, 0);

Back to the Loop

After the call to ReadFile, the return value in eax is compared to 1 to check for a failed read, in which case the program exits. If the read succeeds, counter is set to 0 and a loop runs until counter equals bytesRead. The body of that loop appears to be an encryption routine using xor, and counter is incremented by 1 after each iteration.

Breaking this down, eax is set to the value of counter, ecx is set to 0x20, then eax is divided by ecx. This puts the result in eax, and the remainder in edx. Next eax is set to the value of arg_ch, suggesting the reason for the div operation was just to put the remainder (modulus) into edx.

Next, arg_ch + edx is used as the address of a byte in the arg_ch buffer, and that byte is placed into ecx. Then edx is set to the value of counter and used as an index to set eax to the value at ebp + edx - 0x1118. Finally eax and ecx are xor’d together, and the low byte of eax (al) is written back to ebp + ecx - 0x1118, where ecx has been reloaded with counter.

You might be asking yourself how 0x1118 relates to all this. If you look at the function’s prologue, you’ll see that ebp - 0x1118 is the same address as our buffer variable, which contains the contents read from the ReadFile function call.

Let’s try to make sense of all this by writing out the simplified assembly pseudocode of the xor loop, now knowing 0x1118 is the buffer address:

ecx = byte[arg_ch + counter % 0x20]
eax = byte[buffer + counter]
eax = eax ^ ecx
byte[buffer + counter] = al

As you can see, counter indexes into the whole of buffer, while the index into the arg_ch buffer wraps around after its first 32 bytes. The entire loop in Pythonic pseudocode looks something like:

for counter in range(len(buffer)):
    buffer[counter] = buffer[counter] ^ arg_ch[counter % 0x20]
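
To convince yourself the loop is its own inverse, here is a rough sketch of the whole routine operating on in-memory streams. A few assumptions are baked in: that the binary repeats the ReadFile/xor step for every 0x1000-byte chunk and writes each chunk out afterwards, and that the key is 32 bytes long. `xor_stream` and the placeholder key are mine, not from the challenge.

```python
import io

CHUNK = 0x1000   # nNumberOfBytesToRead passed to ReadFile
KEY_LEN = 0x20   # divisor used to wrap the index into arg_ch


def xor_stream(src, dst, key):
    """Read src in CHUNK-sized pieces, xor each byte with the key, write to dst."""
    while True:
        buffer = bytearray(src.read(CHUNK))
        if not buffer:
            break
        for counter in range(len(buffer)):  # counter restarts at 0 for each chunk
            buffer[counter] ^= key[counter % KEY_LEN]
        dst.write(buffer)


key = bytes(range(1, KEY_LEN + 1))  # placeholder key, NOT the real one
plain = b"A" * 5000

enc = io.BytesIO()
xor_stream(io.BytesIO(plain), enc, key)
dec = io.BytesIO()
xor_stream(io.BytesIO(enc.getvalue()), dec, key)
print(dec.getvalue() == plain)
```

Because 0x1000 is a multiple of 0x20, restarting counter for each full chunk keeps the key aligned with the overall file offset, so the key simply repeats every 32 bytes across the whole file.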

We have almost everything we need to decrypt the flag. We know that running the same xor loop over encrypted data will decrypt it, and we have an encrypted flag in desperate need of decryption. What we don’t have is arg_ch, which was used to xor the flag... but we can get it.

Known Plaintext Attack

Despite seeming irrelevant initially, those Sample Images included with the challenge are finally useful. We don’t know the value of the key used to xor them, but these are generic images included with Windows 7, so we can easily find the originals.

Once again, since xor encrypted data can be decrypted by running the same xor function over the encrypted data, all we need to do is xor the encrypted image with the original, and we should get the key.

I’m feeling lazy, so I’ll do it in Python using the Chrysanthemum.jpg_encrypted image and the original Chrysanthemum.jpg, which I pulled from a Windows 7 machine. You can find a copy here, but if that 404’s, try Googling the md5sum: 076e3caed758a1c18c91a0e9cae3368f

This is a simple 1:1 xor of two files of the same size, so the script is pretty straightforward.

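#!/usr/bin/env python3
# -*- coding: utf-8 -*-

with open("Chrysanthemum.jpg_encrypted", "rb") as f:
    enc = f.read()
with open("Chrysanthemum.jpg", "rb") as f:
    org = f.read()

# xor each encrypted byte with the matching original byte to recover the key stream.
# Build bytes directly: going through str and .encode() would expand every
# value >= 0x80 into two UTF-8 bytes and corrupt the output.
encryption_key = bytes(e ^ o for e, o in zip(enc, org))

with open("encryption_key", "wb") as f:
    f.write(encryption_key)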

Great, now we have the key value. Looking at the key in hexdump, it seems to be 32 bytes long, repeating over and over.

That makes sense, since in the xor loop, the modulus of counter and 32 is used to get the index for the key. Let’s use xxd and sed to take one iteration of the key, and output it in Python friendly hex.
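
If you’d rather stay in Python than reach for xxd and sed, something like the following sketch does a similar job. It is an alternative, not the command I used: `repeating_key` and `python_literal` are hypothetical helpers, and the 4-byte sample is a toy stand-in for the recovered key stream.

```python
def repeating_key(keystream, period=0x20):
    """Return the first `period` bytes if the keystream repeats with that period."""
    key = keystream[:period]
    for i, byte in enumerate(keystream):
        if byte != key[i % period]:
            raise ValueError("keystream does not repeat at offset %d" % i)
    return key


def python_literal(key):
    """Format bytes as a Python bytes literal made of hex escapes."""
    return 'b"' + "".join("\\x%02x" % b for b in key) + '"'


sample = bytes([0x6E, 0xF7, 0x9B, 0x0E]) * 3  # toy 4-byte period for illustration
literal = python_literal(repeating_key(sample, period=4))
print(literal)
```

With the real data, pass in the bytes recovered by xoring the two images and leave period at its default of 0x20. The ValueError doubles as a check that the key really does repeat every 32 bytes.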

We have everything we need — a xor loop, the key, and an encrypted file. Let’s finish this.

Decrypting the Flag

All we have to do is read in the encrypted flag, loop through it applying the xor function we laid out earlier in pseudocode, then write the result to a file. Since you’ve made it this far, I’m guessing I don’t need to break down the details any further, so here’s my solution.

#!/usr/bin/env python3
key = (b"\x6e\xf7\x9b\x0e\x45\x55\x5e\x95\x96\xfe\xab\x75"
       b"\x80\xb4\x0e\x40\x3d\xf5\xa7\x1b\xed\xd5\x5b\x80"
       b"\xa9\xd3\x8d\x2c\xb8\x0a\x40\x0f")

with open("flag.txt_encrypted", "rb") as f:
    encrypted = f.read()

# xor every byte with the repeating 32-byte key, keeping the result as bytes
decrypted = bytes(b ^ key[i % 0x20] for i, b in enumerate(encrypted))

with open("flag.txt", "wb") as f:
    f.write(decrypted + b"\n")

Oof, that was a long one. Once again, kudos to MalwareTech for the fun challenges, and stay tuned for more thinly veiled excuses to reverse engineer things with r2.
