Analysis of SmokeLoader

Overview of Stages

flow

Stage 1

A chunk of memory is allocated with LocalAlloc, and its permissions are changed to RWX.

i1

The block is then decrypted with the TEA cipher and executed.

i2
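For reference, here is a minimal sketch of textbook TEA decryption in Python. The post does not give the sample's key, round count, or byte order, so the 32 rounds, little-endian words, and the key passed in are all assumptions taken from the reference algorithm; tea_encrypt_block is included only so the round trip can be checked.

```python
import struct

DELTA = 0x9E3779B9   # TEA constant from the reference algorithm
MASK = 0xFFFFFFFF    # keep every value in 32 bits

def _mix(v, total, ka, kb):
    # One TEA mixing term; only the low 32 bits matter.
    return (((v << 4) + ka) ^ (v + total) ^ ((v >> 5) + kb)) & MASK

def tea_encrypt_block(v0, v1, key, rounds=32):
    total = 0
    for _ in range(rounds):
        total = (total + DELTA) & MASK
        v0 = (v0 + _mix(v1, total, key[0], key[1])) & MASK
        v1 = (v1 + _mix(v0, total, key[2], key[3])) & MASK
    return v0, v1

def tea_decrypt_block(v0, v1, key, rounds=32):
    total = (DELTA * rounds) & MASK
    for _ in range(rounds):
        v1 = (v1 - _mix(v0, total, key[2], key[3])) & MASK
        v0 = (v0 - _mix(v1, total, key[0], key[1])) & MASK
        total = (total - DELTA) & MASK
    return v0, v1

def tea_decrypt(data, key):
    # Assumes len(data) is a multiple of 8 and words are little-endian.
    out = bytearray()
    for v0, v1 in struct.iter_unpack("<2I", data):
        out += struct.pack("<2I", *tea_decrypt_block(v0, v1, key))
    return bytes(out)
```

If the sample uses a modified delta or round count, only the two constants at the top and the rounds argument need to change.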

Stage 2

The Stage 2 loader contains an embedded PE file (the stage 3 loader). It uses process hollowing to execute the embedded PE in a new process.

i3

Stage 3

The binary is full of jumps, but they follow a pattern. Each function is split into three parts:

  1. prologue
  2. function body
  3. epilogue

In the prologue, the function body is decrypted with a single-byte XOR; in the epilogue, the body is XORed again to re-encrypt it. Here’s a small script to help with the function decryption
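The author's script is not reproduced in this copy of the post. As an illustration only, the single-byte XOR step can be sketched as below; the function name and its parameters (start offset, size, and key of a function body) are hypothetical and would have to be read from each function's prologue.

```python
def xor_region(image: bytes, start: int, size: int, key: int) -> bytes:
    """Return a copy of image with bytes [start, start+size) XORed with key.

    start, size and key are hypothetical inputs: in the sample they would
    come from the prologue of the function being decrypted.
    """
    buf = bytearray(image)
    for i in range(start, start + size):
        buf[i] ^= key & 0xFF
    return bytes(buf)
```

Because XOR is its own inverse, calling the function a second time with the same arguments restores the encrypted bytes, which mirrors what the epilogue does.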

Resolving Imports

The hash used is DJB2

djb2
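A minimal sketch of the classic DJB2 hash, for building lookup tables of API names. It assumes the textbook variant (seed 5381, multiplier 33, 32-bit truncation, byte-wise input); the sample may differ in seed or in how names are normalised, so the constants should be checked against the disassembly.

```python
def djb2(name: bytes) -> int:
    """Classic DJB2: h = h * 33 + c, starting from 5381, truncated to 32 bits."""
    h = 5381
    for c in name:
        h = (h * 33 + c) & 0xFFFFFFFF
    return h

def build_lookup(names):
    # Map hash -> name so a constant found in the binary can be reversed.
    return {djb2(n.encode("ascii")): n for n in names}
```

For example, build_lookup(["LoadLibraryA", "GetProcAddress"]) yields a dictionary that can be queried with the constants found in the import-resolution code.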

Elevating Privs

elev

The sample checks whether its process is running elevated. If it is not, it uses wmic process create with the runas verb to relaunch itself elevated, disguised as a child of WmiPrvSE.

wmic-child

Anti Hooking

antihook

SmokeLoader copies ntdll to a temporary file and maps the copy. Since most sandboxes hook ntdll, this is an attempt to evade such hooks.

ahook

The following functions are resolved from the temporary ntdll mapping:

amap

Anti Debug

antidebug

Anti VM

Sandbox, AV, and VM Checks

The sample enumerates the subkeys of HKLM:\System\CurrentControlSet\Enum\{IDE,SCSI} to check for the presence of VirtualBox, QEMU, VMware, and libvirtio.

avm avm1

Process & Module Enumeration

The sample checks for the presence of:

  1. qemu-ga.exe
  2. qga.exe
  3. windanr.exe
  4. vboxservice.exe
  5. vboxtray.exe
  6. vmtoolsd.exe
  7. prl_tools.exe

If any of the listed process names is present, it additionally checks for the following modules, which are used by VirtualBox and VMware:

avm2
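The process-name check can be sketched as a simple set intersection. The names come from the list above; the case-insensitive comparison and the function name are assumptions made for illustration, not something confirmed from the sample.

```python
# Process names taken from the list in this section.
VM_PROCESSES = {
    "qemu-ga.exe", "qga.exe", "windanr.exe", "vboxservice.exe",
    "vboxtray.exe", "vmtoolsd.exe", "prl_tools.exe",
}

def find_vm_processes(running):
    """Return the sorted VM-related names found in an iterable of process names.

    Comparison is case-insensitive, which is an assumption.
    """
    return sorted({name.lower() for name in running} & VM_PROCESSES)
```

An analyst could feed this the process list of a sandbox image to see which names would trigger the follow-up module check.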

WoW64 Detection

x64

On 32-bit Windows, gs is reserved (unused) and fs points to the per-thread Thread Information Block (TIB).
On 64-bit Windows, gs points to the TIB.
Under WoW64, gs points to the 64-bit TIB and fs points to the 32-bit TIB.

By checking whether gs is non-zero, the sample determines whether the OS is 64-bit.

Compressed Payload

i9

For quicker analysis, I used Unicorn to run the decompression algorithm (at 0x4013DB) and dump the result as a PE file that can be loaded into IDA.

import ida_bytes, idc
import unicorn as uc

def get_payload(ptr_raw, size):
    ''' get the raw data from file '''
    f = open("stage3.bin", 'rb')
    f.seek(ptr_raw)
    d = f.read(size)
    f.close()
    d = bytearray(d)
    i = 0
    x = (0x6988ACFC).to_bytes(4, byteorder='little')
    l = len(d) & -4
    while i < l:
        d[i] ^= x[i&3]
        i += 1
    while i < len(d):
        d[i] ^= 0xfc
        i += 1
    return bytes(d)

# prepare unicorn
e = uc.Uc(arch=uc.UC_ARCH_X86, mode=uc.UC_MODE_32)

rESI = 0x410000
rEDI = 0x420000
rESP = 0x600000

x86_tuple = (0x2302, 0x3a4d)    # (offset_in_file, size_of_compressed_data)
x64_tuple = (0x5d4f, 0x45a2)
x86_size = 0x5000               # size_of_decompressed_data
x64_size = 0x6A00

e.mem_map(0x400000, 0xf000, uc.UC_PROT_EXEC | uc.UC_PROT_WRITE)
e.mem_map(rEDI, 0xf000, uc.UC_PROT_WRITE|uc.UC_PROT_READ)
e.mem_map(rESI, 0xf000, uc.UC_PROT_WRITE|uc.UC_PROT_READ)
e.mem_map(rESP, 0xf000, uc.UC_PROT_WRITE|uc.UC_PROT_READ)

e.mem_write(0x400000, idc.get_bytes(0x4013db, 0x300))
# write the compressed buffer
e.mem_write(rESI, get_payload(*x86_tuple))

rOFFSET = 4
# setup the stack frame
e.mem_write(rESP+0xf000-rOFFSET, rEDI.to_bytes(4, byteorder='little'))
e.mem_write(rESP+0xf000-rOFFSET-4, (rESI+4).to_bytes(4, byteorder='little'))
e.mem_write(rESP+0xf000-rOFFSET-8, 0xbaadf00d.to_bytes(4, byteorder='little'))
e.reg_write(uc.x86_const.UC_X86_REG_ESP, rESP+0xf000-rOFFSET-0x500)
e.reg_write(uc.x86_const.UC_X86_REG_EBP, rESP+0xf000-rOFFSET-12)

STOP = 0x0401492-0x4013db+0x400000 & 0xffffffff

e.emu_start(0x400000, STOP)

'''
layout of decompressed data

+00         offset_of_nt_header
+04         padding
+offset     IMAGE_NT_HEADERS
...
'''

# dump the pe
r = e.mem_read(rEDI, x86_size)
f = open("stage4-x86.bin", 'wb+')
offset = int.from_bytes(r[:4], 'little')
payload = r[offset:]
f.write(b'\x00'*(offset))
f.write(payload)
f.seek(0)
f.write(b'MZ')
f.seek(offset)
f.write(b'PE\x00\x00')
f.seek(0x3c)
f.write((offset).to_bytes(4, byteorder='little'))
f.close()

Process Injection

The decompressed blob is a PE file. The PE injector locates explorer.exe using the GetShellWindow API. It creates two section objects, one with READ_WRITE permissions and the other with RWX permissions, and maps both into explorer.exe and into its own process. The PE loader then maps the PE sections, fixes the relocations, and invokes RtlCreateUserThread to start execution at the entry point.

if

Stage 4

Stage 4 is a DLL. Its entry point is called with the shared RW mapping, which is 0x5000 bytes in size.

String Encryption

Strings are stored by index and are RC4-encrypted with a 4-byte key.

import idaapi  # needed by rc4_decrypt_string below

def rc4_decrypt(key, buffer):
    sbox = [i for i in range(0x100)]
    j = 0
    for i in range(0x100):
        j = (j + sbox[i] + key[i%len(key)]) & 0xff
        sbox[i], sbox[j] = sbox[j], sbox[i]
    i, j = 0, 0
    buffer = bytearray(buffer)
    for k in range(len(buffer)):
        i = (i+1) & 0xff
        j = (j + sbox[i]) & 0xff
        sbox[i], sbox[j] = sbox[j], sbox[i]
        buffer[k] ^= sbox[sbox[i] + sbox[j] & 0xff]
    return bytes(buffer)

def rc4_decrypt_string(size):
    addr = 0x10002744
    key = bytes.fromhex('27B66585')
    pos = 0
    count = 0
    n_iter = 0
    while True:
        tmp = idaapi.get_byte(addr+pos)
        if tmp:
            count += 1
        if count == size:
            break
        pos += tmp+1
        n_iter += 1
        if n_iter >= 675:
            return b''
    buffer = idaapi.get_bytes(addr+pos+1, tmp)
    return rc4_decrypt(key, buffer)

API Hashing

def rol(n, k, bits=32):
    k %= bits
    return (n << k | n >> (bits-k)) & ((1<<bits)-1)

def ror(n, k, bits=32):
    return rol(n, -k, bits)

def stage4_hash(name):
    ans = 0
    for i in name:
        ans = rol(ans ^ (i & 0xdf), 8)+(i & 0xdf) & 0xffffffff
    return ans ^ 0x854710DF

API resolution is done manually by traversing the export table. Forwarder entries are resolved with LoadLibrary and GetProcAddress.
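To map the hash constants found in stage 4 back to API names, the hash function above can be turned into a lookup table. The sketch below repeats rol and stage4_hash so that it is self-contained; the candidate name list is an arbitrary example, and in practice it would come from the export tables of the DLLs the sample loads.

```python
def rol(n, k, bits=32):
    k %= bits
    return (n << k | n >> (bits - k)) & ((1 << bits) - 1)

def stage4_hash(name: bytes) -> int:
    # Same algorithm as the script above; & 0xdf folds ASCII letters to upper case.
    ans = 0
    for i in name:
        ans = rol(ans ^ (i & 0xDF), 8) + (i & 0xDF) & 0xFFFFFFFF
    return ans ^ 0x854710DF

def build_hash_table(names):
    """Map hash -> export name for a list of candidate API names."""
    return {stage4_hash(n.encode("ascii")): n for n in names}

# Example candidate list (illustrative only).
table = build_hash_table(["LoadLibraryA", "GetProcAddress", "CreateToolhelp32Snapshot"])
```

Since each character is masked with 0xdf before hashing, names that differ only in letter case hash to the same value, so a single table entry covers both spellings.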

Anti VM

Processes are enumerated using CreateToolhelp32Snapshot. The following processes are looked for and terminated: Autoruns.exe, procexp.exe, procexp64.exe, Procmon.exe, Procmon64.exe, tcpview.exe, Wireshark.exe, ProcessHacker.exe, x32dbg.exe, x64dbg.exe, and many others.

pkill

Mutex Name Generation

This variant of SmokeLoader generates a name for its mutex (part of the name is also used as the RC4 key for C2 communication). The format is:

comp_name = GetComputerName()
vol_serial = GetVolSerial(g_SysDrive)   # drive where Windows is located, C: mostly
md5(comp_name + "854710DF" + vol_serial).hex().upper() + "%08X" % vol_serial

mutexNamaewa
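The pseudocode above leaves open how the volume serial is serialised before hashing. The sketch below assumes it is appended as the same 8-digit uppercase hex string used for the suffix, and that the computer name is ASCII; both are assumptions that should be verified against the sample. The function name is hypothetical.

```python
import hashlib

def mutex_name(comp_name: str, vol_serial: int) -> str:
    """Sketch of the mutex name: upper-case MD5 hex digest followed by the serial.

    Assumption: the serial enters the MD5 input as "%08X" text.
    """
    serial_hex = "%08X" % (vol_serial & 0xFFFFFFFF)
    seed = (comp_name + "854710DF" + serial_hex).encode("ascii")
    return hashlib.md5(seed).hexdigest().upper() + serial_hex
```

The result is 40 characters long (32 for the digest, 8 for the serial), which is consistent with the 41-byte m_MutexName field in the request packet once a terminating NUL is added.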

C2 Urls

C2 URLs are encrypted using the following format:

urll

urld

List of C2’s

Request Packet Layout

struct RequestPacket {
    u16     m_Signature;       // 2020, year ?
    char    m_MutexName[41];   // mutex name
    char    m_ComputerName[16];
    u32     m_Key;              // key from stage3 loader (0x34305AB8)
    char    m_zero1[2];
    u8      m_OSVersion;          // major<<4 | minor
    u8      m_field_56B;        // ???
    bool    m_bNeedsElevation;  // do I need elevation ?
    u16     m_Command;
    u32     m_SubCommand;
    u32     m_arg5;
    char    m_RandString[0];    // string of len from 31 to 290
};

The function for sending data has an optional parameter for RC4 encrypting the data to be sent. The RC4 key is \xBE\x21\x8E\x0A
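The layout above can be reproduced with Python's struct module, which is handy for writing a C2 emulator or a traffic parser. This sketch assumes the structure is packed (no padding) and little-endian, and that RC4 is applied to the whole buffer when encryption is requested; none of that is stated explicitly in the post. build_request and its parameters are hypothetical names.

```python
import struct

REQUEST_FMT = "<H41s16sI2sBBBHII"     # packed, little-endian (assumption)
SEND_KEY = bytes.fromhex("BE218E0A")  # RC4 key quoted in the text

def rc4(key: bytes, data: bytes) -> bytes:
    s = list(range(256))
    j = 0
    for i in range(256):                      # key scheduling
        j = (j + s[i] + key[i % len(key)]) & 0xFF
        s[i], s[j] = s[j], s[i]
    i = j = 0
    out = bytearray()
    for b in data:                            # keystream generation
        i = (i + 1) & 0xFF
        j = (j + s[i]) & 0xFF
        s[i], s[j] = s[j], s[i]
        out.append(b ^ s[(s[i] + s[j]) & 0xFF])
    return bytes(out)

def build_request(mutex_name, computer_name, os_major, os_minor,
                  needs_elevation, command, sub_command,
                  arg5=0, rand_string=b"", field_56b=0, encrypt=False):
    header = struct.pack(
        REQUEST_FMT,
        2020,                                 # m_Signature
        mutex_name, computer_name,            # NUL-padded by struct
        0x34305AB8,                           # m_Key from the stage 3 loader
        b"\x00\x00",                          # m_zero1
        (os_major << 4 | os_minor) & 0xFF,    # m_OSVersion
        field_56b, int(bool(needs_elevation)),
        command, sub_command, arg5,
    )
    packet = header + rand_string
    return rc4(SEND_KEY, packet) if encrypt else packet
```

Under the packed-layout assumption the fixed part of the packet is 78 bytes, followed by the random string.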

Command Description

Command ID    Description
10001         Fetch Plugin
10002         Fetch Payload
10003         ACK

SubCommand ID           Description
105                     Fetch Payload and Execute it
114                     Kill processes and uninstall self
117                     Fetch payload, execute it, and cleanup
A decimal string (N)    Fetch and execute payload N times

Response Packet Layout

The encrypted response packet

enc_resp

Header decryption Key: "\x61\x31\xc9\x8f"
The decrypted header has the following layout:

struct ResponseHeader {
    u32 signature;    // 2020, current year maybe
    u8  sub_command;
    u8  data[0];
};
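Once the header has been decrypted, it can be split into its fields as sketched below. The sketch assumes a packed little-endian layout and takes the already decrypted bytes as input, since the exact span covered by the header key is only shown in the screenshot; parse_response_header is a hypothetical helper name.

```python
import struct

def parse_response_header(decrypted: bytes):
    """Split a decrypted response into (signature, sub_command, data).

    Assumes a packed little-endian header: u32 signature, u8 sub_command.
    """
    signature, sub_command = struct.unpack_from("<IB", decrypted, 0)
    return signature, sub_command, decrypted[struct.calcsize("<IB"):]
```

A signature other than 2020 would suggest either a wrong key or a different build of the loader.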

The data field (m_Data) is a string consisting of:

Marker         Data following marker
plugin_size    Denotes the size of the plugin, following the header
|:|            String copied to a mapped region of memory, unmapped immediately 🤔

serious

Payload Packet

payload_pkt

enum payload_type: u8 {
    EXE,                // use CreateProcessW
    DLL_Load,           // use LoadLibraryW
    DLL_RegSvr32,       // use regsvr32.exe to invoke the dll
    EXE_SelfInj,        // self injection
    BAT                 // use ShellExecuteW to execute batch file
};

struct payload_packet_t {
    payload_type    payload_type;
    bool            do_uninstall;
    u8              encrypted_data[0];
};

The RC4 key for decryption is "\x61\x31\xC9\x8F". The decrypted data can be of two types:

  • Location: <url>

    The sample downloads the file from the given URL and proceeds to execute it

  • The executable itself!
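The two cases can be told apart after decryption, as in the sketch below. It assumes a packed layout, enum values numbered from zero in declaration order, and a URL that ends at the first CR/LF or NUL; those details, like the function name, are assumptions and not confirmed by the post.

```python
import enum
import struct

PAYLOAD_KEY = bytes.fromhex("6131C98F")   # RC4 key quoted in the text
LOCATION_PREFIX = b"Location: "

class PayloadType(enum.IntEnum):          # values assumed to start at 0
    EXE = 0
    DLL_LOAD = 1
    DLL_REGSVR32 = 2
    EXE_SELFINJ = 3
    BAT = 4

def rc4(key: bytes, data: bytes) -> bytes:
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) & 0xFF
        s[i], s[j] = s[j], s[i]
    i = j = 0
    out = bytearray()
    for b in data:
        i = (i + 1) & 0xFF
        j = (j + s[i]) & 0xFF
        s[i], s[j] = s[j], s[i]
        out.append(b ^ s[(s[i] + s[j]) & 0xFF])
    return bytes(out)

def parse_payload_packet(packet: bytes):
    """Return (type, do_uninstall, url, executable); url or executable is None."""
    ptype, do_uninstall = struct.unpack_from("<B?", packet, 0)
    body = rc4(PAYLOAD_KEY, packet[2:])
    if body.startswith(LOCATION_PREFIX):
        url = body[len(LOCATION_PREFIX):].split(b"\r\n")[0].rstrip(b"\x00")
        return PayloadType(ptype), do_uninstall, url.decode("ascii", "replace"), None
    return PayloadType(ptype), do_uninstall, None, body
```

Returning both the URL and the raw body in one tuple keeps the caller's dispatch on payload type separate from the question of where the file comes from.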

Plugin Packet

// this is just a PE file without DOS header
struct comp_data_t {
    u32                 nt_hdr_offset;  // offset to comp_data_t.hdr
    u8                  unk_b;  // must be zero, for injecting into explorer
    u32                 unk_a;  // must be non zero
    IMAGE_NT_HEADERS    hdr;    // pe header
};

struct component_t {
    u32             data_size;
    // will the process be killed when the uninstall command is received?
    bool            add_to_uninstall_list;
    u8              rc4_key[15];        // for decrypting encrypted_data
    u8              unknown;
    u8              encrypted_data[0];  // comp_data_t
};

struct plugin_t {
    u32             plugin_size;
    u32             magic;  // 0x5BEAD001
    u32             reserved;
    u8              n_component;
    component_t     components[0];
};
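A plugin blob with this layout can be walked as sketched below. The sketch assumes packed little-endian structures, components stored back to back, and data_size counting only the bytes of encrypted_data; these are guesses from the declarations above and may need adjusting. Each component's encrypted_data would then be RC4-decrypted with its own rc4_key.

```python
import struct

PLUGIN_MAGIC = 0x5BEAD001
PLUGIN_HDR = "<IIIB"      # plugin_size, magic, reserved, n_component
COMPONENT_HDR = "<I?15sB" # data_size, add_to_uninstall_list, rc4_key, unknown

def parse_plugin(blob: bytes):
    """Return a list of component dicts from a decrypted plugin_t blob."""
    plugin_size, magic, _reserved, n_component = struct.unpack_from(PLUGIN_HDR, blob, 0)
    if magic != PLUGIN_MAGIC:
        raise ValueError("unexpected plugin magic: %#x" % magic)
    offset = struct.calcsize(PLUGIN_HDR)
    components = []
    for _ in range(n_component):
        data_size, uninstall, key, unknown = struct.unpack_from(COMPONENT_HDR, blob, offset)
        offset += struct.calcsize(COMPONENT_HDR)
        data = blob[offset:offset + data_size]   # still RC4-encrypted
        offset += data_size
        components.append({
            "add_to_uninstall_list": uninstall,
            "rc4_key": key,
            "unknown": unknown,
            "encrypted_data": data,
        })
    return components
```

If the parsed component sizes run past the end of the blob, the data_size assumption is the first thing to revisit.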

The plugin is encrypted before writing it to %AppData%, and it’s decrypted

dr

Executing a Plugin Component

For every plugin component, it decrypts the component data (comp_data_t) and spawns a new instance of explorer.exe in a suspended state. It then maps the PE file, creates a stub, and changes the entry point of the newly created process to point to the stub.

inj

Injection into explorer

The method used is similar to the one described under Process Injection. Instead of using RtlCreateUserThread to execute the payload, the sample modifies the entry point to point to a stub that executes the payload’s entry point.

Step 1. Stub Creation

stubmk

Step 2. Modifying the entrypoint

entrymod

The injected explorer process looks something like this

stub

Injection context layout

struct injection_context
{
  char      field_0;            // always 0
  u32       rc4_key;            // for re-encrypting component_data = 0xA8E21BE
  u8        mutex_name[41];
  u32       curr_Pid;           // process id of smokeloader stage4
  u8        c2_url[260];
  u8        user_agent[552];
  u32       component_size;     // from component_t
  u8        component_data[];   // from component_t
};
This post is licensed under CC BY 4.0 by the author.