Reversing Go - Part 1

Representation of Multiword Types

Complex Numbers

complex64 is represented as [2]float32 and complex128 as [2]float64: the real part comes first, followed by the imaginary part.
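To see this layout from Go itself, here is a minimal sketch that reinterprets a complex128 as [2]float64. It assumes the standard gc compiler; complexParts is a hypothetical helper name used only for illustration.

```go
package main

import (
	"fmt"
	"unsafe"
)

// complexParts reinterprets the memory of a complex128 as [2]float64.
// Assumption: gc toolchain layout (real part first, then imaginary part).
func complexParts(c complex128) [2]float64 {
	return *(*[2]float64)(unsafe.Pointer(&c))
}

func main() {
	parts := complexParts(complex(1.5, -2.0))
	fmt.Println(parts[0], parts[1]) // real part, imaginary part
	fmt.Println(unsafe.Sizeof(complex64(0)), unsafe.Sizeof(complex128(0)))
}
```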

Slice

A slice consists of 3 words (offsets are for a 64-bit target)

+00     pointer to buffer
+08     length of the slice
+10     capacity of the buffer
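The three words can be read back from a live slice. This is a sketch under the assumption of a 64-bit target, where each word is 8 bytes and the capacity therefore sits at +0x10; sliceHeader and headerOf are hypothetical names that mirror the layout, not runtime APIs.

```go
package main

import (
	"fmt"
	"unsafe"
)

// sliceHeader is a hand-written mirror of the three-word slice layout.
type sliceHeader struct {
	data unsafe.Pointer // +0x00 pointer to buffer
	len  int            // +0x08 length
	cap  int            // +0x10 capacity
}

// headerOf copies the header words out of a slice value.
func headerOf(s []byte) sliceHeader {
	return *(*sliceHeader)(unsafe.Pointer(&s))
}

func main() {
	h := headerOf(make([]byte, 3, 10))
	fmt.Printf("len=%d cap=%d\n", h.len, h.cap)
	fmt.Printf("offsets: %#x %#x %#x\n",
		unsafe.Offsetof(h.data), unsafe.Offsetof(h.len), unsafe.Offsetof(h.cap))
}
```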

String

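A string consists of 2 words. The buffer is not NUL-terminated; the length is the only way to know where the string ends.

+00     pointer to buffer
+08     length of the string in bytes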
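A short sketch of the two-word layout, assuming a 64-bit target. stringHeader and headerOfString are hypothetical names. Taking a substring only produces a new header that points into the same buffer, which is why no terminator is needed.

```go
package main

import (
	"fmt"
	"unsafe"
)

// stringHeader is a hand-written mirror of the two-word string layout.
type stringHeader struct {
	data unsafe.Pointer // +0x00 pointer to buffer
	len  int            // +0x08 length in bytes
}

// headerOfString copies the header words out of a string value.
func headerOfString(s string) stringHeader {
	return *(*stringHeader)(unsafe.Pointer(&s))
}

func main() {
	s := "hello, gopher"
	sub := s[7:] // shares the buffer of s, only the header differs
	h, hs := headerOfString(s), headerOfString(sub)
	fmt.Println(h.len, hs.len)
	fmt.Println(uintptr(hs.data) - uintptr(h.data)) // distance between the two buffers
}
```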

Channels

A channel value is a pointer to an hchan

type hchan struct {
    qcount   uint           // total data in the queue
    dataqsiz uint           // size of the circular queue
    buf      unsafe.Pointer // points to an array of dataqsiz elements
    elemsize uint16
    closed   uint32
    elemtype *_type         // element type
    sendx    uint           // send index
    recvx    uint           // receive index
    recvq    waitq          // list of recv waiters
    sendq    waitq          // list of send waiters
    lock     mutex
}
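Since a channel variable holds a pointer to hchan, the first two fields can be read through it. The sketch below mirrors only qcount and dataqsiz; hchanPrefix and peek are hypothetical names, and the rest of the structure is deliberately left out because it is not needed here.

```go
package main

import (
	"fmt"
	"unsafe"
)

// hchanPrefix mirrors only the first two fields of hchan.
type hchanPrefix struct {
	qcount   uint // items currently buffered
	dataqsiz uint // size of the circular queue
}

// peek follows the channel value (a pointer to hchan) and copies the prefix.
func peek(ch chan int) hchanPrefix {
	return **(**hchanPrefix)(unsafe.Pointer(&ch))
}

func main() {
	ch := make(chan int, 4)
	ch <- 1
	ch <- 2
	p := peek(ch)
	fmt.Println(p.qcount, p.dataqsiz) // same values as len(ch) and cap(ch)
}
```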

Maps

A map value is a pointer to an hmap

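type hmap struct {
    count      int            // number of live entries (what len() returns)
    flags      uint8
    B          uint8          // log2 of the number of buckets
    noverflow  uint16         // approximate number of overflow buckets
    hash0      uint32         // hash seed
    buckets    unsafe.Pointer // array of 2^B buckets
    oldbuckets unsafe.Pointer // previous bucket array, non-nil only while growing
    nevacuate  uintptr        // evacuation progress counter
    extra      *mapextra
}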
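As with channels, a map variable holds a pointer to its header, and the element count is the first word. The sketch below reads only that word; mapCount is a hypothetical helper. The remaining fields differ between Go versions, so treat anything beyond the count as an assumption tied to the compiler version you are reversing.

```go
package main

import (
	"fmt"
	"unsafe"
)

// mapCount reads the first word behind a non-nil map value.
// Assumption: the element count is the first field of the map header.
func mapCount(m map[string]int) int {
	return **(**int)(unsafe.Pointer(&m))
}

func main() {
	m := map[string]int{"a": 1, "b": 2, "c": 3}
	fmt.Println(mapCount(m), len(m))
}
```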

Interfaces

Interface implementations are described by the itab structure. Every (interface, concrete type) pair has its own itab.

type itab struct {
    inter *interfacetype    // +00
    utype *_type            // +08
    hash  uint32            // +10
    _     [4]byte           // padding
    fun   [1]uintptr        // +18
}

type eface struct {
    utype *_type            // +00
    data  unsafe.Pointer    // +08
}

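inter is the interface being implemented, and utype is the concrete type that implements it. The method pointers are stored in fun, sorted lexicographically by method name. Although fun is declared as [1]uintptr, it extends past the end of the struct to hold one pointer per interface method.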

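Every interface value is a two-word pair laid out like eface. For a non-empty interface, the first word points to an itab rather than directly to a _type (the runtime calls this variant iface).

Suppose I is an interface, S is a struct that implements I, and M is a function that takes an I. Then M takes an eface whose utype points to itab_S_I, the itab whose fun array holds S's implementations of the methods of I.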

If I is the empty interface (interface{}), it has no methods and therefore no itab: the first word points directly to the _type of the stored value. This is why an empty interface can hold any object.
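The relationship between the two forms can be checked at run time. The following is a sketch, not the runtime's own code: shape, square, itabPrefix, ifaceWords and sameType are all hypothetical names, and only the first two words of itab are mirrored. It compares the concrete type recorded in the itab with the type word of the same value stored in an interface{}.

```go
package main

import (
	"fmt"
	"unsafe"
)

type shape interface{ area() float64 }

type square struct{ side float64 }

func (s *square) area() float64 { return s.side * s.side }

// itabPrefix mirrors the first two words of itab.
type itabPrefix struct {
	inter unsafe.Pointer // +0x00 interface type
	utype unsafe.Pointer // +0x08 concrete type
}

// ifaceWords mirrors the two words of any interface value.
type ifaceWords struct {
	first unsafe.Pointer // *itab for shape, *_type for interface{}
	data  unsafe.Pointer
}

// sameType reports whether the itab behind a non-nil s records the same
// concrete type that the empty-interface form carries in its first word.
func sameType(s shape) bool {
	var e interface{} = s
	iw := (*ifaceWords)(unsafe.Pointer(&s))
	ew := (*ifaceWords)(unsafe.Pointer(&e))
	tab := (*itabPrefix)(iw.first)
	return tab.utype == ew.first && iw.data == ew.data
}

func main() {
	fmt.Println(sameType(&square{side: 2}))
}
```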

If an interface I is implemented by m structs, there are m itabs, one for each struct.

For example, let’s consider the following code

// source: https://gobyexample.com/interfaces
package main

import (
    "fmt"
    "math"
)

type nameof interface {
    name() string
}

type geometry interface {
    nameof
    area() float64
    perim() float64
}

type rect struct {
    width, height float64
}
type circle struct {
    radius float64
}

func (r rect) area() float64 {
    return r.width * r.height
}
func (r rect) perim() float64 {
    return 2*r.width + 2*r.height
}
func (r *rect) name() string {
    return "rect"
}

func (c circle) area() float64 {
    return math.Pi * c.radius * c.radius
}
func (c circle) perim() float64 {
    return 2 * math.Pi * c.radius
}
func (c *circle) name() string {
    return "circle"
}
func measure(g geometry) {
    fmt.Println(g.name())
    fmt.Println(g.area())
    fmt.Println(g.perim())
}

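Since there are two implementations of geometry, we have two itabs: itab_circle_geometry and itab_rect_geometry. Because name() has a pointer receiver, the implementing types are *circle and *rect, which is why the utype fields below are _ptr_main_circle and _ptr_main_rect.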

.rdata:004EC180 go_itab__main_circle_main_geometry dq offset main_geometry
.rdata:004EC188    dq offset _ptr_main_circle
.rdata:004EC190    dq 4764E29Bh
.rdata:004EC198    dq offset main___circle__area
.rdata:004EC1A0    dq offset main___circle__name
.rdata:004EC1A8    dq offset main___circle__perim
.rdata:004EC1B0    align 20h
.rdata:004EC1C0    public go_itab__main_rect_main_geometry
.rdata:004EC1C0 go_itab__main_rect_main_geometry dq offset main_geometry
.rdata:004EC1C8    dq offset _ptr_main_rect
.rdata:004EC1D0    dq 0F4666948h
.rdata:004EC1D8    dq offset main___rect__area
.rdata:004EC1E0    dq offset main___rect__name
.rdata:004EC1E8    dq offset main___rect__perim

Note that the interface geometry is composed of three methods: name, area and perim. Every implementation (itab) contains pointers to these methods, sorted by name: area is at offset 0x18, name at 0x20 and perim at 0x28.
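The sorted order can be checked from Go itself with the reflect package. This is a minimal sketch assuming 8-byte pointers and a fun array starting at +0x18; methodOrder and slotOffset are hypothetical helper names, and the interface definitions are repeated so the snippet is self-contained.

```go
package main

import (
	"fmt"
	"reflect"
)

type nameof interface{ name() string }

type geometry interface {
	nameof
	area() float64
	perim() float64
}

// methodOrder returns the methods of geometry in the order reflect reports
// them, which is sorted by name.
func methodOrder() []string {
	t := reflect.TypeOf((*geometry)(nil)).Elem()
	names := make([]string, 0, t.NumMethod())
	for i := 0; i < t.NumMethod(); i++ {
		names = append(names, t.Method(i).Name)
	}
	return names
}

// slotOffset is the itab offset of the i-th method pointer.
// Assumption: 64-bit target, fun starts at +0x18.
func slotOffset(i int) uintptr {
	return 0x18 + uintptr(i)*8
}

func main() {
	for i, name := range methodOrder() {
		fmt.Printf("%s at +%#x\n", name, slotOffset(i))
	}
}
```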

When measure is called, we have

.text:004A5CE1    lea     rcx, go_itab__main_rect_main_geometry
.text:004A5CE8    mov     [rsp+28h+var_28], rcx
.text:004A5CEC    mov     rcx, [rsp+28h+m_pRect]
.text:004A5CF1    mov     [rsp+28h+var_20], rcx
.text:004A5CF6    call    main_measure

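As described, measure takes an eface {utype: &go_itab__main_rect_main_geometry, data: pRect}: the itab pointer goes into the first stack slot (var_28) and the data pointer into the second (var_20).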

Inside measure,

.text:004A5AA8    mov     rax, [rsp+78h+arg_0]  ; utype
.text:004A5AB0    mov     rcx, [rax+20h]
.text:004A5AB4    mov     rdx, [rsp+78h+arg_8]  ; data
.text:004A5ABC    mov     [rsp+78h+var_78], rdx
.text:004A5AC0    call    rcx

rcx is loaded from offset 0x20 of go_itab__main_rect_main_geometry. Since the function array starts at offset 0x18, this is the second function, name(). The data pointer is placed in the first argument slot as the receiver, so this code calls name() on the rect instance (data).

Function Calls

All arguments are passed via the stack. The caller allocates the space for both the arguments and the return values, which are also returned via the stack.

For a function fn that takes m arguments and returns r return values, the stack looks something like this

[ return value r ]
...
[ return value 2 ]
[ return value 1 ]
[     arg m      ]
...
[     arg 2      ]
[     arg 1      ]
call fn

This layout is valid for Go <= 1.15. Go 1.17 switched to register-based parameter passing on amd64, so binaries built with newer compilers look different. Thanks to ppetreus for mentioning this to me.

How to distinguish between return values and arguments?

The values that are assigned before the function call are arguments. The remaining values are the return values.

Let's take a look at fmt.Printf(fmt, args...). Printf is a thin wrapper that the compiler inlines into a call to fmt.Fprintf(os.Stdout, fmt, args...).

So, the first thing is to understand how to deduce the parameter types (the signature of the function) from the function call. Let's start with Fprintf.

.text:004A8D90    lea     rcx, string_autogen_1ZI1UR
.text:004A8D97    mov     qword ptr [rsp+288h+var_1E8], rcx
.text:004A8D9F    mov     rdx, [rsp+288h+var_200]
.text:004A8DA7    mov     qword ptr [rsp+288h+var_1E8+8], rdx
.text:004A8DAF    mov     qword ptr [rsp+288h+var_1D8], rcx
.text:004A8DB7    mov     qword ptr [rsp+288h+var_1D8+8], rax
.text:004A8DBF    mov     rax, cs:os_Stdout
.text:004A8DC6    lea     rdx, go_itab__os_File_io_Writer
.text:004A8DCD    mov     [rsp+288h+var_288], rdx
.text:004A8DD1    mov     [rsp+288h+var_280], rax
.text:004A8DD6    lea     rax, aSS        ; "%s => %s\n"
.text:004A8DDD    mov     [rsp+288h+var_278], rax
.text:004A8DE2    mov     [rsp+288h+var_270], 9
.text:004A8DEB    lea     rbx, [rsp+288h+var_1E8]
.text:004A8DF3    mov     [rsp+288h+var_268], rbx
.text:004A8DF8    mov     [rsp+288h+var_260], 2
.text:004A8E01    mov     [rsp+288h+var_258], 2
.text:004A8E0A    call    fmt_Fprintf

Ok, let's do this bottom up. var_288, var_280, ..., var_258 are assigned before the function call, so these are the arguments: Fprintf takes 7 words of arguments.

var_288 is assigned the address of go_itab__os_File_io_Writer and var_280 contains the os.Stdout instance, so var_288 and var_280 make up an eface. Is this an empty interface? No: since the first member of the eface points to an itab, the interface has at least one method.

.rdata:004EF180 go_itab__os_File_io_Writer dq offset io_Writer
.rdata:004EF188    dq offset _ptr_os_File
.rdata:004EF190    dq 33F3B544h
.rdata:004EF198    dq offset os___File__Write

Ok, so the interface has only one method, Write(), and the type it's implemented for is *os.File.

So, the first parameter to Fprintf is an io.Writer.

var_278 is a pointer into a huge string (Go string literals are not NUL-terminated, so the disassembler cannot tell where this one ends). var_270 contains 9. So, the second parameter is a string of length 9: "%s => %s\n".

var_268 is a pointer to var_1E8, and var_260 and var_258 both have the value 2. Can you guess the type? Yes, you are right! It's a slice with length 2 and capacity 2. Now we need to find out the element type of the slice.

.text:004A8D90    lea     rcx, string_autogen_1ZI1UR
.text:004A8D97    mov     qword ptr [rsp+288h+var_1E8], rcx
.text:004A8D9F    mov     rdx, [rsp+288h+var_200]
.text:004A8DA7    mov     qword ptr [rsp+288h+var_1E8+8], rdx
.text:004A8DAF    mov     qword ptr [rsp+288h+var_1D8], rcx
.text:004A8DB7    mov     qword ptr [rsp+288h+var_1D8+8], rax

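var_1E8 is assigned string_autogen_1ZI1UR, which is the type descriptor for string, and var_1E8+8 contains the data (of course it's a string). Together these two words form an eface. The second element at var_1D8 is built the same way, with the same type descriptor in its first word.

So, the slice's element type is interface{}. The deduced signature is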

func Fprintf(w io.Writer, fmt string, args []interface{})
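The same pattern (one type word and one data word per element) can be observed from Go. This is a sketch assuming a 64-bit target; pair and typeWords are hypothetical names that mirror the element layout.

```go
package main

import (
	"fmt"
	"unsafe"
)

// pair mirrors one interface{} element: a type word followed by a data word.
type pair struct {
	typ  unsafe.Pointer
	data unsafe.Pointer
}

// typeWords returns the type word of every element in args.
func typeWords(args []interface{}) []unsafe.Pointer {
	out := make([]unsafe.Pointer, len(args))
	for i := range args {
		out[i] = (*pair)(unsafe.Pointer(&args[i])).typ
	}
	return out
}

func main() {
	args := []interface{}{"key", "value", 42}
	w := typeWords(args)
	fmt.Println(unsafe.Sizeof(args[0]))     // size of one element in bytes
	fmt.Println(w[0] == w[1], w[0] == w[2]) // strings share a descriptor, the int does not
}
```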

What about the return values?

.text:0049FFA0                 public fmt_Fprintf
.text:0049FFA0 fmt_Fprintf     proc near
.text:0049FFA0 var_60          = qword ptr -60h
.text:0049FFA0 var_58          = qword ptr -58h
; ...
.text:0049FFA0 var_18          = qword ptr -18h
.text:0049FFA0 var_10          = qword ptr -10h
.text:0049FFA0 var_8           = qword ptr -8
.text:0049FFA0 arg_0           = qword ptr  8
.text:0049FFA0 arg_8           = qword ptr  10h
.text:0049FFA0 arg_10          = qword ptr  18h
.text:0049FFA0 arg_18          = qword ptr  20h
.text:0049FFA0 arg_20          = qword ptr  28h
.text:0049FFA0 arg_28          = qword ptr  30h
.text:0049FFA0 arg_30          = qword ptr  38h
.text:0049FFA0 arg_38          = qword ptr  40h
.text:0049FFA0 arg_40          = qword ptr  48h
.text:0049FFA0 arg_48          = qword ptr  50h

We know that arg_0 and arg_8 constitute io.Writer. arg_10 and arg_18 constitute the format string. arg_20, arg_28 and arg_30 constitute the slice of interface{}

So, the return values are stored in arg_38, arg_40 and arg_48
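The offsets can be reproduced with a little arithmetic. The sketch below is an illustration under simplifying assumptions: every value is a whole number of 8-byte words and no padding is needed. slot, layout and fprintfLayout are hypothetical names. An offset N computed here corresponds to arg_N in the callee.

```go
package main

import "fmt"

// slot describes one value in the call frame by its size in 8-byte words.
type slot struct {
	name  string
	words int
}

// layout assigns consecutive stack offsets to the arguments first and to the
// results after them, mirroring the stack-based convention described above.
func layout(args, results []slot) map[string]int {
	offsets := make(map[string]int)
	off := 0
	for _, group := range [][]slot{args, results} {
		for _, s := range group {
			offsets[s.name] = off
			off += s.words * 8
		}
	}
	return offsets
}

// fprintfLayout applies layout to the signature deduced for Fprintf.
func fprintfLayout() map[string]int {
	return layout(
		[]slot{{"w", 2}, {"format", 2}, {"a", 3}}, // interface, string, slice
		[]slot{{"n", 1}, {"err", 2}},              // int, interface
	)
}

func main() {
	l := fprintfLayout()
	fmt.Printf("n at +%#x, err at +%#x\n", l["n"], l["err"])
}
```

Here n lands on arg_38 and the two words of err on arg_40 and arg_48.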

.text:004A0019    mov     rax, [rsp+60h+arg_0]  ; itab for io.Writer
.text:004A001E    mov     rax, [rax+18h]        ; os.File.Write
; ...
.text:004A004A    call    rax                   ; os.File.Write
.text:004A004C    mov     rax, [rsp+60h+var_40]
.text:004A0051    mov     [rsp+60h+var_28], rax
.text:004A0056    mov     rcx, [rsp+60h+var_38]
.text:004A005B    mov     [rsp+60h+var_20], rcx
.text:004A0060    mov     rdx, [rsp+60h+var_30]
.text:004A0065    mov     [rsp+60h+var_10], rdx
; ...
.text:004A0078    mov     rax, [rsp+60h+var_28]
.text:004A007D    mov     [rsp+60h+arg_38], rax
.text:004A0085    mov     rax, [rsp+60h+var_20]
.text:004A008A    mov     [rsp+60h+arg_40], rax
.text:004A0092    mov     rax, [rsp+60h+var_10]
.text:004A0097    mov     [rsp+60h+arg_48], rax

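From this code, we can infer that the return type of Fprintf is the same as the return type of io.Writer.Write: the three words returned by Write (var_40, var_38 and var_30) are copied into arg_38, arg_40 and arg_48. That is one word for the int and two words for the error interface.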

The inferred signature is

func Fprintf(w io.Writer, fmt string, args []interface{}) (n int, err error)

Wait! Fprintf is defined as

func Fprintf(w io.Writer, format string, a ...interface{}) (n int, err error)

where the third argument is variadic. In Go, variadic arguments are passed as a slice. With an explicit slice parameter we would have to construct the slice ourselves, whereas for a variadic parameter the Go compiler constructs it for us :)
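A small sketch of this behaviour; count and overwriteFirst are hypothetical functions written for this illustration. Passing an existing slice with s... hands the callee that same slice, so writes through it are visible to the caller.

```go
package main

import "fmt"

// count receives its variadic arguments as an ordinary []interface{}.
func count(args ...interface{}) int {
	return len(args)
}

// overwriteFirst writes to element 0 so the caller can observe sharing.
func overwriteFirst(args ...interface{}) {
	if len(args) > 0 {
		args[0] = "changed"
	}
}

func main() {
	fmt.Println(count("a", "b")) // the compiler builds a 2-element slice

	s := []interface{}{"x", "y"}
	fmt.Println(count(s...)) // the existing slice is passed as is
	overwriteFirst(s...)
	fmt.Println(s[0]) // the callee wrote through the same backing array
}
```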

What did we learn?

  1. Representation of types
  2. Function call - Deducing Signature of the function
  3. Interfaces and their implementation

This post is licensed under CC BY 4.0 by the author.