8 changes: 5 additions & 3 deletions CLAUDE.md
@@ -35,11 +35,11 @@ make test PYTHON=3.13 # Specific version
### Rust core (`src/`)

- **`lib.rs`** — PyO3 module entry, exports `CachedFunction`, `SharedCachedFunction`, info types
-- **`store.rs`** — In-process backend: `CachedFunction` uses a sharded `hashbrown::HashMap` + `parking_lot::RwLock` per shard (read lock for cache hits, write lock for misses/eviction). The `__call__` method does the entire cache lookup in Rust (hash → shard select → read lock → lookup → equality check → SIEVE visited update → return) in a single FFI crossing
+- **`store.rs`** — In-process backend: `CachedFunction` uses sharded `hashbrown::HashMap` with passthrough hasher (avoids re-hashing Python's precomputed hash) + GIL-conditional locking (`GilCell` under GIL for zero-cost, `parking_lot::RwLock` under free-threaded Python). The `__call__` hot path uses `BorrowedArgs` to look up via borrowed pointer (no `CacheKey` allocation on hits), with `CacheKey` only materialized on cache miss for storage
- **`serde.rs`** — Fast-path binary serialization for common primitives (None, bool, int, float, str, bytes, flat tuples); avoids pickle overhead for the shared backend
- **`shared_store.rs`** — Cross-process backend: `SharedCachedFunction` holds `ShmCache` directly (no Mutex), with cached `max_key_size`/`max_value_size` fields and a pre-built `ahash::RandomState`. Serializes via serde.rs (with pickle fallback), stores in mmap'd shared memory
- **`entry.rs`** — `CacheEntry` { value, created_at, visited }
-- **`key.rs`** — `CacheKey` wraps `Py<PyAny>` + precomputed hash; uses raw `ffi::PyObject_RichCompareBool` for equality (safe because called inside `#[pymethods]` where GIL is held)
+- **`key.rs`** — `CacheKey` wraps `Py<PyAny>` + precomputed hash; uses raw `ffi::PyObject_RichCompareBool` for equality. Also provides `BorrowedArgs` (zero-alloc borrowed key for hit-path lookups via hashbrown's `Equivalent` trait)
- **`shm/`** — Shared memory infrastructure:
- `mod.rs` — `ShmCache`: create/open, get/set with serialized bytes. Uses interior mutability (`&self` methods): reads are lock-free (seqlock), writes acquire seqlock internally. `next_unique_id` is `AtomicU64`
- `layout.rs` — Header + SlotHeader structs, memory offsets
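The seqlock pattern used for lock-free reads can be sketched in a few lines of std-only Rust — an illustrative toy, not `ShmCache`'s actual code (the `SeqLocked` name and single-word payload are hypothetical): writers bump a sequence counter to odd before mutating and back to even after; readers retry if they observe an odd or changed sequence.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

struct SeqLocked {
    seq: AtomicU64,   // even = stable, odd = write in progress
    value: AtomicU64, // stand-in for the slot payload
}

impl SeqLocked {
    fn new(v: u64) -> Self {
        SeqLocked { seq: AtomicU64::new(0), value: AtomicU64::new(v) }
    }

    fn write(&self, v: u64) {
        // The real backend would serialize writers with a TTAS spinlock first.
        self.seq.fetch_add(1, Ordering::Release); // now odd: readers will retry
        self.value.store(v, Ordering::Relaxed);
        self.seq.fetch_add(1, Ordering::Release); // even again: stable
    }

    fn read(&self) -> u64 {
        loop {
            let s1 = self.seq.load(Ordering::Acquire);
            if s1 % 2 == 1 {
                continue; // writer active, retry
            }
            let v = self.value.load(Ordering::Relaxed);
            let s2 = self.seq.load(Ordering::Acquire);
            if s1 == s2 {
                return v; // no writer interleaved: snapshot is consistent
            }
        }
    }
}
```

Readers never write shared state, so concurrent readers scale without contention; only a racing writer forces a retry.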
@@ -58,7 +58,9 @@ make test PYTHON=3.13 # Specific version
- **Single FFI crossing**: entire cache lookup happens in Rust `__call__`, no Python wrapper overhead
- **Release profile**: fat LTO + `codegen-units=1` for cross-crate inlining of PyO3 wrappers
- **SIEVE eviction**: unified across both backends. On hit, sets `visited=1` (single-word store). On evict, hand scans for unvisited entry. Lock-free reads on both backends
-- **Thread safety**: sharded `hashbrown::HashMap` + `parking_lot::RwLock` per shard (read lock for hits, write lock for misses) for in-process backend; seqlock (optimistic reads + TTAS spinlock) for shared backend — no Mutex, `ShmCache` uses `&self` methods with interior mutability. Cache hits only acquire a cheap per-shard read lock (memory) or are fully lock-free (shared). Enables true parallel reads across shards under free-threaded Python (3.13t+)
+- **Thread safety**: GIL-conditional locking — `GilCell` (zero-cost `UnsafeCell` wrapper) under GIL-enabled Python, `parking_lot::RwLock` under free-threaded Python (`#[cfg(Py_GIL_DISABLED)]`). Shared backend uses seqlock (optimistic reads + TTAS spinlock) — no Mutex. Under free-threaded Python, per-shard `RwLock` enables true parallel reads across cores
+- **Borrowed key lookup**: hit path uses `BorrowedArgs` (raw pointer + precomputed hash) via hashbrown's `Equivalent` trait — no `CacheKey` allocation, no refcount churn on hits
+- **Passthrough hasher**: `PassthroughHasher` feeds Python's precomputed hash directly to hashbrown, avoiding foldhash re-hashing (~1-2ns saved per lookup). Shard count is power-of-2 for bitmask indexing
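The `GilCell` idea above can be sketched with std only — a minimal, hypothetical illustration (not the crate's actual type): when the GIL serializes all access, a bare `UnsafeCell` wrapper is sufficient and costs nothing; the real code swaps in `parking_lot::RwLock` under `#[cfg(Py_GIL_DISABLED)]`.

```rust
use std::cell::UnsafeCell;

// Sketch only: soundness rests on an external guarantee (the GIL) that
// accesses are serialized. Under free-threaded Python this type would be
// replaced by an RwLock via cfg(Py_GIL_DISABLED).
struct GilCell<T>(UnsafeCell<T>);

// Shared across threads only because the GIL serializes access.
unsafe impl<T> Sync for GilCell<T> {}

impl<T> GilCell<T> {
    fn new(v: T) -> Self {
        GilCell(UnsafeCell::new(v))
    }

    // SAFETY: caller must guarantee exclusive access (i.e. hold the GIL).
    // No atomic, no lock — compiles down to a plain pointer dereference.
    unsafe fn get_mut(&self) -> &mut T {
        &mut *self.0.get()
    }
}
```

The point of the design is that the zero-cost path is chosen at compile time, so GIL-enabled builds pay no synchronization overhead at all.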
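A passthrough hasher is small enough to sketch fully — here with `std::collections::HashMap` and a `u64` key standing in for Python's precomputed hash (the real store uses hashbrown directly; names here are illustrative):

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

// The key's hash is already computed (by Python, in the real cache), so the
// table's hasher just forwards it instead of re-hashing.
#[derive(Default)]
struct PassthroughHasher(u64);

impl Hasher for PassthroughHasher {
    fn finish(&self) -> u64 {
        self.0
    }
    fn write(&mut self, _bytes: &[u8]) {
        unreachable!("only u64 keys are supported; use write_u64");
    }
    fn write_u64(&mut self, n: u64) {
        self.0 = n; // pass the precomputed hash straight through
    }
}

type PrehashedMap<V> = HashMap<u64, V, BuildHasherDefault<PassthroughHasher>>;
```

With a power-of-2 bucket count, the shard index is then just `hash & (shards - 1)` — a single AND instead of a modulo.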

## Critical Invariants

3 changes: 3 additions & 0 deletions Cargo.toml
@@ -17,6 +17,9 @@ memmap2 = "0.9"
libc = "0.2"
ahash = "0.8"

+[lints.rust]
+unexpected_cfgs = { level = "warn", check-cfg = ['cfg(Py_GIL_DISABLED)'] }

[profile.release]
lto = "fat"
codegen-units = 1
24 changes: 12 additions & 12 deletions README.md
@@ -1,21 +1,21 @@
# warp_cache

A thread-safe Python caching decorator backed by a Rust extension. Uses
-**SIEVE eviction** for scan-resistant, near-optimal hit rates with per-shard
-read locks. The entire cache lookup happens in a single Rust `__call__` — no Python
-wrapper overhead. **13-20M ops/s** single-threaded, **22x** faster than
-`cachetools`, with a cross-process shared memory backend reaching **9.2M ops/s**.
+**SIEVE eviction** for scan-resistant, near-optimal hit rates with zero-cost
+locking under the GIL. The entire cache lookup happens in a single Rust `__call__` — no Python
+wrapper overhead. **16-23M ops/s** single-threaded, **25x** faster than
+`cachetools`, with a cross-process shared memory backend reaching **9.7M ops/s**.

## Features

- **Drop-in replacement for `functools.lru_cache`** — same decorator pattern and hashable-argument requirement, with added thread safety, TTL, and async support
- **[SIEVE eviction](https://junchengyang.com/publication/nsdi24-SIEVE.pdf)** — a simple, scan-resistant algorithm with near-optimal hit rates and O(1) overhead per access
-- **Thread-safe** out of the box (sharded `RwLock` + `AtomicBool` for SIEVE visited bit)
+- **Thread-safe** out of the box (zero-cost `GilCell` under GIL, sharded `RwLock` under free-threaded Python)
- **Async support**: works with `async def` functions — zero overhead on sync path
- **Shared memory backend**: cross-process caching via mmap with fully lock-free reads
- **TTL support**: optional time-to-live expiration
- **Single FFI crossing**: entire cache lookup happens in Rust, no Python wrapper overhead
-- **13-20M ops/s** single-threaded, **17M+ ops/s** under concurrent load, **22x** faster than `cachetools`
+- **16-23M ops/s** single-threaded, **20M+ ops/s** under concurrent load, **25x** faster than `cachetools`
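The SIEVE algorithm in the feature list above is simple enough to sketch in a few lines of Rust — a deliberately naive toy (linear scan, no hash table, no sharding), not warp_cache's actual implementation: each entry carries a visited bit set on hit, and on eviction a "hand" sweeps the entries, clearing visited bits until it finds an unvisited victim.

```rust
struct Entry<K, V> {
    key: K,
    value: V,
    visited: bool,
}

struct Sieve<K: PartialEq, V> {
    entries: Vec<Entry<K, V>>, // insertion order; real code uses a hash map
    hand: usize,
    capacity: usize,
}

impl<K: PartialEq, V> Sieve<K, V> {
    fn new(capacity: usize) -> Self {
        Sieve { entries: Vec::new(), hand: 0, capacity }
    }

    fn get(&mut self, key: &K) -> Option<&V> {
        for e in self.entries.iter_mut() {
            if &e.key == key {
                e.visited = true; // the only bookkeeping on a hit
                return Some(&e.value);
            }
        }
        None
    }

    fn insert(&mut self, key: K, value: V) {
        if self.entries.len() == self.capacity {
            self.evict();
        }
        self.entries.push(Entry { key, value, visited: false });
    }

    fn evict(&mut self) {
        // The hand sweeps, clearing visited bits, until it finds a victim.
        loop {
            if self.hand >= self.entries.len() {
                self.hand = 0;
            }
            if self.entries[self.hand].visited {
                self.entries[self.hand].visited = false;
                self.hand += 1;
            } else {
                self.entries.remove(self.hand);
                return;
            }
        }
    }
}
```

Unlike LRU, a hit never reorders the structure — it is a single bit store — which is what makes lock-free (or read-locked) hits possible.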

## Installation

@@ -58,17 +58,17 @@ Like `lru_cache`, all arguments must be hashable. See the [usage guide](docs/usa

| Metric | warp_cache | cachetools | lru_cache |
|---|---|---|---|
-| Single-threaded (cache=256) | 18.1M ops/s | 814K ops/s | 32.1M ops/s |
-| Multi-threaded (8T) | 17.9M ops/s | 774K ops/s (with Lock) | 12.3M ops/s (with Lock) |
-| Shared memory (single proc) | 9.2M ops/s (mmap) | No | No |
-| Shared memory (4 procs) | 7.5M ops/s total | No | No |
-| Thread-safe | Yes (sharded RwLock) | No (manual Lock) | No |
+| Single-threaded (cache=256) | 20.4M ops/s | 826K ops/s | 31.0M ops/s |
+| Multi-threaded (8T) | 20.4M ops/s | 793K ops/s (with Lock) | 12.6M ops/s (with Lock) |
+| Shared memory (single proc) | 9.7M ops/s (mmap) | No | No |
+| Shared memory (4 procs) | 8.1M ops/s total | No | No |
+| Thread-safe | Yes (GilCell / sharded RwLock) | No (manual Lock) | No |
| Async support | Yes | No | No |
| TTL support | Yes | Yes | No |
| Eviction | SIEVE (scan-resistant) | LRU, LFU, FIFO, RR | LRU only |
| Implementation | Rust (PyO3) | Pure Python | C (CPython) |

-`warp_cache` is the fastest *thread-safe* cache — **22x** faster than `cachetools` and **4.9x** faster than `moka_py`. Under multi-threaded load, it's **1.5x faster** than `lru_cache + Lock`. See [full benchmarks](docs/performance.md) for details.
+`warp_cache` is the fastest *thread-safe* cache — **25x** faster than `cachetools` and **5.3x** faster than `moka_py`. Under multi-threaded load, it's **1.6x faster** than `lru_cache + Lock`. See [full benchmarks](docs/performance.md) for details.

<picture>
<source media="(prefers-color-scheme: dark)" srcset="benchmarks/results/comparison_mt_scaling_dark.svg">