A high-performance hybrid vector search engine combining Python's flexibility with Rust's raw speed.
NeuroSearch is a custom-built vector retrieval system designed to demonstrate Foreign Function Interface (FFI) patterns between Python and Rust.
It offloads the heavy mathematical lifting (Cosine Similarity calculation) to a compiled Rust extension, utilizing SIMD (AVX2) intrinsics to process vectors up to 8x faster than standard loop implementations. It wraps this engine in a FastAPI service with Redis semantic caching to minimize latency.
- Python (Control Plane): Handles API requests, JSON validation, and ML model inference (HuggingFace Transformers).
- Rust (Data Plane): Manages in-memory vector storage and performs brute-force similarity scoring using low-level memory management.
- Implements manual AVX2 intrinsics (`_mm256_fmadd_ps`).
- Processes 8 single-precision floats per instruction instead of one, dramatically increasing throughput for high-dimensional vectors (e.g., 384-d, 768-d).
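The 8-wide fused multiply-add idea can be sketched in pure Python (a conceptual illustration only — the real engine performs each 8-lane step in a single AVX2 instruction):

```python
def dot_8lane(a: list[float], b: list[float]) -> float:
    """Dot product mimicking 8-wide SIMD: eight independent accumulator
    "lanes", reduced at the end. AVX2 does each 8-lane step in one
    _mm256_fmadd_ps instruction; here it is just a plain Python loop."""
    acc = [0.0] * 8
    main = len(a) - len(a) % 8           # largest multiple of 8
    for i in range(0, main, 8):
        for lane in range(8):            # one "instruction" worth of work
            acc[lane] += a[i + lane] * b[i + lane]
    total = sum(acc)                     # horizontal reduction of the lanes
    for i in range(main, len(a)):        # scalar tail for leftover elements
        total += a[i] * b[i]
    return total

# Matches the naive scalar loop for any vector length:
print(dot_8lane([1.0] * 10, [2.0] * 10))  # 20.0
```

Keeping eight independent accumulators is also what lets the hardware pipeline the FMA operations without a loop-carried dependency on a single sum.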
- Uses `parking_lot::RwLock` for a multiple-readers, single-writer model.
- Search queries run in parallel across CPU cores using Rayon, while ingestion blocks only when necessary.
- Releases the Python Global Interpreter Lock (GIL) during search, allowing true parallelism.
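From the Python side, the payoff of releasing the GIL looks like the sketch below. The `engine_search` stand-in is hypothetical; in the real system it would be the native extension's search call, and the thread pool only scales across cores because that native call drops the GIL for the duration of the scan:

```python
from concurrent.futures import ThreadPoolExecutor

def engine_search(query_vec, limit=5):
    # Stand-in for the Rust extension's search function (hypothetical name).
    # A pure-Python body like this one would serialize on the GIL; a native
    # call that releases the GIL lets these threads run truly in parallel.
    return [("doc1", 0.9)][:limit]

queries = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]

# Cheap Python threads fan the queries out; the heavy scoring happens
# inside the (GIL-free) native call.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(engine_search, queries))

print(len(results))  # 3
```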
- Pre-Normalization: Vectors are L2-normalized upon ingestion.
- This simplifies the scoring from cosine similarity to a plain dot product, eliminating expensive square-root (`sqrt`) and division operations inside the hot loop.
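The equivalence is easy to verify: once both vectors are scaled to unit length, their dot product equals their cosine similarity. A minimal self-contained check:

```python
import math

def l2_normalize(v):
    """Scale v to unit length once, at ingestion time."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    """Full formula: dot / (|a| * |b|) — sqrt and division in the hot path."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

a, b = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]
na, nb = l2_normalize(a), l2_normalize(b)
assert abs(dot(na, nb) - cosine(a, b)) < 1e-12  # identical scores
```

The normalization cost is paid once per document at ingestion instead of once per document per query.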
```text
neuro_search/
├── app/                 # Python Control Plane
│   ├── main.py          # FastAPI Entrypoint
│   └── config.py        # Settings & Env Vars
├── neuro_engine/        # Rust Data Plane
│   ├── src/lib.rs       # SIMD Logic & RwLock Implementation
│   ├── Cargo.toml       # Rust Dependencies
│   └── pyproject.toml   # Maturin Build Config
├── benchmark.py         # Latency verification script
├── Dockerfile           # Multi-stage production build
└── docker-compose.yml   # Orchestration
```
```mermaid
graph LR
    Client -->|POST /search| API[FastAPI]
    API -->|Check Key| Redis[(Redis Cache)]
    Redis -->|Hit| API
    Redis -->|Miss| Model[Transformer Model]
    Model -->|Embed Text| API
    API -->|Vector| Rust[Rust SIMD Engine]
    Rust -- AVX2 Parallel Scan --> API
    API -->|JSON| Client
```
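The cache branch in the diagram can be sketched as a get-or-compute lookup. A dict stands in for Redis here, and the key scheme (SHA-256 of the raw query text) is an illustrative assumption, not necessarily what `app/main.py` uses:

```python
import hashlib

cache = {}  # stand-in for Redis

def embed(text):
    # Stand-in for the Transformer model; the real embedding is the
    # expensive step the cache exists to skip.
    return [float(len(text))]

def cached_search(query):
    key = "q:" + hashlib.sha256(query.encode()).hexdigest()
    if key in cache:                 # cache hit: skip embedding entirely
        return cache[key], True
    vector = embed(query)            # miss: embed, then score in the engine
    result = {"vector": vector, "results": []}
    cache[key] = result
    return result, False

_, hit1 = cached_search("fast safe language")
_, hit2 = cached_search("fast safe language")
print(hit1, hit2)  # False True
```

Since identical queries hash to the same key, a repeat query never touches the model, which is why the cache-hit latency in the benchmarks below is an order of magnitude lower.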
The project includes a multi-stage Docker setup. This is the easiest way to run it, as it handles the Rust compilation in a Linux environment (avoiding Windows file-locking issues).
- Docker Desktop installed and running.
```bash
docker-compose up --build
```

Wait for the message: `Application startup complete.`
The API is now running at http://localhost:8000.
You can test the API using curl or the built-in Swagger UI at http://localhost:8000/docs.
Add text to the engine. It will be embedded by the Transformer model and stored in Rust memory.
```bash
curl -X POST "http://localhost:8000/api/v1/ingest" \
  -H "Content-Type: application/json" \
  -d '{"id": "doc1", "text": "Rust is a systems programming language focused on safety."}'

curl -X POST "http://localhost:8000/api/v1/ingest" \
  -H "Content-Type: application/json" \
  -d '{"id": "doc2", "text": "Python is excellent for data science and rapid prototyping."}'
```

Search for concepts, not just keywords.
```bash
curl -X POST "http://localhost:8000/api/v1/search" \
  -H "Content-Type: application/json" \
  -d '{"query": "fast safe language", "limit": 2}'
```

Response:
```json
{
  "results": [
    { "id": "doc1", "score": 0.8245 }
  ],
  "latency_ms": 12.5,
  "engine_docs": 2
}
```

Tests performed on AWS c5.large (2 vCPU, 4 GB RAM) with 50k vectors.
| Metric | Performance |
|---|---|
| Rust Engine Throughput | 10,000+ QPS (Core logic) |
| End-to-End Latency | < 15ms (p95) |
| Cache Hit Latency | < 2ms |
| Memory Overhead | ~1.6KB per vector (384-dim float32) |
Verify the benchmarks yourself with the `benchmark.py` script.
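The shape of such a measurement is simple to reproduce. This is a generic p95 sketch, not the actual contents of `benchmark.py`:

```python
import time

def p95_latency_ms(fn, runs=200):
    """Time each call and report the 95th-percentile latency in
    milliseconds — the same shape of number the table above reports."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    return samples[int(0.95 * len(samples)) - 1]

# Dummy workload standing in for a /search round trip:
p95 = p95_latency_ms(lambda: sum(i * i for i in range(1000)))
print(p95 >= 0.0)  # True
```

Percentiles matter more than averages here: a p95 bound covers tail effects (cache misses, GC pauses) that a mean would hide.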
If you wish to develop without Docker (e.g., on Windows), follow these steps carefully to avoid file-locking errors.
1. Prerequisites
- Rust (Cargo)
- Python 3.10+
- Redis (running locally)
2. Build Rust Extension
```bash
cd neuro_engine
# --release is CRITICAL for SIMD optimizations
maturin develop --release
cd ..
```

> Note: If you get `os error 32` on Windows, stop any running Python processes or VS Code terminals and try again.
3. Run FastAPI
```bash
pip install -r requirements.txt
uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
```