The Episodic Memory Platform (EMP) is a production-style AI backend designed to model user knowledge as time-stamped semantic memory events. It leverages the high-performance Endee vector database for metadata-aware semantic search, time-aware re-ranking, and episodic Retrieval-Augmented Generation (RAG).
This project is submitted for the RAGxthon 2026 Hackathon.
```mermaid
flowchart TD
    %% User Inputs
    UserQuery[/"👤 User Query"/] --> Router{"Agent Intent Classification\n(Recall / Summarize / Recommend)"}

    %% Ingestion Flow
    MemInput[/"📝/🖼️ Memory Input\n(Text or Image)"/] -.-> |If Image| VisionLLM["Vision LLM\n(Gemini Flash)"]
    MemInput -.-> |If Text| Embeddings1
    VisionLLM --> |Descriptive Text| Embeddings1

    %% Embedding
    Router --> Embeddings2["🧮 Embeddings Model\n(Llama 3 Nemotron)"]
    Embeddings1["🧮 Embeddings Model\n(Llama 3 Nemotron)"] --> VDB[("🗄️ Endee Vector Database\n(Custom C++ / SIMD)")]

    %% Retrieval Pipeline
    Embeddings2 --> Filter[("RoaringBitmap\nMetadata Filters")]
    Filter -. "(tags, etc)" .-> VDB
    VDB --> |"HNSW Semantic Search"| TopK["Raw Top-K Retrieval"]
    TopK --> TimeDecay["⏳ Time-Decay Re-ranking\n(Exponential Decay Formula)"]
    TimeDecay --> EpisodicGroup["📅 Episodic Grouping\n(Temporal Clustering)"]

    %% Generation
    EpisodicGroup --> LLM["🧠 LLM Controller\n(Gemma 3)"]
    LLM --> Response[/"💬 Grounded Response"/]

    %% Async Processes
    VDB -.-> Reflection["🌙 Memory Reflection\n(Background Consolidation)"]
    Reflection -.-> |"Synthesized Insights"| VDB
```
Our implementation extends standard RAG by adding temporal context, intent routing, and cognitive consolidation to solve the "context fragmentation" problem in typical LLM systems:
- Data Source (Episodic Memories): Memories (text, or images converted to text via a Vision LLM) are ingested as discrete events with timestamps and categorical metadata tags.
- Embeddings: Text is vectorized using `nvidia/llama-nemotron-embed` via OpenRouter.
- Vector Database: We use Endee, a highly optimized, SIMD-accelerated open-source C++ vector database. It executes pre-filtering via RoaringBitmaps before HNSW graph traversal, which keeps our metadata-heavy queries fast.
- Retrieval (3-Stage Process):
  - Semantic Search: Fast, hardware-accelerated top-K retrieval in Endee.
  - Time-Decay Re-ranking: Re-weights semantic similarity scores using an exponential decay function (`e^(-λ·days_old)`), surfacing memories that are both relevant and recent.
  - Episodic Grouping: Clusters temporally adjacent memories into chronological "Episodes" to maintain narrative flow.
- LLM Generation: The grouped episodes form a strict grounding prompt for `google/gemma-3-27b-it`, keeping the output anchored to retrieved memories rather than model priors.
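The re-ranking and grouping stages above can be sketched in a few lines. The decay constant and the six-hour episode gap below are illustrative assumptions, not the platform's actual parameters:

```python
import math

# Illustrative decay constant (lambda), not the platform's real setting.
DECAY_LAMBDA = 0.05

def time_decay_score(similarity: float, days_old: float, lam: float = DECAY_LAMBDA) -> float:
    """Re-weight a raw semantic similarity: score = similarity * e^(-lambda * days_old)."""
    return similarity * math.exp(-lam * max(0.0, days_old))

def group_into_episodes(memories: list[dict], gap_hours: float = 6.0) -> list[list[dict]]:
    """Cluster time-sorted memories into episodes: start a new episode
    whenever the gap to the previous memory exceeds `gap_hours`.
    The 6-hour threshold is an assumption for this sketch."""
    episodes: list[list[dict]] = []
    for mem in sorted(memories, key=lambda m: m["timestamp"]):
        if episodes and mem["timestamp"] - episodes[-1][-1]["timestamp"] <= gap_hours * 3600:
            episodes[-1].append(mem)  # close in time: extend the current episode
        else:
            episodes.append([mem])  # large gap: begin a new episode
    return episodes
```

With this weighting, an older memory needs a proportionally higher raw similarity to outrank a recent one, which is exactly the "relevant and recent" trade-off described above.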
Unlike static corpora (like PDFs or Wikipedia dumps), EMP relies on a dynamically growing dataset of user-generated events.
A curated dataset representing 30 days of simulated user memories (tech notes, personal updates, reflections) is included in the project. Reviewers can instantly load this dataset using our seed script to demonstrate the platform's time-awareness and analytical capabilities without needing to manually generate an entire history.
Modern LLM applications often struggle with "context fragmentation" or simply lack a coherent, time-aware memory of past user interactions. Generic "chat with your data" systems rely on naive semantic similarity (e.g., top-K cosine distance), which completely ignores:
- When a memory occurred (temporal context).
- The overarching "episode" or chronological theme tying multiple memories together.
- Metadata categorizations (e.g., extracting intent before querying).
EMP solves this by treating memory not just as text, but as a time-stamped, metadata-rich event. By grouping retrieved memories into temporal "episodes" before passing them to the LLM, the model generates responses that are fundamentally grounded in the user's personal timeline.
Vector databases are essential here because memory retrieval is inherently semantic. When a user asks, "What did we decide about the architecture?", keyword searches will fail if the memory states, "We are going to structure the backend using Domain-Driven Design."
We need a system that can:
- Understand Meaning: Convert text into dense mathematical vectors (embeddings) to measure conceptual proximity.
- Filter by Metadata: Rapidly narrow the search space using categorical tags (e.g., `tag: "architecture"`) before doing the heavy vector math.
- Scale: Handle thousands to millions of distinct memory events in milliseconds.
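The "conceptual proximity" point can be made concrete: embeddings map both the query and the memory to dense vectors, and cosine similarity scores them even with zero keyword overlap. A minimal sketch with toy vectors (real embeddings have hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: the standard measure of 'conceptual proximity'
    between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm
```

This is why "decide about the architecture" can still match "structure the backend using Domain-Driven Design": the two sentences land near each other in embedding space despite sharing no keywords.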
Endee is central to this platform. EMP relies on Endee's highly optimized C++ core, which uses SIMD instructions (AVX2/NEON) for high performance.
- Ingestion: When a memory is inserted, the text is vectorized locally or via an API (OpenRouter embeddings). The vector is stored in Endee alongside metadata: `text`, `timestamp` (numeric integer), and `tags` (string list).
- Filtering: EMP uses Endee's JSON-based filtering (e.g., `{$in: [...]}`). Endee's pre-filtering architecture executes these tag filters first (using RoaringBitmaps), keeping retrieval fast even for narrow filters.
- Retrieval Logic: EMP queries Endee with the semantic vector to fetch the top `K` most relevant memories matching the metadata filters. Endee switches to a brute-force exact search when the filtered subset is very small, bypassing HNSW graph overhead.
- Time-Aware Ranking: After Endee returns the semantic top `K`, EMP applies an exponential decay function based on the `timestamp` metadata, so a memory from five minutes ago ranks higher than an equally relevant memory from five months ago.
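As a sketch, a tag filter in the `$in` style above might be assembled like this. The field name and payload shape are assumptions for illustration, not Endee's documented request format:

```python
def build_tag_filter(tags: list[str]) -> dict:
    """Build a metadata filter in the `$in` style described above.
    The 'tags' field name and the overall payload shape are assumed
    here for illustration; check Endee's docs for the exact format."""
    return {"tags": {"$in": list(tags)}}
```

Keeping filter construction in one helper makes it easy to adapt if the database's filter grammar changes.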
The system uses a clean, modular Python backend:
- FastAPI: Provides the asynchronous REST API.
- Endee HTTP Client: A dedicated wrapper to orchestrate Endee index creation, upserts, and filtered queries.
- Embedding & LLM Clients: Connect to OpenRouter to fetch text embeddings (`nvidia/llama-nemotron-embed...`) and generate strict, grounded chat responses (`google/gemma-3...`).
- Memory Service: Manages Endee interactions, ingestion, metadata preparation, and the time-decay algorithm.
- Agent Service: Analyzes incoming query intents (`recall`, `summarize`, `recommend`), applies chronological clustering to form "Context Episodes", and formats the strict grounding LLM prompt.
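A minimal sketch of the Agent Service's prompt-formatting step. The wording and layout of the prompt here are invented for illustration; the README does not show the actual template:

```python
from datetime import datetime, timezone

def build_grounding_prompt(episodes: list[list[dict]]) -> str:
    """Format clustered episodes into a strict grounding prompt.
    Prompt wording and layout are illustrative, not the real template."""
    lines = [
        "Answer ONLY from the memories below.",
        "If they do not contain the answer, say you don't know.",
        "",
    ]
    for i, episode in enumerate(episodes, start=1):
        lines.append(f"Episode {i}:")
        for mem in episode:
            # Each memory carries a numeric `timestamp`; render it as a date.
            day = datetime.fromtimestamp(mem["timestamp"], tz=timezone.utc).date().isoformat()
            lines.append(f"- [{day}] {mem['text']}")
        lines.append("")
    return "\n".join(lines)
```

Presenting memories grouped by episode, each stamped with its date, is what lets the LLM answer in terms of the user's timeline rather than a flat bag of snippets.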
Ensure you have Docker and Docker Compose installed.
Create a .env file in the root directory and add your OpenRouter API key:
```
OPENROUTER_API_KEY=your_openrouter_api_key_here
```

You can run the entire Episodic Memory Platform (Endee Vector DB + FastAPI Backend + Streamlit UI) with a single command:

```bash
docker compose up --build
```

The services will be available at:
- Frontend UI: `http://localhost:8501`
- Backend API: `http://localhost:8000`
- Endee VDB: `http://localhost:8080` (internal)
To see the system in action with a realistic, 30-day timeline of episodic memories, run the seed script:
```bash
# If running locally (not in Docker):
python scripts/seed_data.py

# If running via Docker Compose:
docker compose exec emp-backend python scripts/seed_data.py
```

After seeding, refresh the frontend UI to interact with the Memory Map and Agent features.
You can interact with the API directly using curl or by writing a simple client script.
1. Ingesting a Memory
```bash
curl -X POST "http://localhost:8000/api/memory/add" \
  -H "Content-Type: application/json" \
  -d '{"text": "I set up the Endee database for the memory platform.", "tags": ["tech"]}'
```

2. Querying the Agent
```bash
curl -X POST "http://localhost:8000/api/agent/ask" \
  -H "Content-Type: application/json" \
  -d '{"user_input": "What did I do today regarding the database?", "intent": "recall"}'
```