AI-Powered Email Archive Analysis with Native RAG (Retrieval-Augmented Generation) Pipeline
Download the latest release: MBox Explorer v2.2.1
Or build from source (see below).
New WidgetKit widget for macOS Notification Center:
- Three Widget Sizes: Small, Medium, and Large
- Email Statistics: Total emails, threads, date range at a glance
- Top Senders: See who emails you most frequently
- Recent Searches: Quick access to your recent queries
- Quick Search Action: Jump directly to search in the app
- Auto-Sync: Widget updates automatically when you load new mbox (mailbox format) files
- App Group Sharing: Secure data sharing via `group.com.jkoch.mboxexplorer`
Widget Features by Size:
| Size | Features |
|---|---|
| Small | Email count, loaded file name |
| Medium | Stats + Top 3 senders |
| Large | Stats + Top senders + Recent searches + Quick search button |
- Memory-safe SQLite bindings - SQLITE_TRANSIENT now used for all string bindings, preventing memory-corruption crashes
- FTS5 (Full-Text Search 5) auto-sync triggers - Full-text search index automatically syncs with vector database
- Smart three-tier search - Semantic → FTS keywords → Sample fallback ensures results are always found
- Keyword extraction - Stop-word filtering for natural language queries to FTS5
- Extended timeouts - 3-minute request / 10-minute resource timeout for large RAG queries
- Robust JOIN queries - FTS5 external content tables properly joined for complete data retrieval
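The trigger-based FTS5 sync and the JOIN fix above can be sketched with an external-content FTS5 table. The schema below is illustrative, not the app's actual one:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE email_vectors(id INTEGER PRIMARY KEY, subject TEXT, body TEXT);

    -- External-content FTS5 table backed by email_vectors
    CREATE VIRTUAL TABLE email_fts USING fts5(
        subject, body, content='email_vectors', content_rowid='id');

    -- Triggers keep the FTS index in sync with the base table
    CREATE TRIGGER vec_ai AFTER INSERT ON email_vectors BEGIN
        INSERT INTO email_fts(rowid, subject, body)
        VALUES (new.id, new.subject, new.body);
    END;
    CREATE TRIGGER vec_ad AFTER DELETE ON email_vectors BEGIN
        INSERT INTO email_fts(email_fts, rowid, subject, body)
        VALUES ('delete', old.id, old.subject, old.body);
    END;
""")
conn.execute("INSERT INTO email_vectors(subject, body) VALUES (?, ?)",
             ("Q3 budget", "Numbers for the quarterly budget review"))

# JOIN back to the content table so every column comes back populated
rows = conn.execute("""
    SELECT v.subject, v.body
    FROM email_fts JOIN email_vectors v ON v.id = email_fts.rowid
    WHERE email_fts MATCH 'budget'
""").fetchall()
```

Without those triggers, rows inserted into the base table are invisible to `MATCH` queries, which is exactly the out-of-sync symptom described above.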
- Search History - Recent and saved searches with persistence
- Email Statistics Dashboard - Comprehensive analytics with Charts
- Sentiment Dashboard - Analyze email tone using NaturalLanguage
- Email Diff View - Compare emails side-by-side with highlighting
- Spotlight Integration - Find emails via macOS system search
- Quick Look Preview - Space bar preview (native macOS)
- Notification Center - Reminders and follow-ups
- Smart Reply Suggestions - AI-powered replies with tone options
- Meeting/Event Extractor - Extract calendar events with EventKit
- Batch Operations Toolbar - Multi-select tag, star, export, print
- Contact Exporter - Export to vCard (Virtual Contact File), CSV (Comma-Separated Values), or Address Book
- Retrieval-Augmented Generation built entirely in Swift
- Vector database with SQLite + FTS5 full-text search
- Semantic search via Ollama embeddings
- Smart question routing for optimal context selection
- Conversation memory for follow-up questions
- Custom system prompts for personalized AI behavior
- Natural language queries about your email archive
- Real-time AI responses with source citations
- Debug panel to inspect AI prompts
- Export conversations to Markdown/JSON
- Temperature controls to reduce hallucinations
- OpenAI API - GPT-4o for advanced capabilities
- Google Cloud AI - Vertex AI, Vision, Speech
- Microsoft Azure - Cognitive Services
- AWS AI Services - Bedrock, Rekognition, Polly
- IBM Watson - NLU (Natural Language Understanding), Speech, Discovery
- Comprehensive content monitoring
- Prohibited use detection (100+ patterns)
- Automatic blocking of illegal/harmful content
- Crisis resource referrals
- Legal compliance (CSAM (Child Sexual Abuse Material) reporting, etc.)
MBox Explorer includes a native RAG (Retrieval-Augmented Generation) pipeline - no external frameworks required.
```
+-------------------------------------------------------------------+
|                           RAG PIPELINE                            |
+-------------------------------------------------------------------+
|                                                                   |
|  +----------+    +----------+    +----------+    +----------+     |
|  |  Query   |--->| Question |--->| Retrieve |--->| Augment  |     |
|  |  Input   |    |  Router  |    | Context  |    |  Prompt  |     |
|  +----------+    +----------+    +----------+    +----------+     |
|                                       |               |           |
|                                       v               v           |
|                                  +----------+    +----------+     |
|                                  |  Vector  |    |   LLM    |     |
|                                  |    DB    |    | Generate |     |
|                                  +----------+    +----------+     |
|                                                       |           |
|                                                       v           |
|                                                  +----------+     |
|                                                  | Response |     |
|                                                  | + Sources|     |
|                                                  +----------+     |
+-------------------------------------------------------------------+
```
| Feature | Implementation |
|---|---|
| Storage | SQLite database (~/Library/Application Support/MBoxExplorer/vectors.db) |
| Full-text search | FTS5 with ranking |
| Vector storage | Float arrays as BLOBs (Binary Large Objects) |
| Indexing | Batch processing with progress |
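The floats-as-BLOBs scheme is straightforward to illustrate with the standard library. The table layout here is a simplified stand-in for the real vectors.db schema:

```python
import math
import sqlite3
import struct

def to_blob(vec):
    # Pack a float vector into raw bytes (little-endian float32)
    return struct.pack(f"<{len(vec)}f", *vec)

def from_blob(blob):
    # Each float32 is 4 bytes
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vectors(id INTEGER PRIMARY KEY, embedding BLOB)")
conn.execute("INSERT INTO vectors(embedding) VALUES (?)",
             (to_blob([1.0, 0.0, 0.0]),))

(blob,) = conn.execute("SELECT embedding FROM vectors").fetchone()
restored = from_blob(blob)
score = cosine(restored, [1.0, 0.0, 0.0])  # identical vectors -> 1.0
```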
MBox Explorer supports 4 embedding providers - choose based on your needs:
| Provider | Cost | Privacy | Speed | Quality | Setup |
|---|---|---|---|---|---|
| Ollama | Free | 100% Local | Fast | Good | brew install ollama && ollama pull nomic-embed-text |
| MLX (Machine Learning eXtensions) | Free | 100% Local | Very Fast | Good | Built-in (Apple Silicon only) |
| OpenAI | $0.02/1M tokens | Cloud | Fast | Excellent | API key required |
| Sentence Transformers | Free | 100% Local | Medium | Excellent | pip install sentence-transformers |
1. Ollama Embeddings (Recommended for most users)
| Aspect | Details |
|---|---|
| Pros | Free, private, runs locally, easy setup, multiple models |
| Cons | Requires Ollama daemon running |
| Models | nomic-embed-text (768d), all-minilm (384d), mxbai-embed-large (1024d) |
| Best for | Users who want local, private semantic search |
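Ollama's embeddings endpoint (`POST /api/embeddings`) returns JSON with an `embedding` array. A minimal client might look like the sketch below; the canned response at the end stands in for a live server:

```python
import json
import urllib.request

def ollama_embed(prompt, model="nomic-embed-text",
                 host="http://localhost:11434"):
    # POST to Ollama's embeddings endpoint; returns the vector
    req = urllib.request.Request(
        f"{host}/api/embeddings",
        data=json.dumps({"model": model, "prompt": prompt}).encode(),
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["embedding"]

# Offline demonstration: nomic-embed-text returns 768-dimensional vectors,
# so a response body has this shape (values faked here):
sample_response = json.dumps({"embedding": [0.1] * 768})
vector = json.loads(sample_response)["embedding"]
```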
2. MLX Embeddings (Best for Apple Silicon users)
| Aspect | Details |
|---|---|
| Pros | Native Apple Silicon, fastest inference, no external dependencies |
| Cons | macOS only, Apple Silicon required, model download on first use |
| Models | all-MiniLM-L6-v2 (384d), nomic-embed-text-v1.5 (768d), bge-small-en-v1.5 (384d) |
| Best for | M1/M2/M3 Mac users wanting maximum performance |
3. OpenAI Embeddings (Best quality)
| Aspect | Details |
|---|---|
| Pros | Highest quality, well-documented, reliable |
| Cons | Costs money, data sent to cloud, requires API key |
| Models | text-embedding-3-small (1536d, $0.02/1M), text-embedding-3-large (3072d, $0.13/1M) |
| Best for | Users who prioritize quality and don't mind cloud processing |
4. Sentence Transformers (Best flexibility)
| Aspect | Details |
|---|---|
| Pros | Excellent quality, huge model selection, local processing |
| Cons | Requires Python, slower startup, larger disk footprint |
| Models | Any HuggingFace sentence-transformers model |
| Best for | ML enthusiasts who want model flexibility |
- Storage: Embeddings stored in SQLite as binary data
- Chunking strategy: Subject + first 500 characters of body
- Dimension tracking: Automatically tracked per provider
- Provider switching: Change in Settings → AI → Embedding Provider
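The chunking strategy above amounts to a one-liner; field names here are illustrative:

```python
def make_chunk(subject, body, body_limit=500):
    # Subject plus the first `body_limit` characters of the body,
    # matching the chunking strategy described above
    return f"{subject}\n{body[:body_limit]}"

chunk = make_chunk("Lunch plans", "x" * 1000)
```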
```
// Three search modes with automatic fallback:

1. Semantic Search (if Ollama available)
   → Generate query embedding
   → Cosine similarity against stored embeddings
   → Return top 20 results

2. Keyword Search (FTS5 fallback)
   → FTS5 MATCH query
   → Ranked by relevance
   → Snippet extraction

3. Direct Search (no indexing required)
   → In-memory text matching
   → Score by term frequency
   → Bonus for subject/sender matches
```

The pipeline automatically detects question types and optimizes context:
| Question Type | Example | Context Used |
|---|---|---|
| STATISTICS | "How many emails?" | Metadata only |
| TOP_LIST | "Who sent the most?" | Metadata + samples |
| DATE_RANGE | "What's the date range?" | Metadata only |
| CONTENT_SEARCH | "Find emails about project X" | Full RAG search |
| SUMMARY | "Summarize main themes" | Extended context (15 emails) |
| FOLLOW_UP | "Tell me more" | Previous conversation + search |
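A routing step like this is typically a set of keyword patterns checked before retrieval. The rules below are a sketch, not the app's actual logic:

```python
def route_question(q: str) -> str:
    # Very rough keyword routing, mirroring the table above
    ql = q.lower()
    if "how many" in ql or "count" in ql:
        return "STATISTICS"
    if "most" in ql or "top" in ql:
        return "TOP_LIST"
    if "date range" in ql:
        return "DATE_RANGE"
    if ql.startswith(("summarize", "summary")):
        return "SUMMARY"
    if ql.startswith(("tell me more", "what about")):
        return "FOLLOW_UP"
    return "CONTENT_SEARCH"
```

Anything that matches no pattern falls through to a full RAG content search, which is the safe default.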
The prompt sent to the LLM (Large Language Model) includes:
```
MAILBOX STATISTICS:
- Total emails: [count]
- Date range: [start] - [end]
- Total threads: [count]
- Unique senders: [count]
- Top senders: [list with counts]

PREVIOUS CONVERSATION: (if memory enabled)
[Recent Q&A turns for context]

RETRIEVED EMAILS:
From: [sender]
Subject: [subject]
Date: [date]
Content: [snippet]
---
[...more relevant emails...]

USER QUESTION: [query]
```
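Assembling the augmented prompt is essentially string templating. A simplified version, with assumed field names:

```python
def build_prompt(stats, retrieved, question, history=None):
    # Mirror the layout above: statistics, optional conversation memory,
    # retrieved emails, then the user's question
    parts = ["MAILBOX STATISTICS:"]
    parts += [f"- {key}: {value}" for key, value in stats.items()]
    if history:
        parts.append("PREVIOUS CONVERSATION:")
        parts += history
    parts.append("RETRIEVED EMAILS:")
    for email in retrieved:
        parts.append(f"From: {email['from']}\n"
                     f"Subject: {email['subject']}\n"
                     f"Date: {email['date']}\n"
                     f"Content: {email['snippet']}\n---")
    parts.append(f"USER QUESTION: {question}")
    return "\n".join(parts)

prompt = build_prompt(
    {"Total emails": 1234},
    [{"from": "a@b.com", "subject": "Hi",
      "date": "2024-01-01", "snippet": "Hello"}],
    "How many emails?")
```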
| Setting | Default | Purpose |
|---|---|---|
| Q&A Temperature | 0.2 | Low for factual accuracy |
| Summary Temperature | 0.3 | Slightly higher for synthesis |
| Creative Temperature | 0.7 | Higher for varied output |
| Max Conversation History | 10 turns | Follow-up context |
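These temperatures are passed through to the backend per task. With Ollama's `/api/generate`, the request body might be built like this (the model name and task mapping are illustrative):

```python
import json

def generate_payload(prompt, task="qa", model="mistral:latest"):
    # Map each task type to its default temperature from the table above
    temps = {"qa": 0.2, "summary": 0.3, "creative": 0.7}
    return json.dumps({
        "model": model,
        "prompt": prompt,
        "options": {"temperature": temps[task]},
        "stream": False,
    })

payload = json.loads(generate_payload("How many emails?", task="qa"))
```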
Perfect for enterprise email migration, compliance review, legal discovery, and archiving old mailboxes.
- Natural language queries - Ask questions about your emails in plain English
- Source citations - See which emails were used to generate answers
- Debug panel - Inspect the full prompt sent to the AI
- Conversation memory - Follow-up questions maintain context
- Export conversations - Save Q&A sessions as Markdown or JSON
- Custom system prompts - Modify AI behavior in settings
- Smart filters - Filter by sender, date, size, attachments
- Thread detection - Group related emails
- Duplicate finder - Identify duplicate messages
- Statistics dashboard - Email counts, top senders, date ranges
- Network visualization - See communication patterns
- Attachment browser - Browse and export attachments
| Backend | Type | Cost | Features |
|---|---|---|---|
| Ollama | Local | Free | LLM + Embeddings |
| MLX | Local | Free | Apple Silicon optimized LLM |
| TinyChat | Local | Free | Fast chatbot with OpenAI-compatible API |
| TinyLLM | Local | Free | Lightweight LLM server |
| OpenWebUI | Self-hosted | Free | Web interface |
| OpenAI | Cloud | Paid | GPT-4o |
| Google Cloud | Cloud | Paid | Vertex AI |
| Azure | Cloud | Paid | Cognitive Services |
| AWS | Cloud | Paid | Bedrock |
| IBM Watson | Cloud | Paid | NLU |
MBox Explorer proudly supports TinyChat and TinyLLM - two excellent open-source projects by Jason Cox.
TinyChat is a lightweight, fast chatbot interface with an OpenAI-compatible API. It's perfect for:
- Quick local inference without heavy dependencies
- Privacy-first AI - all processing stays on your machine
- Easy setup - minimal configuration needed
- OpenAI API compatibility - works seamlessly with existing tools
TinyLLM is a minimalist LLM server that provides:
- Lightweight deployment - runs on modest hardware
- OpenAI-compatible endpoints - drop-in replacement
- Local-first architecture - your data never leaves your device
- Active development - regularly updated with new features
```bash
# TinyChat - Fast chatbot interface
git clone https://github.com/jasonacox/tinychat.git
cd tinychat
pip install -r requirements.txt
python server.py  # Starts on localhost:8000

# TinyLLM - Lightweight LLM server
git clone https://github.com/jasonacox/TinyLLM.git
cd TinyLLM
pip install -r requirements.txt
python server.py  # Starts on localhost:8000
```

- Start the TinyChat or TinyLLM server
- Open MBox Explorer → Settings (⌘⌥A)
- Select "TinyChat" or "TinyLLM" as your AI Backend
- Default endpoint: `http://localhost:8000`
- Start using AI features!
| Feature | TinyChat | TinyLLM |
|---|---|---|
| Text Generation | ✅ | ✅ |
| Embeddings | ❌ | ✅ |
| Streaming Responses | ✅ | ✅ |
| OpenAI API Compatibility | ✅ | ✅ |
| Local Processing | ✅ | ✅ |
Attribution: TinyChat and TinyLLM are created by Jason Cox. We're grateful for his excellent work making local AI accessible to everyone.
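Because both servers speak the OpenAI-compatible API, MBox Explorer can send them a standard chat-completion request body at `POST /v1/chat/completions`. A minimal example (the model name is a placeholder depending on what the server has loaded):

```python
import json

# An OpenAI-style chat-completion request body, as accepted by
# OpenAI-compatible servers at POST /v1/chat/completions
request_body = {
    "model": "local-model",   # placeholder; depends on server config
    "messages": [
        {"role": "system",
         "content": "You answer questions about an email archive."},
        {"role": "user", "content": "Who emailed me most often?"},
    ],
    "temperature": 0.2,
}
encoded = json.dumps(request_body)
```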
| Provider | Type | Cost | Dimensions | Speed |
|---|---|---|---|---|
| Ollama | Local | Free | 384-1024 | Fast |
| MLX | Local | Free | 384-768 | Very Fast |
| OpenAI | Cloud | Paid | 1536-3072 | Fast |
| Sentence Transformers | Local | Free | 384-768+ | Medium |
```bash
open MBox-Explorer-latest.dmg
# Drag to Applications
```

Or build from source:

```bash
cd "/Volumes/Data/xcode/MBox Explorer"
xcodebuild -scheme "MBox Explorer" -configuration Release build
cp -R build/Release/*.app ~/Applications/
```

```bash
# Install Homebrew (if not already installed)
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

# Verify Homebrew installation
brew --version
```

Ollama provides both LLM and embedding capabilities locally on your Mac.
```bash
# Install Ollama
brew install ollama

# Start the Ollama service (runs in the background)
ollama serve

# Or start Ollama as a background service that auto-starts on login
brew services start ollama
```

Pull Required Models:
```bash
# LLM Models (for chat/Q&A) - choose one or more:
ollama pull mistral:latest   # 7B params, good balance of speed/quality
ollama pull llama3.2:latest  # Meta's latest, very capable
ollama pull gemma2:2b        # Smaller, faster
ollama pull phi3:latest      # Microsoft's efficient model

# Embedding Models (for semantic search) - choose one:
ollama pull nomic-embed-text   # Recommended - 768 dimensions, good quality
ollama pull all-minilm         # Smaller - 384 dimensions, faster
ollama pull mxbai-embed-large  # Larger - 1024 dimensions, best quality
```

Verify Ollama is working:
```bash
# Check Ollama is running
curl http://localhost:11434/api/tags

# Test embedding generation
curl http://localhost:11434/api/embeddings -d '{"model": "nomic-embed-text", "prompt": "Hello"}'
```

MLX runs natively on Apple Silicon (M1/M2/M3/M4) with no external dependencies.
```bash
# No installation required!
# MLX is built into MBox Explorer

# Just select "MLX" in:
#   Settings → AI → Embedding Provider

# Models download automatically on first use (~100-500MB per model)
# Stored in: ~/Library/Application Support/MBoxExplorer/MLXModels/
```

Available MLX Embedding Models:

- `all-MiniLM-L6-v2` (384 dimensions) - Default, fast
- `nomic-embed-text-v1.5` (768 dimensions) - Better quality
- `bge-small-en-v1.5` (384 dimensions) - Alternative
- `bge-base-en-v1.5` (768 dimensions) - Best quality
OpenAI provides the highest quality embeddings but requires an API key and costs money.
```bash
# 1. Get an API key:
#    - Go to https://platform.openai.com/
#    - Sign in or create an account
#    - Navigate to the API Keys section
#    - Create a new secret key
#    - Copy the key (starts with sk-)

# 2. Enter the key in MBox Explorer:
#    Settings → AI → Cloud API Keys → OpenAI

# 3. Select OpenAI in:
#    Settings → AI → Embedding Provider
```

Pricing (as of 2024):
| Model | Dimensions | Cost per 1M tokens |
|---|---|---|
| text-embedding-3-small | 1536 | $0.02 |
| text-embedding-3-large | 3072 | $0.13 |
| text-embedding-ada-002 | 1536 | $0.10 (legacy) |
Estimate: ~250,000 emails = ~$0.50-$5.00 depending on email length and model.
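That estimate is simple arithmetic: total tokens divided by one million, times the per-million price. The average token counts below are assumptions for illustration:

```python
def embedding_cost(n_emails, avg_tokens_per_email, price_per_million_tokens):
    # total tokens / 1M tokens * price per 1M tokens
    total_tokens = n_emails * avg_tokens_per_email
    return total_tokens / 1_000_000 * price_per_million_tokens

# Short emails with text-embedding-3-small ($0.02 per 1M tokens)
low = embedding_cost(250_000, 100, 0.02)    # ~$0.50
# Longer emails with text-embedding-3-large ($0.13 per 1M tokens)
high = embedding_cost(250_000, 150, 0.13)   # ~$4.88
```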
Sentence Transformers offers the widest model selection via Python.
Step 1: Install Python (if not already installed)
# Check if Python 3 is installed
python3 --version
# If not installed, install via Homebrew:
brew install python@3.11
# Or install via official installer:
# https://www.python.org/downloads/Step 2: Install sentence-transformers
```bash
# Using pip (recommended)
pip3 install sentence-transformers

# Or using pip with the user flag (if you hit permission issues)
pip3 install --user sentence-transformers

# Or using a virtual environment (cleanest)
python3 -m venv ~/mbox-env
source ~/mbox-env/bin/activate
pip install sentence-transformers
```

Step 3: Verify Installation
```bash
# Test that sentence-transformers works
python3 -c "from sentence_transformers import SentenceTransformer; print('OK')"
```

Step 4: Configure in MBox Explorer
```bash
# 1. Set the Python path in Settings if using a non-standard location:
#    Settings → AI → Sentence Transformers → Python Path
#    Default:     /usr/bin/python3
#    Homebrew:    /opt/homebrew/bin/python3
#    Virtual env: ~/mbox-env/bin/python

# 2. Select Sentence Transformers in:
#    Settings → AI → Embedding Provider
```

Available Models (auto-downloaded on first use):
| Model | Dimensions | Size | Quality |
|---|---|---|---|
| all-MiniLM-L6-v2 | 384 | 80MB | Good |
| all-mpnet-base-v2 | 768 | 420MB | Better |
| paraphrase-MiniLM-L6-v2 | 384 | 80MB | Good for paraphrase |
| multi-qa-MiniLM-L6-cos-v1 | 384 | 80MB | Optimized for Q&A |
Ollama not connecting:
```bash
# Check if Ollama is running
ps aux | grep ollama

# Restart Ollama
brew services restart ollama

# Or manually:
killall ollama && ollama serve
```

Python/pip not found:
```bash
# Add Homebrew Python to PATH
echo 'export PATH="/opt/homebrew/bin:$PATH"' >> ~/.zshrc
source ~/.zshrc
```

sentence-transformers import error:
```bash
# Install with all dependencies
pip3 install sentence-transformers torch transformers
```

MLX models not downloading:
```bash
# Check internet connection
# Models are downloaded from huggingface.co
# Check disk space in ~/Library/Application Support/MBoxExplorer/
```

- Launch MBox Explorer
- Open an MBOX file (File → Open or ⌘O)
- Browse emails in the list view
- Ask AI - Click "Ask AI" in sidebar for natural language queries
- Statistics questions: "How many emails?", "Who are the top senders?"
- Content search: "Find emails about [topic]"
- Summaries: "Summarize the main themes"
- Follow-ups: "Tell me more about that" (uses conversation memory)
Click "Index Emails" for:
- Faster searches on large archives
- Semantic search (finds conceptually related emails)
- Better relevance ranking
Without indexing, basic text search still works.
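The un-indexed direct search described earlier scores by term frequency with a bonus for subject/sender hits. A rough sketch (the bonus value is an assumption):

```python
def direct_search_score(query, email, header_bonus=2.0):
    # Term-frequency score over the body, with a fixed bonus when a
    # term also appears in the subject or sender fields
    terms = query.lower().split()
    body = email["body"].lower()
    header = (email["subject"] + " " + email["sender"]).lower()
    score = 0.0
    for term in terms:
        score += body.count(term)
        if term in header:
            score += header_bonus
    return score

email = {"subject": "Budget review", "sender": "cfo@example.com",
         "body": "The budget numbers need review before Friday. Budget is tight."}
score = direct_search_score("budget", email)  # 2 body hits + subject bonus = 4.0
```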
Access via the gear icon (⚙️) in the Ask AI view:
- Conversation Memory: Enable/disable, set history length
- Custom System Prompt: Modify AI instructions
- Debug Mode: See full prompts sent to AI
Access via AI Settings:
- Q&A Temperature (0.0-1.0): Lower = more factual
- Summary Temperature: For email summaries
- Creative Temperature: For open-ended tasks
All AI operations are monitored for:
- ✅ Legal compliance
- ✅ Ethical use
- ✅ Safety
- ✅ Privacy protection
- Local processing: Ollama/MLX run entirely on your Mac
- No cloud required: Cloud AI is optional
- Your data stays yours: Emails never leave your device unless you choose cloud AI
- SQL (Structured Query Language) Injection Prevention -- All database queries in ConversationDatabase now use parameterized bindings instead of string interpolation
- API Key Protection -- OpenWebUI API key migrated from plaintext storage to macOS Keychain
Author: Jordan Koch (@kochj23)
Built with:
- SwiftUI
- SQLite (FTS5 + Vector storage)
- Ollama API
- Native macOS APIs
Architecture:
- MVVM (Model-View-ViewModel) pattern
- Native RAG pipeline
- Multi-backend AI support
- Ethical safeguards
- macOS WidgetKit Widget - View email stats from Notification Center
- Small widget: Email count and loaded file
- Medium widget: Stats + Top 3 senders
- Large widget: Stats + Senders + Recent queries + Quick search
- App Group Data Sharing - Secure data sync between app and widget
- Auto-Sync on Load - Widget updates when mbox files are loaded
- SharedDataManager - Unified data management for widget integration
- Critical bug fixes for RAG pipeline reliability:
- Fixed memory corruption crash (EXC_BAD_ACCESS) with SQLITE_TRANSIENT bindings
- Fixed FTS5 index not syncing with email_vectors table (added triggers)
- Fixed FTS5 queries returning NULL data (added proper JOINs)
- Fixed RAG returning "0 sources" for natural language queries
- Smart three-tier search fallback:
- 1st: Semantic search via embeddings
- 2nd: FTS5 keyword search with stop-word extraction
- 3rd: Sample of recent emails when search terms don't match
- Extended timeouts: 3 minute request / 10 minute resource for large contexts
- Improved keyword extraction: Filters common stop words for better FTS5 matches
- 12 New Features for enhanced productivity:
- Search History - Recent and saved searches with persistence
- Email Statistics Dashboard - Comprehensive analytics with Charts visualizations
- Spotlight Integration - Find emails via macOS system search
- Quick Look Preview - Space bar preview for emails (native macOS Quick Look)
- Batch Operations Toolbar - Multi-select tag, star, export, print operations
- Sentiment Dashboard - Email sentiment analysis using NaturalLanguage framework
- Smart Reply Suggestions - AI-powered reply generation with tone options
- Meeting/Event Extractor - Extract calendar events from emails with EventKit integration
- Notification Center Integration - Reminders and follow-up notifications
- Email Diff View - Compare emails side-by-side with diff highlighting
- Contact Exporter - Export contacts to vCard, CSV, or Address Book
- 4 Embedding Providers: Ollama, MLX, OpenAI, Sentence Transformers
- Provider comparison table with pros/cons
- Automatic provider detection and fallback
- MLX native Apple Silicon embeddings
- OpenAI text-embedding-3-small/large support
- Python bridge for sentence-transformers
- Unified EmbeddingManager for all providers
- Native RAG pipeline implementation
- Ask AI interface with conversation memory
- Smart question routing
- Debug panel for prompt inspection
- Export conversations
- Direct search fallback (no indexing required)
- Temperature controls
- Custom system prompts
- Added 5 cloud AI providers
- Added ethical safeguards
- AI backend status menu
- Auto-fallback system
- MBOX file parsing
- Email browsing and search
- Export capabilities
- Basic AI integration
- GitHub Issues: Report bugs
- Documentation: See project files
- 988 - Suicide Prevention Lifeline
- 741741 - Crisis Text Line (text HOME)
- 1-800-799-7233 - Domestic Violence Hotline
MIT License - See LICENSE file
Ethical Usage Required - See ETHICAL_AI_TERMS_OF_SERVICE.md
MBox Explorer - AI-Powered Email Archive Analysis
Β© 2026 Jordan Koch. All rights reserved.
| App | Description |
|---|---|
| MailSummary | AI-powered email categorization and summarization |
| ExcelExplorer | Native macOS Excel/CSV file viewer |
| RsyncGUI | Native macOS GUI for rsync file synchronization |
| TopGUI | macOS system monitor with real-time metrics |
| DotSync | Configuration file synchronization across machines |
Disclaimer: This is a personal project created on my own time. It is not affiliated with, endorsed by, or representative of my employer.