Local-first RAG system with:
- FastAPI backend
- React + Vite frontend
- LM Studio (OpenAI-compatible) for generation
- Brave Search for fresh web retrieval
- SQLite + optional ChromaDB for offline retrieval
- Deterministic tabular analytics for Excel documents
- Time-series forecasting for ingested spreadsheets with structured forecast + chart payloads for the UI
- Added predictive forecasting path: baseline linear-trend forecasts with optional `forecast` and Recharts-oriented `chart` on `POST /api/chat` and on SSE `meta`/`done` when ingested data and query intent match.
- Deterministic analytics answers are formatted as markdown via `backend/analytics/display_markdown.py`.
- Added deterministic tabular analytics for uploaded Excel files (`.xlsx`, `.xls`).
- Added analytics metadata migrations in `backend/migrations/`.
- Added typed analytics schema and dataset profiling support.
- Integrated analytics routing inside `ChatService`:
  - heuristic query routing (aggregation/list/filter intent),
  - restricted JSON planning,
  - validated execution over ingested sheet tables.
- Refactored backend into clearer layers: `domain`, `integrations`, `repositories`, `services`, and `analytics`.
- Expanded decoupled context-budget controls for web/document retrieval blending.
- Retrieves fresh web context at query time.
- Archives sources and supports offline recall.
- Lets you upload and chat with PDF/Excel documents.
- Streams chat responses over SSE.
- Runs deterministic spreadsheet analytics when document queries are tabular.
- Runs baseline time-series forecasting on ingested Excel when the query is predictive, returning structured forecast data and a chart spec for the frontend.
- Python 3.10+ (3.11 recommended)
- Node.js 18+ (for frontend)
- LM Studio with local server enabled
- Brave Search API key (for online retrieval)
Install backend dependencies:
pip install -r requirements.txt
playwright install
- Create and activate virtual environment:
# Windows (PowerShell)
python -m venv .venv
.venv\Scripts\Activate.ps1
# macOS/Linux
python -m venv .venv
source .venv/bin/activate
- Set environment variables (via shell or `.env` file in repo root).
- Start backend:
uvicorn backend.app:app --host 0.0.0.0 --port 8000
- Start frontend:
Put Node.js 18+ on your PATH once (new terminals pick it up automatically):
- Add your Node install’s `bin` directory to `PATH` in `~/.bashrc` or `~/.zshrc`, or use a version manager (nvm, fnm, mise).
- Example tarball layout under `~/.local` (Linux): prepend `$HOME/.local/node-v22.14.0-linux-x64/bin` to `PATH` in `~/.bashrc`, then run `source ~/.bashrc` or open a new terminal.
If Vite uses a port other than 5173 (because 5173 is busy), set CORS_ORIGINS to a comma-separated list that includes that origin (see Environment Variables).
cd frontend
npm install
npm run dev
Backend docs:
- Swagger: `http://localhost:8000/docs`
- ReDoc: `http://localhost:8000/redoc`
- OpenAPI: `http://localhost:8000/openapi.json`
backend/
  app.py            # FastAPI routes + startup + migrations
  config.py         # Environment settings + runtime overrides
  archive.py        # SQLite archive initialization
  documents.py      # PDF/Excel extraction + chunking + ingestion
  scraper.py        # Web scraping/clean text extraction
  freshness.py      # Freshness source checks
  vector_store.py   # Chroma upsert/query helpers
  domain/           # Shared domain models/utilities
  integrations/     # LM Studio + Brave clients
  repositories/     # Archive/document/analytics data access
  services/         # Chat + health orchestration
  analytics/        # Routing, planning, forecasting, chart specs, validation, SQL compile, execution
  migrations/       # SQL migrations for analytics metadata
frontend/
  src/components/   # Chat/archive/documents/settings UI
  src/lib/          # API client + hooks + shared types/utilities
  src/store/        # Chat state
Core:
- `BRAVE_API_KEY`
- `LM_STUDIO_BASE_URL` (default: `http://localhost:1111/v1`)
- `MODEL_NAME` (default: `rnj-1`)
- `DB_PATH` (default: `knowledge.db`)
- `CORS_ORIGINS` (comma-separated browser origins; default: `http://localhost:5173,http://127.0.0.1:5173`; add e.g. `http://localhost:5174` if Vite picks the next port)
- `REQUEST_TIMEOUT_S` (default: `10`)
Retrieval:
- `MAX_SEARCH_RESULTS` (default: `3`)
- `MAX_CHARS_PER_SOURCE` (default: `2000`)
- `OFFLINE_RETRIEVAL_MODE` (`keyword` or `semantic`, default: `keyword`)
- `SEMANTIC_TOP_K` (default: `3`)
- `CHROMA_DIR` (default: `chroma_db`)
- `EMBED_MODEL_NAME` (default: `sentence-transformers/all-MiniLM-L6-v2`)
Decoupled RAG budgets:
- `WEB_TOP_K` (default: `3`)
- `DOC_SEMANTIC_TOP_K` (default: `12`)
- `DOC_KEYWORD_TOP_K` (default: `20`)
- `WEB_MAX_CHARS` (default: `2000`)
- `DOC_MAX_CHARS` (default: `0`, unlimited)
- `TOTAL_CONTEXT_BUDGET` (default: `14000`)
- `WEB_BUDGET_FRACTION` (default: `0.4`)
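One plausible reading of how `TOTAL_CONTEXT_BUDGET` and `WEB_BUDGET_FRACTION` interact is a plain character-budget split between web and document context. The sketch below illustrates that reading only; the helper name `split_context_budget` is hypothetical and the backend's actual blending logic may differ.

```python
def split_context_budget(total_budget: int = 14000, web_fraction: float = 0.4) -> tuple[int, int]:
    """Split a total character budget into (web, document) shares.

    Hypothetical helper: assumes the web share is total * fraction and
    documents receive whatever remains.
    """
    if total_budget < 0:
        raise ValueError("total_budget must be non-negative")
    if not 0.0 <= web_fraction <= 1.0:
        raise ValueError("web_fraction must be between 0 and 1")
    web_budget = int(total_budget * web_fraction)
    doc_budget = total_budget - web_budget
    return web_budget, doc_budget


web_chars, doc_chars = split_context_budget()
print(web_chars, doc_chars)  # 5600 8400
```

Under this assumption the documented defaults give web context 5600 characters and document context 8400.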
Document processing:
- `UPLOAD_DIR` (default: `uploads`)
- `MAX_UPLOAD_MB` (default: `25`)
Tabular analytics:
- `ENABLE_TABULAR_ANALYTICS` (default: `true`)
- `ANALYTICS_GROUPBY_TOP_N_DEFAULT` (default: `50`)
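As a rough illustration of how variables like these are typically consumed, the sketch below reads a handful of them with the documented defaults. It is not the code in `backend/config.py`: `SketchSettings`, `load_settings`, `parse_origins`, and `parse_bool` are hypothetical names, and the set of accepted truthy strings is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


def parse_origins(raw: str) -> list[str]:
    """Split a comma-separated origin list, trimming whitespace and dropping blanks."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def parse_bool(raw: str) -> bool:
    """Interpret common truthy spellings (the accepted set is an assumption)."""
    return raw.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class SketchSettings:
    lm_studio_base_url: str
    model_name: str
    cors_origins: list[str]
    request_timeout_s: float
    enable_tabular_analytics: bool


def load_settings(env: Optional[Mapping[str, str]] = None) -> SketchSettings:
    """Read settings from a mapping (os.environ by default) using the documented defaults."""
    source = os.environ if env is None else env
    return SketchSettings(
        lm_studio_base_url=source.get("LM_STUDIO_BASE_URL", "http://localhost:1111/v1"),
        model_name=source.get("MODEL_NAME", "rnj-1"),
        cors_origins=parse_origins(
            source.get("CORS_ORIGINS", "http://localhost:5173,http://127.0.0.1:5173")
        ),
        request_timeout_s=float(source.get("REQUEST_TIMEOUT_S", "10")),
        enable_tabular_analytics=parse_bool(source.get("ENABLE_TABULAR_ANALYTICS", "true")),
    )


# Example: Vite fell back to port 5174, so that origin is listed explicitly.
settings = load_settings({"CORS_ORIGINS": "http://localhost:5173, http://localhost:5174"})
print(settings.cors_origins)  # ['http://localhost:5173', 'http://localhost:5174']
```

Passing an explicit mapping instead of reading `os.environ` directly keeps the loader easy to exercise in tests.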
There is no ranking, routing, or automatic selection among multiple candidate models at runtime. Configuration is explicit:
| Role | Mechanism | Default |
|---|---|---|
| Chat / reasoning LLM | `MODEL_NAME` in `backend/config.py` → `Settings.model_name` → every OpenAI-compatible `/chat/completions` call in `backend/integrations/llm_client.py` | `rnj-1` |
| LM Studio endpoint | `LM_STUDIO_BASE_URL` in `backend/config.py` | `http://localhost:1111/v1` |
| Embedding model (RAG) | `EMBED_MODEL_NAME` → Chroma `SentenceTransformerEmbeddingFunction` | `sentence-transformers/all-MiniLM-L6-v2` |
| Baseline forecasts | Fixed algorithm: `sklearn.linear_model.LinearRegression` on prepared time series (not env-selectable) | n/a |
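To make the baseline forecast row concrete, here is a minimal linear-trend sketch. It swaps `sklearn.linear_model.LinearRegression` for the equivalent closed-form ordinary least squares fit in pure Python so the example has no dependencies. The function name `linear_trend_forecast`, the use of an integer time index, and the `horizon` parameter are assumptions for illustration; how `backend/analytics/forecaster.py` prepares the series is not shown here.

```python
def linear_trend_forecast(values: list[float], horizon: int) -> list[float]:
    """Fit y = intercept + slope * t on t = 0..n-1 and extrapolate `horizon` steps.

    Hypothetical helper: single-feature ordinary least squares, which yields
    the same line a linear regression on the time index would.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")
    if horizon < 1:
        raise ValueError("horizon must be at least 1")

    mean_t = (n - 1) / 2
    mean_y = sum(values) / n
    # Sums of squares and cross-products around the means.
    s_tt = sum((t - mean_t) ** 2 for t in range(n))
    s_ty = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(values))

    slope = s_ty / s_tt
    intercept = mean_y - slope * mean_t
    return [intercept + slope * (n + step) for step in range(horizon)]


print(linear_trend_forecast([10, 12, 14, 16], horizon=2))  # [18.0, 20.0]
```

A straight-line fit like this is only a baseline: it ignores seasonality and gives no uncertainty estimate.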
Details:
- The chat model id must match what LM Studio exposes for loaded models. The backend does not pick a model from `/models`; that endpoint is only used for health checks.
- `LLMClient` is built per request in `chat_service()` from `get_settings()`, so `POST /api/config` updates to `model_name` (see `ConfigUpdate`) apply on the next chat call.
- `EMBED_MODEL_NAME` is used for all Chroma work in `gather_contexts` / document retrieval and document ingestion. It is not in `ConfigUpdate`, so embeddings are effectively environment-driven (or test overrides), unlike `model_name` in the settings API.
- `OFFLINE_RETRIEVAL_MODE` (`keyword` vs `semantic`) controls whether embedding search runs for web archive and document fallbacks, not which embedding model; the model name always comes from `EMBED_MODEL_NAME`.
The same configured LLM is used for JSON-oriented helpers, e.g. QueryDecomposer and extract_json inside ChatService.get_answer.
Non-streaming and streaming share the same branches; streaming emits SSE (meta, token, done, error) instead of a single JSON body.
flowchart TD
subgraph client [Client]
UI[React UI]
end
subgraph api [FastAPI]
ChatPOST["POST /api/chat or /api/chat/stream"]
Deps["chat_service: LLMClient plus Brave plus repos from Settings"]
end
subgraph chat [ChatService]
DocBranch{"include_documents and document_ids?"}
Decomp{"tabular analytics and QueryDecomposer?"}
LLMDecompose["LLM: QueryDecomposer.decompose"]
Intent{"intent?"}
ForecastExec["Execute forecast plan plus optional LLM narration"]
AnalyticsExec["Execute analytics plan SQL"]
LegacyA["Legacy: keyword predictive plus try_analytics"]
Gather["gather_contexts: Brave archive Chroma doc chunks"]
Cache{"OFFLINE_ARCHIVE and cached answer?"}
Extract["LLM: extract_json structured answer"]
UsableCtx{"usable context?"}
Complete["LLM: complete or stream"]
end
subgraph retrieval [Retrieval layer]
Brave[Brave Search plus scrape]
SQLite[SQLite archive]
Chroma[Chroma plus SentenceTransformer embeddings]
DocRepo[Document chunk search keyword or semantic]
end
UI --> ChatPOST
ChatPOST --> Deps
Deps --> DocBranch
DocBranch -->|yes| Decomp
Decomp -->|yes| LLMDecompose
LLMDecompose --> Intent
Intent -->|forecast| ForecastExec
Intent -->|analytics| AnalyticsExec
Intent -->|cannot_answer| EarlyReturn[Return reason OFFLINE_ARCHIVE]
Intent -->|other or failed| Gather
Decomp -->|no| LegacyA
LegacyA -->|ChatResult| Return1[Return]
LegacyA -->|prefix or none| Gather
DocBranch -->|no| Gather
Gather --> Brave
Gather --> SQLite
Gather --> Chroma
Gather --> DocRepo
Gather --> Cache
Cache -->|hit| ReturnCached[Return cached text]
Cache -->|miss| Extract
Extract -->|valid JSON answer| ReturnExtract[Return with citation]
Extract -->|no| UsableCtx
UsableCtx -->|no for offline| ReturnErr[Return guidance message]
UsableCtx -->|yes| Complete
Complete --> ReturnLLM[Return LLM body]
ForecastExec --> Return1
AnalyticsExec --> Return1
Sequence (matches the diagram):
- Request hits `backend/api/routers/chat.py`; `chat_service()` builds `ChatService` with `LLMClient(base_url, model_name, timeout)` from `get_settings()`.
- Document-scoped path (`include_documents` and `document_ids`): when tabular analytics and metadata exist, `QueryDecomposer` calls the LLM for intent (`forecast` / `analytics` / `cannot_answer`). Success yields deterministic analytics or forecast output (forecast may add LLM narration via `AnalyticsChatRunner`).
- Legacy analytics if the decomposer path is unavailable: keyword predictive routing and `try_analytics` may return before RAG.
- `gather_contexts` (`backend/services/chat/context.py`): online (Brave + scrape + optional Chroma upsert when `offline_retrieval_mode == semantic`) and offline web (keyword SQLite vs semantic Chroma), plus document hybrid retrieval (intent-based exact search, then semantic or keyword chunks). `prefer_mode` and `include_web` drive ONLINE vs OFFLINE vs LOCAL_WEIGHTS.
- Optional metadata outline when RAG is empty but a spreadsheet summary exists (`_augment_contexts_with_document_summary`).
- OFFLINE_ARCHIVE: try a cached answer from the archive.
- LLM `extract_json`: structured answer from contexts; on success may persist to the archive in ONLINE mode.
- If needed, `complete` or `stream` with the full RAG prompt; ONLINE may save the final answer to the archive.
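The first branches of the sequence above can be summarized as a small pure function. This is a simplified sketch of the decision order only, not the logic in `ChatService`: the function name `choose_path`, its parameters, and the returned labels are all hypothetical, and the legacy path is collapsed into a single label even though it may either return early or continue into RAG.

```python
from typing import Optional, Sequence


def choose_path(
    include_documents: bool,
    document_ids: Sequence[str],
    decomposer_available: bool,
    intent: Optional[str],
) -> str:
    """Return a label for the branch a request would take (illustrative only)."""
    document_scoped = include_documents and bool(document_ids)
    if not document_scoped:
        return "rag"  # no scoped documents: go straight to gather_contexts
    if not decomposer_available:
        return "legacy_analytics_then_rag"  # keyword routing, may fall through to RAG
    if intent == "forecast":
        return "forecast"
    if intent == "analytics":
        return "analytics"
    if intent == "cannot_answer":
        return "early_return"
    return "rag"  # unknown intent or failed decomposition falls back to RAG


print(choose_path(True, ["doc-1"], True, "forecast"))  # forecast
print(choose_path(False, [], True, None))              # rag
```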
- Settings and env: `backend/config.py`
- LLM calls: `backend/integrations/llm_client.py`
- Chat orchestration: `backend/services/chat_service.py`
- Retrieval and modes: `backend/services/chat/context.py`
- Embeddings / Chroma: `backend/vector_store.py`
- Runtime model URL and name via API: `backend/api/routers/settings.py`, `backend/api/schemas.py` (`ConfigUpdate`)
- Forecast math: `backend/analytics/forecaster.py`
Core/chat:
- `GET /` - service info
- `POST /api/chat` - non-streaming chat response; may include optional `forecast` and `chart` when the predictive short-circuit applies
- `POST /api/chat/stream` - SSE chat stream; event shapes are described under Chat response and SSE payloads below
REST (POST /api/chat)
The JSON body matches the OpenAPI ChatResponse model. Fields forecast and chart are present only when the request hits the predictive forecasting path; they are omitted for normal RAG and for deterministic tabular analytics that do not produce a forecast.
SSE (POST /api/chat/stream)
Event types: meta, token, done, error.
- `meta`: always includes `mode`, `sources`, and `conversation_id`. Optionally includes:
  - `forecast` and `chart` on the predictive path (same shapes as the REST response).
  - `analytics_unavailable`: `{ "reason", "hint" }` when documents were in scope but tabular analytics could not run (normal RAG continues with this hint on the stream).
- `token`: `{ "text": "..." }` (answer fragments).
- `done`: always includes `final_text`. On the predictive path, also includes `forecast` and `chart`.
- `error`: `{ "code", "message" }`.
sources entries may set source_kind to web, document, analytics, or archive (see the Source model in OpenAPI).
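A client consuming these events might look roughly like the sketch below. It assumes standard `text/event-stream` framing (`event:` and `data:` lines, with a blank line ending each event) and JSON payloads; the framing is an assumption, since only the event types and fields are described above. `parse_sse` and `collect_answer` are hypothetical helper names.

```python
import json
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[tuple[str, dict]]:
    """Yield (event_name, payload) pairs from already-decoded SSE lines."""
    event_name = "message"  # SSE default when no event: line is present
    data_parts: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":  # blank line terminates the current event
            if data_parts:
                yield event_name, json.loads("\n".join(data_parts))
            event_name, data_parts = "message", []
        elif line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_parts.append(line[len("data:"):].lstrip(" "))
    if data_parts:  # tolerate a stream that ends without a trailing blank line
        yield event_name, json.loads("\n".join(data_parts))


def collect_answer(lines: Iterable[str]) -> str:
    """Join token fragments, preferring final_text from the done event."""
    fragments: list[str] = []
    for name, payload in parse_sse(lines):
        if name == "token":
            fragments.append(payload["text"])
        elif name == "done":
            return payload.get("final_text", "".join(fragments))
        elif name == "error":
            raise RuntimeError(f"{payload['code']}: {payload['message']}")
    return "".join(fragments)


sample = [
    "event: meta",
    'data: {"mode": "ONLINE", "sources": [], "conversation_id": "c1"}',
    "",
    "event: token",
    'data: {"text": "Hel"}',
    "",
    "event: token",
    'data: {"text": "lo"}',
    "",
    "event: done",
    'data: {"final_text": "Hello"}',
    "",
]
print(collect_answer(sample))  # Hello
```

On the predictive path, the same loop could also read `forecast` and `chart` from the `meta` or `done` payloads.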
Archive:
- `GET /api/archive/search?q=...`
- `GET /api/archive/page/{url_hash}`
Documents:
- `POST /api/documents/upload`
- `GET /api/documents`
- `GET /api/documents/{document_id}`
- `DELETE /api/documents/{document_id}`
Settings/health:
- `GET /api/settings`
- `POST /api/config`
- `GET /api/health`
Freshness:
- `GET /api/freshness`
- `GET /api/freshness/{source_id}`
- `GET /api/freshness/sources/list`
- `POST /api/freshness/reload`
Legacy compatibility:
- `GET /freshness?query=...`
{
"query": "How many users signed up in 2020?",
"conversation_id": "optional-id",
"prefer_mode": "ONLINE",
"include_web": true,
"include_documents": true,
"document_ids": ["optional-doc-id"]
}
Notes:
- If `include_documents=true` with scoped `document_ids`, predictive questions can short-circuit to the forecasting path (returning `forecast` and `chart`) before generic RAG, similar to the tabular analytics short-circuit.
- If `include_documents=true`, analytics routing can short-circuit normal RAG flow for spreadsheet-style (aggregation/list/filter) questions.
- If no context is available, mode falls back to `LOCAL_WEIGHTS`.
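The request body above could be assembled and sent from Python along these lines. This is a sketch using only the standard library: `build_chat_request` and `post_chat` are hypothetical helpers, the base URL assumes the backend is running on port 8000 as in the setup steps, and omitting `conversation_id` and `document_ids` when unset assumes the API treats them as optional.

```python
import json
import urllib.request
from typing import Optional, Sequence


def build_chat_request(
    query: str,
    *,
    conversation_id: Optional[str] = None,
    prefer_mode: str = "ONLINE",
    include_web: bool = True,
    include_documents: bool = False,
    document_ids: Optional[Sequence[str]] = None,
) -> dict:
    """Build the JSON body for POST /api/chat, leaving out unset optional fields."""
    if not query.strip():
        raise ValueError("query must not be empty")
    payload: dict = {
        "query": query,
        "prefer_mode": prefer_mode,
        "include_web": include_web,
        "include_documents": include_documents,
    }
    if conversation_id is not None:
        payload["conversation_id"] = conversation_id
    if document_ids:
        payload["document_ids"] = list(document_ids)
    return payload


def post_chat(payload: dict, base_url: str = "http://localhost:8000", timeout_s: float = 30.0) -> dict:
    """Send the payload to the non-streaming endpoint and decode the JSON reply."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.load(response)


body = build_chat_request(
    "How many users signed up in 2020?",
    include_documents=True,
    document_ids=["optional-doc-id"],
)
print(json.dumps(body, indent=2))
```

Calling `post_chat(body)` requires a running backend. Scoping the request with `document_ids` is what makes the analytics and forecasting short-circuits eligible.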
- `knowledge.db` (SQLite archive + document + analytics metadata)
- `knowledge.db-wal` / `knowledge.db-shm` (SQLite WAL sidecar files)
- `uploads/` (uploaded files)
- `chroma_db/` (if semantic mode is used)
Tests need the same Python dependencies as the app (Pydantic, FastAPI, pandas, etc.). Do not rely on the system-wide pytest from apt unless those packages are installed there; use a project virtualenv:
python3 -m venv .venv
.venv/bin/pip install -r requirements.txt
Run all tests:
.venv/bin/pytest -q
Deterministic analytics contract tests:
.venv/bin/pytest -q tests/test_analytics_deterministic.py
LLM end-to-end chat QA (rubric from `tests/test.md`, cases in `tests/qa_rubric.json`):
- Set `RUN_CHAT_LLM_E2E=1`.
- Provide a workbook: copy your file to `tests/fixtures/Advanced_Sales_Dataset.xlsx`, or set `FRESHNESS_QA_WORKBOOK` to an absolute path.
- LM Studio must accept HTTP from the same machine (or WSL host) that runs pytest. Before tests, the fixture probes `GET {LM_STUDIO_BASE_URL}/models`.
- Set `LM_STUDIO_BASE_URL` to the exact base URL from LM Studio’s Local Server tab (typically ends with `/v1`). The app default is `http://localhost:1111/v1`; LM Studio often uses port `1234`, e.g. `http://127.0.0.1:1234/v1`.
- Set `MODEL_NAME` to a model id that the server exposes (must match what `/v1/models` lists for loaded models).
- WSL2: if LM Studio runs on Windows, `localhost` inside Linux may not reach it. Use the Windows host IP (see `nameserver` in `/etc/resolv.conf`), e.g. `http://172.22.32.1:1234/v1`.
- Optional: `SKIP_LLM_HEALTHCHECK=1` to skip the probe (not recommended unless you know connectivity is fine).
export LM_STUDIO_BASE_URL=http://127.0.0.1:1234/v1
export MODEL_NAME=your-model-id-from-lm-studio
RUN_CHAT_LLM_E2E=1 .venv/bin/pytest -q tests/test_chat_rubric_llm_e2e.py
Without `RUN_CHAT_LLM_E2E`, those tests are skipped so default pytest runs stay offline-friendly.
If you set RUN_CHAT_LLM_E2E=1 but all rubric tests skip, the workbook is missing: add tests/fixtures/Advanced_Sales_Dataset.xlsx or set FRESHNESS_QA_WORKBOOK. Run pytest -rs to print skip reasons.
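Before running the end-to-end tests, it can help to check by hand which model ids the server reports. The sketch below assumes the OpenAI-style response shape for `GET {LM_STUDIO_BASE_URL}/models` (a top-level `data` list of objects with an `id` field). The helper names are hypothetical, and this is not the probe used by the test fixture.

```python
import json
import urllib.request


def models_url(base_url: str) -> str:
    """Build the models endpoint from a base URL such as http://127.0.0.1:1234/v1."""
    return base_url.rstrip("/") + "/models"


def extract_model_ids(body: dict) -> list[str]:
    """Pull model ids out of an OpenAI-style list response (assumed shape)."""
    return [item["id"] for item in body.get("data", []) if "id" in item]


def list_model_ids(base_url: str, timeout_s: float = 5.0) -> list[str]:
    """Query the server; raises urllib.error.URLError if it is unreachable."""
    with urllib.request.urlopen(models_url(base_url), timeout=timeout_s) as response:
        return extract_model_ids(json.load(response))


# Offline illustration with a hand-written response body.
example_body = {"object": "list", "data": [{"id": "your-model-id-from-lm-studio"}]}
print(models_url("http://127.0.0.1:1234/v1/"))  # http://127.0.0.1:1234/v1/models
print(extract_model_ids(example_body))          # ['your-model-id-from-lm-studio']
```

With LM Studio running, `list_model_ids(os.environ["LM_STUDIO_BASE_URL"])` (after `import os`) should return the ids that `MODEL_NAME` can be set to.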
- Treat web/document content as untrusted input.
- Prompt-injection defenses are applied in prompts, but source text can still be adversarial.
- Do not upload sensitive files unless your local machine/storage is secured.
- `ModuleNotFoundError: No module named 'pydantic'` (or similar) when running pytest: you are using a Python that does not have the packages from `requirements.txt` installed, typically the system `pytest` from apt. Create `.venv`, run `.venv/bin/pip install -r requirements.txt`, and invoke `.venv/bin/pytest`.
- LM Studio unreachable during chat rubric tests: confirm Local Server is running; align `LM_STUDIO_BASE_URL` with the shown URL (often `:1234/v1`, not `:1111`). On WSL2 with LM Studio on Windows, use the Windows host IP, not `localhost`. Quick check: `curl -sS "$LM_STUDIO_BASE_URL/models" | head`. Set `MODEL_NAME` to an id from that response. Use `SKIP_LLM_HEALTHCHECK=1` only to bypass the test probe.
- Brave search failures: verify `BRAVE_API_KEY`.
- Empty web extraction: target page may block scraping or require heavy JS.
- Upload errors: verify file extension and `MAX_UPLOAD_MB`.
- Analytics not triggering: ensure `ENABLE_TABULAR_ANALYTICS=true` and the query has tabular intent (count/list/filter/grouping).
- Forecast or `chart` missing: requires ingested forecast artifacts on the selected documents, a predictive-style question, `include_documents=true`, and `document_ids` set.
For analytics behavior and contracts, see `tests/test_analytics_deterministic.py`.