2002yy · Jun 5, 2026
diff --git a/README.md b/README.md
@@ -3,7 +3,7 @@
 <p>
   <a href="https://github.com/2002yy/study-agent/actions/workflows/ci.yml"><img src="https://github.com/2002yy/study-agent/actions/workflows/ci.yml/badge.svg" alt="CI"></a>
   <img src="https://img.shields.io/badge/python-3.12-blue" alt="Python 3.12">
-  <img src="https://img.shields.io/badge/tests-277%20passed-green" alt="277 tests passed">
+  <img src="https://img.shields.io/badge/tests-290%20passed-green" alt="290 tests passed">
 </p>
 
 A local AI learning assistant with long-term memory, role-based group chat,
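@@ -17,7 +17,7 @@ Study Agent is a local-first AI learning assistant; the point is not simply calling
 - **Long-term memory**: Markdown memory + safe writer
 - **Context tiers**: fast / light / deep / archive
 - **Web search**: RSS / News fetch → article extraction → LLM digest → source tracing
-- **RAG MVP**: local Markdown / TXT / DOCX / PDF indexing, keyword / local vector prototype / hybrid / backend-vector retrieval, configurable embedding provider, optional Chroma persistence, citation context, source blocks, Streamlit retrieval/debug panel, chat injection, and FastAPI RAG endpoints
+- **RAG MVP**: local Markdown / TXT / DOCX / PDF indexing, keyword / local vector prototype / hybrid / backend-vector retrieval, configurable embedding provider, optional Chroma persistence, controlled local-knowledge retrieval tool, citation context, source blocks, Streamlit retrieval/debug panel, chat injection, and FastAPI RAG / chat / memory foundation endpoints
 - **Engineering security**: SSRF protection, detect-secrets, configuration templates
 - **Engineering quality**: pytest test suite, Ruff, GitHub Actions CI, packaging checks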
 
@@ -27,11 +27,11 @@ Study Agent 是一个本地优先的 AI 学习助手，重点不是简单调用
 - **Model routing** with fast / light / deep / archive context tiers
 - **Long-term memory** based on Markdown files and safe-writer persistence
 - **Web search pipeline**: feed registry → URL safety checks → article extraction → LLM digest → auditable source trace
-- **RAG MVP**: local Markdown / TXT / DOCX / PDF indexing, lexical / local vector prototype / hybrid / backend-vector retrieval, configurable embedding providers, optional Chroma persistence, citation-first context formatting, source blocks, a Streamlit retrieval/debug panel, optional chat injection, and FastAPI RAG endpoints
+- **RAG MVP**: local Markdown / TXT / DOCX / PDF indexing, lexical / local vector prototype / hybrid / backend-vector retrieval, configurable embedding providers, optional Chroma persistence, a controlled local-knowledge retrieval tool, citation-first context formatting, source blocks, a Streamlit retrieval/debug panel, optional chat injection, and FastAPI RAG / chat / memory foundation endpoints
 - **SSRF protection** for article fetching, **detect-secrets** in CI
 - **Batched session logging** and multi-layer caching for performance
 - **Performance budget**: mode-based `max_tokens` bounds on the main chat, WeChat, and news LLM paths
-- **277 pytest tests**, Ruff clean, mypy clean, GitHub Actions CI workflow
+- **290 pytest tests**, Ruff clean, mypy clean, GitHub Actions CI workflow
 
 For a detailed breakdown of the stack and engineering highlights, see [Technical Stack & Engineering Highlights](docs/TECH_STACK.md).
 
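@@ -109,7 +109,7 @@ Study Agent's positioning is clear: **one that runs locally on your machine, with long-term memory
 | **Role-based group chat** | Four characters (March 7th (三月七), Keqing (刻晴), Nahida (纳西妲), Firefly (流萤)) discuss in a group chat, each with an independent persona |
 | **Web search** | Multi-source aggregation across Google News + Bing News + RSSHub, with three-layer page body extraction |
 | **Source tracing** | Search results are written into the group chat record so the supporting evidence can be traced back |
-| **RAG MVP** | Local Markdown / TXT / DOCX / PDF document indexing; the frontend panel returns cited snippets with file path, line numbers, score, matched terms and score breakdown, which can be injected into single-chat and WeChat group interactive replies; FastAPI provides `/health`, `/rag`, `/rag/index`, `/rag/query` |
+| **RAG MVP** | Local Markdown / TXT / DOCX / PDF document indexing; the frontend panel returns cited snippets with file path, line numbers, score, matched terms and score breakdown, which can be injected into single-chat and WeChat group interactive replies; FastAPI provides `/health`, `/rag`, `/rag/index`, `/rag/query`, `/rag/status`, `/rag/upload`, `/rag/local-knowledge` |
 | **Post-session summary** | Automatically summarizes progress after a study session and writes it to memory once the user confirms |
 | **Long-term memory** | Learner profile, progress tracking, project context, current focus, multi-level memory archives |
 | **Multi-provider** | Supports OpenAI / DeepSeek / OpenRouter / SiliconFlow / local models |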
@@ -233,7 +233,7 @@ RAG_EMBEDDING_PROVIDER=local_hash
 │   ├── llm_router.py       # model routing dispatch
 │   ├── context_builder.py  # context building
 │   ├── mode_manager.py     # mode management (version / performance / atmosphere)
-│   ├── api.py              # FastAPI health / RAG endpoints
+│   ├── api.py              # FastAPI health / chat / memory / sessions / RAG endpoints
 │   ├── role_manager.py     # role loading and management
 │   ├── performance_budget.py # performance budget (tiered max_tokens)
 │   ├── memory.py           # memory system
@@ -250,6 +250,7 @@ RAG_EMBEDDING_PROVIDER=local_hash
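 │   ├── router.py           # routing configuration
 │   ├── news/               # news aggregation pipeline
 │   ├── rag/                # local RAG MVP: loading, chunking, indexing, keyword / vector prototype / embedding / optional backend retrieval
+│   ├── tools/              # controlled tool boundary: local knowledge retrieval, etc.
 │   └── ui/                 # Streamlit UI components
 ├── tests/                  # pytest test suite
 ├── docs/                   # design docs and engineering notes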
@@ -270,7 +271,7 @@ RAG_EMBEDDING_PROVIDER=local_hash
 ## Testing
 
 ```bash
-pytest tests/ -v            # current local baseline: 277 passed
+pytest tests/ -v            # current local baseline: 290 passed
 pytest tests/ --cov=src     # coverage
 ruff check src/ tests/      # linting
 mypy --explicit-package-bases src/  # type check
@@ -312,8 +313,8 @@ CI runs via GitHub Actions on push / pull request, integrating `pytest`, `
 
 Job-search-oriented technical evolution roadmap:
 
-- [ ] FastAPI service layer (partial): `/health`, `/rag`, `/rag/index`, `/rag/query` implemented; `/chat` and `/memory` remain planned
-- [x] RAG MVP: Markdown / TXT / DOCX / PDF loading, chunking, local keyword retrieval, local vector prototype, hybrid retrieval, backend-vector retrieval, configurable embedding provider, optional Chroma adapter, citation context, source blocks, Streamlit retrieval panel, optional single-chat and WeChat interactive injection
+- [x] FastAPI service layer foundation: `/health`, `/chat`, `/memory/preview`, `/memory/commit`, `/sessions`, `/rag`, `/rag/index`, `/rag/query`, `/rag/status`, `/rag/upload` and `/rag/local-knowledge` implemented; streaming, auth and frontend-specific contracts remain planned
+- [x] RAG MVP: Markdown / TXT / DOCX / PDF loading, chunking, local keyword retrieval, local vector prototype, hybrid retrieval, backend-vector retrieval, configurable embedding provider, optional Chroma adapter, controlled local-knowledge retrieval, citation context, source blocks, Streamlit retrieval panel, optional single-chat and WeChat interactive injection
 - [ ] RAG document QA (partial): PDF parsing has file-size, page-count, extracted-text and encrypted-file guards; production embedding requires explicit API/env configuration and Chroma remains optional
 - [ ] Vector store: Chroma optional adapter implemented; FAISS local prototype and pgvector engineering version remain planned
 - [ ] Web UI: TypeScript + Vue3 / React, streaming chat, source panel
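Reviewer note: this diff bumps the hard-coded test count from 277 to 290 in several places (the README badge and its alt text, the feature list, the `pytest` comment, and `docs/INTERVIEW_NOTES.md`). A small consistency check along the following lines could catch a missed spot. This is only a sketch: `find_test_counts`, `counts_are_consistent` and the regular expression are hypothetical helpers, not part of the repository.

```python
import re

# Hypothetical helper, not part of the repository. Matches the count formats
# that appear in this diff: "290%20passed", "290 tests passed",
# "290 pytest tests" and "290 passed".
COUNT_PATTERN = re.compile(r"(\d+)(?:%20| )(?:passed|pytest tests|tests passed)")


def find_test_counts(text: str) -> set[int]:
    """Return every distinct test count mentioned in the given text."""
    return {int(match.group(1)) for match in COUNT_PATTERN.finditer(text)}


def counts_are_consistent(texts: list[str]) -> bool:
    """True when all documents agree on a single test count (or mention none)."""
    counts: set[int] = set()
    for text in texts:
        counts |= find_test_counts(text)
    return len(counts) <= 1
```

In practice the inputs would be the contents of `README.md` and `docs/INTERVIEW_NOTES.md`; reading them from disk is left out to keep the sketch self-contained.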

diff --git a/docs/INTERVIEW_NOTES.md b/docs/INTERVIEW_NOTES.md
@@ -10,7 +10,7 @@ Study Agent 是一个本地优先的 AI 学习助手，重点在多 Provider 模
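 2. **Long-term memory write safety**: safe writer + preview/confirm mechanism to prevent irreversible memory pollution
 3. **Web search source tracing**: feed registry / RSS multi-source aggregation → URL safety matrix → three-layer article body extraction → LLM digest → pipeline trace, with sources traceable end to end
 4. **Streamlit re-render performance optimization**: multi-layer caching, mode-based batched disk writes, token budget control on the main path
-5. **CI / Ruff / detect-secrets engineering checks**: 277 pytest tests, Ruff clean, mypy local clean, GitHub Actions workflow, detect-secrets hard-blocks on findings that are not allowlisted
+5. **CI / Ruff / detect-secrets engineering checks**: 290 pytest tests, Ruff clean, mypy local clean, GitHub Actions workflow, detect-secrets hard-blocks on findings that are not allowlisted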
 
 ## Talking Points
 

diff --git a/docs/RAG.md b/docs/RAG.md
@@ -19,10 +19,11 @@ Implemented:
 - Streamlit retrieval panel for uploads, local paths, indexing, querying and citation preview
 - Optional single-chat and WeChat interactive reply injection through the `用于聊天回答` toggle
 - UI source blocks for retrieved file paths, line ranges, scores and matched terms
-- FastAPI endpoints: `GET /health`, `POST /rag`, `POST /rag/index`, `POST /rag/query`
+- FastAPI endpoints: `GET /health`, `POST /rag`, `POST /rag/index`, `POST /rag/query`, `GET /rag/status`, `POST /rag/upload`, `POST /rag/local-knowledge`
 - Streamlit knowledge/debug panel with index summary, document rows, chunk preview and score breakdowns
 - Optional vector backend interface with local fallback and Chroma adapter
 - Configurable embedding providers: deterministic `local_hash` by default, OpenAI-compatible embeddings when explicitly configured
+- Controlled local-knowledge retrieval tool with intent gating, deterministic query rewrite and explicit not-found behavior
 
 Not implemented yet:
 
@@ -44,7 +45,8 @@ Not implemented yet:
 | `src/rag/eval.py` | LLM-free retrieval quality evaluation over gold query fixtures |
 | `src/rag/service.py` | Application-facing helpers for indexing, querying and context formatting |
 | `src/rag/schema.py` | Dataclasses for documents, chunks, indexes and search results |
-| `src/api.py` | FastAPI health and RAG endpoints |
+| `src/tools/local_knowledge.py` | Controlled retrieval boundary for agentic local knowledge use |
+| `src/api.py` | FastAPI health, chat, memory, session, RAG and local-knowledge endpoints |
 
 ## Data Flow
 
@@ -56,7 +58,9 @@ local files
   -> save_rag_index
   -> query_documents
   -> build_rag_context
+  -> optional controlled local-knowledge tool
   -> optional single-chat / WeChat interactive prompt injection or FastAPI response
+  -> optional frontend-facing chat / memory / session API flow
 ```
 
 ## Retrieval Behavior
@@ -111,8 +115,10 @@ Regression coverage lives in `tests/test_rag.py` and verifies:
 - Local hash-vector and hybrid retrieval behavior
 - Citation formatting and context budget behavior
 - Streamlit RAG panel helpers for uploaded filenames and local path parsing
-- FastAPI `/health`, `/rag`, `/rag/index` and `/rag/query`
+- FastAPI `/health`, `/rag`, `/rag/index`, `/rag/query`, `/rag/status`, `/rag/upload` and `/rag/local-knowledge`
+- FastAPI `/chat`, `/memory/preview`, `/memory/commit`, `/sessions` and `/sessions/{session_id}/flush`
 - Prompt injection behavior for cited RAG context
+- Controlled local-knowledge tool behavior for skip / found / not-found / rewrite
 
 `tests/test_rag_eval.py` adds a small gold fixture suite under `tests/fixtures/rag_eval/` and verifies:
 
@@ -182,7 +188,19 @@ Goal: turn the Streamlit expander into a usable knowledge panel.
 
 Goal: let the model retrieve when it needs evidence instead of always pre-retrieving.
 
-- Add a `retrieve_local_knowledge(query)` tool boundary.
-- Route retrieval only for knowledge-grounded questions.
-- Allow query rewrite and second-pass retrieval when first-pass evidence is weak.
-- Require explicit "not found in local knowledge" behavior when no source is retrieved.
+- [x] Add a `retrieve_local_knowledge(query)` tool boundary.
+- [x] Route retrieval only for knowledge-grounded questions through deterministic intent gating.
+- [x] Allow deterministic query rewrite and second-pass retrieval when first-pass evidence is weak.
+- [x] Require explicit "not found in local knowledge" behavior when no source is retrieved.
+- [x] Expose the same boundary through `POST /rag/local-knowledge` for future frontends.
+- [ ] Add LLM tool-calling / function-calling integration; current implementation is controlled pre-generation retrieval, not free-form tool use.
+
+### P8: Service API Foundation
+
+Goal: expose the current local-first capabilities through stable API boundaries before building a separate web frontend.
+
+- [x] Add RAG status and upload endpoints for index inspection and rebuilds.
+- [x] Add a non-streaming `/chat` endpoint that reuses model routing, role prompts, memory bundles, local-knowledge retrieval and session logging.
+- [x] Add memory preview / commit endpoints with the same runtime write-mode guard as the Streamlit UI.
+- [x] Add session listing and force-flush endpoints for local session inspection.
+- [ ] Add streaming chat, auth, CORS policy and frontend-oriented error envelopes before public or LAN deployment.
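Reviewer note: the P7 checklist in `docs/RAG.md` describes a `retrieve_local_knowledge(query)` boundary with deterministic intent gating, a deterministic query rewrite with a second retrieval pass, and an explicit not-found result. The sketch below shows one way those four outcomes (skip / found / rewrite / not-found) could fit together. Everything except the function name `retrieve_local_knowledge` is an assumption made for illustration: the marker list, the filler-word list, the `corpus` parameter and the naive term-overlap search all stand in for the real `src/rag` retrieval, which this diff does not show.

```python
from dataclasses import dataclass, field

# Illustrative assumptions, not taken from the repository.
INTENT_MARKERS = ("according to", "my notes", "in the docs", "what does", "where is")
FILLER_WORDS = {
    "please", "what", "where", "does", "the", "a", "an", "say", "about",
    "is", "in", "my", "notes", "according", "to", "do",
}
NOT_FOUND = "not found in local knowledge"


@dataclass
class ToolResult:
    status: str                                   # "skipped" | "found" | "not_found"
    query_used: str = ""
    sources: list[str] = field(default_factory=list)
    message: str = ""


def needs_retrieval(query: str) -> bool:
    """Deterministic intent gate: retrieve only for knowledge-grounded questions."""
    lowered = query.lower()
    return any(marker in lowered for marker in INTENT_MARKERS)


def rewrite_query(query: str) -> str:
    """Deterministic rewrite: strip punctuation and drop filler words."""
    words = [word.strip("?.,!") for word in query.lower().split()]
    return " ".join(word for word in words if word and word not in FILLER_WORDS)


def search(corpus: dict[str, str], query: str, min_overlap: int = 2) -> list[str]:
    """Toy stand-in for retrieval: paths whose text shares enough query terms."""
    terms = set(query.lower().split())
    return sorted(
        path
        for path, text in corpus.items()
        if len(terms & set(text.lower().split())) >= min_overlap
    )


def retrieve_local_knowledge(query: str, corpus: dict[str, str]) -> ToolResult:
    if not needs_retrieval(query):
        return ToolResult(status="skipped")
    hits = search(corpus, query)
    if hits:
        return ToolResult("found", query, hits)
    # First pass was weak: rewrite once and retry with a looser threshold.
    rewritten = rewrite_query(query)
    hits = search(corpus, rewritten, min_overlap=1)
    if hits:
        return ToolResult("found", rewritten, hits)
    return ToolResult("not_found", rewritten, [], NOT_FOUND)
```

Because every branch is deterministic, each of the four behaviors can be pinned by a plain unit test without calling a model, which matches the "controlled pre-generation retrieval, not free-form tool use" wording in the checklist.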
diff --git a/docs/STUDY_AGENT_OPTIMIZATION_ROADMAP.md b/docs/STUDY_AGENT_OPTIMIZATION_ROADMAP.md
@@ -286,7 +286,7 @@ Study Agent's core competitive edge going forward should come from RAG, not ordinary chat.
 
 Do not let the model call tools freely without limits; first use controlled routing to build a stable agent workflow.
 
-## 9. P1: FastAPI Service Layer
+## 9. P8: FastAPI Service Layer
 
 Replacing Streamlit right away is not recommended. A three-step path is suggested:
 
@@ -298,7 +298,7 @@ Streamlit UI → core/chat_engine.py
 
 ### Stage 2: Add FastAPI
 
-Minimal endpoints:
+The foundation endpoints are now in place:
 
 ```text
 GET  /health
@@ -307,12 +307,21 @@ POST /memory/preview
 POST /memory/commit
 POST /rag/upload
 POST /rag/query
+GET  /rag/status
+POST /rag/local-knowledge
 GET  /sessions
+POST /sessions/{session_id}/flush
 ```
 
+Still to be added: streaming chat, auth, CORS, unified error responses, OpenAPI examples and Docker deployment configuration.
+
 ### Stage 3: Add the Frontend
 
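-The frontend can use Vue3 or React. Starting with Vue3 is recommended because its development cost is lower.
+For the frontend, React + Vite + TypeScript is recommended once work enters P9. The reasons:
+
+- The React ecosystem is a better fit for the later chat stream, citation panel, debug drawer and stateful component decomposition.
+- The Vite dev server starts quickly, and the production build outputs a static `dist` that can be deployed on its own or mounted by FastAPI as a static directory.
+- TypeScript pins down data structures such as API response, RAG source, memory preview and session row, reducing silent field drift during frontend-backend integration.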
 
 Minimum pages:
 
@@ -368,7 +377,7 @@ GET  /sessions
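 | RAG tests | chunking, ingestion, retrieval, cited sources |
 | Tool tests | news retrieval, file reading, summarization |
 | ContextBuilder tests | whether the context is correct under different modes |
-| API tests | /chat, /health, /rag/query |
+| API tests | /chat, /health, /rag/query, /rag/upload, /rag/status, /memory/preview, /memory/commit, /sessions |
 | UI smoke tests | pages open and basic interactions do not crash |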
 
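 The most important piece is the Mock Provider. Real models are for demos and actual use; the Mock Provider is for automated tests and CI, so that tests do not depend on external APIs.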
@@ -438,23 +447,23 @@ docs/
 
 Tasks:
 
-1. Add FastAPI
-2. Implement /health
-3. Implement /chat
-4. Implement /rag/upload
-5. Implement /rag/query
-6. Implement /memory/preview
-7. Implement /memory/commit
-8. Add API tests
-9. Add Docker Compose
+1. [x] Add FastAPI
+2. [x] Implement /health
+3. [x] Implement /chat (currently non-streaming)
+4. [x] Implement /rag/upload
+5. [x] Implement /rag/query
+6. [x] Implement /memory/preview
+7. [x] Implement /memory/commit
+8. [x] Add API tests
+9. [ ] Add streaming chat / auth / CORS / Docker Compose
 
 ### v1.0: Frontend Productization Release
 
 Goal: something that can be demoed, screenshotted, deployed, and put on a résumé.
 
 Tasks:
 
-1. Vue3 / React frontend
+1. React + Vite + TypeScript frontend
 2. Chat page
 3. File upload page
 4. Knowledge base list page
@@ -479,28 +488,27 @@ docs/
 
 ## 15. Most Recommended Next Step
 
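-First, map out the main flow clearly and split it into modules:
+The main flow can now continue to be consolidated along the FastAPI boundary: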
 
 ```text
 User input
-→ UI receives
+→ Streamlit or Web UI receives
+→ FastAPI /chat
 → memory read
 → context build
-→ tool decision
+→ local knowledge tool decision
 → provider call
-→ stream output
+→ response output
 → session logging
 → memory write-back confirmation
 ```
 
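-Recommended refactoring order:
-
-1. Stabilize the Provider abstraction
-2. Stabilize MemoryManager
-3. Stabilize ContextBuilder
-4. SessionLogger batched writes
-5. ToolRouter takes initial shape
-6. Streamlit keeps only the UI
-7. Then add FastAPI
-8. Then add RAG
-9. Finally build the frontend
+Recommended order of progress:
+
+1. [x] Provider abstraction stabilized
+2. [x] Memory / ContextBuilder foundation stabilized
+3. [x] SessionLogger batched writes
+4. [x] RAG MVP and local knowledge tool
+5. [x] FastAPI foundation service layer
+6. [ ] streaming chat / auth / CORS / Docker
+7. [ ] React + Vite + TypeScript frontend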
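Reviewer note: the updated main flow in section 15 (memory read, context build, local knowledge decision, provider call, response, session logging, memory write-back confirmation) can be exercised without any external API by pairing it with the Mock Provider idea from the testing section. The sketch below is a framework-free illustration under stated assumptions: `ChatService`, `ChatResult`, `MockProvider` and `commit_memory` are hypothetical names, and the in-memory lists stand in for the project's real memory files and session logger. It is not the code in `src/api.py`.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional, Protocol


class Provider(Protocol):
    def complete(self, prompt: str) -> str: ...


class MockProvider:
    """Deterministic stand-in so tests and CI never call an external API."""

    def complete(self, prompt: str) -> str:
        return f"[mock reply to {len(prompt)} chars]"


def no_lookup(_query: str) -> Optional[str]:
    """Default local-knowledge hook: retrieval is skipped."""
    return None


@dataclass
class ChatResult:
    reply: str
    used_local_knowledge: bool
    pending_memory_write: str          # a preview only; nothing is written yet


@dataclass
class ChatService:
    provider: Provider
    knowledge_lookup: Callable[[str], Optional[str]] = no_lookup
    memory: list[str] = field(default_factory=list)
    session_log: list[dict[str, str]] = field(default_factory=list)

    def chat(self, user_input: str) -> ChatResult:
        memory_block = "\n".join(self.memory)                  # memory read
        evidence = self.knowledge_lookup(user_input)           # local knowledge tool decision
        parts = [memory_block, evidence or "", user_input]
        prompt = "\n\n".join(part for part in parts if part)   # context build
        reply = self.provider.complete(prompt)                 # provider call (non-streaming)
        self.session_log.append({"user": user_input, "assistant": reply})  # session logging
        preview = f"User asked about: {user_input}"            # memory write-back preview
        return ChatResult(reply, evidence is not None, preview)

    def commit_memory(self, preview: str, confirmed: bool) -> bool:
        """Write the previewed entry only after explicit confirmation."""
        if not confirmed:
            return False
        self.memory.append(preview)
        return True
```

Splitting `chat` from `commit_memory` mirrors the preview / commit split of the `/memory/preview` and `/memory/commit` endpoints listed above: a reply never mutates long-term memory by itself.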