This guide helps you set up a similar FDA 510(k) workflow for a different product area (e.g. another product code, device type, or indication). The cardiac_cta project is the reference implementation; its PROJECT_SUMMARY.md documents every step in detail.
| Step | What | Script / command |
|---|---|---|
| 1 | Create project dir and pull initial 510(k) data | uv run prediscope '<query>' --data-dir projects/<my_project> |
| 2 | (Optional) Deduplicate index | Edit or script: dedupe index.jsonl by k_number |
| 3 | Download one summary per applicant (sampling) | uv run python scripts/download_one_per_applicant.py --data-dir projects/<my_project> |
| 4 | Find devices matching your topic from raw text | Adapt find_cardiac_from_raw.py → keywords + output file names (see below) |
| 5 | Download all summaries for those applicants | uv run python scripts/download_by_k_list.py --data-dir projects/<my_project> --k-list projects/<my_project>/k_numbers_all_<topic>_applicants.txt |
| 6 | Build knowledge base (JSONL) | Adapt build_cardiac_knowledge_base.py to read your topic’s K-list (see below) |
| 7 | (Optional) HTML summary | Adapt generate_cardiac_summary_html.py (titles, paths) or skip |
| 8 | (Optional) AI/ML deep dive | uv run python scripts/extract_ai_ml_deep.py --data-dir projects/<my_project> |
| 9 | Build RAG chunks for retrieval / MCP | uv run python scripts/build_rag_chunks.py --data-dir projects/<my_project> |
| 10 | Use retrieval in Cursor (MCP) or CLI | See MCP_SETUP.md; set PREDISCOPE_DATA_DIR=projects/<my_project> |
Steps 1, 2, 3, 5, 8, and 9 use generic scripts: they only need --data-dir (and sometimes --k-list). Steps 4, 6, and 7 are cardiac-specific and must be adapted (or replaced) for your topic.
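Step 2 has no dedicated script, so here is a minimal sketch of deduplicating index.jsonl by k_number. It assumes the index holds one JSON object per line with a k_number field; the function names are hypothetical and not part of the repo.

```python
import json
from pathlib import Path


def dedupe_records(records, key="k_number"):
    """Keep the first record seen for each key value, preserving order."""
    seen = set()
    unique = []
    for record in records:
        value = record.get(key)
        if value is not None:
            if value in seen:
                continue  # later duplicate of a K-number already kept
            seen.add(value)
        # Records without the key are kept untouched to avoid losing data.
        unique.append(record)
    return unique


def dedupe_index_file(index_path):
    """Rewrite a JSONL index in place; return (count_before, count_after)."""
    path = Path(index_path)
    lines = path.read_text(encoding="utf-8").splitlines()
    records = [json.loads(line) for line in lines if line.strip()]
    unique = dedupe_records(records)
    path.write_text(
        "".join(json.dumps(r) + "\n" for r in unique), encoding="utf-8"
    )
    return len(records), len(unique)
```

Calling dedupe_index_file("projects/my_project/index.jsonl") after merging several query runs would keep the first entry per K-number. The file location is an assumption; check where prediscope writes the index in your project.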
Use your own openFDA query and project path:
mkdir -p projects/my_project
uv run prediscope 'product_code:XXX' --data-dir projects/my_project
# Add more queries if needed (some APIs don’t support OR; run separately and merge/dedupe index)

uv run python scripts/download_one_per_applicant.py --data-dir projects/my_project

Script: scripts/find_cardiac_from_raw.py
Why it’s specific: Hardcoded keywords (e.g. cardiac, coronary, CTA) and output file names (cardiac_applicants.txt, cardiac_k_numbers.txt, k_numbers_all_cardiac_applicants.txt, cardiac_report.md).
Options:
- Copy and adapt: Copy the script to e.g. find_<topic>_from_raw.py, change the keyword list and all output file names (and the K-list filename used in the next step).
- Generic script (future): A script that reads a config file (e.g. keywords.txt and out_prefix) would work for any project; for now, the cardiac script is the template.
Downstream scripts expect a K-number list file (one K per line) for “devices matching my topic” and optionally “all K-numbers from those applicants” for the full download. Name them consistently (e.g. k_numbers_<topic>.txt and k_numbers_all_<topic>_applicants.txt) so the next step can find them.
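As a rough illustration of the keyword-matching step, the sketch below scans raw text files for topic keywords and writes a K-number list with one K per line. It is not the logic of find_cardiac_from_raw.py: the one-file-per-device layout (raw/<K-number>.txt), the whole-word matching, and all function names are assumptions made for the example.

```python
import re
from pathlib import Path


def compile_keywords(keywords):
    """Build one case-insensitive, whole-word pattern from a keyword list."""
    escaped = [re.escape(k.strip()) for k in keywords if k.strip()]
    if not escaped:
        raise ValueError("at least one non-empty keyword is required")
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


def find_matching_k_numbers(raw_dir, pattern):
    """Return K-numbers whose raw text matches the pattern.

    Assumes one text file per device named <K-number>.txt (hypothetical layout).
    """
    matches = []
    for path in sorted(Path(raw_dir).glob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        if pattern.search(text):
            matches.append(path.stem)  # file name without .txt is the K-number
    return matches


def write_k_list(k_numbers, out_path):
    """Write one K-number per line, the format the K-list files use."""
    Path(out_path).write_text(
        "".join(k + "\n" for k in k_numbers), encoding="utf-8"
    )
```

For a new topic you would pass your own keywords to compile_keywords, run find_matching_k_numbers over the project's raw directory, and write the result to k_numbers_<topic>.txt with write_k_list.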
uv run python scripts/download_by_k_list.py --data-dir projects/my_project \
  --k-list projects/my_project/k_numbers_all_<topic>_applicants.txt

Script: scripts/build_cardiac_knowledge_base.py
Why it’s specific: It looks for cardiac_k_numbers.txt and writes device records into knowledge_base.jsonl. The logic (read index + K-list + raw, enrich with year_cleared, product_code, has_ai_ml) is reusable.
Options:
- Copy and adapt: Copy to e.g. build_<topic>_knowledge_base.py and change the input file from cardiac_k_numbers.txt to your topic’s K-list (e.g. k_numbers_<topic>.txt). Keep the output as knowledge_base.jsonl so downstream scripts (extract_ai_ml_deep, RAG) still work.
- Symlink: In your project dir, run ln -s k_numbers_<topic>.txt cardiac_k_numbers.txt and then run the existing script (quick hack; file names in reports will still say “cardiac”).
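The reusable logic described above (read index + K-list + raw text, then enrich) might look roughly like the sketch below. It is an illustration, not the code in build_cardiac_knowledge_base.py: the index field names (k_number, decision_date, product_code), the keyword-based has_ai_ml check, and the function names are all assumptions.

```python
import json
import re

# Hypothetical heuristic: flag a device as AI/ML if its text mentions these terms.
AI_ML_PATTERN = re.compile(
    r"\b(artificial intelligence|machine learning|deep learning|neural networks?)\b",
    re.IGNORECASE,
)


def build_record(index_entry, raw_text):
    """Combine one index entry with its raw text into a knowledge-base record."""
    decision_date = index_entry.get("decision_date") or ""
    # Assumes dates start with a four-digit year, e.g. "2021-03-15".
    year = int(decision_date[:4]) if decision_date[:4].isdigit() else None
    return {
        "k_number": index_entry["k_number"],
        "product_code": index_entry.get("product_code"),
        "year_cleared": year,
        "has_ai_ml": bool(AI_ML_PATTERN.search(raw_text)),
    }


def build_knowledge_base(index_entries, k_numbers, raw_texts):
    """Keep only index entries on the K-list and enrich each one."""
    wanted = set(k_numbers)
    return [
        build_record(entry, raw_texts.get(entry["k_number"], ""))
        for entry in index_entries
        if entry.get("k_number") in wanted
    ]


def to_jsonl(records):
    """Serialize records as JSON Lines, one object per line."""
    return "".join(json.dumps(r) + "\n" for r in records)
```

Here raw_texts is a dict mapping K-number to extracted summary text, and the string returned by to_jsonl would be written to knowledge_base.jsonl in the project directory.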
uv run python scripts/build_rag_chunks.py --data-dir projects/my_project

Then use the MCP server with your project’s chunks:
- In Cursor MCP config, set PREDISCOPE_DATA_DIR to projects/my_project (see MCP_SETUP.md).
- Or run retrieval from the CLI: uv run python scripts/retrieve_chunks.py "your question" --data-dir projects/my_project --top-k 10
- Full workflow and script reference: projects/cardiac_cta/PROJECT_SUMMARY.md — narrative and tables for every script.
- MCP and retrieval in Cursor: MCP_SETUP.md.
- FDA AI agent (RAG, embeddings, next steps): FDA_AI_AGENT.md.
The repo’s .gitignore ignores project data (raw/, summaries/, *.jsonl, *.txt in project dirs) and leaves docs (README.md, PROJECT_SUMMARY.md, *.md, *.html) tracked, so you can commit your project’s README and summary without committing PDFs or large JSONL files. See the repo root .gitignore for the exact patterns.