yacine-baghli/GenomicOracle

🧬 GenomicOracle — AI × Quantum Genomic Analysis Platform

Precision oncology powered by Quantum Machine Learning, Multi-LLM Clinical Reasoning, and SOPHiA DDM™ Integration


📋 Overview

GenomicOracle is a full-stack genomic analysis platform built for the SOPHiA Genetics × ETH Zurich Future of Health Hackathon (May 8–10, 2026). It combines three pillars of modern precision oncology:

  1. ⚛️ Quantum Machine Learning — Variational Quantum Circuits (VQC) for pathogenicity scoring and Grover-inspired combinatorial search for multi-variant interactions (PennyLane)
  2. 🤖 Multi-LLM Clinical Reasoning — Evidence-based treatment recommendations and interactive clinical Q&A powered by Claude, Gemini, or GPT-4o
  3. 🔬 SOPHiA DDM™ Integration — Native VQS API connectivity and CSV export support for real-world genomic data from clinical labs

The platform ships with two complementary interfaces:

| Interface | Port | Purpose |
| --- | --- | --- |
| GenomicOracle (main.py) | 5000 | Full patient analysis: VCF upload, QML scoring, LLM reports, PDF export, AI chat |
| Genomic Co-Pilot (GenomicSophia/app.py) | 5001 | Clinical triage dashboard: bulk case ranking, 4-metric scoring, PubMed literature search |

✨ Key Features

GenomicOracle — Full Patient Analysis

| Feature | Description |
| --- | --- |
| 🧬 VCF File Upload | Parse standard .vcf files or enter genomic variants manually |
| ⚛️ Quantum ML Scoring | Variational Quantum Circuit (VQC) pathogenicity scoring via PennyLane |
| 🔍 Grover-Inspired Search | Quantum combinatorial search across multi-variant state spaces for pathogenic interactions |
| 🤖 LLM Treatment Reports | NCCN/ESMO-aligned clinical oncology reports generated by Claude, Gemini, or GPT-4o |
| 💬 Interactive AI Chat | Context-aware clinical Q&A chatbot on the results page |
| 🎓 VQC Training Dashboard | Live training visualization with loss curves and terminal logs |
| 📄 PDF Report Export | Download formatted clinical genomics reports as PDF (xhtml2pdf) |
| 🔒 Offline Fallback | Full expert rule-base when no API key is configured; covers EGFR, KRAS, BRAF, PIK3CA, ALK, BRCA, and more |

Genomic Co-Pilot — Clinical Triage Dashboard

| Feature | Description |
| --- | --- |
| 📊 4-Metric Case Scoring | ABCD Prediction (35%) + ClinVar Evidence (25%) + Community Frequency (20%) + QA Confidence (20%) |
| 🏥 Case Priority Ranking | 🔴 Critical → 🟠 High → 🟡 Moderate → 🟢 Low urgency tiers |
| 📁 Bulk CSV Analysis | Drop SOPHiA DDM™ exports into /samples; they are auto-loaded and ranked on startup |
| 🔌 VQS API Integration | Live connection to SOPHiA DDM™ platform via dataset keys |
| 🧠 AI Clinical Summaries | Optional Claude/Gemini/GPT-4o 2-sentence case summaries |
| Deterministic Summaries | Always-available rule-based 2-sentence summaries (no API key needed) |
| 📚 PubMed Literature Search | Per-variant paper search via NCBI E-utilities with clickable paper cards |
| 💊 Therapy Matching | 21-gene actionable database with FDA-approved targeted therapies |
| 🌓 Dark/Light Mode | Toggle between dark and light themes |

CLI — Batch Variant Analysis

| Feature | Description |
| --- | --- |
| 📋 variant_analysis_ai.py | Batch-process up to 10 SOPHiA DDM™ CSV files via Claude API |
| 🎯 AI Importance Scoring | 0–100 importance score per sample with 2-sentence clinical summary |
| 📊 CSV Output | Saves aggregated results to ai_analysis.csv |

🏗️ Architecture

┌─────────────────────────────────────────────────────────────────────────┐
│                    GenomicOracle Platform                               │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  ┌────────────────────────────────────────────────────────────────────┐  │
│  │  main.py — Full Patient Analysis (port 5000)                      │  │
│  │  ┌──────────────┬─────────────────┬──────────────────────────┐    │  │
│  │  │  VCF Parser  │  QML Module     │    LLM Module            │    │  │
│  │  │  (vcfpy)     │ (qml_module.py) │   (llm_module.py)        │    │  │
│  │  │              │                 │                          │    │  │
│  │  │              │ • VQC Scoring   │ • Claude Sonnet / Opus   │    │  │
│  │  │              │ • VQC Features  │ • Gemini 2.5 Flash       │    │  │
│  │  │              │ • Grover Search │ • GPT-4o                 │    │  │
│  │  │              │ (PennyLane)     │ • Offline Fallback       │    │  │
│  │  └──────────────┴─────────────────┴──────────────────────────┘    │  │
│  └────────────────────────────────────────────────────────────────────┘  │
│                                                                         │
│  ┌────────────────────────────────────────────────────────────────────┐  │
│  │  GenomicSophia/app.py — Co-Pilot Triage Dashboard (port 5001)     │  │
│  │  ┌───────────────────┬───────────────────┬────────────────────┐   │  │
│  │  │  4-Metric Scoring │  VQS API Client   │  LLM Summaries    │   │  │
│  │  │  (ai_engine.py)   │  (vqs_client.py)  │  (ai_engine.py)   │   │  │
│  │  │                   │                   │                    │   │  │
│  │  │  • ABCD Score     │  • IAM Auth       │  • Claude Sonnet   │   │  │
│  │  │  • ClinVar Score  │  • VQS Query      │  • Gemini Flash    │   │  │
│  │  │  • Community Freq │  • Schema Fetch   │  • GPT-4o          │   │  │
│  │  │  • QA Confidence  │  • CSV Parser     │  • Deterministic   │   │  │
│  │  └───────────────────┴───────────────────┴────────────────────┘   │  │
│  └────────────────────────────────────────────────────────────────────┘  │
│                                                                         │
│  ┌────────────────────────────────────────────────────────────────────┐  │
│  │  variant_analysis_ai.py — CLI Batch Analyzer                      │  │
│  │  • Claude API (Opus 4) batch CSV scoring + 2-sentence summaries   │  │
│  └────────────────────────────────────────────────────────────────────┘  │
│                                                                         │
│  Shared: vqs_client.py · ai_engine.py                                   │
└─────────────────────────────────────────────────────────────────────────┘

🚀 Getting Started

Prerequisites

  • Python 3.10+
  • (Optional) An API key for one of: Anthropic Claude, Google Gemini, or OpenAI GPT-4o

Installation

# Clone the repository
git clone https://github.com/yacine-baghli/GenomicOracle.git
cd GenomicOracle

# Create a virtual environment
python -m venv .venv

# Activate it
# Windows
.venv\Scripts\activate
# macOS / Linux
source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt

Run — GenomicOracle (Full Analysis)

python main.py

Open http://127.0.0.1:5000 — enter patient data, upload VCF files or manually enter variants, and generate AI-powered clinical reports.

Run — Genomic Co-Pilot (Triage Dashboard)

cd GenomicSophia
pip install -r requirements.txt
python app.py

Open http://127.0.0.1:5001 — the dashboard auto-loads and ranks all CSV files from samples/ on startup.

Run — CLI Batch Analysis

# Set API key (Windows cmd syntax; on macOS / Linux use `export` instead of `set`)
set ANTHROPIC_API_KEY=sk-ant-...

# Analyze one or more CSV files
python variant_analysis_ai.py samples/patient1.csv samples/patient2.csv --output results.csv

⚙️ Configuration

LLM Provider Setup

You can configure your LLM provider in two ways:

  1. Via the UI — Click ⚙️ API Key in the dashboard header, select your provider, paste your key, and save.
  2. Via environment variables:
# Set one of the following before launching (Windows cmd syntax; on macOS / Linux use `export` instead of `set`)
set ANTHROPIC_API_KEY=your-key     # Anthropic Claude (recommended)
set GEMINI_API_KEY=your-key        # Google Gemini
set OPENAI_API_KEY=your-key        # OpenAI GPT-4o

Note: If no API key is provided, both interfaces operate in Offline Deterministic Mode using built-in expert rule-bases that cover 21+ actionable oncological genes (EGFR, KRAS, BRAF, PIK3CA, ALK, BRCA1/2, RET, NTRK, MET, and more).
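The fallback behaviour described in the note can be sketched as a small provider-selection helper. This is a minimal illustration, not the project's code: the environment variable names come from this README, while the function name `pick_provider`, the precedence order (Claude first, since it is marked as recommended), and the `"offline"` label are assumptions.

```python
import os

# Variable names are taken from the README; the precedence order and the
# "offline" label are assumptions made for this sketch.
PROVIDER_ENV_VARS = [
    ("anthropic", "ANTHROPIC_API_KEY"),
    ("gemini", "GEMINI_API_KEY"),
    ("openai", "OPENAI_API_KEY"),
]


def pick_provider(env=None):
    """Return (provider, api_key), or ("offline", None) when no key is set."""
    env = os.environ if env is None else env
    for provider, var in PROVIDER_ENV_VARS:
        key = env.get(var, "").strip()
        if key:
            return provider, key
    return "offline", None
```

Passing a plain dict instead of `os.environ` keeps the helper easy to exercise without touching the real environment.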


🧪 Usage — GenomicOracle (Full Analysis)

1. Enter Patient Data

Fill in the patient's name, age, gender, and clinical condition (e.g., Lung Adenocarcinoma).

2. Provide Genetic Variants

  • Upload a VCF file,
  • manually enter variants (Gene, HGVS notation, Zygosity), or
  • click Load Sample Data to try one of 5 pre-built clinical scenarios
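For readers unfamiliar with the VCF layout, the sketch below reads the eight fixed tab-separated columns of each data line using only the standard library. It is a stand-in for illustration: the project itself parses VCF with vcfpy, and the function name `parse_vcf_lines` and the returned dict shape are assumptions.

```python
def parse_vcf_lines(lines):
    """Yield one dict per VCF data line, skipping '#' header lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 8:
            raise ValueError(f"expected at least 8 columns, got {len(fields)}")
        chrom, pos, variant_id, ref, alt = fields[:5]
        info = {}
        for item in fields[7].split(";"):
            if item and item != ".":
                key, sep, value = item.partition("=")
                # Flag-style INFO entries have no '=' and are stored as True.
                info[key] = value if sep else True
        yield {
            "chrom": chrom,
            "pos": int(pos),
            "id": variant_id,
            "ref": ref,
            "alt": alt.split(","),  # ALT may list several alleles
            "info": info,
        }
```

A real upload would pass an open file handle as `lines`; the INFO keys present depend on whatever tool produced the file.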

3. Run AI Analysis

The platform will:

  • Annotate variants against ClinVar
  • Score pathogenicity with the Variational Quantum Circuit
  • Search for multi-variant interactions via Grover-inspired quantum search
  • Generate an LLM-powered clinical oncology report (or expert rule-base fallback)

4. Explore Results

  • Read targeted therapy recommendations aligned with NCCN/ESMO guidelines
  • Chat with the AI about the patient's genomic profile
  • Download the report as a PDF

5. Train the Quantum Model (optional)

Navigate to the ⚛ Train Quantum Model page to run the VQC training pipeline with live loss convergence visualization.


🧪 Usage — Genomic Co-Pilot (Triage Dashboard)

Data Input

| Tab | How |
| --- | --- |
| 📄 CSV Data | Drop SOPHiA DDM™ export CSVs into GenomicSophia/samples/ and they auto-load. Or upload via the dashboard. |
| 🔌 VQS API | Enter SOPHiA DDM™ credentials + dataset keys from browser DevTools. |

Ranking Workflow

  1. Load data — CSVs from /samples are auto-loaded on page load
  2. Rank cases — Click 🔬 Rank All Cases (or 🔍 Analyze for VQS API)
  3. Review — Cases are sorted by urgency. Click any case card to expand it.
  4. Explore variants — The variant table shows all scored variants, sorted by composite score
  5. Literature — Click 📚 on any variant row to search PubMed for relevant papers
  6. AI Summaries — Set an LLM API key and select a provider to get AI-generated clinical summaries
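The literature step relies on NCBI E-utilities. As a rough sketch of what a per-variant query could look like, the helper below builds an ESearch URL for PubMed. The endpoint and the `db`, `term`, `retmode`, and `retmax` parameters are part of the public E-utilities API; the function name and the `GENE AND VARIANT` search term format are assumptions, not necessarily what the dashboard sends.

```python
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_search_url(gene, variant=None, max_results=5):
    """Build an ESearch URL that returns PubMed IDs as JSON."""
    # Assumed term format: gene alone, or gene AND variant.
    term = gene if not variant else f"{gene} AND {variant}"
    params = {
        "db": "pubmed",
        "term": term,
        "retmode": "json",
        "retmax": max_results,
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"
```

Fetching the resulting URL returns a JSON document whose ID list can then be passed to ESummary or EFetch to build the paper cards.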

What You See Per Case

  • 4-metric radar — ABCD Prediction, ClinVar Evidence, Community Frequency, QA Confidence
  • 🧠 AI Clinical Summary — 2-sentence LLM narrative (when API key is configured)
  • ⚡ Deterministic Analysis — Always-available 2-sentence summary + bullet breakdown + action label
  • 💊 Therapy tags — Matched from the 21-gene actionable database
  • Variant table — Full priority ranking with gene links to ClinVar, per-variant PubMed search

🔬 Scoring System — 4 Metrics (Co-Pilot)

Variant-Level Scoring

Each variant is scored across 4 metrics (0–100), weighted and combined into a composite score.
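The weighted combination can be sketched as follows, using the weights stated in this README (35/25/20/20). The dictionary keys and the function name `composite_score` are illustrative choices, not names taken from ai_engine.py.

```python
# Weights as stated in the README; key names are hypothetical.
WEIGHTS = {"abcd": 0.35, "clinvar": 0.25, "community": 0.20, "qa": 0.20}


def composite_score(metrics):
    """Combine four 0-100 metric scores into one weighted 0-100 score."""
    missing = WEIGHTS.keys() - metrics.keys()
    if missing:
        raise KeyError(f"missing metrics: {sorted(missing)}")
    return sum(WEIGHTS[name] * metrics[name] for name in WEIGHTS)
```

For example, metric scores of 95, 90, 95, and 100 combine to 33.25 + 22.5 + 19 + 20 = 94.75.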

1. ABCD Prediction (Weight: 35%)

SOPHiA DDM™'s proprietary ABCD predictor — the primary pathogenicity signal.

| ABCD Class | Base Score | + Consequence Bonus |
| --- | --- | --- |
| A: High confidence pathogenic | 65 | Nonsense/Frameshift/Splice: +30 |
| B: Moderate confidence | 45 | Missense: +12 |
| C: Low confidence | 20 | Inframe: +5 |
| D: Benign prediction | 5 | |
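Read as a base score plus an independent consequence bonus, the table translates into a short lookup. That reading, the lowercase consequence labels, and the cap at 100 are assumptions of this sketch; the numeric values are those listed above.

```python
ABCD_BASE = {"A": 65, "B": 45, "C": 20, "D": 5}
CONSEQUENCE_BONUS = {
    "nonsense": 30,
    "frameshift": 30,
    "splice": 30,
    "missense": 12,
    "inframe": 5,
}


def abcd_score(abcd_class, consequence=None):
    """Base score for the ABCD class plus a bonus for the consequence type."""
    base = ABCD_BASE[abcd_class.upper()]
    # Unknown or missing consequences get no bonus (assumption).
    bonus = CONSEQUENCE_BONUS.get((consequence or "").lower(), 0)
    return min(100, base + bonus)
```

Under this reading, a class A frameshift variant scores 65 + 30 = 95, and a class B missense variant scores 45 + 12 = 57.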

2. ClinVar Evidence (Weight: 25%)

| ClinVar Significance | Points | + Review Status | Points |
| --- | --- | --- | --- |
| Pathogenic | +55 | Expert panel | +35 |
| Likely Pathogenic | +40 | Multiple submitters | +25 |
| VUS | +15 | Single submitter | +15 |
| Benign | +5 | No assertion | +5 |
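The two columns add up to the ClinVar metric. A minimal sketch, assuming that significance and review status are looked up independently and that labels not in the table contribute zero points (neither detail is specified here):

```python
SIGNIFICANCE_POINTS = {
    "pathogenic": 55,
    "likely pathogenic": 40,
    "vus": 15,
    "benign": 5,
}
REVIEW_POINTS = {
    "expert panel": 35,
    "multiple submitters": 25,
    "single submitter": 15,
    "no assertion": 5,
}


def clinvar_score(significance, review_status):
    """Sum the significance points and the review-status points."""
    sig = SIGNIFICANCE_POINTS.get(significance.strip().lower(), 0)
    rev = REVIEW_POINTS.get(review_status.strip().lower(), 0)
    return sig + rev
```

A Pathogenic variant reviewed by an expert panel therefore reaches 55 + 35 = 90.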

3. Community Frequency (Weight: 20%)

SOPHiA DDM™ lab-wide frequency — rarer variants score higher.

| Community Freq. | Score | Interpretation |
| --- | --- | --- |
| Absent / 0 | 95 | Never seen across all labs |
| < 0.1% | 90 | Ultra-rare |
| < 0.5% | 75 | Rare |
| < 1% | 60 | Low frequency |
| ≥ 10% | 10 | Likely artifact |

4. QA Confidence (Weight: 20%)

Sequencing quality — starts at 50, adjusted by read depth, VAF, and ABCD.

| Factor | Condition | Adjustment |
| --- | --- | --- |
| Read Depth | ≥ 200× | +25 |
| VAF | ≥ 30% | +20 |
| VAF | < 5% | −10 |
| ABCD | A | +10 |
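The adjustments in the table can be applied to the starting value of 50 as shown below. This sketch assumes VAF is passed as a fraction (0.30 for 30%) and that the result is clamped to 0–100, since all bonuses together would otherwise reach 105; both are assumptions rather than documented behaviour.

```python
def qa_confidence(read_depth, vaf, abcd_class):
    """Start at 50 and apply the read-depth, VAF, and ABCD adjustments."""
    score = 50
    if read_depth >= 200:
        score += 25
    if vaf >= 0.30:      # VAF given as a fraction (assumption)
        score += 20
    elif vaf < 0.05:
        score -= 10
    if abcd_class.upper() == "A":
        score += 10
    # Clamp to the 0-100 range used by the other metrics (assumption).
    return max(0, min(100, score))
```

For instance, a variant at 100× depth with 2% VAF and ABCD class C ends at 50 − 10 = 40.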

Case-Level Ranking

Case Score = Top Variant × 0.50 + Average Score × 0.30 + High-ABCD Bonus × 0.20

| Case Score | Tier | Action |
| --- | --- | --- |
| ≥ 70 | 🔴 CRITICAL | Immediate review required |
| ≥ 50 | 🟠 HIGH | Priority review |
| ≥ 30 | 🟡 MODERATE | Standard review |
| < 30 | 🟢 LOW | Routine processing |
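The case formula and tier thresholds can be sketched directly. How the High-ABCD Bonus itself is computed is not described in this README, so the sketch takes it as an input on the same 0–100 scale; the function names are hypothetical.

```python
def case_score(variant_scores, high_abcd_bonus):
    """Top variant x 0.50 + average x 0.30 + High-ABCD bonus x 0.20."""
    if not variant_scores:
        raise ValueError("a case needs at least one scored variant")
    top = max(variant_scores)
    average = sum(variant_scores) / len(variant_scores)
    return top * 0.50 + average * 0.30 + high_abcd_bonus * 0.20


def case_tier(score):
    """Map a case score to the urgency tier from the table."""
    if score >= 70:
        return "CRITICAL"
    if score >= 50:
        return "HIGH"
    if score >= 30:
        return "MODERATE"
    return "LOW"
```

With variant scores of 90, 60, and 30 and a bonus of 100, the case score is 45 + 18 + 20 = 83, which falls in the CRITICAL tier.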

💊 Actionable Gene Database

The platform maintains a curated database of 21 actionable genes with Tier 1/2 classifications and matched FDA-approved therapies:

| Gene | Tier | Matched Therapies |
| --- | --- | --- |
| EGFR | 1 | Osimertinib, Erlotinib, Gefitinib |
| ALK | 1 | Alectinib, Lorlatinib, Crizotinib |
| BRAF | 1 | Dabrafenib+Trametinib, Vemurafenib |
| BRCA1/2 | 1 | Olaparib, Rucaparib |
| ROS1 | 1 | Crizotinib, Entrectinib |
| RET | 1 | Selpercatinib, Pralsetinib |
| NTRK1/2/3 | 1 | Larotrectinib, Entrectinib |
| KRAS | 2 | Sotorasib (G12C), Adagrasib (G12C) |
| MET | 2 | Capmatinib, Tepotinib |
| HER2/ERBB2 | 2 | Trastuzumab deruxtecan |
| PIK3CA | 2 | Alpelisib |
| ESR1 | 2 | Elacestrant |
| FGFR2/3 | 2 | Pemigatinib, Erdafitinib |
| IDH1/2 | 2 | Ivosidenib, Enasidenib |

Additionally, 6 prognostic markers are tracked: TP53, PTEN, RB1, CDKN2A, APC, SMAD4.
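Therapy matching against such a database amounts to a dictionary lookup. The sketch below copies a few rows from the table above and the six prognostic markers; the data structure, the function name `match_gene`, and the returned fields are assumptions and do not reflect the project's actual schema.

```python
# A subset of the table above: gene -> (tier, matched therapies).
ACTIONABLE = {
    "EGFR": (1, ["Osimertinib", "Erlotinib", "Gefitinib"]),
    "BRAF": (1, ["Dabrafenib+Trametinib", "Vemurafenib"]),
    "KRAS": (2, ["Sotorasib (G12C)", "Adagrasib (G12C)"]),
    "PIK3CA": (2, ["Alpelisib"]),
}
PROGNOSTIC = {"TP53", "PTEN", "RB1", "CDKN2A", "APC", "SMAD4"}


def match_gene(gene):
    """Return therapy or prognostic info for a gene, or None if untracked."""
    gene = gene.upper()
    if gene in ACTIONABLE:
        tier, therapies = ACTIONABLE[gene]
        return {"gene": gene, "category": "actionable",
                "tier": tier, "therapies": therapies}
    if gene in PROGNOSTIC:
        return {"gene": gene, "category": "prognostic",
                "tier": None, "therapies": []}
    return None
```

Note that some entries are variant-specific (the KRAS therapies apply to G12C), so a gene-level match is only a first filter.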


⚛️ Quantum ML Components

Variational Quantum Circuit (VQC) Scoring

A PennyLane-based quantum circuit (default.qubit simulator) that:

  • Encodes variant features into quantum states via RX and RY rotations
  • Outputs a pathogenicity score (0–1) from PauliZ expectation values
  • Supports post-training VQC feature extraction with interpretable variant topology descriptions
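To make the encode, rotate, and measure steps concrete, here is a one-qubit toy version simulated in plain Python. It is not the project's circuit and deliberately avoids PennyLane so that it runs without dependencies: the single feature, the single trainable weight, and the mapping of ⟨Z⟩ to a score via (1 − ⟨Z⟩)/2 are all assumptions.

```python
import math


def rx(theta):
    """Single-qubit RX rotation as a 2x2 matrix (nested tuples)."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return ((c, -1j * s), (-1j * s, c))


def ry(theta):
    """Single-qubit RY rotation as a 2x2 matrix (nested tuples)."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return ((c, -s), (s, c))


def apply(gate, state):
    """Multiply a 2x2 gate by a 2-component state vector."""
    (a, b), (c, d) = gate
    return (a * state[0] + b * state[1], c * state[0] + d * state[1])


def vqc_score(feature, weight):
    """Encode a feature with RX, apply a trainable RY, return a 0-1 score."""
    state = (1 + 0j, 0j)                 # start in |0>
    state = apply(rx(feature), state)    # feature encoding
    state = apply(ry(weight), state)     # trainable rotation
    expval_z = abs(state[0]) ** 2 - abs(state[1]) ** 2   # <Z>
    # Map <Z> in [-1, 1] to a score in [0, 1] (assumed convention).
    return (1 - expval_z) / 2
```

For this circuit ⟨Z⟩ equals cos(feature) · cos(weight), so a feature of 0 with weight 0 scores 0 and a feature of π with weight 0 scores 1. Training would adjust `weight` to minimise a loss over labelled variants.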

Grover-Inspired Combinatorial Search

Simulates quantum amplitude amplification across multi-variant state spaces to identify clinically significant co-occurring mutations:

| Interaction | Genes | Clinical Impact |
| --- | --- | --- |
| Synergistic Resistance | KRAS + PIK3CA | Combined MAPK/PI3K pathway hyper-activation |
| Negative Prognostic | EGFR + TP53 | Severely limits TKI response duration |
| Synthetic Lethality Bypass | BRAF + PTEN | PTEN loss bypasses BRAF inhibition |
| Synthetic Lethality Synergy | PIK3CA + BRCA2 | Combined PARP + PI3K inhibition response |
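Amplitude amplification over gene pairs can be simulated classically in a few lines. In this sketch every pair of a patient's mutated genes is one basis state, the oracle marks the pairs listed in the table above, and standard Grover iterations (oracle sign flip followed by inversion about the mean) boost their probability. The function name and the choice of iteration count are assumptions; qml_module.py may work differently.

```python
import math
from itertools import combinations

# Gene pairs and labels from the interaction table above.
KNOWN_INTERACTIONS = {
    frozenset({"KRAS", "PIK3CA"}): "Synergistic Resistance",
    frozenset({"EGFR", "TP53"}): "Negative Prognostic",
    frozenset({"BRAF", "PTEN"}): "Synthetic Lethality Bypass",
    frozenset({"PIK3CA", "BRCA2"}): "Synthetic Lethality Synergy",
}


def grover_pair_search(genes):
    """Return {gene pair: probability} for known pairs after amplification."""
    pairs = [frozenset(p) for p in combinations(sorted(set(genes)), 2)]
    marked = [i for i, p in enumerate(pairs) if p in KNOWN_INTERACTIONS]
    if not marked:
        return {}
    n = len(pairs)
    amps = [1 / math.sqrt(n)] * n                   # uniform superposition
    theta = math.asin(math.sqrt(len(marked) / n))
    iterations = int(math.pi / (4 * theta))         # near-optimal count
    for _ in range(iterations):
        for i in marked:                            # oracle: flip marked signs
            amps[i] = -amps[i]
        mean = sum(amps) / n
        amps = [2 * mean - a for a in amps]         # inversion about the mean
    return {tuple(sorted(pairs[i])): amps[i] ** 2 for i in marked}
```

For the genes KRAS, PIK3CA, EGFR, TP53, and APC there are 10 pairs, two of them marked; one iteration raises their combined probability from 0.2 to 0.968. On a classical simulator this gives no speed-up over a direct lookup, which is consistent with the "Grover-inspired" label.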

📜 Sample Clinical Scenarios

GenomicOracle — 5 Built-in Patients

| Patient | Condition | Key Variants |
| --- | --- | --- |
| Jane Smith, 62F | Stage IV NSCLC | EGFR L858R, EGFR T790M, TP53 R273H |
| Michael Johnson, 58M | Metastatic CRC | KRAS G12V, PIK3CA E545K, APC R876* |
| Sarah Williams, 45F | Stage III Melanoma | BRAF V600E, PTEN R130G, CDKN2A R58* |
| Emily Davis, 40F | Metastatic Breast (HR+/HER2-) | PIK3CA H1047R, ESR1 D538G, BRCA2 S1982fs |
| David Brown, 8M | High-Risk Neuroblastoma | MYCN Amplification, ALK F1174L |

Genomic Co-Pilot — 8 SOPHiA DDM™ CSV Samples

8 real-format SOPHiA DDM™ export CSVs are included in GenomicSophia/samples/ (patient1–patient8), containing 125–500+ variants each with full ABCD predictions, ClinVar annotations, community frequencies, and read depth data.


📁 Project Structure

GenomicOracle/
├── main.py                    # Flask app — full patient analysis (port 5000)
├── qml_module.py              # Quantum ML: VQC scoring, features, Grover search
├── llm_module.py              # Multi-LLM: treatment reports, chat, offline fallback
├── ai_engine.py               # 4-metric scoring engine + CSV parser + LLM summaries
├── vqs_client.py              # SOPHiA VQS API client (auth + query + schema)
├── variant_analysis_ai.py     # CLI batch analyzer (Claude API)
├── requirements.txt           # Python dependencies
├── templates/
│   ├── index.html             # Landing page — patient & variant input
│   ├── analysis.html          # Variant analysis results
│   ├── results.html           # Full clinical report + AI chat
│   ├── results_pdf.html       # PDF-optimized report template
│   ├── training.html          # VQC training dashboard
│   └── copilot.html           # Co-Pilot dashboard (v1)
├── static/
│   └── style.css              # UI stylesheet (glassmorphism dark theme)
├── uploads/                   # Uploaded VCF files (gitignored)
│
├── GenomicSophia/             # ← Co-Pilot Triage Dashboard (standalone)
│   ├── app.py                 # Flask routes & API endpoints (port 5001)
│   ├── ai_engine.py           # 4-metric scoring + deterministic summaries
│   ├── vqs_client.py          # SOPHiA VQS API client
│   ├── requirements.txt       # Minimal dependencies (flask, requests)
│   ├── samples/               # 8 SOPHiA DDM™ CSV exports (auto-loaded)
│   │   ├── patient1.csv       # ... through patient8.csv
│   │   └── ...
│   ├── templates/
│   │   └── copilot.html       # Dashboard HTML
│   └── static/
│       ├── style.css          # Dark/Light theme CSS
│       └── app.js             # Frontend logic
│
└── GenomicOracle/             # ← Earlier standalone version (separate repo)
    └── ...

🔬 API Endpoints

GenomicOracle (main.py — port 5000)

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | / | Landing page: patient & variant input |
| POST | /analysis | Run variant analysis (VCF or manual) |
| GET | /results | View clinical report + AI chat |
| POST | /api/chat | Interactive AI chatbot (context-aware) |
| GET | /training | VQC training dashboard |
| POST | /api/train_model | Trigger quantum model training |
| GET | /download_pdf | Download clinical report as PDF |
| POST | /settings | Save LLM provider + API key |
| GET | /copilot | Co-Pilot dashboard (embedded) |
| POST | /api/copilot/analyze | VQS API analysis |
| POST | /api/copilot/csv | CSV file analysis |

Genomic Co-Pilot (GenomicSophia/app.py — port 5001)

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | / | Dashboard |
| GET | /api/samples | List CSV files in /samples |
| POST | /api/upload | Upload & persist CSVs to /samples |
| POST | /api/remove_sample | Remove a CSV from /samples |
| POST | /api/rank | Score & rank all cases (+ optional LLM) |
| POST | /api/analyze | Analyze via VQS API |
| POST | /api/set_key | Store LLM API key in session |
| POST | /api/variant_papers | PubMed paper search for a gene/variant |

🛠️ Tech Stack

| Layer | Technology |
| --- | --- |
| Backend | Flask (Python 3.10+) |
| Quantum ML | PennyLane (default.qubit simulator) |
| LLM Integration | Anthropic Claude, Google Gemini, OpenAI GPT-4o |
| VCF Parsing | vcfpy |
| PDF Generation | xhtml2pdf |
| SOPHiA DDM™ | VQS API (DuckDB-as-a-service) + CSV exports |
| Literature | PubMed E-utilities (free NCBI API) |
| Frontend | HTML5, CSS3 (glassmorphism), vanilla JavaScript, Inter font |
| Charts | Chart.js (training dashboard) |

Dependencies

flask>=3.0
pennylane>=0.39
vcfpy>=0.13
xhtml2pdf>=0.2
google-genai>=1.0
openai>=1.0
anthropic>=0.30
requests>=2.31
pandas>=2.0

🔌 SOPHiA DDM™ VQS API

The platform integrates with SOPHiA Genetics' Variant Query Service (VQS) — a DuckDB-powered API for querying genomic datasets stored on the SOPHiA DDM™ platform.

Authentication

POST https://iam-vandv.sophiagenetics.com/account/token
Body: {"username": "...", "password": "..."}
→ Returns: {"access_token": "Bearer ..."}

Querying Variants

POST https://platform-vandv1.sophiagenetics.com/api/variant/query
Params: key=<dataset_key>&engine.paginate=true
Body: {"columns": ["*"], "pagination": {"offset": 0, "limit": 500}}
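Paging through a dataset with the offset/limit body shown above can be sketched as follows. The URL, query parameters, and body fields are copied from this README; everything else is an assumption, in particular that each response decodes to a list of rows and that a page shorter than `limit` marks the end. The HTTP call is injected as a `send` callable so the paging logic can be exercised without network access.

```python
VQS_QUERY_URL = "https://platform-vandv1.sophiagenetics.com/api/variant/query"


def build_query(dataset_key, offset=0, limit=500):
    """Describe one paginated query request as a plain dict."""
    return {
        "url": VQS_QUERY_URL,
        "params": {"key": dataset_key, "engine.paginate": "true"},
        "json": {
            "columns": ["*"],
            "pagination": {"offset": offset, "limit": limit},
        },
    }


def fetch_all(dataset_key, send, limit=500):
    """Collect rows page by page until a short page is returned.

    `send` is a hypothetical callable that performs the POST for a request
    dict and returns the decoded rows as a list (assumed response shape).
    """
    rows, offset = [], 0
    while True:
        page = send(build_query(dataset_key, offset, limit))
        rows.extend(page)
        if len(page) < limit:
            return rows
        offset += limit
```

In practice `send` could wrap an HTTP client call that posts `json` to `url` with `params` and an Authorization header built from the token returned by the authentication step; the exact header format is not specified here.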

How to Get Dataset Keys

  1. Log into SOPHiA DDM™ platform
  2. Open browser DevTools → Network tab
  3. Navigate to a patient's variant list
  4. Copy the key parameter from the /api/variant/query request
  5. Paste into the VQS API tab in the dashboard

⚠️ Disclaimer

This platform is a hackathon prototype built for educational and demonstration purposes. It is not intended for clinical use. All treatment recommendations should be validated by qualified medical professionals. The quantum ML components use simulated quantum circuits on classical hardware.


📄 License

This project is open-source under the MIT License.


👥 Team

Built with ❤️ at ETH Zurich for the SOPHiA Genetics × ETH Zurich Future of Health Hackathon 2026

About

GenomicOracle is a web-based genomic analysis platform built for the AI for Health Hackathon. It combines Quantum Machine Learning (via PennyLane), LLM-powered clinical reasoning (Gemini / OpenAI / Anthropic), and ClinVar variant annotation to deliver end-to-end precision oncology reports from raw genetic data.
