```python
vinayak = {
    "role": "AI Research Associate @ Vidur Research",
    "education": "B.Tech CSE @ KIIT University (2023–2027)",
    "location": "India 🇮🇳",
    "focus": ["Production RAG Systems", "LLM Fine-tuning", "Agentic Workflows"],
    "currently": "Building financial intelligence systems with CRAG + RAGAS at scale",
    "philosophy": "Real products > notebooks. Metrics > vibes.",
}
```

- 🔭 Building CRAG pipelines that cut hallucinations 30% across 50+ financial instruments
- ⚡ Achieved 95%+ temporal accuracy with evidence gating + contradiction detection
- 🤖 Fine-tuned Phi-2 (2.7B params) with LoRA — 85% command accuracy, 40% faster inference
- 📊 Integrated RAGAS evaluation driving 10–20% retrieval gains per iteration
- 🏆 First Prize — FED Hackathon (NLP-integrated finance tracker)
- 📰 Shipped The Vector Daily — automated AI newsletter processing 100+ articles/day
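The evidence-gating idea above can be sketched in a few lines. This is a toy CRAG-style gate, not the production pipeline; every field name and threshold here is illustrative:

```python
from datetime import date

def gate_evidence(docs, as_of, min_score=0.6):
    """Keep only documents that are both relevant enough and not stale
    relative to the date the query is asking about (toy evidence gate)."""
    kept = []
    for doc in docs:
        if doc["score"] < min_score:
            continue  # low-relevance neighbor: retrieval found it, but we don't trust it
        if doc["published"] > as_of:
            continue  # temporal gate: no evidence dated after the question's reference date
        kept.append(doc)
    return kept

docs = [
    {"text": "Q2 revenue rose 12%", "score": 0.82, "published": date(2024, 7, 20)},
    {"text": "Q2 revenue fell 3%",  "score": 0.41, "published": date(2024, 7, 21)},
    {"text": "Q3 guidance raised",  "score": 0.77, "published": date(2024, 10, 5)},
]
kept = gate_evidence(docs, as_of=date(2024, 8, 1))
# only the high-score, pre-cutoff document survives
```

The point of the gate: answering from the surviving evidence (rather than everything retrieval returns) is what drives the hallucination and temporal-accuracy numbers.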
🏢 AI Research Associate · Vidur Research (Subsidiary of Dreamskrin)
| | Project | The honest description | What the numbers say |
|---|---|---|---|
| 🧩 | PrepGraph | RAG chatbot with hybrid BM25 + FAISS, a semantic cache, and a query router that picks between Llama 8B and 70B based on whether your question deserves the big model | ↓40% LLM calls · ↓25% latency · ↓30% inference cost |
| 🖥️ | Command Copilot | Fine-tuned Phi-2 (2.7B params) with LoRA to turn plain English into Linux commands. Runs completely offline. No cloud. No privacy leak. | 85% command accuracy · ↓40% inference time |
| 🧠 | ChatGPT Memory From Scratch | Built a three-layer memory system (short-term dict → FAISS long-term → LLM summarization) without LangChain. Because I wanted to understand what "memory" actually means, not just call a library | No LangChain. Pure logic. |
| 📈 | FinVector Research | Embedded financial news with FinBERT, clustered the semantic space, found that market regime shifts show up in the geometry of news embeddings before price moves | Applied on NIFTY 50 · 768-dim vectors · regime transition matrices |
| 👗 | Reel2Retail | YOLOv8 detects clothing in video frames → CLIP embeds them → FAISS matches to catalog → NLP classifies the vibe. End-to-end CV + NLP pipeline | >75% match confidence threshold · full JSON output |
| 📰 | The Vector Daily | Automated AI newsletter that scrapes arXiv, Medium, HuggingFace → LLM digest → HTML → sent to your inbox. Runs every day without me touching it. | 100+ articles/day · live in production |
| 🛡️ | SecuFlow | SSH intrusion detection + Groq LLM threat analysis + Telegram bot for human approval + automatic UFW blocking. Human-in-the-loop security. | Real-time blocking · explainable AI threat reports |
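The three-layer memory design from the table can be sketched without any framework. This is a minimal stand-in, assuming a keyword lookup in place of FAISS and a counter in place of LLM summarization; none of these names come from the actual project:

```python
from collections import deque

class Memory:
    """Toy three-layer memory: a short-term window, a long-term store,
    and a running summary that stands in for LLM compression."""

    def __init__(self, window=3):
        self.short_term = deque(maxlen=window)  # recent turns, verbatim
        self.long_term = []                     # stand-in for a FAISS index
        self.summary = ""                       # stand-in for LLM summarization

    def add(self, turn):
        if len(self.short_term) == self.short_term.maxlen:
            evicted = self.short_term[0]        # oldest turn falls out of the window...
            self.long_term.append(evicted)      # ...and is archived for later retrieval
            self.summary = f"{len(self.long_term)} earlier turn(s) archived"
        self.short_term.append(turn)

    def recall(self, keyword):
        # naive keyword match standing in for a vector-similarity lookup
        return [t for t in self.long_term if keyword in t]

mem = Memory(window=3)
for turn in ["hi", "I use Arch", "what is LoRA?", "explain RAG"]:
    mem.add(turn)
# the window holds the last 3 turns; "hi" has been evicted to long-term storage
```

The design lesson the project is after: "memory" is just an eviction policy plus a retrieval policy, and deciding both explicitly beats calling a library blind.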
I've been writing about this too — because writing forces me to find the gaps in my own understanding:
📝 Building an LLM From Scratch. Here's Where It All Starts. — Towards AI, May 2026
Tokenization is not splitting words. BPE was invented for file compression in 1994. I didn't know that until I built it myself.
📝 My RAG App Was Confidently Wrong — That's When I Found CRAG — Mar 2026
Vector databases always return neighbors. Whether those neighbors are useful is a completely different question.
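That point fits in a dozen lines: nearest-neighbor search below always returns *something*, and only an explicit relevance floor (the 0.5 is arbitrary) stops an off-topic match from becoming an answer. Vectors and corpus are made up for illustration:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

corpus = {
    "NIFTY closed higher on banking gains": [0.9, 0.1, 0.0],
    "RBI held the repo rate steady":        [0.8, 0.3, 0.1],
}

# An off-topic query still gets a "nearest" neighbor back...
query = [0.0, 0.1, 0.9]
best_doc, best_score = max(
    ((doc, cosine(query, vec)) for doc, vec in corpus.items()),
    key=lambda pair: pair[1],
)

# ...so a CRAG-style gate refuses to answer below a similarity floor
answer = best_doc if best_score >= 0.5 else None
```

A vanilla RAG pipeline would stuff `best_doc` into the prompt anyway and answer confidently; the gate is the difference.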
