"Bridging the gap between theoretical Deep Learning and High-Performance Computing."

ML Researcher @ Yale University | Focus: Cryptographic Deep Learning & Compiler Theory

Quantitative Research Consultant @ WorldQuant (2022-2025) | Focus: Alpha Generation & Market Signals
| Core AI & Research | Speech & Audio | GenAI & NLP | MLOps & Deployment | Quant & Data | Languages |
Advanced Deep Learning architectures for Particle Physics challenges (ML4SCI).
Engineered novel models for high-dimensional calorimeter data (125x125x3 matrices):
- Classification: Designed a ResNet-15 (PyTorch) for photon identification and a hybrid VGG-12/custom CNN architecture for quark/gluon tagging.
- Real-Time Regression: Pioneered the use of Graph Neural Networks (GNNs) for the CMS Trigger System.
- Optimization: Conducted a trade-off analysis between GCNs (low latency) and GATs (high accuracy) for momentum estimation.
End-to-end Speech Grammar evaluation pipeline.
Built a high-performance audio analysis system:
- Transcription: Integrated OpenAI Whisper for robust speech-to-text conversion.
- Embeddings: Implemented DeBERTa-v3, BGE, and RoBERTa for deep semantic feature extraction.
- Scoring: Developed a CatBoost regressor with ensemble cross-validation to predict grammar scores with high correlation to human baselines.
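The scoring stage trains one model per cross-validation fold and averages their predictions. A minimal sketch of that ensemble idea, with a plain least-squares regressor standing in for CatBoost and the function name chosen here for illustration (not the project's actual code):

```python
import numpy as np

def kfold_ensemble_predict(X, y, X_new, k=5, seed=0):
    """Train one linear model per CV fold and average their predictions
    on new data -- a stand-in for the CatBoost fold ensemble."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    folds = np.array_split(idx, k)
    preds = []
    for fold in folds:
        train = np.setdiff1d(idx, fold)                # hold out one fold
        Xt = np.c_[np.ones(len(train)), X[train]]      # add a bias column
        w, *_ = np.linalg.lstsq(Xt, y[train], rcond=None)
        preds.append(np.c_[np.ones(len(X_new)), X_new] @ w)
    return np.mean(preds, axis=0)                      # ensemble average
```

In the real pipeline, `X` would be the DeBERTa-v3/BGE/RoBERTa embedding features and `y` the human grammar scores; averaging fold models reduces variance in the predicted score.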
Biomedical Q&A utilizing Retrieval-Augmented Generation.
- Combined PubMedBERT embeddings with a Qdrant vector store and BioMistral-7B to improve query relevance by 30% over standard keyword search.
Hindi ASR fine-tuning and evaluation.
- Built an end-to-end Python pipeline to fine-tune Whisper-small transformer models on conversational Hindi audio.
- Developed a custom Devanagari text normalization engine with phonetic reverse-transliteration and automated acoustic segmentation (librosa + VAD) to handle loanwords and speech disfluencies.
- Designed a word-level Confusion Network to evaluate accuracy against imperfect human transcripts using majority voting across five ASR model outputs.
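The word-level majority voting across the five ASR outputs can be sketched as follows. This is a simplified illustration that assumes the hypotheses are already aligned to equal length; a real confusion network would first align them (e.g., via edit distance):

```python
from collections import Counter

def majority_vote(hypotheses):
    """Pick, at each word slot, the word most ASR systems agree on.
    Assumes hypotheses are pre-aligned to the same length; a true
    confusion network would handle insertions/deletions via alignment."""
    slots = zip(*hypotheses)  # transpose: one tuple of candidate words per slot
    return [Counter(words).most_common(1)[0][0] for words in slots]

hyps = [
    "mera naam raj hai".split(),
    "mera nam raj hai".split(),
    "mera naam raj hai".split(),
    "tera naam raj hai".split(),
    "mera naam raaj hai".split(),
]
majority_vote(hyps)  # → ['mera', 'naam', 'raj', 'hai']
```

The voted sequence then serves as a consensus reference when the human transcript itself is imperfect.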

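The retrieval step of the biomedical RAG system reduces to nearest-neighbor search over embedding vectors. A minimal numpy sketch of cosine-similarity retrieval, standing in for the Qdrant vector store (in the real system the vectors come from PubMedBERT and the retrieved passages are fed to BioMistral-7B):

```python
import numpy as np

def top_k_passages(query_vec, passage_vecs, k=3):
    """Return indices of the k stored passage embeddings most similar
    to the query embedding, by cosine similarity (Qdrant stand-in)."""
    P = np.asarray(passage_vecs, dtype=float)
    q = np.asarray(query_vec, dtype=float)
    sims = (P @ q) / (np.linalg.norm(P, axis=1) * np.linalg.norm(q) + 1e-12)
    return np.argsort(-sims)[:k]  # best-scoring passages first
```

Unlike keyword search, this matches on semantic proximity in embedding space, which is where the reported relevance gain comes from.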


