- Lahore, Pakistan
- llcuda.github.io
- in/mohammad-waqas-3a1384270
- @waqasm86
Pinned
- llcuda/llcuda (Public)
  CUDA 12-first inference backend for Unsloth on Kaggle, optimized for small GGUF models (1B-5B) on dual Tesla T4 GPUs (15 GB each, SM 7.5).
  Jupyter Notebook · 8
- Ubuntu-Cuda-Llama.cpp-Executable (Public)
  Pre-built llama.cpp CUDA binary for Ubuntu 22.04. No compilation required: download, extract, and run. Works with the llcuda Python package for JupyterLab integration. Tested on GPUs from the GeForce 940M to the RTX 4090.
  Python · 1
- cuda-nvidia-systems-engg (Public)
  Production-grade C++20/CUDA distributed LLM inference system with TCP networking, MPI scheduling, and content-addressed storage. Features comprehensive benchmarking (p50/p95/p99 latencies), epoll a…
  C++
- llcuda/llcuda.github.io (Public)
  GitHub Pages website for the llcuda Python SDK project.
  Python
- llamatelemetry/llamatelemetry (Public)
  CUDA-first OpenTelemetry Python SDK for LLM inference observability and explainability.
  Jupyter Notebook