- Lahore, Pakistan
- llcuda.github.io
- in/mohammad-waqas-3a1384270
- @waqasm86
Pinned
- llcuda/llcuda (Public)
  CUDA 12-first inference backend for Unsloth on Kaggle, optimized for small GGUF models (1B-5B) on dual Tesla T4 GPUs (15 GB each, SM 7.5).
  Jupyter Notebook · 8
- Ubuntu-Cuda-Llama.cpp-Executable (Public)
  Pre-built llama.cpp CUDA binary for Ubuntu 22.04. No compilation required: download, extract, and run. Works with the llcuda Python package for JupyterLab integration. Tested on GPUs from the GeForce 940M to the RTX 4090.
  Python · 1
- cuda-nvidia-systems-engg (Public)
  Production-grade C++20/CUDA distributed LLM inference system with TCP networking, MPI scheduling, and content-addressed storage. Features comprehensive benchmarking (p50/p95/p99 latencies), epoll a…
  C++
- llcuda/llcuda.github.io (Public)
  GitHub Pages website for the llcuda Python SDK project.
  Python
- llamatelemetry/llamatelemetry (Public)
  CUDA-first OpenTelemetry Python SDK for LLM inference observability and explainability.
  Jupyter Notebook