Senior Cloud Platform Engineer building GPU/AI infrastructure at scale.
CNCF Golden Kubestronaut. Oracle ACE Associate. Dragonfly Community Member.
31+ PRs across 17 open-source projects in CNCF, ASWF, and beyond.
If GPUs need scheduling, scaling, or observability on Kubernetes — that's what I build.
| Focus | What I build |
|---|---|
| 🎮 GPU Autoscaling | KEDA External Scaler with native NVML metrics, DaemonSet architecture, scaling profiles for vLLM, Triton, and training workloads. Referenced in KEDA #7538 and published on CNCF Blog. |
| 🔬 GPU NUMA Topology | Volcano scheduler plugin for NUMA-aware GPU placement — topology discovery via sysfs, CRD extensions, and cross-socket affinity optimization. |
| 📡 GPU Observability | OpenTelemetry Collector receiver for GPU metrics (NVML-native) and Docker Desktop Extension for real-time GPU monitoring dashboards. |
| 🧠 Topology-Aware AIOps | Knowledge graph of Kubernetes resources with graph-based root-cause traversal, AlertManager webhook integration, and blast-radius analysis. |
| ☁️ Platform Engineering | Kubernetes, ArgoCD, Crossplane, Docker, KEDA — production platforms serving enterprise workloads at scale. |
| 📝 Technical Writing | 20 published articles across CNCF Blog, IEEE ComSoc, Platform Engineering, VKTR, Cloud Native Now, and Medium. |
Golden Kubestronaut, including all five Kubernetes certifications: KCNA, CKA, CKAD, CKS, KCSA
| Project | Tech | References |
|---|---|---|
| KEDA External gRPC Scaler for GPU/AI workloads | Go · gRPC · NVIDIA NVML · Kubernetes · Helm | KEDA #7538, CNCF Blog |
| OpenTelemetry Collector receiver for GPU metrics | Go · OpenTelemetry Collector SDK · NVML | |
| Real-time NVIDIA GPU metrics in Docker Desktop | Go · React · Recharts · Docker Extension SDK · NVML | |
| K8s knowledge graph & automated root-cause analysis | Go · Kubernetes API · Gorilla Mux · Helm | |

More projects: KubeAI Autoscaler · Ingress2Gateway · Golden Kubestronaut Learning · LLMOps
31+ PRs across 17 projects in CNCF, ASWF, and other open-source foundations.
| Project | Description | Contributions |
|---|---|---|
| Dragonfly | P2P-based file distribution and image acceleration | client#1861 - Fix error chain propagation in backend stream failures, client#1665 - Add Hugging Face backend support with hf:// protocol, client#1673 - Add ModelScope backend support with modelscope:// protocol, d7y.io#386 - Add hf:// protocol documentation, d7y.io#398 - Add P2P-accelerated AI model downloads blog post, helm-charts#455 - Add injector support to helm chart, helm-charts#480 - Replace deprecated bitnamilegacy/mysql with bitnami/mysql |
| Kubernetes | Production-Grade Container Orchestration | #53891 - Document deployment.kubernetes.io/* annotations, #53892 - Add kubectl apply view-last-applied documentation |
| TiKV | Distributed transactional key-value database | #19225 - Add AGENTS.md for AI agent guidance |
| Volcano | Cloud-native batch scheduling for AI/HPC | #5328 - Fix typos in scheduler comments, #5095 - GPU NUMA topology awareness in scheduler, apis#229 - Add GPUInfo type to NumatopoSpec CRD, resource-exporter#12 - GPU NUMA topology discovery via sysfs |
| HAMi | Heterogeneous AI Computing Virtualization Middleware | #1893 - Add unit tests for nvinternal info, mig, and watch packages |
| KEDA | Kubernetes Event-driven Autoscaling | keda-docs#1658 - Removing metricName from the kedadocs, keda-docs#1769 - Fix datadog scaler typos across all versions, #7538 - GPU/AI inference scaler architectural analysis |
| Metal³ | Bare metal host provisioning for Kubernetes | #624 - Fix redirect links in tryit.md |
| OpenTelemetry | Observability framework | #8632 - Add .NET troubleshooting page |
| kpt | Kubernetes-native packaging and resource management | #4278 - Fix kpt fn doc command for KRM functions expecting input |
| traceAI | Open-source LLM observability SDK | #165 - Fix exporter shutdown and thread safety in Python SDK, #166 - Add Go SDK with OpenAI instrumentor |
| OpenColorIO | Color management library | #2229 - Add release signing workflow, #2230 - Add Dependabot configuration, #2243 - Add Vulkan unit test framework |
| OpenCue | Cloud rendering management system | #2134 - Add scheduled subscription recalculation task |
| OpenImageIO | Image processing library | #4976 - Fix IBA::compare_Yee() channel access |
| RAWtoACES | RAW to ACES image conversion | #222 - Add build developer documentation |
| xSTUDIO | Playback and review application | #186 - Fix broken build guide links |
20 articles published across CNCF Blog, IEEE ComSoc, Platform Engineering, VKTR, Cloud Native Now, and Medium.
Quoted as a subject-matter expert across 11+ publications on enterprise AI, GPU infrastructure, cloud security, and platform engineering.
| Publication | Article | Quote / Mention |
|---|---|---|
| AI Business (Informa PLC) | OpenAI vs. Anthropic vs. Google: But the Model Isn't the Point | "The real dependency risk comes from the orchestration, workflow and data integration layers built around them... Relying on third-party orchestration is where real lock-ins happen." |
| VKTR (Simpler Media Group) | Enterprise AI Costs Climb as GPU Demand Outpaces Supply | "The architecture that works is a routing layer: simple tasks go to a lightweight SLM, complex reasoning escalates to the frontier model. You stop paying frontier prices for envelope-delivery workloads." |
| Techopedia (10M+ monthly visitors) | AI Experts Call for a Reality Check on Allbirds' Pivot | "GPU capacity is genuinely hard to get right now... You can't buy that institutional knowledge with a convertible note and a rebrand." |
| Reworked (Simpler Media Group) | AI Agents and the Process Documentation Fallacy | "If an AI agent is trained purely by observing the official workflow in the ticketing platform, it's learning a fantasy... You have to fence the AI in." |
| InfoSec Relations | Agentic AI is Exposing the Accountability Gap in Cloud Security Governance | "We enforce this with Policy-as-Code at the admission layer, so the agent's available responses are constrained by the infrastructure itself, not by a governance doc that someone wrote once and nobody checks." |
| Tech Round (UK) | Meta Acquires Moltbook: What Responsibility Do Meta And Regulators Have? | "We are building autonomous agents without implementing Zero Trust security... Regulators must urgently pivot to regulating Agentic Privileges." |
| TLDR Newsletter (3M+ subscribers) | Featured Mention | CNCF GPU autoscaling blog featured to 3M+ subscribers |
| Habr (VKTech / VK Group) | GPU Auto-Scaling on Kubernetes with KEDA | Russian-language adaptation of CNCF blog — 4,500+ views in 13 hours |
| Cloud Native Now (Techstrong Group) | Stop Wasting GPU Budget: Autoscaling AI Inference on Kubernetes with KEDA | Primary author — GPU autoscaling architecture and scale-to-zero for AI inference |
| Y Square Technology | AI Agent Documentation Reality Gap | Quoted on enterprise AI agent deployment challenges |

| Channel | Engagement |
|---|---|
| CNCF LinkedIn (500K+ followers) | 204+ likes · 28 reposts · 3 comments |
| CNCF Twitter/X (@CloudNativeFdn) | 2,122 views · 24 likes · 7 bookmarks |
| CNCF Bluesky (cncf.io) | Featured across all 3 CNCF social platforms |
| CNCF LinkedIn (500K+ followers) | 26+ likes · 1 repost |

Stats updated on 2026-06-11 15:25 UTC
Building GPU infrastructure for Kubernetes? Working on CNCF projects? Let's collaborate.





