I am now bridging the gap between LLM inference systems and their "Speed-of-Light", much as I once bridged the performance gap between CPU simulators and real silicon.
- I am a deep learning performance architect at NVIDIA, working on LLM inference system analysis, benchmarking, and modeling.
- I systematically study the performance of emerging models (such as sparse-attention models and video generation models) on cutting-edge hardware.
- I participated in InferenceX V2 benchmarking, delivering the recipes for NVIDIA B200/B300 disaggregated serving (15,800+ tokens/sec per GPU on DeepSeek R1 with NVFP4).
- Before joining NVIDIA, I was a CPU performance architect at the Beijing Institute of Open Source Chip (BOSC).
- I helped the XiangShan CPU achieve a SPECint2006 score of 15/GHz.
- I built a gem5-based performance simulator for the XiangShan CPU.
