I am now bridging the gap between LLM inference systems and their "Speed-of-Light", much as I once bridged the performance gap between CPU simulators and real silicon.
- I am a deep learning performance architect at NVIDIA, working on LLM inference system analysis, benchmarking, and modeling.
- I systematically study the performance of emerging models (such as sparse-attention models and video generation models) on cutting-edge hardware.
- I participated in InferenceX V2 benchmarking, delivering the recipes for NVIDIA B200/B300 disaggregated serving (15,800+ tokens/sec per GPU on DeepSeek R1 with NVFP4).
- Before joining NVIDIA, I was a CPU performance architect at the Beijing Institute of Open Source Chip (BOSC).
- I helped the XiangShan CPU achieve a SPECint2006 score of 15/GHz.
- I built a gem5-based performance simulator for the XiangShan CPU.
