shinezyy/README.md

Hi there 👋

I am now bridging the gap between LLM inference systems and their "Speed-of-Light", much as I once bridged the performance gap between CPU simulators and real silicon.

  • I am a deep learning performance architect at NVIDIA, working on LLM inference system analysis, benchmarking, and modeling.
    • I am systematically studying the performance of emerging models (such as sparse-attention models and video generation models) on cutting-edge hardware.
    • I participated in InferenceX V2 benchmarking and delivered the recipes for NVIDIA B200/B300 disaggregated serving (15,800+ tokens/sec per GPU on DeepSeek R1 NVFP4).
  • Before joining NVIDIA, I was a CPU performance architect at the Beijing Institute of Open Source Chip (BOSC).

Pinned

  1. OpenXiangShan/GEM5 (C++ · 129 stars · 71 forks)

  2. micro-arch-training: How to make undergraduates or new graduates ready for advanced computer architecture research or modern CPU design (626 stars · 46 forks)

  3. OpenXiangShan/NEMU: Super fast RISC-V ISA emulator for XiangShan processor (C · 314 stars · 124 forks)