[codex] Add bitlift sidecar runtime and eval tools #212

Draft
Chedrian07 wants to merge 1 commit into antirez:main from Chedrian07:codex/ko-expert-usage-trace

@Chedrian07

Summary

  • add bitlift sidecar loading/binding for routed expert Q4 tensors
  • add Metal-side sidecar routed dispatch buffers and route tracing support
  • add GGUF/source conversion tools for base GGUF, HF FP4, and HF Base-FP8 source sidecars
  • add local Korean/KMMLU/Think MAX/long-instruction evaluation tools and reports
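Route tracing, as listed above, records which routed experts tokens are dispatched to. The sidecar names used under Validation (Top64, Top128, Full256) suggest that experts may be ranked by usage, although this description does not state how the selection is made. The following is a minimal sketch of one way such a ranking could be computed from a trace. The trace format (one list of expert ids per token) and the function name `top_k_experts` are hypothetical and are not taken from the ds4 code.

```python
from collections import Counter


def top_k_experts(trace, k):
    """Return the k most frequently routed expert ids.

    `trace` is assumed to be an iterable with one entry per token, each entry
    being the list of expert ids that token was routed to. This layout is an
    illustration only, not the trace format produced by ds4.
    """
    counts = Counter()
    for token_experts in trace:
        counts.update(token_experts)
    # Sort by descending usage, then ascending id so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [expert_id for expert_id, _ in ranked[:k]]


# Tiny made-up trace: three tokens, two routed experts each.
example_trace = [[3, 7], [3, 1], [7, 3]]
selected = top_k_experts(example_trace, 2)
```

With a real trace, `k` would be 64 or 128 under this assumption, and a "full" sidecar would simply cover every routed expert.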

Validation

  • python3 -m py_compile tools/eval_ds4_project.py tools/eval_kmmlu_sample.py tools/bench_thinkmax_ds4.py tools/eval_long_instructions_v2.py tools/write_bitlift_sidecar_from_hf_fp8.py
  • make -B ds4
  • generated and loaded L10 Base-FP8 Q4 sidecars: Top64, Top128, Full256
  • ran inspect and Think MAX route traces for all three sidecars
  • ran structured Korean/control/exact-long evaluation
  • ran KMMLU 300 sample
  • ran Think MAX expanded30
  • ran long instruction v2
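The `py_compile` step above can also be scripted so that every tool is checked and all failures are reported together. This is a small sketch using the standard-library `py_compile` module; the helper name `check_sources` is hypothetical, and the file list is copied from the validation command above.

```python
import py_compile

# Tool scripts named in the validation step of this PR.
TOOLS = [
    "tools/eval_ds4_project.py",
    "tools/eval_kmmlu_sample.py",
    "tools/bench_thinkmax_ds4.py",
    "tools/eval_long_instructions_v2.py",
    "tools/write_bitlift_sidecar_from_hf_fp8.py",
]


def check_sources(paths):
    """Byte-compile each path and return a list of (path, error) failures."""
    failures = []
    for path in paths:
        try:
            # doraise=True turns syntax errors into PyCompileError.
            py_compile.compile(str(path), doraise=True)
        except (py_compile.PyCompileError, OSError) as exc:
            # OSError covers missing or unreadable files.
            failures.append((str(path), str(exc)))
    return failures


if __name__ == "__main__":
    for failed_path, message in check_sources(TOOLS):
        print(f"FAILED {failed_path}: {message}")
```

Run from the repository root, this prints one line per file that fails to compile and nothing when all files are clean.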

Result Notes

The Base-FP8 L10 source sidecars are stable at runtime, but they are not the best-quality option and are not recommended over the existing configurations. The current operating recommendation is unchanged: base or LateStable5Q4 for general nothink chat, and Layer10Q4 for Think MAX/KMMLU-style Korean experiments. The experimental sidecars and reports are recorded in a private HF artifact repository for reproducibility: https://huggingface.co/KCh3dRi4n/DeepSeek-V4-Flash-KR-L10-BaseFP8-Q4-Sidecar-GGUF
