diff --git a/docs/en/_static/custom.css b/docs/en/_static/custom.css
new file mode 100644
index 000000000..4ff3e6e61
--- /dev/null
+++ b/docs/en/_static/custom.css
@@ -0,0 +1,39 @@
+/*h1 {*/
+/*  color: #003B71 !important;*/
+/*}*/
+/**/
+/*.admonition.note > .admonition-title::after {*/
+/*  content: "" !important;*/
+/*  -webkit-mask: none !important;*/
+/*  mask: none !important;*/
+/*  background: url("image/note.png") no-repeat center / contain !important;*/
+/*  width: 2em; height: 2em;*/
+/*  margin-left: -0.5rem;  /* adjust as needed */*/
+/*  margin-top: -0.15rem;  /* adjust as needed */*/
+/*}*/
+/**/
+/*html[data-theme="light"], html[data-mode="light"] { --pst-icon-admonition-note: url("icons/note-light.svg"); } */
+/*html[data-theme="dark"], html[data-mode="dark"] { --pst-icon-admonition-note: url("icons/note-dark.svg"); } */
+
+
+/*.admonition.note > .admonition-title {*/
+/*  background-color: #dce7fc !important;*/
+/*}*/
+
+/* docs/en/_static/custom.css */
+h1 {
+  color: #003B71 !important;
+}
+
+/* Icon URLs for the light and dark themes */
+html[data-theme="light"], html[data-mode="light"] { --note-icon-url: url("image/note-light.png"); }
+html[data-theme="dark"], html[data-mode="dark"] { --note-icon-url: url("image/note-dark.png"); }
+
+/* Override the icon rendering of note admonitions */
+.admonition.note > .admonition-title::after {
+  content: "" !important;
+  width: 2em; height: 2em;
+  margin-left: -0.5rem;
+  margin-top: -0.15rem;
+  background: var(--note-icon-url) no-repeat center / contain !important;
+}
diff --git a/docs/en/_static/image/logo.png b/docs/en/_static/image/logo.png
new file mode 100644
index 000000000..0d6b754c9
Binary files /dev/null and b/docs/en/_static/image/logo.png differ
diff --git a/docs/en/_static/image/note-dark.png b/docs/en/_static/image/note-dark.png
new file mode 100755
index 000000000..8e624f530
Binary files /dev/null and b/docs/en/_static/image/note-dark.png differ
diff --git a/docs/en/_static/image/note-light.png b/docs/en/_static/image/note-light.png
new file mode 100755
index 000000000..c24d218c8
Binary files /dev/null and b/docs/en/_static/image/note-light.png differ
diff --git a/docs/en/get_started/index.rst b/docs/en/get_started/index.rst
index 9a2d277fb..b2f8a8f97 100644
--- a/docs/en/get_started/index.rst
+++ b/docs/en/get_started/index.rst
@@ -2,8 +2,7 @@
    :maxdepth: 1
    :caption: Getting Started
 
-   xtuner_v1.md
    installation.md
    sft.md
    mllm_sft.md
-   grpo.md
\ No newline at end of file
+   grpo.md
diff --git a/docs/en/index.rst b/docs/en/index.rst
index 89e45f3fa..8e2b9c16b 100644
--- a/docs/en/index.rst
+++ b/docs/en/index.rst
@@ -64,13 +64,6 @@ Welcome to XTuner V1 English Documentation
 
    benchmark/index.rst
 
-.. toctree::
-   :hidden:
-   :maxdepth: 1
-   :caption: Legacy Documentation
-
-   legacy_index.rst
-
 .. toctree::
    :hidden:
    :maxdepth: 2
@@ -83,7 +76,7 @@ Welcome to XTuner V1 English Documentation
 
    Loss Context
 
-XTuner V1 is a new generation large model training engine specifically designed for ultra-large-scale MoE models. Compared with traditional 3D parallel training architectures, XTuner V1 has been deeply optimized for the current mainstream MoE training scenarios in academia.
+XTuner V1 is a next-generation LLM training engine specifically designed for ultra-large-scale MoE models. Unlike traditional 3D parallel training architectures, XTuner V1 is optimized for the mainstream MoE training scenarios prevalent in today's academic research.
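Note on wiring up the new static assets: the hunks above add `_static/custom.css`, a logo, and light/dark `note-*.png` icons for the English docs, and the active CSS switches `--note-icon-url` on the theme's `data-theme`/`data-mode` attribute so each color mode gets its own icon. The Sphinx `conf.py` is not part of this diff, so the snippet below is only a sketch of how such assets are typically registered; the option names are standard Sphinx settings, but the exact values are assumptions rather than code taken from the repository.

```python
# docs/en/conf.py (illustrative sketch only; the real conf.py is not shown in this diff)

html_static_path = ["_static"]         # exposes _static/custom.css and _static/image/*
html_css_files = ["custom.css"]        # loads the overrides added in the hunk above
html_logo = "_static/image/logo.png"   # logo image added in the hunk above
```

Because the rules in `custom.css` are marked `!important`, they take precedence over the theme's defaults once the stylesheet is loaded.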
🚀 Speed Benchmark ================================== @@ -92,24 +85,24 @@ XTuner V1 is a new generation large model training engine specifically designed :align: center :width: 90% -Core Features +Key Features ============= **📊 Dropless Training** -- **Flexible Scaling, No Complex Configuration:** 200B scale MoE without expert parallelism; 600B MoE only requires intra-node expert parallelism -- **Optimized Parallel Strategy:** Compared with traditional 3D parallel solutions, smaller expert parallel dimensions enable more efficient Dropless training +- **Scalable without complexity:** Train 200B-scale MoE models without expert parallelism; 600B models require only intra-node expert parallelism +- **Optimized parallelism strategy:** Smaller expert parallelism dimension compared to traditional 3D approaches, enabling more efficient Dropless training **📝 Long Sequence Support** -- **Memory Efficient Design:** Through advanced memory optimization technology combinations, 200B MoE models can train 64k sequence length without sequence parallelism -- **Flexible Extension Capability:** Full support for DeepSpeed Ulysses sequence parallelism, maximum sequence length can be linearly extended -- **Stable and Reliable:** Insensitive to expert load imbalance during long sequence training, maintaining stable performance +- **Memory-efficient design:** Train 200B MoE models on 64k sequence lengths without sequence parallelism through advanced memory optimization techniques +- **Flexible scaling:** Full support for DeepSpeed Ulysses sequence parallelism with linearly scalable maximum sequence length +- **Robust performance:** Maintains stability despite expert load imbalance during long sequence training -**⚡ Excellent Efficiency** +**⚡ Superior Efficiency** -- **Ultra-Large Scale Support:** Supports MoE model training up to 1T parameters -- **Breakthrough Performance Bottleneck:** First time achieving FSDP training throughput surpassing traditional 3D parallel solutions on MoE models above 200B scale -- **Hardware Optimization:** Training efficiency surpasses NVIDIA H800 on Ascend A3 NPU supernodes +- **Massive scale:** Supports MoE training up to 1T parameters +- **Breakthrough performance:** First to achieve FSDP training throughput that surpasses traditional 3D parallel schemes for MoE models above 200B scale +- **Hardware optimization:** Achieves training efficiency on Ascend A3 Supernode that exceeds NVIDIA H800 .. figure:: ../assets/images/benchmark/structure.png @@ -121,12 +114,12 @@ Core Features 🔥 Roadmap ========== -XTuner V1 is committed to continuously improving the pretraining, instruction fine-tuning, and reinforcement learning training efficiency of ultra-large-scale MoE models, with a focus on optimizing Ascend NPU support. +XTuner V1 is committed to continuously improving training efficiency for pre-training, instruction fine-tuning, and reinforcement learning of ultra-large MoE models, with special focus on Ascend NPU optimization. 🚀 Training Engine ------------ +-------------------- -Our vision is to build XTuner V1 into a universal training backend that seamlessly integrates into a broader open-source ecosystem. +Our vision is to establish XTuner V1 as a versatile training backend that seamlessly integrates with the broader open-source ecosystem. 
+------------+-----------+----------+-----------+ | Model | GPU(FP8) | GPU(BF16)| NPU(BF16) | @@ -147,14 +140,14 @@ Our vision is to build XTuner V1 into a universal training backend that seamless +------------+-----------+----------+-----------+ -🧠 Algorithm Suite ------------ +🧠 Algorithm +-------------- -Algorithm components are rapidly iterating. Community contributions are welcome - use XTuner V1 to scale your algorithms to unprecedented scales! +The algorithm component is actively evolving. We welcome community contributions - with XTuner V1, scale your algorithms to unprecedented sizes! **Implemented** -- ✅ **Multimodal Pretraining** - Full support for vision-language model training +- ✅ **Multimodal Pre-training** - Full support for vision-language model training - ✅ **Multimodal Supervised Fine-tuning** - Optimized for instruction following - ✅ `GRPO `_ - Group Relative Policy Optimization @@ -166,9 +159,9 @@ Algorithm components are rapidly iterating. Community contributions are welcome ⚡ Inference Engine Integration ---------------- +----------------------------------- -Seamless integration with mainstream inference frameworks +Seamless deployment with leading inference frameworks: * |checked| LMDeploy * |unchecked| vLLM @@ -176,34 +169,34 @@ Seamless integration with mainstream inference frameworks -🤝 Contribution Guidelines ------------ +🤝 Contributing +----------------- -We thank all contributors for their efforts to improve and enhance XTuner. Please refer to the `Contribution Guidelines <.github/CONTRIBUTING.md>`_ to understand the relevant guidelines for participating in the project. +We appreciate all contributions to XTuner. Please refer to `Contributing Guide <.github/CONTRIBUTING.md>`_ for the contributing guideline. 🙏 Acknowledgments ------------ +------------------- -The development of XTuner V1 is deeply inspired and supported by excellent projects in the open-source community. We extend our sincere gratitude to the following pioneering projects: +The development of XTuner V1's training engine has been greatly inspired by and built upon the excellent work of the open-source community. We extend our sincere gratitude to the following pioneering projects: -**Training Engines:** +**Training Engine:** -- [Torchtitan](https://github.com/pytorch/torchtitan) - PyTorch native distributed training framework -- [Deepspeed](https://github.com/deepspeedai/DeepSpeed) - Microsoft deep learning optimization library -- [MindSpeed](https://gitee.com/ascend/MindSpeed) - Ascend high-performance training acceleration library -- [Megatron](https://github.com/NVIDIA/Megatron-LM) - NVIDIA large-scale Transformer training framework +- [Torchtitan](https://github.com/pytorch/torchtitan) - A PyTorch native platform for training generative AI models +- [Deepspeed](https://github.com/deepspeedai/DeepSpeed) - Microsoft's deep learning optimization library +- [MindSpeed](https://gitee.com/ascend/MindSpeed) - Ascend's high-performance training acceleration library +- [Megatron](https://github.com/NVIDIA/Megatron-LM) - NVIDIA's large-scale Transformer training framework. 
**Reinforcement Learning:** -XTuner V1's reinforcement learning capabilities draw on the excellent practices and experience of the following projects +XTuner V1's reinforcement learning capabilities have been enhanced through insights and best practices from: - [veRL](https://github.com/volcengine/verl) - Volcano Engine Reinforcement Learning for LLMs - [SLIME](https://github.com/THUDM/slime) - THU's scalable RLHF implementation - [AReal](https://github.com/inclusionAI/AReaL) - Ant Reasoning Reinforcement Learning for LLMs - [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF) - An Easy-to-use, Scalable and High-performance RLHF Framework based on Ray -We sincerely thank all contributors and maintainers of these projects for their continuous advancement of the large-scale model training field. +We are deeply grateful to all contributors and maintainers of these projects for advancing the field of large-scale model training. 🖊️ Citation @@ -219,6 +212,6 @@ We sincerely thank all contributors and maintainers of these projects for their } Open Source License -========== +====================== -This project adopts the `Apache License 2.0 Open Source License `_. At the same time, please comply with the licenses of the models and datasets used. \ No newline at end of file +The project is released under the `Apache License 2.0 `_. Please also adhere to the Licenses of models and datasets being used. diff --git a/docs/en/legacy_index.rst b/docs/en/legacy_index.rst deleted file mode 100644 index f6fd15bf2..000000000 --- a/docs/en/legacy_index.rst +++ /dev/null @@ -1,97 +0,0 @@ -.. xtuner documentation master file, created by - sphinx-quickstart on Tue Jan 9 16:33:06 2024. - You can adapt this file completely to your liking, but it should at least - contain the root `toctree` directive. - -Welcome to XTuner Chinese Documentation -================================== - -.. figure:: ./_static/image/logo.png - :align: center - :alt: xtuner - :class: no-scaled-link - -.. raw:: html - -

-   LLM One-Stop Toolbox
-
-   Star | Watch | Fork

-
-
-Documentation
--------------
-.. toctree::
-   :maxdepth: 2
-   :caption: Getting Started
-
-   legacy/get_started/installation.rst
-   legacy/get_started/quickstart.rst
-
-.. toctree::
-   :maxdepth: 2
-   :caption: Preparation
-
-   legacy/preparation/pretrained_model.rst
-   legacy/preparation/prompt_template.rst
-
-.. toctree::
-   :maxdepth: 2
-   :caption: Training
-
-   legacy/training/open_source_dataset.rst
-   legacy/training/custom_sft_dataset.rst
-   legacy/training/custom_pretrain_dataset.rst
-   legacy/training/multi_modal_dataset.rst
-   legacy/acceleration/train_large_scale_dataset.rst
-   legacy/training/modify_settings.rst
-   legacy/training/visualization.rst
-
-.. toctree::
-   :maxdepth: 2
-   :caption: DPO
-
-   legacy/dpo/overview.md
-   legacy/dpo/quick_start.md
-   legacy/dpo/modify_settings.md
-
-.. toctree::
-   :maxdepth: 2
-   :caption: Reward Model
-
-   legacy/reward_model/overview.md
-   legacy/reward_model/quick_start.md
-   legacy/reward_model/modify_settings.md
-   legacy/reward_model/preference_data.md
-
-.. toctree::
-   :maxdepth: 2
-   :caption: Accelerated Training
-
-   legacy/acceleration/deepspeed.rst
-   legacy/acceleration/flash_attn.rst
-   legacy/acceleration/varlen_flash_attn.rst
-   legacy/acceleration/pack_to_max_length.rst
-   legacy/acceleration/length_grouped_sampler.rst
-   legacy/acceleration/train_extreme_long_sequence.rst
-   legacy/acceleration/hyper_parameters.rst
-   legacy/acceleration/benchmark.rst
-
-
-.. toctree::
-   :maxdepth: 1
-   :caption: InternEvo Migration
-
-   legacy/internevo_migration/differences.rst
-   legacy/internevo_migration/ftdp_dataset/tokenized_and_internlm2.rst
-   legacy/internevo_migration/ftdp_dataset/processed_and_internlm2.rst
-   legacy/internevo_migration/ftdp_dataset/processed_and_others.rst
-   legacy/internevo_migration/ftdp_dataset/processed_normal_chat.rst
\ No newline at end of file
diff --git a/docs/en/rl/tutorial/rl_grpo_trainer.md b/docs/en/rl/tutorial/rl_grpo_trainer.md
index 95f0efd5a..a2ad02864 100644
--- a/docs/en/rl/tutorial/rl_grpo_trainer.md
+++ b/docs/en/rl/tutorial/rl_grpo_trainer.md
@@ -200,7 +200,7 @@ evaluator_cfg = EvaluatorConfig(
 In addition to the above generation and training configurations, we need to configure system required resources (such as GPU, CPU, memory), etc. Here we use the default resource configuration, example as follows.
 
 ```{code-block} python
-from xtuner.v1.ray.accelerator import AcceleratorResourcesConfig
+from xtuner.v1.ray.base import AcceleratorResourcesConfig
 resources = AcceleratorResourcesConfig(
     accelerator="GPU",
     num_accelerators_per_worker=1,
@@ -263,4 +263,4 @@ Combine and save all the above configurations as a Python file (e.g., `train_grp
 XTUNER_USE_FA3=1 XTUNER_USE_LMDEPLOY=1 python train_grpo.py
 ```
 
-Congratulations! Now you have mastered the method of customizing `RLTrainer` through Python code, and can conduct reinforcement learning experiments more flexibly.
\ No newline at end of file
+Congratulations! Now you have mastered the method of customizing `RLTrainer` through Python code, and can conduct reinforcement learning experiments more flexibly.
diff --git a/docs/zh_cn/get_started/index.rst b/docs/zh_cn/get_started/index.rst
index d738c9777..4e8b4a495 100644
--- a/docs/zh_cn/get_started/index.rst
+++ b/docs/zh_cn/get_started/index.rst
@@ -2,7 +2,6 @@
    :maxdepth: 1
    :caption: 开始使用
 
-   xtuner_v1.md
    installation.md
    sft.md
    mllm_sft.md
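A note on the import-path change in the GRPO tutorial hunk above: `AcceleratorResourcesConfig` now comes from `xtuner.v1.ray.base` instead of `xtuner.v1.ray.accelerator`. For user scripts written against the older tutorial, a guarded import keeps both layouts working. This is only a sketch: it assumes the old module path still exists in earlier releases, which this diff does not confirm.

```python
# Compatibility sketch (not part of this diff): prefer the new import path
# documented above and fall back to the pre-change path on older releases.
try:
    from xtuner.v1.ray.base import AcceleratorResourcesConfig
except ImportError:
    from xtuner.v1.ray.accelerator import AcceleratorResourcesConfig  # pre-change path

# The tutorial's resource block is then unchanged, e.g.:
resources = AcceleratorResourcesConfig(
    accelerator="GPU",
    num_accelerators_per_worker=1,
    # ... remaining fields as in the tutorial's resource example (truncated in the hunk above)
)
```

As the surrounding tutorial context shows, the resulting script is launched with `XTUNER_USE_FA3=1 XTUNER_USE_LMDEPLOY=1 python train_grpo.py`.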