
Commit 43ef6ab

Authored by pppppM, crazysteeaam, JimmyMa99, fanqiNO1, and Jianfeng777
[Docs] Readthedocs ZH (#553)
* [Docs] Readthedocs (#304)
* init readthedocs
* add en docs
* add zh docs
* fix lint
* [Fix] Support ZH Readthedocs (#305)
* add zh yaml
* test zh cn
* test yaml path
* pass
* update conf.py
* [Docs] Document optimization (#362) Document optimization
* [Docs] Update Docs docs/en/get_started/installation.md (#364)
* Update the Chinese installation.md: complete the Chinese "Installation - Installation Steps - Best Practices" & "Installation - Verify the Installation" sections
* Update installation.md en
* Update installation.md zh typo
* [Docs] Refine Quick Start (#378)
* [Docs] Add zh_cn quickstart
* [Fix] Fix color rendering logic for github
* [Fix] Fix comments
* [Fix] Add hyperlinks
* [Docs] Add en quickstart
* [Fix] Fix comments
* Update overview.md (#412)
* Update overview.md
* Update overview.md (revised as requested, please review)
* Update overview.md (further corrections)
* Update overview.md (refined as requested)
* Merge branch 'main' into 'docs' (#463)
* [Improve] Redesign the `prompt_template` (#294)
* update
* update cfgs
* update
* fix bugs
* upload docs
* rename
* update
* Revert "update cfgs" This reverts commit 93966aa.
* update cfgs
* update
* rename
* rename
* fix bc
* fix stop_word
* fix
* fix
* Update prompt_template.md
* [Fix] Fix errors about `stop_words` (#313)
* fix bugs
* Update mmbench.py
* [Fix] Fix Mixtral LoRA setting (#312) set target_modules
* [Feature] Support DeepSeek-MoE (#311)
* support deepseek moe
* update docs
* update
* update
* [Fix] Set `torch.optim.AdamW` as the default optimizer (#318) fix
* [Fix] Fix `pth_to_hf` for LLaVA model (#316) Update pth_to_hf.py
* [Improve] Add `demo_data` examples (#278)
* update examples
* add examples
* add json template config
* rename
* update
* update
* update
* [Feature] Support InternLM2 (#321)
* add cfgs
* add internlm2 template
* add dispatch
* add docs
* update readme
* update
* [Fix] Fix the resume of seed (#309)
* fix
* Update utils.py
* [Feature] Accelerate `xtuner xxx` (#307)
* accelerate cli
* Update entry_point.py
* Update entry_point.py
---------
Co-authored-by: Zhihao Lin <[email protected]>
* [Fix] Fix InternLM2 url (#325)
* fix
* update
* Update README.md
* Update README_zh-CN.md
* [Fix] Limit the version of python, `>=3.8, <3.11` (#327) update
* [Fix] Add `trust_remote_code=True` for AutoModel (#328) update
* [Docs] Improve README (#326)
* update
* Update README.md
* Update README.md
* Update README.md
* Update README_zh-CN.md
* update
* update
* fix pre-commit
* update
* bump version to v0.1.12 (#323) bump v0.1.12
* set dev version (#329) Update version.py
* [Docs] Add LLaVA-InternLM2 results (#332)
* update results
* update
* Update internlm2_chat template (#339) Update internlm2 template
* [Fix] Fix examples demo_data configs (#334) fix
* bump version to v0.1.13 (#340) update
* set dev version (#341) update
* [Feature] More flexible `TrainLoop` (#348)
* add new loop
* rename
* fix pre-commit
* add max_keep_ckpts
* fix
* update cfgs
* update examples
* fix
* update
* update llava
* update
* update
* update
* update
* [Feature] Support CEPH (#266)
* support petrelfs
* fix deepspeed save/load/resume
* add ENV to toggle petrelfs
* support hf save_pretrained
* patch deepspeed engine
* [Improve] Add `--repetition-penalty` for `xtuner chat` (#351) fix
* [Feature] Support MMBench DDP Evaluate (#300)
* support ddp mmbench evaluate
* Update xtuner/tools/mmbench.py
Co-authored-by: Zhihao Lin <[email protected]>
* Update xtuner/tools/mmbench.py
Co-authored-by: Zhihao Lin <[email protected]>
* update minimum version of mmengine
* Update runtime.txt
---------
Co-authored-by: Zhihao Lin <[email protected]>
* [Fix] `KeyError` of `encode_fn` (#361) fix
* [Fix] Fix `batch_size` of full fine-tuning LLaVA-InternLM2 (#360) fix
* [Fix] Remove `system` for `alpaca_map_fn` (#363) update
* [Fix] Use `DEFAULT_IMAGE_TOKEN` instead of `'<image>'` (#353) Update utils.py
* [Feature] Efficient SFT (#302)
* add local_attn_args_to_messagehub_hook
* add internlm repo sampler
* add internlm repo dataset and collate_fn
* dispatch internlm1 and internlm2 local attn
* add internlm2 config
* add internlm1 and internlm2 config
* add internlm2 template
* fix replace_internlm1_rote bugs
* add internlm1 and internlm2 config templates
* change priority of EvaluateChatHook
* fix docs
* fix config
* fix bug
* set rotary_base according to the latest internlm2 config
* add llama local attn
* add llama local attn
* update intern_repo_dataset docs when using aliyun
* support using both hf load_dataset and intern_repo packed_dataset
* add configs
* add opencompass doc
* update opencompass doc
* use T data order
* use T data order
* add config
* add a tool to get data order
* support offline processing untokenized dataset
* add docs
* add doc about only saving model weights
* add doc about only saving model weights
* dispatch mistral
* add mistral template
* add mistral template
* fix torch_dtype
* reset pre-commit-config
* fix config
* fix internlm_7b_full_intern_repo_dataset_template
* update local_attn to varlen_attn
* rename local_attn
* fix InternlmRepoSampler and train.py to support resume
* modify Packer to support varlen attn
* support varlen attn in default pipeline
* update mmengine version requirement to 0.10.3
* Update ceph.md
* delete intern_repo_collate_fn
* delete intern_repo_collate_fn
* delete useless files
* assert pack_to_max_length=True if use_varlen_attn=True
* add varlen attn doc
* add varlen attn to configs
* delete useless codes
* update
* update
* update configs
* fix priority of ThroughputHook and flake8 ignore W504
* using map_fn to set length attr to dataset
* support split=None in process_hf_dataset
* add dataset_format_mapping
* support preprocess ftdp and normal dataset
* refactor process_hf_dataset
* support pack dataset in process_untokenized_datasets
* add xtuner_dataset_timeout
* using gloo backend for monitored barrier
* set gloo timeout
* fix bugs
* fix configs
* refactor intern repo dataset docs
* fix doc
* fix lint
---------
Co-authored-by: pppppM <[email protected]>
Co-authored-by: pppppM <[email protected]>
* [Fix] Add `attention_mask` for `default_collate_fn` (#371) fix
* [Fix] Update requirements (#369) Update runtime.txt
* [Fix] Fix rotary_base, add `colors_map_fn` to `DATASET_FORMAT_MAPPING` and rename 'internlm_repo' to 'intern_repo' (#372)
* fix
* rename internlm_repo to intern_repo
* add InternlmRepoSampler for preventing bc break
* add how to install flash_attn to doc
* update (#377)
* Delete useless codes and refactor process_untokenized_datasets (#379)
* delete useless codes
* refactor process_untokenized_datasets: add ftdp to dataset-format
* fix lint
* [Feature] support flash attn 2 in internlm1, internlm2 and llama (#381) support flash attn 2 in internlm1, internlm2 and llama
* [Fix] Fix installation docs of mmengine in `intern_repo_dataset.md` (#384) update
* [Fix] Update InternLM2 `apply_rotary_pos_emb` (#383) update
* [Feature] support saving eval output before save checkpoint (#385)
* support saving eval output before save checkpoint
* refactor
* [Fix] lr scheduler setting (#394)
* fix lr scheduler setting
* fix more
---------
Co-authored-by: zilong.guo <[email protected]>
Co-authored-by: LZHgrla <[email protected]>
* [Fix] Remove pre-defined `system` of `alpaca_zh_map_fn` (#395) fix
* [Feature] Support `Qwen1.5` (#407)
* rename
* update docs
* update template
* update
* add cfgs
* update
* update
* [Fix] Fix no space in chat output using InternLM2. (#357) (#404)
* [Fix] Fix no space in chat output using InternLM2. (#357)
* Update chat.py
* Update utils.py
* Update utils.py
* fix pre-commit
---------
Co-authored-by: Zhihao Lin <[email protected]>
Co-authored-by: LZHgrla <[email protected]>
* [Fix] typo: `--system-prompt` to `--system-template` (#406) fix
* [Improve] Add `output_with_loss` for dataset process (#408) update
* [Fix] Fix dispatch to support transformers>=4.36 & Add USE_TRITON_KERNEL environment variable (#411)
* dispatch support transformers>=4.36
* add USE_TRITON_KERNEL environment variable
* raise RuntimeError use triton kernels on cpu
* fix lint
* [Feature] Add InternLM2-1_8b configs (#396)
* [Feature] Add InternLM2-Chat-1_8b full config
* [Feature] Add InternLM2-Chat-1_8b full config
* update
---------
Co-authored-by: LZHgrla <[email protected]>
Co-authored-by: Zhihao Lin <[email protected]>
* [Fix] Fix `extract_json_objects` (#419)
* [Fix] Fix pth_to_hf error (#426) fix
* [Feature] Support `Gemma` (#429)
* added gemma config and template
* check config and make sure the consistency
* Update xtuner/configs/gemma/gemma_2b_base/gemma_2b_base_qlora_alpaca_e3.py
Co-authored-by: Zhihao Lin <[email protected]>
* Update xtuner/configs/gemma/gemma_2b_base/gemma_2b_base_full_alpaca_e3.py
Co-authored-by: Zhihao Lin <[email protected]>
* Update xtuner/configs/gemma/gemma_7b_base/gemma_7b_base_full_alpaca_e3.py
Co-authored-by: Zhihao Lin <[email protected]>
* Update xtuner/configs/gemma/gemma_7b_base/gemma_7b_base_qlora_alpaca_e3.py
Co-authored-by: Zhihao Lin <[email protected]>
* Update xtuner/utils/templates.py
Co-authored-by: Zhihao Lin <[email protected]>
* update
* added required version
* update
* update
---------
Co-authored-by: Zhihao Lin <[email protected]>
Co-authored-by: LZHgrla <[email protected]>
* add refcoco to llava (#425)
* add base dataset
* update dataset generation
* update refcoco
* add convert refcoco
* add eval_refcoco
* add config
* update dataset
* fix bug
* fix bug
* update data prepare
* fix error
* refactor eval_refcoco
* fix bug
* fix error
* update readme
* add entry_point
* update config
* update config
* update entry point
* update
* update doc
* update
---------
Co-authored-by: jacky <[email protected]>
* [Fix] Inconsistent BatchSize of `LengthGroupedSampler` (#436) update
* bump version to v0.1.14 (#431) update
* set dev version (#437)
* Update version.py
* Update version.py
* [Bugs] Fix bugs when using EpochBasedRunner (#439) fix bugs when using epochbasedrunner
* [Feature] Support processing ftdp dataset and custom dataset offline (#410)
* support smart_tokenizer_and_embedding_resize
* replace ast with json.loads
* support list_dataset_format cli
* add doc about ftdp and custom dataset
* add custom dataset template
* add args name to process_hf_dataset
* use new process_untokenized_datasets
* support tokenize_ftdp_datasets
* add mistral_7b_w_tokenized_dataset config
* update doc
* update doc
* add comments
* fix data save path
* smart_tokenizer_and_embedding_resize support zero3
* fix lint
* add data format to internlm2_7b_full_finetune_custom_dataset_e1.py
* add a data format example to configs associated with finetuning custom dataset
* add a data format example to configs associated with finetuning custom dataset
* fix lint
* Update prompt_template.md (#441) (fixed a typo)
* [Doc] Split finetune_custom_dataset.md to 6 parts (#445)
* split finetune_custom_dataset.md to 6 parts
* refactor custom_dataset and ftdp_dataset related docs
* fix comments
* fix pre-commit
---------
Co-authored-by: pppppM <[email protected]>
Co-authored-by: RangiLyu <[email protected]>
Co-authored-by: whcao <[email protected]>
Co-authored-by: pppppM <[email protected]>
Co-authored-by: gzlong96 <[email protected]>
Co-authored-by: zilong.guo <[email protected]>
Co-authored-by: Ko Sung <[email protected]>
Co-authored-by: 不要葱姜蒜 <[email protected]>
Co-authored-by: fanqiNO1 <[email protected]>
Co-authored-by: PommesPeter <[email protected]>
Co-authored-by: LKJacky <[email protected]>
Co-authored-by: jacky <[email protected]>
Co-authored-by: xzw <[email protected]>
* [Docs] Add `docs/zh_cn/preparation/pretrained_model.md` (#462)
* fix pre-commit
* update
* Update pretrained_model.md
* Update pretrained_model.md
* fix pre-commit
* Update pretrained_model.md
* update
* update
* update
* update
* Update pretrained_model.md
* [Docs] Add `docs/zh_cn/training/multi_modal_dataset.md` (#503)
* update
* update
* [Docs] Improve readthedocs style (#545)
* update style
* update style
* fix requirements
* fix
* fix
* add logo
* update
* update
* update
* [Docs] `.md` to `.rst` (#544)
* update rst
* update rst
* update rst
* [Docs] Add `docs/zh_cn/training/custom_pretrain_dataset.rst` (#535)
* update
* update
* update rst
* [Docs] Add docs about training on large scale dataset (#517)
* add train_on_large_scale_dataset doc
* refine doc
* add llava offline doc
* refine doc
* replace md with rst
* refine rst
* refine rst
* [Docs] Add internevo migration related documents (#506)
* add internevo related
* fix comments
* refine doc
* rename internlm2_7b_w_tokenized_dataset.py to internlm2_7b_w_internevo_dataset.py
* refine doc
* replace md with rst
* refine rst
* refine rst
* [Docs] Add `docs/zh_cn/training/modify_settings.rst` (#490)
* update
* update
* update
* update
* update
* update
* Update modify_settings.md
* Update modify_settings.md
* update
* Update docs/zh_cn/training/modify_settings.md
Co-authored-by: Haian Huang(深度眸) <[email protected]>
* update deepspeed
* update rst
* update rst
---------
Co-authored-by: Haian Huang(深度眸) <[email protected]>
* [Docs] Add `length_grouped_sampler.rst` (#511)
* update
* update
* update
* Update length_grouped_sampler.md
* update rst
* Update length_grouped_sampler.rst
Co-authored-by: whcao <[email protected]>
---------
Co-authored-by: whcao <[email protected]>
* [Docs] Add accelerate related (#504)
* add accelerate related
* split accelerate docs
* fix comments
* add speed benchmark
* explain why qlora can not be used with zero3
* refine doc
* fix configs
* refine doc
* refine doc
* refine configs
* add benchmark to index.rst
* refine doc
* add hyper-param docs
* refine doc
* add explanation about memory cost optimization when using zero
* add figure to show the speed comparison
* refine figures
* refine doc
* fix figures
* refine figures
* update figures and benchmark configs
* add pack rst
* delete pack md
* replace md with rst
* replace md with rst
* replace md with rst
* replace md with rst
* refine rst
* refine rst
* refine rst
* refine rst
* refine rst
* refine rst
* refine rst
* refine rst
* refine rst
* refine rst
* refine rst
* refine rst
* refine rst
* refine rst
---------
Co-authored-by: pppppM <[email protected]>
* [Docs] Add visualization docs (#516)
* add visualization docs
* delete other visualization tools and add explanation about how to use tensorboard
* replace md with rst
---------
Co-authored-by: pppppM <[email protected]>
* [Docs] Add docs about SFT with custom dataset (#514)
* add custom sft dataset docs
* add custom dataset template configs
* add openai data format
* refine doc
* update (#2)
* replace md with rst
---------
Co-authored-by: Zhihao Lin <[email protected]>
Co-authored-by: pppppM <[email protected]>
* [Docs] Add `docs/zh_cn/training/open_source_dataset.rst` (#502)
* update
* update
* update
* update
* format table
* fix typo
* update rst
---------
Co-authored-by: pppppM <[email protected]>
* [Docs] Add `docs/zh_cn/preparation/prompt_template.rst` (#475)
* update
* update
* Update prompt_template.md
* Update prompt_template.md
* update
* add tips
* update
* update rst
---------
Co-authored-by: pppppM <[email protected]>
* [Docs] Add Sequence Parallel documents (#505)
* add sp related
* add sequence parallel supported models
* refine doc
* Update docs/zh_cn/training/training_extreme_long_sequence.md
Co-authored-by: Haian Huang(深度眸) <[email protected]>
* refine doc
* refine doc
* test the capability boundary of zero3
* refine doc
* test rst
* test rst
* add training speed figure
* delete debug rst
* sp need flash_attn
* WIP
* replace md with rst
* refine rst
* refine rst
* add explanation about why pt 2.1 is not accepted
* refine rst
* refine rst
* add loss curve
---------
Co-authored-by: Haian Huang(深度眸) <[email protected]>
Co-authored-by: pppppM <[email protected]>
* [Docs] Update `docs/zh_cn` outline (#556) update
* [Docs] Update `docs/en` theme (#557)
* update
* update
* update
* update
* update
* update
* update
* update
* [Docs] Add tokenizer to sft in Case 2 (#584) add tokenizer to sft in Case 2
* [Docs] Improve the Rendering Effect of Readthedocs (#664)
* refine get_start and training
* fix acceleration
* update maxdepth
* refine internevo migration
* refine internevo
* fix typos
* fix lint
---------
Co-authored-by: zhengjie.xu <[email protected]>
Co-authored-by: Ma Zhiming <[email protected]>
Co-authored-by: fanqiNO1 <[email protected]>
Co-authored-by: Jianfeng777 <[email protected]>
Co-authored-by: Zhihao Lin <[email protected]>
Co-authored-by: RangiLyu <[email protected]>
Co-authored-by: whcao <[email protected]>
Co-authored-by: gzlong96 <[email protected]>
Co-authored-by: zilong.guo <[email protected]>
Co-authored-by: Ko Sung <[email protected]>
Co-authored-by: 不要葱姜蒜 <[email protected]>
Co-authored-by: PommesPeter <[email protected]>
Co-authored-by: LKJacky <[email protected]>
Co-authored-by: jacky <[email protected]>
Co-authored-by: xzw <[email protected]>
Co-authored-by: Haian Huang(深度眸) <[email protected]>
1 parent b6aac32 · commit 43ef6ab

File tree

165 files changed: +22365 -3 lines changed


docs/en/.readthedocs.yaml

Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
version: 2

build:
  os: ubuntu-22.04
  tools:
    python: "3.8"

formats:
  - epub

python:
  install:
    - requirements: requirements/docs.txt

sphinx:
  configuration: docs/en/conf.py
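
This is a standard Read the Docs v2 config: it pins the build to Ubuntu 22.04 with Python 3.8, enables an extra epub output, installs the docs dependencies from requirements/docs.txt, and points Sphinx at docs/en/conf.py. As a minimal sketch, the same build can be reproduced locally, assuming the commands run from the repository root where requirements/docs.txt resolves:

    # Install the same docs dependencies Read the Docs installs
    pip install -r requirements/docs.txt

    # Build the English HTML docs against the conf.py referenced above
    sphinx-build -b html docs/en docs/en/_build/html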

docs/en/Makefile

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS    ?=
SPHINXBUILD   ?= sphinx-build
SOURCEDIR     = .
BUILDDIR      = _build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
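
This is the stock Sphinx Makefile: the catch-all `%: Makefile` rule forwards any unknown goal to sphinx-build in make-mode (-M), so every Sphinx builder becomes a make target without being listed explicitly. A short usage sketch, assuming Sphinx is installed and the working directory is docs/en:

    # "make html" expands to: sphinx-build -M html . _build
    make html

    # the epub format enabled in .readthedocs.yaml works the same way
    make epub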
Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
.header-logo {
  background-image: url("../image/logo.png");
  background-size: 177px 40px;
  height: 40px;
  width: 177px;
}

docs/en/_static/image/logo.png

26.2 KB

docs/en/acceleration/benchmark.rst

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
Benchmark
=========

docs/en/acceleration/deepspeed.rst

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
DeepSpeed
=========
Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
Flash Attention
===============
Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
HyperParameters
===============
Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
Length Grouped Sampler
======================
Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
Pack to Max Length
==================
