- [ ] Summarize benchmarks from the following papers:
  * [Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor](https://arxiv.org/pdf/2212.09689.pdf)
  * [SELF-INSTRUCT: Aligning Language Model with Self Generated Instructions](https://arxiv.org/pdf/2212.10560.pdf)
  * [InstructGPT](https://arxiv.org/pdf/2203.02155.pdf)
  * [PAL: Program-aided Language Models](https://arxiv.org/pdf/2211.10435.pdf)
  * [Scaling Instruction-Finetuned Language Models](https://arxiv.org/abs/2210.11416)
  * [OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization](https://arxiv.org/pdf/2212.12017.pdf)

## List of Commonly Used Benchmarks

* LMentry:
* BIG-bench: Hard:
* ....