
Pinned

  1. OLMo Public

    Modeling, training, eval, and inference code for OLMo

    Python · 6.2k stars · 683 forks

  2. dolma Public

    Data and tools for generating and inspecting OLMo pre-training data.

    Python · 1.4k stars · 161 forks

  3. ai2thor Public

    An open-source platform for Visual AI.

    C# · 1.6k stars · 265 forks

  4. olmocr Public

    Toolkit for linearizing PDFs for LLM datasets/training

    Python · 16.2k stars · 1.2k forks

  5. OLMoE Public

    OLMoE: Open Mixture-of-Experts Language Models

    Jupyter Notebook · 923 stars · 86 forks

Repositories

Showing 10 of 534 repositories
  • olmoearth_projects Public

    OlmoEarth projects

    Python · 46 stars · 5 forks · 1 open issue · 1 open pull request · Updated Dec 10, 2025
  • OLMo-core Public

    PyTorch building blocks for the OLMo ecosystem

    Python · 524 stars · Apache-2.0 · 92 forks · 7 open issues · 41 open pull requests · Updated Dec 10, 2025
  • rslearn Public

    A tool for developing remote sensing datasets and models.

    Python · 59 stars · Apache-2.0 · 10 forks · 20 open issues · 5 open pull requests · Updated Dec 10, 2025
  • olmo-cookbook Public

    OLMost every training recipe you need to perform data interventions with the OLMo family of models.

    Python · 57 stars · Apache-2.0 · 11 forks · 1 open issue · 31 open pull requests · Updated Dec 10, 2025
  • open-instruct Public

    AllenAI's codebase for post-training

    Python · 3,412 stars · Apache-2.0 · 471 forks · 12 open issues (1 needs help) · 42 open pull requests · Updated Dec 10, 2025
  • olmocr Public

    Toolkit for linearizing PDFs for LLM datasets/training

    Python · 16,174 stars · Apache-2.0 · 1,246 forks · 31 open issues · 13 open pull requests · Updated Dec 10, 2025
  • duplodocus Public

    Tooling for exact and MinHash deduplication of large-scale text datasets

    Rust · 40 stars · Apache-2.0 · 3 forks · 0 open issues · 1 open pull request · Updated Dec 10, 2025
  • rslearn_projects Public

    Python · 16 stars · Apache-2.0 · 6 forks · 15 open issues · 6 open pull requests · Updated Dec 10, 2025
  • PeerRead Public

    Data and code for the NAACL 2018 paper by Kang et al., "A Dataset of Peer Reviews (PeerRead): Collection, Insights and NLP Applications"

    Python · 424 stars · 108 forks · 5 open issues · 3 open pull requests · Updated Dec 9, 2025
  • dnw Public

    Discovering Neural Wirings (https://arxiv.org/abs/1906.00586)

    Python · 136 stars · 17 forks · 2 open issues · 6 open pull requests · Updated Dec 9, 2025