
Add dclm-core-22 to jupiter #42

Merged
geoalgo merged 4 commits into main from harsh/dclm-core-22-jupiter on Mar 6, 2026

Conversation

@harshraj172 (Collaborator)

To add lighteval for dclm on JUPITER we had to add a separate jupiter-lighteval.def, as including the installations in the same .def file for Jupiter created dependency conflicts: Jupiter uses ARM64, and many of lighteval's runtime dependencies (spacy, underthesea, pyvi, etc.) lack pre-built aarch64 wheels.
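A separate definition file keeps the conflicting dependencies isolated. As a rough illustration only (the base image and package list below are assumptions, not the contents of the actual jupiter-lighteval.def in this PR), such a file could look like:

```
Bootstrap: docker
From: arm64v8/python:3.11-slim

%post
    # Illustrative sketch: keep lighteval's aarch64-problematic deps
    # (spacy, underthesea, pyvi, ...) in their own container instead
    # of the shared Jupiter .def file
    pip install --no-cache-dir lighteval
```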

@geoalgo (Collaborator) left a comment

LGTM, but there are some conflicts to resolve.

- task: include_base_44_ukrainian
  subset: Ukrainian

dclm-core-22:
Collaborator:

Could you add the dataset field to either the task group or the tasks? Otherwise dataset pre-downloading (before job submission) won't be available, and the jobs will fail unless the compute nodes have internet access. You can check global-mmlu-eu or generic-multilingual for an example of how this looks.
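For illustration, a task-group entry with a dataset field might look roughly like the following; the field layout and the placeholder dataset id are assumptions based on this comment, so check global-mmlu-eu or generic-multilingual for the real schema:

```
dclm-core-22:
  # Hypothetical: declaring the HF dataset here lets it be
  # pre-downloaded before job submission, so compute nodes
  # don't need internet access
  dataset: some-org/some-dataset
  tasks:
    - task: include_base_44_ukrainian
      subset: Ukrainian
```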


build-push:
needs: setup-lambda
strategy:
Collaborator:

We moved to SkyPilot + Lambda Labs for the container build workflow, so the actual workflows are now split between .github/workflows and .github/sky.

@timurcarstensen (Collaborator)

Thanks for the contribution and for making this work for Jupiter, @harshraj172! Could you update your branch with the latest changes from main? We changed a few things about the GitHub Actions workflows (left you a comment about that) and testing.

SINGULARITY_ARGS: "--nv --contain --env PYTHONNOUSERSITE=1"
EVAL_CONTAINER_IMAGE: "lm-eval-jupiter.sif"
LIGHTEVAL_CONTAINER_IMAGE: "lighteval-jupiter.sif"
SINGULARITY_ARGS: "--nv --contain --env PYTHONNOUSERSITE=1 --env SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt"
Collaborator:

You'll probably also have to integrate this into the container-downloading logic in oellm/utils.py::_ensure_singularity_image; otherwise it'll only download the lm-eval container.
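A minimal sketch of the idea, assuming the downloader picks an image per harness. The helper below is hypothetical and only illustrates mapping both harnesses to their .sif images so both get pulled on Jupiter; it is not the real code in oellm/utils.py::_ensure_singularity_image:

```python
# Hypothetical sketch: map each evaluation harness to its Singularity
# image so the pre-download step fetches both, not just lm-eval.
# Image names are taken from the env vars shown in this PR's diff.
HARNESS_IMAGES = {
    "lm-eval": "lm-eval-jupiter.sif",
    "lighteval": "lighteval-jupiter.sif",
}

def select_container_image(harness: str) -> str:
    """Return the .sif filename for a harness, raising on unknown names."""
    try:
        return HARNESS_IMAGES[harness]
    except KeyError:
        raise ValueError(f"unknown harness: {harness!r}")
```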

@harshraj172 force-pushed the harsh/dclm-core-22-jupiter branch from 23c87ea to 35a1f83 on March 5, 2026 16:19
@harshraj172 (Collaborator, Author)

Some changes in this PR. @timurcarstensen @geoalgo @JeniaJitsev

  • Switched from lighteval to lm-eval-harness. The DCLM paper uses LLM Foundry for evaluation, which computes log-probs for multiple-choice and schema tasks; lm-eval handles these the same way, whereas lighteval uses a different approach for some of them. Pinned to lm-eval==0.4.9.2 because >v0.4.10 has a regression that breaks agieval_lsat_ar in few-shot mode. This also required pinning transformers<5.0.0 and datasets<4.0.0 for compatibility.
  • 21 of the 22 CORE tasks are covered. The missing one is jeopardy: lm-eval has a jeopardy task, but it's broken (the task config doesn't properly handle the generation format). Planning to look into fixing this upstream. One other thing: DCLM evaluates on SQuAD v1, but lm-eval only ships squadv2, so that task's numbers won't be directly comparable to DCLM baselines. All other tasks match the paper's few-shot counts, metrics, and evaluation types exactly. Tested everything with qwen3-0.6B-base on Jupiter.

@geoalgo (Collaborator) left a comment

Thank you, @harshraj172! Can you rebase so that we can merge to mainline? My main comment is about not tying a specific version just to Jupiter; otherwise it looks good to go once updated with mainline.

Collaborator:

Are those intended? If not, let's remove them.

Comment on lines +16 to +17
uv pip install --system --break-system-packages "lm-eval==0.4.9.2" \
"transformers>=4.43.2,<5.0.0" "datasets<4.0.0" wandb sentencepiece tiktoken accelerate
Collaborator:

This part is tricky: we do not want to tie the version per container, as we would get different results across clusters. If you need a specific version for a fixed benchmark, the best option is a custom environment, which can be set up by following https://github.com/OpenEuroLLM/oellm-cli/blob/main/docs/VENV.md (happy to help if there is any issue with this; I tested it and it worked on LUMI, for instance).

Suggested change
uv pip install --system --break-system-packages "lm-eval==0.4.9.2" \
"transformers>=4.43.2,<5.0.0" "datasets<4.0.0" wandb sentencepiece tiktoken accelerate
uv pip install --system --break-system-packages lm-eval \
"transformers<=4.53.0" "datasets<4.0.0" wandb sentencepiece tiktoken accelerate

@harshraj172 (Collaborator, Author), Mar 5, 2026

Yes, this is tricky. If we don't pin the versions, agieval_lsat_ar will break. Do you think we should mention somewhere in docs/VENV.md that people should use a venv if they want to run all 21 tasks (otherwise only 20 will run)?

Collaborator:

Thanks for rebasing. My only remaining point is about not pinning the version in the Jupiter container.

What you suggest is a good idea. We could also add another requirements file with those versions, named something like requirements-venv-dclm.txt (in addition to https://github.com/OpenEuroLLM/oellm-cli/blob/main/requirements-venv.txt), and add an entry in the README.md (or VENV.md) explaining how to evaluate this.
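As a sketch, such a requirements-venv-dclm.txt could simply capture the pins discussed in this PR; the exact file contents below are an assumption, with the versions taken from the install command in the review thread:

```
# Hypothetical requirements-venv-dclm.txt: pins the versions discussed
# in this PR so the full 21-task DCLM CORE suite runs reproducibly
# (agieval_lsat_ar breaks on lm-eval >0.4.10)
lm-eval==0.4.9.2
transformers>=4.43.2,<5.0.0
datasets<4.0.0
wandb
sentencepiece
tiktoken
accelerate
```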

Collaborator Author:

Thanks for the review. I resolved the comments.

Collaborator:

Those binaries are probably not intended; can you remove them?

Collaborator Author:

oh, my bad.

Comment on lines +16 to +17
uv pip install --system --break-system-packages lm-eval \
wandb sentencepiece tiktoken accelerate
Collaborator:

Isn't it missing datasets and some other dependencies?

uv pip install --system --break-system-packages \
    lm-eval \
    transformers \
    "datasets<4.0.0" \
    wandb \
    sentencepiece \
    tiktoken \
    accelerate \
    nltk

If so, let's merge this PR to add DCLM support and update the Jupiter container in a follow-up, since the PR has been open for a long time.

@harshraj172 (Collaborator, Author), Mar 6, 2026

@geoalgo I found that lm-eval installs these already. Also, nltk isn't required now, as it was only used with lighteval earlier.

@geoalgo merged commit 1b75fcd into main on Mar 6, 2026
3 checks passed
@geoalgo deleted the harsh/dclm-core-22-jupiter branch on March 6, 2026 09:18