This is the official demo repository for the paper *Breaking the Factorization Barrier in Diffusion Language Models*.
**Motivation and intuition of CoDD.** *Left:* Illustration of the misspecification gap. The plot reports the perplexity of LLaDA on the MathInstruct validation set across varying mask ratios. Curve (a), sequential generation, represents the ideal baseline (i.e., the true joint distribution learned by the model). When restricted to (b), one-step generation, the independence assumption causes significant performance degradation. The shaded region highlights this loss in perplexity, defined as the misspecification gap.
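As a toy numeric illustration of why factorized one-step generation hurts (the numbers below are hypothetical, not from the paper): for a correlated pair of tokens, the product of marginals assigns lower likelihood to the data than the true joint, so its perplexity is strictly higher.

```python
import math

# Hypothetical joint distribution over two correlated binary tokens.
joint = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}

# Marginals implied by the joint -- what a factorized one-step sampler uses.
p_x = {v: sum(p for (x, _), p in joint.items() if x == v) for v in (0, 1)}
p_y = {v: sum(p for (_, y), p in joint.items() if y == v) for v in (0, 1)}

def perplexity(model_prob):
    """Per-token perplexity of a model under data drawn from `joint`."""
    # Cross-entropy H(joint, model) in nats, averaged over the 2 tokens.
    ce = -sum(p * math.log(model_prob(xy)) for xy, p in joint.items()) / 2
    return math.exp(ce)

ppl_joint = perplexity(lambda xy: joint[xy])                # sequential (true joint)
ppl_indep = perplexity(lambda xy: p_x[xy[0]] * p_y[xy[1]])  # one-step (independent)

print(f"joint ppl = {ppl_joint:.3f}, factorized ppl = {ppl_indep:.3f}")
# joint ppl ≈ 1.664, factorized ppl = 2.000
```

The gap between the two perplexities is a two-token analogue of the shaded misspecification gap in the figure; CoDD's copula guidance aims to recover the correlation that factorized decoding discards.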
```shell
conda env create -f environment.yml
conda activate codd
```

Our evaluation uses a customized `lm-evaluation-harness`. Install it and add it to your Python path:

```shell
export PYTHONPATH="${PYTHONPATH}:$(pwd)/lm-evaluation-harness"
cd lm-evaluation-harness
pip install -e .
pip install math_verify
cd ..
```

To make the `PYTHONPATH` change permanent, consider adding the `export` line above to your `~/.bashrc` or `~/.zshrc`.
```shell
python example.py
```

This script compares two generation methods on a simple example:

- **Base LLaDA-8B-Instruct**: standard block diffusion generation
- **CoDD**: copula-guided block diffusion generation
We provide domain-specific Probabilistic Circuit (PC) guidance models for both the LLaDA and Dream architectures. Use these with the `--pc_ckpt` argument to enable copula-guided generation.
| Base Model | Domain / Task | Checkpoint ID |
|---|---|---|
| LLaDA-8B | Mathematical Reasoning | `il18/llada-math-pc` |
| LLaDA-8B | Grade School Math | `il18/llada-gsm-pc` |
| LLaDA-8B | Code Generation | `il18/llada-code-pc` |
| Dream-7B | Mathematical Reasoning | `il18/dream-math-pc` |
| Dream-7B | Grade School Math | `il18/dream-gsm-pc` |
| Dream-7B | Code Generation | `il18/dream-code-pc` |
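For instance, a Dream run with its math PC checkpoint might look like the following. Note that the Dream base-model ID shown here (`Dream-org/Dream-v0-Instruct-7B`) is an assumption for illustration; substitute your own Dream checkpoint path or HuggingFace repo.

```shell
cd eval
# Dream base weights ID below is illustrative -- replace with your checkpoint.
./eval.sh --gpus 0 \
    --run '--model_alias dream --dream_ckpt Dream-org/Dream-v0-Instruct-7B --task math500 --alg low_confidence --num_steps 256 --pc_ckpt il18/dream-math-pc'
```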
Use `./eval/eval.sh` to run evaluations on benchmarks (GSM8K, MATH500, MBPP, GPQA).
```shell
cd eval
./eval.sh --gpus 0 \
    --run '--model_alias llada --llada_ckpt GSAI-ML/LLaDA-8B-Instruct --task math500 --alg low_confidence --num_steps 256'
```

To enable copula-guided generation, add the PC arguments:

```shell
./eval.sh --gpus 0 \
    --run '--model_alias llada --llada_ckpt GSAI-ML/LLaDA-8B-Instruct --task math500 --alg low_confidence --num_steps 256 --pc_ckpt il18/llada-math-pc --pc_temperature 0.2 --pc_frac 0.5'
```

| Option | Description |
|---|---|
| `--gpus` | Comma-separated GPU IDs (e.g., `0,1,2`) |
| `--run` | Arguments for a single evaluation run |
| `--output_dir` | Directory for results (default: `results`) |
| `--tag` | Optional tag for log files |
| Argument | Description |
|---|---|
| `--model_alias` | Model type: `llada` or `dream` |
| `--llada_ckpt` | LLaDA checkpoint path or HuggingFace repo |
| `--dream_ckpt` | Dream checkpoint path or HuggingFace repo |
| `--task` | Benchmark: `gsm8k`, `math500`, `mbpp`, `gpqa` |
| `--alg` | Remasking algorithm: `low_confidence`, `random`, `entropy`, `margin`, `topprob` |
| `--num_steps` | Number of diffusion steps |
| `--pc_ckpt` | Path or HuggingFace repo for PC model |
| `--pc_temperature` | PC guidance temperature (default: `0.7`) |
| `--pc_frac` | Fraction of steps using PC guidance (default: `0.3`) |
| `--block_length` | Block length for semi-autoregressive generation (default: `32`) |
Run multiple evaluations in parallel across GPUs:
```shell
./eval.sh --gpus 0,1 \
    --run '--model_alias llada --llada_ckpt GSAI-ML/LLaDA-8B-Instruct --task gpqa --alg low_confidence --num_steps 256' \
    --run '--model_alias llada --llada_ckpt GSAI-ML/LLaDA-8B-Instruct --task math500 --alg low_confidence --num_steps 256'
```

Logs are saved to `eval/results/logs/`.
```bibtex
@misc{li2026breakingfactorizationbarrierdiffusion,
      title={Breaking the Factorization Barrier in Diffusion Language Models},
      author={Ian Li and Zilei Shao and Benjie Wang and Rose Yu and Guy Van den Broeck and Anji Liu},
      year={2026},
      eprint={2603.00045},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2603.00045},
}
```

The evaluation scripts in this repository are adapted from APD, building upon the EleutherAI `lm-evaluation-harness`.
