
Commit 168f695

colm

1 parent 7a9b2f7

File tree: 1 file changed (+2, -3)


_bibliography/papers.bib

Lines changed: 2 additions & 3 deletions

@@ -16,9 +16,8 @@ @article{wang2025teachingmodelsunderstandbut
   title={Teaching Models to Understand (but not Generate) High-risk Data},
   author={Ryan Wang and Matthew Finlayson and Luca Soldaini and Swabha Swayamdipta and Robin Jia},
   year={2025},
-  eprint={2505.03052},
-  abbr={arXiv},
-  journal={Under Review},
+  abbr={CoLM},
+  journal={Conference on Language Modeling},
   url={https://arxiv.org/abs/2505.03052},
   abstract={Language model developers typically filter out high-risk content -- such as toxic or copyrighted text -- from their pre-training data to prevent models from generating similar outputs. However, removing such data altogether limits models' ability to recognize and appropriately respond to harmful or sensitive content. In this paper, we introduce Selective Loss to Understand but Not Generate (SLUNG), a pre-training paradigm through which models learn to understand high-risk data without learning to generate it. Instead of uniformly applying the next-token prediction loss, SLUNG selectively avoids incentivizing the generation of high-risk tokens while ensuring they remain within the model's context window. As the model learns to predict low-risk tokens that follow high-risk ones, it is forced to understand the high-risk content. Through our experiments, we show that SLUNG consistently improves models' understanding of high-risk data (e.g., ability to recognize toxic content) without increasing its generation (e.g., toxicity of model responses). Overall, our SLUNG paradigm enables models to benefit from high-risk text that would otherwise be filtered out.},
 }
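
The abstract above describes SLUNG as masking the next-token objective on high-risk target tokens while keeping those tokens in the input, so later low-risk predictions are still conditioned on them. A minimal sketch of such a selective loss, assuming a PyTorch-style setup (the function name slung_loss and the precomputed high_risk_mask are illustrative, not the paper's actual implementation):

import torch
import torch.nn.functional as F

def slung_loss(logits, targets, high_risk_mask):
    # logits:         (batch, seq, vocab) model outputs
    # targets:        (batch, seq) next-token labels
    # high_risk_mask: (batch, seq) bool, True where the *target* token is high-risk
    per_token = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
        reduction="none",
    ).reshape(targets.shape)
    # Zero out the loss on high-risk targets: the tokens still appear in the
    # context window, but the model is never rewarded for generating them.
    keep = (~high_risk_mask).float()
    return (per_token * keep).sum() / keep.sum().clamp(min=1.0)

# Example: batch of 2 sequences, 5 next-token positions, vocab of 100.
logits = torch.randn(2, 5, 100)
targets = torch.randint(0, 100, (2, 5))
high_risk = torch.zeros(2, 5, dtype=torch.bool)
high_risk[0, 2] = True  # mark one target token as high-risk
loss = slung_loss(logits, targets, high_risk)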
