@daixque (Contributor) commented Jul 22, 2025

Overview

This PR is related to the Elasticsearch change: elastic/elasticsearch#131679

With that PR, Elasticsearch supports sparse embeddings from non-ELSER models as well. A notable example of a sparse vector model is SPLADE, which is a reference model for ELSER.
To tell Elasticsearch that the target model is a SPLADE-type model, this PR adds a new expansion_type parameter to the create trained model API call.

How Eland detects the SPLADE model

Eland identifies a SPLADE model by checking the dimensions of the output tensor. If the second dimension of the output tensor is greater than 1, the model is considered a SPLADE model. This is because SPLADE models typically output embeddings per token, unlike ELSER.
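
The check described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `FakeTensor`, `first_output_shape`, and `looks_like_splade` are hypothetical names, the shapes are made up, and "second dimension" is assumed to mean index 1 of the shape.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass
class FakeTensor:
    """Hypothetical stand-in for a model output; only the shape matters here."""

    shape: Tuple[int, ...]


def first_output_shape(sample_output: Any) -> Tuple[int, ...]:
    """Return the shape of the first output, unwrapping a tuple if needed."""
    if isinstance(sample_output, tuple):
        sample_output = sample_output[0]
    return tuple(sample_output.shape)


def looks_like_splade(sample_output: Any) -> bool:
    """Heuristic from the description: second dimension greater than 1."""
    shape = first_output_shape(sample_output)
    # Assumption: "second dimension" is index 1 of the shape.
    return len(shape) > 1 and shape[1] > 1


# Made-up shapes for illustration only.
print(looks_like_splade(FakeTensor((1, 8, 16))))     # second dimension is 8
print(looks_like_splade((FakeTensor((1, 1, 16)),)))  # tuple output, second dimension is 1
```

Under these assumptions the first call reports a SPLADE-style output and the second does not. A real implementation would read the shape from the traced model's sample output rather than from a stand-in class.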

Related

elif self._task_type == "text_expansion":
    sample_embedding = self._traceable_model.sample_output()
    if type(sample_embedding) is tuple:
        text_embedding = sample_embedding[0]
Member

Please rename text_embedding to sparse_embedding

Contributor Author

@daixque daixque Jul 23, 2025

In the current codebase, text_expansion is used everywhere; sparse_embedding does not appear at all.
https://github.com/search?q=repo%3Aelastic%2Feland%20text_expansion&type=code

% grep -inR "text_expansion" eland tests | grep -v "Binary file"
eland/ml/pytorch/transformers.py:76:    "text_expansion",
eland/ml/pytorch/transformers.py:97:    "text_expansion": TextExpansionInferenceOptions,
eland/ml/pytorch/transformers.py:557:        elif self._task_type == "text_expansion":
eland/ml/pytorch/transformers.py:747:        if self._task_type == "text_expansion":
eland/ml/pytorch/nlp_ml_model.py:320:        super().__init__(configuration_type="text_expansion")
tests/ml/pytorch/test_pytorch_model_config_pytest.py:149:            "text_expansion",
tests/ml/pytorch/test_pytorch_model_config_pytest.py:217:            if task_type == "text_expansion":

Should I rename everything? That would change the CLI interface. Should we keep --task-type=text_expansion for compatibility? (I feel the rename belongs in a separate PR.)
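
If a rename were done later, one way to keep the existing flag value working is to accept both spellings and normalize them at parse time. The sketch below uses only the standard `argparse` module; the alias table, `normalize_task_type`, and `build_parser` are hypothetical names for illustration and are not part of Eland's CLI.

```python
import argparse

# Hypothetical alias table: new spelling -> name the rest of the code expects.
TASK_TYPE_ALIASES = {"sparse_embedding": "text_expansion"}
CANONICAL_TASK_TYPES = {"text_expansion"}


def normalize_task_type(value: str) -> str:
    """Map an alias to its canonical task type, rejecting unknown values."""
    canonical = TASK_TYPE_ALIASES.get(value, value)
    if canonical not in CANONICAL_TASK_TYPES:
        raise argparse.ArgumentTypeError(f"unknown task type: {value!r}")
    return canonical


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose --task-type accepts old and new spellings."""
    parser = argparse.ArgumentParser(description="task-type alias sketch")
    parser.add_argument("--task-type", type=normalize_task_type, required=True)
    return parser


# Both spellings resolve to the same internal value.
args = build_parser().parse_args(["--task-type=sparse_embedding"])
print(args.task_type)  # prints "text_expansion"
```

With this shape, existing scripts that pass --task-type=text_expansion keep working unchanged, and the internal checks against "text_expansion" do not need to be touched in the same PR.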

Member

Ah ok thanks. Yes the rename is not necessary in this PR

Contributor Author

Got it, thanks
