This crate provides simple pipelines that can be used out of the box to perform text embedding and re-ranking using ONNX models.
They are built with 🧩 orp (which relies on the 🦀 ort runtime), and use 🤗 tokenizers for token encoding.
[dependencies]
"gte-rs" = "0.9.1"
"orp" = "0.9.2"

Embedding:
let params = Parameters::default();
let pipeline = TextEmbeddingPipeline::new("gte-modernbert-base/tokenizer.json", &params)?;
let model = Model::new("gte-modernbert-base/model.onnx", RuntimeParameters::default())?;
let inputs = TextInput::from_str(&[
"text content",
"some more content",
//...
]);
let embeddings = model.inference(inputs, &pipeline, &params)?;

Re-ranking:
let params = Parameters::default();
let pipeline = RerankingPipeline::new("gte-reranker-modernbert-base/tokenizer.json", &params)?;
let model = Model::new("gte-reranker-modernbert-base/model.onnx", RuntimeParameters::default())?;
let inputs = TextInput::from_str(&[
("one candidate", "query"),
("another candidate", "query"),
//...
]);
let similarities = model.inference(inputs, &pipeline, &params)?;

Please refer to the source code in examples for complete examples.
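Once re-ranking scores are available, a typical next step is to order the candidates from most to least relevant. The sketch below assumes the scores can be obtained as one `f32` per input pair, in input order; the actual output type of `model.inference` is not shown here, and `rank_candidates` is a hypothetical helper, not part of gte-rs.

```rust
/// Pairs each candidate with its score and returns them sorted best-first.
/// Assumption: `scores` holds one relevance score per candidate, in input order.
/// If the slices differ in length, the extra items are ignored by `zip`.
fn rank_candidates<'a>(candidates: &[&'a str], scores: &[f32]) -> Vec<(&'a str, f32)> {
    let mut ranked: Vec<(&'a str, f32)> = candidates
        .iter()
        .copied()
        .zip(scores.iter().copied())
        .collect();
    // `total_cmp` defines a total order on f32, so NaN values cannot cause a panic.
    // Comparing `b` to `a` sorts in descending order.
    ranked.sort_by(|a, b| b.1.total_cmp(&a.1));
    ranked
}
```

Calling `rank_candidates(&["one candidate", "another candidate"], &scores)` returns the candidates paired with their scores, highest score first.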
For English-language text, the gte-modernbert-base model outperforms larger models on retrieval with only 149M parameters, and runs efficiently on both GPU and CPU. The gte-reranker-modernbert-base version performs re-ranking with similar characteristics. This post provides interesting insights about them.
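For retrieval, embeddings are commonly compared with cosine similarity. This is a general technique rather than something this crate prescribes. The sketch below assumes each embedding is available as a slice of `f32` values; `cosine_similarity` and `most_similar` are hypothetical helpers, not part of gte-rs, and whether the model output is already normalized is not stated here.

```rust
/// Cosine similarity between two vectors of equal length.
/// Returns 0.0 if either vector has zero norm.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Index of the document embedding most similar to the query, or `None` if `docs` is empty.
fn most_similar(query: &[f32], docs: &[Vec<f32>]) -> Option<usize> {
    docs.iter()
        .enumerate()
        .map(|(i, d)| (i, cosine_similarity(query, d)))
        .max_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(i, _)| i)
}
```

A brute-force scan like this is fine for small collections; larger ones usually call for a vector index, which is outside the scope of this crate.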
This crate should be usable out of the box with other models, or easily adapted to them. Please report your own tests or requirements!
This project follows the same principles as the ones below. Refer to their documentation for more details:
- 🌿 gline-rs: inference engine for GLiNER models
- 🏷️ gliclass-rs: inference engine for GLiClass models