[FR] Kreuzberg as an alternative content extractor to Apache Tika #2507

## Is your feature request related to a problem? Please describe.
According to the `.env.example` from the `opencloud-compose` repository, Apache Tika is disabled as a search extractor by default for [performance reasons](https://github.com/opencloud-eu/opencloud-compose/blob/2b1476950b6869624b9762ca256dc6ba3d09aa03/.env.example#L196).


## Describe the solution you'd like
I would like to propose adding support for an additional content extractor based on the [Kreuzberg project](https://github.com/kreuzberg-dev/kreuzberg).

From Kreuzberg's README:  

> Extract text and metadata from a wide range of file formats (91+), generate embeddings and post-process at native speeds without needing a GPU.

> **Flexible deployment** – Use as library, CLI tool, REST API server, or MCP server

## Describe alternatives you've considered
N/A

## Additional context
I am not a developer, so I cannot estimate the effort required to build such a content extractor for OpenCloud, nor can I assess the quality or production-readiness of Kreuzberg. However, I would welcome feedback on the idea in general.

My research showed that the co-founder of Kreuzberg is also located in Berlin! ;)

