Description
Is your feature request related to a problem? Please describe.
According to the .env.example in the opencloud-compose repository, Apache Tika is disabled as a search extractor by default for performance reasons.
Describe the solution you'd like
I would like to propose adding support for an additional content extractor based on the Kreuzberg project.
From Kreuzberg's README:
Extract text and metadata from a wide range of file formats (91+), generate embeddings and post-process at native speeds without needing a GPU.
Flexible deployment – Use as library, CLI tool, REST API server, or MCP server
Describe alternatives you've considered
N/A
Additional context
I am not a developer, so I cannot estimate the effort required to develop such a content extractor for OpenCloud, nor can I validate the quality or production-readiness of the Kreuzberg project. However, I would welcome feedback on the idea in general.
My research showed that the co-founder of Kreuzberg is also based in Berlin! ;)