@cubic-dev-local

## Description

https://linear.app/danswer/issue/DAN-2573/move-more-imports-onto-lazy-loading

So I have a script that I run to check memory usage for Celery workers.

Before this PR it's ~600MB per worker.

After, it's ~250MB for some workers.

Docker container memory: 4.3GB -> 2.819GiB.
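For reference, a rough sketch of the kind of check script I mean (hypothetical; assumes psutil is installed and that workers are identifiable from their command line):

```python
import psutil

# Report resident memory (RSS) per Celery worker process.
for proc in psutil.process_iter(["pid", "cmdline", "memory_info"]):
    cmdline = " ".join(proc.info["cmdline"] or [])
    if "celery" in cmdline and "worker" in cmdline:
        rss_mb = proc.info["memory_info"].rss / (1024 * 1024)
        print(f"pid={proc.info['pid']} rss={rss_mb:.0f}MB")
```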

Diagnosing why is not easy. It's not that all pip dependencies get loaded into memory at worker start (in which case I could just lazy-load any of them); it's specifically the ones that get imported at runtime due to an actual import statement.

This makes it very tricky to track down exactly what causes the 600MB. I literally had to trial-and-error through suspicious imports, tracing from the Celery worker main file.
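One way to narrow it down (a sketch of the approach, not the exact code I ran; the suspect module below is just an example): diff sys.modules around a single import, or run Python with -X importtime.

```python
import sys

before = set(sys.modules)

# Hypothetical suspect: any module on the worker's startup path.
import onyx.background.celery.tasks.docprocessing.utils  # noqa: E402, F401

heavy = ("litellm", "tiktoken", "langchain", "openai")
pulled_in = sorted(set(sys.modules) - before)
print(f"{len(pulled_in)} new modules loaded")
print([m for m in pulled_in if m.partition(".")[0] in heavy])
```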

TBH the existing repo dependency graph is a little scuffed. Here's one example chain that caused the worker to import LLM stuff (there are like a dozen of these I had to sift through to get all the memory offenders down):

app base
-> /Users/edwinluo/onyx/backend/onyx/background/celery/tasks/docprocessing/utils.py
-> redis connector
-> /Users/edwinluo/onyx/backend/onyx/redis/redis_connector_delete.py
-> /Users/edwinluo/onyx/backend/onyx/db/document.py
-> /Users/edwinluo/onyx/backend/onyx/db/feedback.py
-> /Users/edwinluo/onyx/backend/onyx/db/chat.py
-> /Users/edwinluo/onyx/backend/onyx/context/search/utils.py
-> /Users/edwinluo/onyx/backend/onyx/db/search_settings.py
-> /Users/edwinluo/onyx/backend/onyx/db/llm.py OR /Users/edwinluo/onyx/backend/onyx/natural_language_processing/search_nlp_models.py
-> /Users/edwinluo/onyx/backend/onyx/llm/utils.py (langchain, litellm, etc.)

## How Has This Been Tested?

[Describe the tests you ran to verify your changes]

## Backporting (check the box to trigger backport action)

Note: You have to check that the action passes; otherwise, resolve the conflicts manually and tag the patches.

  • This PR should be backported (make sure to check that the backport attempt succeeds)
  • [Optional] Override Linear Check

## Summary by cubic

Moves heavy imports to lazy-loading across indexing, LLM, NLP, and file-processing code to reduce worker memory and speed up startup. Also consolidates search doc conversion into SearchDoc and extracts PromptSnapshot to a shared schema. Addresses Linear DAN-2573 (Reduce Memory usage in Onyx).

  • Refactors

    • Lazy-load litellm, tiktoken, openai, markitdown, read_pdf_file, instantiate_connector, and run_indexing_pipeline within functions (see the sketch after this list).
    • Move chunks_or_sections_to_search_docs into SearchDoc as classmethods (plus from_inference_section/from_inference_chunk); remove the utils version.
    • Extract PromptSnapshot to onyx.chat.prompt_builder.schemas and update imports.
    • Redis connector delete: remove direct DB import to break import chain; per-document enqueue loop is currently disabled.
  • Migration

    • Replace chunks_or_sections_to_search_docs(...) with SearchDoc.chunks_or_sections_to_search_docs(...).
    • Import PromptSnapshot from onyx.chat.prompt_builder.schemas.
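
For illustration, the lazy-loading pattern is just deferring the heavy import from module level into the function body (a sketch; count_tokens is a made-up example, not code from this PR):

```python
def count_tokens(text: str, model: str = "gpt-4") -> int:
    # Imported on first call instead of at worker startup, so workers
    # that never tokenize never pay tiktoken's import-time memory cost.
    # Later calls hit sys.modules, so the repeated import is effectively free.
    import tiktoken

    encoding = tiktoken.encoding_for_model(model)
    return len(encoding.encode(text))
```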
---

Based on: onyx-dot-app/onyx#5478

@cubic-dev-ai cubic-dev-ai bot left a comment

5 issues found across 6 files

Prompt for AI agents (all 5 issues)

Understand the root cause of the following 5 issues and fix them.


<file name="backend/onyx/context/search/models.py">

<violation number="1" location="backend/onyx/context/search/models.py:360">
The core logic within `chunks_or_sections_to_search_docs` for converting `InferenceChunk` or `InferenceSection.center_chunk` to `SearchDoc` is a direct duplication of `SearchDoc.from_inference_chunk()`. This method should be refactored to reuse `from_inference_chunk` and `from_inference_section`.</violation>

<violation number="2" location="backend/onyx/context/search/models.py:362">
Quoted type annotation reduces clarity and type tool effectiveness; remove quotes for consistency with the rest of the module.

        DEV MODE: This violation would have been filtered out by GPT-5.
Reasoning:
• **GPT-5**: Stylistic only; quoted forward refs are used throughout the module and avoid forward-ref evaluation issues. No functional impact.</violation>

<violation number="3" location="backend/onyx/context/search/models.py:379">
Accessing source_links with [0] can raise KeyError; use .get(0) to safely retrieve the zero-index entry when present.</violation>

<violation number="4" location="backend/onyx/context/search/models.py:431">
Accessing source_links with [0] can raise KeyError; replace with .get(0) to avoid exceptions when key 0 is missing.</violation>
</file>

<file name="backend/onyx/server/query_and_chat/query_backend.py">

<violation number="1" location="backend/onyx/server/query_and_chat/query_backend.py:76">
This PR's goal of reducing memory via lazy-loading is undermined by `onyx/llm/utils.py`, which still eagerly imports heavy libraries like `litellm` and `tiktoken` at the top level. This file's import chain causes these libraries to be loaded on startup, negating the memory savings from other lazy-loading efforts.</violation>
</file>


is_internet: bool = False

@classmethod
def chunks_or_sections_to_search_docs(
@cubic-dev-ai cubic-dev-ai bot Sep 24, 2025

The core logic within chunks_or_sections_to_search_docs for converting InferenceChunk or InferenceSection.center_chunk to SearchDoc is a direct duplication of SearchDoc.from_inference_chunk(). This method should be refactored to reuse from_inference_chunk and from_inference_section.

Prompt for AI agents
Address the following comment on backend/onyx/context/search/models.py at line 360:

<comment>The core logic within `chunks_or_sections_to_search_docs` for converting `InferenceChunk` or `InferenceSection.center_chunk` to `SearchDoc` is a direct duplication of `SearchDoc.from_inference_chunk()`. This method should be refactored to reuse `from_inference_chunk` and `from_inference_section`.</comment>

<file context>
@@ -355,6 +356,97 @@ class SearchDoc(BaseModel):
     is_internet: bool = False
 
+    @classmethod
+    def chunks_or_sections_to_search_docs(
+        cls,
+        items: "Sequence[InferenceChunk | InferenceSection] | None",
</file context>

[internal] Confidence score: 9.5/10

[internal] Posted by: Duplicate Detection Agent
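
For what it's worth, the deduplicated version the bot is asking for could look roughly like this (a sketch; assumes from_inference_chunk and from_inference_section behave as described above):

```python
@classmethod
def chunks_or_sections_to_search_docs(
    cls,
    items: "Sequence[InferenceChunk | InferenceSection] | None",
) -> list["SearchDoc"]:
    if not items:
        return []
    return [
        # Delegate per-item conversion instead of duplicating field mapping.
        cls.from_inference_chunk(item)
        if isinstance(item, InferenceChunk)
        else cls.from_inference_section(item)
        for item in items
    ]
```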


matching_chunks = document_index.admin_retrieval(query=query, filters=final_filters)

- documents = chunks_or_sections_to_search_docs(matching_chunks)
+ documents = SearchDoc.chunks_or_sections_to_search_docs(matching_chunks)
@cubic-dev-ai cubic-dev-ai bot Sep 24, 2025

This PR's goal of reducing memory via lazy-loading is undermined by onyx/llm/utils.py, which still eagerly imports heavy libraries like litellm and tiktoken at the top level. This file's import chain causes these libraries to be loaded on startup, negating the memory savings from other lazy-loading efforts.

Prompt for AI agents
Address the following comment on backend/onyx/server/query_and_chat/query_backend.py at line 76:

<comment>This PR's goal of reducing memory via lazy-loading is undermined by `onyx/llm/utils.py`, which still eagerly imports heavy libraries like `litellm` and `tiktoken` at the top level. This file's import chain causes these libraries to be loaded on startup, negating the memory savings from other lazy-loading efforts.</comment>

<file context>
@@ -74,7 +73,7 @@ def admin_search(
     matching_chunks = document_index.admin_retrieval(query=query, filters=final_filters)
 
-    documents = chunks_or_sections_to_search_docs(matching_chunks)
+    documents = SearchDoc.chunks_or_sections_to_search_docs(matching_chunks)
 
     # Deduplicate documents by id
</file context>

[internal] Confidence score: 10/10

[internal] Posted by: System Design Agent


chunk_ind=inference_chunk.chunk_id,
semantic_identifier=inference_chunk.semantic_identifier or "Unknown",
link=(
inference_chunk.source_links[0]
@cubic-dev-ai cubic-dev-ai bot Sep 24, 2025

Accessing source_links with [0] can raise KeyError; replace with .get(0) to avoid exceptions when key 0 is missing.

Prompt for AI agents
Address the following comment on backend/onyx/context/search/models.py at line 431:

<comment>Accessing source_links with [0] can raise KeyError; replace with .get(0) to avoid exceptions when key 0 is missing.</comment>

<file context>
@@ -355,6 +356,97 @@ class SearchDoc(BaseModel):
+            chunk_ind=inference_chunk.chunk_id,
+            semantic_identifier=inference_chunk.semantic_identifier or "Unknown",
+            link=(
+                inference_chunk.source_links[0]
+                if inference_chunk.source_links
+                else None
</file context>

[internal] Confidence score: 9/10

[internal] Posted by: General AI Review Agent
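
The suggested change, assuming source_links is a dict keyed by integer offset (which is what the KeyError concern implies):

```python
link = (
    # .get(0) returns None instead of raising when key 0 is absent.
    inference_chunk.source_links.get(0)
    if inference_chunk.source_links
    else None
)
```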


).document_id,
chunk_ind=chunk.chunk_id,
semantic_identifier=chunk.semantic_identifier or "Unknown",
link=chunk.source_links[0] if chunk.source_links else None,
@cubic-dev-ai cubic-dev-ai bot Sep 24, 2025

Accessing source_links with [0] can raise KeyError; use .get(0) to safely retrieve the zero-index entry when present.

Prompt for AI agents
Address the following comment on backend/onyx/context/search/models.py at line 379:

<comment>Accessing source_links with [0] can raise KeyError; use .get(0) to safely retrieve the zero-index entry when present.</comment>

<file context>
@@ -355,6 +356,97 @@ class SearchDoc(BaseModel):
+                ).document_id,
+                chunk_ind=chunk.chunk_id,
+                semantic_identifier=chunk.semantic_identifier or "Unknown",
+                link=chunk.source_links[0] if chunk.source_links else None,
+                blurb=chunk.blurb,
+                source_type=chunk.source_type,
</file context>

[internal] Confidence score: 9/10

[internal] Posted by: General AI Review Agent


@classmethod
def chunks_or_sections_to_search_docs(
cls,
items: "Sequence[InferenceChunk | InferenceSection] | None",
@cubic-dev-ai cubic-dev-ai bot Sep 24, 2025

Quoted type annotation reduces clarity and type tool effectiveness; remove quotes for consistency with the rest of the module.

    DEV MODE: This violation would have been filtered out by GPT-5.

Reasoning:
GPT-5: Stylistic only; quoted forward refs are used throughout the module and avoid forward-ref evaluation issues. No functional impact.

Prompt for AI agents
Address the following comment on backend/onyx/context/search/models.py at line 362:

<comment>Quoted type annotation reduces clarity and type tool effectiveness; remove quotes for consistency with the rest of the module.

        DEV MODE: This violation would have been filtered out by GPT-5.
Reasoning:
• **GPT-5**: Stylistic only; quoted forward refs are used throughout the module and avoid forward-ref evaluation issues. No functional impact.</comment>

<file context>
@@ -355,6 +356,97 @@ class SearchDoc(BaseModel):
+    @classmethod
+    def chunks_or_sections_to_search_docs(
+        cls,
+        items: "Sequence[InferenceChunk | InferenceSection] | None",
+    ) -> list["SearchDoc"]:
+        """Convert a sequence of InferenceChunk or InferenceSection objects to SearchDoc objects."""
</file context>

[internal] Confidence score: 8/10

[internal] Posted by: General AI Review Agent

