@cubic-dev-local

## Description

https://linear.app/danswer/issue/DAN-2573/move-more-imports-onto-lazy-loading

So I have a script that I run to check memory usage for Celery workers.

Before this PR it's ~600MB per worker.

After, it's ~250MB for some workers.

Docker container memory: 4.3GB -> 2.819GiB.
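For reference, a rough sketch of the kind of check script I mean (hypothetical; assumes psutil is installed and that workers are identifiable from their command line):

```python
import psutil

# Report resident memory (RSS) per Celery worker process.
for proc in psutil.process_iter(["pid", "cmdline", "memory_info"]):
    cmdline = " ".join(proc.info["cmdline"] or [])
    if "celery" in cmdline and "worker" in cmdline:
        rss_mb = proc.info["memory_info"].rss / (1024 * 1024)
        print(f"pid={proc.info['pid']} rss={rss_mb:.0f}MB")
```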

Diagnosing why is not easy. It's not that all pip dependencies get loaded into memory at worker start (in which case I could just lazy-load any of them); it's specifically the ones that get imported at runtime due to an actual import statement.

This makes it very tricky to track down exactly what causes the 600MB. I literally had to trial-and-error through suspicious imports, tracing from the Celery worker main file.
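One way to narrow it down (a sketch of the approach, not the exact code I ran; the suspect module below is just an example): diff sys.modules around a single import, or run Python with -X importtime.

```python
import sys

before = set(sys.modules)

# Hypothetical suspect: any module on the worker's startup path.
import onyx.background.celery.tasks.docprocessing.utils  # noqa: E402, F401

heavy = ("litellm", "tiktoken", "langchain", "openai")
pulled_in = sorted(set(sys.modules) - before)
print(f"{len(pulled_in)} new modules loaded")
print([m for m in pulled_in if m.partition(".")[0] in heavy])
```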

TBH the existing repo dependency graph is a little scuffed. Here's one example chain that caused the worker to import LLM stuff (there are like a dozen of these I had to sift through to get all the memory offenders down):

app base
-> /Users/edwinluo/onyx/backend/onyx/background/celery/tasks/docprocessing/utils.py
-> redis connector
-> /Users/edwinluo/onyx/backend/onyx/redis/redis_connector_delete.py
-> /Users/edwinluo/onyx/backend/onyx/db/document.py
-> /Users/edwinluo/onyx/backend/onyx/db/feedback.py
-> /Users/edwinluo/onyx/backend/onyx/db/chat.py
-> /Users/edwinluo/onyx/backend/onyx/context/search/utils.py
-> /Users/edwinluo/onyx/backend/onyx/db/search_settings.py
-> /Users/edwinluo/onyx/backend/onyx/db/llm.py OR /Users/edwinluo/onyx/backend/onyx/natural_language_processing/search_nlp_models.py
-> /Users/edwinluo/onyx/backend/onyx/llm/utils.py (langchain, litellm, etc.)

## How Has This Been Tested?

[Describe the tests you ran to verify your changes]

## Backporting (check the box to trigger backport action)

Note: You have to check that the action passes; otherwise, resolve the conflicts manually and tag the patches.

  • This PR should be backported (make sure to check that the backport attempt succeeds)
  • [Optional] Override Linear Check

## Summary by cubic

Moves heavy imports to lazy-loading across indexing, LLM, NLP, and file-processing code to reduce worker memory and speed up startup. Also consolidates search doc conversion into SearchDoc and extracts PromptSnapshot to a shared schema. Addresses Linear DAN-2573 (Reduce Memory usage in Onyx).

  • Refactors

    • Lazy-load litellm, tiktoken, openai, markitdown, read_pdf_file, instantiate_connector, and run_indexing_pipeline within functions (see the sketch after this list).
    • Move chunks_or_sections_to_search_docs into SearchDoc as classmethods (plus from_inference_section/from_inference_chunk); remove the utils version.
    • Extract PromptSnapshot to onyx.chat.prompt_builder.schemas and update imports.
    • Redis connector delete: remove direct DB import to break import chain; per-document enqueue loop is currently disabled.
  • Migration

    • Replace chunks_or_sections_to_search_docs(...) with SearchDoc.chunks_or_sections_to_search_docs(...).
    • Import PromptSnapshot from onyx.chat.prompt_builder.schemas.
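
For illustration, the lazy-loading pattern is just deferring the heavy import from module level into the function body (a sketch; count_tokens is a made-up example, not code from this PR):

```python
def count_tokens(text: str, model: str = "gpt-4") -> int:
    # Imported on first call instead of at worker startup, so workers
    # that never tokenize never pay tiktoken's import-time memory cost.
    # Later calls hit sys.modules, so the repeated import is effectively free.
    import tiktoken

    encoding = tiktoken.encoding_for_model(model)
    return len(encoding.encode(text))
```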
---

Based on: onyx-dot-app/onyx#5478

@cubic-dev-ai cubic-dev-ai bot left a comment

5 issues found across 6 files

Prompt for AI agents (all 5 issues)

Understand the root cause of the following 5 issues and fix them.


<file name="backend/onyx/context/search/models.py">

<violation number="1" location="backend/onyx/context/search/models.py:360">
The core logic within `chunks_or_sections_to_search_docs` for converting `InferenceChunk` or `InferenceSection.center_chunk` to `SearchDoc` is a direct duplication of `SearchDoc.from_inference_chunk()`. This method should be refactored to reuse `from_inference_chunk` and `from_inference_section`.</violation>

<violation number="2" location="backend/onyx/context/search/models.py:362">
Quoted type annotation reduces clarity and type tool effectiveness; remove quotes for consistency with the rest of the module.

        DEV MODE: This violation would have been filtered out by GPT-5.
Reasoning:
• **GPT-5**: Stylistic only; quoted forward refs are used throughout the module and avoid forward-ref evaluation issues. No functional impact.</violation>

<violation number="3" location="backend/onyx/context/search/models.py:379">
Accessing source_links with [0] can raise KeyError; use .get(0) to safely retrieve the zero-index entry when present.</violation>

<violation number="4" location="backend/onyx/context/search/models.py:431">
Accessing source_links with [0] can raise KeyError; replace with .get(0) to avoid exceptions when key 0 is missing.</violation>
</file>

<file name="backend/onyx/server/query_and_chat/query_backend.py">

<violation number="1" location="backend/onyx/server/query_and_chat/query_backend.py:76">
This PR's goal of reducing memory via lazy-loading is undermined by `onyx/llm/utils.py`, which still eagerly imports heavy libraries like `litellm` and `tiktoken` at the top level. This file's import chain causes these libraries to be loaded on startup, negating the memory savings from other lazy-loading efforts.</violation>
</file>


is_internet: bool = False

@classmethod
def chunks_or_sections_to_search_docs(
@cubic-dev-ai cubic-dev-ai bot Sep 24, 2025

The core logic within chunks_or_sections_to_search_docs for converting InferenceChunk or InferenceSection.center_chunk to SearchDoc is a direct duplication of SearchDoc.from_inference_chunk(). This method should be refactored to reuse from_inference_chunk and from_inference_section.

Prompt for AI agents
Address the following comment on backend/onyx/context/search/models.py at line 360:

<comment>The core logic within `chunks_or_sections_to_search_docs` for converting `InferenceChunk` or `InferenceSection.center_chunk` to `SearchDoc` is a direct duplication of `SearchDoc.from_inference_chunk()`. This method should be refactored to reuse `from_inference_chunk` and `from_inference_section`.</comment>

<file context>
@@ -355,6 +356,97 @@ class SearchDoc(BaseModel):
     is_internet: bool = False
 
+    @classmethod
+    def chunks_or_sections_to_search_docs(
+        cls,
+        items: "Sequence[InferenceChunk | InferenceSection] | None",
</file context>

[internal] Confidence score: 9.5/10

[internal] Posted by: Duplicate Detection Agent
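
For what it's worth, the deduplicated version the bot is asking for could look roughly like this (a sketch; assumes from_inference_chunk and from_inference_section behave as described above):

```python
@classmethod
def chunks_or_sections_to_search_docs(
    cls,
    items: "Sequence[InferenceChunk | InferenceSection] | None",
) -> list["SearchDoc"]:
    if not items:
        return []
    return [
        # Delegate per-item conversion instead of duplicating field mapping.
        cls.from_inference_chunk(item)
        if isinstance(item, InferenceChunk)
        else cls.from_inference_section(item)
        for item in items
    ]
```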


matching_chunks = document_index.admin_retrieval(query=query, filters=final_filters)

- documents = chunks_or_sections_to_search_docs(matching_chunks)
+ documents = SearchDoc.chunks_or_sections_to_search_docs(matching_chunks)
@cubic-dev-ai cubic-dev-ai bot Sep 24, 2025

This PR's goal of reducing memory via lazy-loading is undermined by onyx/llm/utils.py, which still eagerly imports heavy libraries like litellm and tiktoken at the top level. This file's import chain causes these libraries to be loaded on startup, negating the memory savings from other lazy-loading efforts.

Prompt for AI agents
Address the following comment on backend/onyx/server/query_and_chat/query_backend.py at line 76:

<comment>This PR's goal of reducing memory via lazy-loading is undermined by `onyx/llm/utils.py`, which still eagerly imports heavy libraries like `litellm` and `tiktoken` at the top level. This file's import chain causes these libraries to be loaded on startup, negating the memory savings from other lazy-loading efforts.</comment>

<file context>
@@ -74,7 +73,7 @@ def admin_search(
     matching_chunks = document_index.admin_retrieval(query=query, filters=final_filters)
 
-    documents = chunks_or_sections_to_search_docs(matching_chunks)
+    documents = SearchDoc.chunks_or_sections_to_search_docs(matching_chunks)
 
     # Deduplicate documents by id
</file context>

[internal] Confidence score: 10/10

[internal] Posted by: System Design Agent


chunk_ind=inference_chunk.chunk_id,
semantic_identifier=inference_chunk.semantic_identifier or "Unknown",
link=(
inference_chunk.source_links[0]
@cubic-dev-ai cubic-dev-ai bot Sep 24, 2025

Accessing source_links with [0] can raise KeyError; replace with .get(0) to avoid exceptions when key 0 is missing.

Prompt for AI agents
Address the following comment on backend/onyx/context/search/models.py at line 431:

<comment>Accessing source_links with [0] can raise KeyError; replace with .get(0) to avoid exceptions when key 0 is missing.</comment>

<file context>
@@ -355,6 +356,97 @@ class SearchDoc(BaseModel):
+            chunk_ind=inference_chunk.chunk_id,
+            semantic_identifier=inference_chunk.semantic_identifier or "Unknown",
+            link=(
+                inference_chunk.source_links[0]
+                if inference_chunk.source_links
+                else None
</file context>

[internal] Confidence score: 9/10

[internal] Posted by: General AI Review Agent
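
The suggested change, assuming source_links is a dict keyed by integer offset (which is what the KeyError concern implies):

```python
link = (
    # .get(0) returns None instead of raising when key 0 is absent.
    inference_chunk.source_links.get(0)
    if inference_chunk.source_links
    else None
)
```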


).document_id,
chunk_ind=chunk.chunk_id,
semantic_identifier=chunk.semantic_identifier or "Unknown",
link=chunk.source_links[0] if chunk.source_links else None,
@cubic-dev-ai cubic-dev-ai bot Sep 24, 2025

Accessing source_links with [0] can raise KeyError; use .get(0) to safely retrieve the zero-index entry when present.

Prompt for AI agents
Address the following comment on backend/onyx/context/search/models.py at line 379:

<comment>Accessing source_links with [0] can raise KeyError; use .get(0) to safely retrieve the zero-index entry when present.</comment>

<file context>
@@ -355,6 +356,97 @@ class SearchDoc(BaseModel):
+                ).document_id,
+                chunk_ind=chunk.chunk_id,
+                semantic_identifier=chunk.semantic_identifier or "Unknown",
+                link=chunk.source_links[0] if chunk.source_links else None,
+                blurb=chunk.blurb,
+                source_type=chunk.source_type,
</file context>

[internal] Confidence score: 9/10

[internal] Posted by: General AI Review Agent


@classmethod
def chunks_or_sections_to_search_docs(
cls,
items: "Sequence[InferenceChunk | InferenceSection] | None",
@cubic-dev-ai cubic-dev-ai bot Sep 24, 2025

Quoted type annotation reduces clarity and type tool effectiveness; remove quotes for consistency with the rest of the module.

    DEV MODE: This violation would have been filtered out by GPT-5.

Reasoning:
GPT-5: Stylistic only; quoted forward refs are used throughout the module and avoid forward-ref evaluation issues. No functional impact.

Prompt for AI agents
Address the following comment on backend/onyx/context/search/models.py at line 362:

<comment>Quoted type annotation reduces clarity and type tool effectiveness; remove quotes for consistency with the rest of the module.

        DEV MODE: This violation would have been filtered out by GPT-5.
Reasoning:
• **GPT-5**: Stylistic only; quoted forward refs are used throughout the module and avoid forward-ref evaluation issues. No functional impact.</comment>

<file context>
@@ -355,6 +356,97 @@ class SearchDoc(BaseModel):
+    @classmethod
+    def chunks_or_sections_to_search_docs(
+        cls,
+        items: "Sequence[InferenceChunk | InferenceSection] | None",
+    ) -> list["SearchDoc"]:
+        """Convert a sequence of InferenceChunk or InferenceSection objects to SearchDoc objects."""
</file context>

[internal] Confidence score: 8/10

[internal] Posted by: General AI Review Agent

