
Conversation

@sinha7y (Collaborator) commented Jan 9, 2026

Changes

  1. VLM enrichment for ObjectDB labels: YOLO "person" → "person in white shirt" for precise navigation
  2. Object navigation with rich labels: Agent can navigate to specific objects like "person in white" or "red coffee mug"
  3. Environment awareness: New skill allows agent to list all detected objects when asked "what do you see?"
  4. New blueprint agentic_detection: Enables object detection navigation (can swap to agentic if we want everything in one)
  5. Fix RPC timeouts: Return lightweight dicts instead of full Object3D (120s → 0.2s)
  6. Fix SpatialMemory: Resolve TF timing issues and remove image fetching (120s → 0.15s)
  7. New skills: navigate_to_detected_object() and list_detected_objects()

@sinha7y requested a review from a team January 9, 2026 00:57
@greptile-apps bot left a comment

Greptile Overview

Greptile Summary

This PR adds VLM-enriched object detection for precise navigation and fixes performance issues in semantic navigation.

Key Changes

New Navigation Skills:

  • navigate_to_detected_object(): Navigate to objects detected by vision system using ObjectDB lookup
  • list_detected_objects(): Query and list all detected objects for situational awareness

VLM Enrichment:

  • ObjectDB now uses VLM to enrich YOLO labels (e.g., "person" → "person in white shirt")
  • VLM enrichment occurs on object creation and re-enriches every 10 detections
  • Fallback to YOLO labels when VLM fails or is disabled
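For illustration, a minimal sketch of that flow; the method, attribute, and model-API names here (`_enrich_with_vlm`, `vlm_model.query`, `obj.image`) are assumptions based on the review excerpts below, not the actual implementation, and the methods are shown without their enclosing class:

```python
def _enrich_with_vlm(self, obj: "Object3D") -> str:
    """Return a rich label for obj, degrading gracefully to the YOLO label."""
    # Disabled or unavailable VLM: keep the raw YOLO class name.
    if not self.enable_vlm_enrichment or self.vlm_model is None:
        return obj.yolo_label or "object"
    try:
        prompt = (
            f"Describe this {obj.yolo_label} in detail. Include color, "
            "appearance, and distinguishing features. Keep it concise."
        )
        rich_label = self.vlm_model.query(obj.image, prompt)
        return rich_label or obj.yolo_label or "object"
    except Exception:
        # Any VLM failure falls back to the YOLO label.
        return obj.yolo_label or "object"

def add_to_object(self, obj: "Object3D", detection) -> None:
    """Track another detection; refresh the rich label every 10th observation."""
    obj.detections += 1
    if obj.detections % 10 == 0:
        obj.name = self._enrich_with_vlm(obj)
```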

Performance Fixes:

  • RPC methods now return lightweight dicts instead of full Object3D objects (fixes 120s timeouts)
  • SpatialMemory no longer fetches images from visual memory during queries
  • Commented out verbose logging
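As a rough sketch of the lightweight-dict pattern (the field names are assumptions inferred from the lookup response shown in the sequence diagram below):

```python
def to_light_dict(obj) -> dict:
    # Serialize only the fields navigation needs; omitting images and
    # point clouds is what turns a 120s RPC into a ~0.2s one.
    return {
        "name": obj.name,          # VLM-enriched label
        "track_id": obj.track_id,
        "pos_x": obj.pose.position.x,
        "pos_y": obj.pose.position.y,
        "pos_z": obj.pose.position.z,
        "frame_id": obj.frame_id,
    }
```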

Frame ID Changes:

  • Changed TF frame from "map" to "world" across navigation and spatial perception modules

New Blueprint:

  • Added agentic_detection blueprint combining detection, spatial memory, and agent capabilities

Critical Issue

The Object3D initialization never sets yolo_label from detection.name, causing the entire VLM enrichment and object tracking system to fail with None values.

Confidence Score: 1/5

  • Critical bug will cause object detection and navigation to fail completely
  • Object3D.__init__ never initializes yolo_label from detection.name, leaving it as None. This breaks: (1) object tracking matching logic that compares yolo_label, (2) VLM enrichment that uses yolo_label in prompts and fallbacks, (3) object name assignment that falls back to yolo_label. The entire object detection navigation feature will fail.
  • dimos/perception/detection/moduleDB.py - Missing yolo_label initialization in Object3D.__init__

Important Files Changed

File Analysis

| Filename | Score | Overview |
| --- | --- | --- |
| dimos/perception/detection/moduleDB.py | 1/5 | Critical bug: Object3D.__init__ never initializes yolo_label from detection.name, causing None values throughout VLM enrichment and object tracking logic |
| dimos/agents/skills/navigation.py | 4/5 | Added navigate_to_detected_object and list_detected_objects skills. Changed frame_id from "map" to "world" throughout |
| dimos/perception/spatial_perception.py | 4/5 | Changed SpatialMemory base class to SkillModule, updated frame_id from "map" to "world", added 1-second sleep in start(), improved error handling |
| dimos/agents_deprecated/memory/spatial_vector_db.py | 5/5 | Commented out image fetching and logging to fix RPC timeouts (performance optimization) |
| dimos/robot/all_blueprints.py | 5/5 | Added new blueprint "unitree-go2-agentic-detection" to registry |
| dimos/robot/unitree_webrtc/unitree_go2_blueprints.py | 5/5 | Added agentic_detection blueprint combining detection, spatial_memory, utilization, llm_agent, and common agentic components |

Sequence Diagram

```mermaid
sequenceDiagram
    participant User
    participant Agent
    participant NavigationSkill
    participant ObjectDB
    participant VLM
    participant Navigator

    User->>Agent: "Navigate to person in white shirt"
    Agent->>NavigationSkill: navigate_to_detected_object("person in white shirt")
    NavigationSkill->>ObjectDB: lookup("person in white shirt")

    Note over ObjectDB: Search objects by name<br/>(matches VLM-enriched labels)

    alt Object found
        ObjectDB-->>NavigationSkill: [{"name": "person in white shirt", "pos_x": 1.2, "pos_y": 3.4, ...}]
        NavigationSkill->>Navigator: set_goal(PoseStamped)
        Navigator-->>NavigationSkill: Navigation complete
        NavigationSkill-->>Agent: "Successfully navigated to 'person in white shirt'"
    else Object not found
        ObjectDB-->>NavigationSkill: []
        NavigationSkill-->>Agent: "No detected object matching..."
    end

    Note over User,Navigator: Parallel: Detection & Enrichment Flow

    participant Camera
    participant Detection3D

    Camera->>Detection3D: Image + PointCloud
    Detection3D->>ObjectDB: Detection3DPC (name="person")

    alt New object
        ObjectDB->>ObjectDB: create_new_object()
        Note over ObjectDB: BUG: yolo_label not set!<br/>Should be: yolo_label = detection.name
        ObjectDB->>VLM: _enrich_with_vlm(obj)
        Note over VLM: Uses obj.yolo_label (None!)<br/>in prompt and fallbacks
        VLM-->>ObjectDB: rich_label
        ObjectDB->>ObjectDB: Store as Object3D<br/>(name=rich_label, yolo_label=None)
    else Existing object
        ObjectDB->>ObjectDB: add_to_object()
        alt Every 10 detections
            ObjectDB->>VLM: Re-enrich
            VLM-->>ObjectDB: Updated rich_label
        end
    end
```

Comment on lines 145 to 146

```python
        except Exception:
            return f"Error querying ObjectDB for '{object_name}'"
```

[P3] Bare except Exception: swallows the actual error. Consider logging the exception for debugging: except Exception as e: logger.error(f"Error querying ObjectDB: {e}") before returning the error message.
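Spelled out, the suggested pattern looks like this (a sketch, assuming a module-level logger already exists in the file):

```python
except Exception as e:
    logger.error(f"Error querying ObjectDB for '{object_name}': {e}")
    return f"Error querying ObjectDB for '{object_name}'"
```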

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

@greptile-apps bot commented Jan 9, 2026

Additional Comments (1)

dimos/perception/detection/moduleDB.py
[P0] yolo_label is never initialized in __init__, causing it to remain None. This breaks object tracking (line 242 matches on yolo_label), VLM enrichment (lines 217, 223, 225, 236 use yolo_label), and object creation (line 296 sets name = yolo_label). Add self.yolo_label = detection.name after line 69.
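The suggested fix, sketched against the constructor shape implied by the comment (parameter names are assumptions):

```python
class Object3D:
    def __init__(self, track_id=None, detection=None):
        self.track_id = track_id
        if detection is None:
            return
        # Suggested fix: carry the raw YOLO class name so tracking,
        # enrichment prompts, and name fallbacks never see None.
        self.yolo_label = detection.name
```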

@sinha7y closed this Jan 9, 2026
@sinha7y reopened this Jan 9, 2026
@greptile-apps bot left a comment

Greptile Overview

Greptile Summary

This PR introduces VLM-enriched object detection for navigation, enabling the robot to navigate to specific objects with rich descriptions (e.g., "person in white shirt" instead of just "person"). Key improvements include:

New Features:

  • VLM enrichment pipeline that enhances YOLO labels with detailed descriptions using Qwen VLM
  • Two new navigation skills: navigate_to_detected_object() for object-based navigation and list_detected_objects() for environment awareness
  • New agentic_detection blueprint combining detection, navigation, spatial memory, and agent capabilities
  • Lightweight RPC methods (lookup, get_all_detected_objects) that return dicts instead of full Object3D instances

Performance Fixes:

  • RPC timeout fix: returning lightweight dicts reduces response time from 120s to 0.2s
  • SpatialMemory improvements: TF timing fixes and image fetching removal reduce latency from 120s to 0.15s
  • Reduced logging verbosity in spatial memory components

Architecture Changes:

  • Frame reference migration: Changes from "map" frame to "world" frame across navigation and spatial perception
  • SpatialMemory now extends SkillModule instead of Module
  • Object tracking now separates YOLO labels (for matching) from VLM labels (for display)

The changes enable more precise object-based navigation by allowing agents to distinguish between similar objects based on visual characteristics.

Confidence Score: 1/5

  • Critical runtime error will occur when agent_encode() is called
  • The PR contains a P0 blocking bug in moduleDB.py line 311 where len(obj.detections) is called on an integer field. This will cause TypeError at runtime whenever the agent_encode() method is invoked. Additionally, there are P1 type safety issues with potential None returns that violate the function contract. While the VLM enrichment feature and performance improvements are valuable, the critical bug must be fixed before merging.
  • dimos/perception/detection/moduleDB.py requires immediate attention due to TypeError bug in agent_encode() method

Important Files Changed

File Analysis

| Filename | Score | Overview |
| --- | --- | --- |
| dimos/agents/skills/navigation.py | 3/5 | Adds two new navigation skills (navigate_to_detected_object, list_detected_objects) and changes frame references from "map" to "world". Contains typos in error messages. |
| dimos/perception/detection/moduleDB.py | 1/5 | Major changes: VLM enrichment for object labels, new RPC methods (lookup, get_all_detected_objects), and lightweight dict returns. Critical bug: len() called on int in agent_encode(). |
| dimos/perception/spatial_perception.py | 4/5 | Changes SpatialMemory to extend SkillModule, adds 1s startup delay, changes frame references from "map" to "world", reduces logging verbosity. Minor warning message inconsistency. |
| dimos/agents_deprecated/memory/spatial_vector_db.py | 4/5 | Comments out logging and image retrieval from visual memory to improve performance. |
| dimos/robot/unitree_webrtc/unitree_go2_blueprints.py | 5/5 | Adds new agentic_detection blueprint combining detection with navigation, spatial memory, and agent capabilities. |

Sequence Diagram

```mermaid
sequenceDiagram
    participant Agent
    participant NavigationSkill
    participant ObjectDB
    participant VLM as Qwen VLM
    participant Detection as YOLO Detection
    participant Nav as Navigation

    Detection->>ObjectDB: Detection3DPC (YOLO label)
    ObjectDB->>ObjectDB: Check closest_object()

    alt New Object
        ObjectDB->>VLM: _enrich_with_vlm(obj)
        VLM-->>ObjectDB: Rich description
        ObjectDB->>ObjectDB: Create Object3D<br/>(yolo_label + vlm_label)
    else Existing Object (every 10 detections)
        ObjectDB->>ObjectDB: Add to existing
        ObjectDB->>VLM: _enrich_with_vlm(obj)
        VLM-->>ObjectDB: Updated description
    end

    Agent->>NavigationSkill: "Navigate to person in white"
    NavigationSkill->>ObjectDB: lookup("person in white")
    ObjectDB-->>NavigationSkill: [dict with pos_x, pos_y, frame_id]
    NavigationSkill->>NavigationSkill: Create PoseStamped
    NavigationSkill->>Nav: _navigate_to(goal_pose)
    Nav-->>NavigationSkill: Success/Failure
    NavigationSkill-->>Agent: "Successfully navigated to..."

    Agent->>NavigationSkill: "What do you see?"
    NavigationSkill->>ObjectDB: get_all_detected_objects()
    ObjectDB-->>NavigationSkill: [list of dicts]
    NavigationSkill-->>Agent: "I've detected N objects..."
```

```diff
 # we need at least 3 detectieons to consider it a valid object
 # for this to be serious we need a ratio of detections within the window of observations
-if len(obj.detections) < 4:  # type: ignore[arg-type]
+if len(obj.detections) < 4:
```

[P0] obj.detections is an int, not a list. Calling len() on it will cause TypeError: object of type 'int' has no len() at runtime. Should be if obj.detections < 4: instead.

Suggested change

```diff
-if len(obj.detections) < 4:
+if obj.detections < 4:
```

Comment on lines 217 to 218

```python
        if not self.enable_vlm_enrichment or self.vlm_model is None:
            return obj.yolo_label  # Fall back to YOLO label
```

[P1] Function signature promises str return, but returns obj.yolo_label which is typed as str | None. If yolo_label is None (possible per field definition), this violates the type contract and could cause issues in callers expecting a string.
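One way to honor the `str` contract, as a sketch:

```python
if not self.enable_vlm_enrichment or self.vlm_model is None:
    return obj.yolo_label or "unknown object"  # never propagate None
```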

```python
            logger.warning(f"No image for {obj.track_id}, using YOLO label")
            return obj.yolo_label

        prompt = f"Describe this {obj.yolo_label} in detail. Include color, appearance, and distinguishing features. Keep it concise (under 10 words)."
```

[P2] If obj.yolo_label is None, the prompt becomes "Describe this None in detail...", which is semantically incorrect. Should handle None case or assert yolo_label is not None before this point.
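For example, a guard like this would keep the prompt sensible (sketch only):

```python
label = obj.yolo_label or "object"  # avoid "Describe this None..."
prompt = (
    f"Describe this {label} in detail. Include color, appearance, "
    "and distinguishing features. Keep it concise (under 10 words)."
)
```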

```python
            return

        if tf is None:
            logger.warning("No TF transform (map -> base_link) available")
```

[P3] Warning message says "(map -> base_link)" but code at line 223 actually requests "world" to "base_link" transform. Message should match the actual frame names.

Suggested change

```diff
-logger.warning("No TF transform (map -> base_link) available")
+logger.warning("No TF transform (world -> base_link) available")
```

@greptile-apps bot commented Jan 9, 2026

Additional Comments (2)

dimos/agents/skills/navigation.py
[P3] Typo: "Faild" should be "Failed"

```python
            return "Error: Failed to reach the tagged location."
```

dimos/agents/skills/navigation.py
[P3] Typo: "Successfuly" should be "Successfully"

```python
            f"Successfully arrived at location tagged '{robot_location.name}' from query '{query}'."
```

```python
        return f"Failed to reach '{object_name}'. Navigation was cancelled or failed."

    @skill()
    def list_detected_objects(self) -> str:
```
Contributor

This should be in the ObjectDBModule; otherwise, if you have a blueprint without that module, it will fail.


```python
        # Get objects
        try:
            get_all_rpc = self.get_rpc_calls("ObjectDBModule.get_all_detected_objects")
```
Contributor

Yeah, see here: this is a hardcoded ObjectDBModule check, which is not good practice; it's easier to just put this skill in there directly.
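Roughly, i.e. something like this sketch (decorator, base-class, and accessor names are assumptions from the surrounding excerpts):

```python
class ObjectDBModule(SkillModule):
    @skill()
    def list_detected_objects(self) -> str:
        # Local DB access: no hardcoded cross-module RPC lookup, and a
        # blueprint without this module simply doesn't expose the skill.
        objects = self.get_all_detected_objects()
        if not objects:
            return "No objects detected yet."
        lines = [f"I've detected {len(objects)} object(s):"]
        lines += [f"- {o['name']} ({o['detections']} detections)" for o in objects]
        return "\n".join(lines)
```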

```python
    agentic,
    MCPModule.blueprint(),
)
agentic_detection = autoconnect(
```
Contributor

good


```python
        return "\n".join(lines)

    def navigate_with_text(self, query: str) -> str:
```
Contributor

Suggested change

```diff
-    def navigate_with_text(self, query: str) -> str:
+    @skill()
+    def navigate_with_text(self, query: str) -> str:
```

…itance

- Add DetectedObjectInfo dataclass for passing object data to navigation (can move to dimos.msgs if needed)
- Improve lookup with hybrid substring/token matching
- Clean up skill logging
@sinha7y (Collaborator, Author) commented Jan 10, 2026

@greptile

@greptile-apps bot commented Jan 10, 2026

Greptile Overview

Greptile Summary

Overview

This PR adds VLM-enriched object detection for precise robot navigation. Objects detected by YOLO are enhanced with rich descriptions from a Vision Language Model (e.g., "person" → "person in white shirt"), enabling navigation to specific objects like "go to the red coffee mug" or "navigate to person in white."

Key Features:

  1. VLM Label Enrichment: ObjectDB automatically enriches YOLO labels using QwenVLM with detailed descriptions including color and appearance
  2. New Navigation Skill: navigate_to_detected_object() allows agents to navigate to detected objects by description
  3. Object Listing: list_detected_objects() skill lets agents report what they see
  4. Performance Fixes: Removed image fetching from RPC responses (120s → 0.2s) and fixed TF timing issues in SpatialMemory
  5. Frame ID Standardization: Changed from "map" to "world" frame_id consistently across navigation and spatial memory

Architecture:

  • ObjectDB enriches detections on creation and re-enriches every 10 detections
  • Navigation skill performs fuzzy label matching with token-based search
  • DetectedObjectInfo dataclass provides lightweight RPC responses instead of full Object3D
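A minimal sketch of what such hybrid substring/token matching can look like (the real `lookup()` implementation may differ):

```python
def matches(query: str, label: str) -> bool:
    """Hybrid match: exact substring first, then token overlap."""
    q, lbl = query.lower().strip(), label.lower()
    if q in lbl:  # "person in white" is a substring of "person in white shirt"
        return True
    q_tokens, lbl_tokens = set(q.split()), set(lbl.split())
    # Otherwise require most query tokens to appear in the label.
    return len(q_tokens & lbl_tokens) >= max(1, len(q_tokens) - 1)
```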

Critical Issues Found

🔴 Object Tracking Completely Broken (moduleDB.py)

The Object3D.__add__() method creates uninitialized objects, causing AttributeError when adding detections to existing objects. This breaks the entire object tracking system since objects can't accumulate detections over time.

🔴 Navigation Always Reports Failure (navigation.py)

Type mismatch in navigate_to_detected_object(): compares boolean return value to string "SUCCESS", making all navigation attempts report failure even when successful.

🔴 TypeError in Object Listing (moduleDB.py)

Calling len() on integer obj.detections will crash when agents try to list detected objects.

Performance Improvements

The RPC optimizations are well-executed:

  • Removed image fetching from query results (fixes 120s timeout)
  • Commented out verbose logging in spatial_vector_db
  • Return lightweight DetectedObjectInfo instead of full Object3D
  • Added TF initialization sleep in SpatialMemory to prevent timing issues

Confidence Score: 1/5

  • This PR has multiple critical bugs that will cause runtime failures and break core functionality
  • Score of 1/5 due to three critical logic errors: (1) Object3D.__add__ creates uninitialized objects causing AttributeError on every object update, completely breaking object tracking; (2) a type mismatch in navigate_to_detected_object makes navigation always report failure; (3) len() called on an integer will crash object listing. These are not edge cases - they affect the primary code paths and will fail immediately when the new features are used
  • Critical attention required: dimos/perception/detection/moduleDB.py (Object3D initialization) and dimos/agents/skills/navigation.py (type mismatch in navigation result checking)

Important Files Changed

File Analysis

| Filename | Score | Overview |
| --- | --- | --- |
| dimos/agents/skills/navigation.py | 2/5 | Added navigate_to_detected_object() skill with critical type mismatch bug: compares bool return value to string "SUCCESS", causing navigation to always report failure |
| dimos/perception/detection/moduleDB.py | 1/5 | Major issues: Object3D.__add__ creates uninitialized objects causing AttributeError; len() called on int; type mismatch in DetectedObjectInfo creation. VLM enrichment and lookup() implementation otherwise sound |
| dimos/perception/spatial_perception.py | 4/5 | Changed from "map" to "world" frame_id consistently. Minor: error message still references old "map" frame. Added SkillModule inheritance and improved TF timing with sleep |
| dimos/agents_deprecated/memory/spatial_vector_db.py | 5/5 | Performance optimization: commented out image fetching and verbose logging to reduce RPC timeouts. Changes are safe and beneficial |
| dimos/robot/all_blueprints.py | 5/5 | Added new blueprint "unitree-go2-agentic-detection" mapping. Simple configuration change with no logic issues |
| dimos/robot/unitree_webrtc/unitree_go2_blueprints.py | 5/5 | Added agentic_detection blueprint combining detection + spatial_memory + agent. Clean composition of existing modules with proper configuration |

Sequence Diagram

```mermaid
sequenceDiagram
    participant Agent as LLM Agent
    participant NavSkill as NavigationSkillContainer
    participant ObjectDB as ObjectDBModule
    participant VLM as QwenVlModel
    participant Detection as Detection3DModule
    participant Nav as NavigationInterface

    Note over Agent,Nav: VLM-Enriched Object Detection Navigation

    Detection->>ObjectDB: ImageDetections3DPC
    ObjectDB->>ObjectDB: add_detection(detection)
    ObjectDB->>ObjectDB: create_new_object(detection)
    ObjectDB->>VLM: query(image, prompt)
    VLM-->>ObjectDB: rich_label (e.g., "person in white shirt")
    ObjectDB->>ObjectDB: Store object with yolo_label + vlm_label

    Agent->>NavSkill: navigate_to_detected_object("person in white")
    NavSkill->>ObjectDB: lookup("person in white")
    ObjectDB->>ObjectDB: Match objects by name/label
    ObjectDB-->>NavSkill: [DetectedObjectInfo]
    NavSkill->>NavSkill: Create goal_pose from obj.pose
    NavSkill->>Nav: set_goal(goal_pose)

    loop Poll navigation state
        NavSkill->>Nav: get_state()
        Nav-->>NavSkill: NavigationState
    end

    NavSkill->>Nav: is_goal_reached()
    Nav-->>NavSkill: bool (success/failure)
    NavSkill-->>Agent: "Successfully navigated to 'person in white shirt'"

    Note over Agent,NavSkill: Alternative: List Detected Objects
    Agent->>ObjectDB: list_detected_objects()
    ObjectDB-->>Agent: "I've detected 5 object(s):\n- person in white shirt (12 detections)\n- red coffee mug (8 detections)"
```

@greptile-apps bot left a comment

5 files reviewed, 5 comments


```python
        result = self._navigate_to(goal_pose)

        if result == "SUCCESS":
```

Type mismatch: _navigate_to() returns bool (True/False), but this code compares it to the string "SUCCESS". This condition will always be False, causing navigation to always report failure even when successful.

The method should check if result: instead of if result == "SUCCESS":.

Suggested change

```diff
-        if result == "SUCCESS":
+        if result:
```

```diff
 # we need at least 3 detectieons to consider it a valid object
 # for this to be serious we need a ratio of detections within the window of observations
-if len(obj.detections) < 4:  # type: ignore[arg-type]
+if len(obj.detections) < 4:
```

obj.detections is an int, not a collection, so calling len() on it will raise TypeError: object of type 'int' has no len().

Suggested change

```diff
-if len(obj.detections) < 4:
+if obj.detections < 4:
```

"""Convert Object3D to DetectedObjectInfo."""
try:
return DetectedObjectInfo(
track_id=obj.track_id,

DetectedObjectInfo expects track_id: str (non-optional), but obj.track_id can be None (type is str | None from line 56). If track_id is None, this will cause a type error when creating the dataclass instance.

While the exception handler at line 366 will catch this and return None, objects with None track_ids should be filtered out before calling _to_detection_info() to avoid unnecessary exceptions.
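For instance (a sketch, assuming an iterable of tracked objects):

```python
infos = []
for obj in self.objects:
    if obj.track_id is None:
        continue  # skip instead of relying on the except handler
    info = self._to_detection_info(obj)
    if info is not None:
        infos.append(info)
```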

```python
            return

        if tf is None:
            logger.warning("No TF transform (map -> base_link) available")
```

Error message refers to "map -> base_link" but the code at line 223 actually queries "world -> base_link". The error message should be updated to match the actual frame IDs being used.

Suggested change

```diff
-logger.warning("No TF transform (map -> base_link) available")
+logger.warning("No TF transform (world -> base_link) available")
```

@greptile-apps bot commented Jan 10, 2026

Additional Comments (1)

dimos/perception/detection/moduleDB.py
Critical initialization bug in __add__ method: Creating Object3D(self.track_id) with detection=None causes __init__ to return early at line 78, leaving the new object completely uninitialized. All subsequent attribute assignments (lines 97-116) will fail because the object's attributes were never created.

This will cause AttributeError exceptions when trying to add detections to existing objects, breaking the entire object tracking system.

The __add__ method needs to either:

  1. Call super().__init__() to properly initialize the parent class, or
  2. Manually initialize all required attributes before the early return in __init__, or
  3. Restructure the initialization to avoid the early return pattern
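Option 2 could look roughly like this (attribute names assumed from the review comments):

```python
def __init__(self, track_id=None, detection=None):
    # Give every attribute a default before the early return so that
    # __add__'s Object3D(self.track_id) is never half-constructed.
    self.track_id = track_id
    self.yolo_label = None
    self.name = None
    self.pose = None
    self.detections = 0
    if detection is None:
        return  # __add__ fills in the merged state itself
    self.yolo_label = detection.name
    # ... remaining detection-based initialization ...
```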

```diff
 if not self._skill_started:
     raise ValueError(f"{self} has not been started.")
-tf = self.tf.get("map", "base_link", time_tolerance=2.0)
+tf = self.tf.get("world", "base_link", time_tolerance=2.0)
```
Contributor

This was a breaking change that I think was never tested or caught.

```python
    semantic map for a location matching the description.
    CALL THIS SKILL FOR ONE SUBJECT AT A TIME. For example: "Go to the person wearing a blue shirt in the living room",
    you should call this skill twice, once for the person wearing a blue shirt and once for the living room.
    def navigate_to_detected_object(self, object_name: str) -> str:
```
Contributor

object_name can't be a bare string; it needs a proper type.

```python
        if not self._skill_started:
            raise ValueError(f"{self} has not been started.")

        lookup_rpc = self.get_rpc_calls("ObjectDBModule.lookup")
```
Contributor

@paul-nechifor we need to fix this RPC setup with modules; it's super weird and hardcoded.


```python
        goal_pose = PoseStamped(
            position=obj.pose.position,
            orientation=Quaternion(),
```
Contributor

Let's calculate goal_pose via a separate method. This is a goal point, which is just a position. goal_pose requires /global_map, /odom, and the /goal_point; it should pick the closest point within some buffer (to account for the size of the robot) that is not in collision, and then point the right way using trig.
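As a sketch of that proposal (the `is_free` collision check is a hypothetical helper backed by /global_map):

```python
import math
from typing import Callable, Optional, Tuple

def compute_goal_pose(
    goal_x: float,
    goal_y: float,
    is_free: Callable[[float, float, float], bool],  # hypothetical map query
    robot_radius: float = 0.35,
    max_offset: float = 1.0,
    step: float = 0.1,
) -> Optional[Tuple[float, float, float]]:
    """Closest collision-free point within max_offset of the goal point,
    buffered by the robot's size, oriented toward the target via atan2."""
    r = 0.0
    while r <= max_offset:
        for deg in range(0, 360, 15):
            x = goal_x + r * math.cos(math.radians(deg))
            y = goal_y + r * math.sin(math.radians(deg))
            if is_free(x, y, robot_radius):
                # Face the original goal from the chosen standoff point.
                yaw = math.atan2(goal_y - y, goal_x - x) if r > 0 else 0.0
                return (x, y, yaw)
        r += step
    return None  # no reachable pose near the goal
```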

```python
        if success_msg:
            return success_msg

        logger.info(f"No tagged location found for {query}")
```
Contributor

revert

```python
        goal_pose = PoseStamped(
            position=make_vector3(*robot_location.position),
            orientation=Quaternion.from_euler(Vector3(*robot_location.rotation)),
            frame_id="map",
```
Contributor

Bug fix from the frame change.

```diff
 )

-logger.info(f"Added image vector {vector_id} with metadata: {metadata}")
+# logger.info(f"Added image vector {vector_id} with metadata: {metadata}")
```
Contributor

revert

```python
logger = setup_logger()

@dataclass
class DetectedObjectInfo:
```
Contributor

Use the Detection2D type, or extend it (maybe, but probably not) with something super minimal.

```python
        self._vlm_model = QwenVlModel()
        return self._vlm_model

    def _enrich_with_vlm(self, obj: Object3D) -> str:
```
Contributor

This can't be here. It needs to be in the VLM stuff in models/vl.

Contributor

read base.py
