Skip to content

Conversation

@kesmeey
Copy link

@kesmeey kesmeey commented Nov 27, 2025

Motivation

NO.16功能模块 fastdeploy/input/ernie4_5_vl_processor/process.py 单测补充

image

develop 分支:覆盖率47%,Miss行数206(47-70, 169, 189-246, 249-294, 303-336, 371, 383, 395, 407, 413, 428, 430, 433-443, 448-453, 456-464, 497, 549, 587-602, 605-632, 639-671, 696->700, 697->696, 706-712, 718-720)

image

当前pr覆盖率94%,Miss行数19行

完成单测覆盖行数206-19=187

Modifications

add unittest tests/input/test_ernie4_5_vl_processor_process.py

Usage or Command

no need

Accuracy Tests

no need

Checklist

  • Add at least a tag in the PR title.
  • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
  • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code, run pre-commit before commit.
  • Add unit tests. Please write the reason in this PR if no unit tests.
  • Provide accuracy results.
  • [x ] If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

Copilot AI review requested due to automatic review settings November 27, 2025 03:50
@paddle-bot
Copy link

paddle-bot bot commented Nov 27, 2025

Thanks for your contribution!

@paddle-bot paddle-bot bot added the contributor External developers label Nov 27, 2025
Copilot finished reviewing on behalf of kesmeey November 27, 2025 03:53
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

本PR为FastDeploy项目中的fastdeploy/input/ernie4_5_vl_processor/process.py模块补充了全面的单元测试,这是Hackathon 9th Sprint的第16号任务。测试覆盖了DataProcessor类的主要功能,包括文本处理、图像处理、视频处理和多模态输入处理。

Key Changes

  • 新增了约1858行的综合单元测试,涵盖DataProcessor类的核心功能
  • 测试包括正常场景、边界情况和错误处理
  • 使用了Mock对象来隔离依赖,确保测试的独立性

You can also share your feedback on Copilot code review for a chance to win a $100 gift card. Take the survey.

with patch.object(self.processor, "update_processor_cache") as mock_update_cache:
outputs = self.processor.request2ids(request)
self.assertIn("input_ids", outputs)
# Should call update_processor_cache, as img1 is not in missing_idx
Copy link

Copilot AI Nov 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] Comment in Chinese. Please translate to English:

# Should call update_processor_cache, as img1 is not in missing_idx

Copilot uses AI. Check for mistakes.
Comment on lines 313 to 336
def test_text2ids_with_video_placeholder(self):
"""Test text conversion with video placeholder"""
text = "Hello <|video@placeholder|> world"
mock_frames = [Image.new("RGB", (224, 224)) for _ in range(4)]
with patch("fastdeploy.input.ernie4_5_vl_processor.process.read_video_decord") as mock_read:
mock_read.return_value = (None, {"duration": 2.0}, "test_path")
with patch("fastdeploy.input.ernie4_5_vl_processor.process.read_frames_decord") as mock_frames_read:
mock_frames_read.return_value = (
[np.array(f) for f in mock_frames],
None,
[0.0, 0.5, 1.0, 1.5],
)
with patch("fastdeploy.input.ernie4_5_vl_processor.process.render_frame_timestamp") as mock_render:
mock_render.side_effect = lambda img, ts: (
Image.fromarray(img) if isinstance(img, np.ndarray) else img
)
# Mock preprocess to return correct keys
self.mock_image_preprocessor.preprocess.return_value = {
"pixel_values_videos": np.random.rand(4, 256, 3 * 14 * 14).astype(np.float32),
"video_grid_thw": np.array([[4, 16, 16]]),
}
outputs = self.processor.text2ids(text, videos=["test_video.mp4"])
self.assertGreater(len(outputs["input_ids"]), 0)

Copy link

Copilot AI Nov 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test contains significant code duplication with other video tests. The pattern of patching read_video_decord, read_frames_decord, and render_frame_timestamp is repeated many times throughout the test file.

Consider extracting this into a helper method or fixture to improve maintainability:

@contextmanager
def mock_video_processing(self, num_frames=4, duration=2.0):
    """Context manager to mock video processing with standard defaults"""
    mock_frames = [Image.new("RGB", (224, 224)) for _ in range(num_frames)]
    with patch("fastdeploy.input.ernie4_5_vl_processor.process.read_video_decord") as mock_read, \
         patch("fastdeploy.input.ernie4_5_vl_processor.process.read_frames_decord") as mock_frames_read, \
         patch("fastdeploy.input.ernie4_5_vl_processor.process.render_frame_timestamp") as mock_render:
        
        mock_read.return_value = (None, {"duration": duration}, "test_path")
        mock_frames_read.return_value = (
            [np.array(f) for f in mock_frames],
            None,
            [i * duration / num_frames for i in range(num_frames)],
        )
        mock_render.side_effect = lambda img, ts: (
            Image.fromarray(img) if isinstance(img, np.ndarray) else img
        )
        yield

This would reduce the ~300+ lines of repeated mocking code.

Copilot uses AI. Check for mistakes.
self.processor._add_video(frames, outputs, None)
self.assertGreater(len(outputs["input_ids"]), 0)
self.assertGreater(len(outputs["images"]), 0)
# image_type_ids is extended with [1] * num_frames for each frame
Copy link

Copilot AI Nov 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] Comment in Chinese. Please translate to English:

# image_type_ids is extended with [1] * num_frames for each frame

Copilot uses AI. Check for mistakes.
Comment on lines 1451 to 1541
def test_set_video_frame_args_target_frames_positive_fps_error(self):
"""Test _set_video_frame_args with target_frames > 0 and fps >= 0 (line 514)"""
video_meta = {"duration": 10.0}
video_frame_args = {
"target_frames": 10,
"fps": 2, # Positive fps should raise error
"min_frames": 1,
"max_frames": 100,
}
with self.assertRaises(ValueError) as context:
self.processor._set_video_frame_args(video_frame_args, video_meta)
self.assertIn("fps must be negative", str(context.exception))

def test_set_video_frame_args_target_frames_below_min(self):
"""Test _set_video_frame_args with target_frames < min_frames (line 519)"""
video_meta = {"duration": 10.0}
video_frame_args = {
"target_frames": 5,
"fps": -1,
"min_frames": 10,
"max_frames": 100,
}
with self.assertRaises(ValueError) as context:
self.processor._set_video_frame_args(video_frame_args, video_meta)
self.assertIn("target_frames must be larger", str(context.exception))

def test_set_video_frame_args_target_frames_above_max(self):
"""Test _set_video_frame_args with target_frames > max_frames (line 523)"""
video_meta = {"duration": 10.0}
video_frame_args = {
"target_frames": 200,
"fps": -1,
"min_frames": 1,
"max_frames": 100,
}
with self.assertRaises(ValueError) as context:
self.processor._set_video_frame_args(video_frame_args, video_meta)
self.assertIn("target_frames must be smaller", str(context.exception))

def test_set_video_frame_args_fps_negative_without_target_frames(self):
"""Test _set_video_frame_args with fps < 0 and target_frames <= 0 (line 527)"""
video_meta = {"duration": 10.0}
video_frame_args = {
"target_frames": -1,
"fps": -1,
"min_frames": 1,
"max_frames": 100,
}
with self.assertRaises(ValueError) as context:
self.processor._set_video_frame_args(video_frame_args, video_meta)
self.assertIn("Must provide either positive target_fps", str(context.exception))

def test_set_video_frame_args_min_max_invalid(self):
"""Test _set_video_frame_args with min_frames > max_frames (line 535)"""
video_meta = {"duration": 10.0}
video_frame_args = {
"target_frames": -1,
"fps": 2,
"min_frames": 100,
"max_frames": 10,
}
with self.assertRaises(ValueError) as context:
self.processor._set_video_frame_args(video_frame_args, video_meta)
self.assertIn("min_frames must be smaller", str(context.exception))

def test_set_video_frame_args_frames_too_few_adjustment(self):
"""Test _set_video_frame_args when frames_to_extract < min_frames (line 538)"""
video_meta = {"duration": 1.0} # Short duration
video_frame_args = {
"target_frames": -1,
"fps": 1, # Will extract only 1 frame
"min_frames": 10, # But min is 10
"max_frames": 100,
}
result = self.processor._set_video_frame_args(video_frame_args, video_meta)
self.assertEqual(result["target_frames"], 10)
self.assertEqual(result["fps"], -1)

def test_set_video_frame_args_frames_too_many_adjustment(self):
"""Test _set_video_frame_args when frames_to_extract > max_frames (line 541)"""
video_meta = {"duration": 100.0} # Long duration
video_frame_args = {
"target_frames": -1,
"fps": 10, # Will extract 1000 frames
"min_frames": 1,
"max_frames": 100, # But max is 100
}
result = self.processor._set_video_frame_args(video_frame_args, video_meta)
self.assertEqual(result["target_frames"], 100)
self.assertEqual(result["fps"], -1)

Copy link

Copilot AI Nov 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test file structure shows significant duplication in test patterns. Multiple tests follow nearly identical patterns with only minor variations. Consider using parameterized tests to reduce duplication and improve maintainability.

For example, the video frame argument tests (test_set_video_frame_args_*) could be consolidated:

@parameterized.expand([
    ("target_frames_with_positive_fps", {"target_frames": 10, "fps": 2, ...}, ValueError, "fps must be negative"),
    ("target_frames_below_min", {"target_frames": 5, "fps": -1, ...}, ValueError, "must be larger"),
    ("target_frames_above_max", {"target_frames": 200, "fps": -1, ...}, ValueError, "must be smaller"),
    # ... more cases
])
def test_set_video_frame_args_validation(self, name, args, expected_error, expected_msg):
    """Test video frame argument validation with various error conditions"""
    with self.assertRaises(expected_error) as context:
        self.processor._set_video_frame_args(args, {"duration": 10.0})
    self.assertIn(expected_msg, str(context.exception))

This would reduce ~200 lines of similar test code while improving clarity.

Copilot uses AI. Check for mistakes.
self.assertIn("not found in processor cache", str(context.exception))

def test_request2ids_video_type(self):
"""Test request2ids handling video type (lines 293-295)"""
Copy link

Copilot AI Nov 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test docstring references specific line numbers (e.g., "lines 293-295"). This creates maintenance burden as these line numbers become outdated when source code changes. Use descriptive test names without line references:

def test_request2ids_video_type(self):
    """Test request2ids properly handles video content type in messages"""

This pattern appears in many tests throughout the file and should be corrected consistently.

Suggested change
"""Test request2ids handling video type (lines 293-295)"""
"""Test that request2ids properly handles messages containing video content type."""

Copilot uses AI. Check for mistakes.
Comment on lines 26 to 28
# Note: This test requires dependencies like paddleformers, paddle, etc.
# In CI environment, these should be available.
# For local testing without dependencies, you may need to install them or use CI.
Copy link

Copilot AI Nov 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] This comment mentions "paddleformers" but should be "paddlenformers" or similar package name based on the project structure. However, more importantly, this comment is too specific about implementation details. Consider making it more general:

# Note: This test requires dependencies like paddleformers, paddle, etc.
# Ensure these are available in the test environment.
Suggested change
# Note: This test requires dependencies like paddleformers, paddle, etc.
# In CI environment, these should be available.
# For local testing without dependencies, you may need to install them or use CI.
# Note: This test requires all necessary dependencies to be installed in the test environment.

Copilot uses AI. Check for mistakes.
"cur_position": 0,
}
mock_frames = [Image.new("RGB", (224, 224)) for _ in range(4)]
# Mock preprocess 返回正确的键
Copy link

Copilot AI Nov 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] Comment in Chinese mixed with English code. While the code works correctly, for consistency and maintainability, consider translating the comment to English:

# Mock preprocess to return correct keys

This keeps the codebase language consistent.

Suggested change
# Mock preprocess 返回正确的键
# Mock preprocess to return correct keys

Copilot uses AI. Check for mistakes.
],
}
]
# Mock get_processor_cache, as request2ids will check missing_hashes
Copy link

Copilot AI Nov 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] Comment in Chinese. For consistency with the rest of the English codebase, please translate:

# Mock get_processor_cache, as request2ids will check missing_hashes

Copilot uses AI. Check for mistakes.
pos_ids = self.processor._compute_3d_positions(t=2, h=32, w=32, start_idx=100)
gh = 32 // self.processor.spatial_conv_size
gw = 32 // self.processor.spatial_conv_size
t_eff = 2 // self.processor.temporal_conv_size if 2 != 1 else 1
Copy link

Copilot AI Nov 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comparison of constants; use 'True' or 'False' instead.

Suggested change
t_eff = 2 // self.processor.temporal_conv_size if 2 != 1 else 1
t_eff = t // self.processor.temporal_conv_size if t != 1 else 1

Copilot uses AI. Check for mistakes.
pos_ids = self.processor._compute_3d_positions(t=2, h=32, w=32, start_idx=100)
gh = 32 // self.processor.spatial_conv_size
gw = 32 // self.processor.spatial_conv_size
t_eff = 2 // self.processor.temporal_conv_size if 2 != 1 else 1
Copy link

Copilot AI Nov 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testing a constant will always give the same result.

Suggested change
t_eff = 2 // self.processor.temporal_conv_size if 2 != 1 else 1
t_eff = 2 // self.processor.temporal_conv_size

Copilot uses AI. Check for mistakes.
@codecov-commenter
Copy link

codecov-commenter commented Nov 27, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (develop@620d1da). Learn more about missing BASE report.

Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #5264   +/-   ##
==========================================
  Coverage           ?   59.74%           
==========================================
  Files              ?      325           
  Lines              ?    40220           
  Branches           ?     6089           
==========================================
  Hits               ?    24031           
  Misses             ?    14354           
  Partials           ?     1835           
Flag Coverage Δ
GPU 59.74% <ø> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@kesmeey
Copy link
Author

kesmeey commented Nov 27, 2025

image 当前ci中运行失败的测试用例并未本pr提交的测试用例

@kesmeey
Copy link
Author

kesmeey commented Dec 1, 2025

@luotao1 涛姐这样算过吗

@luotao1
Copy link
Collaborator

luotao1 commented Dec 1, 2025

CI过了后,不要频繁点merge develop,请耐心等待review即可。

@CSWYF3634076
Copy link
Collaborator

感谢贡献!
1.需要精简一下代码,可以对miss的行数针对性增加单测
2.增加和develop分支的miss行数对比,方便评⭐️

@CSWYF3634076 CSWYF3634076 self-assigned this Dec 4, 2025
@kesmeey
Copy link
Author

kesmeey commented Dec 4, 2025

感谢贡献! 1.需要精简一下代码,可以对miss的行数针对性增加单测 2.增加和develop分支的miss行数对比,方便评⭐️

已完成修改

@CSWYF3634076
Copy link
Collaborator

补充些review
1.多个text2ids是否可以合并,或者直接干掉,因为request2ids里面是调用了text2ids的
2.extract_mm_items有重复的调用,精简重复的
3.TestDataProcessor和TestDataProcessorTargetMethods有重复的test方法,精简重复的,或者这两个类是否可以合并成一个
4.整体刷一下看是否有重复覆盖

@kesmeey
Copy link
Author

kesmeey commented Dec 5, 2025

补充些review 1.多个text2ids是否可以合并,或者直接干掉,因为request2ids里面是调用了text2ids的 2.extract_mm_items有重复的调用,精简重复的 3.TestDataProcessor和TestDataProcessorTargetMethods有重复的test方法,精简重复的,或者这两个类是否可以合并成一个 4.整体刷一下看是否有重复覆盖

已经精简了一版

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants