[CI]【Hackathon 9th Sprint Example NO 16】功能模块 fastdeploy/input/ernie4_5_vl_processor/process.py 单测补充 #5264

kesmeey · 2025-11-27T03:50:35Z

Motivation

NO.16功能模块 fastdeploy/input/ernie4_5_vl_processor/process.py 单测补充

develop 分支：覆盖率47%，Miss行数206（47-70, 169, 189-246, 249-294, 303-336, 371, 383, 395, 407, 413, 428, 430, 433-443, 448-453, 456-464, 497, 549, 587-602, 605-632, 639-671, 696->700, 697->696, 706-712, 718-720）

当前pr覆盖率94%，Miss行数19行

完成单测覆盖行数206-19=187

Modifications

add unittest tests/input/test_ernie4_5_vl_processor_process.py

Usage or Command

no need

Accuracy Tests

no need

Checklist

Add at least a tag in the PR title.
Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
You can add new tags based on the PR content, but the semantics must be clear.
Format your code, run pre-commit before commit.
Add unit tests. Please write the reason in this PR if no unit tests.
Provide accuracy results.
[x ] If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

paddle-bot · 2025-11-27T03:50:42Z

Thanks for your contribution!

Copilot

Pull request overview

本PR为FastDeploy项目中的fastdeploy/input/ernie4_5_vl_processor/process.py模块补充了全面的单元测试，这是Hackathon 9th Sprint的第16号任务。测试覆盖了DataProcessor类的主要功能，包括文本处理、图像处理、视频处理和多模态输入处理。

Key Changes

新增了约1858行的综合单元测试，涵盖DataProcessor类的核心功能
测试包括正常场景、边界情况和错误处理
使用了Mock对象来隔离依赖，确保测试的独立性

You can also share your feedback on Copilot code review for a chance to win a $100 gift card. Take the survey.

Copilot · 2025-11-27T03:55:53Z

tests/input/test_process.py

+                    with patch.object(self.processor, "update_processor_cache") as mock_update_cache:
+                        outputs = self.processor.request2ids(request)
+                        self.assertIn("input_ids", outputs)
+                        # Should call update_processor_cache, as img1 is not in missing_idx


[nitpick] Comment in Chinese. Please translate to English:

# Should call update_processor_cache, as img1 is not in missing_idx

Copilot · 2025-11-27T03:55:53Z

tests/input/test_process.py

+    def test_text2ids_with_video_placeholder(self):
+        """Test text conversion with video placeholder"""
+        text = "Hello <|video@placeholder|> world"
+        mock_frames = [Image.new("RGB", (224, 224)) for _ in range(4)]
+        with patch("fastdeploy.input.ernie4_5_vl_processor.process.read_video_decord") as mock_read:
+            mock_read.return_value = (None, {"duration": 2.0}, "test_path")
+            with patch("fastdeploy.input.ernie4_5_vl_processor.process.read_frames_decord") as mock_frames_read:
+                mock_frames_read.return_value = (
+                    [np.array(f) for f in mock_frames],
+                    None,
+                    [0.0, 0.5, 1.0, 1.5],
+                )
+                with patch("fastdeploy.input.ernie4_5_vl_processor.process.render_frame_timestamp") as mock_render:
+                    mock_render.side_effect = lambda img, ts: (
+                        Image.fromarray(img) if isinstance(img, np.ndarray) else img
+                    )
+                    # Mock preprocess to return correct keys
+                    self.mock_image_preprocessor.preprocess.return_value = {
+                        "pixel_values_videos": np.random.rand(4, 256, 3 * 14 * 14).astype(np.float32),
+                        "video_grid_thw": np.array([[4, 16, 16]]),
+                    }
+                    outputs = self.processor.text2ids(text, videos=["test_video.mp4"])
+                    self.assertGreater(len(outputs["input_ids"]), 0)
+


This test contains significant code duplication with other video tests. The pattern of patching read_video_decord, read_frames_decord, and render_frame_timestamp is repeated many times throughout the test file.

Consider extracting this into a helper method or fixture to improve maintainability:

@contextmanager def mock_video_processing(self, num_frames=4, duration=2.0): """Context manager to mock video processing with standard defaults""" mock_frames = [Image.new("RGB", (224, 224)) for _ in range(num_frames)] with patch("fastdeploy.input.ernie4_5_vl_processor.process.read_video_decord") as mock_read, \ patch("fastdeploy.input.ernie4_5_vl_processor.process.read_frames_decord") as mock_frames_read, \ patch("fastdeploy.input.ernie4_5_vl_processor.process.render_frame_timestamp") as mock_render: mock_read.return_value = (None, {"duration": duration}, "test_path") mock_frames_read.return_value = ( [np.array(f) for f in mock_frames], None, [i * duration / num_frames for i in range(num_frames)], ) mock_render.side_effect = lambda img, ts: ( Image.fromarray(img) if isinstance(img, np.ndarray) else img ) yield

This would reduce the ~300+ lines of repeated mocking code.

Copilot · 2025-11-27T03:55:53Z

tests/input/test_process.py

+        self.processor._add_video(frames, outputs, None)
+        self.assertGreater(len(outputs["input_ids"]), 0)
+        self.assertGreater(len(outputs["images"]), 0)
+        # image_type_ids is extended with [1] * num_frames for each frame


[nitpick] Comment in Chinese. Please translate to English:

# image_type_ids is extended with [1] * num_frames for each frame

Copilot · 2025-11-27T03:55:53Z

tests/input/test_process.py

+    def test_set_video_frame_args_target_frames_positive_fps_error(self):
+        """Test _set_video_frame_args with target_frames > 0 and fps >= 0 (line 514)"""
+        video_meta = {"duration": 10.0}
+        video_frame_args = {
+            "target_frames": 10,
+            "fps": 2,  # Positive fps should raise error
+            "min_frames": 1,
+            "max_frames": 100,
+        }
+        with self.assertRaises(ValueError) as context:
+            self.processor._set_video_frame_args(video_frame_args, video_meta)
+        self.assertIn("fps must be negative", str(context.exception))
+
+    def test_set_video_frame_args_target_frames_below_min(self):
+        """Test _set_video_frame_args with target_frames < min_frames (line 519)"""
+        video_meta = {"duration": 10.0}
+        video_frame_args = {
+            "target_frames": 5,
+            "fps": -1,
+            "min_frames": 10,
+            "max_frames": 100,
+        }
+        with self.assertRaises(ValueError) as context:
+            self.processor._set_video_frame_args(video_frame_args, video_meta)
+        self.assertIn("target_frames must be larger", str(context.exception))
+
+    def test_set_video_frame_args_target_frames_above_max(self):
+        """Test _set_video_frame_args with target_frames > max_frames (line 523)"""
+        video_meta = {"duration": 10.0}
+        video_frame_args = {
+            "target_frames": 200,
+            "fps": -1,
+            "min_frames": 1,
+            "max_frames": 100,
+        }
+        with self.assertRaises(ValueError) as context:
+            self.processor._set_video_frame_args(video_frame_args, video_meta)
+        self.assertIn("target_frames must be smaller", str(context.exception))
+
+    def test_set_video_frame_args_fps_negative_without_target_frames(self):
+        """Test _set_video_frame_args with fps < 0 and target_frames <= 0 (line 527)"""
+        video_meta = {"duration": 10.0}
+        video_frame_args = {
+            "target_frames": -1,
+            "fps": -1,
+            "min_frames": 1,
+            "max_frames": 100,
+        }
+        with self.assertRaises(ValueError) as context:
+            self.processor._set_video_frame_args(video_frame_args, video_meta)
+        self.assertIn("Must provide either positive target_fps", str(context.exception))
+
+    def test_set_video_frame_args_min_max_invalid(self):
+        """Test _set_video_frame_args with min_frames > max_frames (line 535)"""
+        video_meta = {"duration": 10.0}
+        video_frame_args = {
+            "target_frames": -1,
+            "fps": 2,
+            "min_frames": 100,
+            "max_frames": 10,
+        }
+        with self.assertRaises(ValueError) as context:
+            self.processor._set_video_frame_args(video_frame_args, video_meta)
+        self.assertIn("min_frames must be smaller", str(context.exception))
+
+    def test_set_video_frame_args_frames_too_few_adjustment(self):
+        """Test _set_video_frame_args when frames_to_extract < min_frames (line 538)"""
+        video_meta = {"duration": 1.0}  # Short duration
+        video_frame_args = {
+            "target_frames": -1,
+            "fps": 1,  # Will extract only 1 frame
+            "min_frames": 10,  # But min is 10
+            "max_frames": 100,
+        }
+        result = self.processor._set_video_frame_args(video_frame_args, video_meta)
+        self.assertEqual(result["target_frames"], 10)
+        self.assertEqual(result["fps"], -1)
+
+    def test_set_video_frame_args_frames_too_many_adjustment(self):
+        """Test _set_video_frame_args when frames_to_extract > max_frames (line 541)"""
+        video_meta = {"duration": 100.0}  # Long duration
+        video_frame_args = {
+            "target_frames": -1,
+            "fps": 10,  # Will extract 1000 frames
+            "min_frames": 1,
+            "max_frames": 100,  # But max is 100
+        }
+        result = self.processor._set_video_frame_args(video_frame_args, video_meta)
+        self.assertEqual(result["target_frames"], 100)
+        self.assertEqual(result["fps"], -1)
+


The test file structure shows significant duplication in test patterns. Multiple tests follow nearly identical patterns with only minor variations. Consider using parameterized tests to reduce duplication and improve maintainability.

For example, the video frame argument tests (test_set_video_frame_args_*) could be consolidated:

@parameterized.expand([ ("target_frames_with_positive_fps", {"target_frames": 10, "fps": 2, ...}, ValueError, "fps must be negative"), ("target_frames_below_min", {"target_frames": 5, "fps": -1, ...}, ValueError, "must be larger"), ("target_frames_above_max", {"target_frames": 200, "fps": -1, ...}, ValueError, "must be smaller"), # ... more cases ]) def test_set_video_frame_args_validation(self, name, args, expected_error, expected_msg): """Test video frame argument validation with various error conditions""" with self.assertRaises(expected_error) as context: self.processor._set_video_frame_args(args, {"duration": 10.0}) self.assertIn(expected_msg, str(context.exception))

This would reduce ~200 lines of similar test code while improving clarity.

Copilot · 2025-11-27T03:55:54Z

tests/input/test_process.py

+                    self.assertIn("not found in processor cache", str(context.exception))
+
+    def test_request2ids_video_type(self):
+        """Test request2ids handling video type (lines 293-295)"""


Test docstring references specific line numbers (e.g., "lines 293-295"). This creates maintenance burden as these line numbers become outdated when source code changes. Use descriptive test names without line references:

def test_request2ids_video_type(self): """Test request2ids properly handles video content type in messages"""

This pattern appears in many tests throughout the file and should be corrected consistently.

Suggested change

"""Test request2ids handling video type (lines 293-295)"""

"""Test that request2ids properly handles messages containing video content type."""

Copilot · 2025-11-27T03:55:56Z

tests/input/test_process.py

+# Note: This test requires dependencies like paddleformers, paddle, etc.
+# In CI environment, these should be available.
+# For local testing without dependencies, you may need to install them or use CI.


[nitpick] This comment mentions "paddleformers" but should be "paddlenformers" or similar package name based on the project structure. However, more importantly, this comment is too specific about implementation details. Consider making it more general:

# Note: This test requires dependencies like paddleformers, paddle, etc. # Ensure these are available in the test environment.

Suggested change

# Note: This test requires dependencies like paddleformers, paddle, etc.

# In CI environment, these should be available.

# For local testing without dependencies, you may need to install them or use CI.

# Note: This test requires all necessary dependencies to be installed in the test environment.

Copilot · 2025-11-27T03:55:57Z

tests/input/test_process.py

+            "cur_position": 0,
+        }
+        mock_frames = [Image.new("RGB", (224, 224)) for _ in range(4)]
+        # Mock preprocess 返回正确的键


[nitpick] Comment in Chinese mixed with English code. While the code works correctly, for consistency and maintainability, consider translating the comment to English:

# Mock preprocess to return correct keys

This keeps the codebase language consistent.

Suggested change

# Mock preprocess 返回正确的键

# Mock preprocess to return correct keys

Copilot · 2025-11-27T03:55:57Z

tests/input/test_process.py

+                        ],
+                    }
+                ]
+                # Mock get_processor_cache, as request2ids will check missing_hashes


[nitpick] Comment in Chinese. For consistency with the rest of the English codebase, please translate:

# Mock get_processor_cache, as request2ids will check missing_hashes

Copilot · 2025-11-27T03:55:57Z

tests/input/test_process.py

+        pos_ids = self.processor._compute_3d_positions(t=2, h=32, w=32, start_idx=100)
+        gh = 32 // self.processor.spatial_conv_size
+        gw = 32 // self.processor.spatial_conv_size
+        t_eff = 2 // self.processor.temporal_conv_size if 2 != 1 else 1


Comparison of constants; use 'True' or 'False' instead.

Suggested change

t_eff = 2 // self.processor.temporal_conv_size if 2 != 1 else 1

t_eff = t // self.processor.temporal_conv_size if t != 1 else 1

Copilot · 2025-11-27T03:55:58Z

tests/input/test_process.py

+        pos_ids = self.processor._compute_3d_positions(t=2, h=32, w=32, start_idx=100)
+        gh = 32 // self.processor.spatial_conv_size
+        gw = 32 // self.processor.spatial_conv_size
+        t_eff = 2 // self.processor.temporal_conv_size if 2 != 1 else 1


Testing a constant will always give the same result.

Suggested change

t_eff = 2 // self.processor.temporal_conv_size if 2 != 1 else 1

t_eff = 2 // self.processor.temporal_conv_size

codecov-commenter · 2025-11-27T09:03:35Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (develop@620d1da). Learn more about missing BASE report.

Additional details and impacted files

@@            Coverage Diff             @@
##             develop    #5264   +/-   ##
==========================================
  Coverage           ?   59.74%           
==========================================
  Files              ?      325           
  Lines              ?    40220           
  Branches           ?     6089           
==========================================
  Hits               ?    24031           
  Misses             ?    14354           
  Partials           ?     1835

Flag	Coverage Δ
GPU	`59.74% <ø> (?)`

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

kesmeey · 2025-11-27T11:36:17Z

当前ci中运行失败的测试用例并未本pr提交的测试用例

kesmeey · 2025-12-01T06:49:13Z

@luotao1 涛姐这样算过吗

luotao1 · 2025-12-01T06:54:44Z

CI过了后，不要频繁点merge develop，请耐心等待review即可。

CSWYF3634076 · 2025-12-04T08:27:35Z

感谢贡献！
1.需要精简一下代码，可以对miss的行数针对性增加单测
2.增加和develop分支的miss行数对比，方便评⭐️

kesmeey · 2025-12-04T12:23:58Z

感谢贡献！ 1.需要精简一下代码，可以对miss的行数针对性增加单测 2.增加和develop分支的miss行数对比，方便评⭐️

已完成修改

CSWYF3634076 · 2025-12-05T08:17:38Z

补充些review
1.多个text2ids是否可以合并，或者直接干掉，因为request2ids里面是调用了text2ids的
2.extract_mm_items有重复的调用，精简重复的
3.TestDataProcessor和TestDataProcessorTargetMethods有重复的test方法，精简重复的，或者这两个类是否可以合并成一个
4.整体刷一下看是否有重复覆盖

kesmeey · 2025-12-05T09:59:31Z

补充些review 1.多个text2ids是否可以合并，或者直接干掉，因为request2ids里面是调用了text2ids的 2.extract_mm_items有重复的调用，精简重复的 3.TestDataProcessor和TestDataProcessorTargetMethods有重复的test方法，精简重复的，或者这两个类是否可以合并成一个 4.整体刷一下看是否有重复覆盖

已经精简了一版

test: add unit tests for process.py (NO.16)

837d0c2

Copilot AI review requested due to automatic review settings November 27, 2025 03:50

paddle-bot bot added the contributor External developers label Nov 27, 2025

Copilot started reviewing on behalf of kesmeey November 27, 2025 03:50 View session

Copilot finished reviewing on behalf of kesmeey November 27, 2025 03:53

Copilot AI reviewed Nov 27, 2025

View reviewed changes

luotao1 added the PaddlePaddle Hackathon label Nov 27, 2025

luotao1 assigned luotao1 and YuanRisheng Nov 27, 2025

kesmeey added 2 commits November 28, 2025 10:13

Merge branch 'develop' into test/no16-process

0004557

Merge branch 'develop' into test/no16-process

5c3fc96

CSWYF3634076 self-assigned this Dec 4, 2025

kesmeey added 2 commits December 4, 2025 19:14

update

e5fd7e0

update filename

0f859f2

kesmeey and others added 5 commits December 4, 2025 20:31

update filename

115308d

Merge branch 'develop' into test/no16-process

f394704

update

afd87d5

update

f8e6347

fix failed testcases

8fa368e

kesmeey force-pushed the test/no16-process branch from 68a3c77 to 69f919e Compare December 5, 2025 09:56

simplify the code

26ca5a8

kesmeey force-pushed the test/no16-process branch from 69f919e to 26ca5a8 Compare December 5, 2025 16:27

	"""Test request2ids handling video type (lines 293-295)"""
	"""Test that request2ids properly handles messages containing video content type."""

	# Mock preprocess 返回正确的键
	# Mock preprocess to return correct keys

	t_eff = 2 // self.processor.temporal_conv_size if 2 != 1 else 1
	t_eff = t // self.processor.temporal_conv_size if t != 1 else 1

[CI]【Hackathon 9th Sprint Example NO 16】功能模块 fastdeploy/input/ernie4_5_vl_processor/process.py 单测补充 #5264

Are you sure you want to change the base?

[CI]【Hackathon 9th Sprint Example NO 16】功能模块 fastdeploy/input/ernie4_5_vl_processor/process.py 单测补充 #5264

Uh oh!

Conversation

kesmeey commented Nov 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Motivation

Modifications

Usage or Command

Accuracy Tests

Checklist

Uh oh!

paddle-bot bot commented Nov 27, 2025

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Key Changes

Uh oh!

Copilot AI Nov 27, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 27, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 27, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 27, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 27, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 27, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 27, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 27, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 27, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 27, 2025

Choose a reason for hiding this comment

Uh oh!

codecov-commenter commented Nov 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

kesmeey commented Nov 27, 2025

Uh oh!

kesmeey commented Dec 1, 2025

Uh oh!

luotao1 commented Dec 1, 2025

Uh oh!

CSWYF3634076 commented Dec 4, 2025

Uh oh!

kesmeey commented Dec 4, 2025

Uh oh!

CSWYF3634076 commented Dec 5, 2025

Uh oh!

kesmeey commented Dec 5, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

kesmeey commented Nov 27, 2025 •

edited

Loading

codecov-commenter commented Nov 27, 2025 •

edited

Loading