- 
                Notifications
    
You must be signed in to change notification settings  - Fork 2.8k
 
[GPU] Fix access dimension error in dynamic shape #32626
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[GPU] Fix access dimension error in dynamic shape #32626
Conversation
The regression was observed in the recent memory reset bugfix PR. The issue is get_dims() failure that occurs when calling the convolution feature in dynamic shape. Solution: The fix integrates the logic from the existing get_conv_channel_count() utility to correctly and safely determine the channel dimension of the convolution's input/output under dynamic condition, preventing the dimension access failure. Signed-off-by: hyunback <[email protected]>
| 
           Looks like this is a fix for the same problem #32601  | 
    
| // If the channel count is dynamic, we cannot verify feature alignment, | ||
| // so we conservatively skip the reset and return false for this condition. | ||
| if (in_channel_count == -1) | ||
| return false; | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the conservative behavior here is to reset memory for potential accuracy issue. Do we have a case where this makes difference in performance? If not, let's return true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, return true is correct. Applied
| // if layout is single blocked and feature size is not aligned with the blocking size, need to reset output so that we can guarantee zero-filling | ||
| // NOTE: We may improve this logic to avoid reset if we are sure that it is not "corrupted" by other layers. | ||
| if (output_layout.feature() % feature_block_size != 0) { | ||
| if (in_channel_count % feature_block_size != 0) { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you add test case in mem_reset_test.cpp?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Applied
Signed-off-by: hyunback <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
### Description of the issue(symptom, root-cause, how it was resolved) The regression was observed in the recent memory reset bugfix PR. The issue is get_dims() failure that occurs when calling the convolution feature in dynamic shape. Solution: The fix integrates the logic from the existing get_conv_channel_count() utility to correctly and safely determine the channel dimension of the convolution's input/output under dynamic condition, preventing the dimension access failure. #### Reproduction step and snapshot (if applicable. Do not attach for customer model) ##### E2E python tools/llm_bench/benchmark.py -m ov-share-05.sclab.intel.com/cv_bench_cache/latest_models_llm/qwen2-vl-7b-instruct/pytorch/ov/OV_FP16-4BIT_DEFAULT -d GPU.1 -mc 1 -ic 256 -n 3 -pf frameworks.ai.openvino.llm.prompts/32_1024/qwen2-vl-7b-instruct.jsonl ###### Benchmark_app benchmark_app -d GPU.1 --hint none -nireq 1 -niter 1 -m ov-share-05.sclab.intel.com/cv_bench_cache/latest_models_llm/qwen2-vl-7b-instruct/pytorch/ov/OV_FP16-4BIT_DEFAULT/openvino_vision_embeddings_model.xml -data_shape hidden_states[1,1176] #### Checklist - [x] Is it a proper fix? Yes - [x] Did you include test case for this fix, if necessary? - [x] Did you review existing test that can be extended to cover this scenario? Which test did you review? mem_reset_test.cpp ### Tickets: - *CVS-175613* --------- Signed-off-by: hyunback <[email protected]>
Description of the issue(symptom, root-cause, how it was resolved)
The regression was observed in the recent memory reset bugfix PR. The issue is get_dims() failure that occurs when calling the convolution feature in dynamic shape.
Solution:
The fix integrates the logic from the existing get_conv_channel_count() utility to correctly and safely determine the channel dimension of the convolution's input/output under dynamic condition, preventing the dimension access failure.
Reproduction step and snapshot (if applicable. Do not attach for customer model)
E2E
python tools/llm_bench/benchmark.py -m ov-share-05.sclab.intel.com/cv_bench_cache/latest_models_llm/qwen2-vl-7b-instruct/pytorch/ov/OV_FP16-4BIT_DEFAULT -d GPU.1 -mc 1 -ic 256 -n 3 -pf frameworks.ai.openvino.llm.prompts/32_1024/qwen2-vl-7b-instruct.jsonl
Benchmark_app
benchmark_app -d GPU.1 --hint none -nireq 1 -niter 1 -m ov-share-05.sclab.intel.com/cv_bench_cache/latest_models_llm/qwen2-vl-7b-instruct/pytorch/ov/OV_FP16-4BIT_DEFAULT/openvino_vision_embeddings_model.xml -data_shape hidden_states[1,1176]
Checklist
Tickets: