
Conversation

@anmolgupt
Contributor

The changes in this PR support two main items:

  1. GPU memory is allocated based on the selected TensorRT profile, rather than based on the profile that consumes the most memory even when that profile is not selected.
  2. Creation of the profile 0 execution context is avoided when it is not required.

@dongfengy

For kUSER_MANAGED, the user (in this case the Triton server) would need to actually allocate a piece of device memory and pass it to the execution context. Do you have support for this behavior? If not, I would suggest only adding kSTATIC and kON_PROFILE_CHANGE.

removed user_managed option
@yinggeh yinggeh requested a review from rmccorm4 April 17, 2025 18:35
Comment on lines 1697 to 1699
// the first context creation. As currently triton supports one
// context per engine, in order to set the specified profile_index,
// another context is created and the previous context is destroyed.
Contributor

As currently triton supports one context per engine, in order to set the specified profile_index, another context is created and the previous context is destroyed.

Is the comment still valid? From the code, each profile_index holds a context.

    if (profile_index == 0) {
      res.first->second.context_ = std::move(default_trt_context);
    } else {
      res.first->second.context_.reset(engine_->createExecutionContext());
    }

Contributor Author

I tested on my end; the changes work for me.

@anmolgupt anmolgupt marked this pull request as ready for review April 17, 2025 21:19
@anmolgupt anmolgupt changed the title from "Support for allocating GPU memory based on the selected profile" to "feat: Support for allocating GPU memory based on the selected profile" Apr 17, 2025
anmolgupt and others added 2 commits April 17, 2025 15:26
@anmolgupt
Contributor Author

@yinggeh: New changes look good to me; I got the expected results on the models with these updates.

@yinggeh
Contributor

yinggeh commented Apr 18, 2025

Updated README.md

@yinggeh
Contributor

yinggeh commented Apr 18, 2025

LGTM. Thanks for your contribution.

@yinggeh yinggeh merged commit 9d7ba1d into triton-inference-server:main Apr 18, 2025
1 check passed

3 participants