Merged
25 changes: 15 additions & 10 deletions apix/v1alpha2/inferencemodelrewrite_types.go
@@ -57,20 +57,25 @@ type InferenceModelRewriteSpec struct {
// If multiple InferenceModelRewrite resources target the same
// InferencePool, the controller will merge them based on precedence.
//
-// **Timestamp Wins:** If two rules from different rewrites all matches,
-// the rule from the *oldest*
-// InferenceModelRewrite resource (determined by
-// metadata.creationTimestamp) will be used.
+// Across all rules specified on applicable rewrites, precedence MUST be
+// given to the match having an "Exact" model match over a generic match
+// (a rule with an empty `matches` array).
+//
+// If ties still exist across multiple InferenceModelRewrite resources (e.g.
+// two rewrites both have an exact match for the same model), matching
+// precedence MUST be determined by the oldest resource based on
+// creation timestamp.
+//
+// If ties still exist within a single InferenceModelRewrite resource, the
+// FIRST matching rule (in list order) is used.
// +required
Comment on lines +60 to 71
Contributor Author

@zetxqx zetxqx Nov 19, 2025

@nirrozenbaum @ahg-g @kfswain

I've updated the precedence rules for conflicting matches to better align with the HTTPRoute specification in the Kubernetes Gateway API. https://github.com/kubernetes-sigs/gateway-api/blob/f24f3a61f398c65ab629da1843cb65fd5ec9419f/apis/v1/httproute_types.go#L148-L209

The new precedence order is:

  1. More specific wins: An Exact match always takes precedence over an All match (where the matches array is empty).
  2. Tie-Breaker (Oldest Resource): If the specificity of the rules is the same (a tie), the rule from the oldest InferenceModelRewrite resource (by creation timestamp) wins. Within a single resource, the first matching rule in list order wins.

This approach is more intuitive and simplifies the implementation of efficient RewriteRule fetching per request. Specifically, when we find an exact match, we no longer need to compare it against less specific, generic rules.
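
The precedence order described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `Rule`, `Rewrite`, `selectRule`, and `demoRewrites` are hypothetical, simplified stand-ins for the real API types and datastore lookup, and the behavior when two resources share an identical creation timestamp is an assumption (the first one seen is kept).

```go
package main

import (
	"fmt"
	"time"
)

// Rule is a hypothetical, simplified stand-in for InferenceModelRewriteRule.
// An empty ExactModels slice models a generic rule (empty `matches` array).
type Rule struct {
	ExactModels []string
	Target      string
}

// Rewrite is a hypothetical, simplified stand-in for InferenceModelRewrite.
type Rewrite struct {
	Name    string
	Created time.Time
	Rules   []Rule
}

// matchKind returns 2 for an exact match, 1 for a generic match, 0 for no match.
func matchKind(r Rule, model string) int {
	if len(r.ExactModels) == 0 {
		return 1
	}
	for _, m := range r.ExactModels {
		if m == model {
			return 2
		}
	}
	return 0
}

// selectRule applies the documented order: exact over generic, then the
// oldest resource, then the first matching rule in list order.
func selectRule(rewrites []Rewrite, model string) (Rule, bool) {
	var best Rule
	var bestCreated time.Time
	bestKind := 0
	for _, rw := range rewrites {
		for _, r := range rw.Rules {
			k := matchKind(r, model)
			if k == 0 {
				continue
			}
			// Strict comparisons keep the first rule seen on a full tie,
			// which preserves list order within a single resource.
			if k > bestKind || (k == bestKind && rw.Created.Before(bestCreated)) {
				best, bestKind, bestCreated = r, k, rw.Created
			}
		}
	}
	return best, bestKind > 0
}

// demoRewrites builds sample data: a newer resource with an exact rule and an
// older resource with a generic rule followed by an exact rule.
func demoRewrites() []Rewrite {
	t0 := time.Date(2025, 1, 1, 0, 0, 0, 0, time.UTC)
	return []Rewrite{
		{Name: "newer", Created: t0.Add(time.Hour), Rules: []Rule{
			{ExactModels: []string{"llama"}, Target: "llama-v2"},
		}},
		{Name: "older", Created: t0, Rules: []Rule{
			{Target: "fallback"},
			{ExactModels: []string{"llama"}, Target: "llama-v1"},
		}},
	}
}

func main() {
	r, _ := selectRule(demoRewrites(), "llama")
	fmt.Println(r.Target) // llama-v1: exact beats generic, older resource wins the tie
	r, _ = selectRule(demoRewrites(), "mistral")
	fmt.Println(r.Target) // fallback: only the generic rule matches
}
```

Because an exact match always outranks a generic one, a lookup that finds an exact match never needs to compare it against the generic rules, only against other exact matches.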

Collaborator

SG, this matches what we had with InferenceModel also.

Contributor

Great!

Rules []InferenceModelRewriteRule `json:"rules"`
}

// InferenceModelRewriteRule defines the match criteria and corresponding action.
//
-// A specific model name can only be matched by one rule across all
-// rules attached to the same InferencePool. If multiple rules attempt
-// to match the same model name, the oldest rule (by creationTimestamp)
-// will be the only one considered valid.
+// For details on how precedence is determined across multiple rules and
+// InferenceModelRewrite resources, see the "Precedence and Conflict Resolution"
+// section in InferenceModelRewriteSpec.
type InferenceModelRewriteRule struct {
// Matches defines the criteria for matching a request.
// If multiple match criteria are specified, a request matches if
@@ -87,7 +92,7 @@ type InferenceModelRewriteRule struct {
// +optional
// +kubebuilder:validation:MinItems=1
//
-Targets []TargetModel `json:"split,omitempty"`
+Targets []TargetModel `json:"targets,omitempty"`
Contributor Author

I found this was an oversight: the JSON tag on the Targets field was still "split", so I updated it to "targets".
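
To illustrate why the tag matters, here is a small sketch using only `encoding/json`. The types `TargetModel`, `oldRule`, and `newRule` are hypothetical, simplified stand-ins (the real `TargetModel` fields may differ). It shows that a manifest written with a `targets` key decodes to an empty list when the struct tag is `split`, because unknown keys are ignored by default.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TargetModel is a hypothetical, simplified stand-in for the API type.
type TargetModel struct {
	ModelRewrite string `json:"modelRewrite"`
	Weight       int32  `json:"weight,omitempty"`
}

// oldRule uses the tag before this change; newRule uses the corrected tag.
type oldRule struct {
	Targets []TargetModel `json:"split,omitempty"`
}

type newRule struct {
	Targets []TargetModel `json:"targets,omitempty"`
}

// countOld decodes the manifest with the old tag and returns the target count.
func countOld(manifest string) int {
	var r oldRule
	_ = json.Unmarshal([]byte(manifest), &r)
	return len(r.Targets)
}

// countNew decodes the manifest with the corrected tag and returns the target count.
func countNew(manifest string) int {
	var r newRule
	_ = json.Unmarshal([]byte(manifest), &r)
	return len(r.Targets)
}

func main() {
	manifest := `{"targets":[{"modelRewrite":"llama-v2","weight":100}]}`
	fmt.Println(countOld(manifest)) // 0: the "targets" key is silently ignored
	fmt.Println(countNew(manifest)) // 1
}
```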

}

// TargetModel defines a weighted model destination for traffic distribution.

2 changes: 1 addition & 1 deletion config/charts/inferencepool/templates/rbac.yaml
@@ -46,7 +46,7 @@ metadata:
{{- include "gateway-api-inference-extension.labels" . | nindent 4 }}
rules:
- apiGroups: ["inference.networking.x-k8s.io"]
-resources: ["inferenceobjectives"]
+resources: ["inferenceobjectives", "inferencemodelrewrites"]
verbs: ["get", "watch", "list"]
- apiGroups: ["{{ (split "/" .Values.inferencePool.apiVersion)._0 }}"]
resources: ["inferencepools"]
@@ -74,11 +74,9 @@ spec:
items:
description: |-
InferenceModelRewriteRule defines the match criteria and corresponding action.
-A specific model name can only be matched by one rule across all
-rules attached to the same InferencePool. If multiple rules attempt
-to match the same model name, the oldest rule (by creationTimestamp)
-will be the only one considered valid.
+For details on how precedence is determined across multiple rules and
+InferenceModelRewrite resources, see the "Precedence and Conflict Resolution"
+section in InferenceModelRewriteSpec.
properties:
matches:
items:
@@ -110,7 +108,7 @@ spec:
- model
type: object
type: array
-split:
+targets:
items:
description: TargetModel defines a weighted model destination
for traffic distribution.
23 changes: 14 additions & 9 deletions docs/proposals/1816-inferenceomodelrewrite/README.md
@@ -64,20 +64,25 @@ type InferenceModelRewriteSpec struct {
// If multiple InferenceModelRewrite resources target the same
// InferencePool, the controller will merge them based on precedence.
//
-// **Timestamp Wins:** If two rules from different rewrite all matches,
-// the rule from the *oldest*
-// InferenceModelRewrite resource (determined by
-// metadata.creationTimestamp) will be used.
+// Across all rules specified on applicable rewrites, precedence MUST be
+// given to the match having an "Exact" model match over a generic match
+// (a rule with an empty `matches` array).
+//
+// If ties still exist across multiple InferenceModelRewrite resources (e.g.
+// two rewrites both have an exact match for the same model), matching
+// precedence MUST be determined by the oldest resource based on
+// creation timestamp.
+//
+// If ties still exist within a single InferenceModelRewrite resource, the
+// FIRST matching rule (in list order) is used.
// +required
Rules []InferenceModelRewriteRule `json:"rules"`
}

// InferenceModelRewriteRule defines the match criteria and corresponding action.
//
-// A specific model name can only be matched by one rule across all
-// rewrites attached to the same InferencePool. If multiple rules attempt
-// to match the same model name, the oldest rule (by creationTimestamp)
-// will be the only one considered valid.
+// For details on how precedence is determined across multiple rules and
+// InferenceModelRewrite resources, see the "Precedence and Conflict Resolution"
+// section in InferenceModelRewriteSpec.
type InferenceModelRewriteRule struct {
// Matches defines the criteria for matching a request.
// If multiple match criteria are specified, a request matches if
92 changes: 92 additions & 0 deletions pkg/epp/controller/inferencemodelrewrite_reconciler.go
@@ -0,0 +1,92 @@
/*
Copyright 2025 The Kubernetes Authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

package controller

import (
"context"
"fmt"

"k8s.io/apimachinery/pkg/api/errors"
ctrl "sigs.k8s.io/controller-runtime"
"sigs.k8s.io/controller-runtime/pkg/client"
"sigs.k8s.io/controller-runtime/pkg/event"
"sigs.k8s.io/controller-runtime/pkg/log"
"sigs.k8s.io/controller-runtime/pkg/predicate"

"sigs.k8s.io/gateway-api-inference-extension/apix/v1alpha2"
"sigs.k8s.io/gateway-api-inference-extension/pkg/common"
"sigs.k8s.io/gateway-api-inference-extension/pkg/epp/datastore"
logutil "sigs.k8s.io/gateway-api-inference-extension/pkg/epp/util/logging"
)

type InferenceModelRewriteReconciler struct {
client.Reader
Datastore datastore.Datastore
PoolGKNN common.GKNN
}

func (c *InferenceModelRewriteReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
logger := log.FromContext(ctx).V(logutil.DEFAULT)
ctx = ctrl.LoggerInto(ctx, logger)

logger.Info("Reconciling InferenceModelRewrite")

infModelRewrite := &v1alpha2.InferenceModelRewrite{}
notFound := false
if err := c.Get(ctx, req.NamespacedName, infModelRewrite); err != nil {
if !errors.IsNotFound(err) {
return ctrl.Result{}, fmt.Errorf("unable to get InferenceModelRewrite - %w", err)
}
notFound = true
}

isDeleted := !infModelRewrite.DeletionTimestamp.IsZero()
isPoolRefMismatch := infModelRewrite.Spec.PoolRef == nil ||
infModelRewrite.Spec.PoolRef.Name != v1alpha2.ObjectName(c.PoolGKNN.Name) ||
infModelRewrite.Spec.PoolRef.Group != v1alpha2.Group(c.PoolGKNN.Group)

if notFound || isDeleted || isPoolRefMismatch {
// The InferenceModelRewrite was deleted or no longer references this pool.
c.Datastore.ModelRewriteDelete(req.NamespacedName)
return ctrl.Result{}, nil
}

// Add or update the InferenceModelRewrite in the datastore.
logger = logger.WithValues("poolRef", infModelRewrite.Spec.PoolRef)
c.Datastore.ModelRewriteSet(infModelRewrite)
logger.Info("Added/Updated InferenceModelRewrite")

return ctrl.Result{}, nil
}

func (c *InferenceModelRewriteReconciler) SetupWithManager(ctx context.Context, mgr ctrl.Manager) error {
return ctrl.NewControllerManagedBy(mgr).
For(&v1alpha2.InferenceModelRewrite{}).
WithEventFilter(predicate.Funcs{
CreateFunc: func(e event.CreateEvent) bool { return c.eventPredicate(e.Object.(*v1alpha2.InferenceModelRewrite)) },
UpdateFunc: func(e event.UpdateEvent) bool {
return c.eventPredicate(e.ObjectOld.(*v1alpha2.InferenceModelRewrite)) || c.eventPredicate(e.ObjectNew.(*v1alpha2.InferenceModelRewrite))
},
DeleteFunc: func(e event.DeleteEvent) bool { return c.eventPredicate(e.Object.(*v1alpha2.InferenceModelRewrite)) },
GenericFunc: func(e event.GenericEvent) bool { return c.eventPredicate(e.Object.(*v1alpha2.InferenceModelRewrite)) },
}).
Complete(c)
}

func (c *InferenceModelRewriteReconciler) eventPredicate(infModelRewrite *v1alpha2.InferenceModelRewrite) bool {
return infModelRewrite.Spec.PoolRef != nil && string(infModelRewrite.Spec.PoolRef.Name) == c.PoolGKNN.Name && string(infModelRewrite.Spec.PoolRef.Group) == c.PoolGKNN.Group
}
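
The event filter and the delete-or-set decision in the reconciler above can be summarized with a dependency-free sketch. It is a simplified illustration under stated assumptions: `PoolRef`, `Rewrite`, `refersTo`, `shouldReconcileUpdate`, and `decide` are hypothetical names, the `example.io` group is a placeholder, and `Deleting` stands in for a non-zero deletion timestamp. The real code operates on `v1alpha2.InferenceModelRewrite` and the datastore.

```go
package main

import "fmt"

// PoolRef and Rewrite are hypothetical, simplified stand-ins for the API types.
type PoolRef struct{ Group, Name string }

type Rewrite struct {
	PoolRef  *PoolRef
	Deleting bool // models a non-zero DeletionTimestamp
}

// refersTo mirrors the event predicate: the rewrite must reference this pool.
func refersTo(rw Rewrite, pool PoolRef) bool {
	return rw.PoolRef != nil && *rw.PoolRef == pool
}

// shouldReconcileUpdate admits an update if either the old or the new version
// references the pool, so a rewrite that moves away is still reconciled.
func shouldReconcileUpdate(oldRw, newRw Rewrite, pool PoolRef) bool {
	return refersTo(oldRw, pool) || refersTo(newRw, pool)
}

// decide returns "delete" when the object is missing, being deleted, or no
// longer references the pool; otherwise it returns "set".
func decide(found bool, rw Rewrite, pool PoolRef) string {
	if !found || rw.Deleting || !refersTo(rw, pool) {
		return "delete"
	}
	return "set"
}

func main() {
	pool := PoolRef{Group: "example.io", Name: "pool-a"}
	mine := Rewrite{PoolRef: &pool}
	moved := Rewrite{PoolRef: &PoolRef{Group: "example.io", Name: "pool-b"}}

	fmt.Println(shouldReconcileUpdate(mine, moved, pool)) // true
	fmt.Println(decide(true, moved, pool))                // delete
	fmt.Println(decide(true, mine, pool))                 // set
}
```

Checking both the old and the new object on update is what lets a rewrite that switches pools reach the reconciler, where the mismatch then removes it from the datastore.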