
@ljluestc ljluestc commented Dec 1, 2025

Fix NodeSelector kubernetes.io/hostname Reliability Issue

Summary

Fixes #173 by replacing the unreliable kubernetes.io/hostname label-based node selection with a direct metadata.name field selector, ensuring trace jobs are always scheduled on the correct node. This matters particularly on AWS and other cloud providers, where the NodeName may be fully qualified while the hostname label is not.

Problem

The kubectl-trace tool was using the kubernetes.io/hostname label for node selection in trace jobs. This approach is unreliable because:

  1. AWS and Cloud Provider Issues: NodeName might be fully qualified (e.g., ip-10-0-1-123.ec2.internal) while the kubernetes.io/hostname label contains only the short hostname (e.g., ip-10-0-1-123)
  2. Label Inconsistencies: The hostname label may not match the actual node name, causing trace jobs to fail scheduling
  3. Configuration Drift: The hostname label can be modified independently of the node name

As a result, trace jobs can fail to schedule because the NodeSelector does not match any available node, leaving the job's pod stuck in Pending.

Solution

The fix replaces the kubernetes.io/hostname label-based node selection with direct node-name matching via a metadata.name field selector. Because the scheduler evaluates field selectors against the node object itself rather than its labels, trace jobs are scheduled on the correct node regardless of how the hostname label is configured.

Key Changes

  1. Job Creation: Updated to use MatchFields with metadata.name instead of MatchExpressions with kubernetes.io/hostname
  2. Node Target Resolution: Simplified to use node.Name directly instead of looking up hostname label
  3. Backward Compatibility: Enhanced jobHostname function to support both new and old approaches
