JAX-vLLM Offloading k8s AWS EKS #1798
base: main
yhtang
left a comment
Thanks for making this! Made some comments. Let me know what you think.
    # Configuration
    NAMESPACE="${NAMESPACE:-default}"
    JOBSET_NAME="jax-vllm-multinode"
We have these variables defined twice, in separate files. Is there any way we can provide a single source of truth to avoid unintentional errors (e.g. a value edited in one place but not the other)?
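One possible shape for this, as a minimal sketch: keep the definitions in a single shared file and have each script source it. The file name and location below are assumptions for illustration (a temp file stands in for a checked-in file such as a hypothetical `common.env`); only the two variables come from the diff above.

```shell
#!/bin/sh
# Sketch only: a temp file stands in for a shared, checked-in config file
# (hypothetical name: common.env) that every script would source.
CONFIG_FILE="$(mktemp)"

cat > "${CONFIG_FILE}" <<'EOF'
# Configuration (single source of truth)
NAMESPACE="${NAMESPACE:-default}"
JOBSET_NAME="jax-vllm-multinode"
EOF

# Each script loads the shared definitions instead of redefining them.
. "${CONFIG_FILE}"

echo "namespace=${NAMESPACE} jobset=${JOBSET_NAME}"

# Remove the stand-in file; a real shared file would stay in the repo.
rm -f "${CONFIG_FILE}"
```

With this layout a change to `JOBSET_NAME` is made once and picked up by every script that sources the file, while `NAMESPACE` can still be overridden from the environment.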
    effect: NoSchedule
    containers:
    - name: gateway-container
      image: 941377147396.dkr.ecr.us-east-1.amazonaws.com/sbosisio/jio:jax-k8s
Just a note that we need to turn this image URI into a placeholder and set it dynamically for each job in the final production workflow.
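A minimal sketch of what that could look like, assuming the manifest is treated as a template and rendered per job. The placeholder token `__GATEWAY_IMAGE__`, the helper `render_manifest`, and the example registry path are all hypothetical and not part of this PR.

```shell
#!/bin/sh
# Sketch only: substitute a per-job image URI into a manifest template.

render_manifest() {
  # $1: path to the template, $2: image URI to substitute.
  # Assumes the URI contains neither '|' nor '&', which sed would treat specially.
  sed "s|__GATEWAY_IMAGE__|$2|g" "$1"
}

# A temp file stands in for the checked-in manifest template.
TEMPLATE="$(mktemp)"
cat > "${TEMPLATE}" <<'EOF'
containers:
- name: gateway-container
  image: __GATEWAY_IMAGE__
EOF

# Hypothetical default; the production workflow would export GATEWAY_IMAGE per job.
GATEWAY_IMAGE="${GATEWAY_IMAGE:-registry.example.com/team/gateway:dev}"

RENDERED="$(render_manifest "${TEMPLATE}" "${GATEWAY_IMAGE}")"
printf '%s\n' "${RENDERED}"

rm -f "${TEMPLATE}"
```

The rendered text could then be piped to `kubectl apply -f -` rather than written back to disk, so the committed manifest never carries a concrete image URI.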
    echo "Gateway URL: ${GATEWAY_URL}"
    echo "Ray Head IP: ${RAY_HEAD_IP}"

    # 1. Wait for gateway to be ready
Noting that in the long run this could be integrated into the bridge.
@Steboss how long could the gap be between the start of the gateway pod and the start of the application (JAX/vLLM) pods? Can we make the launch of the application pods dependent on the gateway pod?
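Until the ordering is enforced at the scheduling level, a bounded retry loop is one way to keep the wait explicit rather than open-ended. The sketch below is an illustration, not the script in this PR: `wait_until` is a hypothetical helper, and the `/health` path in the usage comment is an assumed endpoint.

```shell
#!/bin/sh
# Sketch only: retry a probe command a bounded number of times.

wait_until() {
  # $1: max attempts, $2: seconds between attempts, remaining args: probe command.
  attempts="$1"
  delay="$2"
  shift 2

  i=1
  while [ "$i" -le "$attempts" ]; do
    # Succeed as soon as the probe command exits with status 0.
    if "$@"; then
      return 0
    fi
    # Sleep between attempts, but not after the final one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done

  # The probe never succeeded within the allowed attempts.
  return 1
}

# Example use in the application pod's entrypoint (endpoint path is an assumption):
#   wait_until 60 5 curl -fsS "${GATEWAY_URL}/health" || exit 1
```

The upper bound on the gap then becomes a visible parameter (60 attempts at 5 seconds in the example) instead of an implicit property of the loop. If the gateway pod exposes a readiness probe, `kubectl wait --for=condition=Ready pod -l <gateway-label> --timeout=300s` from the launcher would be an alternative, with the label selector depending on how the gateway pod is labeled.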
    #NCCL
    # - name: NCCL_DEBUG
    #   value: "INFO" # Change to WARN after debugging
    - name: NCCL_PROTO
NCCL's official guidelines are to avoid setting this variable explicitly whenever possible. Is this mandated by AWS?
No description provided.