
Update README with token match rate on text backbone #51

Open · sdeeptan-aws wants to merge 1 commit into aws-neuron:main from sdeeptan-aws:ovis

Conversation

@sdeeptan-aws (Contributor)

Description

Updated the Ovis2.5-9B contrib model README to record a 100% token match rate on the text backbone. Ovis2.5 is a vision-language model, and AutoModelForCausalLM does not work for multimodal models, so the specific text backbone class must be used to load the HF reference for token matching (a sketch follows below). With the correct text backbone extraction, the model achieves a 100% token match. Vision/image modalities are implemented but validated for text only.
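For illustration, a minimal sketch of the backbone-extraction step described above. The checkpoint id and the `llm` attribute name are assumptions, not taken from this PR; check the model's own modeling code for the real layout.

```python
# Hedged sketch: loading the HF reference for a multimodal checkpoint.
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "AIDC-AI/Ovis2.5-9B"  # assumed checkpoint id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)

# AutoModelForCausalLM raises for architectures registered as multimodal,
# so load the full model generically and pull out the text backbone.
full_model = AutoModel.from_pretrained(MODEL_ID, trust_remote_code=True)
text_backbone = full_model.llm  # assumed attribute name
```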

Model Information

Model Name: Ovis2.5-9B
Model Architecture: Multimodal vision-language model (decoder-only transformer text backbone)
Purpose: Vision-language understanding and text generation

Checklist

Required Components

  • Accuracy Test (test/integration/test_model.py)
    • Validates model generation and coherence
    • Performance benchmarks (TTFT, throughput)
    • Test can compile and run the model on Neuron
  • README.md with the following sections:
    • Usage Example: Clear code example showing how to use the model
    • Compatibility Matrix: Table showing tested Neuron SDK versions and instance types
    • Example Checkpoints: Links to compatible model checkpoints
    • Testing Instructions: Command to run the test suite for the model
  • Source Code (src/)
    • Modeling code following NxD Inference patterns (unchanged in this PR)

Optional Components

  • Unit Tests (CPU or Neuron-based)

Folder Structure

```
/contrib/models/Ovis2.5-9B/
  README.md
  /src
    modeling_ovis2_5.py
  /test
    /integration
      test_model.py
```

Testing

Model was compiled and tested with TP=2, batch_size=1, seq_len=128, bfloat16. Text backbone validated only — vision modalities not yet verified.

  1. Text backbone extraction: AutoModelForCausalLM fails for VLMs; the specific text backbone class must be used to load the HF reference (see the sketch in the description above)
  2. Text-only validation is sufficient: the LLM backbone can be validated independently of the vision components (see the token-match sketch below)
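A hedged sketch of what a greedy token-match check like this can look like. `load_neuron_model` and the `llm` attribute are hypothetical names standing in for the repo's actual test harness, and the checkpoint id is assumed; only the configuration values (TP=2, batch_size=1, seq_len=128) come from this PR.

```python
# Hedged token-match sketch; not the repo's actual test code.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "AIDC-AI/Ovis2.5-9B"  # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
reference = AutoModel.from_pretrained(MODEL_ID, trust_remote_code=True).llm  # assumed attr

# Hypothetical helper matching the tested configuration.
neuron_model = load_neuron_model(MODEL_ID, tp_degree=2, batch_size=1, seq_len=128)

inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    # Greedy decoding so the two runs are directly comparable token-for-token.
    ref_ids = reference.generate(**inputs, max_new_tokens=32, do_sample=False)
neuron_ids = neuron_model.generate(**inputs, max_new_tokens=32, do_sample=False)

prompt_len = inputs.input_ids.shape[1]
match = (ref_ids[0, prompt_len:] == neuron_ids[0, prompt_len:]).float().mean().item()
print(f"token match rate: {match:.2%}")
```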

Test Results:

| Test | Status | Result |
| --- | --- | --- |
| Smoke Test | ✅ PASS | Model loads successfully |
| Token Matching | ✅ PASS | 100% match (text backbone) |
| TTFT (P50) | ✅ PASS | 32.92 ms |
| Throughput | ✅ PASS | 30.03 tok/s |

Compatibility

Tested with:

  • Instance Type(s): Trn1
  • Configuration: TP=2, batch_size=1, seq_len=128, bfloat16

Additional Information

  • Multimodal model: Ovis2.5 is a vision-language model. This port validates the text backbone only.
  • AutoModelForCausalLM doesn't work: Multimodal models register with AutoModelForVision2Seq or similar, not AutoModelForCausalLM. Use the specific text backbone class for HF reference loading.
  • Text-only validation: The LLM backbone can be validated independently — architecture differences are handled in the model implementation, not validation.

Related Issues

N/A

vLLM Integration

  • This model/feature is intended for use with vLLM
  • Documentation includes vLLM registration instructions (see the registration sketch below)
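For reference, a minimal sketch of out-of-tree model registration with vLLM's `ModelRegistry`. The architecture string and the Neuron model class name are placeholders, not confirmed by this PR; the real names come from the checkpoint's config.json and the port's modeling code.

```python
# Hedged sketch of out-of-tree vLLM registration.
from vllm import ModelRegistry

from modeling_ovis2_5 import NeuronOvis2_5ForCausalLM  # hypothetical import

ModelRegistry.register_model(
    "Ovis2_5ForConditionalGeneration",  # assumed architecture name
    NeuronOvis2_5ForCausalLM,
)
```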

By submitting this PR, I confirm that:

  • I have read and followed the contributing guidelines
  • This is a community contribution and may have limited testing compared to officially-supported models
  • The code follows best practices and is well-documented
  • All required components listed above are included

@aws-yishanm left a comment

Approved because Readme and test were present.
