A collection of demo scripts for interacting with large language models through various providers and serving frameworks.
| File | Description |
|---|---|
src/openai_demo.py |
OpenAI API — sync, streaming, async, async streaming |
src/claude_demo.py |
Anthropic Claude API |
src/litellm_demo.py |
LiteLLM unified interface across multiple providers |
src/bedrock_demo.py |
AWS Bedrock |
src/vllm_demo.py |
vLLM — offline batch inference + OpenAI-compatible server |
Install vLLM and start the server:
pip install vllm
vllm serve Qwen/Qwen2.5-1.5B-Instruct --port 8000Run the demo:
# uses http://localhost:8000 and Qwen/Qwen2.5-1.5B-Instruct by default
python src/vllm_demo.py
# or override via environment variables
VLLM_BASE_URL=http://localhost:8000/v1 VLLM_MODEL=meta-llama/Llama-3.2-1B-Instruct python src/vllm_demo.py