10 changes: 10 additions & 0 deletions .dev.vars.example
@@ -2,8 +2,13 @@

# Required: OAuth2 credentials JSON from Gemini CLI authentication
# Get this by running `gemini auth` and copying the contents of ~/.gemini/oauth_creds.json
# Supports single account (object) or multiple accounts (array) for rate limit avoidance
# Single account example:
GCP_SERVICE_ACCOUNT={"access_token":"ya29.a0AS3H6Nx...","refresh_token":"1//09FtpJYpxOd...","scope":"https://www.googleapis.com/auth/cloud-platform ...","token_type":"Bearer","id_token":"eyJhbGciOiJSUzI1NiIs...","expiry_date":1750927763467}

# Multiple accounts example (for rate limit avoidance):
# GCP_SERVICE_ACCOUNT=[{"access_token":"ya29...","refresh_token":"1//...","scope":"...","token_type":"Bearer","id_token":"eyJ...","expiry_date":1750927763467},{"access_token":"ya29...","refresh_token":"1//...","scope":"...","token_type":"Bearer","id_token":"eyJ...","expiry_date":1750927763467}]

# Optional: Google Cloud Project ID (auto-discovered if not set)
# GEMINI_PROJECT_ID=your-project-id

@@ -12,6 +17,11 @@ GCP_SERVICE_ACCOUNT={"access_token":"ya29.a0AS3H6Nx...","refresh_token":"1//09Ft
# Example: sk-1234567890abcdef1234567890abcdef
OPENAI_API_KEY=sk-your-secret-api-key-here

# Optional: Enable multi-account rotation for rate limit avoidance (set to "true" to enable)
# When enabled with multiple accounts in GCP_SERVICE_ACCOUNT, the system automatically
# rotates to another account when one hits a rate limit, so requests keep being served
# while any account still has quota
ENABLE_MULTI_ACCOUNT=true

# Optional: Enable fake thinking output for thinking models (set to "true" to enable)
# When enabled, models marked with thinking: true will generate synthetic reasoning text
# before providing their actual response, similar to OpenAI's o3 model behavior
168 changes: 168 additions & 0 deletions MULTI_ACCOUNT_TESTING.md
@@ -0,0 +1,168 @@
# Multi-Account Support Testing Guide

This document explains how to test multi-account support, which rotates between Google accounts to avoid rate limits.

## Setup for Testing

### 1. Prepare Multiple Google Accounts

You'll need at least 2 Google accounts authenticated with Gemini CLI:

```bash
# Account 1
gemini auth
cp ~/.gemini/oauth_creds.json account1.json

# Account 2 (log in with a different Google account)
rm ~/.gemini/oauth_creds.json
gemini auth
cp ~/.gemini/oauth_creds.json account2.json
```

### 2. Create Multi-Account Configuration

Combine the credentials into a JSON array:

```bash
# Slurp both credential files into one JSON array and minify it for the environment variable
# (-s wraps all inputs in an array, -c prints compact single-line output)
jq -s -c '.' account1.json account2.json > credentials.json
```
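If `jq` is not installed, the same merge takes a few lines of TypeScript. The sketch below is illustrative only. `OAuthCreds` and `mergeCredentials` are hypothetical names. The validation assumes the credential fields shown in the README examples and rejects any file without a `refresh_token`.

```typescript
// Shape of one Gemini CLI oauth_creds.json file (fields as shown in the README examples).
interface OAuthCreds {
  access_token: string;
  refresh_token: string;
  scope: string;
  token_type: string;
  id_token: string;
  expiry_date: number;
}

// Hypothetical helper: parse each credential file's text, check that a refresh
// token is present, and return one minified JSON array for GCP_SERVICE_ACCOUNT.
function mergeCredentials(fileContents: string[]): string {
  const accounts: OAuthCreds[] = fileContents.map((text, i) => {
    const parsed = JSON.parse(text) as OAuthCreds;
    if (typeof parsed.refresh_token !== "string" || parsed.refresh_token === "") {
      throw new Error(`account ${i + 1}: missing refresh_token`);
    }
    return parsed;
  });
  // JSON.stringify without an indent argument emits a single line, like `jq -c`.
  return JSON.stringify(accounts);
}
```

Failing fast on a missing `refresh_token` catches the most common copy-paste mistake before the worker ever tries to refresh a token.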

### 3. Configure Environment Variables

In your `.dev.vars` file:

```bash
# Multi-account credentials
GCP_SERVICE_ACCOUNT=<paste content from credentials.json>

# Enable multi-account rotation
ENABLE_MULTI_ACCOUNT=true

# Optional: Your API key
OPENAI_API_KEY=sk-your-test-key
```

## Testing Scenarios

### Test 1: Basic Account Rotation

1. Start the development server:
   ```bash
   npm run dev
   ```

2. Make a request to the chat completions endpoint:
   ```bash
   curl -X POST http://localhost:8787/v1/chat/completions \
     -H "Content-Type: application/json" \
     -H "Authorization: Bearer sk-your-test-key" \
     -d '{
       "model": "gemini-2.5-flash",
       "messages": [{"role": "user", "content": "Hello"}]
     }'
   ```

3. Check the console logs. You should see:
   - `Loaded 2 accounts. Multi-account mode: true`
   - `Found available account at index X`

### Test 2: Rate Limit Fallback

To test rate limit handling:

1. Send enough requests to hit the rate limit on account 1 (account index 0).
2. The system should automatically switch to account 2.
3. Check the logs for:
   - `Got rate limit error (429) for account 0`
   - `Marking account 0 as rate-limited`
   - `Switching from account 0 to account 1`
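Sending that many requests by hand is tedious, so a small burst script can help. This is a minimal sketch. `burst` and `sendChat` are hypothetical helpers, and the URL and key are the local `npm run dev` values from Test 1. The number of requests needed to trigger a 429 depends on Google's quotas and is not known here.

Because the worker retries on another account, a successful failover may show up only as 200s in the tally. The log lines above are the real evidence.

```typescript
// A sender performs one request and resolves to its HTTP status code.
type Sender = () => Promise<number>;

// Hypothetical load generator: fire `count` requests concurrently and tally status codes.
async function burst(count: number, send: Sender): Promise<Map<number, number>> {
  const statuses = await Promise.all(Array.from({ length: count }, () => send()));
  const tally = new Map<number, number>();
  for (const s of statuses) tally.set(s, (tally.get(s) ?? 0) + 1);
  return tally;
}

// Sender for the local dev server started in Test 1 (URL and key are assumptions).
const sendChat: Sender = async () => {
  const res = await fetch("http://localhost:8787/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: "Bearer sk-your-test-key",
    },
    body: JSON.stringify({
      model: "gemini-2.5-flash",
      messages: [{ role: "user", content: "Hello" }],
    }),
  });
  return res.status;
};

// Example: burst(50, sendChat).then((t) => console.log(Array.from(t.entries())));
```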

### Test 3: Account Health Tracking

1. Check the KV storage for account health data:
   ```bash
   # Older Wrangler releases use `wrangler kv:key list` instead.
   # --local inspects the storage used by the local dev server.
   wrangler kv key list --binding=GEMINI_CLI_KV --local
   ```

2. You should see keys like:
   - `oauth_token_cache_account_0`
   - `oauth_token_cache_account_1`
   - `account_rotation_state`
   - `account_health_0` (if an account was rate-limited)

### Test 4: Single Account Compatibility

To verify backward compatibility:

1. Configure a single account (not an array):
   ```bash
   GCP_SERVICE_ACCOUNT={"access_token":"...","refresh_token":"...","scope":"...","token_type":"Bearer","id_token":"...","expiry_date":...}
   ENABLE_MULTI_ACCOUNT=false
   ```

2. The system should work exactly as before, with no multi-account logic involved.
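Backward compatibility comes down to accepting either JSON shape in `GCP_SERVICE_ACCOUNT`. The sketch below shows one way that normalization could work. `parseAccounts` and `multiAccountEnabled` are hypothetical, not the worker's actual functions. Treating anything other than the exact string `"true"` as disabled is an assumption.

```typescript
interface OAuthCreds {
  refresh_token: string;
  expiry_date: number;
  [field: string]: unknown; // access_token, scope, id_token, ...
}

// Accept a single credential object or an array of them, and always return an array.
function parseAccounts(raw: string): OAuthCreds[] {
  const parsed: unknown = JSON.parse(raw);
  const list = Array.isArray(parsed) ? parsed : [parsed];
  if (list.length === 0) throw new Error("GCP_SERVICE_ACCOUNT contains no accounts");
  return list as OAuthCreds[];
}

// Rotation only applies when the flag is "true" AND more than one account is configured.
function multiAccountEnabled(flag: string | undefined, accounts: OAuthCreds[]): boolean {
  return flag === "true" && accounts.length > 1;
}
```

Normalizing to an array up front means the single-account path is just the one-element case, so no separate code path is needed.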

## Monitoring in Production

When deployed to Cloudflare Workers, monitor the logs:

```bash
wrangler tail
```

Look for:
- Account rotation events
- Rate limit detections
- Successful failovers
- Account health updates

## Expected Behavior

### Normal Operation
- Requests use accounts in round-robin rotation
- Each account's token is cached independently
- Rotation state is persisted in KV storage
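Round-robin selection that skips accounts still in cooldown fits in a small pure function. This is a sketch under two assumptions. The `lastUsed` cursor stands in for whatever `account_rotation_state` actually stores. The `isRateLimited` callback stands in for a lookup of the `account_health_<i>` keys. `nextAccount` is a hypothetical name.

```typescript
// Pick the next usable account, starting just after the last one used.
// Pass lastUsed = -1 before any account has been used.
// Returns the chosen index, or -1 if every account is currently rate-limited.
function nextAccount(
  lastUsed: number,
  count: number,
  isRateLimited: (index: number) => boolean,
): number {
  for (let step = 1; step <= count; step++) {
    const candidate = (lastUsed + step) % count;
    if (!isRateLimited(candidate)) return candidate;
  }
  return -1;
}
```

The caller would then persist the returned index. The next request, possibly on another worker instance, then continues the rotation from there.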

### Rate Limit Scenario
1. Request fails with HTTP 429 or 503
2. Current account is marked as rate-limited
3. System switches to next available account
4. Request is retried (up to 3 times)
5. Rate-limited account enters cooldown (60 seconds)
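The retry-and-switch sequence above can be sketched as a loop. The sketch makes these assumptions:

- `send`, `markRateLimited`, and `pickNext` are hypothetical callbacks, passed in so the logic stays independent of KV.
- `maxAttempts = 3` mirrors the documented limit, although the docs do not say whether the first attempt counts toward it.
- Only 429 and 503 trigger a switch.

```typescript
interface UpstreamResult {
  status: number;
  body: string;
}

// HTTP statuses this guide treats as "rate limited".
const RATE_LIMIT_STATUSES = new Set([429, 503]);

async function withFailover(
  firstAccount: number,
  send: (account: number) => Promise<UpstreamResult>,
  markRateLimited: (account: number) => Promise<void>,
  pickNext: () => Promise<number>, // resolves to -1 when no account is available
  maxAttempts = 3,
): Promise<UpstreamResult> {
  let account = firstAccount;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = await send(account);
    // Success, or an error unrelated to rate limiting: return it unchanged.
    if (!RATE_LIMIT_STATUSES.has(result.status)) return result;
    await markRateLimited(account); // start this account's cooldown
    const next = await pickNext();
    if (next === -1) {
      throw new Error("All accounts are rate-limited. Please try again later.");
    }
    account = next;
  }
  throw new Error(`Still rate-limited after ${maxAttempts} attempts`);
}
```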

### All Accounts Rate-Limited
- System will return an error after exhausting all accounts
- Error message: "All accounts are rate-limited. Please try again later."

## Troubleshooting

### Issue: "Authentication failed"
- Verify all accounts have valid refresh tokens
- Check that credentials are properly formatted as JSON array
- Ensure `ENABLE_MULTI_ACCOUNT=true` is set

### Issue: Not switching accounts on rate limit
- Verify `ENABLE_MULTI_ACCOUNT=true` is set
- Check that you have multiple accounts in the array
- Review worker logs for error messages

### Issue: Accounts not recovering from rate limit
- Check KV storage TTL settings (default 60 seconds cooldown)
- Verify account health keys expire properly
- Review timestamp calculations in logs
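Cooldown recovery is easiest to reason about if the health key itself carries a TTL, so the account recovers when the key expires. The sketch below assumes health entries are written with `put(..., { expirationTtl })`:

- `KVLike` mirrors the `get`/`put` subset of Cloudflare's `KVNamespace`.
- `MemoryKV` is an in-memory stand-in for local checks.
- `markRateLimited` and `isRateLimited` are hypothetical helpers.

Cloudflare KV does not accept an `expirationTtl` below 60 seconds, so 60 seconds is also the shortest cooldown KV can expire on its own.

```typescript
// Minimal subset of Cloudflare's KVNamespace used here.
interface KVLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string, options?: { expirationTtl?: number }): Promise<void>;
}

const COOLDOWN_SECONDS = 60; // KV rejects expirationTtl values below 60

// The account_health_<i> key exists only while the account is cooling down.
async function markRateLimited(kv: KVLike, account: number, now = Date.now()): Promise<void> {
  await kv.put(`account_health_${account}`, JSON.stringify({ limitedAt: now }), {
    expirationTtl: COOLDOWN_SECONDS,
  });
}

async function isRateLimited(kv: KVLike, account: number): Promise<boolean> {
  return (await kv.get(`account_health_${account}`)) !== null;
}

// In-memory stand-in that honours expirationTtl against an injectable clock (ms).
class MemoryKV implements KVLike {
  private readonly store = new Map<string, { value: string; expiresAt: number }>();
  private readonly clock: () => number;

  constructor(clock: () => number = Date.now) {
    this.clock = clock;
  }

  async get(key: string): Promise<string | null> {
    const entry = this.store.get(key);
    if (!entry || entry.expiresAt <= this.clock()) return null;
    return entry.value;
  }

  async put(key: string, value: string, options?: { expirationTtl?: number }): Promise<void> {
    const ttlMs = (options?.expirationTtl ?? Infinity) * 1000;
    this.store.set(key, { value, expiresAt: this.clock() + ttlMs });
  }
}
```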

## Performance Metrics

Expected improvements with N accounts:
- Rate limit capacity: ~N × single account limit
- Failover time: < 100ms (KV lookup + auth)
- Additional storage: ~1KB per account in KV
- Request overhead: Minimal (~10ms for account selection)
84 changes: 82 additions & 2 deletions README.md
@@ -19,6 +19,7 @@ Transform Google's Gemini models into OpenAI-compatible endpoints using Cloudfla
- 🆓 **Free Tier Access** - Leverage Google's free tier through Code Assist API
- 📡 **Real-time Streaming** - Server-sent events for live responses with token usage
- 🎭 **Multiple Models** - Access to latest Gemini models including experimental ones
- 🔀 **Multi-Account Support** - Automatic rotation between multiple accounts to avoid rate limiting

## 🤖 Supported Models

@@ -100,6 +101,55 @@ You need OAuth2 credentials from a Google account that has accessed Gemini. The
}
```

#### Multi-Account Setup (Optional - for Rate Limit Avoidance)

To avoid rate limiting, you can configure multiple Google accounts. The system will automatically rotate between accounts when one hits a rate limit.

1. **Authenticate your first account**:
   - Run `gemini auth` and log in with your first Google account
   - Navigate to the credentials file location:
     - **Windows:** `C:\Users\USERNAME\.gemini\oauth_creds.json`
     - **macOS/Linux:** `~/.gemini/oauth_creds.json`
   - Copy the entire contents to a file named `account1.json`

2. **Authenticate your second account** (repeat for more accounts):
   - Delete the existing `~/.gemini/oauth_creds.json` file
   - Run `gemini auth` again and log in with your second Google account
   - Copy the new credentials to `account2.json`

3. **Combine credentials into an array**:
   Instead of a single credential object, use a JSON array:
   ```json
   [
     {
       "access_token": "ya29.a0AS3H6Nx...",
       "refresh_token": "1//09FtpJYpxOd...",
       "scope": "https://www.googleapis.com/auth/cloud-platform ...",
       "token_type": "Bearer",
       "id_token": "eyJhbGciOiJSUzI1NiIs...",
       "expiry_date": 1750927763467
     },
     {
       "access_token": "ya29.a0Bb2H8Mx...",
       "refresh_token": "1//09GtqKZqyPe...",
       "scope": "https://www.googleapis.com/auth/cloud-platform ...",
       "token_type": "Bearer",
       "id_token": "eyJhbGciOiJSUzI1NiIt...",
       "expiry_date": 1750927763467
     }
   ]
   ```

4. **Enable multi-account mode**:
   Set the `ENABLE_MULTI_ACCOUNT` environment variable to `"true"` in your `.dev.vars` file (see Step 3: Environment Setup below).

**How it works:**
- The system tracks account health and rate limit status in Cloudflare KV storage
- When a request fails with a rate limit error (HTTP 429 or 503), the system automatically switches to the next available account
- Rate-limited accounts are placed on cooldown (60 seconds by default) before being tried again
- Accounts are used in round-robin order so requests are spread evenly across them
- Up to 3 retry attempts are made before giving up

### Step 2: Create KV Namespace

```bash
@@ -118,6 +168,8 @@ kv_namespaces = [
### Step 3: Environment Setup

Create a `.dev.vars` file:

**Single Account (Basic Setup):**
```bash
# Required: OAuth2 credentials JSON from Gemini CLI authentication
GCP_SERVICE_ACCOUNT={"access_token":"ya29...","refresh_token":"1//...","scope":"...","token_type":"Bearer","id_token":"eyJ...","expiry_date":1750927763467}
@@ -131,6 +183,18 @@ GCP_SERVICE_ACCOUNT={"access_token":"ya29...","refresh_token":"1//...","scope":"
OPENAI_API_KEY=sk-your-secret-api-key-here
```

**Multiple Accounts (Rate Limit Avoidance):**
```bash
# Required: OAuth2 credentials JSON array for multiple accounts
GCP_SERVICE_ACCOUNT=[{"access_token":"ya29...","refresh_token":"1//...","scope":"...","token_type":"Bearer","id_token":"eyJ...","expiry_date":1750927763467},{"access_token":"ya29...","refresh_token":"1//...","scope":"...","token_type":"Bearer","id_token":"eyJ...","expiry_date":1750927763467}]

# Optional, but required for automatic switching between the accounts above
ENABLE_MULTI_ACCOUNT=true

# Optional: API key for authentication
OPENAI_API_KEY=sk-your-secret-api-key-here
```

For production, set the secrets:
```bash
wrangler secret put GCP_SERVICE_ACCOUNT
@@ -158,9 +222,10 @@ npm run dev

| Variable | Required | Description |
|----------|----------|-------------|
| `GCP_SERVICE_ACCOUNT` | ✅ | OAuth2 credentials JSON string. Supports single account (object) or multiple accounts (array) for rate limit avoidance. |
| `GEMINI_PROJECT_ID` | ❌ | Google Cloud Project ID (auto-discovered if not set). |
| `OPENAI_API_KEY` | ❌ | API key for authentication. If not set, the API is public. |
| `ENABLE_MULTI_ACCOUNT` | ❌ | Enable multi-account rotation for rate limit avoidance (set to `"true"`). Only works when `GCP_SERVICE_ACCOUNT` contains an array of accounts. |

#### Thinking & Reasoning

@@ -217,11 +282,26 @@ npm run dev
- Only applies to supported model pairs (currently: pro → flash).
- Works for both streaming and non-streaming requests.

**Multi-Account Support:**
- When `ENABLE_MULTI_ACCOUNT` is set to `"true"` and `GCP_SERVICE_ACCOUNT` contains multiple accounts (JSON array), the system automatically rotates between accounts to avoid rate limiting.
- **Round-Robin Rotation**: Accounts are used in round-robin order, with automatic fallback when one account hits a rate limit.
- **Account Health Tracking**: The system tracks which accounts are rate-limited and skips them until the cooldown period expires (60 seconds by default).
- **Seamless Failover**: When a request fails with HTTP 429 or 503, the system immediately switches to the next available account and retries the request (up to 3 attempts).
- **Stateless & Distributed**: Account rotation state is stored in Cloudflare KV, so every edge location and worker instance reads the same shared state. KV is eventually consistent, so an update made in one location can take 60 seconds or more to become visible elsewhere.
- **Token Caching**: Each account's OAuth token is cached independently in KV, so tokens are not refreshed on every request.
- **Works with Auto Model Switching**: Multi-account rotation and auto model switching can be used together for maximum resilience.
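Per-account token caching mostly comes down to checking a cached token's expiry before reusing it. The sketch below assumes the cache entry keeps the `access_token` and the millisecond `expiry_date` from `oauth_creds.json`, stored under a key such as `oauth_token_cache_account_<i>`. The 5-minute safety margin and the helper names (`tokenCacheKey`, `isTokenFresh`) are illustrative assumptions.

```typescript
interface CachedToken {
  access_token: string;
  expiry_date: number; // milliseconds since the Unix epoch, as in oauth_creds.json
}

const REFRESH_MARGIN_MS = 5 * 60 * 1000; // refresh a little early to absorb clock skew

// One cache key per account, following the key names listed in MULTI_ACCOUNT_TESTING.md.
function tokenCacheKey(account: number): string {
  return `oauth_token_cache_account_${account}`;
}

// A cached token is reusable only if it will not expire within the margin.
function isTokenFresh(token: CachedToken | null, now = Date.now()): boolean {
  return token !== null && token.expiry_date - now > REFRESH_MARGIN_MS;
}
```

Keying the cache by account index keeps one account's refresh from invalidating another's token.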

**Benefits of Multi-Account:**
- **Increased throughput**: Roughly multiplies your rate limit capacity by the number of accounts.
- **Fewer rate-limit failures**: Automatic failover keeps requests succeeding while at least one account has quota; once every account is rate-limited, requests return an error.
- **Simple setup**: Just authenticate multiple Google accounts and combine their credentials into an array.
- **Built for Workers**: Designed for serverless Cloudflare Workers, with rotation state shared through KV.

### KV Namespaces

| Binding | Purpose |
|---------|---------|
| `GEMINI_CLI_KV` | Token caching, session management, account rotation state, and account health tracking |

## 🚨 Troubleshooting
