Routing Strategies

Intelligently route your AI requests to optimize for cost, speed, or reliability

💰 Cost-Based Routing

Route to the cheapest provider that supports your requested model. Perfect for batch processing and non-time-sensitive tasks.

{
  "model": "gpt-3.5-turbo",
  "messages": [...],
  "routingStrategy": "cost"
}
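The selection logic behind cost-based routing can be sketched as picking the cheapest entry in a price table. This is a hypothetical sketch: the provider names and per-1K-token prices below are illustrative, not the gateway's actual pricing data.

```python
# Illustrative price table (USD per 1K tokens) -- values are assumptions,
# not real pricing.
PROVIDER_PRICES = {
    "gpt-3.5-turbo": {"openai": 0.0015, "azure": 0.0020},
}

def route_by_cost(model: str) -> str:
    """Return the cheapest provider that supports the requested model."""
    providers = PROVIDER_PRICES.get(model)
    if not providers:
        raise ValueError(f"No provider supports model {model!r}")
    # min() over the dict keys, ranked by each provider's price.
    return min(providers, key=providers.get)
```

A real router would refresh the price table dynamically and factor in per-request token estimates, but the core decision is this single `min()` over supported providers.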

⚡ Latency-Based Routing

Route to the fastest provider based on historical response times. Ideal for real-time chat and interactive applications.

{
  "model": "gpt-4",
  "messages": [...],
  "routingStrategy": "latency"
}
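One plausible way to route on historical response times is a bounded moving average per provider. This is a sketch under assumptions: the window size, provider names, and the choice to try unmeasured providers first are all illustrative.

```python
class LatencyRouter:
    """Route to the provider with the lowest average observed latency.

    Sketch only: keeps a bounded history of latency samples per provider
    and picks the minimum average. Untried providers are treated as
    latency 0 so they get sampled at least once.
    """

    def __init__(self, providers, window=100):
        self.window = window
        self.latencies = {p: [] for p in providers}

    def record(self, provider: str, latency_ms: float) -> None:
        history = self.latencies[provider]
        history.append(latency_ms)
        if len(history) > self.window:  # drop the oldest sample
            history.pop(0)

    def route(self) -> str:
        def avg(provider):
            history = self.latencies[provider]
            return sum(history) / len(history) if history else 0.0
        return min(self.latencies, key=avg)
```

After each completed request, the gateway would call `record()` with the measured round-trip time, so routing decisions track recent provider performance.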

🎯 Priority Routing

Use your preferred provider with automatic fallback. Great for compliance requirements or provider-specific features.

{
  "model": "claude-3-opus",
  "messages": [...],
  "routingStrategy": "priority",
  "preferredProvider": "anthropic"
}
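The "preferred provider with automatic fallback" behavior amounts to trying the preferred provider first and walking a fallback list on failure. A minimal sketch, assuming `call` is a hypothetical function that invokes a provider and raises on failure:

```python
def route_with_priority(call, preferred: str, fallbacks: list[str]):
    """Try the preferred provider first, then each fallback in order.

    `call(provider)` is a stand-in for the actual provider invocation;
    it is assumed to raise an exception when the provider fails.
    """
    last_error = None
    for provider in [preferred, *fallbacks]:
        try:
            return call(provider)
        except Exception as err:
            last_error = err  # remember why the last attempt failed
    raise RuntimeError("All providers failed") from last_error
```

The happy path never touches the fallbacks, so compliance-sensitive traffic stays on the preferred provider unless it is actually unavailable.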

🛡️ Fallback Routing

Maximum reliability by automatically trying multiple providers. Essential for mission-critical systems.

{
  "model": "gpt-3.5-turbo",
  "messages": [...],
  "routingStrategy": "fallback"
}
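Fallback routing can be sketched as walking the full provider list, retrying each provider a bounded number of times before moving on. The retry count and error handling here are assumptions; a production gateway would add backoff, timeouts, and error budgets.

```python
def route_with_fallback(call, providers: list[str], retries_per_provider: int = 1):
    """Try each provider in turn, retrying each a bounded number of times.

    `call(provider)` is a hypothetical provider-invoking function that
    raises on failure. The request succeeds as soon as any attempt does.
    """
    last_error = None
    for provider in providers:
        for _ in range(retries_per_provider + 1):
            try:
                return call(provider)
            except Exception as err:
                last_error = err
    raise RuntimeError("All providers exhausted") from last_error
```

Only when every provider has exhausted its attempts does the request fail, which is what makes this strategy suited to mission-critical traffic.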

⚡ Response Caching


All routing strategies benefit from response caching. Identical requests are served straight from the cache, skipping the provider call entirely, so they incur no provider cost.

Benefits:

  • Near-zero latency for cached responses
  • No provider cost for cached responses
  • Enabled by default
  • Automatic cache key generation
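Automatic cache key generation can be sketched as hashing a canonical form of the request body. This is an assumption about the mechanism, not the gateway's documented implementation; in particular, which fields are excluded from the key (e.g. `routingStrategy`) is illustrative.

```python
import hashlib
import json

def cache_key(request: dict) -> str:
    """Derive a deterministic cache key from the request body.

    Sketch: serialize with sorted keys so logically identical requests
    hash to the same key regardless of field order.
    """
    canonical = json.dumps(request, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

_cache: dict[str, object] = {}

def cached_call(request: dict, call):
    """Serve from cache on a hit; otherwise call the provider and store."""
    key = cache_key(request)
    if key in _cache:          # cache hit: no provider call, no cost
        return _cache[key]
    response = call(request)   # cache miss: pay for one provider call
    _cache[key] = response
    return response
```

Because the key is derived from the request content, two clients sending byte-identical payloads share the same cached response.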

Strategy Comparison

Strategy | Best For         | Pros
---------|------------------|---------------------
Cost     | Batch processing | Lowest cost
Latency  | Real-time chat   | Fastest response
Priority | Compliance needs | Control + fallback
Fallback | Critical systems | Maximum reliability