Model Catalog

AIOrouter is an All-In-One AI Router — providing a single OpenAI-compatible API endpoint that routes to Western models (Google Gemini, Anthropic Claude), Chinese models (DeepSeek, Alibaba Qwen, Moonshot Kimi, Zhipu GLM), and more — all with built-in privacy protection (bidirectional PII pseudonymization, technical secret redaction, AI Firewall) and Canada-resident infrastructure.

LLM Public Token Pricing

Prices below are public provider rates in USD per 1 million tokens before CAD conversion, taxes, and any account-specific retail presentation. The Dashboard and API model response are the customer-facing source for your exact billable rate.

Model	Input (USD/1M tokens)	Output (USD/1M tokens)	Cache pricing
claude-fable-5 (suspended)	$10.00	$50.00	Cache write: $12.50; Cache read: $1.00
claude-haiku-4.5	$1.00	$5.00	Cache write: $1.25; Cache read: $0.100
claude-opus-4.7	$5.00	$25.00	Cache write: $6.25; Cache read: $0.500
claude-opus-4.8	$5.00	$25.00	Cache write: $6.25; Cache read: $0.500
claude-sonnet-4.6	$3.00	$15.00	Cache write: $3.75; Cache read: $0.300
deepseek-v4-flash	$0.140	$0.280	Implicit cache: $0.0028
deepseek-v4-flash-alibaba	$0.138	$0.275	Cache read: $0.028
deepseek-v4-pro	$0.435	$0.870	Implicit cache: $0.0036
gemini-2.5-flash	$0.300	$2.50	Implicit cache: $0.030
gemini-2.5-pro	$1.25 up to 200K / $2.50 above	$10.00	Implicit cache: $0.125
gemini-embedding-2	$0.025	$0.0000	Not configured
glm-5	$0.180	$0.720	Not configured
glm-5.1	$0.825	$3.30	Cache read: $0.083; Implicit cache: $0.165; explicit creation $1.03
glm-5.2	$1.40	$4.40	Implicit cache: $0.260
kimi-k2.6	$0.950	$4.00	Implicit cache: $0.160
kimi-k2.7-code	$0.950	$4.00	Implicit cache: $0.190
qwen-deep-research	$7.74	$23.37	Not configured
qwen-flash-character	$0.034	$0.203	Cache read: $0.0070
qwen3-235b	$0.700	$2.80 non-thinking / $8.40 thinking	Not configured
qwen3.5-omni-flash-realtime	$0.110	$0.440	Not configured
qwen3.5-omni-plus-realtime	$0.550	$2.20	Not configured
qwen3.5-plus-2026-04-20	$0.573	$0.688	Cache read: $0.011; explicit creation $0.359
qwen3.6-27b	$0.413	$2.48	Not configured
qwen3.6-35b-a3b	$0.248	$1.49	Not configured
qwen3.6-flash	$0.165	$0.990	Cache read: $0.017; Implicit cache: $0.033; explicit creation $0.206
qwen3.6-flash-2026-04-16	$0.165	$0.990	Cache read: $0.017; explicit creation $0.206
qwen3.6-plus	$0.276	$1.65	Cache read: $0.028; Implicit cache: $0.055; explicit creation $0.345
qwen3.6-plus-2026-04-02	$0.276	$1.65	Cache read: $0.028; explicit creation $0.345
qwen3.7-max	$1.65	$4.95	Cache read: $0.165; Implicit cache: $0.330; explicit creation $2.06

Cache-Aware Billing

When a provider reports cached input tokens, AIOrouter uses the provider's public cache price instead of treating those tokens as regular input. The same calculation applies to subscription allowance, Top-Up balance, and prepaid compatibility billing paths.

Alibaba Qwen and Alibaba GLM may report implicit cache hits, explicit cache reads, and explicit cache-creation tokens. Explicit cache creation can cost more than regular input on providers that price creation separately.
Gemini implicit cache hits are passed through when Gemini reports cached input token counts. Gemini Pro cache rates follow the same prompt-size tier as regular Gemini Pro input pricing.
Anthropic Claude uses explicit prompt caching (cache_control). Cache write tokens are billed at 1.25x input price, cache read tokens at 0.10x input price. Both token types are reported in usage records.
Models without configured cache prices are billed at regular input price even if a request includes cache-like metadata.

Usage records may include cached_tokens and cache_creation_input_tokens when those fields are available from the provider.

Auto-Routing

If you do not specify a model, AIOrouter can select a provider based on availability, model capability, context fit, and current routing policy. You can see which provider handled a request in the X-Provider response header.

Current Exchange Rate

Date	Source	USD → CAD Rate	2% Buffer Applied
2026-06-20	Estimated (estimated — live Bank of Canada rate unavailable)	1.38	Yes

Token costs in CAD are computed as: USD_price × CAD_USD_rate. The rate above is refreshed daily at 09:00 EST from the Bank of Canada VALET API with a 2% buffer applied per pricing policy.

Official Pricing Sources

Anthropic Claude pricing: Anthropic API pricing
Google Gemini pricing: Google Gemini API pricing documentation
DeepSeek pricing: DeepSeek API pricing documentation
Alibaba Qwen and GLM pricing: Alibaba Model Studio Console
Kimi pricing: Moonshot public API documentation

Data freshness: This BETA catalog was updated on 2026-06-20. Token rates are public provider rates. Claude Fable 5 service suspended (U.S. Commerce Dept export controls effective 2026-06-13 — Anthropic blocked all external Fable 5/Mythos 5 access). Fallback: use Claude Opus 4.8 for equivalent capabilities.