Blog
Long-form writing and data analysis. Posted infrequently — but every piece starts from real numbers.
-
The 2026 LLM API value benchmark: the cheapest model that is actually good enough
Cheapest is not the same as best value. Using this site’s daily-updated pricing and the Artificial Analysis Intelligence Index, we set a quality floor of AA Index ≥ 40, rank what survives by intelligence per dollar, and work out the real monthly cost for three typical workloads.
Read more →
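The filter-then-rank method in the summary above fits in a few lines of Python. This is a minimal sketch, not the article's actual pipeline. It reuses the price and AA Index figures quoted for three open-weight models elsewhere on this page and assumes each single quoted price is a comparable $/1M-token figure. It also adds one clearly hypothetical model below the floor so the filter has something to remove.

```python
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    price_per_m: float  # quoted $ per 1M tokens (assumed comparable across models)
    aa_index: float     # Artificial Analysis Intelligence Index

QUALITY_FLOOR = 40  # AA Index >= 40, the floor used in the article

def rank_by_value(models, floor=QUALITY_FLOOR):
    """Drop models below the quality floor, then sort by AA points per dollar (best first)."""
    survivors = [m for m in models if m.aa_index >= floor]
    return sorted(survivors, key=lambda m: m.aa_index / m.price_per_m, reverse=True)

models = [
    Model("DeepSeek V4 Flash", 0.09, 40),
    Model("MiniMax M3", 0.30, 44),
    Model("GLM 5.2", 1.20, 51),
    Model("hypothetical-cheap-model", 0.05, 35),  # illustrative only: cheapest, but fails the floor
]

for m in rank_by_value(models):
    print(f"{m.name}: {m.aa_index / m.price_per_m:.1f} AA points per $")
```

With these inputs the hypothetical model is filtered out even though it is the cheapest. The survivors rank DeepSeek V4 Flash, MiniMax M3, GLM 5.2, which is the "cheapest is not best value" point in miniature.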
-
China vs US LLMs in 2026: pricing, capability, and context window compared
This site tracks 228 US models and 128 Chinese models. Put them in one table and a clear trend appears: China’s top tier (GLM 5.2, Qwen3.7 Max) now rivals GPT-5.4 on the Intelligence Index at roughly half the input price and a quarter of the output price, sometimes with a higher coding score. We break down pricing, capability, and context window with real data, and explain how to weigh compliance and latency.
Read more →
-
The complete guide to LLM token pricing: input, output, cached, and reasoning tokens
They are all quoted as “$/1M tokens,” but input, output, cached input, and reasoning tokens are each billed differently, and the final number is usually decided by the part you cannot see, such as hidden reasoning tokens. Using real model pricing (Claude bills cached input at 10% of the input price, OpenAI o1 at 50%), this guide explains how each token type is charged, why output usually costs 3–5× as much as input, and how to estimate and cut the bill.
Read more →
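A rough bill can be estimated by pricing each token type separately. The sketch below is illustrative, not this site's calculator, and the function and parameter names are hypothetical. It uses the Claude Opus 4.8 standard rates quoted on this page ($5/M input, $25/M output) and treats cached input as a fraction of the input price. It assumes hidden reasoning tokens are billed at the output rate, and it ignores cache-write surcharges and batch discounts.

```python
def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    reasoning_tokens: int,
    input_price: float,              # $ per 1M uncached input tokens
    output_price: float,             # $ per 1M output tokens
    cache_hit_rate: float = 0.0,     # fraction of input tokens served from cache
    cache_price_ratio: float = 1.0,  # cached-input price as a fraction of input_price
) -> float:
    """Return the estimated cost in dollars for one billing period."""
    per_token_in = input_price / 1_000_000
    per_token_out = output_price / 1_000_000
    cached = input_tokens * cache_hit_rate
    uncached = input_tokens - cached
    input_cost = uncached * per_token_in + cached * per_token_in * cache_price_ratio
    # Assumption: hidden reasoning tokens are billed at the output rate.
    output_cost = (output_tokens + reasoning_tokens) * per_token_out
    return input_cost + output_cost

# 10M input tokens (60% cache hits), 2M visible output tokens, 1M reasoning tokens.
claude_style = estimate_cost(10_000_000, 2_000_000, 1_000_000, 5.0, 25.0,
                             cache_hit_rate=0.6, cache_price_ratio=0.10)
# Hypothetical comparison: the same Opus rates with an o1-style 50% cache ratio.
o1_ratio = estimate_cost(10_000_000, 2_000_000, 1_000_000, 5.0, 25.0,
                         cache_hit_rate=0.6, cache_price_ratio=0.50)
print(f"10% cache ratio: ${claude_style:.2f}, 50% cache ratio: ${o1_ratio:.2f}")
```

This comes to about $98 versus $110. Output plus reasoning accounts for $75 of either total, which is why the output side, including the tokens you never see, tends to dominate the bill.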
-
Picking the open-weight value king: DeepSeek V4, MiniMax M3, Kimi K2.6, and GLM 5.2
In 2026 almost every best-value model is open-weight, and most come from Chinese labs. DeepSeek V4 Flash ($0.09, AA 40), MiniMax M3 ($0.30, AA 44, 1M context), Kimi K2.6 (strong on agentic tasks), and GLM 5.2 ($1.20, AA 51, coding 68.8) each occupy a different niche. Using real pricing and coding/agentic scores, we place all four on one price–capability map and say which to pick for which job.
Read more →
-
Why the same model costs different amounts on different platforms
The same model can cost different amounts on the official API, on OpenRouter, and on Bedrock/Vertex/Azure. This piece explains where those gaps come from (routing margin, volume and committed-use discounts, region and hosting cost, batch APIs) and why this site shows OpenRouter’s routed price rather than the official price (a feature, not a bug). You will leave knowing how to find the cheapest source for your own usage.
Read more →
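For a given workload, finding the cheapest source comes down to pricing the same token mix at each provider's effective rate and taking the minimum. A minimal sketch follows. Every source name, rate, and discount in it is a hypothetical placeholder, not real pricing for any platform. It also assumes a discount (committed-use or batch) applies as a flat fraction off both input and output.

```python
from dataclasses import dataclass

@dataclass
class Offer:
    source: str
    input_price: float     # $ per 1M input tokens
    output_price: float    # $ per 1M output tokens
    discount: float = 0.0  # flat fraction off list price (e.g. batch or committed use)

def monthly_cost(offer: Offer, input_m: float, output_m: float) -> float:
    """Dollar cost for input_m million input and output_m million output tokens."""
    list_cost = input_m * offer.input_price + output_m * offer.output_price
    return list_cost * (1 - offer.discount)

def cheapest(offers, input_m: float, output_m: float) -> Offer:
    """Return the offer with the lowest monthly cost for this token mix."""
    return min(offers, key=lambda o: monthly_cost(o, input_m, output_m))

# Hypothetical placeholder quotes only; substitute real prices for your model.
offers = [
    Offer("official-api", 1.00, 4.00),
    Offer("router", 1.10, 4.20),
    Offer("cloud-batch", 1.00, 4.00, discount=0.5),
]
best = cheapest(offers, input_m=50, output_m=10)
print(best.source, round(monthly_cost(best, 50, 10), 2))
```

Batch pricing only helps when the workload tolerates asynchronous turnaround, so for interactive traffic a real comparison would drop that offer before taking the minimum.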
-
MiniMax M3 deep dive: 1M context, launch pricing, and the coding-agent price war
MiniMax M3 combines 1M context, native multimodality, coding-agent benchmark results, availability through OpenRouter, and a 50% API discount in its first week. It is not a simple Claude replacement; instead, it pushes down the cost floor for long-running agents.
Read more →
-
Claude Code 2026 guide: Opus 4.8, dynamic workflows, auto mode, and plugins
Claude Code is evolving quickly. Opus 4.8, dynamic workflows, auto mode, agent view, /goal, /usage, /code-review, and the security-guidance plugin all reshape how it is used day to day. This guide turns the new features into a repeatable playbook.
Read more →
-
Claude Opus 4.8 didn't cut standard prices: how premium LLMs defend their value
Claude Opus 4.8 keeps standard pricing at $5/M input and $25/M output tokens, while fast mode costs $10/M input and $50/M output. This is not a price-war move. It is Anthropic's argument that premium models justify their price through reliability, long-horizon agent performance, and less rework.
Read more →
-
DeepSeek V4-Pro and MiMo V2.5 price cuts: has the LLM API price war begun?
DeepSeek V4-Pro made its 75% discount permanent, while Xiaomi MiMo V2.5 announced permanent API price cuts of up to 99%. This is not a one-off promotion; it resets the lower bound for LLM API costs.
Read more →
-
Gemini 2.0 Flash is shutting down: 2026 API pricing and migration options
Google marks Gemini 2.0 Flash and 2.0 Flash-Lite for shutdown on June 1, 2026. This guide compares 2.5 Flash-Lite, 2.5 Flash, Gemini 3 Flash, and 3.1 Flash-Lite as migration paths.
Read more →
-
5 things that actually matter when picking an LLM API in 2026
Headline price matters less than it used to. What actually decides cost and fit in 2026 is context efficiency, reasoning-token billing, cache hit rate, output speed, and the choice between open and closed models.
Read more →
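The cache-hit-rate point can be made concrete. Once cached input is billed at a fraction of list price, a model with a higher headline input price can end up cheaper per input token. The sketch below compares input cost only and uses hypothetical function names. The $5/M list price and 10% cache ratio mirror Claude figures quoted on this page, and the $3/M uncached rival is a hypothetical placeholder.

```python
def effective_input_price(list_price: float, hit_rate: float, cache_ratio: float) -> float:
    """Average $ per 1M input tokens when hit_rate of them bill at cache_ratio * list_price."""
    return list_price * (1 - hit_rate * (1 - cache_ratio))

def break_even_hit_rate(list_price: float, cache_ratio: float, rival_price: float) -> float:
    """Hit rate at which the cached model's effective input price equals an uncached rival's.

    A result <= 0 means the cached model is never more expensive on input;
    a result > 1 means it can never catch up, whatever the hit rate.
    """
    return (1 - rival_price / list_price) / (1 - cache_ratio)

h = break_even_hit_rate(5.0, 0.10, 3.0)
print(f"break-even hit rate: {h:.1%}")
```

Here the break-even point is about 44%. Above that hit rate, the $5/M model with 10% cache reads costs less per input token than the uncached $3/M rival, even though its headline price is higher.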