Blog

Long-form writing and data analysis. Posts are infrequent, but every piece starts from real numbers.

The 2026 LLM API value benchmark: the cheapest model that is actually good enough

Published 2026-06-19 · About 11 min read

Cheapest is not the same as best value. Using this site’s daily-updated pricing and the Artificial Analysis Intelligence Index, we set a quality floor of AA Index ≥ 40, rank what survives by intelligence per dollar, and work out the real monthly cost for three typical workloads.

China vs US LLMs in 2026: pricing, capability, and context window compared

Published 2026-06-18 · About 12 min read

This site tracks 228 US models and 128 Chinese models. Put them in one table and a clear trend appears: China’s top tier (GLM 5.2, Qwen3.7 Max) now rivals GPT-5.4 on the Intelligence Index at roughly half the input price and a quarter of the output price, sometimes with a higher coding score. We break down pricing, capability, and context with real data, and explain how to weigh compliance and latency.

The complete guide to LLM token pricing: input, output, cached, and reasoning tokens

Published 2026-06-17 · About 10 min read

They are all quoted as “$/1M tokens,” but input, output, cached input, and reasoning tokens are billed in completely different ways, and the final number is usually decided by the part you cannot see. Using real model pricing (Claude bills cached input at 10% of the input price, OpenAI o1 only at 50%), this guide explains how each token type is charged, why output usually costs 3–5× as much as input, and how to estimate and cut the bill.

Picking the open-weight value king: DeepSeek V4, MiniMax M3, Kimi K2.6, and GLM 5.2

Published 2026-06-16 · About 11 min read

In 2026 almost every best-value model is open-weight, and most come from Chinese labs. DeepSeek V4 Flash ($0.09, AA 40), MiniMax M3 ($0.30, AA 44, 1M context), Kimi K2.6 (strong agentic performance), and GLM 5.2 ($1.2, AA 51, coding 68.8) each occupy a different niche. Using real pricing and coding/agentic scores, we place all four on one price–capability map and say which to pick for which job.

Why the same model costs different amounts on different platforms

Published 2026-06-15 · About 9 min read

The same model can cost different amounts on the official API, on OpenRouter, and on Bedrock/Vertex/Azure. This piece explains where those gaps come from (routing margin, volume and committed-use discounts, region and hosting cost, batch APIs) and why this site deliberately shows OpenRouter’s routed price rather than the official one. You will leave knowing how to find the cheapest source for your own usage.

MiniMax M3 deep dive: 1M context, launch pricing, and the coding-agent price war

Published 2026-06-05 · About 10 min read

MiniMax M3 bundles 1M context, native multimodality, coding-agent benchmarks, OpenRouter routing, and a first-week 50% API discount. It is not a simple Claude replacement; it pushes down the cost floor for long-running agents.

Claude Code 2026 guide: Opus 4.8, dynamic workflows, auto mode, and plugins

Published 2026-06-01 · About 12 min read

Claude Code has changed quickly: Opus 4.8, dynamic workflows, auto mode, agent view, /goal, /usage, /code-review, and the security-guidance plugin all affect the day-to-day workflow. This guide turns the new features into a repeatable playbook.

Claude Opus 4.8 didn't cut standard prices: how premium LLMs defend their value

Published 2026-05-31 · About 9 min read

Claude Opus 4.8 keeps the standard $5/M input and $25/M output price, while fast mode costs $10/$50. This is not a price-war move; it is Anthropic's case for premium models through reliability, long-horizon agents, and less rework.

DeepSeek V4-Pro and MiMo V2.5 price cuts: has the LLM API price war begun?

Published 2026-05-28 · About 11 min read

DeepSeek V4-Pro made its 75% discount permanent, while Xiaomi MiMo V2.5 announced permanent API price cuts of up to 99%. This is not a one-off promotion; it resets the lower bound for LLM API costs.

Gemini 2.0 Flash is shutting down: 2026 API pricing and migration options

Published 2026-05-24 · About 10 min read

Google has marked Gemini 2.0 Flash and 2.0 Flash-Lite for shutdown on June 1, 2026. This guide compares 2.5 Flash-Lite, 2.5 Flash, Gemini 3 Flash, and 3.1 Flash-Lite as migration paths.

5 things that actually matter when picking an LLM API in 2026

Published 2026-05-17 · About 8 min read

Headline price doesn't matter much anymore. What actually moves the bill in 2026 is context efficiency, reasoning-token billing, cache hit rate, output speed, and the choice between open and closed models.
