The Frontier Didn't Go Dark: AI Coding Models Worth Using Right Now

You Don't Have to Use Fable and Mythos to Work on the Frontier

TLDR: Anthropic's Claude Fable 5 and Mythos 5 launched to massive benchmarks then disappeared three days later under a US government export control directive. The good news is that GPT-5.5, Nemotron 3 Ultra, MiniMax M3, and Kimi K2.7 Code are all genuinely competitive right now.

Summary: Last week Anthropic dropped Claude Fable 5 and Mythos 5 with numbers that made everyone sit up straight. Fable 5 hit 80.3% on SWE-bench Pro and 85% on OS-World Verified for computer use. These were not incremental improvements. Then, three days after launch, the US government issued an export control directive and Anthropic pulled both models globally, including for enterprise customers who had already built workflows around them. Because Anthropic has no system to instantly verify citizenship for millions of users worldwide, compliance meant disabling the models for everyone, including American citizens and Anthropic's own employees. Production doesn't pause for policy drama, so the question became: what do you actually use now?

The honest answer is that the competitive field is stronger than the Fable 5 coverage cycle made it seem. GPT-5.5 launched at the end of April as the first fully retrained base model since GPT-4.5, and two months of real production use have given the community a much clearer picture. The headline I'd point to is not the marketing benchmark but this one: MRCR v2 at 1M tokens jumped from 36.6% on GPT-5.4 to 74.0%, more than doubling. If you've ever tried to feed a full codebase to a model and watched it forget what it read three files ago, that number is what you care about. On KiloBench, GPT-5.5 holds first place with a 74.2% completion rate. The price doubled from GPT-5.4, but token efficiency improvements mean the real-world cost increase is closer to 20%. I can live with that.

NVIDIA's Nemotron 3 Ultra is the open-weight story of the year so far, and it's currently free to use in Kilo. Jensen Huang introduced it at Computex and the framing was direct: this is NVIDIA's argument that open-weight models can operate at the frontier. The architecture is a hybrid Mamba-Transformer Mixture-of-Experts with 550 billion total parameters but only 55 billion active, delivering five times higher throughput than comparable open models. Artificial Analysis puts its intelligence index score at 48, highest among US-based open-weight models. But the more important thing after the Fable 5 situation is this: open-weight self-hosted means a government directive cannot abruptly end your access. Hardware sovereignty went from a theoretical concern to a concrete one in about 72 hours last week.

MiniMax M3 arrived earlier this month with a benchmark story that should not get lost in the noise. It scored 59.0% on SWE-Bench Pro and 66.0% on Terminal Bench 2.1, edging past GPT-5.5 on both, at an API price of $0.30 per million input tokens. That's one-sixteenth the cost of Opus 4.8. The architectural innovation is MiniMax Sparse Attention, which cuts per-token compute at 1M context to roughly one-twentieth of the prior generation. It also became the third most popular Kilo model on OpenRouter in the week after the Fable news broke, which tells you something real about adoption. M3 is hosted by a Chinese lab, so data governance questions apply, but the open weights are available for self-hosted deployment, which removes the API dependency.

Moonshot AI's Kimi K2.7 Code landed on June 12th, same week as the Fable situation, and it's a 1-trillion-parameter Mixture-of-Experts model with 32 billion active parameters. The headline improvement over K2.6 is a 30% reduction in reasoning-token usage. Every agentic loop that previously burned 1,000 tokens to think through a code change now burns around 700. At scale across thousands of daily operations, that adds up. The MCP tool-use score of 81.1 on MCP Mark Verified means it reliably invokes tools correctly through the Model Context Protocol, which matters for CI checks, ticket updates, and multi-file edits in one loop. At $0.75 and $3.50 per million input and output tokens respectively, it's priced for daily use.

Key takeaways:

Claude Fable 5 and Mythos 5 were pulled globally three days after launch due to US export controls, affecting all users including enterprise customers
GPT-5.5 leads KiloBench at 74.2% completion rate; its long-context reasoning more than doubled from GPT-5.4 to GPT-5.5
Nemotron 3 Ultra is free in Kilo, self-hostable, open-weight, and the top-scoring US-based open model on Artificial Analysis
MiniMax M3 beats GPT-5.5 on two coding benchmarks at 1/16th the cost of Opus 4.8, with a permanent discount now locked in
Kimi K2.7 Code cut reasoning-token usage 30% and scores 81.1 on MCP tool-use, making it strong for agentic coding loops

Why do I care: The Fable 5 situation is a scenario I've been mentally flagging for a while. Betting your production stack on a single hosted model, no matter how capable, means you've outsourced your reliability to decisions you can't control. This isn't a critique of Anthropic specifically. Any hosted model can disappear through regulation, pricing changes, or a company pivoting. The lesson for architects is the same one I keep coming back to: model routing, fallback chains, and ideally at least one self-hostable open-weight option in your workflow. The Kilo framing around "the frontier is wider than one lab" is correct, and the numbers on GPT-5.5, Nemotron 3 Ultra, MiniMax M3, and Kimi K2.7 Code back it up. None of this is consolation prize territory.

You Don't Have to Use Fable and Mythos to Work on the Frontier