This is the April update to my March 2026 ranking. Same methodology - I use these models every day for production work and rank them based on what actually ships code, not what scores highest on a leaderboard.
March was wild. GPT-5.4 dropped on the 5th and reshuffled the top tier. Now we're a few weeks in and the dust has settled. Some models climbed. Some fell. A couple of new entrants showed up that weren't on anyone's radar.
Here's where things stand as of early April.
## What Changed Since March
The big shifts this month:
- GPT-5.4 had time to prove itself - a month of daily use changed my opinion on a few things. The Thinking variant is better than I initially gave it credit for. Standard mode still over-engineers simple tasks.
- Gemini 3.1 Pro keeps getting cheaper - Google dropped prompt caching costs again. At this point it's almost irresponsible not to use it for cost-sensitive workloads.
- Qwen3.5 is eating into Sonnet's share - I'm using it more than I expected. The free API tiers are hard to ignore when the quality gap is this small.
- Still no Behemoth, no DeepSeek V4, no Grok 5 - the vaporware list hasn't changed. I'll stop listing them when they actually ship.
## Quick Reference: Every Model at a Glance
| Model | Provider | Input / Output (per 1M tokens) | Context | Best For |
|---|---|---|---|---|
| GPT-5.4 Pro | OpenAI | Premium tier | 128K+ | Reasoning, math, analysis |
| Claude Opus 4.6 | Anthropic | $5 / $25 | 1M | Production code, debugging |
| Gemini 3.1 Pro | Google | $2 / $12 | 1M+ | Price/performance |
| Claude Sonnet 4.6 | Anthropic | $3 / $15 | 1M | Best daily driver |
| Grok 4.20 | xAI | Competitive | 128K+ | Speed, coding benchmarks |
| Qwen3.5 | Alibaba | Free / cheap APIs | Large | Best open source |
| DeepSeek V3 | DeepSeek | ~$0.27 / $1.10 | 128K | Budget coding |
## My Tier List - April Update
Mostly the same as March. Two changes worth noting.
Qwen3.5 moved up. After a month of use, I can't justify keeping it in B tier anymore; the quality on coding tasks is too close to Sonnet to pretend it's a tier below. The other change is GPT-5.4 Thinking, which I rated too harshly in March - more on that in the coding section.
## Coding: Writing Production Code
Same top 3 as March: Opus, GPT-5.4, Grok. The only shift is that I'm now more confident about where Qwen3.5 sits - it's above Gemini for pure code generation, which I didn't expect a month ago.
- Claude Opus 4.6 - Still my #1. The commit-without-changes rate is holding at about 70%. I've been using it with my CLAUDE.md setup and the consistency is the thing that keeps it on top. Other models have better days, but Opus has fewer bad days.
- GPT-5.4 (Thinking) - Better than I said in March. The over-engineering problem is still real, but I've learned to prompt around it. Adding "keep it simple, no abstractions" to the end of my prompts cut the factory-pattern-for-no-reason problem by about half.
- Grok 4.20 - Steady. No changes from March. Fast, accurate on single files, still can't coordinate across a whole project.
- Qwen3.5 - Moving up. I used it for an entire side project last week and the code quality was indistinguishable from Sonnet on 80% of tasks. The 20% where it fell short were all multi-step refactors.
- Claude Sonnet 4.6 - Still the best value if you're paying. But Qwen3.5 is breathing down its neck on the free tier. If Alibaba keeps improving at this rate, Sonnet's value proposition gets harder to justify.
- Gemini 3.1 Pro - Reliable but I find myself reaching for Qwen over Gemini now. Google's pricing advantage doesn't matter when Qwen is free.
## Reasoning: Complex Problem Solving
No changes from March. GPT-5.4 Pro still leads on pure reasoning. Opus still wins on turning reasoning into working code. The gap between them hasn't moved.
## Cost: April 2026 Pricing
Google dropped Gemini caching costs again in late March. If you're doing RAG or batch processing, the effective per-token cost is now under $1 for cached prompts. That's wild for a frontier model.
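To make the "under $1" claim concrete, here's a minimal blended-cost sketch. The $2 / $12 rates come from the table above; the cached-input rate, cache-hit fraction, and token counts are illustrative assumptions, not published numbers.

```python
# Rough effective-cost estimate for a cached-prompt RAG workload.
# Rates below: $2/$12 per 1M tokens from the pricing table; the
# cached-input discount is an assumed figure for illustration.

INPUT_PER_M = 2.00          # $ per 1M uncached input tokens (from table)
CACHED_INPUT_PER_M = 0.20   # $ per 1M cached input tokens (assumed)
OUTPUT_PER_M = 12.00        # $ per 1M output tokens (from table)

def effective_cost_per_m_input(cached_fraction: float,
                               output_tokens_per_m_input: float) -> float:
    """Blended $ cost per 1M input tokens when part of the prompt
    hits the cache, plus the output those requests generate."""
    input_cost = (cached_fraction * CACHED_INPUT_PER_M
                  + (1 - cached_fraction) * INPUT_PER_M)
    output_cost = (output_tokens_per_m_input / 1_000_000) * OUTPUT_PER_M
    return input_cost + output_cost

# A RAG setup where 90% of each prompt is a cached context block
# and each 1M input tokens yields ~20K output tokens:
cost = effective_cost_per_m_input(cached_fraction=0.9,
                                  output_tokens_per_m_input=20_000)
print(f"${cost:.2f} per 1M input tokens")
```

Under these assumptions the blended cost lands well below $1 per million input tokens, which is the regime the claim above is pointing at.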
## Open Source Update
Qwen3.5 is the story. It jumped from B to A tier in my rankings after a month of heavy use. The hybrid thinking mode - where you can toggle between fast and deep reasoning without switching models - is something the closed-source providers should be copying.
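For readers who haven't used a hybrid-thinking model: the appeal is that the toggle lives in the request, not in the model name. Here's a minimal sketch against an OpenAI-compatible chat endpoint; the `enable_thinking` field name is hypothetical (check your provider's docs for the real parameter), and the rest follows the standard chat-completions payload shape.

```python
# Sketch of toggling fast vs deep reasoning on one model via a
# request flag. "enable_thinking" is a hypothetical parameter name,
# not a confirmed API field.

def build_request(prompt: str, deep: bool) -> dict:
    """Build a chat-completions payload, switching reasoning depth
    with an assumed extra field instead of switching models."""
    return {
        "model": "qwen3.5",          # same model either way
        "messages": [{"role": "user", "content": prompt}],
        "enable_thinking": deep,     # hypothetical toggle
    }

fast = build_request("Rename this variable across the file.", deep=False)
deep = build_request("Plan a multi-step refactor of the auth module.", deep=True)
```

The practical win is routing: cheap, latency-sensitive calls and hard multi-step calls can share one deployment and one prompt setup, with a single flag flipped per request.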
DeepSeek V4 and R2 are still vaporware. At this point I've stopped predicting launch dates. Llama 4 Behemoth is in the same boat. If you're waiting for either of these before committing to a workflow, stop waiting. Use what exists now.
For running things locally, my recommendation hasn't changed: DeepSeek R1 Distilled 32B for reasoning, Qwen3.5 via cheap API for everything else.
## My Stack - April Update
- Claude Sonnet 4.6 - 40% (down from 45%) - still my default but Qwen is eating into this
- Claude Opus 4.6 - 30% - unchanged, still my go-to for agent-style work
- GPT-5.4 - 12% - mostly for second opinions and brainstorming
- Qwen3.5 - 18% (up from 10%) - taking share from both Sonnet and GPT-5.4
The trend is clear. Open source is taking share from paid APIs. Not because the quality is better - it's not, not yet - but because the quality is good enough and the price difference is too big to ignore.
## The Bottom Line
Same advice as March. Pick a model, configure it well, and ship code. The only thing that changed is Qwen3.5 earned a promotion. Everything else is noise until one of the vaporware models actually ships.
I'll update this again in May. Or sooner if something big drops.