The following article originally appeared on Medium and is being republished here with the author’s permission.
Ask 10 developers which LLM they’d recommend and you’ll get 10 different answers, and almost none of them are based on objective comparison. What you’ll get instead is a reflection of the models they happen to have access to, the ones their employer approved, and the ones that the influencers they follow have been quietly paid to promote.
We’re all living inside recursively nested walled gardens, and most of us don’t realize it.

The access problem
In corporate environments, model selection often happens by accident. Someone on the team tries Claude Code one weekend, gets excited, tells the group on Slack, and suddenly the whole organization is using it. Nobody evaluated alternatives. Nobody ran a bakeoff. The decision was made by whoever had a company card and a free Saturday.
That’s not a criticism; it’s just how these things go. But it means that when that same person tells you their favorite model, they’re really telling you which model they’ve had the most reps with. There’s a real learning curve at play: you get faster, your prompts get better, and the model starts to feel almost intuitive. It’s not that the model is objectively superior. It’s that you’ve gotten good at using it.
This matters more than people admit, because a lot of this space runs on feelings rather than evidence. People feel good about Opus right now. It feels powerful; it feels smart; it feels like you’re using the best tool available. And maybe you are. But ask someone who’s paying for their own tokens whether they feel the same way, and you tend to get a more calibrated answer. Skin in the game has a way of sharpening opinions.
The influence problem
There’s also a lot of money moving through this space in ways that don’t always get disclosed. Model providers are spending real budget to make sure the right people have the right experiences: early access, credits, invites to the right events. Anthropic does it. OpenAI does it. This isn’t a scandal; it’s just marketing, but it muddies the signal considerably. When someone you follow is effusive about a model, it’s worth asking whether they arrived at that opinion through sustained use or through a curated demo environment.
Meanwhile, some developers, especially those building in the open, will use whatever doesn’t cost an arm and a leg. Their enthusiasm for a model might be more about its pricing tier than its capability ceiling. That’s also a valid signal, but it’s not the same signal.
The alignment problem (the other one)
Then there are the geopolitical concerns. Some developers are deliberately avoiding Qwen and GLM because of concerns about where they originate. Others are using them because they’re compelling, capable models that happen to be dramatically cheaper. Each camp thinks the other is being naive. It’s a real conversation that doesn’t have a clean answer, but it’s happening mostly beneath the surface.
What I’ve actually been doing
I’ve been forcing myself to test outside my comfort zone. I’ve spent the last week using Codex seriously, not casually, and my experience so far is that it’s nearly indistinguishable from Claude Sonnet 4.6 for most coding tasks, while running at roughly half the cost once you factor in how efficiently it uses tokens. That’s not a small difference. I want to live with it longer before I form a firm opinion, but “a week” is the minimum threshold I’d set for any model evaluation. Anything less and you’re just rating your first impression.
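The “half the cost” point is easy to miss if you only compare list prices: what a task actually costs is the number of tokens a model consumes to finish it multiplied by its per-token price. Here is a minimal sketch of that arithmetic. The model names, prices, and token counts are hypothetical placeholders, not real figures for Codex, Sonnet, or any other model; substitute your provider’s current rates and your own measured usage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPricing:
    """Hypothetical per-million-token prices; replace with real, current rates."""
    name: str
    input_per_mtok: float   # USD per 1M input tokens (assumed value)
    output_per_mtok: float  # USD per 1M output tokens (assumed value)


def task_cost(pricing: ModelPricing, input_tokens: int, output_tokens: int) -> float:
    """Effective cost of one task: tokens actually consumed times the per-token price."""
    return (input_tokens * pricing.input_per_mtok
            + output_tokens * pricing.output_per_mtok) / 1_000_000


# Two hypothetical models with identical list prices...
model_a = ModelPricing("model-a", input_per_mtok=3.0, output_per_mtok=15.0)
model_b = ModelPricing("model-b", input_per_mtok=3.0, output_per_mtok=15.0)

# ...where model B finishes the same task with half as many tokens (assumed counts).
cost_a = task_cost(model_a, input_tokens=40_000, output_tokens=8_000)
cost_b = task_cost(model_b, input_tokens=20_000, output_tokens=4_000)

print(f"{model_a.name}: ${cost_a:.2f}")
print(f"{model_b.name}: ${cost_b:.2f}")
print(f"cost ratio (B / A): {cost_b / cost_a:.2f}")
```

With these made-up inputs, model B costs half as much per task despite identical pricing, which is why token counts from a sustained trial tell you more than a price sheet.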
I’ve also started using Qwen and GLM-5 seriously. Early results are interesting. I’ve had some compelling successes and a few jarring errors. I’ll reserve judgment.
What I’ve noticed in my own Anthropic usage is worth naming: I default to Haiku for well-scoped, mechanical tasks. Sonnet handles almost everything else with room to spare. Opus only comes out when I need real breadth: architecture questions, strategic framing, anything with a genuinely wide scope. But I’ve watched people in corporate environments leave the dial on Opus permanently because they’re not paying for tokens themselves. And here’s the thing: that’s not always to their advantage. High-powered models overthink simple tasks. They’ll add abstractions you didn’t ask for and restructure things that didn’t need restructuring. When I have a clearly templated class to write, Haiku gets it right at a tenth of the cost, and it doesn’t second-guess the design.
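The tiering habit described above amounts to a simple routing rule. Here is a minimal sketch of it, assuming you classify each task yourself into one of the hypothetical `Scope` categories below. The tier names mirror Anthropic’s Haiku, Sonnet, and Opus families, but the mapping is one person’s heuristic rather than guidance from any provider, and the function returns a label instead of calling any real SDK or model ID.

```python
from enum import Enum
from typing import Optional


class Scope(Enum):
    """Hypothetical task categories; classifying a task is a judgment call."""
    MECHANICAL = "mechanical"  # well-scoped, templated work (e.g. a boilerplate class)
    GENERAL = "general"        # everyday coding and writing
    BROAD = "broad"            # architecture questions, strategic framing


# Illustrative scope-to-tier mapping; these are labels, not real model identifiers.
TIER_BY_SCOPE = {
    Scope.MECHANICAL: "haiku",
    Scope.GENERAL: "sonnet",
    Scope.BROAD: "opus",
}


def pick_tier(scope: Scope, force_tier: Optional[str] = None) -> str:
    """Return the tier to use: an explicit override wins, otherwise route by scope."""
    if force_tier is not None:
        return force_tier
    return TIER_BY_SCOPE[scope]


for task, scope in [
    ("generate a class from an existing template", Scope.MECHANICAL),
    ("fix a failing integration test", Scope.GENERAL),
    ("propose a service decomposition", Scope.BROAD),
]:
    print(f"{task!r} -> {pick_tier(scope)}")
```

Passing `force_tier="opus"` on every call reproduces the “leave the dial on Opus” pattern; the default path picks the smallest tier that fits the task’s scope.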
The thing we should be talking about
Last month everyone was worked up about what Sam Altman said about energy consumption. Sure. But I think the more pressing question is about marketing budgets and how they’re distorting our collective understanding of these tools. The benchmarks are starting to feel managed. The influencer coverage is clearly shaped. The access programs create a positive bias among the people with the largest audiences.
None of this means the models are bad. Some of them are genuinely remarkable. But when you ask someone which model to use, you’re getting an answer filtered through their employer’s procurement decisions, the influencers they follow, what they can afford, and how long they’ve been using that particular tool. The answer you get tells you a lot about their situation. It tells you almost nothing about the model.
Take all of it with appropriate skepticism, including this post.
