A list went around X in late June that summed up a year of quiet engineering decisions in nine lines. The original poster, @yuhasbeentaken, put it plainly:
here's the list of western companies moving ai workloads to chinese models... it's becoming a procurement story!!!
That last line is the whole thing.
Two separate people deserve separate credit, so to be clear about who did what: @yuhasbeentaken, the original poster, did all of the actual work here, meaning the list, the hands-on testing, and the follow-up replies that backed each claim with a source. Ken E is a different person who reshared @yuhasbeentaken's thread and put it in front of me. I wrote this article, but the reporting and the receipts are @yuhasbeentaken's, and the rest is my own checking of each name against the public record.
For two years the conversation about AI models was a benchmark fight: my model scores higher than your model. Sometime this spring it turned into something far more boring and far more consequential. It became procurement. Companies stopped treating frontier models as exclusive vendor relationships and started treating compute as a commodity they shop for, by the token, by the task, by the invoice.
And more and more, the cheapest capable option on the shelf is Chinese.
This isn't a politics story, whatever the headlines want it to be. It's raw unit economics and a quietly maturing inference stack. Here is where things actually stand.
The cases with receipts
Three of the moves are documented well past the point of rumor.
Lindy went all in on DeepSeek. The AI agent startup, run by Flo Crivello, moved one hundred percent of its core traffic off Anthropic's Claude models and onto DeepSeek v4. Crivello said inference had grown larger than payroll, and that the switch saves the company millions a year while performance actually went up on bread-and-butter tasks like email triage. To keep customer data inside US borders, Lindy runs the model through Atlas Cloud, a domestic inference provider, which Crivello said came out ahead of every major option after a long evaluation. (The New Stack, The Decoder, Crivello on X)
Cursor's coding model was built on Kimi. Cursor's agentic coding model, Composer 2, started from Moonshot AI's open-weight Kimi K2.5 checkpoint. A developer figured it out by intercepting the model ID on a local proxy, and Cursor eventually confirmed it. Roughly a quarter of the total compute behind Composer 2 came from that Kimi base, with the rest going to Cursor's own continued pretraining and reinforcement learning. The base was accessed through Fireworks AI under a commercial license, so the usage was authorized. Cursor's VP later said not crediting Kimi from the start was a mistake. (TechCrunch, VentureBeat, Cursor)
Coinbase cut its AI bill in half without telling engineers to use less. This is the strongest signal in the bunch, because Coinbase rewired the plumbing instead of rationing access. Engineers now default to two Chinese open-weight models, Zhipu's GLM 5.2 and Moonshot's Kimi K2.7 Code, through an internal gateway. The AI bill fell by nearly half even as token usage climbed. The biggest single lever was caching: the hit rate went from 5 percent to 60 percent. Task-based routing and tighter context did the rest. The company found that 91 percent of its engineers never hit a usage limit anyway, so quotas were never the answer. (TechTimes, Analytics India Magazine, PANews)
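To see why a jump in cache hit rate can move a bill that much, here is a rough back-of-the-envelope sketch. It is not Coinbase's accounting. The only figures taken from the reporting are the 5 percent and 60 percent hit rates; the function name and the assumption that a cached input token costs 10 percent of a normal one are mine, and real provider discounts vary.

```python
def effective_input_cost(hit_rate: float, cached_price_ratio: float) -> float:
    """Relative cost of input tokens versus a no-cache baseline of 1.0.

    hit_rate: fraction of input tokens served from cache (0.0 to 1.0).
    cached_price_ratio: price of a cached token as a fraction of the normal
        price. This is an assumed parameter, not a figure from the reporting.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    # Uncached tokens pay full price; cached tokens pay the discounted price.
    return (1.0 - hit_rate) + hit_rate * cached_price_ratio


ASSUMED_CACHED_PRICE_RATIO = 0.10  # hypothetical: cached tokens cost 10% of normal

before = effective_input_cost(0.05, ASSUMED_CACHED_PRICE_RATIO)
after = effective_input_cost(0.60, ASSUMED_CACHED_PRICE_RATIO)
saving = 1.0 - after / before

print(f"before: {before:.3f}, after: {after:.3f}, input-cost saving: {saving:.0%}")
# prints: before: 0.955, after: 0.460, input-cost saving: 52%
```

Under that assumed discount, the hit-rate change alone would cut input-token spend by about half. Output tokens are not cached, so the effect on a real bill depends on the input-to-output mix.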
The wider list
@yuhasbeentaken's thread named six more, and when another user asked for sources, the original poster answered them directly, pointing to a Shopify engineering post, an arXiv paper for Uber Eats, and a Reuters item for Siemens. Treat these as reported rather than confirmed in the way the three above are, but the direction is consistent across them:
- Shopify and Airbnb: reportedly running Alibaba's Qwen.
- Uber Eats: reportedly using Qwen2 for workload routing.
- Siemens: a mix of DeepSeek and Qwen. This one has independent support beyond the thread, since Siemens, Renault, and Orange all described a hybrid American, Chinese, and European model strategy at VivaTech in Paris. (BigGo)
- Chapsvision: reportedly standardized on Qwen.
- Microsoft: reportedly testing DeepSeek v4 for internal or specialized work.
About the source: @yuhasbeentaken isn't a drive-by account. In the same thread the poster mentioned spending six hours testing an inference optimization for DeepSeek v4 and reading the paper, and the read was sober: it doesn't make the model smarter, it makes inference meaningfully faster. That is the kind of detail that tends to survive contact with reality.
Why this is happening, in dollars
Strip away the geopolitics and you're left with a spread that's hard for any finance team to ignore.
By published pricing and the reporting around DeepSeek's latest release, the Pro tier runs roughly 7 times cheaper on input and 17 times cheaper on output than a frontier model like Claude Sonnet or GPT 5.5. The lightweight Flash tier reportedly undercuts entry-level options like Claude Haiku by 10 to 25 times. (VentureBeat)
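Two different multiples for input and output mean the real saving depends on the shape of the workload. The sketch below works that out in relative units. The 7x and 17x figures are the ones quoted above; everything else (the function name, the token counts, and the assumption that frontier output tokens cost 5 times as much as frontier input tokens) is hypothetical and only there to make the arithmetic concrete.

```python
def blended_cost_ratio(
    input_tokens: int,
    output_tokens: int,
    output_price_multiple: float,
    input_discount: float = 7.0,    # "7 times cheaper on input" (from the text)
    output_discount: float = 17.0,  # "17 times cheaper on output" (from the text)
) -> float:
    """How many times cheaper a workload is on the cheaper model.

    Prices are in relative units: one frontier input token costs 1.0 and one
    frontier output token costs output_price_multiple (an assumed value).
    """
    if input_tokens + output_tokens <= 0:
        raise ValueError("workload must contain at least one token")
    frontier = input_tokens * 1.0 + output_tokens * output_price_multiple
    cheaper = (
        input_tokens / input_discount
        + output_tokens * output_price_multiple / output_discount
    )
    return frontier / cheaper


# Hypothetical input-heavy request: 10,000 tokens in, 1,000 tokens out.
ratio = blended_cost_ratio(10_000, 1_000, output_price_multiple=5.0)
print(f"about {ratio:.1f}x cheaper overall")
# prints: about 8.7x cheaper overall
```

The blended figure always lands between the two multiples. Input-heavy work such as summarizing long documents sits closer to 7x, and output-heavy work such as code generation sits closer to 17x.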
You can watch the migration in the routing data. A year ago, Chinese models were under 2 percent of tokens on OpenRouter, the open marketplace where developers send prompts to whichever model wins on price and quality. By spring 2026, the combined share of Chinese providers, including Alibaba, DeepSeek, Zhipu, and MiniMax, crossed 45 percent of weekly volume. (Rest of World, Digital Applied)
When the gap is that wide, "use the American model by default" stops being a technical decision and starts being a budget line a CFO will ask about.
Everyone is becoming an exchange
The real story isn't that Chinese models got good, though they did. It's that enterprise architecture changed underneath everyone.
Coinbase didn't pick a model. It built a router that reads each prompt for cost and complexity and sends it to whatever fits. Once a company can route any prompt to any model in real time, it stops being a loyal customer of OpenAI or Anthropic. It effectively starts running a little exchange, and loyalty gets repriced every single day.
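For a feel of what "a router that reads each prompt for cost and complexity" means in practice, here is a toy sketch. It is not Coinbase's gateway, whose internals aren't public in the reporting cited here. The model labels, prices, keyword hints, and the word-count threshold are all placeholders I made up; a production router would use a proper classifier where this uses a crude heuristic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str                  # placeholder label, not a real model ID
    capability: int            # 1 = routine tasks, 3 = hardest tasks
    cost_per_1k_tokens: float  # made-up relative price


# Hypothetical catalog; names and prices are illustrative only.
CATALOG = [
    ModelOption("small-open-weight", capability=1, cost_per_1k_tokens=0.1),
    ModelOption("large-open-weight", capability=2, cost_per_1k_tokens=0.5),
    ModelOption("frontier-closed", capability=3, cost_per_1k_tokens=5.0),
]

# Assumed keywords that signal a hard task; a stand-in for a real classifier.
HARD_TASK_HINTS = ("formal proof", "architecture", "security review")


def required_capability(prompt: str) -> int:
    """Crude complexity estimate: keyword hints first, then prompt length."""
    text = prompt.lower()
    if any(hint in text for hint in HARD_TASK_HINTS):
        return 3
    if len(text.split()) > 200:
        return 2
    return 1


def route(prompt: str, catalog=CATALOG) -> ModelOption:
    """Pick the cheapest model that meets the prompt's required capability."""
    needed = required_capability(prompt)
    eligible = [m for m in catalog if m.capability >= needed]
    if not eligible:
        raise LookupError("no model in the catalog is capable enough")
    return min(eligible, key=lambda m: m.cost_per_1k_tokens)


print(route("Summarize this email thread").name)
print(route("Do a security review of this module").name)
```

The vendor never appears in the decision. The router only sees capability and price, which is exactly why loyalty gets repriced every time the catalog changes.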
None of this is friction free. Lindy's Atlas Cloud detour exists precisely because routing customer data to a Chinese model raises real data residency questions, and at least one outlet pointed out that Coinbase's leadership led with the savings and not the legal exposure. (TechTimes) Smart buyers aren't ignoring those concerns. They are answering them by running open-weight models on US-hosted inference.
But the gravity is clear. Enterprise spending has shifted from buying the smartest possible answer to buying the cheapest acceptable one, and the wider market noticed. Even Anthropic and OpenAI watchers now frame the moment as a move from "tokenmaxxing" to efficiency. (CNBC) If the Western frontier labs don't meet that price reality, the flow of routine workloads toward cheaper infrastructure won't slow down. It'll compound, the same way Coinbase's caching did.
The same arithmetic that pushes Coinbase toward cheaper models is what lets a one-person shop run a real AI stack for about twenty dollars a month instead of hiring an agency. That do-it-yourself version is the whole argument of my book The $20 Dollar Agency (search the title on Amazon Kindle): the marketing, search, and content work you can run yourself once you stop overpaying for the tools.
Related reading
- I Cut a Recurring AI Bill by More Than Half in an Afternoon: the same caching-and-routing playbook, scaled down to a solo operator.
- GLM 5.2: What Z.ai's Open-Weight Flagship Actually Ships: a closer look at one of the models in Coinbase's gateway.
- Qwen: When It's the Right Model and How to Run It at Home: the open-weight family showing up on half this list.
- Why Use the Separate Anthropic API, and How It Actually Works: the frontier side of the trade, and when it's still worth paying for.
- Running Your Own AI On-Prem in 2026: when keeping inference in-house beats routing it anywhere at all.
Sources
- The originating thread: @yuhasbeentaken on X, June 28, 2026, surfaced and shared by Ken E (a separate person from the original poster).
- Lindy and DeepSeek: The New Stack, The Decoder.
- Cursor and Kimi K2.5: TechCrunch, VentureBeat.
- Coinbase routing and savings: TechTimes, Analytics India Magazine.
- The procurement shift and pricing: Tech Startups, VentureBeat on DeepSeek pricing, Rest of World.
- European hybrid adoption: BigGo on Siemens, Renault, and Orange.
Mentions of specific companies and models are nominative fair use. No affiliation is implied. The company list originates with @yuhasbeentaken's X thread, shared by Ken E, and the reporting cited above; items flagged as "reported" have not been independently confirmed to the same standard as the Lindy, Cursor, and Coinbase cases.