Ever run an AI model locally? If you want the most capability you need a fast GPU with 32-48 GB of RAM. And that’s all for you, ONE user.
Copilot has millions of users, with tens or hundreds of thousands of them hitting the AI all at once. Each one needs $thousands worth of GPU and RAM dedicated to them for the length of their query processing.
Where do you think the money to buy all that hardware comes from? You see OpenAI buying a double digit percentage of the world’s RAM production, you think they got it on clearance sale?
No, there are investors. Investors who are pouring hundreds of billions into this AI stuff. And they don’t do this because it’s fun, they do it because they expect a BIG return.
So what’s going on is just like your neighborhood drug pusher, only the drug pusher is more honest. He says ‘first hit’s free, man’. AI company says ‘AI models are an easy and cost effective way to modernize your workflow!’; they don’t tell you that once you’ve integrated them and fired all the humans who know how to do the work, the price is gonna go way up.
Because the fact is, there IS a real cost of AI compute. GPU time, or at the large scale, datacenter space, power, cooling, etc.
In another few months to a few years, the C-suites will stop huffing the Kool-Aid and will start doing cost-benefit analysis on where AI is and isn’t cost-effective vs. humans. With any luck (for the AI people) by that time the AIs will be good enough that it’s a clear benefit. If not, this bubble’s gonna pop.
It’s more complex than that. The weights of big models are distributed, and then tokens are processed in parallel for multiple users. The setup varies, but it could be 8 GPUs serving many dozens of users at once, or bigger sets with even more parallelism.
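To put rough numbers on the batching point, here is a back-of-envelope sketch in Python. Every figure in it is a placeholder I picked for illustration (the $30,000 per GPU and the 64 concurrent users are assumptions, not real pricing or a real deployment); the only thing it shows is how the hardware cost per user shrinks when one node is shared.

```python
def hardware_cost_per_user(gpus_per_node: int,
                           cost_per_gpu_usd: float,
                           concurrent_users: int) -> float:
    """Hardware cost of one node divided by the users it serves at once."""
    if concurrent_users <= 0:
        raise ValueError("concurrent_users must be positive")
    return gpus_per_node * cost_per_gpu_usd / concurrent_users


# Hypothetical numbers: 8 GPUs per node, $30,000 per GPU (placeholder price).
dedicated = hardware_cost_per_user(8, 30_000, 1)   # one user hogging the node
shared = hardware_cost_per_user(8, 30_000, 64)     # 64 users batched together

print(f"one user per node: ${dedicated:,.0f}")
print(f"64 users per node: ${shared:,.0f}")
```

With those made-up inputs it works out to $240,000 versus $3,750 of hardware per concurrent user, so the "thousands per user" figure only holds if the batching is that good. Power, cooling and idle time are left out entirely.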
I think the bigger problem is that Copilot is… shit.
It’s probably some ancient, inefficient architecture, not something super sparse and hardware efficient like (say) Deepseek V4, or Kimi 2.6, or Gemini Pro.
And literally every interesting dev team Microsoft has ever acquired (Phi, WizardLM, many more), and any interesting innovation they figured out, has just disappeared into a black hole.
They don’t have custom hardware, either, like Huawei NPUs or Cerebras WSEs, or Google TPUs. They’ve written some very interesting papers on that, and proceeded to do squat with them.
Also, it is AWFUL for its size. Tiny models that are basically free run circles around Copilot.
What I’m getting at is that Copilot is probably the most inefficient LLM out there. Like, it’s impressive how bad it is.
Umm, Copilot is just a front end to other models. My work VS instance defaults to Claude, but there are several others available. “Copilot” itself is not its own model.
Really? I don’t use it for work, but I swore I was hitting some internal MS model for chat/code, as it was one of the worst experiences I’ve had with LLMs over 24B.
Has anyone ever actually had a drug dealer giving out free hits?
I feel like that’s the biggest lie DARE ever told.
When I was a teen, I was offered drugs for free more than once, so it’s not all a lie. Maybe it is just uncommon in many places.
I’m Brazilian, BTW.
DARE told so many lies. I was told that if I smoked weed, it would ruin my life and I would become a helpless drug addict. Then I smoked weed and that didn’t happen. Then they got me thinking “what else did they lie to me about?”.
Yeah no. That’s not good business, for small dealers. Only large ones looking to improve their margins and completely without morals.
Pretty much the only people who do that are actual medical doctors (who’ve gotten convinced into believing some lies by pharma reps who are the lowest of the low) and sample offerers at large grocery stores.
I mean I’ve had free hits from my dealers, and given out free hits. Of weed. While someone is buying weed. So it’s not like I’m pushing it on anyone or anyone is pushing anything on me.
Drug propaganda is mostly lies yeah.
Legalise, educate, tax and regulate.
That’s why I would rather run locally. I control my data. Better for the environment. And if I ask a programming question, sure, ChatGPT will come back in seconds, but I’m fine with waiting 30-60 seconds for my own AI. How impatient are we?
And it’s slower, and a worse model than the one they are using.
Even then, that’s quite small. Top of the line frontier models would be looking at hundreds of gigabytes of video memory, and just as much RAM.
A terabyte of VRAM/RAM needed for something like Copilot is probably a fairly sensible estimate.
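For anyone who wants to sanity-check numbers like these, the usual rule of thumb is parameters times bytes per weight. A minimal sketch in Python, assuming made-up model sizes (frontier parameter counts aren't public, so the 70B and 1000B figures below are purely illustrative) and counting weights only, with no KV cache or activations:

```python
def weight_memory_gb(params_billions: float, bits_per_weight: int) -> float:
    """Memory needed just to hold the weights, in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# Hypothetical model sizes, each at a few common precisions.
for params in (70, 1000):
    for bits in (16, 8, 4):
        gb = weight_memory_gb(params, bits)
        print(f"{params}B params @ {bits}-bit: {gb:,.0f} GB")
```

Under those assumptions a 70B model drops from 140 GB at 16-bit to 35 GB at 4-bit, which is roughly where the 32-48 GB local setups come from, while a hypothetical 1000B model at 8-bit is already about a terabyte before any per-user cache.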
Depends on what you want to do, the model, and optimization or quantization.
A lot of LLM stuff that seemed pretty amazing a few years ago - chatbots and the like that respond to questions in plain language - can run in comparatively light hardware. Coding agents can take more, but could also be optimized against a particular language and spit out useful snippets.
Image stuff can be pretty complex especially at higher resolutions and detail, and creating seamless video segments gets expensive on hardware, fast.
Quite true. The thing is, there aren’t billions and billions of dollars in chatbots. The billions are for the creative stuff and the code.
And that is where the reckoning / correction will come from, the bill has to come due eventually. When top end generative AI starts to have a real cost associated with it, then it’s no longer a blanket ‘everyone start using this immediately’ mandate, it prompts some consideration of cost versus output quality.
So, what you’re saying…is the AI Bubble is going to pop once the pencil pushers do the math? But they’re asking their local LLM for that… so it isn’t happening?
Not pop. Correct.
A lot of the managers aggressively pushing AI have little or no understanding of it themselves. They just hear of a technology that can make a human more productive by doing most of the work for them. So absolutely that’s worth a ton of money. It’s why many companies are encouraging if not demanding employees to start using AI- because in their mind, one employee fully utilizing AI can do the work of two standard employees. Of course they believe this because they’ve never actually had to use the damn thing themselves and thus don’t realize it doesn’t do all the work for you. Or worse they think it does and your wonderful code base turns into spaghetti.
Side note: a few companies even had leaderboards for who was using the most AI tokens. This led to ‘tokenmaxxing’, trying to consume as many tokens as possible to prove you are adopting AI. Things like ‘Write unit tests for our company code base, then refactor the code base. Spin up an instance of Claude and another of ChatGPT to each generate unit tests of the old code and run them against the new code, then run the tests against each other to check each other’s work, submit full debug output to another instance of GPT 5.5 that will check for hallucinations…’ Keep that query going for a few paragraphs and you’ll have an army of AI workers all checking each other’s work while producing zero productive output but costing a fortune to run.