Quick post about a change I made that’s worked out well.
I was using the OpenAI API for automations in n8n — email summaries, content drafts, that kind of thing. Was spending ~$40/month.
Switched everything to Ollama running locally. The migration was pretty straightforward since n8n just hits an HTTP endpoint. Changed the URL from api.openai.com to localhost:11434 and updated the request format.
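If anyone wants to replicate it, the change boils down to roughly this. It's a sketch of the equivalent raw HTTP calls rather than my actual n8n node config; the model names, prompt, and API key are placeholders, and Ollama also exposes an OpenAI-compatible `/v1` route if you'd rather not touch the request body at all:

```python
# Rough sketch of the before/after calls as plain HTTP (not the actual n8n nodes;
# model names, prompt, and API key are placeholders).
import requests

prompt = "Summarize this email: ..."

# Before: OpenAI chat completions
r = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": "Bearer sk-..."},
    json={"model": "gpt-4o-mini", "messages": [{"role": "user", "content": prompt}]},
)
print(r.json()["choices"][0]["message"]["content"])

# After: Ollama's native chat endpoint on the same machine
r = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llama3:8b",
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    },
)
print(r.json()["message"]["content"])
```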
For most tasks (summarization, classification, drafting) the local models are good enough. Complex reasoning is worse but I don’t need that for automation workflows.
Hardware: i7 with 16GB RAM, running Llama 3 8B. Plenty fast for async tasks.
Free bullshit generator
No, not free: OP's power bill just climbed behind the scenes to match. Probably a discount, but definitely not free.
Unless OP is running a data center, there's not really much of a power increase from running a local Ollama.
Running a thousand watts versus not running a thousand watts can be quite a difference depending on where you live. And then consider buying all of the hardware. In many cases it's probably cheaper to just pay the $40 a month.
That would be true worst case, but you’re never running inference 24/7. It’s no crazier than gaming in that regard.
Do you think it runs at 1000 W continuously? On any decent GPU, the responses are nearly instantaneous, maybe a few seconds of runtime at max GPU consumption.
Compare that to playing a few hours of Cyberpunk 2077 with ray tracing and maxed-out settings at 4K.
Don't get me wrong, there's a lot to hate about AI/LLMs, but the footprint of running one locally, without the data-harvesting engines, is pretty minimal. Most of the consumption comes from creating the larger models in the first place, and then from the data centers that run them, which service millions of inquiries a minute, concentrating the consumption at a single point (plus they retrain the models there on current and user-fed data, including prompts, whereas your computer hosting Ollama would not).
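Back-of-the-envelope, with every number assumed rather than measured (a ~300 W card under load, a couple of hours of actual generation per day, $0.30/kWh):

```python
# Back-of-the-envelope monthly cost of occasional local inference.
# Every number here is an assumption, not a measurement.
gpu_watts = 300        # assumed draw while actually generating
hours_per_day = 2      # assumed time spent generating, not idle time
price_per_kwh = 0.30   # assumed electricity price in USD

kwh_per_month = gpu_watts / 1000 * hours_per_day * 30
cost_per_month = kwh_per_month * price_per_kwh
print(f"~{kwh_per_month:.0f} kWh/month, roughly ${cost_per_month:.2f}")  # ~18 kWh, ~$5.40
```

Even doubling every number keeps it well under $40/month; like the earlier comment says, buying the hardware is the bigger cost.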
Well, it's winter, so any power he spends, he gets back as heat.
While this is correct, sometimes it can be free. I live in a cold climate, and over the winter I hooked up a folding@home computer in my office to keep things a bit warmer. Computers are 100% as efficient as a space heater.
Of course now that it’s getting warm things are changing. I’m actually in the middle of doing my last folding@home tasks until the temps drop next fall.
Even in very cold regions heat pumps maintain a COP>1, so even running it as a space heater may not be free if you have access to a more efficient alternative. Also that may be a responsible justification for Folding@Home, but I doubt OP is turning off their LLM in the summer.
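To put rough numbers on it (a COP of 3 is just an assumed figure for a modern cold-climate heat pump):

```python
# Heat per kWh of electricity: resistive PC "waste heat" vs. a heat pump (assumed COP of 3).
electricity_kwh = 1.0
pc_heat_kwh = electricity_kwh * 1.0    # resistive heating, effectively COP = 1
heat_pump_kwh = electricity_kwh * 3.0  # assumed heat pump COP of 3
print(pc_heat_kwh, heat_pump_kwh)      # 1.0 vs 3.0 kWh of heat for the same electricity
# So per unit of heat, the PC costs roughly 3x what the heat pump does.
```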
I hate that LLMs are called "AI", but they do have some uses if trained on the right data set (rather than pirating all the data on the internet and making the LLM think it's all valid data). I have been wanting to set one up for my Home Assistant voice control so that it can better understand my speech, and also for better recognition of what's in images for tagging in Immich.
I wish they would force the companies to release their training data sets, considering they are getting a lot of it illegally (not that I'm a big copyright fan, but it's crappy that copyright applies to individuals and small businesses, yet not to rich people and corporate-backed companies; attribution, and copyleft if the creator wants it, are things I agree with strongly). If we could get the data sets, pick and choose what portions we want to include, and then train our own LLMs, it would be better. It's why scientific LLMs actually are useful: they are primarily trained on peer-reviewed scientific data, not 4chan and Reddit craziness, and not sci-fi and parody works treated as fact. No wonder it hallucinates.
Bullshit in, bullshit out, to paraphrase. If you teach a toddler 4chan propaganda, sci-fi, parodies, and hate speech as fact, rather than giving it any context, they turn out to be the kind of person who posts that nonsense. But the people funding it want quick results with no effort, and that's what they get: a poorly educated child randomly spouting nonsense. LOL
As much as I rail against regulation, or more so over-regulation, AI needs some heavy regulation. We stand at the crossroads of a very useful tool that is unfortunately hung up in the novelty stage of pretty pictures and AI rice cookers. It could be so much more.

I use AI in a few things. For one, I use AI to master the music I create. I am clinically deaf, so there are frequencies that I just can't hear well enough to make a call, so I lean on AI to do that, and it does it quite well actually. I use AI to solve small programming issues I'm working on, but I wouldn't dare release anything I've done, AI or not, because I can always picture some poor chap who used my 'code' and now has smoke billowing out of his computer. It's also pretty damn good at compose files.

I've read about medical uses that sound very efficient at ingesting tons of patient records and reports and pinpointing where services could do better in aiding the patient, so that people don't fall through the cracks and actually get the medical treatment they need. So it has some great potential if we could just get some regulation and move past this novelty stage.
Stick to Mistral, who are EU based.
Keep that n8n updated. There have been several high- and critical-severity CVEs recently, and I'm betting more to come.
Any quality difference?
IMO there’s a significant drop off with local LLMs vs the mainstream ones. This can be mitigated somewhat though by using web search tools or using retrieval augmented generation.
Basically the local models don’t (and can’t) contain the full knowledge of the universe.
BUT they can call tools pretty well and if you give the harness the capability to search Wikipedia for example, it becomes a lot smarter
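A minimal sketch of what I mean, assuming Ollama is running on localhost:11434; the Wikipedia REST summary endpoint is real, but the model name and the question are just examples:

```python
# Minimal RAG-ish sketch: fetch a Wikipedia summary, stuff it into the prompt,
# and ask a local model. Model name and question are illustrative only.
import requests

def wiki_summary(title: str) -> str:
    # Wikipedia's REST summary endpoint returns a short plain-text extract.
    r = requests.get(f"https://en.wikipedia.org/api/rest_v1/page/summary/{title}")
    r.raise_for_status()
    return r.json().get("extract", "")

question = "In one or two sentences, what is a large language model?"
context = wiki_summary("Large_language_model")

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llama3:8b",
        "stream": False,
        "messages": [
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    },
)
print(resp.json()["message"]["content"])
```

Same idea scales up to a proper vector store, but even this naive "search, then answer" loop covers a lot of the gap for factual questions.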
I’m not a huge fan of AI, but I consider myself pretty open minded and have been considering doing a demo of Claude to at least gain an understanding of the tech I’m constantly talking shit about.
Is there anything self-hostable that compares in quality to what vibe coders claim Claude Opus is capable of?
I actually did an experiment on doing just that. For context, I'm an experienced software engineer whose company buys him a ton of Claude usage, so I've had time to test out what it can actually do, and I feel like I'm capable of judging where it's good and where it falls short.
How Claude Code works is that there are actually multiple models involved: one for doing the coding, one "reasoning" model to keep the chain of thought and the context going, and a bunch of small specialized ones for odd jobs around the thing.
The thing that doesn’t work yet is that the big reasoning model has to still be big, otherwise it will hallucinate frequently enough to break the workflow. If you could get one of the big models to run locally, you’d be there. However, with recent advances in quantization and MoE models, it’s actually getting nearer fast enough that I would expect it to be generally available in a year or two.
Today the best I could do was a setup that needed 150 gigs of RAM, 24 gigs of VRAM on AMD's top-of-the-line card, and 30 minutes to do what takes Claude Code 1-2. But surprisingly, the output of the model was not bad at all.
You really only need a little more RAM than your GPU's VRAM (unless you're doing CPU offloading, which is extremely slow). Otherwise, I did the same thing recently too, and was surprised I was able to get a Qwen 9B to fix a bug in a script I had. I think Sonnet would've fixed it in a lot fewer tries, but the 9B model was eventually able to fix it. I could've fixed it myself quicker and cleaner than both, but it was an interesting test.
Locally? You’d need a VERY powerful GPU to really be able to match the capabilities of Opus 4.6 online. I’ve played around with this stuff for the same reasons and while you can absolutely run a model with all of Claude’s capabilities offline, very few people will have the hardware to let it actually run at an acceptable speed and with a sufficient context window. That last part is the most important thing for coding because it’s what allows the model to operate across an entire project and not just a few functions at a time.
Nothing you can run with affordable hardware. The SOTA stuff requires hundreds of gigabytes of memory - and not RAM, GPU memory.
But you can try with stuff like gpt-oss or qwen coder
What’s the model name to pull?
Probably use Gemma4 if your machine has the chops for it.
You could probably get away with using gemma3:4b or phi3.5.
Qwen3.5 and Gemma4 are the best ones for tool calling that don’t need massive amounts of memory
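If you haven't tried tool calling through Ollama yet, it looks roughly like this. Rough sketch only: the model has to actually support tools, the model tag is a placeholder, and the `get_temperature` function is made up for illustration:

```python
# Rough sketch of tool calling via Ollama's /api/chat (model must support tools;
# the model tag and the get_temperature tool are illustrative assumptions).
import json
import requests

tools = [{
    "type": "function",
    "function": {
        "name": "get_temperature",
        "description": "Read a temperature sensor by room name",
        "parameters": {
            "type": "object",
            "properties": {"room": {"type": "string"}},
            "required": ["room"],
        },
    },
}]

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "qwen2.5:7b",  # placeholder; use whatever tool-capable model you pulled
        "stream": False,
        "messages": [{"role": "user", "content": "How warm is the office right now?"}],
        "tools": tools,
    },
).json()

# If the model decided to call the tool, the call shows up on the returned message.
for call in resp["message"].get("tool_calls", []):
    print(call["function"]["name"], json.dumps(call["function"]["arguments"]))
```

Your code then runs the named function, appends the result as a tool message, and calls the endpoint again so the model can produce the final answer.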
I only ever use my local AI for the Home Assistant voice assistant on my phone, but it's more of a gimmick/party trick since I only have temperature sensors currently (only got into HA recently) and it can't access WiFi, so it's just quietly sitting unloaded on my TrueNAS server.
Running any LLM on TrueNAS is not awesome. I've tried it with GPU passthrough and it's just too much overhead. I may just burn all my stuff down and restart with Proxmox, and run TrueNAS CORE inside just for NAS duties. The idea of a converged NAS + virtualization box is wonderful, but it's just not there.
The host networking model alone is such a pain, and then you get into the performance stuff. I still like TrueNAS a lot, but I think Proxmox is probably still the better platform.










