Local LLMs
This blog post was released on January 13, 2026.
Introduction
Initially, this was supposed to be an easy post. I wanted to add some server functions, went deep down a rabbit hole, found some other problems I needed to solve, retreated for a little while (30 minutes), pulled myself together, and now you can see this website through Cloudflare’s CDN.
Why
Anyway, I’ve been meaning to talk about one particular subject for a long time: LLMs (Large Language Models).
The only way
I started experimenting with local LLMs in August 2025, beginning with ollama. It’s easy to set up and great for finding out what all that stuff is about. It’s built on top of llama.cpp, which is a really cool project in its own right. Now I think of ollama as a tool for beginners who want to run LLMs on their own hardware without too much thought (or a GUI).
While llama.cpp is great, it’s more of a backend than an end-user tool.
Different kinds
Not long ago I also tried ComfyUI to run some image-generation models. I plan to look more into this class of models, as well as voice and 3D mesh generation. As far as I know, they are all still fairly limited, but it’s important to keep track of the new tech.
The agents of AI
I tried the Zed editor with agentic models and it was somewhat acceptable. It can definitely vibe-code something; to be honest, it even helped with my blog section placeholder. Generation takes a while, but it was interesting to see that kind of interaction run completely locally.
I didn’t mention the exact models I tried because that’s a big topic for another post, or even several.
Details
There are a lot of tricks involving parameters, quantization and model architectures. For example, MoE (Mixture of Experts) models don’t activate all of their parameters at once, so they can run more comfortably on CPUs. Thorough analysis is required before making any assumptions about the viability of this tech at the moment.
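To make the quantization point concrete, here is a back-of-the-envelope sketch of my own (not from any particular library) estimating how much memory model weights need at a given bit width; the overhead factor is an assumption to account for KV cache and activations:

```python
def model_size_gb(params_billion: float, bits_per_weight: float,
                  overhead: float = 1.2) -> float:
    """Rough RAM/VRAM estimate for model weights.

    params_billion: parameter count in billions (e.g. 7 for a 7B model)
    bits_per_weight: 16 for FP16, 4 for a 4-bit quantization, etc.
    overhead: fudge factor for KV cache and activations (assumed, not exact)
    """
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total * overhead / 1e9

# A 7B model in FP16 vs. a 4-bit quantization:
print(round(model_size_gb(7, 16), 1))  # ~16.8 GB
print(round(model_size_gb(7, 4), 1))   # ~4.2 GB
```

The numbers are approximate, but they show why quantization is what makes consumer hardware viable at all: the same model drops from "workstation GPU" territory to something an ordinary machine can hold.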
RAM Altman
At the same time, we all know the impact major tech firms have on everyday users. RAM prices have spiked, and SSDs are getting more expensive too. Soon GPUs will be hit again and we will be back in the situation we had a few years ago.
AI, ai, aI, Ai
You have probably noticed that I haven’t used that holy abbreviation, and I did it on purpose. The thing is, Pandora’s box has been opened. It’s not possible, or in any way viable, to ignore so-called Artificial Intelligence. It’s not intelligence, but it is artificial. Sometimes it can even help you solve problems.
The truth nuke
Those who are eager to find out how things work will understand the limitations of generative AI and why a potential AGI (artificial general intelligence) won’t come from LLMs. That’s not just my opinion; I’m not smart enough for data science, unfortunately. Anyway, it’s pretty easy to find information on this.
Enshittification
Also, it’s pretty obvious that generative AI has decreased the quality of everything. Videos, photos and art are now contaminated with low-effort AI-generated garbage. Google Search shows signs of degradation; it’s almost impossible to use search engines nowadays. The quality of everything on the Internet has dropped, and we still have to find efficient ways to fight slop.
It's because of money
At the same time, layoffs continue, and not just because of AI. Firing employees and justifying it with “our process was enhanced with AI and we need fewer people” looks better to stock markets than admitting “we are running out of money and desperately need to cut spending by firing our employees.”
Even the biggest and most advanced “new-gen” AI companies don’t make a profit. Google, Microsoft and other old big-tech firms have plenty of money from other business lines; their AI alone cannot bring in significant revenue. In my opinion, it’s similar to the idea of creating a “paid” entrance to the Internet. Not everything can be monetized.
Does it even make sense?
What’s more, we cannot even guarantee that using modern LLMs will improve your productivity.
In addition, many AI tools are not there yet in terms of quality. There are popular and genuinely helpful coding models, but something still tells me that running them on expensive hardware in someone else’s machine is not the best solution. These models are probabilistic by definition; the output is not guaranteed. How can you charge people for that?
On the other hand, there are ways to use LLMs that do boost productivity, such as using them for debugging or for processing huge chunks of unreadable text.
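As a sketch of the kind of local workflow I mean, here is what "feed an unreadable blob to a local model" can look like. It assumes a llama.cpp server (or ollama) listening on localhost with an OpenAI-compatible chat endpoint; the URL, port and model name are placeholders, not guaranteed defaults for your setup:

```python
import json
import urllib.request

# Assumed endpoint: llama.cpp's built-in server exposes an
# OpenAI-compatible API; adjust host/port/model for your setup.
API_URL = "http://localhost:8080/v1/chat/completions"

def build_payload(text: str, model: str = "local-model") -> dict:
    """Build a chat request asking the model to summarize a blob of text."""
    return {
        "model": model,
        "messages": [
            {"role": "system",
             "content": "Summarize the following text in one sentence."},
            {"role": "user", "content": text},
        ],
        # Low temperature keeps the probabilistic output on a short leash.
        "temperature": 0.2,
    }

def summarize(text: str) -> str:
    """POST the payload to the local server and return the model's reply."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(text)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(summarize("0x1F: ERR_CONN_RESET at frame 42; retrying..." * 20))
```

Nothing leaves your machine here, which is the whole point: the same probabilistic helper, minus the cloud.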
I am not a Marxist
I think it would be criminally stupid to let this field develop without real free (libre) alternatives. The future of the current wave of generative models may lie in locally run models, not in cloud-API options where nobody knows what happens to your data.
It’s pretty obvious that we are close to an “own nothing and be happy” situation. That’s why it’s important to develop systems and models you can run locally without giving up security.
I will continue exploring FOSS tools for this particularly interesting piece of modern tech, and I hope to find, or possibly create, opportunities for people to access it.