Published on 12.02.2026
TLDR: This article from AI Supremacy, featuring the work of independent AI researcher Benjamin Marie, walks through running large language models locally using LM Studio. It covers everything from installation to understanding the hardware math behind model performance, making local AI accessible to non-engineers.
Summary:
Look, there is something deeply satisfying about running an AI model on your own hardware. No cloud dependency, no API keys, no sending your prompts to someone else's server. This article, published on AI Supremacy and drawing heavily on the expertise of independent researcher Benjamin Marie, is essentially a beginner's on-ramp to that experience. And honestly, it is about time someone wrote this up properly.
The piece starts by introducing Benjamin Marie, who runs two newsletters, The Kaitchup and The Salt, both focused on practical, hands-on AI work. The Kaitchup in particular publishes weekly tutorials on adapting language models to your own tasks and hardware, with over 160 AI notebooks available. The Salt takes a more research-oriented angle, distilling bleeding-edge AI papers into digestible summaries. The author clearly has enormous respect for Marie's work, and positions him as someone whose opinions on new models carry real weight because they come from direct, hands-on experience rather than armchair speculation.
The core of the article is a practical walkthrough of getting LLMs running locally with LM Studio. What used to require wrestling with CUDA, juggling scattered model formats, and a lot of trial and error is now surprisingly approachable. Tools like LM Studio and Ollama have abstracted away the painful parts: you download the app, pick a model, click a few buttons, and you are chatting with an AI running entirely on your machine. The article covers the memory math behind model sizes (which is simpler than most people think), how to pick trustworthy GGUF builds and compression levels, and how to sanity-check whether a model is actually giving you reasonable output.
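To make that memory math concrete, here is a minimal Python sketch of the usual back-of-the-envelope estimate: the weights take roughly parameter count times bits-per-weight divided by eight, plus some headroom for the KV cache and runtime buffers. The bits-per-weight figures for the GGUF quantization levels and the flat 1.5 GB overhead are illustrative assumptions, not numbers taken from the article or from LM Studio itself.

```python
# Rough back-of-the-envelope memory estimate for a quantized local model.
# Quantization bit-widths and the overhead allowance are assumptions for
# illustration, not values from the article or LM Studio.

BITS_PER_WEIGHT = {
    "F16": 16,      # unquantized half precision
    "Q8_0": 8,      # 8-bit GGUF quantization
    "Q4_K_M": 4.5,  # a popular quality/size compromise (approximate)
}

def estimate_memory_gb(params_billions: float, quant: str, overhead_gb: float = 1.5) -> float:
    """Approximate memory needed to load the weights, plus a flat
    allowance for the KV cache and runtime buffers (assumed)."""
    weight_gb = params_billions * 1e9 * BITS_PER_WEIGHT[quant] / 8 / 1024**3
    return weight_gb + overhead_gb

# Example: an 8B model at a 4-bit quantization lands around 5-6 GB,
# which is why it fits comfortably on a 16 GB laptop while F16 does not.
for quant in ("F16", "Q8_0", "Q4_K_M"):
    print(f"8B @ {quant}: ~{estimate_memory_gb(8, quant):.1f} GB")
```

The point of the arithmetic is simply that the quantization level, not just the parameter count, decides whether a model fits on your machine.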
What I appreciate about this piece is that it does not try to turn you into a machine learning engineer. It explicitly states that the goal is to give you enough intuition to choose models confidently and understand what LM Studio is telling you. That is exactly the right framing. Too many tutorials in this space either oversimplify to the point of uselessness or drown you in implementation details that only matter if you are fine-tuning models for production.
There is one area the article touches on that deserves more scrutiny, though. It mentions that "thinking" models can be dramatically better on hard prompts but noticeably slower. This is true, but the article does not really dig into when you actually need a thinking model versus when a faster, smaller model would serve you just as well. Most local use cases are summarization, drafting, and simple Q&A, where smaller, faster models are more than adequate. The article could have spent more time helping readers understand that running the biggest model your hardware can handle is not always the right answer. Sometimes the right model is the one that responds in two seconds instead of thirty.
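If you want to feel that latency tradeoff on your own hardware, a quick way is to send the same prompt to two models through LM Studio's local OpenAI-compatible server and time the responses. The sketch below assumes the server is running on its default port (1234); the model identifiers are placeholders for whatever you have loaded in your own library.

```python
# Minimal sketch: compare response latency of two locally loaded models
# via LM Studio's OpenAI-compatible server (default: http://localhost:1234/v1).
# The model names below are hypothetical placeholders.
import time
import requests

PROMPT = "Summarize the plot of Hamlet in three sentences."
MODELS = ["qwen2.5-7b-instruct", "deepseek-r1-distill-qwen-7b"]  # placeholders

for model in MODELS:
    start = time.perf_counter()
    resp = requests.post(
        "http://localhost:1234/v1/chat/completions",
        json={
            "model": model,
            "messages": [{"role": "user", "content": PROMPT}],
            "max_tokens": 200,
        },
        timeout=300,
    )
    elapsed = time.perf_counter() - start
    text = resp.json()["choices"][0]["message"]["content"]
    print(f"{model}: {elapsed:.1f}s, {len(text)} chars")
```

A comparison like this often settles the question faster than any benchmark table: if the smaller model's answer is good enough and arrives in a fraction of the time, it is the right model for that task.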
Key takeaways:
- Tools like LM Studio and Ollama have made running LLMs locally approachable for non-engineers: download the app, pick a model, and chat on your own hardware.
- The memory math behind model sizes is simpler than it looks, and a little intuition about it is enough to choose models confidently.
- Picking trustworthy GGUF builds and sensible compression levels matters more than chasing the largest model available.
- Benjamin Marie's newsletters, The Kaitchup and The Salt, are solid follow-on resources for hands-on local AI work.
Tradeoffs:
- "Thinking" models can be dramatically better on hard prompts but are noticeably slower; for summarization, drafting, and simple Q&A, smaller, faster models are usually adequate.
- The biggest model your hardware can handle is not always the right choice; sometimes the right model is the one that responds in two seconds instead of thirty.