Running LLMs locally vs using APIs — what I learned
Most of my early experience with LLMs came from APIs.
You send a prompt, get a response, and everything feels instant. There’s no need to think about how the model is running, what resources it’s using, or what’s happening behind the scenes.
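That whole interaction fits in a few lines. The sketch below builds an OpenAI-style chat request with only the standard library; the endpoint URL, key, and model name are placeholders, and the JSON schema is an assumption about the provider's API, not something from this post:

```python
import json
import urllib.request

API_URL = "https://api.example.com/v1/chat/completions"  # placeholder endpoint

def build_request(prompt: str, model: str = "some-model") -> urllib.request.Request:
    """Package a prompt as an OpenAI-style chat request (schema is an assumption)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer YOUR_API_KEY",  # placeholder key
        },
        method="POST",
    )

# Sending it is one more line; everything about how the model
# actually runs stays hidden behind the endpoint:
# with urllib.request.urlopen(build_request("Hello")) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Nothing in that code hints at model size, memory, or hardware, which is exactly the point.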
For a while, I assumed that was the whole picture.
What I thought before
I treated LLMs like simple tools:
- send input
- get output
- move on
The hard part seemed to be prompting, not the system behind it.
I didn’t think about:
- model size
- memory usage
- hardware constraints
- latency
Everything just worked.
What changed
Running a model locally changed that completely.
Instead of calling an API, I had to actually run the model myself.
That meant dealing with:
- model loading time
- CPU and RAM limits
- slow inference
- performance trade-offs
It became obvious that an LLM is not “magic” — it’s just a heavy process running on a machine.
APIs hide that.
Local setups expose it.
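Locally, the first thing you end up writing is a stopwatch. A minimal sketch of the two measurements that suddenly matter, load time and per-token generation time; the "model" here is a stand-in callable with artificial sleeps, not a real LLM, and the durations are illustrative only:

```python
import time

def timed(label, fn, *args):
    """Run fn and report wall-clock time -- the first thing you notice locally."""
    start = time.perf_counter()
    result = fn(*args)
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.3f}s")
    return result, elapsed

# Stand-ins for the real steps (loading weights, generating tokens):
def load_model():
    time.sleep(0.2)       # real models take seconds to minutes to load
    return "model"

def generate(model, prompt, n_tokens=8):
    tokens = []
    for _ in range(n_tokens):
        time.sleep(0.01)  # local CPU inference pays a cost per generated token
        tokens.append("tok")
    return " ".join(tokens)

model, load_s = timed("load", load_model)
output, gen_s = timed("generate", generate, model, "hello")
print(f"~{len(output.split()) / gen_s:.1f} tokens/s")
```

With an API, both of those costs exist too; you just never see them.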
Local vs API
The difference is not just technical — it’s about how you think.
APIs
- easy to use
- fast
- no setup
- scales automatically
But:
- usage-based pricing
- no control over the model
- everything is a black box
Local models
- full control
- no per-request cost
- can run offline
- customizable
But:
- slower (especially on CPU)
- dependent on hardware
- more setup and tuning required
The real difference
APIs optimize for convenience.
Local models optimize for control and understanding.
What was actually hard
The challenge wasn’t writing code.
It was understanding constraints.
- choosing the right model size
- dealing with slow responses
- managing memory and performance
- structuring a clean API layer around the local model
- debugging unexpected outputs
These are not problems you see when using APIs.
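Choosing a model size, for example, starts as back-of-envelope arithmetic: the weights alone need roughly parameters times bytes-per-parameter of RAM. A sketch under stated assumptions (the 1.2x overhead factor for KV cache and runtime buffers is a rough guess, not a spec):

```python
def estimate_weights_gb(n_params_billion: float,
                        bits_per_param: int,
                        overhead: float = 1.2) -> float:
    """Rough RAM needed to hold the weights.

    overhead is a guessed multiplier for KV cache and runtime buffers.
    """
    bytes_total = n_params_billion * 1e9 * (bits_per_param / 8)
    return bytes_total * overhead / 1e9

for n in (7, 13, 70):
    for bits in (16, 4):
        print(f"{n}B @ {bits}-bit: ~{estimate_weights_gb(n, bits):.1f} GB")
```

Under these assumptions a 7B model at 4-bit quantization needs roughly 4 GB, while the same model at 16-bit needs closer to 17 GB; that single multiplication decides what your laptop can and cannot run.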
What I learned
A few things became clear very quickly:
- LLMs are not just “AI tools” — they are infrastructure problems
- Performance trade-offs are real (model size vs latency)
- Running models locally forces you to think like a system builder
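The size-vs-latency trade-off has a common rule of thumb behind it: token generation is usually memory-bandwidth-bound, because every generated token has to read all the weights once. The sketch below applies that approximation; it is an upper bound, and the bandwidth numbers are hypothetical, picked only for illustration:

```python
def tokens_per_second(model_gb: float, mem_bandwidth_gbs: float) -> float:
    """Upper-bound throughput estimate: each token reads every weight once,
    so generation speed is capped at bandwidth / model size."""
    return mem_bandwidth_gbs / model_gb

# Hypothetical hardware numbers, for illustration only:
laptop_cpu = tokens_per_second(model_gb=4.0, mem_bandwidth_gbs=50)
desktop_gpu = tokens_per_second(model_gb=4.0, mem_bandwidth_gbs=900)
print(f"CPU: ~{laptop_cpu:.0f} tok/s, GPU: ~{desktop_gpu:.0f} tok/s")
```

The same model, the same weights, an order-of-magnitude difference in speed; that is an infrastructure problem, not a prompting problem.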
The biggest shift was this:
I stopped thinking like a user and started thinking like someone running the system.
Closing
APIs made me comfortable with AI.
Running models locally made me understand it.