April 5, 2026

Running local LLMs on a six-year-old Windows machine

The practical question with local AI models is not whether your machine can run them — most can. It is which model runs fast enough to actually be useful. Here is what works.

Tags: AI, Ollama, Local LLM, Windows

Most machines can run a local AI model; the practical question is which model runs fast enough to be worth using. I built Oracle VII around Ollama, which handles model management and inference. Getting started takes a few commands.

bash
# Install Ollama from ollama.com, then:
ollama pull phi3      # 3.8B params, fast on almost anything
ollama pull mistral   # 7B params, good quality, needs ~8GB RAM
ollama pull llama3    # 8B params, best quality/speed balance
ollama serve          # start the local server (skip if the Ollama app is already running)

On a machine with no dedicated GPU (integrated Intel graphics), phi3 is the practical choice. Responses come in at roughly 10 to 15 tokens per second, which is fast enough for real back-and-forth conversation. Mistral drops to around 4 to 6 tokens per second on the same hardware, which feels slow in conversation but is fine for one-off queries you can leave running in the background.
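One way to check throughput figures like these on your own hardware is to read the timing fields Ollama returns with each response. The following is a minimal sketch, assuming the server is listening on its default address (http://localhost:11434) and that a non-streaming /api/generate response carries eval_count (tokens generated) and eval_duration (nanoseconds). The helper names tokens_per_second and measure are illustrative and not part of Oracle VII.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default address


def tokens_per_second(response: dict) -> float:
    """Generation speed computed from the timing fields of an Ollama response."""
    eval_count = response["eval_count"]            # number of tokens generated
    eval_duration_ns = response["eval_duration"]   # generation time in nanoseconds
    if eval_duration_ns <= 0:
        return 0.0
    return eval_count / (eval_duration_ns / 1e9)


def measure(model: str, prompt: str) -> float:
    """Send one non-streaming prompt to the local server and return tokens/s."""
    payload = json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as reply:
        return tokens_per_second(json.load(reply))
```

With the server running, calling measure("phi3", "Explain what a token is in one sentence.") returns a single tokens-per-second figure for that prompt. Running it a few times per model gives a rough comparison, since the first call also includes the time needed to load the model into memory.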

If you have an older Nvidia GPU (GTX 1060 or newer), Ollama can offload part or all of the model to it, and generation gets significantly faster. Ollama handles this automatically if CUDA is available.
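To see whether a loaded model actually landed on the GPU, you can ask the server which models are running. This sketch assumes the /api/ps endpoint on the default address and assumes each entry in its models list reports size and size_vram in bytes. The helpers gpu_share and loaded_models are hypothetical names used only for illustration.

```python
import json
import urllib.request

PS_URL = "http://localhost:11434/api/ps"  # assumed default address


def gpu_share(model_entry: dict) -> float:
    """Fraction of a loaded model's memory that sits in VRAM (0.0 to 1.0)."""
    total = model_entry.get("size", 0)
    in_vram = model_entry.get("size_vram", 0)
    return in_vram / total if total else 0.0


def loaded_models() -> dict[str, float]:
    """Map each currently loaded model name to its GPU share."""
    with urllib.request.urlopen(PS_URL) as reply:
        models = json.load(reply).get("models", [])
    return {entry["name"]: gpu_share(entry) for entry in models}
```

A share of 0.0 means the model is running entirely on the CPU, and anything between 0 and 1 means it was only partly offloaded, which is common when the model is larger than the card's VRAM.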

Models worth trying on modest hardware

  • phi3 (3.8B): Fast, surprisingly capable for its size. Good for factual questions and code completion.
  • mistral (7B): Better reasoning than phi3. Noticeably slower without a GPU.
  • gemma2 (2B): Google's small model. Very fast, decent quality. Pull it as gemma2:2b, since the untagged name fetches a larger variant.
  • codegemma (7B): Good for code tasks. Slower but worth it if that is your primary use case.

Why local models are different to use

The thing I keep coming back to when using Oracle VII is that a model where nothing leaves your machine is fundamentally different to use. No rate limits, no API costs, no terms of service to worry about, no context being logged somewhere. You can experiment with it in a way you cannot with an API service.

The SQLite knowledge base in Oracle VII stores things you tell it to remember and can retrieve them in future sessions. It is not magic — it is keyword search over stored notes — but it is useful for building up context about ongoing projects without re-explaining everything each session.
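The post does not show Oracle VII's schema, so the following is only a sketch of what keyword search over stored notes can look like with Python's built-in sqlite3 module. The notes table, the open_store, remember, and recall helpers, and the LIKE-based matching are all assumptions made for illustration.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the note store. The table layout is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS notes ("
        "id INTEGER PRIMARY KEY, "
        "body TEXT NOT NULL, "
        "created_at TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn


def remember(conn: sqlite3.Connection, body: str) -> None:
    """Store one note."""
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO notes (body) VALUES (?)", (body,))


def recall(conn: sqlite3.Connection, query: str, limit: int = 5) -> list[str]:
    """Return notes that contain every keyword in the query, oldest first."""
    keywords = query.split()
    if not keywords:
        return []
    # One LIKE clause per keyword; SQLite's LIKE ignores case for ASCII text.
    # Note that % and _ inside a keyword act as wildcards in this simple form.
    where = " AND ".join("body LIKE ?" for _ in keywords)
    params = [f"%{word}%" for word in keywords]
    rows = conn.execute(
        f"SELECT body FROM notes WHERE {where} ORDER BY id LIMIT ?",
        (*params, limit),
    ).fetchall()
    return [body for (body,) in rows]
```

Passing a file path to open_store instead of the in-memory default is what lets notes survive between sessions. The recalled notes can then be prepended to the prompt sent to the model, which is one plausible way to carry project context forward.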
