ThirdShift R&D

Local AI, straight answers

The questions we get asked most about running AI on your own hardware — answered from actually building and running it. No hype, no sales pitch.

Is it actually private, or does it phone home?

Your prompts and documents stay on your machine; that part's real. But "local" doesn't automatically mean airtight: the app running the model (Ollama, LM Studio) has normal network access, and some send metadata home by default. LM Studio, for example, collects anonymous usage analytics, and it'll sync your device list if you let it. Not your prompts, but still. Genuine privacy takes a little config: turn off the tool's telemetry and keep it off the internet. Then pull the cable and it still answers; that's the test. It's also exactly how we build our boxes: nothing to turn off, because nothing's phoning home in the first place.

Can it use my own documents?

Yes, and that's the real unlock. You point it at your files (manuals, contracts, notes) and it answers from them, cited to the page, instead of guessing from its training. A grounded mid-size model reading your actual documents beats a giant one on the internet that's never seen them. (It's called RAG: retrieval-augmented generation.)
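To make "answers from your files, cited to the page" concrete, here is a minimal sketch of the retrieve-then-ask loop. It is not how any particular app does it: real setups usually rank passages with embeddings, and plain keyword overlap is swapped in here to keep the example self-contained. The PAGES data, file names, and function names are all made up for illustration.

```python
import re
from collections import Counter

# Hypothetical stand-in for pages already extracted from your own files.
PAGES = [
    {"source": "pump-manual.pdf", "page": 12,
     "text": "Replace the inlet filter every 500 operating hours."},
    {"source": "pump-manual.pdf", "page": 30,
     "text": "The warranty covers parts and labor for two years."},
    {"source": "lease.pdf", "page": 4,
     "text": "Rent is due on the first business day of each month."},
]

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def score(question, page_text):
    """Count the words shared by the question and the page."""
    q = Counter(tokenize(question))
    p = Counter(tokenize(page_text))
    return sum(min(q[w], p[w]) for w in q)

def retrieve(question, pages, k=2):
    """Return up to k pages with a non-zero overlap score, best first."""
    ranked = sorted(pages, key=lambda pg: score(question, pg["text"]), reverse=True)
    return [pg for pg in ranked[:k] if score(question, pg["text"]) > 0]

def build_prompt(question, hits):
    """Put the retrieved excerpts, tagged with source and page, ahead of the question."""
    context = "\n".join(f"[{h['source']} p.{h['page']}] {h['text']}" for h in hits)
    return (
        "Answer using only the excerpts below and cite the source and page.\n\n"
        f"{context}\n\nQuestion: {question}"
    )

question = "How often should I replace the inlet filter?"
hits = retrieve(question, PAGES, k=1)
print(build_prompt(question, hits))
```

The prompt that comes out is what you would hand to the local model. The page tags are what let it cite where an answer came from instead of guessing.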

What hardware do I need?

Less than you'd think to start. A modest gaming GPU (8–16GB of VRAM) runs small models fine for learning. A 24GB card runs a 32B model comfortably once it's quantized to around 4 bits per weight; that's our daily driver. The big unified-memory boxes are for loading huge models slowly. Don't overbuy for a 70B you won't actually use day-to-day.
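A rough way to sanity-check a card against a model size: weights take about (parameter count × bits per weight ÷ 8) bytes. The sketch below does only that arithmetic. It ignores context memory and runtime overhead, and real "4-bit" quantized files usually average a bit more than 4 bits per weight, so treat the results as a floor, not a spec.

```python
def weight_gib(params_billions, bits_per_weight):
    """Approximate size of the model weights alone, in GiB."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30

# Compare full 16-bit weights against a 4-bit quantization.
for params in (8, 32, 70):
    for bits in (16, 4):
        size = weight_gib(params, bits)
        print(f"{params}B at {bits}-bit: ~{size:.0f} GiB of weights")
```

By this estimate a 32B model at 4 bits is roughly 15 GiB of weights, which leaves headroom on a 24GB card for context. A 70B at 4 bits is already past 30 GiB before any context at all.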

Do I need internet?

Only once, to download the model. After that it runs fully offline — air-gap it and it keeps working. A dropped connection doesn't take your AI down with it.

Which app — Ollama or LM Studio?

Both run the same models; pick on how you like to work. LM Studio is the friendly GUI — install, click, chat — best for your first week. Ollama is what most people settle on once they want it scriptable: a REST API on by default, runs headless on a server, loads and unloads models on its own. Start with LM Studio to learn the feel, move to Ollama when you want to wire it into things.
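For a sense of what "scriptable" means, here is a minimal sketch of calling Ollama's local REST API from Python using only the standard library. It assumes Ollama is running on its default address (localhost, port 11434) and that you have already pulled whatever model you name. The function names and the "mistral-small" tag in the usage note are illustrative, not something from this page.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local address

def build_request(model, prompt, num_ctx=None):
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    if num_ctx is not None:
        # Per-request override of the context window size.
        payload["options"] = {"num_ctx": num_ctx}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def ask(model, prompt, num_ctx=None, timeout=120):
    """Send the request and return the generated text (needs a running Ollama server)."""
    request = build_request(model, prompt, num_ctx)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

With the server running, something like ask("mistral-small", "Summarize this paragraph: ...") returns the reply as a plain string. Swap in whichever model tag you actually pulled.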

Is it hard to set up?

The easy 90% is genuinely easy: install Ollama or LM Studio, download a model, start chatting — no terminal required. The hard 10% is squeezing specific hardware to its limit, and most people never need to touch that part.

Why does it forget what I told it a minute ago?

Almost never the model. It's the context window, and it's the most common "my local model is dumb" complaint there is. Out of the box, Ollama ships with a small default context (num_ctx): 2048 tokens in older releases, 4096 in more recent ones. At 2048 that's maybe 1,500 words of conversation plus the reply. Run past that and it quietly drops the oldest tokens, so the model can't see the start of what you were doing. It's a setting, not amnesia. Fix it in one line: in Ollama, /set parameter num_ctx 8192 (or PARAMETER num_ctx 8192 in a Modelfile); in LM Studio, the context-length slider. Set it to fit your work: bigger context costs VRAM, so match it to the job.
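Here is a toy illustration of why the start of a long chat disappears in a 2048-token window. It is a simplification, not Ollama's actual truncation code: the conversation is a list of stand-in tokens, and reply_budget is a made-up parameter for the room left for the answer.

```python
def visible_window(tokens, num_ctx, reply_budget=256):
    """Keep only the newest tokens that fit once room is reserved for the reply."""
    room = max(num_ctx - reply_budget, 0)
    return tokens[-room:] if room else []

# Stand-in conversation of 3000 tokens, oldest first (t0 is the very first one).
conversation = [f"t{i}" for i in range(3000)]

small = visible_window(conversation, num_ctx=2048)
large = visible_window(conversation, num_ctx=8192)

print("num_ctx=2048 sees", len(small), "tokens, starting at", small[0])
print("num_ctx=8192 sees", len(large), "tokens, starting at", large[0])
```

With the small window, everything before the cut is invisible to the model, including whatever instructions you gave at the start. With the larger one the whole conversation still fits.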

Which model should I run? Is it as good as ChatGPT?

Different tool, not a worse one. The "what model" answer is always two or three names that rotate every few months — right now (mid-2026) most people land on the Qwen 3.6 pair (27B dense for quality, 35B-A3B MoE for speed — on our own 24GB Arc B60 the MoE ran ~45 tok/s, nearly 3× the dense), or Mistral Small if you want lean. Don't chase the biggest number: a fast 32B does real work, and for anything that needs frontier-level reasoning the cloud still wins. Local wins on ownership, privacy, and knowing your own documents.
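To put those tok/s numbers in terms you feel at the keyboard, here is a small back-of-the-envelope calculation. The 45 tok/s figure is from our run above. The dense speed is an assumption derived from "nearly 3×" (taken as exactly 3), and the 500-token reply length is just a hypothetical medium-length answer.

```python
def seconds_for(tokens, tok_per_s):
    """Time to generate a reply of the given length at a steady generation speed."""
    return tokens / tok_per_s

moe_speed = 45.0             # tok/s, the MoE figure quoted in the text
dense_speed = moe_speed / 3  # assumption: "nearly 3x" treated as exactly 3x
reply_tokens = 500           # hypothetical medium-length answer

moe_wait = seconds_for(reply_tokens, moe_speed)
dense_wait = seconds_for(reply_tokens, dense_speed)
print(f"MoE:   ~{moe_wait:.0f} s")
print(f"Dense: ~{dense_wait:.0f} s")
```

That works out to roughly eleven seconds against a bit over half a minute for the same answer, which is the practical gap between the two.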

What's it actually good for?

Drafting and rewriting, summarizing long documents, answering from your own files, coding help, cleaning up messy data, and running quiet little automations. The bread-and-butter, all day, for the cost of electricity.

Why not just use ChatGPT, then?

Ownership and privacy. No subscription, no rate limits, no policy change breaking your setup, and your sensitive data never leaves the building. If none of that matters to you, use the cloud. If it does — and for real business work it usually does — that's the case for local.

How do I start?

Install LM Studio or Ollama, pull a small model, and point it at a few of your own documents. An hour with it on your own files will tell you more than any benchmark.