Buyer’s Guide

What Is an Offline AI Computer?

A plain-language guide to local AI hardware – what it runs, what it costs, and what to look for before you buy.

Most people’s experience with AI is entirely cloud-dependent. You type something into ChatGPT or Claude, the request travels to a data center, a model processes it, and the response comes back. That works fine when you have a reliable internet connection and are comfortable with your prompts being processed on someone else’s servers. An offline AI computer is the alternative – a machine that runs the model locally, responds from its own hardware, and does not send anything anywhere.

This page covers what that actually means in practice: what hardware you need, what you can realistically run, and how the experience compares to the cloud AI most people are used to.

What “Running AI Locally” Actually Means

When someone says they are running AI locally, they mean that a large language model, the same kind of software that powers ChatGPT, is installed on a computer they own and runs on that machine’s own processor and graphics hardware. The model responds to your prompts without any internet connection involved.

The software that makes this possible has matured significantly. Ollama, one of the most widely used local AI tools, lets you download and run dozens of open-weight models with a single command. You pick a model, it downloads to your machine, and you interact with it from the command line or through a browser-based chat interface running on top of it. No account, no API key, no usage limits.
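Besides its command-line interface, Ollama serves a local HTTP API, by default on port 11434. The sketch below uses only Python's standard library to send one prompt to the `/api/generate` endpoint and return the reply. The model name `llama3.1:8b` is an example assumption; substitute any model you have already pulled. The request never leaves `localhost`.

```python
import json
import urllib.request

# Ollama's default local endpoint; assumes Ollama is running on this machine.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "llama3.1:8b") -> urllib.request.Request:
    """Build a non-streaming generate request for a locally pulled model.

    The default model name is only an example; use any model you have pulled.
    """
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str, model: str = "llama3.1:8b") -> str:
    """Send the prompt to the local Ollama server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=300) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    # With "stream": False, Ollama returns a single JSON object whose
    # "response" field holds the complete generated text.
    return body["response"]
```

For example, `ask("Summarize the causes of inflation in three sentences.")` returns the model's answer as a plain string, generated entirely on your own hardware.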

The catch is hardware. Large language models are computationally expensive. Running them requires either a capable GPU, a lot of RAM, or both. The quality of the experience – how fast the model responds, how large a model you can run – depends almost entirely on your hardware.

The Hardware That Matters: Why GPU Acceleration Changes Everything

Inference speed – how many tokens per second a model generates – is the single most important factor in whether a local AI feels usable or frustrating. A model generating 5 tokens per second produces roughly one word at a time with a noticeable pause between each. At 30 tokens per second, responses stream naturally. At 100 tokens per second or above, local AI is faster than most people read.
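These thresholds are easier to compare as wait times. The sketch below converts a generation speed into an approximate words-per-minute figure and into how long a reply takes to finish streaming. Two numbers in it are assumptions, not measurements from this guide: roughly 0.75 English words per token (a common rule of thumb for typical tokenizers) and a 400-token reply as a stand-in for a medium-length answer.

```python
# Rule-of-thumb assumption: about 0.75 English words per token.
WORDS_PER_TOKEN = 0.75


def seconds_for_reply(tokens_per_second: float, reply_tokens: int = 400) -> float:
    """Time to stream a reply of `reply_tokens` tokens (400 is an assumed example length)."""
    return reply_tokens / tokens_per_second


def words_per_minute(tokens_per_second: float) -> float:
    """Approximate output speed in words per minute."""
    return tokens_per_second * WORDS_PER_TOKEN * 60


# Compare the three speeds discussed above.
for speed in (5, 30, 100):
    print(f"{speed:>3} tok/s: ~{words_per_minute(speed):,.0f} words/min, "
          f"400-token reply in {seconds_for_reply(speed):.0f} s")
```

Under these assumptions, a 400-token reply takes about 80 seconds at 5 tokens per second, about 13 seconds at 30, and about 4 seconds at 100.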

CPUs alone are slow for AI inference. A modern Intel or AMD desktop processor can typically run a small 1 to 3 billion parameter model at 5 to 20 tokens per second. That is enough to verify that local AI works, but the experience is noticeably sluggish.

GPUs change the picture. Graphics processors are designed for the kind of parallel matrix math that AI inference requires. An NVIDIA RTX 3060 running a 7 billion parameter model produces around 75 tokens per second. AMD’s integrated Radeon 780M or 890M graphics – found in mid-range Ryzen 7 and Ryzen 9 mini PCs – can hit 30 to 55 tokens per second on the same model size, entirely from the integrated GPU with no discrete card required.

The Project NOMAD benchmark leaderboard tracks real-world results from over 1,270 community builds and illustrates this clearly. Scores are on NOMAD’s own benchmark scale, and the average across all submissions is 64.6. Builds using AMD Radeon 780M integrated graphics average 73.6, Radeon 890M builds average 76.3, and RTX 3060 builds average 75.5. The strongest dedicated GPU builds score above 90, with inference speeds above 200 tokens per second on the right hardware.
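To put the leaderboard averages quoted above on a common footing, the short calculation below expresses each group's average as a percentage above the all-submission average of 64.6. The figures are exactly the ones cited in this section; they are NOMAD scores, so the percentages compare scores, not tokens per second.

```python
# Leaderboard averages as quoted in this guide (NOMAD score, not tokens/second).
OVERALL_AVERAGE = 64.6
GROUP_AVERAGES = {
    "Radeon 780M (integrated)": 73.6,
    "Radeon 890M (integrated)": 76.3,
    "RTX 3060 (discrete)": 75.5,
}


def percent_above(score: float, baseline: float = OVERALL_AVERAGE) -> float:
    """Percentage by which `score` exceeds `baseline`."""
    return (score / baseline - 1) * 100


for name, score in GROUP_AVERAGES.items():
    print(f"{name}: {score} ({percent_above(score):+.1f}% vs. overall average)")
```

The 780M, 890M, and RTX 3060 groups come out roughly 14%, 18%, and 17% above the overall average, so both integrated-graphics groups land in the same range as the entry-level discrete card.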

What Hardware You Actually Need

Minimum Viable Hardware

A refurbished enterprise mini PC with an Intel processor and integrated graphics can run local AI at a basic level. Expect 5 to 20 tokens per second and model sizes capped at around 3 billion parameters. This is enough to understand what the technology does, but the experience is not what most people picture when they think of AI.

Budget $150 to $300 for this tier. Any modern mini PC with at least 16 GB of RAM and an NVMe drive will work.

Integrated GPU Builds

AMD’s Zen 4 Ryzen 7 and Ryzen 9 processors include Radeon 780M integrated graphics, and newer Zen 5 chips such as the Ryzen AI 9 HX 370 carry the Radeon 890M. Both provide proper GPU compute that runs inference at a genuinely conversational pace. A mini PC in this range costs $400 to $700 and does not require a discrete GPU. The Minisforum AI X1 Pro (Ryzen AI 9 HX 370 with Radeon 890M) is verified by the NOMAD project at 51.7 tokens per second on 7B models.
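If you want to check a figure like 51.7 tokens per second on your own machine, Ollama includes the numbers you need in every completed `/api/generate` response: `eval_count` (tokens generated) and `eval_duration` (time spent generating them, in nanoseconds). The sketch below computes generation speed from those two fields. It is not necessarily how the NOMAD benchmark computes its results, which this guide does not describe, and the sample response values are hypothetical.

```python
NANOSECONDS_PER_SECOND = 1_000_000_000


def generation_speed(response: dict) -> float:
    """Tokens per second from a completed Ollama /api/generate response.

    Uses `eval_count` (generated tokens) and `eval_duration` (nanoseconds),
    so prompt processing and model loading time are excluded.
    """
    return response["eval_count"] * NANOSECONDS_PER_SECOND / response["eval_duration"]


# Hypothetical values for illustration, not a real benchmark result.
sample = {"eval_count": 300, "eval_duration": 6_000_000_000}
print(f"{generation_speed(sample):.1f} tokens/s")  # 50.0 tokens/s
```

Because it ignores prompt processing and load time, this number tracks how fast text streams once generation starts, which is what the speed tiers in this guide describe.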

At this hardware tier, a 7 to 8 billion parameter model produces responses comparable in quality to early-generation cloud AI products, delivered entirely from your own hardware.

Discrete GPU Builds

Adding an NVIDIA RTX GPU, such as a 3060, 4070, or 5070, lifts inference speeds from around 75 tokens per second on a 3060 to well above 100 on newer cards, and allows running larger models (13B and above) at high quality. The Minisforum MS-02 Ultra with an RTX 5060 is verified at 281.5 tokens per second. This tier is overkill for most use cases but meaningfully better for anyone doing extended research or working with long documents.

The Model Size Question

Most people asking about offline AI computers ultimately want to know: how good is the AI, really?

The honest answer is that it depends on what you compare it to. A 7 billion parameter model running locally is not GPT-4 or Claude Sonnet. It is closer in capability to what frontier models were doing two to three years ago – quite capable for summarization, question answering, drafting, and explanation, noticeably weaker on complex reasoning or tasks requiring large context windows.

A 13B or 70B model on appropriate hardware closes that gap meaningfully. Meta’s Llama 3 70B running on a high-end GPU rig performs respectably against current cloud models for many everyday tasks. The tradeoff is hardware cost.
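Much of that hardware cost comes down to memory. A model's weights need roughly parameters × bits per weight ÷ 8 bytes, so quantization (storing weights at lower precision) matters as much as parameter count. The sketch below assumes 4-bit weights, a common choice for local models, and adds a hypothetical 20% allowance for the context cache and runtime buffers; real requirements vary with the quantization format and context length.

```python
def estimated_memory_gb(params_billions: float, bits_per_weight: float = 4,
                        overhead: float = 1.2) -> float:
    """Rough memory needed to run a model, in decimal gigabytes.

    bits_per_weight=4 assumes a 4-bit quantized model; overhead=1.2 is a
    hypothetical 20% allowance for the context cache and runtime buffers.
    """
    weight_bytes = params_billions * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9


for size in (7, 13, 70):
    print(f"{size}B at 4-bit: ~{estimated_memory_gb(size):.1f} GB")
```

Under these assumptions a 7B model needs about 4 GB and fits easily in a 16 GB mini PC, while a 70B model needs about 42 GB, which is why it calls for a high-end rig.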

For most people’s practical needs, such as answering questions, summarizing documents, explaining topics, and helping with writing, a 7B or 8B model on hardware that delivers 30 or more tokens per second is more than sufficient.

Privacy and the Case for Local AI

Beyond the practical performance question, there is a real privacy argument for local AI that is worth taking seriously.

When you use a cloud AI service, your prompts are sent to and processed on someone else’s infrastructure. The terms of service vary by provider, but the structural reality is that you do not control where your data goes, how long it is retained, or what it is used for.

Running AI locally means your prompts stay on your hardware. A question about a medical condition, a legal situation, a financial problem, or anything else you would rather keep private is answered on your machine and goes nowhere. This is not a fringe concern – it is a legitimate reason many professionals, researchers, and privacy-conscious households are moving to local AI.

Offline AI as Part of a Broader Knowledge System

Most people who want an offline AI computer also want offline reference material alongside it – Wikipedia, medical guides, maps, educational content. Running these together on a single box turns a local AI computer into something more like a private knowledge infrastructure.

Project NOMAD is the platform that does this most completely. It bundles Ollama-powered local AI with offline Wikipedia (via Kiwix), global OpenStreetMap maps, and the full Khan Academy library (via Kolibri) into a single managed system. You access everything through a browser on your local network. The AI can answer questions, the library has the reference material, and the whole thing runs without any internet dependency.

The community around NOMAD is active, the software is free under the Apache 2.0 license, and the benchmark leaderboard provides a practical guide to what different hardware configurations actually deliver.

Building One Yourself vs. Buying Pre-Built

The DIY path for an offline AI computer is genuinely accessible for anyone comfortable with Linux. You source the hardware, install Ubuntu 22.04 or Debian 12, run the NOMAD install script, download your content libraries, and you have a functioning system. The barrier is time and comfort with the command line, not technical expertise beyond that.
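Before running the install script, it can help to confirm that the machine is on one of the two releases named above. The sketch below is a hypothetical preflight check, not part of NOMAD or its installer. It reads `/etc/os-release`, the standard file where Linux distributions record their `ID` and `VERSION_ID`, and compares those values with Ubuntu 22.04 and Debian 12.

```python
from pathlib import Path

# Releases named in this guide for the DIY path, as (ID, VERSION_ID) pairs.
SUPPORTED_RELEASES = {("ubuntu", "22.04"), ("debian", "12")}


def parse_os_release(text: str) -> dict:
    """Parse KEY=value lines in os-release format, dropping surrounding quotes."""
    fields = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key] = value.strip("\"'")
    return fields


def is_supported_release(text: str) -> bool:
    """True if the os-release text describes Ubuntu 22.04 or Debian 12."""
    fields = parse_os_release(text)
    return (fields.get("ID"), fields.get("VERSION_ID")) in SUPPORTED_RELEASES


def check_this_machine(path: str = "/etc/os-release") -> bool:
    """Hypothetical helper: read the local os-release file and check it."""
    return is_supported_release(Path(path).read_text(encoding="utf-8"))
```

On the target machine, `check_this_machine()` returns `True` on either supported release and `False` otherwise.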

The pre-built path exists for people who want the full system without the setup process. The Codex Standard is built on an AMD Ryzen 7 Zen 4 mini PC with Radeon 780M/890M graphics – the recommended hardware tier for NOMAD – configured, content-loaded, and benchmarked before shipping. It plugs into your router via Ethernet and shows up on your network like any other device.

Both paths lead to the same system. The question is how much your time is worth and how comfortable you are with a Linux setup.

What an Offline AI Computer Is Not

It is worth being clear about what this category does not include.

An offline AI computer is not a replacement for a cloud AI service if your work requires the most capable frontier models. GPT-4, Claude 3.5 Sonnet, and Gemini Ultra represent a capability tier that currently requires enormous infrastructure to run. Local AI on consumer hardware is not in the same category for complex reasoning tasks.

It is also not plug-and-play in the way a phone app is. Even the pre-built options need a wired connection to your router, and you reach them from a browser on a device on the same network. Setup is minimal, but it is not zero.

What it is: a private, independent AI system that works without internet, responds at a conversational pace on the right hardware, and gives you complete control over your data and queries.


Ready to see what the pre-built NOMAD hardware options look like? The Codex Standard is built on the hardware tier NOMAD recommends for GPU-accelerated AI, configured and benchmarked before it ships.