Google’s Gemma 4 12B is one of the most capable open models you can run on a regular laptop right now. Not a server. Not a cloud subscription. Your actual laptop — as long as it has 16GB of RAM.
This guide walks you through the full setup using Ollama, the easiest way to run AI models locally. No Python environment. No CUDA headaches. Just a few terminal commands and you’re talking to the model in under 10 minutes.
What you need before you start
Gemma 4 12B at full precision needs around 26.7GB of memory, which is a lot. But with 4-bit quantization (the default in Ollama), you can get it down to roughly 6.7GB VRAM or 13–16GB system RAM.
Here’s the minimum you need:
- 16GB of system RAM (works fine without a GPU using CPU inference)
- 8–12GB VRAM if you want GPU acceleration for faster responses
- About 8GB of free disk space for the model download
- Windows 10/11, macOS 12+, or any modern Linux distro
No GPU? You can still run it on CPU. Responses will be slower (expect 3–6 tokens per second), but it works.
Step 1: Install Ollama
Head to ollama.com/download and grab the installer for your OS.
Windows: Download OllamaSetup.exe and run it. It installs like any normal app and starts a background service automatically.
macOS: Download the zip, unzip it, and drag the Ollama app into your Applications folder. Launch it and you’ll see the alpaca icon in your menu bar.
Linux: Open a terminal and run this single command:
curl -fsSL https://ollama.com/install.sh | shOnce installed, confirm it’s working:
ollama --versionYou should see a version number. If you do, you’re ready.
Step 2: Pull the Gemma 4 12B model
This is the one command that downloads everything you need:
ollama pull gemma4:12bThe download is around 7–8GB, so it’ll take a few minutes depending on your connection. Ollama shows a progress bar while it runs. Let it finish before moving on.
If you’re not sure which model size fits your hardware, here’s the quick reference:
gemma4:e2b— runs on 4–8GB RAM, fastest, least capablegemma4:e4b— runs on 8GB+ RAM, good all-roundergemma4:12b— runs on 16GB RAM, best quality-to-hardware ratiogemma4:26b— needs 16GB+ VRAM, near-frontier quality
The 12B hits the sweet spot for most people with a standard developer machine.
Step 3: Run the model
Once the download finishes, start a chat session with:
ollama run gemma4:12bOllama loads the model and drops you into an interactive prompt. Type anything and hit Enter. Your first response will take a few extra seconds while it loads into memory. After that, it’s fast.
To exit the session, type /bye and press Enter.
Step 4: Use the API (optional but useful)
Ollama also runs a local REST API at http://localhost:11434 automatically. This means you can connect it to tools like Open WebUI, Obsidian, or any app that supports custom OpenAI-compatible endpoints.
To test it from a second terminal window:
curl http://localhost:11434/api/generate -d '{
"model": "gemma4:12b",
"prompt": "Explain what an LLM is in two sentences.",
"stream": false
}'You’ll get a JSON response with the model’s output. From here, you can build scripts, connect to frontends, or integrate Gemma 4 into your own apps.
What Gemma 4 12B is actually good at
After testing it, here’s what stood out:
- Coding: It writes clean Python and JavaScript. Not perfect, but genuinely useful for everyday tasks
- Image understanding: You can pass an image path and ask questions about it (supported via Ollama’s multimodal API)
- Long context: The model supports up to 128K tokens, so you can paste large documents without truncation issues
- Instruction following: It respects system prompts well, which makes it easy to customize for specific roles or workflows
Where it struggles: very long multi-step reasoning chains and tasks that need real-time information. For those, you still want a cloud model. For everything local and private, the 12B delivers.
Upgrade your setup with Open WebUI
The terminal works fine, but if you want a ChatGPT-style browser interface, install Open WebUI. It connects directly to your local Ollama instance and gives you chat history, model switching, and file uploads — all running 100% offline.
Install it with Docker in one command:
docker run -d -p 3000:80 --add-host=host.docker.internal:host-gateway \
-v open-webui:/app/backend/data \
--name open-webui ghcr.io/open-webui/open-webui:mainThen open http://localhost:3000 in your browser. Select Gemma 4 12B from the model dropdown and you’re running a private, local AI assistant with a full UI.
Local AI is no longer a hobbyist experiment. A 12B model running on a regular laptop in 2026 is genuinely useful for daily work — and Gemma 4 12B is one of the best options to start with. Download Ollama today at ollama.com and have it running before dinner.






