How to run Gemma 4 12B locally with Ollama

Google’s Gemma 4 12B is one of the most capable open models you can run on a regular laptop right now. Not a server. Not a cloud subscription. Your actual laptop — as long as it has 16GB of RAM.

This guide walks you through the full setup using Ollama, the easiest way to run AI models locally. No Python environment. No CUDA headaches. Just a few terminal commands and you’re talking to the model in under 10 minutes.

What you need before you start

Gemma 4 12B at full precision needs around 26.7GB of memory, which is a lot. But with 4-bit quantization (the default in Ollama), you can get it down to roughly 6.7GB VRAM or 13–16GB system RAM.

Here’s the minimum you need:

16GB of system RAM (works fine without a GPU using CPU inference)
8–12GB VRAM if you want GPU acceleration for faster responses
About 8GB of free disk space for the model download
Windows 10/11, macOS 12+, or any modern Linux distro

No GPU? You can still run it on CPU. Responses will be slower (expect 3–6 tokens per second), but it works.

Step 1: Install Ollama

Head to ollama.com/download and grab the installer for your OS.

Windows: Download OllamaSetup.exe and run it. It installs like any normal app and starts a background service automatically.

macOS: Download the zip, unzip it, and drag the Ollama app into your Applications folder. Launch it and you’ll see the alpaca icon in your menu bar.

Linux: Open a terminal and run this single command:

curl -fsSL https://ollama.com/install.sh | sh

Once installed, confirm it’s working:

ollama --version

You should see a version number. If you do, you’re ready.

Step 2: Pull the Gemma 4 12B model

This is the one command that downloads everything you need:

ollama pull gemma4:12b

The download is around 7–8GB, so it’ll take a few minutes depending on your connection. Ollama shows a progress bar while it runs. Let it finish before moving on.

If you’re not sure which model size fits your hardware, here’s the quick reference:

gemma4:e2b — runs on 4–8GB RAM, fastest, least capable
gemma4:e4b — runs on 8GB+ RAM, good all-rounder
gemma4:12b — runs on 16GB RAM, best quality-to-hardware ratio
gemma4:26b — needs 16GB+ VRAM, near-frontier quality

The 12B hits the sweet spot for most people with a standard developer machine.

Step 3: Run the model

Once the download finishes, start a chat session with:

ollama run gemma4:12b

Ollama loads the model and drops you into an interactive prompt. Type anything and hit Enter. Your first response will take a few extra seconds while it loads into memory. After that, it’s fast.

To exit the session, type /bye and press Enter.

Step 4: Use the API (optional but useful)

Ollama also runs a local REST API at http://localhost:11434 automatically. This means you can connect it to tools like Open WebUI, Obsidian, or any app that supports custom OpenAI-compatible endpoints.

To test it from a second terminal window:

curl http://localhost:11434/api/generate -d '{
  "model": "gemma4:12b",
  "prompt": "Explain what an LLM is in two sentences.",
  "stream": false
}'

You’ll get a JSON response with the model’s output. From here, you can build scripts, connect to frontends, or integrate Gemma 4 into your own apps.

What Gemma 4 12B is actually good at

After testing it, here’s what stood out:

Coding: It writes clean Python and JavaScript. Not perfect, but genuinely useful for everyday tasks
Image understanding: You can pass an image path and ask questions about it (supported via Ollama’s multimodal API)
Long context: The model supports up to 128K tokens, so you can paste large documents without truncation issues
Instruction following: It respects system prompts well, which makes it easy to customize for specific roles or workflows

Where it struggles: very long multi-step reasoning chains and tasks that need real-time information. For those, you still want a cloud model. For everything local and private, the 12B delivers.

Upgrade your setup with Open WebUI

The terminal works fine, but if you want a ChatGPT-style browser interface, install Open WebUI. It connects directly to your local Ollama instance and gives you chat history, model switching, and file uploads — all running 100% offline.

Install it with Docker in one command:

docker run -d -p 3000:80 --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui ghcr.io/open-webui/open-webui:main

Then open http://localhost:3000 in your browser. Select Gemma 4 12B from the model dropdown and you’re running a private, local AI assistant with a full UI.

Local AI is no longer a hobbyist experiment. A 12B model running on a regular laptop in 2026 is genuinely useful for daily work — and Gemma 4 12B is one of the best options to start with. Download Ollama today at ollama.com and have it running before dinner.

Cookie	Duration	Description
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.

Real Tech. No Filter

How to run Gemma 4 12B locally with Ollama

What you need before you start

Step 1: Install Ollama

Step 2: Pull the Gemma 4 12B model

Step 3: Run the model

Step 4: Use the API (optional but useful)

What Gemma 4 12B is actually good at

Upgrade your setup with Open WebUI

Tags:

How to run Gemma 4 locally with Unsloth AI

Aji

Leave a Reply Cancel reply

NVIDIA GeForce RTX 5090 vs RTX 4090: A Comprehensive Performance Analysis

Google’s AI Search Faces Legal Reckoning: Antitrust, Copyright, and the Future of Online Search

The US just forced Anthropic to pull its most powerful AI models