To set up whisper.cpp, clone the repository from GitHub, build it with CMake, download a ggml model file (such as base or small), then run the command-line tool on a 16 kHz WAV file. On Apple Silicon Macs, Metal acceleration is enabled by default in recent builds, and an optional Core ML build can run the encoder more than 3x faster than CPU alone, according to the project. The whole process takes about 15 minutes for a developer comfortable with the terminal.

Introduction

Running speech recognition locally has never been more practical. Whisper.cpp brings OpenAI’s Whisper model to your own machine with no cloud, no API keys, and no data leaving your device. This whisper.cpp setup guide walks through every step to run Whisper locally on both macOS and Windows.

We will clone the project, build it, download a ggml model, and transcribe a real audio file. The tutorial is technical, but even a first-time build fits comfortably in an afternoon.

By the end you will have a working offline transcriber. We will also be honest about the friction involved, and point to a packaged alternative for anyone who would rather it just worked.

What is whisper.cpp and why run Whisper locally?

Whisper.cpp is a high-performance C/C++ port of OpenAI’s Whisper speech recognition model that runs entirely offline. It needs no Python runtime and no internet connection once the model is downloaded.

Speech recognition is the process of converting spoken audio into written text. Whisper is the underlying neural model; whisper.cpp is the lightweight engine that runs it efficiently on consumer hardware.

Running it locally gives you three concrete advantages: no cloud dependency, no API keys, and no audio leaving your device.

This is the same approach we explored in our deeper look at edge AI and local processing, where on-device inference replaces the cloud round-trip entirely.

How do you set up whisper.cpp on macOS?

On macOS you install the build tools, clone and build the repo with CMake, download a model, and run the CLI on a WAV file. Apple Silicon Macs get the best results thanks to Metal and, optionally, Neural Engine acceleration through Core ML.

Step 1: Install the build tools

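You need the Xcode command-line tools and CMake; Step 4 also uses ffmpeg, which you can add with brew install ffmpeg. Install the command-line tools with xcode-select and CMake with Homebrew: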

xcode-select --install
brew install cmake
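
Before moving on, it can help to confirm the tools are actually on your PATH. The sketch below is a minimal check, assuming a POSIX shell. The helper name missing_tools is hypothetical and not part of whisper.cpp; ffmpeg is included in the usage line because Step 4 relies on it.

```shell
# missing_tools TOOL...: print each TOOL that is not found on PATH.
# Hypothetical helper for this guide; not part of whisper.cpp.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Usage: prints nothing when all three tools are installed.
# missing_tools git cmake ffmpeg
```

An empty result means you are ready for Step 2; any name printed is a tool still to install.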

Step 2: Clone and build

Clone the repository and compile it with CMake. The build produces the whisper-cli binary at build/bin/whisper-cli.

git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp
cmake -B build
cmake --build build --config Release

On Apple Silicon, Metal acceleration is enabled by default in recent builds. For an extra boost, you can build with Core ML so the encoder runs on the Apple Neural Engine, which the project reports can exceed a 3x speed-up over CPU alone.
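
The Core ML build can be sketched as follows. This assumes the generate-coreml-model.sh script and the WHISPER_COREML CMake option as described in the project README at the time of writing, and that the Python packages the README lists for model conversion (ane_transformers, openai-whisper, coremltools) are already installed. The run and build_coreml helpers are hypothetical names for this guide; set DRY_RUN=1 to print the commands instead of executing them.

```shell
# run CMD...: execute a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# build_coreml [MODEL]: generate the Core ML encoder for MODEL, then
# reconfigure and rebuild with Core ML enabled.
# Assumption: script name and flag match the whisper.cpp README.
build_coreml() {
  model="${1:-base.en}"
  run ./models/generate-coreml-model.sh "$model" &&
  run cmake -B build -DWHISPER_COREML=1 &&
  run cmake --build build -j --config Release
}

# Usage (from the whisper.cpp directory):
# DRY_RUN=1 build_coreml base.en   # preview the commands
# build_coreml base.en             # run them
```

Check the README for your checkout before relying on these names, since build options have changed between releases.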

Step 3: Download a ggml model

Models are distributed as ggml files — a single binary that bundles the weights, vocabulary, and mel filters. Use the included script to fetch one:

sh ./models/download-ggml-model.sh base.en

Swap base.en for small, medium, or large-v3 depending on the accuracy you need. Larger models are more accurate but slower and heavier on memory.
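
If you switch models often, a small guard avoids downloading the same file twice. This is a sketch, assuming the download script stores each model as models/ggml-NAME.bin (the naming used in the commands in this guide); model_path and ensure_model are hypothetical helper names.

```shell
# model_path NAME: print where the download script is assumed to store NAME.
model_path() {
  printf 'models/ggml-%s.bin\n' "$1"
}

# ensure_model NAME: download NAME only if its file is not already present.
# Run from the whisper.cpp directory.
ensure_model() {
  file=$(model_path "$1")
  if [ -f "$file" ]; then
    echo "already present: $file"
  else
    sh ./models/download-ggml-model.sh "$1"
  fi
}

# Usage:
# ensure_model small
```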

Step 4: Transcribe a file

Whisper.cpp expects a 16 kHz mono WAV file. Convert any audio with ffmpeg, then run the CLI:

ffmpeg -i input.mp3 -ar 16000 -ac 1 output.wav
./build/bin/whisper-cli -m models/ggml-base.en.bin -f output.wav

The transcript prints to your terminal. Add -otxt to save it as a text file.
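
The two commands can be wrapped into one helper so any audio file goes straight to a transcript. This is a minimal sketch: transcribe and run are hypothetical names, the explicit pcm_s16le codec and the -y overwrite flag are additions to the ffmpeg command above, and the model defaults to base.en. Set DRY_RUN=1 to preview the commands.

```shell
# run CMD...: execute a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# transcribe FILE [MODEL]: convert FILE to 16 kHz mono WAV, then run
# whisper-cli on the result and save the transcript as text.
transcribe() {
  input="$1"
  model="${2:-base.en}"
  wav="${input%.*}.16k.wav"   # e.g. talk.mp3 -> talk.16k.wav
  run ffmpeg -y -i "$input" -ar 16000 -ac 1 -c:a pcm_s16le "$wav" &&
  run ./build/bin/whisper-cli -m "models/ggml-$model.bin" -f "$wav" -otxt
}

# Usage (from the whisper.cpp directory):
# transcribe input.mp3
# transcribe interview.m4a small
```

Writing the converted audio to a separate .16k.wav file keeps the original untouched, which matters when the input is already a WAV at a different sample rate.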

How do you set up whisper.cpp on Windows?

On Windows the steps mirror macOS, but you build with Visual Studio’s compiler and the CMake tooling that ships with it. NVIDIA GPU owners can enable CUDA for faster transcription.

Step 1: Install prerequisites

Install these three components:

  1. Visual Studio 2022 with the “Desktop development with C++” workload
  2. CMake (bundled with Visual Studio or installed separately)
  3. ffmpeg for audio conversion, added to your PATH

Step 2: Clone and build

Open a “Developer Command Prompt for VS” and run:

git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp
cmake -B build
cmake --build build --config Release
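
Once the build finishes, it is worth confirming where the binary landed, because the location differs between platforms. The sketch below checks the two paths used in this guide and is meant for Git Bash or WSL on Windows (it also works on macOS). The helper name find_cli is hypothetical.

```shell
# find_cli [ROOT]: print the path of the first whisper-cli binary found.
# Checks build/bin (macOS, Linux) and build/bin/Release (Visual Studio
# generator on Windows), the two locations used in this guide.
find_cli() {
  root="${1:-.}"
  for candidate in \
    "$root/build/bin/whisper-cli" \
    "$root/build/bin/Release/whisper-cli.exe"
  do
    if [ -f "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  echo "whisper-cli not found under $root/build/bin" >&2
  return 1
}

# Usage (from the whisper.cpp directory):
# find_cli
```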

To enable NVIDIA GPU acceleration, add -DGGML_CUDA=1 to the first CMake command. You will need the CUDA Toolkit installed first.
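
If you script your builds, the choice between a CPU and a CUDA configuration can be kept in one place. This is a sketch for Git Bash or WSL; configure_args is a hypothetical helper, and it only covers the two configurations described in this section.

```shell
# configure_args [BACKEND]: print the arguments for the first CMake command.
# Supported backends in this sketch: cpu (default) and cuda.
configure_args() {
  case "${1:-cpu}" in
    cpu)  echo "-B build" ;;
    cuda) echo "-B build -DGGML_CUDA=1" ;;
    *)    echo "unknown backend: $1" >&2; return 1 ;;
  esac
}

# Usage (word splitting of the output is intentional here):
# cmake $(configure_args cuda)
# cmake --build build --config Release
```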

Step 3: Download a model and transcribe

The model download script is a shell script, so run it from Git Bash or WSL rather than the Developer Command Prompt:

sh ./models/download-ggml-model.sh base.en

Then convert and transcribe as on macOS, using the Windows path to the binary:

ffmpeg -i input.mp3 -ar 16000 -ac 1 output.wav
.\build\bin\Release\whisper-cli.exe -m models\ggml-base.en.bin -f output.wav
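
On either platform, a quick header check can confirm that the converted file really is 16 kHz mono before you transcribe it. The sketch below reads two fields from the WAV header with od, for use in Git Bash, WSL, or the macOS terminal. It assumes a little-endian machine (x86 or ARM) and a canonical header in which the "fmt " chunk starts at byte 12, which is an assumption about how the file was written rather than a guarantee. wav_field and is_16k_mono are hypothetical helper names.

```shell
# wav_field FILE OFFSET BYTES: read an unsigned integer from a WAV header.
# Assumes a little-endian host and a canonical header layout.
wav_field() {
  od -An -t "u$3" -j "$2" -N "$3" "$1" | tr -d ' \n'
}

# is_16k_mono FILE: succeed only if FILE declares 1 channel at 16000 Hz.
is_16k_mono() {
  channels=$(wav_field "$1" 22 2)   # channel count: bytes 22-23
  rate=$(wav_field "$1" 24 4)       # sample rate:   bytes 24-27
  [ "$channels" = "1" ] && [ "$rate" = "16000" ]
}

# Usage:
# is_16k_mono output.wav && echo "ready for whisper-cli"
```

If the check fails on a file you know is correct, the header probably has extra chunks before "fmt ", and a tool such as ffprobe is the more reliable way to inspect it.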

Which whisper.cpp model should you choose?

Choose your model by balancing accuracy against speed and memory. Smaller models transcribe faster and use less RAM; larger models are more accurate but heavier. The table below summarises the trade-offs.

| Model | Parameters | Approx. RAM | Relative speed | Best for |
| --- | --- | --- | --- | --- |
| tiny | 39M | ~1 GB | ~10x | Quick tests, low-power devices |
| base | 74M | ~1 GB | ~7x | General use, fast drafts |
| small | 244M | ~2 GB | ~4x | Balanced accuracy and speed |
| medium | 769M | ~5 GB | ~2x | Professional transcription |
| large-v3 | 1,550M | ~10 GB | 1x (baseline) | Highest accuracy, multilingual |
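
As a rough illustration of reading the table, the sketch below picks the largest model whose approximate RAM figure fits a given budget. The thresholds are taken directly from the table and leave no headroom for the operating system or other applications, so treat the result as an upper bound. pick_model is a hypothetical helper name.

```shell
# pick_model RAM_GB: print the largest model whose approximate RAM figure
# in the table fits within RAM_GB (whole gigabytes).
pick_model() {
  ram="$1"
  if   [ "$ram" -ge 10 ]; then echo large-v3
  elif [ "$ram" -ge 5 ];  then echo medium
  elif [ "$ram" -ge 2 ];  then echo small
  elif [ "$ram" -ge 1 ];  then echo base
  else echo tiny
  fi
}

# Usage: choose a model for a machine with 8 GB free.
# sh ./models/download-ggml-model.sh "$(pick_model 8)"
```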

Whisper supports multilingual transcription across dozens of languages, though accuracy varies by language. For an English-only workflow, the .en variants (offered for tiny through medium) are the same size as their multilingual equivalents but often more accurate, especially at the tiny and base sizes.

If raw throughput matters more than the ggml format, the faster-whisper project uses the CTranslate2 backend and reports up to 4x faster transcription than the original OpenAI implementation. We compared the wider model landscape in our breakdown of open-source speech models.

Not keen on managing model files yourself? You can try Weesper free for 15 days — it runs the same whisper.cpp engine with the right model preconfigured, no terminal required.

What are the limitations of a DIY whisper.cpp setup?

A self-built whisper.cpp setup is powerful but demands ongoing maintenance: you manage builds, model files, audio conversion, and updates yourself. It is a command-line tool, not a dictation app.
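
To make the maintenance cost concrete, here is a sketch of what a routine update looks like: pull the latest source, then reconfigure and rebuild. update_whisper and run are hypothetical helper names, and the sketch assumes a plain CPU or default Metal build with no extra CMake flags. Set DRY_RUN=1 to preview the commands.

```shell
# run CMD...: execute a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# update_whisper: fast-forward the checkout, then rebuild.
# Run from the whisper.cpp directory. Add your own CMake flags to the
# configure step if you built with an acceleration option.
update_whisper() {
  run git pull --ff-only &&
  run cmake -B build &&
  run cmake --build build --config Release
}

# Usage:
# DRY_RUN=1 update_whisper   # preview
# update_whisper             # run
```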

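Be aware of these practical limits:

  1. No system-wide dictation: the whisper-cli workflow in this guide transcribes files from the terminal and does not type into other applications.
  2. Manual audio conversion: every input must first be converted to 16 kHz mono WAV with ffmpeg.
  3. Manual model management: you choose, download, and store each ggml model file yourself.
  4. Manual builds and updates: acceleration options such as Core ML or CUDA need their own build steps, and each new release means pulling and rebuilding.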

For developers and tinkerers, this control is the whole point. But if you simply want accurate offline dictation that works system-wide, the setup overhead is real. Our guide to the best offline speech recognition software compares packaged options for exactly this reason.

The packaged alternative: Weesper Neon Flow

If you want the power of whisper.cpp without the build process, Weesper Neon Flow packages it for you. It is the same open-source engine, configured with Metal acceleration, custom prompts, and 50+ languages, in a desktop app for 5 EUR/month.

Here is how the two approaches compare:

| Feature | DIY whisper.cpp | Weesper Neon Flow |
| --- | --- | --- |
| Engine | whisper.cpp | whisper.cpp |
| Offline | | ✅ 100% |
| Setup time | ~15+ min + maintenance | Install and go |
| Metal acceleration | Manual build | ✅ Built in |
| Global dictation hotkey | | |
| Custom prompts | | |
| Languages | Model-dependent | 50+ |
| Audio conversion | Manual (ffmpeg) | ✅ Automatic |
| Price | Free (your time) | 5 EUR/month |

Weesper keeps the same privacy guarantee — your audio never leaves your device — while removing the terminal work. You download the app once and dictate into any application with a keyboard shortcut, no WAV conversion required.

Conclusion

Whisper.cpp is a remarkable piece of open-source engineering: genuine, accurate, offline speech recognition that you fully control. For developers and privacy advocates willing to manage builds and model files, it is hard to beat.

If you would rather skip the setup and start dictating immediately, the same engine comes ready-to-use in Weesper. You can start a free 15-day trial or browse our Help Center documentation to see how it fits your workflow.

Ready to dictate offline? Get Weesper Neon Flow and run whisper.cpp without the command line — or read more on our blog about local AI and privacy-first transcription.