Enhanced Speech Recognition is the optional, downloadable speech model that makes Windows 11 voice typing more accurate. You enable it in Settings > Time & language > Speech, where you select Download to install the recognition resource for your language. It is available on every Windows 11 PC, is required for dictation to start, and is distinct from Fluid Dictation, which needs Copilot+ hardware.

Introduction

If you have pressed Windows + H, watched the microphone panel appear, and then found that nothing was being transcribed, the missing piece is almost always Enhanced Speech Recognition for Windows 11. This optional download is the recognition model that powers accurate voice typing — and many users never realise they need to install it.

This guide explains what Enhanced Speech Recognition is, how to enable and download it step by step, how much accuracy it adds, and where it still falls short. We will also clarify the common confusion between Enhanced Speech Recognition (every PC) and Fluid Dictation (Copilot+ PCs only), and show when an offline alternative makes more sense.

What is Enhanced Speech Recognition in Windows 11?

Enhanced Speech Recognition is the downloadable language resource that Windows 11 uses to convert your speech into text during voice typing. It is an optional component you install per language, and without it dictation will not start even when your microphone is working.

In plain terms, it is the speech recognition model behind the Win+H toolbar. Microsoft ships Windows 11 with minimal speech components, then lets you download the fuller recognition resource for whichever display language you use. Once installed, voice typing transcribes more reliably and supports the auto-punctuation and voice commands you expect.

Key facts about Enhanced Speech Recognition:

Enhanced Speech Recognition vs Voice Typing: what’s the difference?

Voice typing is the feature (the Win+H toolbar). Enhanced Speech Recognition is the model that voice typing depends on. Think of voice typing as the engine and Enhanced Speech Recognition as the fuel: the engine turns over, but it cannot run without fuel.

This distinction matters because Windows surfaces them in different places. The toolbar lives wherever you type; the model lives in Settings > Time & language > Speech.

How do I download and enable Enhanced Speech Recognition?

Open Settings > Time & language > Speech, then select Download next to Enhanced Speech Recognition (or download the speech pack for your language). You need an internet connection, and you should restart the PC once the download finishes.

Here is the full process, step by step:

  1. Open Settings (Windows + I)
  2. Go to Time & language > Speech
  3. Under the Speech recognition section, locate Enhanced Speech Recognition
  4. Select Download — Windows fetches the recognition resource for your active display language
  5. Wait for the download to complete (a few hundred megabytes, depending on the language; how long it takes depends on your connection speed)
  6. Restart your PC so voice typing picks up the new model
  7. Press Windows + H in any text field to start dictating

If you do not see the model for the language you want, add that language first under Settings > Time & language > Language & region > Add a language, then return to the Speech page and download its recognition resource.
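
For readers who prefer to jump straight to the relevant pages, here is a minimal sketch that opens them by URI. It assumes the `ms-settings:speech` and `ms-settings:regionlanguage` URIs from Microsoft's published list still map to the Speech and Language & region pages on your build. The helper names `settings_uri` and `open_settings` are hypothetical, chosen for illustration.

```python
import os
import sys

# ms-settings: URIs taken from Microsoft's published list. Which page each one
# opens can vary between Windows builds, so treat this mapping as an assumption.
SETTINGS_PAGES = {
    "speech": "ms-settings:speech",            # Time & language > Speech
    "language": "ms-settings:regionlanguage",  # Time & language > Language & region
}


def settings_uri(page: str) -> str:
    """Return the ms-settings: URI for a named page (hypothetical helper)."""
    try:
        return SETTINGS_PAGES[page]
    except KeyError:
        raise ValueError(f"unknown settings page: {page!r}") from None


def open_settings(page: str) -> bool:
    """Open the page on Windows; do nothing and return False elsewhere."""
    uri = settings_uri(page)
    if sys.platform != "win32":
        return False
    os.startfile(uri)  # os.startfile is only available on Windows
    return True
```

Calling `open_settings("language")` first and `open_settings("speech")` afterwards mirrors the order described above: add the language, then download its recognition resource. The script only opens the pages; the Download button still has to be selected by hand.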

What if the download fails or dictation still won’t start?

A failed download or stalled dictation usually traces back to one of three causes: a missing language pack, a paused download, or an OEM shortcut conflict. Address them in that order.
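
The ordering above can be written down as a small checklist helper. This is only an illustrative sketch: the cause labels come from the paragraph above, while the function name `causes_in_order` is hypothetical and detecting each cause is left to you.

```python
from typing import Iterable, List

# Causes in the order the article recommends addressing them.
CAUSE_ORDER = (
    "missing language pack",
    "paused download",
    "OEM shortcut conflict",
)


def causes_in_order(observed: Iterable[str]) -> List[str]:
    """Sort the causes you have observed into the recommended fix order."""
    observed_set = set(observed)
    unknown = observed_set - set(CAUSE_ORDER)
    if unknown:
        raise ValueError(f"unrecognised causes: {sorted(unknown)}")
    return [cause for cause in CAUSE_ORDER if cause in observed_set]


# Example: two problems spotted, in no particular order.
todo = causes_in_order(["OEM shortcut conflict", "missing language pack"])
print(todo)
```

The language pack comes first because the recognition resource cannot be downloaded for a language that is not installed.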

For a deeper walkthrough of the toolbar itself — settings, voice commands, and language switching — see our complete Windows 11 dictation toolbar guide.

How much accuracy does Enhanced Speech Recognition add?

With the Enhanced Speech Recognition model installed and a clear microphone, Windows 11 voice typing reaches roughly 85-90% accuracy for conversational English. Without it, dictation either fails to start or relies on minimal recognition that misreads far more words.

The accuracy gain comes from the fuller acoustic and language model that the download provides. Combined with auto-punctuation — which you enable from the toolbar’s gear icon — the result is usable for emails, notes, drafts, and casual writing.

| Aspect | Without Enhanced model | With Enhanced Speech Recognition |
|---|---|---|
| Dictation starts | Often fails | Yes |
| Conversational accuracy | Poor / minimal | ~85-90% |
| Auto-punctuation | Limited | Full support |
| Voice commands | Unreliable | Reliable |
| Technical vocabulary | Weak | Still weak (no custom dictionary) |

Accuracy still drops sharply for proper nouns, brand names, medical terms, legal citations, and programming identifiers, because Windows 11 voice typing has no user-editable dictionary. To understand the factors that drive recognition quality across systems, read our analysis of voice dictation accuracy and speech recognition.
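
To check a figure like 85-90% against your own dictation, you can score a transcript yourself. The sketch below assumes accuracy means word accuracy, that is, one minus the word error rate. The article does not state how its figure was measured, so this is one common definition rather than the method behind the numbers above. The function names are illustrative.

```python
def word_errors(reference: str, hypothesis: str) -> int:
    """Word-level edit distance: substitutions + insertions + deletions."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # prev[j] holds the distance between the reference words processed so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from transcript
            insertion = curr[j - 1] + 1  # extra word in transcript
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Return 1 - WER, clamped at 0 for very poor transcripts."""
    n_words = len(reference.split())
    if n_words == 0:
        raise ValueError("reference text must contain at least one word")
    return max(0.0, 1.0 - word_errors(reference, hypothesis) / n_words)


spoken = "please send the report to the legal team before friday"
typed = "please send the report to the legal teen before friday"
print(f"{word_accuracy(spoken, typed):.0%}")  # one wrong word out of ten
```

Under this definition, one misrecognised word in a ten-word sentence already gives 90%, so an 85-90% figure still means correcting roughly one word in every seven to ten.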

Is Enhanced Speech Recognition the same as Fluid Dictation?

No, and conflating the two is the single most common mistake. Enhanced Speech Recognition runs on any Windows 11 PC and improves transcription accuracy. Fluid Dictation runs only on Copilot+ PCs and cleans up the text after transcription: it corrects grammar and punctuation and removes filler words.

| Feature | Enhanced Speech Recognition | Fluid Dictation |
|---|---|---|
| Hardware required | Any Windows 11 PC | Copilot+ PC (40+ TOPS NPU) |
| What it does | Improves recognition accuracy | Rewrites grammar & filler words |
| Where to get it | Settings > Speech > Download | Ships automatically on Copilot+ |
| Processing | Recognition resource on device; Win+H still uses Azure online | On-device small language models |
| Availability | Every user | Copilot+ owners only |

If your PC is a standard (non-Copilot+) machine, Enhanced Speech Recognition is the best native accuracy you can get — Fluid Dictation simply is not available to you, regardless of settings.

Does Enhanced Speech Recognition work offline?

Not fully. The downloaded recognition resources live on your device, but standard Windows 11 voice typing (Win+H) still routes audio through Microsoft’s online Azure speech services and requires an active internet connection. Enhanced Speech Recognition improves accuracy and is required for dictation to function — but it does not make Win+H a private, offline tool.

This is an important privacy nuance. Even with the model downloaded locally, your dictated audio can still leave your device for cloud processing. For professionals handling confidential material — doctors, lawyers, journalists, consultants — that is a hard limitation.

When you need genuinely offline dictation

For fully on-device transcription with no cloud round-trip, you need a local-only application rather than the native toolbar. This is precisely the gap Weesper Neon Flow fills: it processes speech entirely on your device using local Whisper-class models, so audio never leaves your computer.

| Capability | Windows 11 Voice Typing | Weesper Neon Flow |
|---|---|---|
| Price | Free | 5 EUR / month |
| Recognition model | Enhanced Speech Recognition (download) | Local Whisper-class model |
| Processing | Online (Azure) for Win+H | 100% on-device |
| Internet required | Yes | No |
| Custom vocabulary | None | Yes (custom prompts) |
| AI rewrite on any PC | No (Copilot+ only) | Yes |
| Works on macOS | No | Yes (Metal-accelerated) |
| Privacy | Audio sent to Microsoft | Audio stays local |

For the full technical comparison of local versus cloud transcription — latency, accuracy, and energy use — see our breakdown of on-device versus cloud transcription. The short version: a Whisper-class model on consumer hardware now matches cloud accuracy with strictly better privacy.

When should you use Enhanced Speech Recognition vs an alternative?

Use Enhanced Speech Recognition when you want free, native voice typing on Windows 11 for everyday, non-sensitive writing. Choose an offline alternative when privacy, custom vocabulary, cross-platform support, or sustained professional use matters more than zero cost.

Enhanced Speech Recognition is the right choice if you:

A dedicated tool like Weesper Neon Flow is the better fit if you:

If you already followed our Windows 11 voice dictation setup guide and found the native experience limiting, the offline route is the logical next step.

Try Weesper Neon Flow free for 15 days — fully on-device, no cloud account, works on Windows and macOS today.

Conclusion: get the model, then decide if it’s enough

Enhanced Speech Recognition is the download that turns Windows 11 voice typing from “won’t start” into “good enough for everyday dictation.” Install it from Settings > Time & language > Speech, restart, enable auto-punctuation, and you will reach roughly 85-90% accuracy on conversational English at no cost.

But know its boundaries: it does not provide custom vocabulary, it does not make Win+H offline, and it does not unlock Fluid Dictation on standard hardware. If you dictate for hours, handle sensitive material, or need domain-specific accuracy, the native model alone will not get you there.

Ready to compare? Download Weesper Neon Flow and run it side-by-side with Windows voice typing on your next dictation task. The free trial works on macOS and Windows, processes everything on-device, and requires no cloud account.