Enhanced Speech Recognition is the optional, downloadable speech model that makes Windows 11 voice typing more accurate. You enable it in Settings > Time & language > Speech, where you select Download to install the recognition resource for your language. It is available on every Windows 11 PC, is required for dictation to start, and is distinct from Fluid Dictation, which needs Copilot+ hardware.
Introduction
If you have pressed Windows + H, watched the microphone panel appear, and then found that nothing was being transcribed, the missing piece is almost always Enhanced Speech Recognition for Windows 11. This optional download is the recognition model that powers accurate voice typing — and many users never realise they need to install it.
This guide explains what Enhanced Speech Recognition is, how to enable and download it step by step, how much accuracy it adds, and where it still falls short. We will also clarify the common confusion between Enhanced Speech Recognition (every PC) and Fluid Dictation (Copilot+ PCs only), and show when an offline alternative makes more sense.
What is Enhanced Speech Recognition in Windows 11?
Enhanced Speech Recognition is the downloadable language resource that Windows 11 uses to convert your speech into text during voice typing. It is an optional component you install per language, and without it dictation will not start even when your microphone is working.
In plain terms, it is the speech recognition model behind the Win+H toolbar. Microsoft ships Windows 11 with minimal speech components, then lets you download the fuller recognition resource for whichever display language you use. Once installed, voice typing transcribes more reliably and supports the auto-punctuation and voice commands you expect.
Key facts about Enhanced Speech Recognition:
- It is an optional download, not enabled by default on every install
- It is installed per language (English, French, German, and so on)
- It is available to all Windows 11 PCs — no special hardware required
- It is required for voice typing to actually transcribe speech
- It is separate from Fluid Dictation, the Copilot+ rewrite feature
Enhanced Speech Recognition vs Voice Typing: what’s the difference?
Voice typing is the feature (the Win+H toolbar). Enhanced Speech Recognition is the model that voice typing depends on. Think of voice typing as the engine and Enhanced Speech Recognition as the fuel: the engine turns over, but it cannot run without fuel.
This distinction matters because Windows surfaces them in different places. The toolbar lives wherever you type; the model lives in Settings > Time & language > Speech.
How do I download and enable Enhanced Speech Recognition?
Open Settings > Time & language > Speech, then select Download next to Enhanced Speech Recognition (or download the speech pack for your language). You need an internet connection, and you should restart the PC once the download finishes.
Here is the full process, step by step:
- Open Settings (Windows + I)
- Go to Time & language > Speech
- Under the Speech recognition section, locate Enhanced Speech Recognition
- Select Download — Windows fetches the recognition resource for your active display language
- Wait for the download to complete (a few hundred megabytes, depending on the language; how long it takes depends on your connection speed)
- Restart your PC so voice typing picks up the new model
- Press Windows + H in any text field to start dictating
If you do not see the model for the language you want, add that language first under Settings > Time & language > Language & region > Add a language, then return to the Speech page and download its recognition resource.
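The two Settings pages named in these steps can also be opened directly by URI, which is handy when you set up several machines. The following is a minimal Python sketch, assuming the `ms-settings:speech` and `ms-settings:regionlanguage` URIs from Microsoft's ms-settings scheme; the helper names `settings_uri` and `open_settings_page` are ours, and the download itself still has to be started by hand on the Speech page.

```python
import os
import sys

# Settings pages referenced in the steps above, keyed by a short label.
# The URI names are assumed from Microsoft's ms-settings URI scheme.
SETTINGS_PAGES = {
    "speech": "ms-settings:speech",            # Time & language > Speech
    "language": "ms-settings:regionlanguage",  # Time & language > Language & region
}


def settings_uri(page: str) -> str:
    """Return the ms-settings URI for a known page label."""
    try:
        return SETTINGS_PAGES[page]
    except KeyError:
        raise ValueError(f"unknown settings page: {page!r}") from None


def open_settings_page(page: str) -> bool:
    """Open the page in the Settings app; return False when not on Windows."""
    uri = settings_uri(page)
    if sys.platform != "win32":
        return False
    os.startfile(uri)  # hands the URI to the Windows shell
    return True


if __name__ == "__main__":
    open_settings_page("speech")
```

Calling `open_settings_page("language")` first and `open_settings_page("speech")` afterwards mirrors the order described above when the language you want is not yet installed.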
What if the download fails or dictation still won’t start?
A failed download or stalled dictation usually traces back to one of three causes: a missing or incomplete language resource, an active input language that does not match the installed model, or an OEM shortcut conflict. Check them in that order.
- Missing language resource — re-open Settings > Time & language > Speech and confirm the download finished, then restart
- Active language mismatch — switch to the installed language with Windows + Spacebar before pressing Win+H
- Shortcut conflict — disable vendor utilities (HP, Dell, Lenovo, ASUS) that may capture the H key or require the Fn modifier
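The order of these checks matters, because a later fix cannot help while an earlier cause is still present. As an illustration only, the checklist can be written as a small Python decision function. The `DictationState` fields are hypothetical inputs you would fill in by hand after looking at Settings; nothing in this sketch queries Windows.

```python
from dataclasses import dataclass


@dataclass
class DictationState:
    # Hypothetical, manually supplied observations (not read from the system).
    resource_downloaded: bool      # download finished on the Speech page
    active_language_matches: bool  # input language equals the installed model's
    vendor_hotkey_utility: bool    # an HP/Dell/Lenovo/ASUS utility may grab the key


def next_fix(state: DictationState) -> str:
    """Return the first unresolved cause, in the order listed above."""
    if not state.resource_downloaded:
        return "Finish the download in Settings > Time & language > Speech, then restart"
    if not state.active_language_matches:
        return "Switch input language with Windows + Spacebar, then press Win+H"
    if state.vendor_hotkey_utility:
        return "Disable the vendor utility or try the shortcut with the Fn modifier"
    return "No listed cause applies"


if __name__ == "__main__":
    print(next_fix(DictationState(True, False, True)))
```

In the example call the language mismatch is reported before the shortcut conflict, even though both are present, which is the point of working through the list top to bottom.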
For a deeper walkthrough of the toolbar itself — settings, voice commands, and language switching — see our complete Windows 11 dictation toolbar guide.
How much accuracy does Enhanced Speech Recognition add?
With the Enhanced Speech Recognition model installed and a clear microphone, Windows 11 voice typing reaches roughly 85-90% accuracy for conversational English. Without it, dictation either fails to start or relies on minimal recognition that misreads far more words.
The accuracy gain comes from the fuller acoustic and language model that the download provides. Combined with auto-punctuation — which you enable from the toolbar’s gear icon — the result is usable for emails, notes, drafts, and casual writing.
| Aspect | Without Enhanced model | With Enhanced Speech Recognition |
|---|---|---|
| Dictation starts | Often fails | Yes |
| Conversational accuracy | Poor / minimal | ~85-90% |
| Auto-punctuation | Limited | Full support |
| Voice commands | Unreliable | Reliable |
| Technical vocabulary | Weak | Still weak (no custom dictionary) |
Accuracy still drops sharply for proper nouns, brand names, medical terms, legal citations, and programming identifiers, because Windows 11 voice typing has no user-editable dictionary. To understand the factors that drive recognition quality across systems, read our analysis of voice dictation accuracy and speech recognition.
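To check the accuracy range quoted above against your own voice and microphone, you can read a known passage aloud and compare the transcript with the original. The article does not state how its figure was measured, so the sketch below assumes the common definition of word-level accuracy as one minus word error rate (WER), computed with a word-by-word edit distance. The function name `word_accuracy` and the sample sentences are ours.

```python
def word_accuracy(reference: str, hypothesis: str) -> float:
    """Word-level accuracy = 1 - WER, using edit distance over words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    wer = prev[-1] / len(ref)
    return max(0.0, 1.0 - wer)


if __name__ == "__main__":
    spoken = "please send the quarterly report to the legal team today"
    typed = "please send the quarterly report to the eagle team today"
    print(f"{word_accuracy(spoken, typed):.0%}")  # one wrong word out of ten
```

This sketch compares lowercased words split on whitespace, so punctuation attached to a word counts as a difference. Strip punctuation from both texts first if you only want to measure word recognition.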
Is Enhanced Speech Recognition the same as Fluid Dictation?
No — and conflating the two is the single most common mistake. Enhanced Speech Recognition runs on any Windows 11 PC and improves transcription accuracy. Fluid Dictation runs only on Copilot+ PCs and rewrites grammar, punctuation, and filler words after transcription.
| Feature | Enhanced Speech Recognition | Fluid Dictation |
|---|---|---|
| Hardware required | Any Windows 11 PC | Copilot+ PC (40+ TOPS NPU) |
| What it does | Improves recognition accuracy | Rewrites grammar & filler words |
| Where to get it | Settings > Time & language > Speech > Download | Ships automatically on Copilot+ |
| Processing | Recognition resource on device; Win+H still uses Azure online | On-device small language models |
| Availability | Every user | Copilot+ owners only |
If your PC is a standard (non-Copilot+) machine, Enhanced Speech Recognition is the best native accuracy you can get — Fluid Dictation simply is not available to you, regardless of settings.
Does Enhanced Speech Recognition work offline?
Not fully. The downloaded recognition resources live on your device, but standard Windows 11 voice typing (Win+H) still routes audio through Microsoft’s online Azure speech services and requires an active internet connection. Enhanced Speech Recognition improves accuracy and is required for dictation to function — but it does not make Win+H a private, offline tool.
This is an important privacy nuance. Even with the model downloaded locally, your dictated audio can still leave your device for cloud processing. For professionals handling confidential material — doctors, lawyers, journalists, consultants — that is a hard limitation.
When you need genuinely offline dictation
For fully on-device transcription with no cloud round-trip, you need a local-only application rather than the native toolbar. This is precisely the gap Weesper Neon Flow fills: it processes speech entirely on your device using local Whisper-class models, so audio never leaves your computer.
| Capability | Windows 11 Voice Typing | Weesper Neon Flow |
|---|---|---|
| Price | Free | 5 EUR / month |
| Recognition model | Enhanced Speech Recognition (download) | Local Whisper-class model |
| Processing | Online (Azure) for Win+H | 100% on-device |
| Internet required | Yes | No |
| Custom vocabulary | None | Yes (custom prompts) |
| AI rewrite on any PC | No (Copilot+ only) | Yes |
| Works on macOS | No | Yes (Metal-accelerated) |
| Privacy | Audio sent to Microsoft | Audio stays local |
For the full technical comparison of local versus cloud transcription (latency, accuracy, and energy use), see our breakdown of on-device versus cloud transcription. The short version: a Whisper-class model on consumer hardware can now match cloud accuracy for everyday dictation while keeping audio on the device.
When should you use Enhanced Speech Recognition vs an alternative?
Use Enhanced Speech Recognition when you want free, native voice typing on Windows 11 for everyday, non-sensitive writing. Choose an offline alternative when privacy, custom vocabulary, cross-platform support, or sustained professional use matters more than zero cost.
Enhanced Speech Recognition is the right choice if you:
- Dictate casual emails, notes, and search queries
- Have a reliable internet connection
- Do not handle confidential or regulated content
- Mostly use common, everyday vocabulary
A dedicated tool like Weesper Neon Flow is the better fit if you:
- Need transcription that never sends audio to the cloud
- Work in a specialist domain with technical terminology
- Switch between Windows and macOS
- Want AI-quality rewriting without buying a Copilot+ PC
If you already followed our Windows 11 voice dictation setup guide and found the native experience limiting, the offline route is the logical next step.
Try Weesper Neon Flow free for 15 days — fully on-device, no cloud account, works on Windows and macOS today.
Conclusion: get the model, then decide if it’s enough
Enhanced Speech Recognition is the download that turns Windows 11 voice typing from “won’t start” into “good enough for everyday dictation.” Install it from Settings > Time & language > Speech, restart, enable auto-punctuation, and you will reach roughly 85-90% accuracy on conversational English at no cost.
But know its boundaries: it does not provide custom vocabulary, it does not make Win+H offline, and it does not unlock Fluid Dictation on standard hardware. If you dictate for hours, handle sensitive material, or need domain-specific accuracy, the native model alone will not get you there.
Ready to compare? Download Weesper Neon Flow and run it side-by-side with Windows voice typing on your next dictation task. The free trial works on macOS and Windows, processes everything on-device, and requires no cloud account.