Whisper for Medical Dictation: How to Use OpenAI's Speech Model Clinically
OpenAI's Whisper is among the strongest open-source speech models available, but using it for medical dictation requires medical-vocabulary biasing, a push-to-talk workflow, and on-device hosting. Sapience Med wraps Whisper-large-v3-turbo with 2,500+ medication names, DSM-5 terminology, and Metal/Vulkan GPU acceleration so it runs locally with a HIPAA-friendly architecture for therapists and psychiatrists.
What is OpenAI Whisper?
Whisper is an automatic speech recognition (ASR) model released by OpenAI in 2022 and updated with new checkpoints since. The model weights are open-source under the MIT license, which means anyone can run it locally. Whisper comes in several sizes (tiny, base, small, medium, large, large-v2, large-v3, large-v3-turbo) that trade accuracy against compute requirements.
The breakthrough with Whisper was accuracy. On general English dictation, large-v3 reaches word error rates competitive with commercial cloud APIs, and because it runs locally you can use it without sending audio off-device. For privacy-sensitive domains (legal, medical, journalism) this is a meaningful architectural advantage.
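Word error rate (WER) is the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words. The sketch below is one minimal way to compute it for your own test dictations. The function name and the sample sentences are illustrative assumptions, not taken from any benchmark.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one drug name split into three wrong words
# (1 substitution + 2 insertions against 6 reference words).
reference = "patient started lamotrigine 25 mg daily"
hypothesis = "patient started la motor gene 25 mg daily"
print(word_error_rate(reference, hypothesis))  # 0.5
```

A single mangled medication name can dominate the score of a short note, which is why general-purpose WER figures say little about clinical vocabulary.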
Can I use Whisper directly for medical dictation?
You can, but with two practical caveats. The first is vocabulary. Whisper is trained on a broad web corpus, so it knows medical terms only as frequently as they appear in general internet text. It handles common medications (aspirin, Tylenol, ibuprofen) well. Psychiatric medications like Vraylar, Latuda, lamotrigine, or Vyvanse it routinely mis-transcribes into phonetic neighbors. DSM-5 terminology and clinical abbreviations (MSE, HPI, SI/HI, PHQ-9, GAD-7) are also inconsistently recognized.
The second caveat is workflow. Whisper itself is a model — to use it as a dictation tool you need an application layer around it: a push-to-talk hotkey, audio capture, the runtime (PyTorch, whisper.cpp, ONNX), text injection into your EHR or notes field, and ideally a custom-vocabulary bias for medical terms. Without that layer, you have a transcription engine but not a dictation tool.
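To make the application layer concrete, here is a minimal sketch of the push-to-talk control flow only. The class and its `transcribe` and `inject` hooks are hypothetical stand-ins for a real Whisper runtime and an OS-level text-injection mechanism. Hotkey registration and microphone capture are platform-specific and left out.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PushToTalk:
    """Hold-to-record controller. Both callables are hypothetical hooks."""
    transcribe: Callable[[bytes], str]  # stand-in for the speech model runtime
    inject: Callable[[str], None]       # stand-in for typing into the focused field
    _chunks: List[bytes] = field(default_factory=list, init=False)
    _recording: bool = field(default=False, init=False)

    def on_hotkey_down(self) -> None:
        # Start a fresh recording each time the hotkey is pressed.
        self._chunks.clear()
        self._recording = True

    def on_audio(self, chunk: bytes) -> None:
        # Audio that arrives while the hotkey is up is discarded.
        if self._recording:
            self._chunks.append(chunk)

    def on_hotkey_up(self) -> None:
        if not self._recording:
            return
        self._recording = False
        text = self.transcribe(b"".join(self._chunks)).strip()
        if text:  # do not inject empty transcripts
            self.inject(text)


# Demo with fake hooks: the "model" just reports how much audio it received.
typed: List[str] = []
ptt = PushToTalk(transcribe=lambda audio: f"{len(audio)} bytes heard", inject=typed.append)
ptt.on_audio(b"xx")       # ignored, hotkey not held
ptt.on_hotkey_down()
ptt.on_audio(b"abc")
ptt.on_audio(b"de")
ptt.on_hotkey_up()
print(typed)              # ['5 bytes heard']
```

Keeping the model and the injection step behind plain callables means the runtime (PyTorch, whisper.cpp, ONNX) can be swapped without touching the hotkey logic.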
How Sapience Med uses Whisper plus medical tuning
Sapience Med runs a quantized Whisper-large-v3-turbo model on-device — Metal-accelerated on Apple Silicon, Vulkan-accelerated on Windows GPUs. The model itself is unchanged from the public OpenAI release; we add three layers on top for clinical use.
Custom-vocabulary biasing. A curated dictionary of 2,500+ medication names, DSM-5 terms, and clinical abbreviations is biased into the recognition pass. This shifts the model's priors so that phonetically ambiguous tokens (for example, a near-miss spelling of "Lamictal") resolve to the correct medical spelling.
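One publicly documented way to nudge Whisper toward particular spellings is the `initial_prompt` argument of the open-source `openai-whisper` package's `transcribe()` method. The sketch below illustrates that general technique and is not a description of Sapience Med's internal biasing. The helper names and the `max_chars` budget are assumptions. Whisper only keeps a limited prompt window (a couple hundred tokens), so a full 2,500-term dictionary would not fit in a single prompt and the helper simply truncates.

```python
from typing import Iterable


def build_vocabulary_prompt(terms: Iterable[str], max_chars: int = 600) -> str:
    """Join de-duplicated terms into a comma-separated prompt within max_chars."""
    seen = set()
    kept = []
    length = 0
    for term in terms:
        term = term.strip()
        key = term.lower()
        if not term or key in seen:
            continue
        extra = len(term) + (2 if kept else 0)  # 2 accounts for the ", " separator
        if length + extra > max_chars:
            break  # budget exhausted; later terms are dropped
        seen.add(key)
        kept.append(term)
        length += extra
    return ", ".join(kept)


def transcribe_with_vocabulary(audio_path: str, terms: Iterable[str]) -> str:
    # Requires the third-party openai-whisper package and downloaded weights.
    import whisper

    model = whisper.load_model("large-v3-turbo")
    result = model.transcribe(audio_path, initial_prompt=build_vocabulary_prompt(terms))
    return result["text"]


print(build_vocabulary_prompt(["Vraylar", "Latuda", "vraylar", " Lamotrigine "]))
# Vraylar, Latuda, Lamotrigine
```

Because the prompt budget is small, a practical setup would likely pick the terms most relevant to the clinician's specialty rather than the whole dictionary.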
Push-to-talk hotkey and text injection. A configurable hotkey (Option+Space by default on Mac) starts and stops recording. When you release the hotkey, the transcribed text is typed into whichever text field is focused: an EHR (SimplePractice, TherapyNotes), Apple Notes, Word, Gmail, or anything else. No per-application integration is needed.
Filler stripping and clinical formatting. Common filler words ("um", "uh") and false-start repetitions are stripped by default. The clinician can also define personal abbreviations ("hpi" expands to "History of Present Illness") that expand as the text is typed.
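The filler stripping and abbreviation expansion described above can be approximated with a few regular expressions. This is a deliberately naive sketch under stated assumptions: the patterns, the function name, and the one-entry abbreviation table (taken from the example in the text) are illustrative, and a production tool would need more careful rules.

```python
import re
from typing import Mapping

# "um"/"uh" (with stretched variants like "umm"), plus trailing punctuation and space.
FILLERS = re.compile(r"\b(?:um+|uh+)\b[,.]?\s*", re.IGNORECASE)
# An alphabetic word immediately repeated one or more times ("the the").
REPEATS = re.compile(r"\b([A-Za-z']+)(?:\s+\1\b)+", re.IGNORECASE)

ABBREVIATIONS = {"hpi": "History of Present Illness"}  # example from the text


def clean_dictation(text: str, abbreviations: Mapping[str, str] = ABBREVIATIONS) -> str:
    text = FILLERS.sub("", text)
    text = REPEATS.sub(r"\1", text)  # keep the first occurrence only

    def expand(match: "re.Match[str]") -> str:
        word = match.group(0)
        return abbreviations.get(word.lower(), word)

    text = re.sub(r"\b\w+\b", expand, text)
    return re.sub(r"\s{2,}", " ", text).strip()


print(clean_dictation("Um, the the patient uh reports hpi unchanged"))
# the patient reports History of Present Illness unchanged
```

Note that the repetition rule also collapses legitimate doubles such as "had had", which is one reason to treat this as a starting point rather than a finished filter.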
Why on-device Whisper matters for medical use
The reason to use a local Whisper-based tool rather than a cloud speech API is HIPAA. Medical dictation — even of progress notes the clinician is writing themselves, not session audio — generally contains PHI (client identifiers, presenting problems, diagnostic impressions, medication decisions). When that audio is processed by a third party, that third party becomes a HIPAA Business Associate and a BAA is required.
Cloud speech APIs (OpenAI's hosted Whisper API, AssemblyAI, Deepgram, Google Speech-to-Text) do offer enterprise tiers with BAA execution, but the operational overhead is real: BAA negotiation, vendor risk assessment, ongoing compliance review. For a solo practice or small group, on-device Whisper sidesteps the entire question — there is no third party in the audio path.
See our HIPAA-friendly dictation page for the full architectural argument.
Can I just roll my own Whisper-based dictation tool?
Yes, if you are comfortable with command-line tools. whisper.cpp (the C/C++ port) runs Whisper models on Mac and Windows with a few hundred MB of model weights and reasonable CPU/GPU usage. Open-source dictation wrappers (whisper-typer, BetterDictation) wire whisper.cpp to a hotkey and text-injection pipeline.
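If you go this route, the core of a wrapper is a call to the whisper.cpp command-line program. The sketch below builds and runs that command from Python. The binary name (`whisper-cli` in recent builds, `main` in older ones) and the file names are assumptions about your local setup. The flags used are `-m` (model), `-f` (audio file), `-l` (language), and `-nt` (no timestamps). whisper.cpp has traditionally expected 16 kHz WAV input, so recorded audio may need converting first.

```python
import subprocess
from pathlib import Path
from typing import List


def whisper_cpp_command(audio: Path, model: Path,
                        binary: str = "whisper-cli",  # assumed binary name
                        language: str = "en") -> List[str]:
    """Build the argument list for a single-file transcription."""
    return [binary, "-m", str(model), "-f", str(audio), "-l", language, "-nt"]


def transcribe(audio: Path, model: Path) -> str:
    # Runs the external program; raises CalledProcessError on a non-zero exit.
    completed = subprocess.run(
        whisper_cpp_command(audio, model),
        capture_output=True, text=True, check=True,
    )
    return completed.stdout.strip()


# File names here are placeholders for your own recording and model file.
print(whisper_cpp_command(Path("note.wav"), Path("ggml-large-v3-turbo.bin")))
```

Passing the arguments as a list rather than a shell string avoids quoting problems when file paths contain spaces.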
The work Sapience Med has done on top of that base (medical vocabulary curation, low-latency on-device inference tuning, text injection compatibility across Electron and web-based EHRs, license/billing infrastructure for clinical deployment, and ongoing model updates) adds up to roughly a person-year of engineering. For a clinician who would rather spend their evenings on clinical work than on yak-shaving a custom dictation tool, paying $399 per year for a maintained product is the simpler path.
Frequently asked questions
Is Whisper accurate enough for medical dictation?
Does Sapience Med use the cloud Whisper API or local model?
Why not just use the OpenAI Whisper API?
What about Whisper on iPhone — does Sapience Med work on iPad / iPhone?
Is the medical vocabulary biasing visible — can I add my own terms?
What hardware do I need to run Whisper-grade dictation locally?
Try Sapience Med free for 14 days.
$45/month or $399/year (save 26%) after the trial. No card required to start.