Whisper for Medical Dictation: How to Use OpenAI's Speech Model Clinically

Quick answer

OpenAI's Whisper is one of the most accurate open-source speech models available, but using it for medical dictation takes three additions: medical-vocabulary biasing, a push-to-talk workflow, and on-device hosting. Sapience Med wraps Whisper-large-v3-turbo with a 2,500+ term dictionary of medication names and DSM-5 terminology, plus Metal/Vulkan GPU acceleration, so it runs locally with a HIPAA-friendly architecture for therapists and psychiatrists.


What is OpenAI Whisper?

Whisper is an automatic speech recognition (ASR) model released by OpenAI in September 2022, with improved checkpoints published since. The code and model weights are open-source under the MIT license, which means anyone can run it locally. Whisper comes in five sizes (tiny, base, small, medium, large), and the large model has been revised several times (large-v2, large-v3, and the faster large-v3-turbo). Smaller sizes trade accuracy for lower compute requirements.

The breakthrough with Whisper was accuracy. On general English dictation, large-v3 reaches word error rates competitive with commercial cloud APIs, and because it runs locally you can use it without sending audio off-device. For privacy-sensitive domains (legal, medical, journalism) this is a meaningful architectural advantage.
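Word error rate (WER) is the standard metric behind comparisons like this: the minimum number of word substitutions, deletions, and insertions needed to turn the model's output into a reference transcript, divided by the number of reference words. The minimal sketch below computes it with a word-level edit distance. The function name `word_error_rate` and the simple lowercase/whitespace tokenization are illustrative assumptions; published benchmarks usually apply more elaborate text normalization first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev is the previous row of the Levenshtein table: the edit distance between
    # the reference words seen so far and each prefix of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from the output
                curr[j - 1] + 1,     # insertion: extra word in the output
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("patient started on lamotrigine twenty five milligrams", "patient started on lamb a trigger twenty five milligrams")` returns 3/7 (about 0.43): one misheard drug name costs one substitution and two insertions. Headline figures such as "95% accuracy" are usually a loose way of saying a WER of about 5%.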

Can I use Whisper directly for medical dictation?

You can, but with two practical caveats. First, Whisper is trained on a broad web corpus, so how well it recognizes a medical term depends on how often that term appears in general internet text. It handles common medications (aspirin, Tylenol, ibuprofen) well, but it routinely mis-transcribes psychiatric medications such as Vraylar, Latuda, lamotrigine, and Vyvanse into phonetic neighbors. DSM-5 terminology and clinical abbreviations (MSE, HPI, SI/HI, PHQ-9, GAD-7) are also recognized inconsistently.
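A simple, generic way to repair the single-word version of this error is post-hoc lexicon correction: after transcription, replace any word that closely resembles a dictionary entry with the dictionary's spelling. The sketch below uses Python's standard `difflib.get_close_matches`. It is a swapped-in technique for illustration, not a description of how Sapience Med's recognition-time biasing works, and the five-term `LEXICON`, the 0.75 similarity cutoff, and the five-letter minimum are untuned assumptions. (With the open-source `openai-whisper` package, a common recognition-time alternative is passing key terms through the `initial_prompt` argument of `transcribe`.)

```python
import difflib
import re

# Hypothetical mini-lexicon; a real dictionary would hold thousands of entries.
LEXICON = ["Vraylar", "Latuda", "Lamotrigine", "Vyvanse", "Lamictal"]


def correct_terms(text: str, lexicon: list[str], cutoff: float = 0.75, min_len: int = 5) -> str:
    """Snap words that closely resemble a lexicon entry to the entry's canonical spelling."""
    by_lower = {term.lower(): term for term in lexicon}
    candidates = list(by_lower)

    def fix(match: re.Match) -> str:
        word = match.group(0)
        if len(word) < min_len:  # short words are too easy to confuse; leave them alone
            return word
        hits = difflib.get_close_matches(word.lower(), candidates, n=1, cutoff=cutoff)
        return by_lower[hits[0]] if hits else word

    # Only alphabetic runs are examined, so doses like "1.5" and punctuation pass through.
    return re.sub(r"[A-Za-z]+", fix, text)
```

Here `correct_terms("Started vylar 1.5 mg", LEXICON)` returns "Started Vraylar 1.5 mg". The cutoff is the key design choice: set it too low and ordinary words get rewritten into drug names, which is worse than the original error. This approach also repairs only single-word mistakes; a drug name split into several words needs recognition-time biasing or multi-word matching.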

The second caveat is workflow. Whisper itself is only a model. To use it as a dictation tool, you need an application layer around it: a push-to-talk hotkey, audio capture, an inference runtime (PyTorch, whisper.cpp, ONNX), text injection into your EHR or notes field, and ideally custom-vocabulary biasing for medical terms. Without that layer, you have a transcription engine but not a dictation tool.
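Much of that application layer is event plumbing. The sketch below shows one possible shape for the glue code, under stated assumptions: `PushToTalkSession` and its method names are hypothetical, and the inference runtime and the OS-level typing step are passed in as plain callables, because the real ones (whisper.cpp bindings, macOS or Windows keyboard-event APIs) are platform-specific.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PushToTalkSession:
    """Hypothetical glue layer: hotkey down/up -> buffered audio -> model -> typed text."""

    transcribe: Callable[[bytes], str]    # stand-in for the inference runtime
    inject_text: Callable[[str], None]    # stand-in for OS-level typing into the focused field
    postprocess: Callable[[str], str] = str.strip  # hook for vocabulary fixes, filler stripping, etc.
    _recording: bool = field(default=False, init=False)
    _chunks: List[bytes] = field(default_factory=list, init=False)

    def on_hotkey_down(self) -> None:
        """Start a new recording and discard anything left from a previous one."""
        self._recording = True
        self._chunks.clear()

    def on_audio_chunk(self, chunk: bytes) -> None:
        """Called continuously by audio capture; chunks are kept only while the key is held."""
        if self._recording:
            self._chunks.append(chunk)

    def on_hotkey_up(self) -> str:
        """Stop recording, transcribe the buffered audio, and type the result (if any)."""
        self._recording = False
        audio = b"".join(self._chunks)
        self._chunks.clear()
        text = self.postprocess(self.transcribe(audio)) if audio else ""
        if text:
            self.inject_text(text)
        return text
```

Keeping `transcribe` and `inject_text` as injected callables lets the same session logic be tested with fakes and reused across runtimes. A production tool would also run transcription off the UI thread so the hotkey stays responsive.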

How Sapience Med uses Whisper plus medical tuning

Sapience Med runs a quantized Whisper-large-v3-turbo model on-device: Metal-accelerated on Apple Silicon and Vulkan-accelerated on Windows GPUs. Apart from quantization, the model is unchanged from the public OpenAI release; we add three layers on top for clinical use.

Custom-vocabulary biasing. A curated dictionary of 2,500+ medication names, DSM-5 terms, and clinical abbreviations is biased into the recognition pass. This shifts the model's priors so that phonetically ambiguous drug names such as Lamictal resolve to the correct medical spelling.

Push-to-talk hotkey and text injection. Holding a configurable hotkey (Option+Space by default on Mac) records. When you release it, the transcribed text is typed into whichever text field has focus: an EHR (SimplePractice, TherapyNotes), Apple Notes, Word, Gmail, or anywhere else. No per-application integration is needed.

Filler stripping and clinical formatting. Common filler words ("um", "uh", false-start repetitions) are stripped by default. Clinicians can also define personal abbreviations (for example, "hpi" expands to "History of Present Illness") that are expanded as the text is typed.
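Both post-processing steps can be approximated with regular expressions. The sketch below uses Python's `re` module; the filler list, the repeat-collapsing rule, and the `ABBREVIATIONS` table are illustrative assumptions, not Sapience Med's actual rules.

```python
import re

# Illustrative, user-configurable lists (assumptions for this sketch).
FILLERS = ("um", "uh", "erm")
ABBREVIATIONS = {"hpi": "History of Present Illness", "mse": "Mental Status Exam"}

# A filler word on its own, plus an optional trailing comma/period and whitespace.
_FILLER_RE = re.compile(r"\b(?:" + "|".join(FILLERS) + r")\b[,.]?\s*", re.IGNORECASE)
# A word immediately repeated one or more times ("reports reports").
_REPEAT_RE = re.compile(r"\b(\w+)(?:\s+\1\b)+")


def clean_dictation(text: str, abbreviations: dict[str, str] = ABBREVIATIONS) -> str:
    """Strip fillers, collapse immediate repeats, and expand whole-word abbreviations."""
    text = _FILLER_RE.sub("", text)
    text = _REPEAT_RE.sub(r"\1", text)
    # Whole-word lookup, so an abbreviation embedded in a longer word is left untouched.
    text = re.sub(r"\b\w+\b", lambda m: abbreviations.get(m.group(0).lower(), m.group(0)), text)
    return re.sub(r"\s{2,}", " ", text).strip()
```

For example, `clean_dictation("Um, client reports reports low mood, uh, see hpi")` returns "client reports low mood, see History of Present Illness". Collapsing every immediate repeat is a blunt rule: it would also merge a legitimate "that that", so a real tool needs a more careful false-start detector.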

Why on-device Whisper matters for medical use

The reason to use a local Whisper-based tool rather than a cloud speech API is HIPAA. Medical dictation generally contains PHI (client identifiers, presenting problems, diagnostic impressions, medication decisions), even when it is a progress note the clinician is dictating rather than session audio. When a third party processes that audio on the clinician's behalf, it becomes a HIPAA Business Associate, and a Business Associate Agreement (BAA) is required.

Cloud speech APIs (OpenAI's hosted Whisper API, AssemblyAI, Deepgram, Google Speech-to-Text) do offer enterprise tiers with BAA execution, but the operational overhead is real: BAA negotiation, vendor risk assessment, and ongoing compliance review. For a solo practice or small group, on-device Whisper sidesteps the question for the audio path entirely, because no third party ever receives the audio.

See our HIPAA-friendly dictation page for the full architectural argument.

Can I just roll my own Whisper-based dictation tool?

Yes, if you are comfortable with command-line tools. whisper.cpp (the C/C++ port) runs Whisper models on Mac and Windows; a quantized large model needs only a few hundred MB of weights and runs on ordinary consumer CPUs and GPUs. Existing dictation wrappers (whisper-typer, BetterDictation) wire whisper.cpp to a hotkey and a text-injection pipeline.

The work Sapience Med has done on top of that base (medical vocabulary curation, low-latency on-device inference tuning, text-injection compatibility across Electron and web-based EHRs, license and billing infrastructure for clinical deployment, ongoing model updates) adds up to roughly a person-year of engineering. For clinicians who would rather spend their evenings on clinical work than on yak-shaving a custom dictation tool, paying $399 per year for a maintained product is the simpler path.

Frequently asked questions

Is Whisper accurate enough for medical dictation?

Out of the box, Whisper large-v3 has roughly 95% accuracy on general English dictation and noticeably lower accuracy on psychiatric medication names and DSM-5 terms — the kind of vocabulary that doesn't appear often in Whisper's training corpus. With medical-vocabulary biasing on top, accuracy on those clinical terms approaches 99%. The bias layer is what Sapience Med adds.
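A single overall accuracy number can hide exactly the errors that matter clinically: in a 200-word note, one wrong drug name still leaves word-level accuracy at 99.5%. One way to measure accuracy on clinical terms specifically is term recall, the fraction of dictionary-term occurrences in a reference transcript that the model reproduces. The sketch below is a hypothetical evaluation helper, not Sapience Med's test methodology, and it handles only single-token terms; multi-word diagnoses such as "major depressive disorder" would need phrase matching.

```python
import re
from collections import Counter
from typing import Iterable


def term_recall(reference: str, hypothesis: str, lexicon: Iterable[str]) -> float:
    """Share of clinical-term occurrences in the reference that also appear in the hypothesis."""
    wanted = {term.lower() for term in lexicon}

    def count_terms(text: str) -> Counter:
        # Lowercase single-token matching; hyphens are kept so "PHQ-9" stays one token.
        tokens = re.findall(r"[a-z0-9-]+", text.lower())
        return Counter(tok for tok in tokens if tok in wanted)

    ref_counts = count_terms(reference)
    total = sum(ref_counts.values())
    if total == 0:
        return 1.0  # no clinical terms in the reference, so none could be missed
    # Counter & Counter keeps the minimum count per term, so extra mentions in
    # the hypothesis cannot push the score above 1.0.
    recalled = sum((ref_counts & count_terms(hypothesis)).values())
    return recalled / total
```

Scoring a transcript this way alongside overall word error rate shows whether vocabulary biasing is improving the terms it targets rather than just the easy words around them.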

Does Sapience Med use the cloud Whisper API or local model?

Local model only. Sapience Med ships with quantized Whisper-large-v3-turbo weights (about 600 MB) and runs them on-device using Metal (Mac) or Vulkan (Windows) GPU acceleration. No audio is sent to any server, whether OpenAI's or Sapience's. This is the architectural reason the voice path doesn't need HIPAA Business Associate review.
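The "about 600 MB" figure is consistent with back-of-the-envelope arithmetic: file size is roughly parameter count × bits per weight ÷ 8. The sketch below assumes about 809 million parameters for large-v3-turbo (the figure OpenAI lists for the turbo model) and compares 16-bit weights with 8-bit and roughly 6- and 5.5-bit formats such as whisper.cpp's q5_1 and q5_0. Which quantization Sapience Med actually uses isn't stated here, so treat the numbers as ballpark estimates.

```python
def weight_file_mb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight-file size in megabytes (10**6 bytes), ignoring headers and metadata."""
    return num_params * bits_per_weight / 8 / 1e6


TURBO_PARAMS = 809e6  # assumption: approximate parameter count of large-v3-turbo

for label, bits in [("fp16", 16), ("8-bit", 8), ("~6-bit (q5_1)", 6), ("~5.5-bit (q5_0)", 5.5)]:
    print(f"{label:>16}: ~{weight_file_mb(TURBO_PARAMS, bits):,.0f} MB")
```

This prints about 1,618 MB for 16-bit weights, 809 MB for 8-bit, 607 MB for ~6-bit, and 556 MB for ~5.5-bit. Real files differ somewhat, since some small tensors are typically kept at higher precision, but the estimate shows why a quantized turbo model fits in roughly 600 MB while unquantized 16-bit weights need about 1.6 GB.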

Why not just use the OpenAI Whisper API?

The hosted Whisper API requires sending audio to OpenAI's servers, so OpenAI becomes a HIPAA Business Associate for any dictation that contains PHI. OpenAI does offer enterprise BAAs, but the per-clinician cost and operational overhead make that route less practical than running Whisper locally. For a solo or small practice, local Whisper is the simpler answer.

What about Whisper on iPhone — does Sapience Med work on iPad / iPhone?

Not today. Sapience Med is a Mac + Windows desktop app. Whisper does run on iPhone via Core ML, but the desktop-focused workflow (push-to-talk hotkey, text injection into any focused field) doesn't translate cleanly to iOS. For mobile dictation in SimplePractice or TherapyNotes, the iOS built-in dictation is currently the only practical option.

Is the medical vocabulary biasing visible — can I add my own terms?

Yes. The dictionary is a configurable file in Sapience Med's settings. You can add specialty-specific terms (modalities you use, custom abbreviations, less common medications). New entries are picked up immediately without restarting the app. Built-in coverage includes psychotropics (SSRIs, SNRIs, mood stabilizers, atypical antipsychotics, ADHD meds, sleep meds), DSM-5 diagnoses, and assessment instruments.
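Picking up edits without a restart is usually done by checking the file for changes and reloading it. A minimal polling sketch, assuming a plain UTF-8 text file with one term per line and `#` comment lines; the class name `VocabularyFile` and the file format are hypothetical, since Sapience Med's actual format isn't described here.

```python
from pathlib import Path
from typing import List, Optional


class VocabularyFile:
    """Hypothetical user dictionary: one term per line; blank lines and '#' comments are ignored."""

    def __init__(self, path: str) -> None:
        self.path = Path(path)
        self._mtime_ns: Optional[int] = None  # modification time of the last load
        self.terms: List[str] = []

    def refresh(self) -> bool:
        """Reload if the file's modification time changed; return True when terms were reloaded."""
        mtime_ns = self.path.stat().st_mtime_ns
        if mtime_ns == self._mtime_ns:
            return False
        lines = self.path.read_text(encoding="utf-8").splitlines()
        self.terms = [ln.strip() for ln in lines if ln.strip() and not ln.lstrip().startswith("#")]
        self._mtime_ns = mtime_ns
        return True
```

A dictation tool could call `refresh()` just before each transcription; a single `stat` call is cheap, so new terms take effect on the next dictation. Native change notifications (FSEvents on macOS, `ReadDirectoryChangesW` on Windows) avoid polling but add platform-specific code.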

What hardware do I need to run Whisper-grade dictation locally?

Mac: any Apple Silicon Mac (M1 or newer) handles it comfortably, with about 0.4 to 0.7 s latency. Intel Macs work, but with higher CPU load. Windows: a recent x64 machine (2019 or newer) with a Vulkan-capable GPU. The model weights are about 600 MB, and the packaged app is about 519 MB.

Try Sapience Med free for 14 days.

$45/month or $399/year (about 26% less than paying monthly) after the trial. No card required to start.
