Offline Medical Dictation for Mac: No Cloud Audio, No BAA Required

Quick answer

Most medical dictation tools (Dragon Medical One, AWS Transcribe Medical, Google Speech-to-Text Medical) stream your audio to a server, creating a HIPAA Business Associate relationship. Sapience Med runs the entire speech model on your Mac — Apple Silicon's Neural Engine processes the audio buffer locally and discards it after transcription. No audio leaves the device, no BAA needed for the voice path.

Download for Mac · Download for Windows · 14-day free trial · No card required

Why most medical dictation tools are cloud-based

Until recently, accurate speech recognition required compute power not available on a laptop. Cloud services (AWS Transcribe Medical, Google Speech-to-Text Medical, Nuance's Dragon Medical One, AssemblyAI, Deepgram) host large speech models on GPU servers and stream user audio to those servers for transcription. Latency over a fast connection is roughly 150-300ms, accuracy on medical vocabulary is high, and the vendor handles the ongoing model maintenance.

The architecture made sense for hospital deployments and large medical groups, where a dedicated IT team can negotiate BAAs, monitor audit logs, manage network configurations, and absorb the operational overhead. For solo therapists and small mental-health practices, the same architecture imposes a much heavier relative burden: the same BAA negotiation, vendor risk review, and ongoing compliance cost, with only one or two clinicians to amortize it across.

Why privacy matters more for mental-health dictation

Therapy and psychiatric notes contain particularly sensitive PHI. A progress note dictated by a therapist might include presenting problems (sexual, substance use, suicidal ideation), trauma history, family-of-origin details, and identifying information about the client. The same content protected under HIPAA also tends to be the kind of content a clinician would be most uncomfortable sending to a third-party server, even with a BAA in place.

Practical concern: even with the strongest contracts, audio sent to a cloud service exists on that vendor's servers for some retention window. Breach is rare but possible. For mental-health practitioners, "technically compliant but architecturally avoidable" is not the bar — the bar is "audio of session-adjacent notes does not leave my computer."

On-device speech recognition on Apple Silicon

Apple Silicon (M1, M2, M3, M4) makes local speech recognition practical on a laptop. The Neural Engine in these chips runs quantized speech models such as Whisper-large-v3-turbo, OpenAI's state-of-the-art open-source ASR model, at sub-second latency on a typical clinical dictation.

Concretely on an M1 MacBook Pro: a 15-second progress note dictation finishes transcription roughly 0.5 seconds after you release the hotkey. The text appears in your EHR or Notes field, the audio buffer is discarded, and the next dictation starts fresh. CPU usage is minimal (a brief spike during inference, idle otherwise). Battery impact is negligible — comparable to other intermittent CPU tasks.

Accuracy is comparable to cloud APIs on general English, and on medical vocabulary when the model is biased with a medical dictionary (which Sapience Med ships by default). Running locally involves no meaningful quality trade-off, only an architectural simplification.
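
One simple way to picture what a medical dictionary contributes is a post-correction pass that snaps near-miss words onto a known term list. The sketch below uses Python's standard difflib for fuzzy matching. It is a swapped-in illustration, not Sapience Med's mechanism (the text describes biasing the model itself), and the function name, term list, and cutoff are all assumptions.

```python
import difflib

def correct_terms(transcript: str, dictionary: list[str], cutoff: float = 0.85) -> str:
    """Replace words that closely match a dictionary term with that term.

    Punctuation attached to words is not handled in this sketch.
    """
    lowered = {term.lower(): term for term in dictionary}
    corrected = []
    for word in transcript.split():
        # Take the single best match above the similarity cutoff, if any.
        match = difflib.get_close_matches(word.lower(), list(lowered), n=1, cutoff=cutoff)
        corrected.append(lowered[match[0]] if match else word)
    return " ".join(corrected)

# Hypothetical term list; a real medical dictionary would be far larger.
TERMS = ["sertraline", "lamotrigine", "anhedonia"]
print(correct_terms("patient started sertralene 50 mg", TERMS))
```

A high cutoff keeps ordinary words untouched, while a misspelled drug name that differs by one letter is pulled back to the dictionary spelling.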

What "offline" actually means in Sapience Med

Sapience Med uses the word "offline" specifically about the audio path: your audio buffer is never transmitted off-device. Microphone capture → on-device transcription → text injection into your EHR or notes field. The audio buffer is discarded after transcription; the transcript itself is in your clipboard and the target text field, where you control it.
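
The audio path described above can be sketched as a three-stage function in which the buffer is wiped once transcription finishes. This is a minimal illustration, not the app's code: `dictate`, `capture`, `transcribe`, and `inject` are hypothetical names, and the stand-in lambdas replace the real microphone, model, and text-field logic.

```python
from typing import Callable

def dictate(capture: Callable[[], bytearray],
            transcribe: Callable[[bytearray], str],
            inject: Callable[[str], None]) -> None:
    """Capture -> on-device transcription -> text injection, then discard audio."""
    audio = capture()
    try:
        text = transcribe(audio)
        inject(text)
    finally:
        # Best-effort wipe: zero the buffer, then empty it, whether or not
        # transcription succeeded. No stage in this function touches the network.
        audio[:] = b"\x00" * len(audio)
        audio.clear()

# Stand-ins for the microphone, the speech model, and the target text field.
typed: list[str] = []
buffer = bytearray(b"\x01\x02" * 8)
dictate(lambda: buffer, lambda a: f"{len(a)} bytes heard", typed.append)
print(typed, len(buffer))
```

After the call, only the transcript survives; the audio buffer is empty.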

A few non-audio network calls do happen for legitimate operational reasons:

License verification against license.sapience.systems: sends license_id and a machine ID hash; no audio, no notes, no PHI. Runs every few hours in the background.
Update check for new app versions. Sends just the current version number; updates are signed with our Ed25519 key and verified locally before applying.
Crash reports (opt-in) if the app crashes — stack trace only, no audio or notes content.

None of these touch your microphone, your transcripts, or your clinical notes. Dictation works fully offline: transcription itself needs no internet connection. License verification gracefully tolerates a network outage (the cached JWT remains valid until its expiry).
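
To make the license check concrete, here is a rough sketch of a request body that carries only an ID and a one-way hash, plus an offline check of a cached JWT's expiry. The SHA-256 choice, the `machine_id_hash` field name, and both function names are assumptions for illustration; the actual wire format is not documented here.

```python
import base64
import hashlib
import json
import time
from typing import Optional

def license_payload(license_id: str, machine_id: str) -> dict:
    """Build a verification body: a license ID and a one-way hash, nothing else."""
    digest = hashlib.sha256(machine_id.encode("utf-8")).hexdigest()
    return {"license_id": license_id, "machine_id_hash": digest}

def cached_token_valid(jwt: str, now: Optional[float] = None) -> bool:
    """Return True while the cached JWT's `exp` claim is still in the future.

    Signature verification is deliberately omitted; a real client must do it.
    """
    payload_b64 = jwt.split(".")[1]
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)  # restore base64url padding
    claims = json.loads(base64.urlsafe_b64decode(padded))
    return claims["exp"] > (time.time() if now is None else now)
```

While offline, a client following this pattern would keep working as long as `cached_token_valid` returns True, and would retry the network check the next time a connection is available.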

What this means for HIPAA / BAA

A Business Associate Agreement is required when a vendor creates, receives, maintains, or transmits PHI on behalf of a covered entity. The triggering action is the vendor having access to PHI — audio of medical dictation typically counts as PHI.

Sapience Med's architecture is designed so that we never have access to your audio or your notes. We don't receive them, we don't transmit them, we don't store them. The license server (which we do operate) handles license_id and machine ID hash for the payment/activation flow — these are not PHI. So the BAA question for the voice path is functionally moot.

For full architectural detail and the legal reasoning, see our HIPAA architecture brief. The short version: Sapience Med is not your Business Associate because we never touch the protected health information.

Frequently asked questions

Is Sapience Med truly offline, or does it phone home?

Truly offline for audio and notes. The microphone audio is processed by an on-device speech model and discarded after transcription. The transcribed text goes to your clipboard and target field, never to our servers. Non-audio network calls happen for license verification (hash only, no PHI) and update checks — both can be paused without affecting dictation itself.

Can I use Sapience Med without an internet connection?

Yes for the dictation path. Hotkey, audio capture, on-device transcription, text injection — all work offline. Trial start and paid license activation require a one-time internet connection to our license server. After that, license verification refreshes opportunistically (every few hours when online, gracefully tolerated when offline up to the license expiry).

What chip do I need for offline Mac dictation to be fast?

Apple Silicon (M1, M2, M3, M4 or newer) gives the best experience: the Neural Engine accelerates the speech model to sub-second latency. Intel Macs work too, with somewhat higher CPU load during transcription. We recommend Apple Silicon for daily clinical use.

How does this compare to dictating into Apple Notes with macOS Dictation?

macOS Dictation is also on-device on Apple Silicon (it was cloud-based earlier; now local). The key difference is vocabulary: Apple's model is general-purpose, while Sapience Med biases recognition toward 2,500+ medication names, DSM-5 terms, and clinical abbreviations. On clinical content, Sapience Med is materially more accurate.

Is the model that runs on my Mac the same as the cloud one?

Sapience Med ships Whisper-large-v3-turbo (quantized for on-device inference). This is OpenAI's state-of-the-art open-source speech model — the same architecture used by many cloud transcription services. Quantization trades a small accuracy delta for fitting on consumer hardware; on general dictation the delta is in single-digit-percent territory.
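
As a back-of-the-envelope illustration of why quantization matters for fitting on consumer hardware, the snippet below estimates weight storage at different precisions. The parameter count (roughly 809 million for Whisper large-v3-turbo) and the bit widths are assumptions for the arithmetic; the precision Sapience Med actually ships is not stated here.

```python
def model_size_gb(parameters: int, bits_per_weight: int) -> float:
    """Approximate weight storage in gigabytes (10^9 bytes), ignoring overhead."""
    return parameters * bits_per_weight / 8 / 1e9

PARAMS = 809_000_000  # assumed parameter count for Whisper large-v3-turbo
for bits in (16, 8, 4):
    print(f"{bits}-bit: {model_size_gb(PARAMS, bits):.2f} GB")
```

Halving the bits per weight halves the memory footprint, which is the trade the answer above refers to.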

Can I verify the audio actually doesn't leave my Mac?

Yes. On macOS, open Activity Monitor → Network tab while dictating. You will see no upload spike from Sapience Med during transcription. The only Sapience Med network traffic is license verification (a few hundred bytes every few hours) and update checks (small). You can also run Little Snitch or a similar firewall app to allow only the connections you choose for Sapience Med.
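
To know what an upload spike would look like, it helps to estimate how large a dictation clip is compared with a license check. The sketch below assumes 16 kHz, 16-bit mono PCM (the sample rate Whisper models take as input; the app's actual capture format is an assumption) and takes 300 bytes as a stand-in for "a few hundred bytes".

```python
def pcm_bytes(seconds: float, sample_rate: int = 16_000,
              bytes_per_sample: int = 2, channels: int = 1) -> int:
    """Size of uncompressed PCM audio for a clip of the given length."""
    return int(seconds * sample_rate * bytes_per_sample * channels)

clip = pcm_bytes(15)    # a 15-second dictation
license_check = 300     # assumed size of one license verification request
print(clip, clip // license_check)
```

Under these assumptions a single 15-second clip is about 480 KB, over a thousand times the size of a license check, so uploaded audio would stand out clearly in a network monitor even if it were compressed.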

Try Sapience Med free for 14 days.

$45/month or $399/year (save 26%) after the trial. No card required to start.
