Open Medical Scribe

Open-source AI medical documentation — fully local, fully private. Available as a web app, desktop app, or CLI.

Get Started · View on GitHub

Features

Everything you need for AI-assisted clinical documentation

A modular, privacy-first platform that connects real-time audio capture to structured medical notes, with your choice of local or cloud providers.

Real-Time Transcription

Stream audio from the browser via WebSocket for live speech-to-text. Powered by Whisper ONNX running entirely on your local machine; no audio ever leaves your network.

AI Note Generation

Transform raw transcripts into structured clinical notes using Ollama (local), OpenAI, Anthropic Claude, or Google Gemini. Choose quality, cost, and privacy tradeoffs per deployment.

Multiple Note Formats

Generate SOAP, H&P, Progress, DAP, and Procedure notes out of the box. Swedish journal format supported alongside international standards.

Privacy-First Design

Run the entire pipeline on your own hardware. In local mode, no patient data ever leaves your machine. Configurable PHI handling for every deployment environment.

Modular CLI Tools

Use scribe-transcribe and scribe-note as standalone binaries for scripting and batch workflows. Build pipelines without the web server.

Pluggable Providers

Swap transcription and LLM backends via settings or environment variables. Run fully local, fully cloud, or any hybrid combination for your needs.

Architecture

How it works

A four-stage pipeline from microphone to structured clinical note, with each component independently replaceable.

Step 1
Audio Capture
Browser records audio from the clinician's microphone and streams chunks over a WebSocket connection.
Step 2
WebSocket Server
Node.js server receives audio streams, buffers them, and routes to the configured transcription provider.
Step 3
Local Transcription
Whisper ONNX transcribes speech to text locally. No audio data is sent to external services.
Step 4
Note Generation
LLM transforms the transcript into a structured clinical note (SOAP, H&P, Progress, and more).

Downloads

Desktop Application

Download the Electron desktop app for macOS and Windows. Choose between a light build that downloads models on first run, or a full offline build with everything bundled.

Light Build (macOS arm64)

Clean install with no bundled models. The Whisper ONNX speech recognition model downloads automatically on first use (~1.1 GB). Requires external Ollama for note generation.

Best for: users who want a smaller download and have internet access

Download DMG (~1 GB)

Full Build (~6 GB)

Everything bundled for offline use. Includes Whisper ONNX model for local speech recognition and Llama 3.1 8B for local note generation via llama-server. Works completely offline.

Best for: air-gapped environments, offline clinics, maximum privacy

Download DMG (~6 GB)

The full build works entirely offline; the light build requires internet access for the initial model download and an external Ollama install for note generation. Additional platforms (macOS x64, Windows) will be available in future releases. See all releases.

Getting Started

Up and running in under a minute

Clone the repository, install dependencies, and start the server. The browser UI opens automatically.

# Clone the repository
git clone https://github.com/BirgerMoell/open-medical-scribe
cd open-medical-scribe

# Install dependencies
npm install

# Start the server
npm start

# Open http://localhost:8787

Requires Node.js 22 or later. Copy .env.example to .env and configure your preferred providers and API keys before starting.
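For a fully local setup, a minimal .env might look like the following. Variable names come from the Configuration section of this page; the specific combination of values is illustrative, not a recommended default:

```shell
# Run everything locally: Whisper ONNX for transcription, Ollama for notes
SCRIBE_MODE=local
TRANSCRIPTION_PROVIDER=whisper-onnx
NOTE_PROVIDER=ollama
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=llama3.1:8b
DEFAULT_NOTE_STYLE=soap
```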

Building the Desktop App

# Light build (models download on first run)
npm run electron:build:light:mac

# Full build (download models first, then build)
npm run models:download:whisper
npm run models:download:llm
npm run electron:build:full:mac

# For Windows, replace :mac with :win

The full build bundles a llama-server binary and Llama 3.1 8B GGUF model (~4.7 GB). The Electron app automatically detects bundled models and configures providers accordingly.

Providers

Supported providers

Mix and match local and cloud providers for transcription and note generation. Switch backends with a single configuration change.

Transcription

Provider | Type
Whisper ONNX | Local
faster-whisper | Local
whisper.cpp | Local
OpenAI Whisper | Cloud
Deepgram | Cloud
Google Cloud Speech | Cloud
Berget AI | Cloud (EU)

Note Generation

Provider | Type
Ollama | Local
OpenAI | Cloud
Anthropic Claude | Cloud
Google Gemini | Cloud

Important Safety Note

Open Medical Scribe is an assistive documentation tool, not a diagnostic system. All generated notes are drafts that require clinical review and sign-off before use in patient care. Always verify the content against your own clinical judgment.

Note Formats

Structured clinical note styles

Choose the documentation format that fits your clinical workflow. Each style generates structured JSON with typed section keys, coding hints, and follow-up questions.
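As a rough sketch of that structured output, the object below uses the field names documented for the scribe-note CLI (noteText, sections, codingHints, followUpQuestions, warnings); the section keys shown are for SOAP, and the exact sub-structure is an assumption for illustration, not the project's schema:

```javascript
// Hypothetical shape of a generated note. Top-level field names come
// from the scribe-note CLI docs; everything inside is illustrative.
const note = {
  noteText: "S: Patient reports...\nO: ...\nA: ...\nP: ...",
  sections: {
    subjective: "Patient reports...",
    objective: "...",
    assessment: "...",
    plan: "...",
  },
  codingHints: ["ICD-10 J06.9"],
  followUpQuestions: ["Any known drug allergies?"],
  warnings: [],
};

// Downstream code can rely on the typed section keys rather than
// parsing the free-text noteText.
console.log(Object.keys(note.sections).join(", "));
```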

SOAP

The standard four-section format: Subjective (patient-reported symptoms and history), Objective (clinician observations and vitals), Assessment (diagnosis and clinical reasoning), and Plan (treatment, follow-up, referrals). Widely used in primary care and general practice.

History & Physical

Comprehensive intake documentation with sections for Chief Complaint, History of Present Illness, Past Medical History, Medications, Allergies, Family History, Social History, Review of Systems, Physical Examination, Assessment, and Plan. Ideal for new patient encounters and hospital admissions.

Progress Note

Follow-up visit documentation structured as Interval History (changes since last visit), Current Medications, Examination Findings, Assessment, and Plan. Designed for ongoing care where a full H&P is not required.

DAP (Behavioral Health)

A three-section format for mental health and counseling sessions: Data (objective and subjective information from the session), Assessment (clinical interpretation and progress toward therapeutic goals), and Plan (next steps, homework assignments, next session agenda).

Procedure Note

Structured documentation for clinical procedures with sections for Procedure Name, Indication, Pre-procedure Diagnosis, Anesthesia, Description of Procedure, Findings, Specimens, Complications, Post-procedure Condition, and Plan.

Swedish Journal / Journalanteckning

Documentation following Swedish clinical standards and Patientdatalagen (PDL). Sections: Aktuellt (presenting concern in the patient's own words), Anamnes (medical history, heredity, current symptoms), Status (physical examination and clinical findings), Bedömning (clinical assessment and diagnosis with ICD-10 codes), and Planering (treatment plan, prescriptions, referrals, follow-up). Written in Swedish using standard medical abbreviations (AT, BT, Cor, Pulm, Buk).

Streaming

Real-time transcription pipeline

Audio flows from the browser microphone through a WebSocket connection to the server, where Whisper ONNX processes it locally and streams interim results back in real time.

1

Audio capture via AudioWorklet

The browser captures microphone audio using the Web Audio API's AudioWorklet interface. Audio is downsampled to 16 kHz mono and encoded as PCM16 (signed 16-bit little-endian) before being sent as binary WebSocket frames to the server at ws://host/v1/stream.
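The Float32-to-PCM16 step can be sketched as below (downsampling to 16 kHz is omitted). The function name is illustrative, not the project's actual API; for guaranteed little-endian byte order regardless of platform, the samples would be written through a DataView before transmission:

```javascript
// Convert Web Audio Float32 samples (range -1..1) to signed 16-bit
// integers, clamping out-of-range values. Int16Array uses the
// platform's byte order, which is little-endian on typical hardware.
function floatToPCM16(float32) {
  const pcm = new Int16Array(float32.length);
  for (let i = 0; i < float32.length; i++) {
    const s = Math.max(-1, Math.min(1, float32[i]));
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return pcm;
}
```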

2

WebSocket session initialization

On connection, the client sends a JSON configuration message specifying language and diarization preferences. The server creates a streaming transcription session with the configured provider and responds with a ready message. All subsequent binary frames are routed to the session's audio buffer.
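A client-side sketch of that handshake might look like the following. The exact field names of the configuration message are assumptions; the docs specify only that language and diarization preferences are sent as JSON on connect:

```javascript
// Build the JSON config message sent as the first frame on
// ws://host/v1/stream. Field names are illustrative.
function buildSessionConfig({ language = "sv", diarize = false } = {}) {
  return JSON.stringify({ language, diarize });
}

// In the browser (sketch):
//   const ws = new WebSocket("ws://localhost:8787/v1/stream");
//   ws.binaryType = "arraybuffer";
//   ws.onopen = () => ws.send(buildSessionConfig({ language: "sv" }));
//   // Server replies with a "ready" message; binary PCM16 frames follow.
```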

3

Buffered Whisper inference

The server accumulates incoming PCM16 audio into a buffer. At a configurable interval (default 5 seconds, set via STREAMING_WHISPER_INTERVAL_MS), the buffer is converted to Float32 and fed to the Whisper ONNX model via @huggingface/transformers. Segments shorter than 0.5 seconds are skipped. An RMS-based silence check prevents hallucinated transcriptions on quiet audio.
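The RMS-based silence gate can be sketched as a simple energy check over the buffered samples; the threshold value here is illustrative, not the project's actual setting:

```javascript
// Skip Whisper inference when the buffered audio's root-mean-square
// energy falls below a threshold, to avoid hallucinated transcripts
// on near-silent input.
function isSilent(samples, threshold = 0.01) {
  let sumSquares = 0;
  for (const s of samples) sumSquares += s * s;
  const rms = Math.sqrt(sumSquares / samples.length);
  return rms < threshold;
}
```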

4

Interim results stream to the UI

Each inference cycle sends a transcript message back over the WebSocket containing the current text, speaker ID, and finality flag. The browser UI displays these interim results live, updating in place until the segment is finalized. Non-final results are shown as in-progress text so the clinician sees transcription happening in real time.
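A minimal sketch of that client-side behavior: interim text replaces the in-progress segment, final text is committed. The message field names (text, isFinal) are assumptions based on the description above:

```javascript
// Fold interim transcript messages into a running display: non-final
// text overwrites the current in-progress segment; final text is
// appended to the committed transcript.
function createTranscriptView() {
  const committed = [];
  let interim = "";
  return {
    handle(msg) {
      if (msg.isFinal) {
        committed.push(msg.text);
        interim = "";
      } else {
        interim = msg.text; // replaces the previous interim in place
      }
    },
    render() {
      return [...committed, interim].filter(Boolean).join(" ");
    },
  };
}
```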

5

Silence detection finalizes segments

When the RMS energy of incoming audio drops below the silence threshold for 1.5 seconds, the current segment is marked as final. The server sends an utterance_end message, commits the text to the transcript, and advances the buffer pointer. This creates natural sentence boundaries without requiring the clinician to press any buttons.
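The endpointing logic above amounts to tracking how long incoming audio has stayed below the RMS threshold and finalizing once 1.5 seconds have accumulated. A sketch, with illustrative parameters:

```javascript
// Track consecutive silence and signal when the current segment
// should be finalized. Chunk duration and threshold are illustrative.
function createEndpointer({ silenceMs = 1500 } = {}) {
  let quietMs = 0;
  return {
    // Feed one audio chunk's RMS; returns true when the segment
    // should be marked final (and resets for the next utterance).
    push(rms, chunkMs, threshold = 0.01) {
      quietMs = rms < threshold ? quietMs + chunkMs : 0;
      if (quietMs >= silenceMs) {
        quietMs = 0;
        return true;
      }
      return false;
    },
  };
}
```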

6

Session end and note generation

When the clinician stops recording (or the WebSocket closes), the server performs a final transcription pass on any remaining buffered audio. A session_end message is sent containing the full assembled transcript, speaker information, utterance count, and audio duration. The transcript is then passed to the configured note generation provider to produce a structured clinical note. Optional post-session diarization via a pyannote sidecar can refine speaker attribution.

CLI Tools

Command-line interface

Two standalone CLI tools for scripting, batch processing, and integration into existing workflows. Pipe them together or use independently.

scribe-transcribe

Transcribes audio files to text. Accepts an audio file path as an argument, or reads from stdin. Outputs plain text to stdout.

Flags: --provider (transcription backend), --language (language hint, e.g. "sv"), --country (country code, e.g. "SE"), --locale (full locale, e.g. "sv-SE"), --mime-type (audio MIME type, auto-detected from extension), --model (override provider model)

scribe-note

Generates a structured clinical note from a transcript. Reads transcript text from stdin, --transcript, or --transcript-file. Outputs JSON to stdout with noteText, sections, codingHints, followUpQuestions, and warnings.

Flags: --provider (note backend), --note-style (soap, hp, progress, dap, procedure, journal), --specialty (medical specialty), --custom-prompt (override system prompt), --model (override provider model)

Pipeline example

# Transcribe an audio file and generate a SOAP note in one pipeline
scribe-transcribe recording.mp3 | scribe-note --note-style soap

# Use a specific provider and specialty
scribe-transcribe --provider openai --language en consultation.wav \
  | scribe-note --provider anthropic --note-style hp --specialty cardiology

# Generate a Swedish journal note from a transcript file
scribe-note --transcript-file transcript.txt --note-style journal

Building the CLI binaries

# Build both CLI tools as standalone Bun executables
npm run build:cli

# Or build individually
npm run build:cli:transcribe
npm run build:cli:note

# Compiled binaries are output to dist/scribe-transcribe and dist/scribe-note

CLI tools are compiled with Bun into self-contained executables. They share the same provider and configuration logic as the web server, reading settings from environment variables or .env.

Configuration

Environment variables

Configure the server, CLI tools, and desktop app using environment variables. Set them in a .env file at the project root, or export them in your shell. The desktop app also provides a graphical Settings page.

General

Variable | Description | Default
SCRIBE_MODE | Operating mode: api (cloud providers), local (Ollama + whisper.cpp), or hybrid (mix of both) | hybrid
PORT | HTTP server listen port | 8787
TRANSCRIPTION_PROVIDER | Transcription backend: whisper-onnx, whisper.cpp, faster-whisper, openai, deepgram, google, berget | depends on mode
NOTE_PROVIDER | Note generation backend: ollama, openai, anthropic, gemini | depends on mode
DEFAULT_NOTE_STYLE | Default note format: soap, hp, progress, dap, procedure, journal | journal
DEFAULT_SPECIALTY | Default medical specialty context (e.g. primary-care, cardiology, psychiatry) | primary-care
DEFAULT_COUNTRY | Default country code for locale-aware behavior (e.g. SE, US) | (empty)
ENABLE_WEB_UI | Enable the browser-based UI (set false for API-only mode) | true

Privacy

Variable | Description | Default
PHI_REDACTION_MODE | Protected Health Information redaction strategy: basic (regex patterns) or none | basic
REDACT_BEFORE_API_CALLS | Apply PHI redaction before sending data to cloud providers | true
AUDIT_LOG_FILE | Path to an audit log file for tracking usage events | (disabled)
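The basic redaction mode is described as regex-based. A minimal sketch of what such a pass could look like; the patterns below (Swedish personnummer, phone numbers) are illustrative assumptions, not the project's actual rule set:

```javascript
// Replace pattern matches with labeled placeholders before text is
// sent to a cloud provider. Patterns are illustrative only.
const PHI_PATTERNS = [
  { name: "personnummer", re: /\b\d{6,8}[-+]?\d{4}\b/g },
  { name: "phone", re: /\b\+?\d[\d\s-]{7,}\d\b/g },
];

function redact(text) {
  return PHI_PATTERNS.reduce(
    (out, p) => out.replace(p.re, `[${p.name.toUpperCase()}]`),
    text
  );
}
```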

Provider API Keys and Models

Variable | Description | Default
OPENAI_API_KEY | OpenAI API key for transcription and note generation | (required if using OpenAI)
OPENAI_NOTE_MODEL | OpenAI model for note generation | gpt-4.1-mini
OPENAI_TRANSCRIBE_MODEL | OpenAI model for transcription | gpt-4o-mini-transcribe
ANTHROPIC_API_KEY | Anthropic API key for Claude-based note generation | (required if using Anthropic)
ANTHROPIC_MODEL | Anthropic Claude model | claude-sonnet-4-20250514
GEMINI_API_KEY | Google Gemini API key for note generation | (required if using Gemini)
GEMINI_MODEL | Gemini model | gemini-2.0-flash
OLLAMA_BASE_URL | Ollama server URL for local note generation | http://localhost:11434
OLLAMA_MODEL | Ollama model for note generation | llama3.1:8b
DEEPGRAM_API_KEY | Deepgram API key for cloud transcription | (required if using Deepgram)
DEEPGRAM_MODEL | Deepgram transcription model | nova-3-medical
BERGET_API_KEY | Berget AI API key for EU-sovereign transcription | (required if using Berget)
BERGET_TRANSCRIBE_MODEL | Berget AI transcription model | KBLab/kb-whisper-large

Streaming (Live Transcription)

Variable | Description | Default
STREAMING_TRANSCRIPTION_PROVIDER | Streaming transcription backend: whisper-stream, deepgram-stream, or mock-stream | mock-stream
STREAMING_WHISPER_MODEL | ONNX Whisper model for streaming transcription | onnx-community/kb-whisper-large-ONNX
STREAMING_WHISPER_LANGUAGE | Language code for streaming Whisper inference | sv
STREAMING_WHISPER_INTERVAL_MS | Milliseconds between Whisper inference cycles during streaming | 5000
DIARIZE_SIDECAR_URL | URL of the pyannote diarization sidecar service | http://localhost:8786
DIARIZE_ON_END | Run speaker diarization after session ends | false