Open-source AI medical documentation — fully local, fully private. Available as a web app, desktop app, or CLI.
Features
A modular, privacy-first platform that connects real-time audio capture to structured medical notes, with your choice of local or cloud providers.
Stream audio from the browser via WebSocket for live speech-to-text. Powered by Whisper ONNX running entirely on your local machine; no audio ever leaves your network.
Transform raw transcripts into structured clinical notes using Ollama (local), OpenAI, Anthropic Claude, or Google Gemini. Choose quality, cost, and privacy tradeoffs per deployment.
Generate SOAP, H&P, Progress, DAP, and Procedure notes out of the box. Swedish journal format supported alongside international standards.
Run the entire pipeline on your own hardware. In local mode, no patient data ever leaves your machine. Configurable PHI handling for every deployment environment.
Use scribe-transcribe and scribe-note as standalone binaries for scripting and batch workflows. Build pipelines without the web server.
Swap transcription and LLM backends via settings or environment variables. Run fully local, fully cloud, or any hybrid combination for your needs.
Architecture
A four-stage pipeline from microphone to structured clinical note, with each component independently replaceable.
Downloads
Download the Electron desktop app for macOS (Windows and macOS x64 builds are planned). Choose between a light build that downloads models on first run, or a full offline build with everything bundled.
Clean install with no bundled models. The Whisper ONNX speech recognition model downloads automatically on first use (~1.1 GB). Requires external Ollama for note generation.
Best for: users who want a smaller download and have internet access
Download DMG (~1 GB)
Everything bundled for offline use. Includes the Whisper ONNX model for local speech recognition and Llama 3.1 8B for local note generation via llama-server. Works completely offline.
Best for: air-gapped environments, offline clinics, maximum privacy
Download DMG (~1.9 GB)
The full build includes the bundled Whisper ONNX model for instant local transcription; the light build requires an external Ollama server for note generation. Additional platforms (macOS x64, Windows) will be available in future releases. See all releases.
Getting Started
Clone the repository, install dependencies, and start the server. The browser UI opens automatically.
Requires Node.js 22 or later. Copy .env.example to .env and configure your preferred providers and API keys before starting.
The full build bundles a llama-server binary and Llama 3.1 8B GGUF model (~4.7 GB). The Electron app automatically detects bundled models and configures providers accordingly.
Providers
Mix and match local and cloud providers for transcription and note generation. Switch backends with a single configuration change.
| Transcription Provider | Type |
|---|---|
| Whisper ONNX | Local |
| faster-whisper | Local |
| whisper.cpp | Local |
| OpenAI Whisper | Cloud |
| Deepgram | Cloud |
| Google Cloud Speech | Cloud |
| Berget AI | Cloud (EU) |
| Note Generation Provider | Type |
|---|---|
| Ollama | Local |
| OpenAI | Cloud |
| Anthropic Claude | Cloud |
| Google Gemini | Cloud |
Note Formats
Choose the documentation format that fits your clinical workflow. Each style generates structured JSON with typed section keys, coding hints, and follow-up questions.
SOAP: The standard four-section format of Subjective (patient-reported symptoms and history), Objective (clinician observations and vitals), Assessment (diagnosis and clinical reasoning), and Plan (treatment, follow-up, referrals). Widely used in primary care and general practice.
H&P: Comprehensive intake documentation with sections for Chief Complaint, History of Present Illness, Past Medical History, Medications, Allergies, Family History, Social History, Review of Systems, Physical Examination, Assessment, and Plan. Ideal for new patient encounters and hospital admissions.
Progress: Follow-up visit documentation structured as Interval History (changes since last visit), Current Medications, Examination Findings, Assessment, and Plan. Designed for ongoing care where a full H&P is not required.
DAP: A three-section format for mental health and counseling sessions: Data (objective and subjective information from the session), Assessment (clinical interpretation and progress toward therapeutic goals), and Plan (next steps, homework assignments, next session agenda).
Procedure: Structured documentation for clinical procedures with sections for Procedure Name, Indication, Pre-procedure Diagnosis, Anesthesia, Description of Procedure, Findings, Specimens, Complications, Post-procedure Condition, and Plan.
Journal (Swedish): Documentation following Swedish clinical standards and Patientdatalagen (PDL). Sections: Aktuellt (presenting concern in the patient's own words), Anamnes (medical history, heredity, current symptoms), Status (physical examination and clinical findings), Bedömning (clinical assessment and diagnosis with ICD-10 codes), and Planering (treatment plan, prescriptions, referrals, follow-up). Written in Swedish using standard medical abbreviations (AT, BT, Cor, Pulm, Buk).
Streaming
Audio flows from the browser microphone through a WebSocket connection to the server, where Whisper ONNX processes it locally and streams interim results back in real time.
The browser captures microphone audio using the Web Audio API's AudioWorklet interface. Audio is downsampled to 16 kHz mono and encoded as PCM16 (signed 16-bit little-endian) before being sent as binary WebSocket frames to the server at ws://host/v1/stream.
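The Float32-to-PCM16 step can be sketched as a small conversion function. This is an illustrative sketch of the wire format described above, not the project's actual code; the function name and clamping details are assumptions.

```typescript
// Convert Web Audio float samples (range [-1, 1]) to signed 16-bit
// little-endian PCM, the format sent in binary WebSocket frames.
// Illustrative sketch; names and clamping details are assumptions.
function floatToPCM16(samples: Float32Array): ArrayBuffer {
  const buf = new ArrayBuffer(samples.length * 2);
  const view = new DataView(buf);
  for (let i = 0; i < samples.length; i++) {
    // Clamp to [-1, 1], then scale into the int16 range.
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true); // true = little-endian
  }
  return buf;
}
```

Each converted buffer would then be passed directly to `WebSocket.send`, which transmits an `ArrayBuffer` as a single binary frame.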
On connection, the client sends a JSON configuration message specifying language and diarization preferences. The server creates a streaming transcription session with the configured provider and responds with a ready message. All subsequent binary frames are routed to the session's audio buffer.
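The handshake and server messages can be sketched as TypeScript types. Only the message kinds (config, ready, transcript, utterance_end, session_end) come from this document; the exact field names are assumptions for illustration.

```typescript
// Sketch of the WebSocket message protocol described above.
// Message kinds are from the docs; field names are assumptions.
type ClientConfig = {
  type: "config";
  language?: string;   // e.g. "sv"
  diarization?: boolean;
};

type ServerMessage =
  | { type: "ready" }
  | { type: "transcript"; text: string; speaker?: string; final: boolean }
  | { type: "utterance_end" }
  | { type: "session_end"; transcript: string; utteranceCount: number; durationMs: number };

// First frame the client sends after the socket opens.
function configFrame(language: string, diarization: boolean): string {
  const msg: ClientConfig = { type: "config", language, diarization };
  return JSON.stringify(msg);
}
```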
The server accumulates incoming PCM16 audio into a buffer. At a configurable interval (default 5 seconds, set via STREAMING_WHISPER_INTERVAL_MS), the buffer is converted to Float32 and fed to the Whisper ONNX model via @huggingface/transformers. Segments shorter than 0.5 seconds are skipped. An RMS-based silence check prevents hallucinated transcriptions on quiet audio.
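The RMS-based silence check can be sketched as follows. The threshold value is a hypothetical default for illustration; the project's actual gate and constants may differ.

```typescript
// Root-mean-square energy of a chunk of float audio samples.
function rms(samples: Float32Array): number {
  let sum = 0;
  for (const s of samples) sum += s * s;
  return Math.sqrt(sum / samples.length);
}

// Skip inference on near-silent buffers to avoid Whisper "hallucinating"
// text from background noise. The 0.01 threshold is an assumed default.
function isSilent(samples: Float32Array, threshold = 0.01): boolean {
  return rms(samples) < threshold;
}
```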
Each inference cycle sends a transcript message back over the WebSocket containing the current text, speaker ID, and finality flag. The browser UI displays these interim results live, updating in place until the segment is finalized. Non-final results are shown as in-progress text so the clinician sees transcription happening in real time.
When the RMS energy of incoming audio drops below the silence threshold for 1.5 seconds, the current segment is marked as final. The server sends an utterance_end message, commits the text to the transcript, and advances the buffer pointer. This creates natural sentence boundaries without requiring the clinician to press any buttons.
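The endpointing behavior described above can be sketched as a small tracker that accumulates silent time per chunk and fires once 1.5 seconds of silence have elapsed. Class and method names are illustrative, not the project's actual implementation.

```typescript
// Tracks consecutive silence and decides when an utterance is final.
// The 1500 ms window matches the behavior described above; names are illustrative.
class UtteranceEndpointer {
  private silentMs = 0;
  constructor(private readonly silenceWindowMs = 1500) {}

  // Feed one chunk's RMS energy and duration. Returns true exactly once,
  // when the silence window is first exceeded (segment becomes final).
  push(chunkRms: number, chunkMs: number, threshold = 0.01): boolean {
    if (chunkRms >= threshold) {
      this.silentMs = 0; // speech resets the window
      return false;
    }
    const wasBelow = this.silentMs < this.silenceWindowMs;
    this.silentMs += chunkMs;
    return wasBelow && this.silentMs >= this.silenceWindowMs;
  }
}
```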
When the clinician stops recording (or the WebSocket closes), the server performs a final transcription pass on any remaining buffered audio. A session_end message is sent containing the full assembled transcript, speaker information, utterance count, and audio duration. The transcript is then passed to the configured note generation provider to produce a structured clinical note. Optional post-session diarization via a pyannote sidecar can refine speaker attribution.
CLI Tools
Two standalone CLI tools for scripting, batch processing, and integration into existing workflows. Pipe them together or use independently.
scribe-transcribe: Transcribes audio files to text. Accepts an audio file path as an argument, or reads from stdin. Outputs plain text to stdout.
Flags:
- --provider (transcription backend)
- --language (language hint, e.g. "sv")
- --country (country code, e.g. "SE")
- --locale (full locale, e.g. "sv-SE")
- --mime-type (audio MIME type, auto-detected from extension)
- --model (override provider model)
scribe-note: Generates a structured clinical note from a transcript. Reads transcript text from stdin, --transcript, or --transcript-file. Outputs JSON to stdout with noteText, sections, codingHints, followUpQuestions, and warnings.
Flags:
- --provider (note backend)
- --note-style (soap, hp, progress, dap, procedure, journal)
- --specialty (medical specialty)
- --custom-prompt (override system prompt)
- --model (override provider model)
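The JSON that scribe-note prints can be described with a TypeScript interface plus a minimal runtime guard. The field names come from the description above; the value types are assumptions for illustration.

```typescript
// Shape of scribe-note's stdout JSON. Field names are from the docs above;
// the value types are assumptions for illustration.
interface ClinicalNote {
  noteText: string;
  sections: Record<string, string>; // typed section keys -> section text
  codingHints: string[];            // e.g. suggested diagnosis codes
  followUpQuestions: string[];
  warnings: string[];
}

// Minimal runtime check before trusting a parsed note in a pipeline.
function isClinicalNote(value: unknown): value is ClinicalNote {
  const v = value as ClinicalNote;
  return (
    typeof v === "object" && v !== null &&
    typeof v.noteText === "string" &&
    typeof v.sections === "object" && v.sections !== null &&
    Array.isArray(v.codingHints) &&
    Array.isArray(v.followUpQuestions) &&
    Array.isArray(v.warnings)
  );
}
```

A consumer script would `JSON.parse` the tool's stdout and run it through the guard before extracting sections.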
CLI tools are compiled with Bun into self-contained executables. They share the same provider and configuration logic as the web server, reading settings from environment variables or .env.
Configuration
Configure the server, CLI tools, and desktop app using environment variables. Set them in a .env file at the project root, or export them in your shell. The desktop app also provides a graphical Settings page.
| Variable | Description | Default |
|---|---|---|
| SCRIBE_MODE | Operating mode: api (cloud providers), local (Ollama + whisper.cpp), or hybrid (mix both) | hybrid |
| PORT | HTTP server listen port | 8787 |
| TRANSCRIPTION_PROVIDER | Transcription backend: whisper-onnx, whisper.cpp, faster-whisper, openai, deepgram, google, berget | depends on mode |
| NOTE_PROVIDER | Note generation backend: ollama, openai, anthropic, gemini | depends on mode |
| DEFAULT_NOTE_STYLE | Default note format: soap, hp, progress, dap, procedure, journal | journal |
| DEFAULT_SPECIALTY | Default medical specialty context (e.g. primary-care, cardiology, psychiatry) | primary-care |
| DEFAULT_COUNTRY | Default country code for locale-aware behavior (e.g. SE, US) | (empty) |
| ENABLE_WEB_UI | Enable the browser-based UI (set false for API-only mode) | true |
| Variable | Description | Default |
|---|---|---|
| PHI_REDACTION_MODE | Protected Health Information redaction strategy: basic (regex patterns) or none | basic |
| REDACT_BEFORE_API_CALLS | Apply PHI redaction before sending data to cloud providers | true |
| AUDIT_LOG_FILE | Path to an audit log file for tracking usage events | (disabled) |
| Variable | Description | Default |
|---|---|---|
| OPENAI_API_KEY | OpenAI API key for transcription and note generation | (required if using OpenAI) |
| OPENAI_NOTE_MODEL | OpenAI model for note generation | gpt-4.1-mini |
| OPENAI_TRANSCRIBE_MODEL | OpenAI model for transcription | gpt-4o-mini-transcribe |
| ANTHROPIC_API_KEY | Anthropic API key for Claude-based note generation | (required if using Anthropic) |
| ANTHROPIC_MODEL | Anthropic Claude model | claude-sonnet-4-20250514 |
| GEMINI_API_KEY | Google Gemini API key for note generation | (required if using Gemini) |
| GEMINI_MODEL | Gemini model | gemini-2.0-flash |
| OLLAMA_BASE_URL | Ollama server URL for local note generation | http://localhost:11434 |
| OLLAMA_MODEL | Ollama model for note generation | llama3.1:8b |
| DEEPGRAM_API_KEY | Deepgram API key for cloud transcription | (required if using Deepgram) |
| DEEPGRAM_MODEL | Deepgram transcription model | nova-3-medical |
| BERGET_API_KEY | Berget AI API key for EU-sovereign transcription | (required if using Berget) |
| BERGET_TRANSCRIBE_MODEL | Berget AI transcription model | KBLab/kb-whisper-large |
| Variable | Description | Default |
|---|---|---|
| STREAMING_TRANSCRIPTION_PROVIDER | Streaming transcription backend: whisper-stream, deepgram-stream, or mock-stream | mock-stream |
| STREAMING_WHISPER_MODEL | ONNX Whisper model for streaming transcription | onnx-community/kb-whisper-large-ONNX |
| STREAMING_WHISPER_LANGUAGE | Language code for streaming Whisper inference | sv |
| STREAMING_WHISPER_INTERVAL_MS | Milliseconds between Whisper inference cycles during streaming | 5000 |
| DIARIZE_SIDECAR_URL | URL of the pyannote diarization sidecar service | http://localhost:8786 |
| DIARIZE_ON_END | Run speaker diarization after session ends | false |
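Putting the tables together, a hybrid-mode .env might look like the sketch below. Variable names and defaults are taken from the tables above; the particular combination of providers is only an example for one deployment.

```ini
# Operating mode and server
SCRIBE_MODE=hybrid
PORT=8787

# Local streaming transcription, cloud note generation
TRANSCRIPTION_PROVIDER=whisper-onnx
STREAMING_TRANSCRIPTION_PROVIDER=whisper-stream
STREAMING_WHISPER_LANGUAGE=sv
NOTE_PROVIDER=anthropic
ANTHROPIC_API_KEY=        # set your key here

# Note defaults
DEFAULT_NOTE_STYLE=journal
DEFAULT_COUNTRY=SE

# Privacy
PHI_REDACTION_MODE=basic
REDACT_BEFORE_API_CALLS=true
```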