Open-source AI medical documentation — fully local, fully private. Available as a web app, desktop app, or CLI.
Features
A modular, privacy-first platform that connects real-time audio capture to structured medical notes, with your choice of local or cloud providers.
Stream audio from the browser via WebSocket for live speech-to-text. Powered by Whisper ONNX running entirely on your local machine; no audio ever leaves your network.
Transform raw transcripts into structured clinical notes using Ollama (local), OpenAI, Anthropic Claude, or Google Gemini. Choose quality, cost, and privacy tradeoffs per deployment.
Generate SOAP, H&P, Progress, DAP, and Procedure notes out of the box. Swedish journal format supported alongside international standards.
Run the entire pipeline on your own hardware. In local mode, no patient data ever leaves your machine. Configurable PHI handling for every deployment environment.
Use scribe-transcribe and scribe-note as standalone binaries for scripting and batch workflows. Build pipelines without the web server.
Swap transcription and LLM backends via settings or environment variables. Run fully local, fully cloud, or any hybrid combination for your needs.
Architecture
A four-stage pipeline from microphone to structured clinical note, with each component independently replaceable.
Downloads
Download the Electron desktop app for macOS (Windows and macOS x64 builds are planned). Choose between a light build that downloads models on first run, or a full offline build with everything bundled.
Clean install with no bundled models. The Whisper ONNX speech recognition model downloads automatically on first use (~1.1 GB). Requires external Ollama for note generation.
Best for: users who want a smaller download and have internet access
Download DMG (~1 GB)
Everything bundled for offline use. Includes the Whisper ONNX model for local speech recognition and Llama 3.1 8B for local note generation via llama-server. Works completely offline.
Best for: air-gapped environments, offline clinics, maximum privacy
Download DMG (~1.9 GB)
The full build includes the bundled Whisper ONNX model for instant local transcription; the light build requires an external Ollama server for note generation. Additional platforms (macOS x64, Windows) will be available in future releases. See all releases.
Getting Started
Clone the repository, install dependencies, and start the server. The browser UI opens automatically.
Requires Node.js 22 or later. Copy .env.example to .env and configure your preferred providers and API keys before starting.
The full build bundles a llama-server binary and Llama 3.1 8B GGUF model (~4.7 GB). The Electron app automatically detects bundled models and configures providers accordingly.
Providers
Mix and match local and cloud providers for transcription and note generation. Switch backends with a single configuration change.
| Transcription Provider | Type |
|---|---|
| Whisper ONNX | Local |
| faster-whisper | Local |
| whisper.cpp | Local |
| OpenAI Whisper | Cloud |
| Deepgram | Cloud |
| Google Cloud Speech | Cloud |
| Berget AI | Cloud (EU) |
| Note Generation Provider | Type |
|---|---|
| Ollama | Local |
| OpenAI | Cloud |
| Anthropic Claude | Cloud |
| Google Gemini | Cloud |
Note Formats
Choose the documentation format that fits your clinical workflow. Each style generates structured JSON with typed section keys, coding hints, and follow-up questions.
SOAP: The standard four-section format of Subjective (patient-reported symptoms and history), Objective (clinician observations and vitals), Assessment (diagnosis and clinical reasoning), and Plan (treatment, follow-up, referrals). Widely used in primary care and general practice.
H&P: Comprehensive intake documentation with sections for Chief Complaint, History of Present Illness, Past Medical History, Medications, Allergies, Family History, Social History, Review of Systems, Physical Examination, Assessment, and Plan. Ideal for new patient encounters and hospital admissions.
Progress: Follow-up visit documentation structured as Interval History (changes since last visit), Current Medications, Examination Findings, Assessment, and Plan. Designed for ongoing care where a full H&P is not required.
DAP: A three-section format for mental health and counseling sessions: Data (objective and subjective information from the session), Assessment (clinical interpretation and progress toward therapeutic goals), and Plan (next steps, homework assignments, next session agenda).
Procedure: Structured documentation for clinical procedures with sections for Procedure Name, Indication, Pre-procedure Diagnosis, Anesthesia, Description of Procedure, Findings, Specimens, Complications, Post-procedure Condition, and Plan.
Journal (Swedish): Documentation following Swedish clinical standards and Patientdatalagen (PDL). Sections: Aktuellt (presenting concern in the patient's own words), Anamnes (medical history, heredity, current symptoms), Status (physical examination and clinical findings), Bedömning (clinical assessment and diagnosis with ICD-10 codes), and Planering (treatment plan, prescriptions, referrals, follow-up). Written in Swedish using standard medical abbreviations (AT, BT, Cor, Pulm, Buk).
Streaming
Audio flows from the browser microphone through a WebSocket connection to the server, where Whisper ONNX processes it locally and streams interim results back in real time.
The browser captures microphone audio using the Web Audio API's AudioWorklet interface. Audio is downsampled to 16 kHz mono and encoded as PCM16 (signed 16-bit little-endian) before being sent as binary WebSocket frames to the server at ws://host/v1/stream.
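The Float32-to-PCM16 step can be sketched as a small conversion function. This is an illustrative sketch of the wire format described above, not the project's actual code; the function name and clamping details are assumptions.

```typescript
// Convert Web Audio float samples (range [-1, 1]) to signed 16-bit
// little-endian PCM, the format sent in binary WebSocket frames.
// Illustrative sketch; names and clamping details are assumptions.
function floatToPCM16(samples: Float32Array): ArrayBuffer {
  const buf = new ArrayBuffer(samples.length * 2);
  const view = new DataView(buf);
  for (let i = 0; i < samples.length; i++) {
    // Clamp to [-1, 1], then scale into the int16 range.
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true); // true = little-endian
  }
  return buf;
}
```

Each converted buffer would then be passed directly to `WebSocket.send`, which transmits an `ArrayBuffer` as a single binary frame.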
On connection, the client sends a JSON configuration message specifying language and diarization preferences. The server creates a streaming transcription session with the configured provider and responds with a ready message. All subsequent binary frames are routed to the session's audio buffer.
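The handshake and server messages can be sketched as TypeScript types. Only the message kinds (config, ready, transcript, utterance_end, session_end) come from this document; the exact field names are assumptions for illustration.

```typescript
// Sketch of the WebSocket message protocol described above.
// Message kinds are from the docs; field names are assumptions.
type ClientConfig = {
  type: "config";
  language?: string;   // e.g. "sv"
  diarization?: boolean;
};

type ServerMessage =
  | { type: "ready" }
  | { type: "transcript"; text: string; speaker?: string; final: boolean }
  | { type: "utterance_end" }
  | { type: "session_end"; transcript: string; utteranceCount: number; durationMs: number };

// First frame the client sends after the socket opens.
function configFrame(language: string, diarization: boolean): string {
  const msg: ClientConfig = { type: "config", language, diarization };
  return JSON.stringify(msg);
}
```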
The server accumulates incoming PCM16 audio into a buffer. At a configurable interval (default 5 seconds, set via STREAMING_WHISPER_INTERVAL_MS), the buffer is converted to Float32 and fed to the Whisper ONNX model via @huggingface/transformers. Segments shorter than 0.5 seconds are skipped. An RMS-based silence check prevents hallucinated transcriptions on quiet audio.
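The RMS-based silence check can be sketched as follows. The threshold value is a hypothetical default for illustration; the project's actual gate and constants may differ.

```typescript
// Root-mean-square energy of a chunk of float audio samples.
function rms(samples: Float32Array): number {
  let sum = 0;
  for (const s of samples) sum += s * s;
  return Math.sqrt(sum / samples.length);
}

// Skip inference on near-silent buffers to avoid Whisper "hallucinating"
// text from background noise. The 0.01 threshold is an assumed default.
function isSilent(samples: Float32Array, threshold = 0.01): boolean {
  return rms(samples) < threshold;
}
```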
Each inference cycle sends a transcript message back over the WebSocket containing the current text, speaker ID, and finality flag. The browser UI displays these interim results live, updating in place until the segment is finalized. Non-final results are shown as in-progress text so the clinician sees transcription happening in real time.
When the RMS energy of incoming audio drops below the silence threshold for 1.5 seconds, the current segment is marked as final. The server sends an utterance_end message, commits the text to the transcript, and advances the buffer pointer. This creates natural sentence boundaries without requiring the clinician to press any buttons.
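The endpointing behavior described above can be sketched as a small tracker that accumulates silent time per chunk and fires once 1.5 seconds of silence have elapsed. Class and method names are illustrative, not the project's actual implementation.

```typescript
// Tracks consecutive silence and decides when an utterance is final.
// The 1500 ms window matches the behavior described above; names are illustrative.
class UtteranceEndpointer {
  private silentMs = 0;
  constructor(private readonly silenceWindowMs = 1500) {}

  // Feed one chunk's RMS energy and duration. Returns true exactly once,
  // when the silence window is first exceeded (segment becomes final).
  push(chunkRms: number, chunkMs: number, threshold = 0.01): boolean {
    if (chunkRms >= threshold) {
      this.silentMs = 0; // speech resets the window
      return false;
    }
    const wasBelow = this.silentMs < this.silenceWindowMs;
    this.silentMs += chunkMs;
    return wasBelow && this.silentMs >= this.silenceWindowMs;
  }
}
```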
When the clinician stops recording (or the WebSocket closes), the server performs a final transcription pass on any remaining buffered audio. A session_end message is sent containing the full assembled transcript, speaker information, utterance count, and audio duration. The transcript is then passed to the configured note generation provider to produce a structured clinical note. Optional post-session diarization via a pyannote sidecar can refine speaker attribution.
CLI Tools
Two standalone CLI tools for scripting, batch processing, and integration into existing workflows. Pipe them together or use independently.
scribe-transcribe: Transcribes audio files to text. Accepts an audio file path as an argument, or reads from stdin. Outputs plain text to stdout.
Flags:
- --provider (transcription backend)
- --language (language hint, e.g. "sv")
- --country (country code, e.g. "SE")
- --locale (full locale, e.g. "sv-SE")
- --mime-type (audio MIME type, auto-detected from extension)
- --model (override provider model)
scribe-note: Generates a structured clinical note from a transcript. Reads transcript text from stdin, --transcript, or --transcript-file. Outputs JSON to stdout with noteText, sections, codingHints, followUpQuestions, and warnings.
Flags:
- --provider (note backend)
- --note-style (soap, hp, progress, dap, procedure, journal)
- --specialty (medical specialty)
- --custom-prompt (override system prompt)
- --model (override provider model)
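The JSON that scribe-note prints can be described with a TypeScript interface plus a minimal runtime guard. The field names come from the description above; the value types are assumptions for illustration.

```typescript
// Shape of scribe-note's stdout JSON. Field names are from the docs above;
// the value types are assumptions for illustration.
interface ClinicalNote {
  noteText: string;
  sections: Record<string, string>; // typed section keys -> section text
  codingHints: string[];            // e.g. suggested diagnosis codes
  followUpQuestions: string[];
  warnings: string[];
}

// Minimal runtime check before trusting a parsed note in a pipeline.
function isClinicalNote(value: unknown): value is ClinicalNote {
  const v = value as ClinicalNote;
  return (
    typeof v === "object" && v !== null &&
    typeof v.noteText === "string" &&
    typeof v.sections === "object" && v.sections !== null &&
    Array.isArray(v.codingHints) &&
    Array.isArray(v.followUpQuestions) &&
    Array.isArray(v.warnings)
  );
}
```

A consumer script would `JSON.parse` the tool's stdout and run it through the guard before extracting sections.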
CLI tools are compiled with Bun into self-contained executables. They share the same provider and configuration logic as the web server, reading settings from environment variables or .env.
Configuration
Configure the server, CLI tools, and desktop app using environment variables. Set them in a .env file at the project root, or export them in your shell. The desktop app also provides a graphical Settings page.
| Variable | Description | Default |
|---|---|---|
| SCRIBE_MODE | Operating mode: api (cloud providers), local (Ollama + whisper.cpp), or hybrid (mix both) | hybrid |
| PORT | HTTP server listen port | 8787 |
| TRANSCRIPTION_PROVIDER | Transcription backend: whisper-onnx, whisper.cpp, faster-whisper, openai, deepgram, google, berget | depends on mode |
| NOTE_PROVIDER | Note generation backend: ollama, openai, anthropic, gemini | depends on mode |
| DEFAULT_NOTE_STYLE | Default note format: soap, hp, progress, dap, procedure, journal | journal |
| DEFAULT_SPECIALTY | Default medical specialty context (e.g. primary-care, cardiology, psychiatry) | primary-care |
| DEFAULT_COUNTRY | Default country code for locale-aware behavior (e.g. SE, US) | (empty) |
| ENABLE_WEB_UI | Enable the browser-based UI (set false for API-only mode) | true |
| Variable | Description | Default |
|---|---|---|
| PHI_REDACTION_MODE | Protected Health Information redaction strategy: basic (regex patterns) or none | basic |
| REDACT_BEFORE_API_CALLS | Apply PHI redaction before sending data to cloud providers | true |
| AUDIT_LOG_FILE | Path to an audit log file for tracking usage events | (disabled) |
| Variable | Description | Default |
|---|---|---|
| OPENAI_API_KEY | OpenAI API key for transcription and note generation | (required if using OpenAI) |
| OPENAI_NOTE_MODEL | OpenAI model for note generation | gpt-4.1-mini |
| OPENAI_TRANSCRIBE_MODEL | OpenAI model for transcription | gpt-4o-mini-transcribe |
| ANTHROPIC_API_KEY | Anthropic API key for Claude-based note generation | (required if using Anthropic) |
| ANTHROPIC_MODEL | Anthropic Claude model | claude-sonnet-4-20250514 |
| GEMINI_API_KEY | Google Gemini API key for note generation | (required if using Gemini) |
| GEMINI_MODEL | Gemini model | gemini-2.0-flash |
| OLLAMA_BASE_URL | Ollama server URL for local note generation | http://localhost:11434 |
| OLLAMA_MODEL | Ollama model for note generation | llama3.1:8b |
| DEEPGRAM_API_KEY | Deepgram API key for cloud transcription | (required if using Deepgram) |
| DEEPGRAM_MODEL | Deepgram transcription model | nova-3-medical |
| BERGET_API_KEY | Berget AI API key for EU-sovereign transcription | (required if using Berget) |
| BERGET_TRANSCRIBE_MODEL | Berget AI transcription model | KBLab/kb-whisper-large |
| Variable | Description | Default |
|---|---|---|
| STREAMING_TRANSCRIPTION_PROVIDER | Streaming transcription backend: whisper-stream, deepgram-stream, or mock-stream | mock-stream |
| STREAMING_WHISPER_MODEL | ONNX Whisper model for streaming transcription | onnx-community/kb-whisper-large-ONNX |
| STREAMING_WHISPER_LANGUAGE | Language code for streaming Whisper inference | sv |
| STREAMING_WHISPER_INTERVAL_MS | Milliseconds between Whisper inference cycles during streaming | 5000 |
| DIARIZE_SIDECAR_URL | URL of the pyannote diarization sidecar service | http://localhost:8786 |
| DIARIZE_ON_END | Run speaker diarization after session ends | false |
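Putting the tables together, a hybrid-mode .env might look like the sketch below. Variable names and defaults are taken from the tables above; the particular combination of providers is only an example for one deployment.

```ini
# Operating mode and server
SCRIBE_MODE=hybrid
PORT=8787

# Local streaming transcription, cloud note generation
TRANSCRIPTION_PROVIDER=whisper-onnx
STREAMING_TRANSCRIPTION_PROVIDER=whisper-stream
STREAMING_WHISPER_LANGUAGE=sv
NOTE_PROVIDER=anthropic
ANTHROPIC_API_KEY=        # set your key here

# Note defaults
DEFAULT_NOTE_STYLE=journal
DEFAULT_COUNTRY=SE

# Privacy
PHI_REDACTION_MODE=basic
REDACT_BEFORE_API_CALLS=true
```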