Local voice-to-text for macOS and Linux

Speak once. Clean text anywhere.

A fully local dictation app with a native overlay, on-device transcription, and polished text insertion across your desktop.

Install in one line

Copy and paste this into your terminal.

curl -fsSL https://raw.githubusercontent.com/cesp99/sussurro/master/scripts/install.sh | bash


Built for the desktop, not the browser

The product is native where it counts.

Sussurro is designed around a simple desktop loop: trigger recording, speak, watch the state change, and keep working in the same app. The overlay, tray, settings window, and text injection are part of the app itself, not a browser shell dressed up as one.

Native overlay capsule, tray menu, and settings window on Linux and macOS
System-wide text injection so you can dictate into any focused app
Open-source Go codebase with native bindings instead of a browser wrapper

Fast enough to disappear

A compact pipeline with one job: turn speech into finished text.

Trigger recording

Use push-to-talk on Linux X11 and macOS, or set up a desktop shortcut on Wayland, where global hotkeys are handled by the compositor.

Transcribe locally

Microphone audio is captured on-device and passed to Whisper.cpp, with the overlay reflecting idle, recording, and transcribing states in real time.

Clean and paste

A fine-tuned Qwen 3 cleanup model removes filler words, handles self-corrections, and pastes polished text back into the app you were already using.
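The three steps above can be read as a small state machine behind the overlay. The sketch below is illustrative only and is written in Python for brevity (Sussurro itself is a Go codebase). The state names come from the text; the event names `trigger_down`, `trigger_up`, and `text_pasted` are hypothetical.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    RECORDING = "recording"
    TRANSCRIBING = "transcribing"


# Allowed transitions for one dictation cycle. The event names are
# assumptions made for illustration, not taken from Sussurro's source.
TRANSITIONS = {
    (State.IDLE, "trigger_down"): State.RECORDING,
    (State.RECORDING, "trigger_up"): State.TRANSCRIBING,
    (State.TRANSCRIBING, "text_pasted"): State.IDLE,
}


def step(state, event):
    """Return the next state; events that do not apply leave it unchanged."""
    return TRANSITIONS.get((state, event), state)


def run(events):
    """Replay a sequence of events from IDLE and return every state visited."""
    state = State.IDLE
    visited = [state]
    for event in events:
        state = step(state, event)
        visited.append(state)
    return visited
```

Ignoring events that do not apply in the current state (for example a release while idle) keeps the overlay from entering an undefined state if the trigger fires twice.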

Different functions, one local stack

Everything important is configurable, inspectable, and built for actual use.

Native overlay and desktop controls

The pill-shaped capsule stays visible above your workspace and mirrors recording state. You can open Settings or quit from the tray icon or by right-clicking the capsule.

Idle dots, live waveform bars, and a shimmer transcribing state
Settings UI for model downloads, language selection, hotkey changes, and mode switching
Headless `--no-ui` mode when you want terminal-only usage

Local speech pipeline

Sussurro keeps the full speech-to-text path on your own machine. Whisper handles recognition and the cleanup model refines the result before injection.

Whisper Small for speed or Whisper Large v3 Turbo for higher accuracy
Qwen 3 cleanup removes disfluencies, resolves speech repairs, and fixes rough punctuation
Clipboard + paste injection makes the result system-wide instead of app-specific
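To make the clipboard half of that injection step concrete, here is a minimal Python sketch that picks a clipboard tool per platform. It is not Sussurro's implementation: `pbcopy` and `wl-copy` (from `wl-clipboard`) are real tools, but the use of `xclip` on X11 and the function names are assumptions. The paste keystroke that would follow is left out.

```python
import subprocess


def clipboard_copy_command(system, session_type=""):
    """Return the argv of a tool that copies stdin to the clipboard.

    `system` is a value like platform.system() ("Darwin", "Linux").
    `session_type` is a value like $XDG_SESSION_TYPE ("wayland", "x11").
    """
    if system == "Darwin":
        return ["pbcopy"]
    if system == "Linux" and session_type == "wayland":
        return ["wl-copy"]  # shipped in the wl-clipboard package
    if system == "Linux":
        # Assumption: xclip is installed; Sussurro may use something else.
        return ["xclip", "-selection", "clipboard"]
    raise ValueError(f"unsupported platform: {system!r}")


def copy_text(text, system, session_type=""):
    """Put `text` on the clipboard by piping it to the selected tool."""
    command = clipboard_copy_command(system, session_type)
    subprocess.run(command, input=text.encode("utf-8"), check=True)
```

Routing text through the clipboard is what makes the result work in any focused app: the target only needs to accept a normal paste.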

Configurable for real setups

The docs expose a practical configuration surface so the app can fit different desktops, languages, and model choices without feeling fragile.

Config file lookup through `-config`, current directory, and `~/.sussurro/config.yaml`
Language selection supports explicit codes or `auto` for Whisper detection
Environment variable overrides via the `SUSSURRO_` prefix
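The lookup order and the prefix-based overrides listed above can be sketched as follows. This is a Python illustration under stated assumptions, not the app's actual loader: the file name `config.yaml` in the current directory, the fall-through when the `-config` path is missing, and the lower-cased key naming after `SUSSURRO_` are all hypothetical.

```python
from pathlib import Path


def find_config(flag_path, cwd, home):
    """Return the first config file that exists, or None.

    Order assumed from the list above: the explicit `-config` path, then
    the current directory, then ~/.sussurro/config.yaml.
    """
    candidates = []
    if flag_path:
        candidates.append(Path(flag_path))
    # Assumption: the file in the current directory is named config.yaml.
    candidates.append(Path(cwd) / "config.yaml")
    candidates.append(Path(home) / ".sussurro" / "config.yaml")
    for path in candidates:
        if path.is_file():
            return path
    return None


def env_overrides(environ, prefix="SUSSURRO_"):
    """Collect prefixed variables as lower-cased keys.

    The key naming after the prefix is hypothetical; the Configuration
    docs are the reference for the real variable names.
    """
    return {
        name[len(prefix):].lower(): value
        for name, value in environ.items()
        if name.startswith(prefix) and len(name) > len(prefix)
    }
```

Calling `env_overrides(os.environ)` after loading the file and applying the result last would give environment variables the final say, which is the usual meaning of "override".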

Open model stack, deliberate cleanup

Research-backed transcription quality without shipping your voice to the cloud.

The docs describe a modular architecture: audio capture, Whisper.cpp transcription, Qwen 3 cleanup, clipboard handling, and text injection. That means you can understand the path from microphone to final text instead of trusting a black box.

If you want more control, you can switch models, choose the transcription language, override config through environment variables, or run headless for CLI workflows and scripting.

Dictate into editors, terminals, chat apps, notes, and forms without changing workflow
Switch Whisper models depending on whether you want lighter runtime or better accuracy
Run `sussurro-transcribe` for batch audio transcription with the same local models
Compile from source and inspect the full pipeline if you need an auditable stack

Setup without guesswork

The docs make the first-run path clear across macOS, X11, and Wayland.

macOS

Install, launch from a terminal, then grant Accessibility access when prompted so the global hotkey can work.

No extra runtime packages required.

Linux X11

Install the GTK, WebKit, and tray dependencies once, then use the built-in global hotkey with push-to-talk or toggle mode.

Default hotkey: `Ctrl+Shift+Space`.
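The difference between push-to-talk and toggle mode comes down to how hotkey press and release events map to the recording state. A minimal Python sketch, with illustrative mode and event names that are not taken from Sussurro's config:

```python
def recording_after(events, mode):
    """Replay hotkey events and report whether recording is on after each.

    `events` is a sequence of "press" / "release" strings.
    `mode` is "push_to_talk" or "toggle" (illustrative names).
    """
    if mode not in ("push_to_talk", "toggle"):
        raise ValueError(f"unknown mode: {mode!r}")
    recording = False
    timeline = []
    for event in events:
        if mode == "push_to_talk":
            # Record only while the hotkey is held down.
            if event == "press":
                recording = True
            elif event == "release":
                recording = False
        elif event == "press":
            # Toggle mode: each press flips the state; releases are ignored.
            recording = not recording
        timeline.append(recording)
    return timeline
```

With the same four events (press, release, press, release), push-to-talk records twice for as long as the key is held, while toggle starts on the first press and stops on the second.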

Linux Wayland

Install `wl-clipboard`, then bind your desktop shortcut to the provided trigger script because Wayland does not allow app-managed global hotkeys.

One-time compositor setup required.

On first launch Sussurro guides you through downloading the required models. Whisper Small is the lighter option, Whisper Large v3 Turbo is the higher-accuracy option, and the Qwen 3 cleanup model is used to refine the final text.

Documentation map

The important docs are worth surfacing, not burying.

Quick Start

Install flow, first run, testing, and troubleshooting in under 5 minutes.

Configuration

Config file structure, hotkey syntax, language options, and env overrides.

Wayland Setup

Why Wayland needs compositor shortcuts and how to wire them up correctly.

File Transcription CLI

Use `sussurro-transcribe` for local audio-file transcription with optional cleanup.

FAQ

Questions people usually have before they install.

Does Sussurro send audio to the cloud?

No. The docs describe a fully local pipeline: Whisper.cpp handles speech recognition on-device and the Qwen 3 cleanup model runs locally before text is injected.

What platforms are supported?

The main app supports Linux and macOS. The overlay, settings window, tray, and headless mode are documented for both, with Wayland needing external shortcut setup.

How do hotkeys work on Wayland?

Wayland blocks application-managed global hotkeys by design. Sussurro works there by having you bind your desktop environment shortcut to the included trigger script or socket command.

Can I change models and transcription language later?

Yes. Settings lets you switch Whisper models, download models, and choose the transcription language. The config file also supports language codes, model paths, and environment overrides.

Is there a CLI for audio files?

Yes. `sussurro-transcribe` is the companion CLI for transcribing local audio files with optional LLM cleanup, output files, language overrides, and debug mode.

Do I need to build from source?

Not for normal use. The install script downloads the correct release binary for your platform. Building from source is available if you want to compile the native UI or inspect the full stack.

Read the research

The Sussurro-specific research is public, inspectable, and linked here.

Qwen3-Sussurro

2026 · Software · Carlo Esposito

Qwen3-Sussurro: Efficient Speech-to-Text Post-Correction via Parameter-Efficient Fine-Tuning

2026-02-16 · Journal article · Carlo Esposito

Support the developer

Donate directly to Carlo Esposito to help fund continued open-source work.

Sussurro is fully open source and built around a native desktop experience. If it becomes part of your workflow, donations go straight to the developer maintaining it.

Donate on Ko-fi


Solana

7jizbu8GD2EGJyxRGxDsSayGD9CV5hkCmUfeYpwvAiNH

Ethereum

0xA6EAFb432c3bfF4BB1cBaCF19eABbb1da9F56488

Bitcoin

bc1qecdkhsh0fpx2zlp9x0smmme9xgwumltsf3m3ph

Stellar

GCHW7CSWW7VA4UZMPSXDHR5CKLZ5DQREUVCJZFXYCCWWJMXTUJUABUCN

Fully local. Fully open source.

Install Sussurro and keep your voice on your own machine.
