Voice Input
The Amurg UI supports voice dictation with two backends. The default mode uses the browser's built-in Web Speech API and requires no setup. For private, offline speech recognition, you can connect a local Whisper server.
Voice Modes
| Mode | Backend | Setup | Privacy |
|---|---|---|---|
| Browser (default) | Web Speech API (Chrome, Edge, Safari) | None — works out of the box | Audio may be sent to the browser vendor's cloud service |
| Local Whisper | Self-hosted Whisper ASR server via WebSocket | Run a Whisper server, configure the URL in settings | Fully local — audio never leaves your machine |
Switch modes via the gear icon next to the microphone button. Settings are saved in localStorage under the key amurg-voice.
How to Use
The microphone button supports two interaction styles:
| Gesture | Action |
|---|---|
| Hold to talk | Press and hold the mic button for more than 200ms. Recording stops when you release. |
| Tap to toggle | Quick-tap (under 200ms) to start recording, tap again to stop. Useful on mobile. |
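The two gestures can be told apart by how long the button is held. The sketch below is one possible way to model this and is not taken from the Amurg source: `classifyPress`, `onPress`, `onRelease`, and `MicState` are hypothetical names, and treating a press of exactly 200 ms as a tap is an assumption, since the table leaves that boundary open.

```typescript
// Threshold from the table above. Treating exactly 200 ms as a tap is an assumption.
const HOLD_THRESHOLD_MS = 200;

type PressKind = "hold" | "tap";

// Classify a press by how long the button was held down.
function classifyPress(heldMs: number): PressKind {
  return heldMs > HOLD_THRESHOLD_MS ? "hold" : "tap";
}

interface MicState {
  recording: boolean;
  // True when recording was started by a quick tap and is waiting for a second tap.
  toggled: boolean;
}

// Pressing starts recording unless a tap-toggled recording is already running.
function onPress(state: MicState): MicState {
  return state.toggled ? state : { recording: true, toggled: false };
}

// Releasing ends a hold, ends a toggled recording, or turns a quick tap into a toggle.
function onRelease(state: MicState, heldMs: number): MicState {
  if (state.toggled) return { recording: false, toggled: false }; // second tap stops
  if (classifyPress(heldMs) === "hold") return { recording: false, toggled: false };
  return { recording: true, toggled: true }; // quick tap: keep recording
}
```

Starting the recording on press rather than on release means a hold captures speech from the first moment, at the cost of briefly recording during a quick tap as well.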
Edit before send
Transcribed text is appended to the message input field — it is never sent automatically. You can review, edit, or add to it before pressing send. While you speak, a real-time interim preview is shown above the input field.
Visual Feedback
| Indicator | Meaning |
|---|---|
| Red mic button | Recording is active |
| Pulsing ring around button | Audio level visualization (scales with input volume) |
| Italic text above input | Interim transcription (partial, live as you speak) |
| Green ring on input field | Final transcription received (flashes briefly) |
Browser Mode
The default mode uses the browser's SpeechRecognition API, falling back to the prefixed webkitSpeechRecognition where only that form is exposed (as in Safari). It requires no server or configuration.
| Browser | Support |
|---|---|
| Chrome / Edge | Full support |
| Safari (iOS 14.5+, macOS) | Full support |
| Firefox | Not supported (mic button hidden) |
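Support can be detected by checking for either constructor on the global object. The following is a minimal sketch under that assumption. `SpeechGlobals`, `findSpeechRecognition`, and `shouldShowMicButton` are hypothetical names, and the actual check in the UI may differ.

```typescript
// Minimal shape of the global object this sketch inspects.
interface SpeechGlobals {
  SpeechRecognition?: unknown;
  webkitSpeechRecognition?: unknown;
}

// Return the recognition constructor if the browser exposes one, else null.
function findSpeechRecognition(globals: SpeechGlobals): unknown {
  return globals.SpeechRecognition ?? globals.webkitSpeechRecognition ?? null;
}

// In browser mode the mic button would be shown only when a constructor exists.
function shouldShowMicButton(globals: SpeechGlobals): boolean {
  return findSpeechRecognition(globals) !== null;
}
```

In a page, this would be called as `shouldShowMicButton(window as SpeechGlobals)`.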
Language Detection
The browser mode uses navigator.language for speech recognition language, falling back to en-US. This means it automatically matches your browser's language setting.
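The fallback amounts to a single expression. In this sketch, `recognitionLanguage` is a hypothetical helper name, and treating an empty string like a missing value is an assumption.

```typescript
// Pick the recognition language, falling back to en-US when none is reported.
function recognitionLanguage(nav: { language?: string }): string {
  return nav.language || "en-US";
}
```

In the browser the result would be assigned to the recognizer's `lang` property, for example `recognition.lang = recognitionLanguage(navigator)`.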
Local Whisper Mode
Local Whisper mode sends audio to a Whisper-compatible ASR server that you run yourself. When that server runs on the same machine as the browser, audio never leaves your machine.
Setup
- Run a Whisper ASR server that accepts WebSocket connections and receives audio/webm chunks.
- Click the gear icon next to the microphone button in the Amurg UI.
- Select Local Whisper.
- Enter the WebSocket URL (e.g. ws://localhost:8000/asr).
Protocol
The UI streams audio to the Whisper server in 250ms chunks using MediaRecorder with audio/webm;codecs=opus format. The server is expected to respond with JSON messages containing transcription results.
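The streaming step can be sketched as a small forwarding function plus the browser wiring around it. This is an illustration and not the UI's actual code: `forwardChunk`, `ChunkSink`, and `SizedChunk` are hypothetical names, and silently dropping chunks while the socket is not open is an assumption. The wiring in the trailing comment uses the standard `MediaRecorder` API, where `start(timeslice)` delivers data roughly every `timeslice` milliseconds.

```typescript
// Values taken from the paragraph above.
const CHUNK_MS = 250;
const MIME_TYPE = "audio/webm;codecs=opus";
const WS_OPEN = 1; // numeric value of WebSocket.OPEN

// Minimal shapes, so the forwarding logic can run outside a browser.
interface SizedChunk {
  size: number;
}
interface ChunkSink<T> {
  readonly readyState: number;
  send(data: T): void;
}

// Send one recorded chunk if it is non-empty and the socket is open.
// Returns true when the chunk was sent.
function forwardChunk<T extends SizedChunk>(sink: ChunkSink<T>, chunk: T): boolean {
  if (chunk.size === 0 || sink.readyState !== WS_OPEN) return false;
  sink.send(chunk);
  return true;
}

// Browser wiring (not executed here), given a microphone MediaStream `stream`
// and an open WebSocket `socket`:
//   const recorder = new MediaRecorder(stream, { mimeType: MIME_TYPE });
//   recorder.ondataavailable = (e) => forwardChunk(socket, e.data);
//   recorder.start(CHUNK_MS); // emit a chunk roughly every 250 ms
```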
Expected server responses
The UI looks for a text or transcript field in the JSON response. It also recognizes partial/interim results via buffer, segments, is_final, and type: "partial" fields.
// Partial transcription (shown as interim preview)
{"type": "partial", "text": "hello wor"}
// Final transcription (appended to input field)
{"text": "hello world", "is_final": true}
// Alternative field names also accepted
{"transcript": "hello world"}
Compatible Servers
Any Whisper ASR server that accepts WebSocket audio streaming and returns JSON with a text or transcript field will work. Popular options include whisper_streaming and WhisperLive.
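To check whether a server's output fits, it can help to see how a client might interpret the messages. The sketch below handles only the text, transcript, type, and is_final fields. It ignores buffer and segments because their shape is not described here. `interpretServerMessage` and `Transcription` are hypothetical names, and treating a message with no partial marker as final is an assumption.

```typescript
interface Transcription {
  text: string;
  isFinal: boolean;
}

// Parse one JSON message from the server. Returns null when the message is not
// valid JSON or carries no text or transcript string.
function interpretServerMessage(raw: string): Transcription | null {
  let msg: unknown;
  try {
    msg = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof msg !== "object" || msg === null) return null;
  const fields = msg as Record<string, unknown>;

  // Prefer "text", then fall back to "transcript".
  const text =
    typeof fields.text === "string" ? fields.text
    : typeof fields.transcript === "string" ? fields.transcript
    : null;
  if (text === null) return null;

  // Assumption: a message is partial only when it says so explicitly.
  const isPartial = fields.type === "partial" || fields.is_final === false;
  return { text, isFinal: !isPartial };
}
```

A partial result would update the interim preview, while a final result would be appended to the input field.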
Settings Storage
Voice settings are persisted in localStorage under the key amurg-voice as a JSON object:
{
"mode": "browser",
"whisperUrl": "ws://localhost:8000/asr"
}
| Field | Type | Default | Description |
|---|---|---|---|
| mode | string | "browser" | "browser" or "whisper" |
| whisperUrl | string | "" | WebSocket URL for the Whisper server |
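A stored value may be missing or malformed, so a reader of this key should fall back to the documented defaults. The following is a minimal sketch of such a reader. `parseVoiceSettings`, `VoiceSettings`, and `DEFAULT_SETTINGS` are hypothetical names, and falling back silently on invalid data is an assumption, not documented behavior.

```typescript
type VoiceMode = "browser" | "whisper";

interface VoiceSettings {
  mode: VoiceMode;
  whisperUrl: string;
}

const STORAGE_KEY = "amurg-voice";
const DEFAULT_SETTINGS: VoiceSettings = { mode: "browser", whisperUrl: "" };

// Turn the raw localStorage string into settings, using defaults for anything
// missing or invalid.
function parseVoiceSettings(raw: string | null): VoiceSettings {
  if (raw === null) return { ...DEFAULT_SETTINGS };
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { ...DEFAULT_SETTINGS };
  }
  if (typeof parsed !== "object" || parsed === null) return { ...DEFAULT_SETTINGS };
  const fields = parsed as Record<string, unknown>;
  return {
    mode: fields.mode === "whisper" ? "whisper" : "browser",
    whisperUrl: typeof fields.whisperUrl === "string" ? fields.whisperUrl : "",
  };
}
```

In the browser this would be called as `parseVoiceSettings(localStorage.getItem(STORAGE_KEY))`.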
Troubleshooting
| Problem | Solution |
|---|---|
| No microphone button visible | Your browser does not support the Web Speech API (e.g. Firefox). Switch to Chrome, Edge, or Safari. |
| "Microphone access denied" toast | Grant microphone permission in your browser settings. On mobile, check app-level permissions too. |
| Recognition stops after ~60 seconds | Some mobile browsers kill long-running speech recognition. The UI auto-restarts it. Tap the mic again if needed. |
| Whisper mode shows no transcription | Check that the Whisper server is running and the WebSocket URL is correct. Open browser dev tools to inspect WebSocket frames. |
| Whisper mode: "Connection failed" | Ensure the Whisper server accepts WebSocket connections. If the UI is served over HTTPS, the Whisper URL must use wss:// (browsers block mixed content). |
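The mixed-content case in the last row can be caught before connecting. This sketch shows one way to flag it. `isBlockedAsMixedContent` is a hypothetical helper, and it checks only the URL scheme, following the rule as stated in the table.

```typescript
// True when a page served over HTTPS is configured with an insecure ws:// URL.
// pageProtocol is expected in the form reported by location.protocol, e.g. "https:".
function isBlockedAsMixedContent(pageProtocol: string, whisperUrl: string): boolean {
  const insecureSocket = whisperUrl.trim().toLowerCase().startsWith("ws://");
  return pageProtocol === "https:" && insecureSocket;
}
```

In the browser this would be called as `isBlockedAsMixedContent(location.protocol, settings.whisperUrl)`, for example to show a hint in the settings dialog before the connection attempt fails.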