graveyard/raycast-voice

Local voice transcription for Raycast on Windows. Uses Whisper with CUDA GPU acceleration for fast, private transcription.

ai raycast raycast-extension transcription whisper

TypeScript 66.5%
Python 33.2%
Batchfile 0.2%
JavaScript 0.1%

dikkadev bacf8ad34e feat(transcription): default to turbo whisper model with int8_float16 on cuda Promote the turbo Whisper model and mixed int8/float16 compute by default to give most users a better speed–quality tradeoff out of the box while keeping advanced model and device options available via preferences.		2026-01-28 13:00:39 +01:00
assets	feat(transcribe): add configurable auto-action after transcription	2025-11-29 21:41:53 +01:00
backend	feat(transcription): default to turbo whisper model with int8_float16 on cuda	2026-01-28 13:00:39 +01:00
extra-ui	feat(voice-gui): show transcription stats for completed recordings	2025-12-08 09:55:50 +01:00
src	feat(transcription): default to turbo whisper model with int8_float16 on cuda	2026-01-28 13:00:39 +01:00
.gitignore	feat(transcription): harden recording pipeline and UX for end-to-end transcription	2025-12-02 19:31:49 +01:00
.prettierrc	feat(voice): init raycast extension scaffold	2025-11-23 22:31:44 +01:00
bun.lock	chore(meta): adopt gplv3 and finalize extension documentation	2025-11-29 14:14:54 +01:00
eslint.config.js	feat(voice): init raycast extension scaffold	2025-11-23 22:31:44 +01:00
LICENSE	chore(meta): adopt gplv3 and finalize extension documentation	2025-11-29 14:14:54 +01:00
package.json	feat(transcription): default to turbo whisper model with int8_float16 on cuda	2026-01-28 13:00:39 +01:00
README.md	feat(history): add searchable transcription history with CSV export	2025-11-30 01:15:04 +01:00
tsconfig.json	feat(voice): init raycast extension scaffold	2025-11-23 22:31:44 +01:00
WIP-README.md	feat(error-handling): add contextual user-friendly error messages across transcription flow	2025-12-02 21:35:36 +01:00

README.md

Voice Transcription - Raycast Extension

A high-performance Raycast extension for Windows that records audio and transcribes it locally using Whisper (via faster-whisper) with CUDA GPU acceleration.

🚀 Features

Local Transcription: Privacy-focused, runs entirely on your machine.
CUDA Acceleration: Uses an NVIDIA GPU to speed up transcription.
One-Click Workflow: Record -> Transcribe -> Copy/Paste.
Auto-Start: Server automatically starts when you use the extension.
Auto-Action: Automatically paste or copy transcription when complete (configurable, can be skipped).
Transcription History: Automatically saves transcriptions for quick access. View, search, copy/paste previous transcriptions, and export to CSV.
Multi-Language: Auto-detects languages (Whisper supports 99+ languages).
Customizable: Choose your model size (Tiny to Large) and compute device (CUDA/CPU).
Visual Feedback: Real-time audio level indicator and recording timer.

📋 Prerequisites

Windows (The extension uses specific Windows APIs for process management).
Raycast for Windows.
Python 3.10+ installed.

uv (Python package manager). See the uv Installation Guide, or install it with:

powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

NVIDIA GPU (Optional but recommended for speed).
- Requires CUDA Toolkit 12.x.
- Requires cuDNN 9.x (The backend automatically adds cuDNN to the path if found in standard locations).
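
The cuDNN path handling mentioned above could look roughly like the following sketch. The function name `add_cudnn_to_path` and the candidate directories are assumptions made for illustration; the actual backend may search different locations or use a different mechanism.

```python
import os
from pathlib import Path

# Hypothetical candidate locations; the real backend may check others.
DEFAULT_CANDIDATES = [
    Path(r"C:\Program Files\NVIDIA\CUDNN\v9.0\bin"),
    Path(r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.0\bin"),
]


def add_cudnn_to_path(candidates=DEFAULT_CANDIDATES, env=None):
    """Prepend the first existing candidate directory to PATH.

    Returns the directory that was used, or None if no candidate exists.
    """
    env = os.environ if env is None else env
    for directory in candidates:
        if directory.is_dir():
            current = env.get("PATH", "")
            # Avoid adding the same directory twice.
            if str(directory) not in current.split(os.pathsep):
                env["PATH"] = str(directory) + (os.pathsep + current if current else "")
            return directory
    return None
```

Calling `add_cudnn_to_path()` before the model is loaded would make the cuDNN DLLs discoverable without requiring the user to edit their system PATH.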

🛠️ Installation

1. Clone and Install Extension

# Install frontend dependencies
bun install

2. Install Backend Dependencies

The backend handles recording and transcription. It uses uv for fast dependency management.

cd backend
uv sync

This installs:

faster-whisper: Optimized Whisper implementation.
fastapi & uvicorn: API server.
sounddevice: Audio recording.

🎮 Usage

Open Raycast and search for "Transcribe Voice".
Start Recording:
- If "Auto-start" is enabled (default), recording begins immediately.
- Otherwise, press Enter to start.
Speak your text. Watch the audio level indicator to see your input.
Stop: Press Enter again or select "Stop & Transcribe".
Result:
- If Auto-action is enabled (Paste or Copy), the action executes automatically when transcription completes.
- You can skip auto-action at any time:
  - While listening (during recording): Press Ctrl+A or use the "Skip Auto-Action" button. Enter stops the recording in this state; it does not skip the auto-action.
  - While transcribing (during processing): Press Enter or Ctrl+A, or use the "Skip Auto-Action" button. Skip is the primary action in this state, which is why Enter triggers it.
- The UI shows the configured auto-action (e.g., "🔄 Auto-paste") and displays it with strikethrough when disabled.
- Manual actions (if auto-action is skipped or disabled):
  - Paste: Press Enter or click "Paste" to paste directly into the active window.
  - Copy (Ctrl+C): Copy to clipboard.
  - Edit (Ctrl+E): Make corrections before copying or pasting.
  - View History: View your transcription history.
  - New Recording: Start a new recording session.
Skip Saving to History (Ctrl+S): Press during recording or processing to skip saving the current transcription to history (only if "Save to History" is enabled in preferences).

📚 Transcription History

The extension automatically saves your transcriptions to history (configurable via preferences). Access your history with the "View Transcription History" command:

View History: Search for "View Transcription History" in Raycast or press Ctrl+H from the main transcription command.
Search: Use the built-in search to find specific transcriptions by text content.
Actions:
- Paste: Press Enter or click to paste the selected transcription directly.
- Copy (Ctrl+C): Copy transcription to clipboard.
- Export to CSV: Export all transcriptions to a CSV file (opens Windows save dialog).
- Delete (Ctrl+Backspace): Remove a single transcription.
- Clear All: Remove all saved transcriptions.
Metadata: Each entry includes language, model used, device (CUDA/CPU), audio duration, transcription time, and timestamp.
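
To make the history features concrete, here is a minimal sketch of an entry record with text search and CSV export. The class name `HistoryEntry`, its field names, and both helper functions are hypothetical; they only mirror the metadata listed above and are not taken from the extension's source.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class HistoryEntry:
    # Illustrative field names based on the metadata listed above.
    text: str
    language: str
    model: str
    device: str
    audio_duration_s: float
    transcription_time_s: float
    timestamp: str


def search(entries, query):
    """Case-insensitive substring match on the transcription text."""
    needle = query.casefold()
    return [entry for entry in entries if needle in entry.text.casefold()]


def export_csv(entries):
    """Serialize all entries to a CSV string with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(HistoryEntry)])
    writer.writeheader()
    for entry in entries:
        writer.writerow(asdict(entry))
    return buffer.getvalue()
```

Using `csv.DictWriter` keeps quoting correct when a transcription contains commas or line breaks, which is common in dictated text.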

⚙️ Configuration

Go to Raycast Settings > Extensions > Voice Transcription to configure:

| Setting | Description | Default |
| --- | --- | --- |
| Backend Directory | Required. Path to the `backend` folder in this project. | |
| Whisper Model | Model size (`tiny`, `base`, `small`, `medium`, `large-v3`). Larger models are more accurate but slower. | `base` |
| Compute Device | `CUDA` (GPU) or `CPU`. CUDA is significantly faster. | `CUDA` |
| Auto-start | Start recording immediately when the command is opened. | `true` |
| Auto-action After Transcription | Automatically execute `Paste` or `Copy` when transcription completes. Set to `None` to disable. | `None` |
| Save to History | Automatically save transcriptions to history for quick access later. | `true` |
| Return to Root After Action | Close Raycast after copy or paste actions. | `false` |
| Audio Level Polling Rate (ms) | How often to update the audio level indicator (25-250 ms). Set to `0` to disable. | `100` |
| Server Port | Port for the local transcription server. | `51234` |
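
The constraints in the settings above can be expressed as a small validation routine. This is a sketch only: the `Preferences` class, its attribute names, and `validate` are hypothetical, and the defaults are copied from the documented values rather than from the extension's code.

```python
from dataclasses import dataclass

VALID_MODELS = ("tiny", "base", "small", "medium", "large-v3")
VALID_DEVICES = ("CUDA", "CPU")
VALID_AUTO_ACTIONS = ("Paste", "Copy", "None")


@dataclass
class Preferences:
    # Defaults mirror the documented settings; attribute names are illustrative.
    backend_directory: str
    whisper_model: str = "base"
    compute_device: str = "CUDA"
    auto_start: bool = True
    auto_action: str = "None"
    save_to_history: bool = True
    return_to_root: bool = False
    audio_level_polling_ms: int = 100
    server_port: int = 51234


def validate(prefs):
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    if not prefs.backend_directory:
        errors.append("Backend Directory is required")
    if prefs.whisper_model not in VALID_MODELS:
        errors.append(f"unknown Whisper model: {prefs.whisper_model}")
    if prefs.compute_device not in VALID_DEVICES:
        errors.append(f"unknown compute device: {prefs.compute_device}")
    if prefs.auto_action not in VALID_AUTO_ACTIONS:
        errors.append(f"unknown auto-action: {prefs.auto_action}")
    rate = prefs.audio_level_polling_ms
    # 0 disables the indicator; otherwise the rate must be within 25-250 ms.
    if rate != 0 and not 25 <= rate <= 250:
        errors.append("polling rate must be 0 or between 25 and 250 ms")
    if not 1 <= prefs.server_port <= 65535:
        errors.append("server port must be between 1 and 65535")
    return errors
```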

🏗️ Architecture

The extension operates as a hybrid system:

Frontend (TypeScript): A Raycast UI that manages the user interaction and state.
Backend (Python): A local FastAPI server managed by the frontend.
- Server Manager: The frontend checks if the server is running on the specified port. If not, it launches `uv run python transcription_server.py`.
- Recording: Audio is captured by the Python backend using sounddevice (PortAudio) to a temporary WAV file.
- Transcription: When recording stops, faster-whisper processes the WAV file and returns the text.
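
The server manager logic described above is implemented in the TypeScript frontend; the sketch below restates the same check-then-launch idea in Python for consistency with the backend. The function names are hypothetical, and how the configured port is passed to the server is not documented here, so the sketch does not pass it.

```python
import socket
import subprocess


def is_server_running(port, host="127.0.0.1", timeout=0.5):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def ensure_server(backend_dir, port=51234):
    """Start the backend unless a server already listens on the port.

    Returns the Popen handle if a process was launched, otherwise None.
    """
    if is_server_running(port):
        return None
    # Same command the frontend is described as launching.
    return subprocess.Popen(
        ["uv", "run", "python", "transcription_server.py"],
        cwd=backend_dir,
    )
```

A TCP connect only shows that something is listening on the port, not that it is the transcription server, so a real implementation would likely follow up with a request to the server itself.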

🔧 Troubleshooting

Server fails to start

Check the "Backend Directory" setting in Raycast preferences. It must point to the folder containing `transcription_server.py`.
Ensure `uv` is in your system PATH.
Run `uv sync` in the backend directory manually to ensure dependencies are installed.
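
The first two checks above can be automated with a short script. This is a hypothetical helper, not part of the project; `diagnose_backend` and its messages are made up for illustration.

```python
import shutil
from pathlib import Path


def diagnose_backend(backend_dir):
    """Return a list of problems that would prevent the server from starting."""
    problems = []
    if not (Path(backend_dir) / "transcription_server.py").is_file():
        problems.append("Backend Directory does not contain transcription_server.py")
    # shutil.which performs the same PATH lookup the shell would.
    if shutil.which("uv") is None:
        problems.append("uv was not found on PATH")
    return problems
```

Run it with the path from the "Backend Directory" preference, for example `print(diagnose_backend(r"C:\path\to\backend"))`, where the path shown is a placeholder.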

"Model not loaded" or CUDA errors

If using CUDA, ensure you have NVIDIA drivers, CUDA Toolkit 12.x, and cuDNN 9.x installed.
If CUDA fails, the server attempts to fall back to CPU. The extension shows `cpu` as the device when this has happened.
The first run might take longer as it downloads the Whisper model.
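
A CUDA-to-CPU fallback of the kind described above might be structured as follows. This is a sketch under assumptions: `load_with_fallback` and `faster_whisper_loader` are hypothetical names, the compute types are example choices, and the real server may handle errors differently. The `WhisperModel(model_name, device=..., compute_type=...)` constructor is the faster-whisper API.

```python
def load_with_fallback(loader, model_name="base"):
    """Try CUDA first, then CPU. Returns (model, device_used)."""
    attempts = [("cuda", "float16"), ("cpu", "int8")]
    last_error = None
    for device, compute_type in attempts:
        try:
            model = loader(model_name, device=device, compute_type=compute_type)
            return model, device
        except Exception as exc:
            # Broad on purpose: missing drivers or DLLs surface as
            # different exception types depending on where loading fails.
            last_error = exc
    raise RuntimeError("could not load the model on any device") from last_error


def faster_whisper_loader(model_name, device, compute_type):
    # Imported lazily so the fallback logic can be exercised
    # without faster-whisper installed.
    from faster_whisper import WhisperModel

    return WhisperModel(model_name, device=device, compute_type=compute_type)
```

With faster-whisper installed, `model, device = load_with_fallback(faster_whisper_loader)` returns the device that actually loaded, which is the value a UI could display.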

Recording issues

Ensure your default microphone is set correctly in Windows Sound Settings.
The backend uses sounddevice, which connects to the default input device.
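
To see which microphone sounddevice will pick, you can list the input devices it reports. The helper names below are hypothetical; `sd.query_devices()` and `sd.query_devices(kind="input")` are sounddevice calls, and the latter raises an error if no default input device is available.

```python
def input_devices(devices):
    """Keep only devices that can record (at least one input channel)."""
    return [
        (index, dev["name"])
        for index, dev in enumerate(devices)
        if dev["max_input_channels"] > 0
    ]


def list_system_inputs():
    # Imported lazily so input_devices() works without sounddevice installed.
    import sounddevice as sd

    default_name = sd.query_devices(kind="input")["name"]
    for index, name in input_devices(sd.query_devices()):
        marker = "*" if name == default_name else " "
        print(f"{marker} {index}: {name}")
```

Running `list_system_inputs()` inside the backend environment (for example via `uv run python`) marks the default input with `*`.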

📜 License

GNU GPLv3