Meet OmniVoice Studio: The Local, Open Source Alternative to ElevenLabs

OmniVoice Studio – How to Use it

What is OmniVoice Studio?

OmniVoice Studio is an open-source desktop application for voice cloning, video dubbing, real-time dictation, and speaker diarization. Everything runs locally on your device: no API keys, no cloud account, and no registration required.

646 TTS languages supported by the default OmniVoice engine
99 transcription languages via WhisperX
Available for macOS, Windows, and Linux
GPU optional: the full pipeline runs on the CPU
Free for personal use, teaching, and research (FSL-1.1-ALv2)

System Requirements

GPU is optional. Without one, TTS runs roughly 3× slower on the CPU. On cards with 8 GB of VRAM or less, TTS is automatically offloaded to the CPU during transcription; no configuration is required.
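The offload rule described above can be sketched as a small decision function. This is an illustration under assumptions, not OmniVoice Studio's code. The helper name `pick_tts_device`, its parameters, and the device strings are hypothetical. Only the 8 GB threshold and the CPU fallback come from the text.

```python
from typing import Optional

# From the text: with 8 GB of VRAM or less, TTS moves to the CPU
# while transcription is running.
OFFLOAD_VRAM_GB = 8.0


def pick_tts_device(gpu: Optional[str], vram_gb: float, transcribing: bool) -> str:
    """Choose where the TTS model should run (hypothetical helper).

    gpu: "cuda" (PyTorch also uses this name for ROCm builds), "mps",
         or None when no GPU is available.
    vram_gb: total GPU memory in GB; ignored when gpu is None.
    transcribing: True while WhisperX is using the GPU.
    """
    if gpu is None:
        return "cpu"  # no GPU: everything runs on the CPU, roughly 3x slower for TTS
    if transcribing and vram_gb <= OFFLOAD_VRAM_GB:
        return "cpu"  # small card: leave VRAM free for transcription
    return gpu


print(pick_tts_device("cuda", 6, transcribing=True))   # cpu
print(pick_tts_device("cuda", 12, transcribing=True))  # cuda
```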

Component	Minimum	Recommended
OS	Windows 10 / macOS 12+ / Ubuntu 20.04+	Any modern 64-bit OS
RAM	8 GB	16 GB+
VRAM	4 GB (auto-offload)	8 GB+ (RTX 3060+)
Disk	10 GB free	20 GB+ SSD
Python	3.10+	3.11–3.12
GPU	Optional	CUDA / MPS / ROCm
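You can check two rows of the table (Python version and free disk space) with Python's standard library before installing anything. This is a minimal sketch. The helper name `preflight` is hypothetical, and it does not check RAM, VRAM, or the GPU.

```python
import shutil
import sys

# Minimums from the System Requirements table.
MIN_PYTHON = (3, 10)
MIN_FREE_DISK_GB = 10


def preflight(path: str = ".") -> list[str]:
    """Return a list of problems; an empty list means both minimums are met."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append(
            f"Python {sys.version_info.major}.{sys.version_info.minor} found, "
            f"{MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required"
        )
    # GB is computed here as 1024**3 bytes.
    free_gb = shutil.disk_usage(path).free / 1024**3
    if free_gb < MIN_FREE_DISK_GB:
        problems.append(f"only {free_gb:.1f} GB free, {MIN_FREE_DISK_GB} GB required")
    return problems


if __name__ == "__main__":
    issues = preflight()
    print("OK" if not issues else "\n".join(issues))
```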

Installation

The project recommends running from source. Install three prerequisites first: ffmpeg, Bun (JavaScript runtime), and uv (Python package manager).
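Before cloning, you can confirm that the three prerequisites are on your PATH with Python's standard `shutil.which`. The helper name is hypothetical. The check only confirms that the executables can be found; it does not check their versions.

```python
import shutil

# The three prerequisites named in the installation instructions.
PREREQUISITES = ("ffmpeg", "bun", "uv")


def missing_prerequisites(tools=PREREQUISITES) -> list[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_prerequisites()
    if missing:
        print("Install these first:", ", ".join(missing))
    else:
        print("All prerequisites found.")
```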

git clone
cd OmniVoice-Studio
uv sync
bun install
bun dev

Once it starts, the frontend loads and the API runs on port 8000.
Model weights are downloaded automatically on the first generation.
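To confirm that the backend is up, a plain TCP connection check against port 8000 is enough. This sketch uses only the standard `socket` module, and the helper name is hypothetical. It calls no OmniVoice endpoint (none are documented here), so it only shows that something is listening on the port.

```python
import socket


def api_is_listening(host: str = "127.0.0.1", port: int = 8000, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or unreachable
        return False


if __name__ == "__main__":
    print("API reachable" if api_is_listening() else "Nothing on port 8000 yet")
```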

Prebuilt installers are also available (macOS DMG, Windows MSI, Linux AppImage and .deb); see the Releases page on GitHub.

Voice Cloning

Voice cloning uses zero-shot inference: it reproduces a voice from a clip as short as 3 seconds, with no prior training on that voice. The default OmniVoice engine conditions a diffusion-based TTS model on the reference audio.

Open the Voice Clone tab in the UI
Upload or record a 3-second audio clip of the target voice
Enter your text and select the target language (646 available)
Click Generate; the output is saved to your project library
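Before uploading, you can check locally that a reference clip meets the 3-second guideline. This assumes the clip is an uncompressed PCM WAV file, because Python's standard `wave` module reads only that format. The helper names are hypothetical, and the app's own validation may differ.

```python
import wave

# The text says a reference clip of about 3 seconds is enough.
MIN_REFERENCE_SECONDS = 3.0


def clip_seconds(path: str) -> float:
    """Duration of a PCM WAV file in seconds (frames / sample rate)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_reference_clip(path: str) -> None:
    """Raise ValueError if the clip is shorter than the recommended minimum."""
    seconds = clip_seconds(path)
    if seconds < MIN_REFERENCE_SECONDS:
        raise ValueError(
            f"{path}: {seconds:.2f} s is shorter than {MIN_REFERENCE_SECONDS} s"
        )
```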

Voice Gallery: Search YouTube, browse categories, and download reference clips right within the app to build your voice library.

Video Copying

The full dubbing pipeline runs locally: transcribe → translate → synthesize → mux. Demucs separates the vocals so that the original background audio is preserved in the final export.

Open the Dub tab and paste a YouTube URL or upload a local file
WhisperX transcribes the speech with word-level alignment
Select the target language; translation starts automatically
The TTS engine re-voices the translated transcript; Demucs preserves the background audio
Export the final MP4 with the dubbed audio muxed in
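The final mix-and-mux step can be written as a single ffmpeg command. This sketch assumes two inputs already exist: a background (no-vocals) stem from Demucs and a dubbed speech track from the TTS engine. The file names are hypothetical, and OmniVoice Studio may build its command differently. `amix`, `-map`, and `-c:v copy` are standard ffmpeg options.

```python
def build_mux_command(video: str, background: str, dubbed_voice: str, output: str) -> list[str]:
    """Build an ffmpeg argument list that mixes two audio stems and muxes them with the video.

    background: the accompaniment stem left after Demucs removes the vocals.
    dubbed_voice: the synthesized speech track in the target language.
    """
    return [
        "ffmpeg", "-y",
        "-i", video,          # input 0: original video (its audio is not used)
        "-i", background,     # input 1: background stem
        "-i", dubbed_voice,   # input 2: dubbed speech
        # Mix the two stems into one audio stream labelled [aout].
        "-filter_complex", "[1:a][2:a]amix=inputs=2:duration=longest[aout]",
        "-map", "0:v",        # take the video stream from input 0
        "-map", "[aout]",     # take the mixed audio
        "-c:v", "copy",       # copy the video instead of re-encoding it
        "-c:a", "aac",        # encode the new audio as AAC for MP4
        "-shortest",          # stop at the end of the shortest stream
        output,
    ]
```

Run it with `subprocess.run(build_mux_command("talk.mp4", "no_vocals.wav", "dub_es.wav", "talk_es.mp4"), check=True)`. Copying the video stream avoids a slow re-encode, so only the new audio is encoded.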

Batch queue: queue up to 50 videos and let them run. Each job gets its own progress bar covering the full pipeline.
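Here is a minimal model of the batch queue. It enforces a hard cap of 50 jobs and derives each job's progress from how many of the four pipeline stages have finished. The class and method names are hypothetical. Jobs run one after another here only for simplicity, because the source does not describe how the app actually schedules them.

```python
from dataclasses import dataclass, field

# Stage names from the dubbing pipeline described above.
STAGES = ("transcribe", "translate", "synthesize", "mux")
MAX_QUEUE = 50  # the batch limit quoted in the text


@dataclass
class DubJob:
    source: str                                   # YouTube URL or local file path
    completed: list[str] = field(default_factory=list)

    @property
    def progress(self) -> float:
        """Fraction of pipeline stages finished, from 0.0 to 1.0."""
        return len(self.completed) / len(STAGES)


class BatchQueue:
    def __init__(self) -> None:
        self.jobs: list[DubJob] = []

    def add(self, source: str) -> DubJob:
        """Queue a video, refusing anything past the 50-job limit."""
        if len(self.jobs) >= MAX_QUEUE:
            raise RuntimeError(f"batch queue is limited to {MAX_QUEUE} videos")
        job = DubJob(source)
        self.jobs.append(job)
        return job

    def run(self, run_stage) -> None:
        """Process jobs in order; run_stage(job, stage) does the real work."""
        for job in self.jobs:
            for stage in STAGES:
                run_stage(job, stage)
                job.completed.append(stage)  # advances this job's progress bar
```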

Dictation & Speaker Diarization

Dictation works system-wide, in any application on any supported operating system. Diarization identifies individual speakers in a multi-speaker audio file using Pyannote + WhisperX.

Press ⌘+⇧+Space (macOS) to open the floating dictation widget
Your speech is streamed over a WebSocket, and the transcript is automatically inserted into the active input field
Upload a multi-speaker file to the Diarization tab
Pyannote labels who spoke when; each speaker gets an automatically generated voice profile
Assign a TTS voice to each speaker to dub every speaker with a distinct voice
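Once diarization has labelled each transcript segment with a speaker, assigning voices is a lookup from speaker label to TTS voice. This sketch assumes segments arrive as (speaker, start, end, text) records with labels in Pyannote's `SPEAKER_00` style. The record type, helper names, and voice IDs are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Segment(NamedTuple):
    speaker: str   # diarization label, e.g. "SPEAKER_00"
    start: float   # seconds
    end: float     # seconds
    text: str      # transcript for this span


def speaking_time(segments):
    """Total seconds spoken by each speaker label."""
    totals = defaultdict(float)
    for seg in segments:
        totals[seg.speaker] += seg.end - seg.start
    return dict(totals)


def assign_voices(segments, voice_for_speaker):
    """Pair every segment with the TTS voice chosen for its speaker."""
    return [(seg, voice_for_speaker[seg.speaker]) for seg in segments]
```

Keeping the mapping per speaker rather than per segment means each person keeps the same voice for the whole video.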

A Hugging Face token is required for Pyannote diarization. See docs/setup/huggingface-token.md in the repo.

TTS engines

Six TTS engines are built in. Switch between them in Settings → TTS Engine or with an environment variable:
OMNIVOICE_TTS_BACKEND=cosyvoice

Engine	Languages	Cloning	Platform
OmniVoice (default)	600+	✓	CUDA / MPS / CPU
CosyVoice 3	9 + 18 dialects	✓	CUDA / MPS / CPU
MLX-Audio	Multiple	Varies	Apple Silicon only
VoxCPM2	30	✓	CUDA / MPS / CPU
MOSS-TTS-Nano	20	✓	CUDA / CPU
KittenTTS	English	✗	CPU only

Custom engine: subclass TTSBackend in backend/services/tts_backend.py and add it to _REGISTRY. It takes roughly 50 lines of Python.
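Here is a self-contained sketch of the subclass-and-register pattern. The `TTSBackend` below is only a stand-in: the real base class in backend/services/tts_backend.py may declare different abstract methods, so check it before writing an engine. `SilenceBackend`, `synthesize`, and `load_backend` are hypothetical. Reading `OMNIVOICE_TTS_BACKEND` mirrors the environment variable described above, but the fallback name here is the toy engine, not OmniVoice's real default.

```python
import os
from abc import ABC, abstractmethod


class TTSBackend(ABC):
    """Stand-in for the real base class; its actual methods may differ."""

    @abstractmethod
    def synthesize(self, text: str, language: str) -> bytes:
        """Return synthesized audio for `text`."""


class SilenceBackend(TTSBackend):
    """Toy engine that returns 1 s of 16-bit mono silence at 16 kHz."""

    def synthesize(self, text: str, language: str) -> bytes:
        return b"\x00\x00" * 16_000


# Maps the name used in settings or the env var to the backend class.
_REGISTRY: dict[str, type[TTSBackend]] = {
    "silence": SilenceBackend,
}


def load_backend() -> TTSBackend:
    """Instantiate the backend named by OMNIVOICE_TTS_BACKEND."""
    name = os.environ.get("OMNIVOICE_TTS_BACKEND", "silence").lower()
    try:
        return _REGISTRY[name]()
    except KeyError:
        raise ValueError(f"unknown TTS backend {name!r}; known: {sorted(_REGISTRY)}") from None
```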

MCP Server and Services

OmniVoice Studio ships a built-in MCP server that exposes its voice and dubbing capabilities to any MCP-compatible client (Claude, Cursor, or your own tools) without opening the desktop UI.

The MCP server starts alongside the FastAPI backend when you run bun dev
Point your MCP client at the local server to access all endpoints
AudioSeal (Meta) embeds an invisible neural watermark in every AI-generated audio clip
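MCP messages use JSON-RPC 2.0. A client first calls `tools/list` to discover the exposed tools, then `tools/call` to invoke one. This sketch only shows the message shape. A real client also performs the `initialize` handshake and handles the transport. The tool name `clone_voice` and its arguments are hypothetical, so use whatever `tools/list` actually returns.

```python
import json
from itertools import count
from typing import Optional

_next_id = count(1)  # JSON-RPC request ids must be unique within a session


def mcp_request(method: str, params: Optional[dict] = None) -> str:
    """Serialize an MCP request as a JSON-RPC 2.0 message."""
    message = {"jsonrpc": "2.0", "id": next(_next_id), "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message)


# Discover what the server exposes before calling anything.
list_tools = mcp_request("tools/list")

# Call one tool; "clone_voice" and its arguments are hypothetical.
call_tool = mcp_request(
    "tools/call",
    {"name": "clone_voice", "arguments": {"text": "Hello", "language": "en"}},
)
```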