Meet OmniVoice Studio: The Local, Open Source Alternative to ElevenLabs

OmniVoice Studio – How to Use it

What is OmniVoice Studio?

OmniVoice Studio is an open source desktop application for voice cloning, video dubbing, real-time dictation, and speaker diarization. Everything runs locally on your device. No API keys, no cloud account, no registration required.

  • TTS in 646 languages with the default OmniVoice engine
  • Transcription in 99 languages with WhisperX
  • Available on macOS, Windows, and Linux
  • GPU is optional – the full pipeline runs on the CPU
  • Free for personal use, teaching, and research (FSL-1.1-ALv2)

System Requirements

GPU is optional. Without one, TTS runs approximately 3× slower on the CPU. With ≤8 GB of VRAM, TTS automatically offloads to the CPU during transcription; no configuration is required.

Component   Minimum                               Recommended
OS          Win 10 / macOS 12+ / Ubuntu 20.04+    Any modern 64-bit OS
RAM         8 GB                                  16 GB+
VRAM        4 GB (auto-offload)                   8 GB+ (RTX 3060+)
Disk        10 GB free                            20 GB+ SSD
Python      3.10+                                 3.11–3.12
GPU         Optional                              CUDA / MPS / ROCm
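
As a rough pre-flight check, the Python and disk minimums from the table can be tested with the standard library alone. This is a minimal sketch, not part of OmniVoice Studio; the function name `check_minimums` is made up for illustration, and RAM and VRAM are left out because the standard library has no portable way to read them.

```python
import shutil
import sys

# Minimums copied from the requirements table; everything else is illustrative.
MIN_PYTHON = (3, 10)
MIN_FREE_DISK_GB = 10


def check_minimums(python_version, free_disk_gb):
    """Return a list of problems; an empty list means both minimums are met."""
    problems = []
    if tuple(python_version[:2]) < MIN_PYTHON:
        problems.append("Python 3.10+ is required")
    if free_disk_gb < MIN_FREE_DISK_GB:
        problems.append("at least 10 GB of free disk space is required")
    return problems


if __name__ == "__main__":
    # Free space on the drive that holds the current directory, in GiB.
    free_gb = shutil.disk_usage(".").free / 1024**3
    for problem in check_minimums(sys.version_info, free_gb):
        print("warning:", problem)
```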

Installation

The project recommends running from source. Install the three prerequisites first: ffmpeg, Bun (JS runtime), and uv (Python package manager).

git clone https://github.com/debpalash/OmniVoice-Studio
cd OmniVoice-Studio
uv sync
bun install
bun dev

The frontend loads locally and the API runs on port 8000.
Model weights are downloaded automatically on the first generation.
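
Before running the commands, it can help to confirm that the three prerequisites are on your PATH. The following is a small sketch using only the standard library; `missing_tools` is a hypothetical helper and not something the project ships.

```python
import shutil

# The three prerequisites named in this section.
REQUIRED_TOOLS = ("ffmpeg", "bun", "uv")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Install these first:", ", ".join(missing))
    else:
        print("All prerequisites found.")
```

The `which` parameter exists only so the lookup can be swapped out in tests.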

Prebuilt installers are available: macOS DMG, Windows MSI, Linux AppImage and .deb — see the Releases page on GitHub.

Voice Cloning

Voice cloning uses zero-shot inference: it reproduces a voice from a clip as short as 3 seconds, without prior training on that voice. The default OmniVoice engine conditions a diffusion-based TTS model on the reference audio.

  • Go to the Voice Clone tab in the UI
  • Upload or record a 3-second audio clip of the target voice
  • Enter your text and select the target language (646 available)
  • Click Generate; the output is saved to your project library
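
If you prepare reference clips outside the app, a quick length check avoids uploading one that is too short. This sketch assumes the clip is an uncompressed PCM WAV file, which is what Python's `wave` module reads; the article does not say which formats the app accepts, and both helper names are illustrative.

```python
import wave

MIN_REFERENCE_SECONDS = 3.0  # clip length quoted in the steps above


def wav_duration_seconds(path):
    """Duration of a PCM WAV file, from its frame count and sample rate."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_long_enough(path, minimum=MIN_REFERENCE_SECONDS):
    """True when the clip is at least `minimum` seconds long."""
    return wav_duration_seconds(path) >= minimum
```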

Voice Gallery: Search YouTube, browse categories, and download reference clips right within the app to build your voice library.

Video Dubbing

The full dubbing pipeline runs locally: transcribe → translate → synthesize → mux. Demucs separates the vocals so that the original background audio is preserved in the final export.

  • Go to the Dub tab and paste a YouTube URL or upload a local file
  • WhisperX transcribes the speech with word-level alignment
  • Select the target language; translation starts automatically
  • The TTS engine re-voices the translated transcript; Demucs keeps the background audio
  • Export the final MP4 with the dubbed audio muxed in
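
The four stages can be pictured as functions that each take a job and hand an enriched job to the next stage. The sketch below is only a model of that flow: the stage bodies are placeholders that record what the real step would produce, and none of the names correspond to OmniVoice Studio, WhisperX, or Demucs APIs.

```python
def run_pipeline(stages, job, on_progress=None):
    """Run each (name, function) stage in order, threading the job dict through."""
    for index, (name, stage) in enumerate(stages, start=1):
        job = stage(job)
        if on_progress is not None:
            on_progress(name, index / len(stages))
    return job


# Placeholder stages; real ones would call the speech and audio models.
def transcribe(job):
    return {**job, "transcript": f"transcript of {job['source']}"}


def translate(job):
    return {**job, "translation": f"{job['transcript']} in {job['target_language']}"}


def synthesize(job):
    return {**job, "speech": f"speech for: {job['translation']}"}


def mux(job):
    # Combine the new speech with the preserved background audio and the video.
    stem = job["source"].rsplit(".", 1)[0]
    return {**job, "output": f"{stem}.{job['target_language']}.mp4"}


DUB_STAGES = [
    ("transcribe", transcribe),
    ("translate", translate),
    ("synthesize", synthesize),
    ("mux", mux),
]

if __name__ == "__main__":
    job = {"source": "talk.mp4", "target_language": "de"}
    result = run_pipeline(DUB_STAGES, job, lambda name, done: print(f"{name}: {done:.0%}"))
    print(result["output"])
```

Keeping each stage a plain function makes it easy to report progress per stage, which is what a per-job progress bar needs.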

Batch queue: Queue up to 50 videos and walk away. Each job has its own progress bar covering the full pipeline.
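
A capped queue with per-job progress might look like the sketch below. It is a toy model under stated assumptions: the class name `BatchQueue`, the callback shape, and the choice to reject the 51st job with an error are all invented for illustration and say nothing about how the app implements its queue.

```python
from collections import deque

MAX_BATCH_SIZE = 50  # limit quoted above


class BatchQueue:
    """FIFO queue of jobs with per-job progress, capped at `limit` pending jobs."""

    def __init__(self, limit=MAX_BATCH_SIZE):
        self.limit = limit
        self.pending = deque()
        self.progress = {}  # source -> fraction complete, 0.0 to 1.0

    def add(self, source):
        if len(self.pending) >= self.limit:
            raise ValueError(f"batch is limited to {self.limit} videos")
        self.pending.append(source)
        self.progress[source] = 0.0

    def run(self, process):
        """Process jobs in order; `process` gets the source and a progress reporter."""
        while self.pending:
            source = self.pending.popleft()

            def report(fraction, source=source):
                self.progress[source] = fraction

            process(source, report)
```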

Dictation & Speaker Diarization

Dictation works system-wide, in any application on your operating system. Diarization identifies individual speakers in a multi-speaker audio file using Pyannote + WhisperX.

  • Press ⌘+⇧+Space (macOS) to open the floating dictation widget
  • The transcript streams via WebSocket and is pasted automatically into the active input field
  • Upload a multi-speaker file to the Diarization tab
  • Pyannote labels who said what; each speaker gets an automatically generated voice profile
  • Assign a TTS voice to each speaker for per-speaker dubbing
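
Combining a diarizer with a word-aligned transcript usually comes down to matching time ranges. The sketch below shows one common approach, assigning each word to the speaker turn it overlaps most. The dictionaries are a hypothetical data shape chosen for the example; they are not the output formats of Pyannote or WhisperX, and the app may combine the two differently.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time ranges (0.0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(words, turns):
    """Label each word with the speaker whose turn overlaps it most, or None."""
    labelled = []
    for word in words:
        best_speaker, best_overlap = None, 0.0
        for turn in turns:
            shared = overlap(word["start"], word["end"], turn["start"], turn["end"])
            if shared > best_overlap:
                best_speaker, best_overlap = turn["speaker"], shared
        labelled.append({**word, "speaker": best_speaker})
    return labelled


if __name__ == "__main__":
    words = [
        {"text": "hello", "start": 0.0, "end": 0.5},
        {"text": "hi", "start": 1.2, "end": 1.5},
    ]
    turns = [
        {"speaker": "SPEAKER_00", "start": 0.0, "end": 1.0},
        {"speaker": "SPEAKER_01", "start": 1.0, "end": 2.0},
    ]
    for word in assign_speakers(words, turns):
        print(word["speaker"], word["text"])
```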

A Hugging Face token is required for Pyannote diarization. See docs/setup/huggingface-token.md in the repo.

TTS engines

Six TTS engines are built in. Switch between them via Settings → TTS Engine or with an environment variable:
OMNIVOICE_TTS_BACKEND=cosyvoice
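
Reading such a variable in Python typically looks like the sketch below. Only the variable name and the value `cosyvoice` come from the article; the fallback identifier `omnivoice` and the lower-casing are assumptions made for the example.

```python
import os

ENV_VAR = "OMNIVOICE_TTS_BACKEND"
DEFAULT_BACKEND = "omnivoice"  # assumed identifier for the default engine


def selected_backend(environ=os.environ):
    """Return the backend name from the environment, falling back to the default."""
    value = environ.get(ENV_VAR, "").strip().lower()
    return value or DEFAULT_BACKEND


if __name__ == "__main__":
    print("TTS backend:", selected_backend())
```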

Engine               Languages         Clone    Platform
OmniVoice (default)  600+                       CUDA / MPS / CPU
CosyVoice 3          9 + 18 dialects            CUDA / MPS / CPU
MLX-Audio            Multiple          Varies   Apple Silicon only
VoxCPM2              30                         CUDA / MPS / CPU
MOSS-TTS-Nano        20                         CUDA / CPU
KittenTTS            English                    CPU only

Custom engine: Subclass TTSBackend in backend/services/tts_backend.py and add it to _REGISTRY. It takes about 50 lines of Python.

MCP Server and Resources

OmniVoice Studio ships with a built-in MCP server, which exposes voice and dubbing capabilities to any MCP-compatible client (Claude, Cursor, or your own tools) without opening the desktop UI.

  • The MCP server starts with the FastAPI backend when you run bun dev
  • Point your MCP client to the local server to access all endpoints
  • AudioSeal (Meta) embeds an invisible neural watermark in every AI-generated audio file
  • GitHub: github.com/debpalash/OmniVoice-Studio
  • Installation docs: docs/install/ (macos/windows/linux/docker)
  • Troubleshooting: docs/install/troubleshooting.md
  • Discord: discord.gg/bzQavDfVV9
