Voice & Audio AI

TTS, speech, music, voice cloning

48 tools · All-time leaderboard

What is Voice & Audio AI?

Voice & Audio AI covers 48 AI tools, with the top 10 averaging 75,747 community votes. 34 of the tools here are open source or have significant community traction. This page ranks them by all-time community votes, not by paid placement.
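The top-10 figure can be checked against the vote counts in the leaderboard. The snippet below is a quick sketch that assumes the site uses a simple arithmetic mean and that the listed counts are its inputs. It also sorts the entries by votes, highest first, which is the ordering this page describes.

```python
# Vote counts for the top 10 entries, copied from the leaderboard on this page.
# Assumption: the "average" is a plain arithmetic mean of these ten numbers.
top10_votes = {
    "yt-dlp": 171_312,
    "transformers": 161_696,
    "whisper": 102_982,
    "Real-Time-Voice-Cloning": 59_924,
    "GPT-SoVITS": 58_804,
    "LocalAI": 48_082,
    "TTS": 45_582,
    "bark": 39_161,
    "Retrieval-based-Voice-Conversion-WebUI": 36_052,
    "diffusers": 33_878,
}

# Rank by all-time votes, highest first.
ranking = sorted(top10_votes.items(), key=lambda kv: kv[1], reverse=True)

total = sum(top10_votes.values())
average = total / len(top10_votes)

print(total)                              # 757473
print(round(average))                     # 75747
print([name for name, _ in ranking[:3]])  # ['yt-dlp', 'transformers', 'whisper']
```

The exact mean is 75,747.3, which matches the rounded figure quoted above.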

The current top three are yt-dlp, transformers, and whisper. Each entry below shows the tool name, a short description from the project's own README, and its vote count (open-source stars or community size). Click through to a full review for pricing, alternatives, and what each tool is actually good at.

Looking for something specific? Try the AI tool search engine. It indexes every tool on saas.pet and surfaces what fits your workflow, not just what has the most votes.

🏆 Top 30 in Voice & Audio AI

yt-dlp

A feature-rich command-line audio/video downloader

★ 171,312 votes

transformers

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

★ 161,696 votes

whisper

Robust Speech Recognition via Large-Scale Weak Supervision

★ 102,982 votes

Real-Time-Voice-Cloning

Clone a voice in 5 seconds to generate arbitrary speech in real-time

★ 59,924 votes

GPT-SoVITS

Just 1 minute of voice data can be used to train a good TTS model! (few-shot voice cloning)

★ 58,804 votes

LocalAI

LocalAI is the open-source AI engine. Run any model - LLMs, vision, voice, image, video - on any hardware. No GPU required.

★ 48,082 votes

TTS

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

★ 45,582 votes

bark

🔊 Text-Prompted Generative Audio Model

★ 39,161 votes

Retrieval-based-Voice-Conversion-WebUI

Easily train a good voice conversion (VC) model with 10 minutes or less of voice data!

★ 36,052 votes

diffusers

🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch.

★ 33,878 votes

datasets

🤗 The largest hub of ready-to-use datasets for AI models with fast, easy-to-use and efficient data manipulation tools

★ 21,792 votes

rasa

💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants

★ 21,218 votes

NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

★ 17,400 votes

OpenMontage

World's first open-source, agentic video production system. 12 pipelines, 52 tools, 500+ agent skills. Turn your AI coding assistant into a full video production studio.

★ 15,353 votes

pipecat

Open-source framework for voice and multimodal conversational AI

★ 12,878 votes

agents

A framework for building realtime voice AI agents 🤖🎙️📹

★ 11,033 votes

meetily

Privacy-first AI meeting assistant with 4x faster Parakeet/Whisper live transcription, speaker diarization, and Ollama summarization, built on Rust. 100% local processing; no cloud required. Meetily (...

★ 8,885 votes

jukebox

Code for the paper "Jukebox: A Generative Model for Music"

★ 8,037 votes

awesome-llm-apps

100+ AI Agent & RAG apps you can actually run: clone, customize, ship.

★ 6,252 votes

silero-models

Silero Models: pre-trained text-to-speech models made embarrassingly simple

★ 5,970 votes

ai-website-cloner-template

Clone any website with one command using AI coding agents

★ 5,624 votes

voicebox

The open-source AI voice studio. Clone, dictate, create.

★ 3,336 votes

STT

🐸STT - The deep learning toolkit for Speech-to-Text. Training and deploying STT models has never been so easy.

★ 2,588 votes

speech-to-speech

Build local voice agents with open-source models

★ 788 votes

maths-cs-ai-compendium

Become a cracked AI/ML Research Engineer

★ 725 votes

SKI

Free voice coding for Claude Code, Codex and more

★ 595 votes

homerail

Voice-first local agent orchestration runtime for auditable DAG workflows.

★ 408 votes

$100 AI Music Video: Claude Fable 5 vs. GPT-5.6 Sol

★ 396 votes

Browser-BC

Agent behavior cloning for browser use, targeting general GUI use and distributed trajectory collection.

★ 354 votes

pocket-tts

A TTS that fits in your CPU (and pocket)

★ 235 votes