Voice & Audio AI

TTS, speech, music, voice cloning

27 tools · All-time leaderboard


πŸ† Top 27 in Voice & Audio AI

🥇
GitHub

yt-dlp

A feature-rich command-line audio/video downloader

★ 171,312 votes
💬 0
🥈
GitHub

transformers

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

★ 161,696 votes
💬 0
🥉
GitHub

whisper

Robust Speech Recognition via Large-Scale Weak Supervision

★ 102,982 votes
💬 0
#4
GitHub

Real-Time-Voice-Cloning

Clone a voice in 5 seconds to generate arbitrary speech in real-time

★ 59,924 votes
💬 0
#5
GitHub

GPT-SoVITS

1 min voice data can also be used to train a good TTS model! (few shot voice cloning)

★ 58,804 votes
💬 0
#6
GitHub

TTS

πŸΈπŸ’¬ - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

★ 45,582 votes
💬 0
#7
GitHub

bark

πŸ”Š Text-Prompted Generative Audio Model

★ 39,161 votes
💬 0
#8
GitHub

Retrieval-based-Voice-Conversion-WebUI

Easily train a good VC model with voice data <= 10 mins!

★ 36,052 votes
💬 0
#9
GitHub

diffusers

🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch.

★ 33,878 votes
💬 0
#10
GitHub

rasa

💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants

★ 21,218 votes
💬 0
#11
GitHub

NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

★ 17,400 votes
💬 0
#12
GitHub

pipecat

Open Source framework for voice and multimodal conversational AI

★ 12,878 votes
💬 0
#13
GitHub

agents

A framework for building realtime voice AI agents 🤖🎙️📹

★ 11,033 votes
💬 0
#14
GitHub

jukebox

Code for the paper "Jukebox: A Generative Model for Music"

★ 8,037 votes
💬 0
#15
GitHub

silero-models

Silero Models: pre-trained text-to-speech models made embarrassingly simple

★ 5,970 votes
💬 0
#16
GitHub

STT

🐸STT - The deep learning toolkit for Speech-to-Text. Training and deploying STT models has never been so easy.

★ 2,588 votes
💬 0
#17
GitHub

server

Music Assistant is a free, opensource Media library manager that connects to your streaming services and a wide range of connected speakers. The server is the beating heart, the core of Music Assistan...

★ 865 votes
💬 0
📅 1d
#18
Product Hunt

Quartz

Quartz turns Gmail into a focused inbox. It sorts every message by importance, and learns what matters to you over time. When you reply, it drafts in your own voice. And the AI runs entirely on your o...

★ 191 votes
💬 45
📅 1d
#19
Product Hunt

Tapfree for Chrome

Typing on the web has not evolved. Tapfree fixes that. Tapfree is a voice-first keyboard for Chrome text fields and ChromeOS that lets you write messages, notes, docs, and emails by speaking naturally...

★ 121 votes
💬 29
📅 1d
#20
Product Hunt

Juno

Juno is a local, open-source voice writing app for Mac. It is the only voice dictation tool with live transcriptions. Speak naturally in Mail, Slack, Notes, Cursor, or the app you’re already using; Ju...

★ 121 votes
💬 21
📅 1d
#21
Product Hunt

VoiceOS

VoiceOS is the universal voice → action for your computer. Eliminates app-hopping, maximizes focus and productivity. Speak naturally, and VoiceOS instantly executes workflows while keeping you in cont...

★ 108 votes
💬 15
📅 1d
#22
Product Hunt

Buddy

Buddy is the most powerful AI design agent inside Figma. And if you already pay for ChatGPT, plug it in and chat for free. No AI credits. Generate screens, flows, and variants on your canvas. Clone an...

★ 107 votes
💬 12
📅 1d
#23
Product Hunt

Tyto by ai-coustics

Tyto is a lightweight model that runs on your audio stream and predicts whether the audio reaching your agent will cause downstream failures. It outputs a single score plus a breakdown across six dime...

★ 89 votes
💬 15
📅 1d
#24
Product Hunt

Labs AI

Labs AI is the fastest way to create professional voiceovers on iPhone. Powered by ElevenLabs: 100+ AI voices, 50+ languages, voice cloning. Use it for YouTube, TikTok, podcasts, e-learning, or any co...

★ 89 votes
💬 2
📅 1d
#25
Product Hunt

MCP 2000

MCP2000 is an AI-powered MPC that lives in your browser. Type what you want to hear ("dusty boom bap kit with crunchy snares," "8-bar afro-house shaker loop at 120 BPM") and it gen...

★ 80 votes
💬 3
📅 1d
#26
GitHub

ChatGPT-Desktop-Free-2026

A comprehensive desktop application package for advanced ChatGPT interactions and management.

★ 74 votes
💬 0
📅 1d
#27
GitHub

LTX-2

Official Python inference and LoRA trainer package for the LTX-2 audio–video generative model.

★ 47 votes
💬 94
📅 1d