A feature-rich command-line audio/video downloader
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
Robust Speech Recognition via Large-Scale Weak Supervision
Clone a voice in 5 seconds to generate arbitrary speech in real-time
1 minute of voice data can also be used to train a good TTS model! (few-shot voice cloning)
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
🔊 Text-Prompted Generative Audio Model
Easily train a good VC model with voice data <= 10 mins!
🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch.
💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Open Source framework for voice and multimodal conversational AI
A framework for building realtime voice AI agents 🤖🎙️📹
Code for the paper "Jukebox: A Generative Model for Music"
Silero Models: pre-trained text-to-speech models made embarrassingly simple
🐸STT - The deep learning toolkit for Speech-to-Text. Training and deploying STT models has never been so easy.
Music Assistant is a free, open-source media library manager that connects to your streaming services and a wide range of connected speakers. The server is the beating heart, the core of Music Assistan...
Quartz turns Gmail into a focused inbox. It sorts every message by importance, and learns what matters to you over time. When you reply, it drafts in your own voice. And the AI runs entirely on your o...
Typing on the web has not evolved. Tapfree fixes that. Tapfree is a voice-first keyboard for Chrome text fields and ChromeOS that lets you write messages, notes, docs, and emails by speaking naturally...
Juno is a local, open-source voice writing app for Mac. It is the only voice dictation tool with live transcriptions. Speak naturally in Mail, Slack, Notes, Cursor, or the app you're already using; Ju...
VoiceOS is the universal voice → action for your computer. Eliminates app-hopping, maximizes focus and productivity. Speak naturally, and VoiceOS instantly executes workflows while keeping you in cont...
Buddy is the most powerful AI design agent inside Figma. And if you already pay for ChatGPT, plug it in and chat for free. No AI credits. Generate screens, flows, and variants on your canvas. Clone an...
Tyto is a lightweight model that runs on your audio stream and predicts whether the audio reaching your agent will cause downstream failures. It outputs a single score plus a breakdown across six dime...
Labs AI is the fastest way to create professional voiceovers on iPhone. Powered by ElevenLabs: 100+ AI voices, 50+ languages, voice cloning. Use it for YouTube, TikTok, podcasts, e-learning, or any co...
MCP2000 is an AI-powered MPC that lives in your browser. Type what you want to hear ("dusty boom bap kit with crunchy snares," "8-bar afro-house shaker loop at 120 BPM") and it gen...
A comprehensive desktop application package for advanced ChatGPT interactions and management.
Official Python inference and LoRA trainer package for the LTX-2 audio-video generative model.