Data, ML & Research

Training, fine-tuning, datasets, MLOps

87 tools · All-time leaderboard

What is Data, ML & Research?

Data, ML & Research covers 87 AI tools, with the top 10 averaging 106,258 community votes. 74 of the tools here are open source or have significant community traction. This page ranks them by all-time community votes, not by paid placement.
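As a quick sanity check, the top-10 average quoted above can be reproduced from the vote counts shown in the leaderboard on this page. The sketch below is illustrative only: the dictionary and the helper name average_votes are assumptions made for this example, not part of any saas.pet API, and the figures are copied by hand from the first ten entries.

```python
# Vote counts for the first ten leaderboard entries, copied from this page.
top10_votes = {
    "tensorflow": 195_729,
    "ollama": 174_448,
    "transformers": 161_696,
    "llama.cpp": 117_132,
    "pytorch": 100_805,
    "scikit-learn": 66_360,
    "keras": 64_096,
    "openinterpreter": 64_041,
    "llama": 59_467,
    "GPT-SoVITS": 58_804,
}


def average_votes(votes):
    """Return the mean vote count, rounded to the nearest whole vote.

    Hypothetical helper for illustration; not a saas.pet function.
    """
    return round(sum(votes.values()) / len(votes))


# Sort by votes, highest first, which is the ordering the page describes.
ranked = sorted(top10_votes, key=top10_votes.get, reverse=True)

print(ranked[:3])                  # the three highest-voted tools
print(average_votes(top10_votes))  # mean of the ten counts above
```

Sorting by the raw count and taking a plain mean matches the "all-time community votes" ranking described here; any weighting by recency or category would be a different metric.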

The current top three: tensorflow, ollama, transformers. Each entry below shows the tool, its vote count (open-source stars or community size), and a short description from the project's own README. Click through to a full review for pricing, alternatives, and what it's actually good at.

Looking for something specific? Try the AI tool search engine: it indexes every tool on saas.pet and surfaces what fits your workflow, not just what has the most votes.

🏆 Top 30 in Data, ML & Research

tensorflow

An Open Source Machine Learning Framework for Everyone

★ 195,729 votes

ollama

Get up and running with Kimi-K2.6, GLM-5.1, MiniMax, DeepSeek, gpt-oss, Qwen, Gemma and other models.

★ 174,448 votes

transformers

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

★ 161,696 votes

llama.cpp

LLM inference in C/C++

★ 117,132 votes

pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration

★ 100,805 votes

scikit-learn

scikit-learn: machine learning in Python

★ 66,360 votes

keras

Deep Learning for humans

★ 64,096 votes

openinterpreter

A lightweight coding agent for open models like Deepseek, Kimi, and Qwen

★ 64,041 votes

llama

Inference code for Llama models

★ 59,467 votes

GPT-SoVITS

1 min voice data can also be used to train a good TTS model! (few shot voice cloning)

★ 58,804 votes

LocalAI

LocalAI is the open-source AI engine. Run any model - LLMs, vision, voice, image, video - on any hardware. No GPU required.

★ 48,082 votes

TTS

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

★ 45,582 votes

bert

TensorFlow code and pre-trained models for BERT

★ 40,030 votes

quivr

Opinionated RAG for integrating GenAI in your apps 🧠 Focus on your product rather than the RAG. Easy integration in existing products with customisation! Any LLM: GPT4, Groq, Llama. Any Vectorstore:...

★ 39,166 votes

Langchain-Chatchat

Langchain-Chatchat (formerly Langchain-ChatGLM): RAG and Agent applications based on Langchain and language models such as ChatGLM, Qwen, and Llama | Langchain-Chatchat (formerly langchain-ChatGLM), local knowledge based LLM (like ChatGLM, Qwen and Ll...

★ 38,187 votes

Retrieval-based-Voice-Conversion-WebUI

Easily train a good VC model with voice data <= 10 mins!

★ 36,052 votes

khoj

Your AI second brain. Self-hostable. Get answers from the web or your docs. Build custom agents, schedule automations, do deep research. Turn any online or local LLM into your personal, autonomous AI ...

★ 35,190 votes

diffusers

🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch.

★ 33,878 votes

pytorch-lightning

Pretrain, finetune ANY AI model of ANY size on 1 or 10,000+ GPUs with zero code changes.

★ 31,191 votes

llama3

The official Meta Llama 3 GitHub site

★ 29,287 votes

repomix

📦 Repomix is a powerful tool that packs your entire repository into a single, AI-friendly file. Perfect for when you need to feed your codebase to Large Language Models (LLMs) or other AI tools like ...

★ 27,553 votes

awesome-generative-ai-guide

A one stop repository for generative AI research updates, interview resources, notebooks and much more!

★ 27,324 votes

mlflow

The open source AI engineering platform for agents, LLMs, and ML models. MLflow enables teams of all sizes to debug, evaluate, monitor, and optimize production-quality AI applications while controllin...

★ 27,313 votes

Qwen3

Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud.

★ 27,310 votes

datasets

🤗 The largest hub of ready-to-use datasets for AI models with fast, easy-to-use and efficient data manipulation tools

★ 21,792 votes

Qwen

The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.

★ 21,307 votes

rasa

💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants

★ 21,218 votes

trl

Train transformer language models with reinforcement learning.

★ 18,665 votes

llama-cookbook

Welcome to the Llama Cookbook! This is your go to guide for Building with Llama: Getting started with Inference, Fine-Tuning, RAG. We also show you how to solve end to end problems using Llama model f...

★ 18,544 votes

DocsGPT

Private AI platform for agents, assistants and enterprise search. Built-in Agent Builder, Deep research, Document analysis, Multi-model support, and API connectivity for agents.

★ 18,186 votes