AI Dataset Trust & Quality Directory 2026

739,616 AI datasets indexed with trust scores. Check quality, licensing, and provenance before training.

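| Dataset | Trust | Downloads | License |
|---|---|---|---|
| documentation-images | 60 | 1.9M | |
| PhysicalAI-Robotics-GR00T-X-Embodiment-Sim | 63 | 1.2M | |
| mbpp | 59 | 911.7K | |
| code_contests | 58 | 860.7K | |
| wikitext | 59 | 850.4K | |
| code_clippy_github | 61 | 697.3K | |
| objaverse | 59 | 560.6K | |
| huginn-dataset | 62 | 501.6K | |
| gsm8k | 60 | 462.1K | |
| PhysicalAI-Autonomous-Vehicles | 59 | 380.3K | |
| ai2_arc | 62 | 283.8K | |
| MADLAD-400 | 62 | 258.8K | |
| fineweb-edu | 59 | 254.3K | |
| fineweb | 60 | 194.3K | |
| LLaVA-OneVision-1.5-Mid-Training-85M | 61 | 175.0K | |
| swallow-code-v2 | 62 | 174.3K | |
| openai_humaneval | 59 | 174.1K | |
| fractal20220817_data_lerobot | 62 | 125.9K | |
| MegaMath | 62 | 106.2K | |
| ShareGPT_Vicuna_unfiltered | 61 | 105.3K | |
| WildChat-1M | 60 | 98.9K | |
| MATH-500 | 60 | 97.6K | |
| FineVision | 60 | 97.2K | |
| legalbench | 56 | 95.8K | |
| imagenet-1k | 57 | 95.7K | |
| images | 59 | 95.0K | |
| openbookqa | 57 | 85.8K | |
| LLaVA-OneVision-1.5-Instruct-Data | 59 | 83.6K | |
| librispeech_asr | 56 | 73.5K | |
| wikipedia | 60 | 72.4K | |
| fineweb-2 | 58 | 72.4K | |
| ag_news | 56 | 69.1K | |
| xP3 | 60 | 69.0K | |
| OpenThoughts-114k | 61 | 67.2K | |
| IFEval | 57 | 62.1K | |
| common_corpus | 57 | 60.0K | |
| PhysicalAI-SmartSpaces | 59 | 59.1K | |
| Emilia-Dataset | 57 | 58.8K | |
| seamless-interaction | 60 | 58.5K | |
| us_election_2024_telegram_distilled | 59 | 58.1K | |
| Fineweb-Edu-Chinese-V2.1 | 57 | 55.8K | |
| Fineweb-Edu-Chinese-V2.2 | 57 | 55.7K | |
| the_cauldron | 57 | 48.5K | |
| xP3all | 59 | 48.1K | |
| yodas2_sidon | 59 | 47.0K | |
| databricks-dolly-15k-curated-en | 59 | 46.7K | |
| finetranslations | 57 | 45.6K | |
| VoiceAssistant-Eval | 59 | 45.0K | |
| sciq | 57 | 42.6K | |
| UltraData-Math | 57 | 40.8K | |
| CulturaY | 59 | 38.3K | |
| Nemotron-CC-v2 | 57 | 38.1K | |
| FineVisionMax | 59 | 37.8K | |
| aime_2024 | 59 | 34.5K | |
| finepdfs | 58 | 34.4K | |
| aime25 | 59 | 34.2K | |
| ultrachat_200k | 61 | 33.8K | |
| Leopard-Instruct | 55 | 32.9K | |
| DeepDialogue-xtts | 59 | 32.4K | |
| Taur_CoT_Analysis_Project___gpt-4o-2024-08-06 | 59 | 32.2K | |
| gdpval | 57 | 32.1K | |
| wikipedia | 58 | 31.1K | |
| PIN-200M | 59 | 31.0K | |
| peoples_speech | 56 | 30.1K | |
| emotion | 54 | 30.0K | |
| PhysicalAI-Autonomous-Vehicle-Cosmos-Drive-Dreams | 59 | 29.5K | |
| image-segmentation-toy-data | 57 | 29.4K | |
| autonomous-driving-carla | 51 | 28.2K | |
| Superior-Reasoning-SFT-gpt-oss-120b | 59 | 27.8K | |
| fleurs | 57 | 26.7K | |
| github-code | 57 | 26.0K | |
| diffusiondb | 58 | 25.8K | |
| tulu-3-sft-mixture | 57 | 24.6K | |
| P3 | 60 | 23.0K | |
| Superior-Reasoning-SFT-gpt-oss-120b-Logprob | 59 | 22.6K | |
| smollm-corpus | 57 | 22.1K | |
| RoboTwin2.0 | 59 | 20.9K | |
| LLaSO-Align | 59 | 20.4K | |
| PhysicalAI-Robotics-Manipulation-SingleArm | 59 | 19.9K | |
| R1-Code-Interpreter-Data | 59 | 19.8K | |
| prompts.chat | 61 | 19.0K | |
| LLaSO-Instruct | 59 | 18.2K | |
| OpenMathInstruct-2 | 57 | 17.8K | |
| rStar-Coder | 57 | 17.7K | |
| EmbedLLM | 59 | 17.6K | |
| BenchMAX_Science | 59 | 17.0K | |
| robocasa_merged_24_tasks_100demos_v1 | 55 | 16.9K | |
| MetaMathQA | 56 | 16.9K | |
| Fashion_Recommender | 59 | 16.8K | |
| apex-agents | 59 | 16.1K | |
| chinese-fineweb-edu | 57 | 16.0K | |
| Ko-StrategyQA | 59 | 15.9K | |
| databricks-dolly-15k | 60 | 15.5K | |
| finance-tasks | 59 | 15.3K | |
| food101 | 56 | 15.3K | |
| math_qa | 60 | 15.0K | |
| LLaVA-Video-178K | 60 | 14.9K | |
| Nanochat | 59 | 14.7K | |
| ms_marco | 57 | 14.6K | |
| OpenMathReasoning | 55 | 14.5K | |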