Code Agent Data Explosion
Embodied AI Data Standards Rise
This week's scan: 86 HF orgs · 50 GitHub orgs · 71 blogs · 125 X accounts
Highlights: NVIDIA's full-stack embodied AI data pipeline, Allen AI's release of a cluster of Molmo2 video understanding datasets, and a surge in reward model / RLHF papers. Top data demand signal this week: robotics manipulation data.
Key Findings
This week's 5 findings with high commercial value
NVIDIA released or updated 7 datasets and 26 models in a single week, making it the most active organization. The datasets fall into two areas. Robotics simulation: `nvidia/PhysicalAI-Robotics-Kitchen-Sim-Demos` (2/10), `nvidia/RoboCasa-Cosmos-Policy`, and `nvidia/LIBERO-Cosmos-Policy`, all serving the Cosmos Policy project and closing the loop from simulation to policy learning. Speech TN/ITN (text normalization / inverse text normalization): `nvidia/Numb3rs` (2/6), a benchmark for normalizing spoken numbers.
Allen AI released 4 video-related datasets: `Molmo2-VideoPoint`, `Molmo2-VideoPointEval`, `Molmo2-VideoCountEval`, and `Molmo2-CapEval`, which together form a complete evaluation system for video grounding, counting, and captioning. It also released two utility datasets: `pointer-retrieval` (2/10, new) and `asta-summary-citation-counts`.
8 RLHF / preference learning papers appeared this week. Key trends: `compar:IA` (2/6), a French government-level LLM arena collecting French preference data, signals that multilingual RLHF data demand has reached the national level; `WildReward` (2/9) mines implicit reward signals from online interactions, reducing human labeling costs; `Fairness Aware Reward Optimization` (2/8) addresses demographic biases that propagate through reward models, creating demand for fairness labeling; `Joint Reward Modeling` (2/7) covers visual reward models for image editing, expanding multimodal RLHF data demand.
StepFun released the `Step-3.5-Flash` model (249K downloads, 560 likes) alongside two benchmarks: `stepfun-ai/GEBench` (2/9), a GUI interaction generation evaluation benchmark, and `stepfun-ai/CF-Div2-Stepfun` (2/9), a competitive programming evaluation benchmark.
GPT-5.3-Codex launched (2/5), focused on code generation; the OpenAI blog announced tests of advertising in ChatGPT (2/10); and the `openai/gdpval` dataset stayed active (28,361 downloads). It evaluates AI performance across 44 occupations and 220 real-world tasks.
Demand Signals
Training data demand inferred from model releases
Download Movers
Datasets with the largest download changes this week
| Dataset | Downloads | Weekly Growth |
|---|---|---|
| nvidia/RoboCasa-Cosmos-Policy | 1,332 | +39.6% |
| Qwen/RationaleRM | 881 | +16.8% |
| nvidia/HiLiftAeroML | 992 | +16.2% |
| google/WaxalNLP | 7,465 | +2.6% |
| nvidia/LIBERO-Cosmos-Policy | 2,221 | +2.2% |
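The table reports relative growth only, so a small dataset can top the list while adding few downloads. The sketch below backs out an approximate previous-week count and absolute gain for each row. It assumes weekly growth is defined as (current − previous) / previous, which the report does not state; the `Mover` class and helper names are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mover:
    dataset: str
    downloads: int      # current download count, copied from the table
    growth_pct: float   # weekly growth in percent, copied from the table


def weekly_growth_pct(previous: float, current: float) -> float:
    """Relative growth in percent (ASSUMED definition, not confirmed by the report)."""
    if previous <= 0:
        raise ValueError("previous must be positive")
    return (current - previous) / previous * 100.0


def implied_previous(downloads: int, growth_pct: float) -> float:
    """Invert the assumed growth formula to estimate last week's count."""
    return downloads / (1.0 + growth_pct / 100.0)


def absolute_gain(m: Mover) -> float:
    """Estimated number of downloads added this week."""
    return m.downloads - implied_previous(m.downloads, m.growth_pct)


# Rows copied from the Download Movers table above.
MOVERS = [
    Mover("nvidia/RoboCasa-Cosmos-Policy", 1332, 39.6),
    Mover("Qwen/RationaleRM", 881, 16.8),
    Mover("nvidia/HiLiftAeroML", 992, 16.2),
    Mover("google/WaxalNLP", 7465, 2.6),
    Mover("nvidia/LIBERO-Cosmos-Policy", 2221, 2.2),
]

# Re-rank by estimated absolute gain instead of relative growth.
BY_ABSOLUTE_GAIN = sorted(MOVERS, key=absolute_gain, reverse=True)

if __name__ == "__main__":
    for m in BY_ABSOLUTE_GAIN:
        prev = implied_previous(m.downloads, m.growth_pct)
        print(f"{m.dataset:32s} prev~{prev:7.0f}  gained~{absolute_gain(m):5.0f}")
```

Under that assumption, `nvidia/RoboCasa-Cosmos-Policy` still leads, but `google/WaxalNLP` moves up to second place by estimated absolute gain despite its +2.6% growth, because its base is much larger. The estimates are approximate, since the published percentages are rounded.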
Deep Dive — DataRecipe
Reverse analysis of this week's 3 high-value datasets (auto-generated by DataRecipe)
3 datasets analyzed this week · 83.9% human labor share · All Hard difficulty
Auto-generated by AI Dataset Radar · Updated weekly