Video Understanding Data Surges
RLHF Enters the Multimodal Era
This week scanned 86 HF orgs · 50 GitHub orgs · 71 blogs · 125 X accounts
NVIDIA goes all-in on an embodied AI data pipeline, Allen AI releases its Molmo2 cluster of video understanding datasets, and Reward Model / RLHF papers surge. Strongest data demand signal this week: robotic manipulation data.
Key Findings
This week's 5 findings with high commercial value
NVIDIA released or updated 7 datasets and 26 models this week, making it the most active of all organizations. The datasets focus on two directions. Robotics simulation: `nvidia/PhysicalAI-Robotics-Kitchen-Sim-Demos` (2/10), `nvidia/RoboCasa-Cosmos-Policy`, and `nvidia/LIBERO-Cosmos-Policy`, all serving the Cosmos Policy project and closing the loop from simulation to policy learning. Speech TN/ITN (text normalization / inverse text normalization): `nvidia/Numb3rs` (2/6), a benchmark for normalizing spoken numerals.
Allen AI released 4 video-related datasets: `Molmo2-VideoPoint`, `Molmo2-VideoPointEval`, `Molmo2-VideoCountEval`, `Molmo2-CapEval`, forming a complete video grounding + counting + captioning evaluation framework. Additionally, `pointer-retrieval` (new on 2/10) and `asta-summary-citation-counts` serve as utility datasets.
Eight RLHF / preference learning papers appeared this week. Key trends: `compar:IA` (2/6), a French government-level LLM arena collecting French preference data, which shows that demand for multilingual RLHF data has reached the national level; `WildReward` (2/9), which mines implicit reward signals from online interactions to reduce human labeling costs; `Fairness Aware Reward Optimization` (2/8), which shows that demographic biases propagate through reward models, creating demand for fairness labeling; `Joint Reward Modeling` (2/7), which builds visual reward models for image editing, expanding demand for multimodal RLHF data
StepFun released the `Step-3.5-Flash` model (249K downloads, 560 likes), along with: `stepfun-ai/GEBench` (2/9) — GUI interaction generation evaluation benchmark; `stepfun-ai/CF-Div2-Stepfun` (2/9) — competitive programming evaluation benchmark
GPT-5.3-Codex went live (2/5), focused on code generation; OpenAI's blog announced testing ChatGPT ads (2/10); the `openai/gdpval` dataset is active (28,361 downloads) — evaluating AI performance across 44 professions and 220 real-world tasks
Demand Signals
Training data demand inferred from model releases
Download Movers
Datasets with the largest download changes this week
| Dataset | Downloads | Weekly Growth |
|---|---|---|
| nvidia/RoboCasa-Cosmos-Policy | 1,332 | +39.6% |
| Qwen/RationaleRM | 881 | +16.8% |
| nvidia/HiLiftAeroML | 992 | +16.2% |
| google/WaxalNLP | 7,465 | +2.6% |
| nvidia/LIBERO-Cosmos-Policy | 2,221 | +2.2% |
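The table reports relative growth only. As a rough sketch, the previous week's counts can be back-computed from it, assuming (this is not stated in the report) that "Downloads" is the current value of a counter and "Weekly Growth" is its change relative to the same counter one week earlier. The function names below are illustrative, and the results are estimates limited by the rounding of the published percentages.

```python
# Rows copied from the Download Movers table: (dataset, downloads, weekly growth in %).
MOVERS = [
    ("nvidia/RoboCasa-Cosmos-Policy", 1332, 39.6),
    ("Qwen/RationaleRM", 881, 16.8),
    ("nvidia/HiLiftAeroML", 992, 16.2),
    ("google/WaxalNLP", 7465, 2.6),
    ("nvidia/LIBERO-Cosmos-Policy", 2221, 2.2),
]


def weekly_growth(current, previous):
    """Relative change in percent: (current - previous) / previous * 100."""
    return (current - previous) / previous * 100


def implied_previous(downloads, growth_pct):
    """Invert weekly_growth to estimate last week's count (assumed definition)."""
    return downloads / (1 + growth_pct / 100)


if __name__ == "__main__":
    for name, downloads, growth in MOVERS:
        prev = implied_previous(downloads, growth)
        gain = downloads - prev
        print(f"{name}: about {prev:.0f} last week, about +{gain:.0f} downloads")
```

Because the table is sorted by relative growth, a dataset with a large base can add more downloads in absolute terms than one ranked above it, so both views are worth checking.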
Deep Dive — DataRecipe
Reverse analysis of this week's 3 high-value datasets (auto-generated by DataRecipe)
Data Structure
Risk Assessment
Analyzed 3 datasets this week · 83.9% human effort · all Hard difficulty
Auto-generated by AI Dataset Radar · Updated weekly