NVIDIA Releases 600-Hour Robotic Manipulation Dataset
AI Data Intelligence Weekly
This week scanned 86 HF orgs · 50 GitHub orgs · 71 blogs · 125 X accounts
NVIDIA releases a 600-hour robotic manipulation dataset as demand for physical AI data surges [P0]; Allen AI releases citation tracking data from its research assistant as agent tool data becomes a new hotspot [P0]; Anthropic releases an economic impact index dataset as AI application evaluation emerges as a new demand [P1]. This week's strongest data demand signal: robotic manipulation trajectories.
Key Findings
This week's 5 findings with high commercial value
NVIDIA released the PhysicalAI-Robotics-Kitchen-Sim-Demos dataset on 2026-02-10. It contains 600 hours of human teleoperation demonstrations covering 316 different tasks, with 55k trajectories in total. Together with the concurrently released PhysicalAI-Robotics-NuRec (50 likes) and Arena-GR1-Manipulation datasets, it forms a complete robotic training data ecosystem.
The allenai/asta-summary-citation-counts dataset (released 2025-10-08, 456 downloads) tracks the most cited papers on the Asta research platform, reflecting AI Agent knowledge preferences in actual usage. This is the first public "Agent usage behavior" dataset.
Anthropic/EconomicIndex (released 2025-02-06, 11,995 downloads, 473 likes) provides insight into how AI is being integrated into real tasks across the modern economy, including labor market impact and job exposure analysis. This is the first public dataset to systematically evaluate AI's economic impact.
google/WaxalNLP (released 2026-01-19, 10,345 downloads) is a large-scale multilingual speech corpus supporting automatic speech recognition and text-to-speech tasks. The dataset uses the cc-by-sa-4.0 license and reflects a focus on low-resource languages.
Five RLHF papers were published this week, including ActiveUltraFeedback (active learning for optimizing preference data collection), wDPO (robust preference optimization), and DARC (divergence-aware alignment). The research focus has shifted from "how to align" to "how to efficiently obtain high-quality preference data".
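For a sense of scale, the NVIDIA figures above can be turned into per-trajectory averages. The sketch below is back-of-the-envelope arithmetic only: it assumes "55k" means exactly 55,000 trajectories and that the 600 hours are spread across all of them, neither of which the release summary confirms. The function names are illustrative.

```python
# Figures quoted in the NVIDIA finding above.
HOURS = 600
TASKS = 316
TRAJECTORIES = 55_000  # assumption: "55k" treated as exactly 55,000


def mean_trajectory_seconds(hours: float, trajectories: int) -> float:
    """Average demonstration length in seconds."""
    return hours * 3600 / trajectories


def mean_trajectories_per_task(trajectories: int, tasks: int) -> float:
    """Average number of demonstrations recorded per task."""
    return trajectories / tasks


if __name__ == "__main__":
    print(f"{mean_trajectory_seconds(HOURS, TRAJECTORIES):.1f} s per trajectory")
    print(f"{mean_trajectories_per_task(TRAJECTORIES, TASKS):.0f} trajectories per task")
```

Under these assumptions the averages come out to roughly 39 seconds per trajectory and about 174 trajectories per task; the real distribution across tasks may be far from uniform.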
Demand Signals
Training data demands inferred from model releases
Download Movers
Datasets with the largest download changes this week
| Dataset | Downloads | Weekly Growth |
|---|---|---|
| lerobot/berkeley_cable_routing | 1,784 | +19.9% |
| lerobot/aloha_static_fork_pick_up | 1,249 | +12.9% |
| google/WaxalNLP | 10,345 | +2.3% |
| Anthropic/EconomicIndex | 11,995 | +1.4% |
| lerobot/berkeley_gnm_recon | 1,194 | -25.6% |
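The table reports current downloads and weekly growth but not last week's counts. As a minimal sketch, the previous figures can be backed out if growth is assumed to be defined as (current - previous) / previous; the newsletter does not state its formula, so this definition and the helper names below are assumptions.

```python
# (current downloads, weekly growth in percent), copied from the table above.
MOVERS = {
    "lerobot/berkeley_cable_routing": (1784, 19.9),
    "lerobot/aloha_static_fork_pick_up": (1249, 12.9),
    "google/WaxalNLP": (10345, 2.3),
    "Anthropic/EconomicIndex": (11995, 1.4),
    "lerobot/berkeley_gnm_recon": (1194, -25.6),
}


def weekly_growth_pct(current: float, previous: float) -> float:
    """Growth relative to last week, in percent (assumed definition)."""
    return (current - previous) / previous * 100


def implied_previous(current: float, growth_pct: float) -> float:
    """Invert the assumed growth definition to estimate last week's count."""
    return current / (1 + growth_pct / 100)


def implied_previous_table(movers: dict) -> dict:
    """Estimated previous-week downloads for every dataset, rounded."""
    return {
        name: round(implied_previous(current, growth))
        for name, (current, growth) in movers.items()
    }


if __name__ == "__main__":
    for name, previous in implied_previous_table(MOVERS).items():
        current, _ = MOVERS[name]
        print(f"{name}: ~{previous} -> {current} ({current - previous:+d})")
```

Because the published percentages are rounded to one decimal place, the recovered counts are estimates. Viewing absolute deltas alongside percentages also helps keep small-base datasets from dominating the ranking.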
Deep Dive — DataRecipe
Reverse analysis of this week's 3 high-value datasets (auto-generated by DataRecipe)
Data Structure
Risk Assessment
Datasets analyzed this week: 3 · Human ratio: 99.6%
Auto-generated by AI Dataset Radar · Updated weekly