Robotics VLA Foundation Models Surge
Chinese LLM Alignment Demand Accelerates
This week scanned 86 HF orgs · 50 GitHub orgs · 71 blogs · 125 X accounts
VLA/robotics foundation model papers surged, with 4 appearing in a single week, and sim-to-real transfer has become the core bottleneck. TII (UAE) released 4 evaluation datasets, bringing Middle Eastern AI into the competition over multilingual evaluation standards. Qwen 3.5, GLM-4.6V, Ling-2.5-1T, and MiniMax-2.5 show scale competition and ecosystem expansion accelerating in parallel. Top data demand signal this week: Robotics VLA Trajectory Data.
Key Findings
This week's 5 findings with high commercial value
Four high-quality embodied AI papers appeared this week: GeneralVLA (2026-02-04, general VLA model + knowledge-guided trajectory planning), ABot-M0 (2026-02-11, robotics VLA foundation model + action manifold learning), RLinf-Co (2026-02-13, reinforcement learning-driven sim-real co-training), and EgoHumanoid (2026-02-10, robot-free, first-person-perspective whole-body motion control). All 4 papers converge on the same core problem: how to achieve effective sim-to-real transfer using Vision-Language-Action (VLA) architectures. Following last week's expansion of embodied AI data from NVIDIA PhysicalAI and Allen AI MolmoSpaces, the focus this week shifts from "data supply" to "methodological breakthroughs."
The UAE's Technology Innovation Institute (TII) released 4 datasets this week: tiiuae/NativeQA (evaluation, 16 downloads, 2 likes), tiiuae/NativeQA-RDP (evaluation, 22 downloads), tiiuae/SyntheticQA (synthetic, 30 downloads, 2 likes), and tiiuae/evalplus-arabic (Arabic code evaluation, 46 downloads, 1 like). NativeQA and NativeQA-RDP focus on native-language QA evaluation, evalplus-arabic extends code evaluation to Arabic, and SyntheticQA provides a synthetic QA baseline. Together, the 4 datasets form a complete evaluation matrix: "native language + synthetic control + code evaluation."
Four significant events in Chinese LLMs this week: the Reddit community confirms that the Qwen 3.5 release is imminent (80 upvotes); Zhipu AI officially open-sources GLM-4.6V, positioned as "the world's best open-source visual reasoning model at the 100B level"; the trillion-parameter model inclusionAI/Ling-2.5-1T is listed on HuggingFace (69 upvotes); and MiniMax-2.5 can now be run locally (389 upvotes, among the week's top Reddit AI topics). Meanwhile, the Qwen ecosystem continues to expand, with four product lines advancing simultaneously: Qwen3Guard (real-time token safety filtering), GSPO (scalable RL training), Qwen-Image-Edit (image editing), and Qwen-MT (multilingual translation).
The paper "Detecting RLVR Training Data via Structural Convergence of Reasoning" (2026-02-12) proposes detecting whether a model was trained on specific RL data by measuring the structural convergence of its reasoning. This is academia's first systematic study on reverse-engineering RL training data sources from model outputs. Concurrently, the P-GenRM (personalized generative reward model) and GSPO (scalable RL training) papers continue to push the boundaries of RL/RLHF methodology.
Allen AI released allenai/asta-summary-citation-counts (agent_tool, 308 downloads, 7 likes), a dataset tracking the papers most cited by Asta, an agentic research RAG platform, along with their citation counts. This is the first case of converting AI agent information-retrieval behavior into a structured dataset. Meanwhile, allenai/molmospaces maintains 24.8% weekly growth (117 to 146 downloads) as the open embodied AI ecosystem continues to expand.
Demand Signals
Training data demand inferred from model releases
Download Movers
Datasets with the largest download changes this week
| Dataset | Downloads | Weekly Growth |
|---|---|---|
| allenai/molmospaces | 146 | +24.8% |
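
The Weekly Growth column appears to be a plain percentage change between two weekly download counts. A minimal sketch of that arithmetic follows, using the 117 and 146 download figures reported for allenai/molmospaces in this issue. The function name `weekly_growth_pct` is hypothetical and is not part of AI Dataset Radar.

```python
def weekly_growth_pct(previous: int, current: int) -> float:
    """Percentage change in downloads from the previous week to the current one."""
    if previous <= 0:
        # Growth relative to zero (or negative) downloads is undefined.
        raise ValueError("previous downloads must be positive")
    return (current - previous) / previous * 100


# Figures for allenai/molmospaces as reported in this issue: 117 -> 146 downloads.
growth = weekly_growth_pct(117, 146)
print(f"{growth:+.1f}%")  # prints "+24.8%"
```

At download counts this small, a change of a few dozen downloads produces a large percentage swing, so the growth figure is best read alongside the absolute numbers.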
Deep Dive — DataRecipe
Reverse analysis of this week's 2 high-value datasets (auto-generated by DataRecipe)
Data Structure
Risk Assessment
2 datasets analyzed this week · 83.9% human labor share · All Medium difficulty
Auto-generated by AI Dataset Radar · Updated weekly