Radar Brief Week 11, 2026 · 2026-02-09 — 2026-02-16

Robotics VLA Foundation Models Surge
Chinese LLM Alignment Demand Accelerates

This week scanned 86 HF orgs · 50 GitHub orgs · 71 blogs · 125 X accounts

One-line Summary

VLA/robotics foundation model papers surge with 4 in a single week, making sim-to-real transfer the core bottleneck; TII UAE releases 4 evaluation datasets as Middle Eastern AI enters the competition over multilingual evaluation standards; Qwen 3.5, GLM-4.6V, Ling-2.5-1T, and MiniMax-2.5 arrive together, with scale competition and ecosystem expansion accelerating in parallel. Top data demand signal this week: Robotics VLA Trajectory Data.

Key Findings

This week's 5 findings with high commercial value

P0 VLA/Robotics Foundation Model Papers Surge: 4 in a Single Week, Sim-to-Real Transfer Becomes Core Bottleneck (2026-02-04 to 2026-02-13)

Four high-quality embodied AI papers appeared this week: GeneralVLA (2026-02-04, general VLA model + knowledge-guided trajectory planning), EgoHumanoid (2026-02-10, whole-body motion control learned from robot-free first-person video), ABot-M0 (2026-02-11, robotics VLA foundation model + action manifold learning), and RLinf-Co (2026-02-13, reinforcement learning-driven sim-real co-training). All four converge on the same core problem: how to achieve effective sim-to-real transfer with Vision-Language-Action (VLA) architectures. Last week's trend was embodied AI data expansion (NVIDIA PhysicalAI, Allen AI MolmoSpaces); this week the focus shifts from "data supply" to "methodological breakthroughs."

Business implications:
1. Sim-real paired data becomes essential: RLinf-Co explicitly proposes sim-real co-training, which requires paired trajectory data for the same task in both simulated and real environments. Such data has virtually no public supply, which leaves an open opportunity for data service providers.
2. First-person robotics data as a new category: EgoHumanoid trains whole-body motion control from robot-free, first-person human video, meaning "human daily activity video" can be converted directly into robotics training data. Collection costs may drop dramatically, but the annotation barrier (action decomposition, joint mapping) is extremely high.
3. VLA models demand extreme data diversity: GeneralVLA emphasizes generalization through knowledge guidance, and ABot-M0 introduces action manifold learning. Both need diverse trajectory data covering many different objects, scenes, and operations. Single-scene datasets have limited value; cross-scene generalization data becomes critical.

P0 TII UAE Releases 4 Evaluation Datasets, Middle Eastern AI Enters Multilingual Evaluation Standard Competition (2026-02-16)

UAE's Technology Innovation Institute (TII) released 4 datasets this week: tiiuae/NativeQA (evaluation, 16 downloads, 2 likes), tiiuae/NativeQA-RDP (evaluation, 22 downloads), tiiuae/SyntheticQA (synthetic, 30 downloads, 2 likes), tiiuae/evalplus-arabic (Arabic code evaluation, 46 downloads, 1 like). NativeQA and NativeQA-RDP focus on native language QA evaluation, evalplus-arabic extends code evaluation to Arabic, and SyntheticQA provides a synthetic QA baseline. The 4 datasets form a complete "native language + synthetic control + code evaluation" evaluation matrix.

Business implications:
1. Multilingual evaluation standard fragmentation accelerating: TII's evalplus-arabic is the first Arabic code evaluation benchmark, breaking the English-dominated code evaluation landscape. As more language-specific evaluation benchmarks emerge, model vendors will need separate evaluations per language, multiplying demand for multilingual evaluation data.
2. "Native" vs. "synthetic" evaluation comparison becoming a paradigm: The NativeQA + SyntheticQA combination suggests TII is systematically verifying quality gaps between synthetic and native data. This methodology may be widely adopted, spawning demand for "native data quality certification" services.
3. Middle Eastern AI investment spilling over into data demand: TII is backed by UAE sovereign funds; sustained investment signals the Middle East will become a major demand center for multilingual (especially Arabic) AI data. Data service providers should build Arabic + right-to-left text processing data capabilities.

P1 Chinese LLMs in Dense Release: Qwen 3.5 + GLM-4.6V + Ling-2.5-1T + MiniMax-2.5, Scale Competition and Ecosystem Expansion Accelerate (2026-02-12 to 2026-02-16)

Four significant events in Chinese LLMs this week: the Reddit community confirms Qwen 3.5's imminent release (80 upvotes); Zhipu AI officially open-sources GLM-4.6V, positioned as "the world's best open-source visual reasoning model at the 100B level"; the trillion-parameter inclusionAI/Ling-2.5-1T is listed on HuggingFace (69 upvotes); and MiniMax-2.5 runs locally (389 upvotes, among the week's highest Reddit AI topics). Meanwhile, the Qwen ecosystem continues expanding, with four product lines advancing simultaneously: Qwen3Guard (real-time token safety filtering), GSPO (scalable RL training), Qwen-Image-Edit (image editing), and Qwen-MT (multilingual translation).

Business implications:
1. Chinese LLM alignment data demand about to surge: Qwen 3.5, GLM-4.6V, and Ling-2.5-1T are three ultra-large models entering the alignment phase at the same time, each requiring massive amounts of high-quality Chinese preference data. Alignment data supply will become a bottleneck.
2. Visual reasoning data gap emerges: GLM-4.6V, as a visual reasoning model, needs "image + reasoning chain" paired data, which is extremely scarce in Chinese. Data service providers should prioritize Chinese visual reasoning labeling.
3. Local deployment trend changes data demand: MiniMax-2.5 running locally (389 upvotes) and Qwen3-Coder-Next 80B requiring only 8GB VRAM (95 upvotes) suggest that model deployment on consumer-grade hardware is going mainstream. This catalyzes demand for "edge scenario fine-tuning data," meaning lightweight task data constrained by consumer hardware.

P1 RLVR Training Data Detection Becomes New Topic, RL Training Data Security Audit Demand Emerges (2026-02-12)

The paper "Detecting RLVR Training Data via Structural Convergence of Reasoning" (2026-02-12) proposes detecting whether a model used specific RL training data through structural convergence of reasoning. This is academia's first systematic study on reverse-engineering RL training data sources from model outputs. Concurrently, papers P-GenRM (personalized generative reward model) and GSPO (scalable RL training) continue pushing RL/RLHF methodology boundaries.

Business implications:
1. RL training data traceability becomes a compliance requirement: If training data sources can be detected from model outputs, unauthorized use of others' data for RL training carries legal risk. Data service providers should offer "RL training data provenance certification" to prove legitimate data sourcing.
2. Data watermarking and fingerprinting demand: Data suppliers can embed detectable structural features in RL training data for post-hoc usage verification, creating a new product category of "watermarked RL training data."
3. Three consecutive weeks of RLHF/RL surge: W09 (6 papers), W10 (7 papers), and W11 (RLVR detection + P-GenRM + GSPO + Frankenstein analysis) show that RL training data quality, security, and compliance requirements are systematically upgrading.

P2 Allen AI asta-summary-citation-counts Opens New Paradigm for Agent Behavior Data (2026-02-16)

Allen AI released allenai/asta-summary-citation-counts (agent_tool, 308 downloads, 7 likes), a dataset tracking the papers most cited by Asta, an agentic research RAG platform, together with their citation counts. This is the first case of converting AI Agent information retrieval behavior into a structured dataset. Meanwhile, allenai/molmospaces maintains 24.8% weekly growth (117 to 146 downloads), as the embodied AI open ecosystem continues to expand.

Business implications:
1. Agent behavior data becomes a new category: asta-summary-citation-counts marks "what Agents do" itself becoming valuable data. As Agents penetrate research, coding, and decision-making domains, Agent behavior logs, decision trajectories, and tool-call patterns will all become tradeable data assets.
2. Commercial value of RAG citation preference data: This dataset reveals the citation preferences of an AI research Agent; academic publishers and research institutions can use it to optimize content strategy. Data service providers can offer "citation quality evaluation data" for RAG systems.
3. MolmoSpaces growth validates embodied AI data adoption: Two consecutive weeks of 20%+ growth (W10: +37.6%, W11: +24.8%) show Allen AI's embodied AI data standards are gaining community consensus.

Demand Signals

Infer training data demands from model releases

Data Type | Intensity | Trend | Related Signals
Robotics VLA Trajectory Data | Very Strong | ↑ New | 4 VLA papers in a single week; Allen AI MolmoSpaces +24.8% sustained growth; NVIDIA Isaac-GR00T 6.2K stars; BAAI Imagine2Act; Datatang embodied AI positioning
RL Training/Alignment Data | Very Strong | ↑ New | Three consecutive weeks of RLHF/RL paper surge; Qwen GSPO scalable RL; RL training data detection becomes new topic
Chinese LLM Alignment Data | Very Strong | ↑ New | Qwen 3.5 + GLM-4.6V + Ling-2.5-1T three ultra-large models simultaneously entering alignment phase; MiniMax-2.5 local deployment needs lightweight alignment data; Chinese visual reasoning labeling extremely scarce
Multilingual Evaluation Data | Strong | ↑ New | TII UAE 4 Arabic evaluation datasets; Qwen-MT multilingual translation; Hebrew Wikipedia 11M corpus; Arabic code evaluation appears for first time
Agent Behavior/Trajectory Data | Strong | ↑ New | Allen AI asta-summary-citation-counts pioneers Agent behavior data; Mistral Devstral 2 + Vibe CLI coding Agent; NVIDIA NeMo-Agent-Toolkit 1.8K stars
Real-time Safety Labeling Data | Strong | ↑ New | Qwen3Guard real-time token safety filtering; NVIDIA garak LLM security scanner 7K stars; RLVR training data detection paper hints at security audit demand
Visual Reasoning Data | Medium | ↑ New | GLM-4.6V open-source visual reasoning model; OneVision-Encoder multimodal encoder; Paper "What does RL improve for Visual Reasoning"; MetaphorStar image metaphor RL
Sim-to-Real Paired Data | Medium | ↑ New | RLinf-Co explicitly proposes sim-real co-training; EgoHumanoid robot-free first-person demonstration; Public paired datasets nearly nonexistent
Audio/Speech Data | Medium | ↑ New | Mistral Voxtral Transcribe sonic-speed transcription; Datatang Dolphin 40 languages ongoing promotion
Image Editing Instruction Data | Medium | ↑ New | Qwen-Image-Edit image editing model; Light4D 4D video relighting; DeepGen 1.0 multimodal generation editing
Code Agent Trajectory Data | - | ↓ Dropped | Present in previous issue, absent this issue
Robotics Demonstration Data | - | ↓ Dropped | Present in previous issue, absent this issue
Multimodal Video Data | - | ↓ Dropped | Present in previous issue, absent this issue
RLHF/Preference Data | - | ↓ Dropped | Present in previous issue, absent this issue
Synthetic Data | - | ↓ Dropped | Present in previous issue, absent this issue
Math Reasoning Data | - | ↓ Dropped | Present in previous issue, absent this issue
Evaluation Benchmark Data | - | ↓ Dropped | Present in previous issue, absent this issue
Multilingual Speech Data | - | ↓ Dropped | Present in previous issue, absent this issue
3D Scene/Asset Data | - | ↓ Dropped | Present in previous issue, absent this issue
Long-Context Data | - | ↓ Dropped | Present in previous issue, absent this issue
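
The "↑ New" and "↓ Dropped" trend labels above read like an issue-to-issue comparison of data type names. The brief does not say how the labels are produced, so the sketch below is only one plausible reading, assuming exact string matching on category names; the function name diff_issues and the shortened sample lists are hypothetical.

```python
def diff_issues(previous, current):
    """Compare data type names across two issues.

    Assumption: categories are matched by exact name, so a renamed
    category shows up as one "Dropped" entry plus one "New" entry.
    """
    prev, curr = set(previous), set(current)
    new = sorted(curr - prev)          # present now, absent last issue
    dropped = sorted(prev - curr)      # present last issue, absent now
    continuing = sorted(prev & curr)   # present in both issues
    return new, dropped, continuing


# Shortened sample using names taken from the table above.
previous_issue = ["Code Agent Trajectory Data", "Synthetic Data", "Long-Context Data"]
current_issue = ["Robotics VLA Trajectory Data", "Sim-to-Real Paired Data"]

new, dropped, continuing = diff_issues(previous_issue, current_issue)
print("New:", new)
print("Dropped:", dropped)
print("Continuing:", continuing)
```

Under this reading, a category that is merely renamed between issues would be reported as both a drop and a new signal, which is worth keeping in mind when every current row is marked "New."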

Download Movers

Datasets with the largest download changes this week

Dataset | Downloads | Weekly Growth
allenai/molmospaces | 146 | +24.8%
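
As a quick check on the growth figure, the percentage can be reproduced from the two download counts quoted in this issue (117 last week, 146 this week). This is a minimal sketch of the usual percent-change formula; the function name weekly_growth_pct is hypothetical and the brief does not state how its own figure was computed.

```python
def weekly_growth_pct(previous, current):
    """Percent change from the previous week's count to the current one."""
    if previous <= 0:
        raise ValueError("previous count must be positive")
    return (current - previous) / previous * 100.0


# Download counts for allenai/molmospaces as quoted in this issue.
growth = weekly_growth_pct(117, 146)
print(f"{growth:+.1f}%")  # prints "+24.8%"
```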

Deep Dive — DataRecipe

This week's 2 high-value datasets reverse-analyzed (auto-generated by DataRecipe)

facebook/EgoAVU_data
300 samples · 6 fields · Medium
6.0/10
🟢 Recommended to Replicate

Data Structure

video_id start_time end_time question answer category
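
For anyone replicating this layout, the six fields could be modeled as a typed record with a few basic sanity checks. This is a sketch under stated assumptions: the field types (times as float seconds, the rest as strings), the class name EgoAVURecord, the check_record helper, and the sample values are all hypothetical and are not taken from the dataset itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EgoAVURecord:
    # Field names come from the listing above; the types are assumptions.
    video_id: str
    start_time: float  # assumed to be seconds from the start of the video
    end_time: float    # assumed to be seconds from the start of the video
    question: str
    answer: str
    category: str

    @property
    def duration(self) -> float:
        """Length of the referenced clip, in the same unit as the times."""
        return self.end_time - self.start_time


def check_record(record):
    """Return a list of problems found in one record (empty if none)."""
    problems = []
    if record.end_time <= record.start_time:
        problems.append("end_time must be after start_time")
    if not record.question.strip():
        problems.append("question is empty")
    if not record.answer.strip():
        problems.append("answer is empty")
    return problems


# Made-up sample row, for illustration only.
example = EgoAVURecord(
    video_id="clip_0001",
    start_time=12.0,
    end_time=19.5,
    question="What is the person doing?",
    answer="Pouring water into a cup.",
    category="action",
)
print(example.duration, check_record(example))
```

Checks like these are only a first filter for labeling quality; they catch malformed rows, not wrong answers.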

Risk Assessment

Medium Risk: Labeling quality may fluctuate → Establish rigorous QA processes with quality thresholds
Low Risk: Data may become outdated over time → Establish continuous update mechanisms

allenai/olmix
300 samples · 113 fields · Medium
6.5/10
🟢 Recommended to Replicate

Data Structure

run name index arc_challenge:rc::olmes arc_easy:rc::olmes basic_skills:rc::olmes basic_skills_arithmetic:rc::olmes basic_skills_coding:rc::olmes basic_skills_common_knowledge:rc::olmes

Risk Assessment

Medium Risk: Labeling quality may fluctuate → Establish rigorous QA processes with quality thresholds
Low Risk: Data may become outdated over time → Establish continuous update mechanisms

2 datasets analyzed this week · 83.9% human labor share · All Medium difficulty

Want to discuss this issue?

Kai Founder & CEO
苏文 AI Documentation and Release Engineer
陆明哲 AI Product Manager

Auto-generated by AI Dataset Radar · Updated weekly
