W11 AI Data Intelligence

One-line Summary

Four VLA/robotics foundation-model papers appeared in a single week, and sim-to-real transfer is emerging as the core bottleneck; TII (UAE) released 4 evaluation datasets, marking Middle Eastern AI's entry into the competition over multilingual evaluation standards; Qwen 3.5, GLM-4.6V, Ling-2.5-1T, and MiniMax-2.5 show scale competition and ecosystem expansion accelerating in parallel. Top data demand signal this week: Robotics VLA Trajectory Data.

Key Findings

This week's 5 findings with the highest commercial value

P0 VLA/Robotics Foundation Model Papers Surge: 4 in a Single Week, Sim-to-Real Transfer Becomes Core Bottleneck (2026-02-04 to 2026-02-13)

Four notable embodied AI papers appeared this week: GeneralVLA (2026-02-04, a general VLA model with knowledge-guided trajectory planning), ABot-M0 (2026-02-11, a robotics VLA foundation model with action manifold learning), RLinf-Co (2026-02-13, reinforcement learning-driven sim-real co-training), and EgoHumanoid (2026-02-10, whole-body motion control learned from robot-free, first-person human demonstrations). All 4 papers converge on the same core problem: how to achieve effective sim-to-real transfer with Vision-Language-Action (VLA) architectures. Last week's embodied AI activity centered on data expansion (NVIDIA PhysicalAI, Allen AI MolmoSpaces); this week the emphasis shifts from "data supply" to "methodological breakthroughs."

Business implications:

1. Sim-real paired data becomes essential: RLinf-Co explicitly proposes sim-real co-training, which requires paired trajectory data for the same task in both simulated and real environments. Such data has virtually no public supply, leaving an open gap for data service providers.
2. First-person robotics data emerges as a new category: EgoHumanoid trains whole-body motion control on robot-free, first-person human video, which means "human daily activity video" can be converted directly into robotics training data. Collection costs may drop dramatically, but the labeling barriers (action decomposition, joint mapping) are extremely high.
3. VLA models demand extreme data diversity: GeneralVLA relies on knowledge guidance to generalize, and ABot-M0 introduces action manifold learning; both need diverse trajectory data covering many different objects, scenes, and operations. Single-scene datasets have limited value; cross-scene generalization data becomes critical.
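To make the pairing requirement concrete, here is a minimal sketch of what a sim-real paired trajectory record could look like. The class and field names (`Trajectory`, `SimRealPair`, `task_id`) and the toy values are hypothetical, chosen for illustration; they are not taken from RLinf-Co or any published dataset format.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical schema: field names are illustrative, not from RLinf-Co or any public dataset.
@dataclass
class Trajectory:
    domain: str   # "sim" or "real"
    task_id: str  # shared task identifier used to pair episodes across domains
    observations: List[str] = field(default_factory=list)     # e.g. paths to camera frames
    actions: List[List[float]] = field(default_factory=list)  # one action vector per step

@dataclass
class SimRealPair:
    sim: Trajectory
    real: Trajectory

    def is_valid(self) -> bool:
        """Usable only if both sides cover the same task, one in each domain,
        with a consistent action dimensionality."""
        if self.sim.domain != "sim" or self.real.domain != "real":
            return False
        if self.sim.task_id != self.real.task_id:
            return False
        action_dims = {len(a) for a in self.sim.actions + self.real.actions}
        return len(action_dims) <= 1

# Usage with toy values: the same pick task recorded once in simulation, once on hardware.
sim = Trajectory("sim", "pick_red_cube", ["sim/000.png"], [[0.10, 0.00, 0.20]])
real = Trajectory("real", "pick_red_cube", ["real/000.png"], [[0.12, 0.01, 0.19]])
pair = SimRealPair(sim, real)
print(pair.is_valid())  # True
```

The validity check captures the essence of the requirement: the same task in both domains, with matching action spaces so that a co-training loop can consume both sides of the pair.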

P0 TII UAE Releases 4 Evaluation Datasets, Middle Eastern AI Enters Multilingual Evaluation Standard Competition (2026-02-16)

UAE's Technology Innovation Institute (TII) released 4 datasets this week: tiiuae/NativeQA (evaluation, 16 downloads, 2 likes), tiiuae/NativeQA-RDP (evaluation, 22 downloads), tiiuae/SyntheticQA (synthetic, 30 downloads, 2 likes), and tiiuae/evalplus-arabic (Arabic code evaluation, 46 downloads, 1 like). NativeQA and NativeQA-RDP focus on native-language QA evaluation, evalplus-arabic extends code evaluation to Arabic, and SyntheticQA provides a synthetic QA baseline. Together the 4 datasets form an evaluation matrix spanning native-language QA, a synthetic control, and code evaluation.

Business implications:

1. Multilingual evaluation standards are fragmenting faster: TII's evalplus-arabic is among the first Arabic code evaluation benchmarks, challenging the English-dominated code evaluation landscape. As more language-specific benchmarks emerge, model vendors will need separate evaluations for each language, multiplying demand for multilingual evaluation data.
2. "Native" vs. "synthetic" comparison is becoming a paradigm: the NativeQA + SyntheticQA pairing suggests TII is systematically measuring quality gaps between synthetic and native data. If this methodology is widely adopted, it could spawn demand for "native data quality certification" services.
3. Middle Eastern AI investment spills over into data demand: TII is backed by UAE sovereign funding, and sustained investment signals that the Middle East will become a major demand center for multilingual (especially Arabic) AI data. Data service providers should build capabilities in Arabic and right-to-left text processing.
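A native-vs-synthetic comparison ultimately reduces to scoring the same model on both sets and reporting the difference. The sketch below assumes per-item correctness results from one model on a native set and a synthetic set in the same language; the function names and toy values are hypothetical and do not reflect TII's methodology or any NativeQA results.

```python
from typing import Sequence

def accuracy(correct: Sequence[bool]) -> float:
    """Fraction of items answered correctly."""
    if not correct:
        raise ValueError("empty result list")
    return sum(correct) / len(correct)

def native_synthetic_gap(native: Sequence[bool], synthetic: Sequence[bool]) -> float:
    """Synthetic accuracy minus native accuracy. A positive gap means the model does
    better on synthetic items, one signal that synthetic evaluation may overstate
    its native-language ability."""
    return accuracy(synthetic) - accuracy(native)

# Toy per-item results (hypothetical, not from any TII dataset).
native_results = [True, False, False, True]     # 2 of 4 correct
synthetic_results = [True, True, False, True]   # 3 of 4 correct
print(native_synthetic_gap(native_results, synthetic_results))  # 0.25
```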

P1 Chinese LLMs in Dense Release: Qwen 3.5 + GLM-4.6V + Ling-2.5-1T + MiniMax-2.5, Scale Competition and Ecosystem Expansion Accelerate (2026-02-12 to 2026-02-16)

Four significant Chinese LLM events this week: Reddit community discussion points to an imminent Qwen 3.5 release (80 upvotes); Zhipu AI officially open-sourced GLM-4.6V, billed as "the world's best open-source visual reasoning model at the 100B scale"; inclusionAI's trillion-parameter Ling-2.5-1T was listed on HuggingFace (69 upvotes); and MiniMax-2.5 was shown running locally (389 upvotes, among the week's top Reddit AI topics). Meanwhile, the Qwen ecosystem keeps expanding, with four product lines advancing simultaneously: Qwen3Guard (real-time token-level safety filtering), GSPO (scalable RL training), Qwen-Image-Edit (image editing), and Qwen-MT (multilingual translation).

Business implications:

1. Chinese LLM alignment data demand is about to surge: Qwen 3.5, GLM-4.6V, and Ling-2.5-1T are three ultra-large models reaching the alignment stage at the same time, each requiring massive volumes of high-quality Chinese preference data. Alignment data supply will become a bottleneck.
2. A visual reasoning data gap emerges: as a visual reasoning model, GLM-4.6V needs "image + reasoning chain" paired data, which is extremely scarce in Chinese. Data service providers should prioritize Chinese visual reasoning labeling.
3. Local deployment is changing data demand: MiniMax-2.5 running locally (389 upvotes) and Qwen3-Coder-Next 80B reportedly needing only 8GB of VRAM (95 upvotes) suggest that deploying models on consumer-grade hardware is going mainstream. This should catalyze demand for "edge scenario fine-tuning data": lightweight task data shaped by consumer hardware constraints.
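The 8GB figure is easier to interpret with a back-of-the-envelope weight-memory estimate. The sketch below assumes 4-bit quantization and counts model weights only (no KV cache, activations, or runtime overhead); the function name is illustrative.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Memory for model weights alone, in decimal GB (1 GB = 1e9 bytes).
    Ignores KV cache, activations, and runtime overhead."""
    return num_params * bits_per_param / 8 / 1e9

total = weight_memory_gb(80e9, 4)  # all 80B weights at an assumed 4-bit quantization
print(total)       # 40.0
print(total > 8)   # True: the full weights do not fit in 8 GB of VRAM
```

Since 4-bit weights for 80B parameters come to roughly 40 GB, running with 8 GB of VRAM presumably depends on keeping most weights in system RAM and moving only what each token needs onto the GPU, which is practical mainly for sparse mixture-of-experts models. This is an inference from the arithmetic, not a detail confirmed by the source.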

P1 RLVR Training Data Detection Becomes New Topic, RL Training Data Security Audit Demand Emerges (2026-02-12)

The paper "Detecting RLVR Training Data via Structural Convergence of Reasoning" (2026-02-12) proposes detecting whether a model was trained on specific RLVR (reinforcement learning with verifiable rewards) data by examining how the structure of its reasoning converges. This appears to be the first systematic academic study of reverse-engineering RL training data sources from model outputs. In parallel, P-GenRM (a personalized generative reward model) and GSPO (scalable RL training) continue to push the boundaries of RL/RLHF methodology.

Business implications:

1. RL training data traceability becomes a compliance requirement: if training data sources can be detected from model outputs, using others' data for RL training without authorization carries legal risk. Data service providers should offer "RL training data provenance certification" to prove legitimate sourcing.
2. Demand for data watermarking and fingerprinting: data suppliers can embed detectable structural features in RL training data for post-hoc usage verification, creating a new product category of "watermarked RL training data."
3. Third consecutive week of RLHF/RL surge: W09 (6 papers), W10 (7 papers), W11 (RLVR detection, P-GenRM, GSPO, Frankenstein analysis). Quality, security, and compliance requirements for RL training data are being raised systematically.
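To make the detection idea concrete, here is a toy sketch of a "structural convergence" score. It assumes, as the paper's title suggests, that RLVR training makes a model's sampled reasoning on training prompts converge to a shared structure, while unseen prompts yield more varied structures. Representing traces as step labels, the bigram-Jaccard similarity, and the 0.8 threshold are all illustrative assumptions, not the paper's actual method.

```python
from itertools import combinations
from typing import List, Set, Tuple

def step_bigrams(trace: List[str]) -> Set[Tuple[str, str]]:
    """Represent a reasoning trace (a list of step labels) by its set of consecutive step pairs."""
    return set(zip(trace, trace[1:]))

def jaccard(a: Set, b: Set) -> float:
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def convergence_score(traces: List[List[str]]) -> float:
    """Mean pairwise Jaccard similarity of step bigrams across sampled traces for one prompt."""
    pairs = list(combinations(traces, 2))
    if not pairs:
        raise ValueError("need at least two sampled traces")
    return sum(jaccard(step_bigrams(x), step_bigrams(y)) for x, y in pairs) / len(pairs)

def likely_seen_in_training(traces: List[List[str]], threshold: float = 0.8) -> bool:
    # Hypothetical decision rule: highly convergent structure -> flag as a possible RLVR training prompt.
    return convergence_score(traces) >= threshold

# Toy samples (hypothetical step labels).
converged = [["parse", "set_eq", "solve", "check"]] * 3
diverse = [["parse", "solve", "check"],
           ["guess", "check", "revise", "solve"],
           ["parse", "set_eq", "check"]]
print(likely_seen_in_training(converged), likely_seen_in_training(diverse))  # True False
```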

P2 Allen AI asta-summary-citation-counts Opens New Paradigm for Agent Behavior Data (2026-02-16)

Allen AI released allenai/asta-summary-citation-counts (agent_tool, 308 downloads, 7 likes), a dataset tracking the papers most cited by Asta, an agentic research RAG platform, together with their citation counts. It is one of the first cases of turning AI agent information-retrieval behavior into a structured dataset. Meanwhile, allenai/molmospaces grew 24.8% week over week (117 to 146 downloads), and the open embodied AI ecosystem continues to expand.

Business implications:

1. Agent behavior data becomes a new category: asta-summary-citation-counts shows that "what agents do" is itself valuable data. As agents spread into research, coding, and decision-making, agent behavior logs, decision trajectories, and tool-call patterns will all become tradeable data assets.
2. RAG citation preference data has commercial value: the dataset reveals the citation preferences of an AI research agent, which academic publishers and research institutions can use to optimize content strategy. Data service providers can offer "citation quality evaluation data" for RAG systems.
3. MolmoSpaces growth validates adoption of embodied AI data: two consecutive weeks of 20%+ growth (W10: +37.6%, W11: +24.8%) indicate that Allen AI's embodied AI data standards are gaining community consensus.
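A minimal sketch of how agent behavior logs can be turned into a citation-count table of the kind asta-summary-citation-counts provides. The log format, field names, and paper IDs are hypothetical; this does not reflect Asta's actual logging or the dataset's real schema.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical log format: each agent run records the paper IDs it cited in its summary.
agent_runs: List[Dict] = [
    {"query": "VLA sim-to-real", "cited": ["paper_A", "paper_B"]},
    {"query": "RLVR detection", "cited": ["paper_B", "paper_C"]},
    {"query": "embodied datasets", "cited": ["paper_B", "paper_A"]},
]

def citation_counts(runs: List[Dict]) -> Counter:
    """Count how often each paper is cited across agent runs (one count per citation)."""
    counts: Counter = Counter()
    for run in runs:
        counts.update(run["cited"])
    return counts

top = citation_counts(agent_runs).most_common(2)
print(top)  # [('paper_B', 3), ('paper_A', 2)]
```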

Demand Signals

Training data demand inferred from model releases

Robotics VLA Trajectory Data

Critical ↑ New

4 VLA papers in a single week; Allen AI MolmoSpaces +24.8% sustained growth; NVIDIA Isaac-GR00T 6.2K stars; BAAI Imagine2Act; Datatang embodied AI positioning

RL Training/Alignment Data

Critical ↑ New

RLHF/RL papers surging for a third consecutive week; Qwen GSPO scalable RL; RL training data detection emerges as a new topic

Chinese LLM Alignment Data

Critical ↑ New

Qwen 3.5, GLM-4.6V, and Ling-2.5-1T: three ultra-large models entering the alignment stage simultaneously; MiniMax-2.5 local deployment needs lightweight alignment data; Chinese visual reasoning labels extremely scarce

Multilingual Evaluation Data

High ↑ New

TII (UAE) 4 evaluation datasets, including the first Arabic code evaluation set seen in this radar (evalplus-arabic); Qwen-MT multilingual translation; Hebrew Wikipedia 11M corpus

Agent Behavior/Trajectory Data

High ↑ New

Allen AI asta-summary-citation-counts pioneers Agent behavior data; Mistral Devstral 2 + Vibe CLI coding Agent; NVIDIA NeMo-Agent-Toolkit 1.8K stars

Real-time Safety Labeling Data

High ↑ New

Qwen3Guard real-time token safety filtering; NVIDIA garak LLM security scanner 7K stars; RLVR training data detection paper hints at security audit demand

Visual Reasoning Data

Moderate ↑ New

GLM-4.6V open-source visual reasoning model; OneVision-Encoder multimodal encoder; Paper "What does RL improve for Visual Reasoning"; MetaphorStar image metaphor RL

Sim-to-Real Paired Data

Moderate ↑ New

RLinf-Co explicitly proposes sim-real co-training; EgoHumanoid robot-free first-person demonstration; Public paired datasets nearly nonexistent

Audio/Speech Data

Moderate ↑ New

Mistral Voxtral Transcribe high-speed transcription; Datatang Dolphin (40 languages) continued promotion

Image Editing Instruction Data

Moderate ↑ New

Qwen-Image-Edit image editing model; Light4D 4D video relighting; DeepGen 1.0 multimodal generation editing

↓ Dropped (present in previous issue, absent this issue): Code Agent Trajectory Data; Robotics Demonstration Data; Multimodal Video Data; RLHF/Preference Data; Synthetic Data; Math Reasoning Data; Evaluation Benchmark Data; Multilingual Speech Data; 3D Scene/Asset Data; Long-Context Data

Download Movers

Datasets with the largest download changes this week

Dataset	Downloads	Weekly Growth
allenai/molmospaces	146	+24.8%
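The growth figure follows the usual week-over-week formula, (current - previous) / previous. A small sketch reproducing it from the 117 and 146 download counts reported in the Allen AI finding; the function name is illustrative.

```python
def weekly_growth_pct(previous: int, current: int) -> float:
    """Week-over-week change in downloads, as a percentage of the previous week."""
    if previous <= 0:
        raise ValueError("previous week's downloads must be positive")
    return (current - previous) / previous * 100

# allenai/molmospaces: 117 downloads last week -> 146 this week (figures from this issue)
print(round(weekly_growth_pct(117, 146), 1))  # 24.8
```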

Deep Dive — DataRecipe

Reverse analysis of this week's 2 high-value datasets (auto-generated by DataRecipe)

facebook/EgoAVU_data

300 samples · 6 fields · Medium difficulty

6.0/10


Risk Assessment

Medium risk: labeling quality may fluctuate → establish rigorous QA processes with quality thresholds

Low risk: data may become outdated over time → establish continuous update mechanisms

allenai/olmix

300 samples · 113 fields · Medium difficulty

6.5/10


Risk Assessment

Medium risk: labeling quality may fluctuate → establish rigorous QA processes with quality thresholds

Low risk: data may become outdated over time → establish continuous update mechanisms

Analyzed 2 datasets this week · 83.9% human effort · all Medium difficulty
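Both risk assessments recommend QA processes with quality thresholds. One common way to operationalize that, offered here as an assumption rather than DataRecipe's method, is to have two annotators label an overlapping subset, compute Cohen's kappa, and reject batches that fall below a cutoff. The 0.6 threshold, label names, and toy annotations are illustrative.

```python
from collections import Counter
from typing import Hashable, Sequence

def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa: chance-corrected agreement between two annotators on the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[label] * cb[label] for label in ca) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)

def passes_qa(a: Sequence[Hashable], b: Sequence[Hashable], threshold: float = 0.6) -> bool:
    # 0.6 is an illustrative cutoff, not a DataRecipe or dataset-specific setting.
    return cohens_kappa(a, b) >= threshold

# Toy double-labeled subset (hypothetical).
annotator_1 = ["ok", "ok", "bad", "ok", "bad", "ok"]
annotator_2 = ["ok", "ok", "bad", "bad", "bad", "ok"]
print(round(cohens_kappa(annotator_1, annotator_2), 3), passes_qa(annotator_1, annotator_2))  # 0.667 True
```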

Want to discuss this issue?

Kai Founder & CEO

苏文 AI Documentation & Release Engineer

陆明哲 AI Product Manager

Auto-generated by AI Dataset Radar · Updated weekly

