Radar Brief Week 6, 2026 · 2026-02-02 — 2026-02-09

Code Agent Race Heats Up
Robotics Data Infrastructure Accelerates

Sources scanned this week: 86 Hugging Face orgs · 50 GitHub orgs · 71 blogs · 125 X accounts

One-line Summary

Code agent competition intensifies; NVIDIA doubles down on robotics data infrastructure (Cosmos-Policy, Numb3rs, Isaac GR00T); demand for document understanding data surges. Strongest data demand signal this week: Code Agent Data.

Key Findings

This week's 5 findings with high commercial value

P0 Qwen Releases Qwen3-Coder-Next: Code Agent Competition Intensifies

Alibaba's Qwen team released Qwen3-Coder-Next (80B MoE, 3B active), purpose-built for coding agents and local development. It scored 44.3 on SWE-Bench Pro and shipped with Day-0 support from both vLLM and SGLang. Together AI has already launched inference services for it.
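For readers unfamiliar with the "80B MoE, 3B active" notation, a rough back-of-the-envelope reading follows. It assumes the figures mean about 80B total parameters with about 3B activated per token, and it uses the common approximation of roughly 2 FLOPs per active parameter per generated token (forward pass only, ignoring attention and routing overhead):

```latex
\begin{aligned}
\text{active fraction} &= \frac{3\,\mathrm{B}}{80\,\mathrm{B}} = 0.0375 = 3.75\% \\
C_{\text{MoE}} &\approx 2 \times (3 \times 10^{9}) = 6 \times 10^{9}\ \text{FLOPs per token} \\
C_{\text{dense}} &\approx 2 \times (80 \times 10^{9}) = 1.6 \times 10^{11}\ \text{FLOPs per token}
\end{aligned}
```

Under these assumptions, per-token compute is roughly 27× lower than for a dense model of the same total size, while weight memory still scales with all 80B parameters.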

Business implications → Code agents are becoming a core battleground for major labs. Together also released Aurora-Spec-Qwen3-Coder-Next-FP8 (2026-02-03), a speculative decoding accelerator, indicating that inference efficiency is key to production deployment. Data service companies should focus on building code instruction datasets that cover multilingual code and agent tool-calling scenarios.
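Speculative decoding is a general technique, and the sketch below shows its simplest (greedy) form with toy stand-in models. It is not Aurora-Spec's implementation: the function name `speculative_decode`, the lookahead parameter `k`, and the callable "models" are illustrative assumptions. Production systems verify all draft tokens in a single batched forward pass of the target model and usually use a sampling-based acceptance rule rather than exact-match checks.

```python
from typing import Callable, List

# Toy stand-in: a "model" maps a token prefix to its greedy next token.
NextToken = Callable[[List[int]], int]


def speculative_decode(
    target: NextToken,
    draft: NextToken,
    prompt: List[int],
    max_new_tokens: int,
    k: int = 4,
) -> List[int]:
    """Greedy speculative decoding: the draft proposes up to k tokens, the target verifies.

    With greedy decoding on both sides, the output is identical to decoding with the
    target alone; any speedup comes from accepting several draft tokens per target pass.
    """
    tokens = list(prompt)
    goal = len(prompt) + max_new_tokens
    while len(tokens) < goal:
        # 1. The cheap draft model proposes tokens autoregressively.
        proposal: List[int] = []
        for _ in range(min(k, goal - len(tokens))):
            proposal.append(draft(tokens + proposal))
        # 2. The target checks each proposed position (one batched pass in practice).
        accepted: List[int] = []
        for tok in proposal:
            expected = target(tokens + accepted)
            if tok != expected:
                # First mismatch: keep the target's token and end this round.
                accepted.append(expected)
                break
            accepted.append(tok)
        else:
            # Every proposal matched: the target pass yields one extra token.
            if len(tokens) + len(accepted) < goal:
                accepted.append(target(tokens + accepted))
        tokens.extend(accepted)
    return tokens[:goal]
```

In this greedy setting, a draft that is sometimes wrong changes only how many tokens are accepted per round, never the final output.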
P0 NVIDIA Doubles Down on Robotics Data Infrastructure: Cosmos-Policy + Numb3rs + Isaac GR00T

NVIDIA released two robotics simulation datasets, RoboCasa-Cosmos-Policy and LIBERO-Cosmos-Policy, alongside the Isaac GR00T N1.6 foundation model (6,143 GitHub stars). It also released Numb3rs, a speech text normalization dataset. With 5 datasets and 35 models released, NVIDIA was the most active lab this week.

Business implications → NVIDIA is building an end-to-end robotics learning data pipeline (simulation → data → model). This signals clear, sustained demand for robotic manipulation datasets and simulation environment data, as well as speech data.
P1 DeepSeek-OCR-2 Downloads Surge: Document Understanding Data Demand Spikes

deepseek-ai/DeepSeek-OCR-2 reached 661,725 downloads and 712 likes within one week, becoming the most downloaded Chinese model this week. Meanwhile, Zhipu released GLM-OCR (covered by SGLang), and Mistral released OCR 3.

Business implications → The OCR/document understanding space is seeing a three-way race (DeepSeek, Zhipu, Mistral). Demand for high-quality document labeling data (complex layouts, multilingual documents, nested tables) will grow significantly.
P1 RLHF/Preference Learning Papers Surge: 7 Papers Focus on Reward Model Improvements

Seven RLHF-related papers appeared this week, covering French preference data collection (compar:IA), democratized preference alignment (DemPO), rubric improvements, GenRM reasoning quality (R-Align), LLM-judge debiasing (FairJudge), safeguards against DPO over-optimization (PEPO), and video flow matching (Euphonium). Qwen also released the RationaleRM dataset (2026-02-02), which proposes Rationale Consistency as a new evaluation dimension.

Business implications → RLHF is evolving from simple binary preference labeling toward multi-dimensional, multilingual, and explainable approaches. Data service companies need to upgrade labeling protocols: supporting rubric-based evaluation, reasoning process labeling, and multilingual preference data collection.
P2 StepFun's Step-3.5-Flash Leads China's Open-Source Speed Race

stepfun-ai/Step-3.5-Flash reached 228,406 downloads, and StepFun also released the competitive programming benchmark CF-Div2-Stepfun. Its Step3-VL-10B model (82,755 downloads) focuses on robotic vision-language interaction.

Business implications → StepFun is advancing on inference speed and multimodal robotics at the same time, making competitive programming data and robotic vision-language data its core data priorities.

Demand Signals

Training data demand inferred from model releases

Data Type | Intensity | Trend | Related Signals
Code Agent Data | Critical | ↑ New | Qwen3-Coder-Next · Aurora-Spec series · SERA series
Robotics / Embodied AI Data | Critical | ↑ New | Cosmos-Policy ×2 · Isaac GR00T · jepa-wms · Step3-VL-10B
Document OCR Data | High | ↑ New | DeepSeek-OCR-2 · GLM-OCR · Mistral OCR 3
RLHF Preference Data | High | ↑ New | RationaleRM · compar:IA · 7 preference learning papers
Multilingual Speech Data | Moderate | ↑ New | WaxalNLP · Numb3rs · Voxtral-Mini-4B
Safety / Content Moderation Data | Moderate | ↑ New | Nemotron-Safety-Guard-v3 · Qwen3Guard
Synthetic Visual Data | Moderate | ↑ New | CoSyn-point · DreamDojo

Download Movers

Datasets with the largest download changes this week

Dataset Downloads Weekly Growth
nvidia/Numb3rs 232 +139.2%
amazon/doc_split 1,566 +25.9%
Qwen/RationaleRM 754 +16.9%
nvidia/LIBERO-Cosmos-Policy 2,173 +7.0%
google/WaxalNLP 7,277 +1.9%
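Percentage growth is easier to compare once it is converted back to absolute numbers. The sketch below assumes that "Downloads" is this week's count and that "Weekly Growth" is the percentage change against last week's count; the brief does not state this, and `implied_previous` is a hypothetical helper name.

```python
def implied_previous(downloads: int, growth_pct: float) -> int:
    """Back out last week's count, assuming growth = (now - prev) / prev * 100."""
    return round(downloads / (1 + growth_pct / 100))


# (this week's downloads, weekly growth in %) as listed in the table above
MOVERS = {
    "nvidia/Numb3rs": (232, 139.2),
    "amazon/doc_split": (1566, 25.9),
    "Qwen/RationaleRM": (754, 16.9),
    "nvidia/LIBERO-Cosmos-Policy": (2173, 7.0),
    "google/WaxalNLP": (7277, 1.9),
}

for name, (now, pct) in MOVERS.items():
    prev = implied_previous(now, pct)
    print(f"{name:30s} prev≈{prev:>6,}  now={now:>6,}  change≈{now - prev:+,}")
```

Under this assumption, Numb3rs (+139.2%, about 97 → 232) and WaxalNLP (+1.9%, about 7,141 → 7,277) gained roughly the same absolute number of downloads, about 135 each; the large percentage mostly reflects Numb3rs's small base.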

Deep Dive — DataRecipe

Reverse analysis of this week's 3 high-value datasets (auto-generated by DataRecipe)

Qwen/RationaleRM
300 samples · 14 fields · Difficulty: Hard · Score: 6.0/10
🟢 Recommended for Replication

Data Structure

domain · language · context · response1 · response2 · overall_preference · individual_preference · human-checklist · model-low_deceptive_alignment-checklist (9 of 14 fields shown)
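Replicating a dataset of this shape usually starts by loading records into a fixed schema. The sketch below assumes the data is available as JSON Lines (one object per line); that format, the Python types, and the names `PreferenceRecord`, `RENAMES`, and `parse_record` are all hypothetical. The brief lists only 9 of the 14 fields, so anything unlisted is kept in `extra` rather than guessed.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class PreferenceRecord:
    # Field names follow the brief; all types are assumptions.
    domain: str
    language: str
    context: Any                 # assumed: prompt or conversation history
    response1: str
    response2: str
    overall_preference: Any      # label encoding not stated in the brief
    individual_preference: Any
    human_checklist: Any         # source key: "human-checklist"
    model_checklist: Any         # source key: "model-low_deceptive_alignment-checklist"
    extra: Dict[str, Any] = field(default_factory=dict)  # the unlisted fields


# Source keys (some hyphenated) mapped to Python attribute names.
RENAMES = {
    "domain": "domain",
    "language": "language",
    "context": "context",
    "response1": "response1",
    "response2": "response2",
    "overall_preference": "overall_preference",
    "individual_preference": "individual_preference",
    "human-checklist": "human_checklist",
    "model-low_deceptive_alignment-checklist": "model_checklist",
}


def parse_record(line: str) -> PreferenceRecord:
    """Parse one JSON line; a missing listed field raises KeyError (schema drift)."""
    row = json.loads(line)
    kwargs = {attr: row.pop(src) for src, attr in RENAMES.items()}
    return PreferenceRecord(**kwargs, extra=row)
```

Failing loudly on a missing listed field is deliberate: silent defaults would hide exactly the labeling-quality drift flagged in the risk assessment.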

Risk Assessment

Medium risk: labeling quality may fluctuate → establish rigorous QA processes with quality gates
Low risk: data may become outdated over time → establish continuous update mechanisms

microsoft/CancerGUIDE
165 samples · 3 fields · Difficulty: Hard · Score: 6.0/10
🟢 Recommended for Replication

Data Structure

patient_id · patient_note · label
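Because each CancerGUIDE row carries a `patient_id`, one precaution when replicating or evaluating on data of this shape is to split by patient rather than by row, so that if a patient has more than one note, those notes never land on both sides of a train/test split. The sketch assumes rows are plain dicts with the three listed fields; the function name `split_by_patient`, the 20% test fraction, and the seed are illustrative choices, not anything the dataset prescribes.

```python
import random
from typing import Dict, List, Tuple

Row = Dict[str, str]  # assumed keys: patient_id, patient_note, label


def split_by_patient(
    rows: List[Row], test_frac: float = 0.2, seed: int = 0
) -> Tuple[List[Row], List[Row]]:
    """Assign whole patients to train or test so no patient appears in both."""
    patients = sorted({r["patient_id"] for r in rows})
    rng = random.Random(seed)          # fixed seed keeps the split reproducible
    rng.shuffle(patients)
    n_test = max(1, round(len(patients) * test_frac))
    test_ids = set(patients[:n_test])
    train = [r for r in rows if r["patient_id"] not in test_ids]
    test = [r for r in rows if r["patient_id"] in test_ids]
    return train, test
```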

Risk Assessment

Medium risk: requires domain experts, who may be difficult to recruit → build a talent pipeline early or consider outsourcing partnerships
Medium risk: labeling quality may fluctuate → establish rigorous QA processes with quality gates
Low risk: data may become outdated over time → establish continuous update mechanisms

amazon/doc_split
300 samples · 3 fields · Difficulty: Hard · Score: 6.0/10
🟢 Recommended for Replication

Data Structure

doc_id · total_pages · subdocuments
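The brief gives only field names for amazon/doc_split, so any check on `subdocuments` rests on an assumption about its contents. The sketch below assumes, hypothetically, that each sub-document is an inclusive, 1-indexed `(start_page, end_page)` pair and that a correct split tiles all `total_pages` with no gaps or overlaps. A check like this is one way to catch labeling errors automatically; the real schema may differ.

```python
from typing import List, Tuple

# Hypothetical encoding: inclusive (start_page, end_page), pages numbered from 1.
Span = Tuple[int, int]


def check_split(total_pages: int, subdocuments: List[Span]) -> List[str]:
    """Return a list of problems; an empty list means the spans tile the document exactly."""
    problems: List[str] = []
    expected_start = 1
    for start, end in sorted(subdocuments):
        if start > end:
            problems.append(f"inverted span ({start}, {end})")
            continue
        if start > expected_start:
            problems.append(f"gap: pages {expected_start}-{start - 1} unassigned")
        elif start < expected_start:
            problems.append(f"overlap at page {start}")
        expected_start = max(expected_start, end + 1)
    if expected_start <= total_pages:
        problems.append(f"gap: pages {expected_start}-{total_pages} unassigned")
    elif expected_start > total_pages + 1:
        problems.append(f"span past last page {total_pages}")
    return problems
```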

Risk Assessment

Medium risk: requires domain experts, who may be difficult to recruit → build a talent pipeline early or consider outsourcing partnerships
Medium risk: labeling quality may fluctuate → establish rigorous QA processes with quality gates
Low risk: data may become outdated over time → establish continuous update mechanisms
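All three cards recommend QA processes with quality gates. One common way to make such a gate concrete, offered here as an assumption rather than anything DataRecipe specifies, is to have two annotators label an overlapping sample and accept a batch only if their chance-corrected agreement (Cohen's kappa) clears a threshold. The 0.6 threshold and the names `cohens_kappa` and `passes_gate` are hypothetical; a suitable threshold depends on the task.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Chance-corrected agreement between two annotators labeling the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("need two equal-length, non-empty label sequences")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Agreement expected by chance, from each annotator's own label frequencies.
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used the same single label throughout
    return (observed - expected) / (1 - expected)


def passes_gate(a: Sequence[Hashable], b: Sequence[Hashable], threshold: float = 0.6) -> bool:
    """Hypothetical quality gate: accept a batch only if agreement clears the threshold."""
    return cohens_kappa(a, b) >= threshold
```

For example, two annotators who agree on 3 of 4 binary labels get a kappa of 0.5 here, because half of that agreement is expected by chance alone, so the batch would be sent back for review.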

Analyzed 3 datasets this week · estimated human effort: 83.9% · all rated Hard difficulty

Want to discuss this issue?

Kai Founder & CEO
苏文 AI Documentation & Release Engineer
陆明哲 AI Product Manager

Auto-generated by AI Dataset Radar · Updated weekly
