Radar Brief Week 6, 2026 · 2026-02-02 — 2026-02-09

Code Agent Race Heats Up
Robotics Data Infrastructure Accelerates

This week scanned 86 HF orgs · 50 GitHub orgs · 71 blogs · 125 X accounts

One-line Summary

Code agent competition heats up; NVIDIA ships Cosmos-Policy, Numb3rs, and Isaac GR00T; document understanding data demand surges. Top data demand signal this week: Code Agent Data.

Key Findings

This week's 5 findings with high commercial value

P0 Qwen Releases Qwen3-Coder-Next: Code Agent Competition Reaches Fever Pitch

Alibaba's Qwen team released Qwen3-Coder-Next (80B MoE, 3B active parameters), designed for coding agents and local development. It scores 44.3 on SWE-Bench Pro and has Day-0 support from both vLLM and SGLang. Together AI has launched an inference service for the model.

Business implications: Code agents are becoming a core battleground for major AI labs. Together AI also released Aurora-Spec-Qwen3-Coder-Next-FP8 (2026-02-03), a speculative decoding accelerator, which suggests that inference efficiency is key to production deployment. Data service companies should prioritize building code instruction datasets that cover multilingual code and agent tool-calling scenarios.

P0 NVIDIA Advances Robotics Data Infrastructure: Cosmos-Policy + Numb3rs + Isaac GR00T

NVIDIA released two robotics simulation datasets, RoboCasa-Cosmos-Policy and LIBERO-Cosmos-Policy, alongside the Isaac GR00T N1.6 foundation model (6,143 GitHub stars). It also released Numb3rs, a speech text normalization dataset. With 5 datasets and 35 models released, NVIDIA was the most active lab this week.

Business implications: NVIDIA is building an end-to-end robotics learning data pipeline (simulation to data to model), with clear and sustained demand for robotics manipulation datasets, simulation environment data, and speech data.

P1 DeepSeek-OCR-2 Download Explosion: Document Understanding Data Demand Surges

deepseek-ai/DeepSeek-OCR-2 reached 661,725 downloads and 712 likes within one week, becoming the most downloaded Chinese model this week. Concurrently, Zhipu (智谱) released GLM-OCR (reported by SGLang), and Mistral released OCR 3.

Business implications: The OCR/document understanding track is now a three-way competition among DeepSeek, Zhipu, and Mistral. Demand for high-quality document labeling data (complex layouts, multilingual documents, nested tables) will grow significantly.

P1 RLHF/Preference Learning Paper Surge: 7 Papers Focus on Reward Model Improvements

Seven RLHF-related papers appeared this week, covering French preference data collection (compar:IA), democratized preference alignment (DemPO), Rubric improvements, GenRM reasoning quality (R-Align), LLM judge debiasing (FairJudge), DPO over-optimization defense (PEPO), and video flow matching (Euphonium). Qwen also released the RationaleRM dataset (2026-02-02), which proposes a new evaluation dimension, Rationale Consistency.

Business implications: RLHF is evolving from simple binary preference labeling toward multi-dimensional, multilingual, and explainable directions. Data service companies need to upgrade labeling protocols: supporting rubric-based evaluation, reasoning process annotation, and multilingual preference data collection.

P2 StepFun (阶跃星辰) Step-3.5-Flash Leads China's Open-Source Speed Track

stepfun-ai/Step-3.5-Flash reached 228,406 downloads, and StepFun also released the competitive programming benchmark CF-Div2-Stepfun. Step3-VL-10B (82,755 downloads) focuses on robotics vision-language interaction.

Business implications: StepFun is advancing on two fronts simultaneously: inference speed and multimodal robotics. Its core data demands are competitive programming data and robotics vision-language data.

Demand Signals

Training data demand inferred from model releases

Data Type | Intensity | Trend | Related Signals
Code Agent Data | Very Strong | ↑ New | Qwen3-Coder-Next · Aurora-Spec series · SERA series
Robotics/Embodied AI Data | Very Strong | ↑ New | Cosmos-Policy×2 · Isaac GR00T · jepa-wms · Step3-VL-10B
Document OCR Data | Strong | ↑ New | DeepSeek-OCR-2 · GLM-OCR · Mistral OCR 3
RLHF Preference Data | Strong | ↑ New | RationaleRM · compar:IA · 7 preference learning papers
Multilingual Speech Data | Medium | ↑ New | WaxalNLP · Numb3rs · Voxtral-Mini-4B
Safety/Content Moderation Data | Medium | ↑ New | Nemotron-Safety-Guard-v3 · Qwen3Guard
Synthetic Visual Data | Medium | ↑ New | CoSyn-point · DreamDojo

Download Movers

Datasets with the largest download changes this week

Dataset | Downloads | Weekly Growth
nvidia/Numb3rs | 232 | +139.2%
amazon/doc_split | 1,566 | +25.9%
Qwen/RationaleRM | 754 | +16.9%
nvidia/LIBERO-Cosmos-Policy | 2,173 | +7.0%
google/WaxalNLP | 7,277 | +1.9%
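
To put the growth percentages in absolute terms, the previous week's downloads can be backed out from each row. This is a minimal sketch, not the Radar's own method: it assumes Weekly Growth means (current - previous) / previous, which the brief does not state, and the function name is hypothetical.

```python
def previous_week_downloads(current: int, growth_pct: float) -> int:
    """Invert growth_pct = (current - previous) / previous * 100.

    Assumption: this is how "Weekly Growth" is defined; the brief does not say.
    """
    return round(current / (1 + growth_pct / 100))


# (downloads, weekly growth in percent), copied from the table above
movers = {
    "nvidia/Numb3rs": (232, 139.2),
    "amazon/doc_split": (1566, 25.9),
    "Qwen/RationaleRM": (754, 16.9),
    "nvidia/LIBERO-Cosmos-Policy": (2173, 7.0),
    "google/WaxalNLP": (7277, 1.9),
}

for name, (current, growth) in movers.items():
    previous = previous_week_downloads(current, growth)
    # Show the estimated base and the absolute change next to the percentage
    print(f"{name}: ~{previous} -> {current} ({current - previous:+d}, {growth:+.1f}%)")
```

Under that assumption, the +139.2% for nvidia/Numb3rs corresponds to a gain of roughly 135 downloads from a base of roughly 97, about the same absolute gain as google/WaxalNLP's +1.9%. Ranking by percentage alone therefore favors datasets with a small base.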

Deep Dive — DataRecipe

This week's 3 high-value datasets reverse-analyzed (auto-generated by DataRecipe)

Qwen/RationaleRM
300 samples · 14 fields · Hard
6.0/10
🟢 Recommended to Replicate

Data Structure

domain · language · context · response1 · response2 · overall_preference · individual_preference · human-checklist · model-low_deceptive_alignment-checklist
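
For teams replicating this schema, a presence check on each record is a cheap first QA gate. The sketch below assumes each sample is a flat dict keyed by the field names listed above. Only 9 of the 14 fields appear in this brief and their types are not given, so it checks presence only. `LISTED_FIELDS` and `missing_fields` are hypothetical names, not part of the dataset or of DataRecipe.

```python
# Field names copied from the Data Structure line above.
# The dataset reports 14 fields; the other 5 are not listed in this brief.
LISTED_FIELDS = [
    "domain",
    "language",
    "context",
    "response1",
    "response2",
    "overall_preference",
    "individual_preference",
    "human-checklist",
    "model-low_deceptive_alignment-checklist",
]


def missing_fields(record: dict) -> list[str]:
    """Return the listed field names that are absent from a record."""
    return [name for name in LISTED_FIELDS if name not in record]


# Usage with a made-up, incomplete record (values are placeholders)
sample = {"domain": "placeholder", "language": "placeholder", "context": "placeholder"}
print(missing_fields(sample))
```

A record that passes this check can still be mislabeled. The check catches only structural gaps before human QA begins.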

Risk Assessment

Medium Risk: Labeling quality may fluctuate → Establish rigorous QA processes with quality thresholds
Low Risk: Data may become outdated over time → Establish continuous update mechanisms

microsoft/CancerGUIDE
165 samples · 3 fields · Hard
6.0/10
🟢 Recommended to Replicate

Data Structure

patient_id patient_note label

Risk Assessment

Medium Risk: Requires domain experts; talent acquisition may be challenging → Build talent pipeline in advance, or consider outsourcing partnerships
Medium Risk: Labeling quality may fluctuate → Establish rigorous QA processes with quality thresholds
Low Risk: Data may become outdated over time → Establish continuous update mechanisms

amazon/doc_split
300 samples · 3 fields · Hard
6.0/10
🟢 Recommended to Replicate

Data Structure

doc_id total_pages subdocuments

Risk Assessment

Medium Risk: Requires domain experts; talent acquisition may be challenging → Build talent pipeline in advance, or consider outsourcing partnerships
Medium Risk: Labeling quality may fluctuate → Establish rigorous QA processes with quality thresholds
Low Risk: Data may become outdated over time → Establish continuous update mechanisms

3 datasets analyzed this week · 83.9% human labor share · All Hard difficulty

Want to discuss this issue?

Kai, Founder & CEO
苏文, AI Documentation and Release Engineer
陆明哲, AI Product Manager

Auto-generated by AI Dataset Radar · Updated weekly
