Radar Brief Week 14, 2026 · 2026-02-25 — 2026-03-04

Video Understanding Data Enters Industrial-Scale Supply
Apple Proves Human Judgment Irreplaceable

This week scanned 86 HF orgs · 50 GitHub orgs · 71 blogs · 125 X accounts

One-line Summary

29 datasets in one week, video multimodal data enters systematic supply [P0]; talent turbulence clashes with commercial expansion [P0]; commercial expansion and safety controversies escalate in parallel [P1]. Top data demand signal this week: Video Understanding / Tracking Data.

Key Findings

This week's 5 findings with high commercial value

P0 Allen AI Molmo2 Video Understanding Dataset Cluster Erupts: 29 Datasets in One Week, Video Multimodal Data Enters Systematic Supply

Allen AI released 29 datasets under the Molmo2 brand this week, nearly all focused on the video understanding task pipeline. The tracking and segmentation sets include molmo2-single-object-track (single object tracking, 2/24), molmo2-reasonvos (reasoning video object segmentation, 2/27), molmo2-burst (burst detection, 2/23), molmo2-mevis/mevis-valid (motion expression video segmentation), molmo2-ref-davis17/ref-yt-vos (reference-guided tracking), and molmo2-revos/vicas/moca/lv-vis (multi-scenario video object segmentation). The batch also contains molmo2-hardcodes (hard-coded samples, 2/25), molmo2-academic-video-points (academic video tracking point labeling, 2/17), Molmo2-VideoPoint (video localization data, 360 downloads), and the video narrative and QA series Molmo2-VideoLocalizedNarratives/CaptionHf/VideoMME/TGIF/TVQA/NewsVideoQA.

Allen AI also released Dolci-Think-SFT-32B (1,464 downloads, reasoning SFT data), Dolci-Instruct-SFT-Tool-Use-SA (tool use SFT data), code_fresh_0825_1225 (25M tokens of code data, 42 languages), SimpleToM (theory of mind evaluation), and asta-user-interactions (scientific tool user interaction data). On GitHub, the molmo2 repository (197 stars) and the molmospaces robotics ecosystem (152 stars, +15) continue to grow.

Business implications: This is the largest single-week release of video understanding training data in the past six months. Allen AI is systematically building a complete data pipeline running "video object tracking → video segmentation → video localized narratives → video QA," which means video multimodal data has moved from scattered and scarce to industrial-scale supply. For data service companies, Allen AI's open strategy (ODC-BY / Apache-2.0 licenses) lowers pricing expectations in the video data market while creating new room to differentiate on video data quality: a significant value gap remains between synthetic tracking labels and precise human-annotated labels.

P0 Qwen Core Member Junyang Lin Departs Amid Small Model Rollout: Talent Turbulence Clashes with Commercial Expansion

The hottest post on Reddit's r/LocalLLaMA this week was "Junyang Lin has left Qwen" (799 votes, 3/3); the departure of a core Qwen R&D member sparked widespread community discussion. Meanwhile, the Qwen 3.5 Small series (0.8B-9B) launched on Product Hunt (3/3). Qwen3.5-35B-A3B downloads surged from 21K last week to 680K, the FP8 version hit 330K, 122B-A10B reached 150K, and 27B-FP8 reached 159K. The Qwen ecosystem continued to expand with Qwen3Guard (real-time safety filtering), Qwen-Image-Edit (image editing), Qwen-MT (multilingual translation), and GSPO (scalable RL training). Reddit posts on Qwen3.5-9B abliterated (108 votes) and Qwen3.5-9B Uncensored (30 votes) show that the community has begun systematically modifying Qwen small models. The Tianchi IEEE AICAS 2026 edge VLM deployment challenge remains in progress.

Business implications: How the departure of a core member will affect Qwen's R&D cadence remains to be seen, but commercial data shows the "mass rollout" strategy has landed: 680K downloads for 35B-A3B point to strong market demand for small MoE vision models. Community-built abliterated/uncensored versions indicate that Qwen small models have entered an "ecosystem self-modification" stage, and demand for customized fine-tuning data will spread from official teams to the community. For the data industry, the explosion of Qwen small models makes "high-SNR visual reasoning data suited for 9B parameter scale" a high-certainty growth category.

P1 OpenAI Strategic Triple Play + GPT-5.3 Instant: Commercial Expansion and Safety Controversies Escalate in Parallel

OpenAI announced three strategic partnerships this week: a strategic cooperation with Amazon (Frontier platform on AWS), a partnership renewal statement with Microsoft, and a signed contract with the Department of Defense. GPT-5.3 Instant and its system card were released together (3/3), positioned as "smoother everyday conversation." The DoD contract triggered an intense community reaction: LessWrong's "A Tale of Three Contracts" analyzed in depth how Anthropic was flagged as a supply chain risk, "Mass Surveillance w/ LLMs is the Default Outcome" examined the implications of the DoW contract, and the Reddit post "DoW vs Anthropic saga proves closed-source safety is a fraud" (64 votes) demanded open safety evaluations. Anthropic's response to Defense Secretary Pete Hegseth's statement also drew attention. On GitHub, codex reached 61,868 stars (+670) and openai-agents-python reached 19,132 stars.

Business implications: OpenAI's government contracts will drive data demand in two directions: first, safety red-line evaluation data for government/military scenarios (the contracts explicitly define safety red lines); second, evaluation data for AI deployment in classified environments. Community calls for open safety evaluations mean independent safety evaluation benchmark data will become essential, both to assess model capabilities and to verify safety commitments. For Knowlyr, this political contest further reinforces the irreplaceability of "human judgment" in safety evaluation.

P1 Together AI CoderForge-Preview Sets New Open-Source Coding Agent Dataset SOTA

Together AI released CoderForge-Preview (2/20, 8,413 downloads, 118 likes), currently the largest open-source test-verified coding Agent dataset. Fine-tuning Qwen-3 32B on it raised SWE-Bench Verified performance from 23.0% to 59.4% pass@1, ranking first among open-data models and second among open-weight models ≤32B. A concurrent Reddit post, "Benchmarked 94 LLM endpoints for jan 2026" (54 votes), shows open-source models have closed to within 5 points of closed-source models on quality. Mistral released Devstral 2 and Vibe CLI, strengthening coding Agent toolchains. SWE-rebench V2 (HF Papers) proposed scalable methods for collecting cross-language SWE tasks.
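
To put the SWE-Bench figures in perspective, the short sketch below separates the absolute gain (percentage points) from the relative gain (multiplier) and converts both scores into task counts. It assumes both scores were measured on the full 500-task SWE-Bench Verified split; the brief does not state the evaluation set size, and the names used here are illustrative only.

```python
# Scores quoted in the brief (pass@1 on SWE-Bench Verified, in percent).
BASELINE_PCT = 23.0
FINETUNED_PCT = 59.4

# Assumption: both scores were measured on the full 500-task Verified split.
NUM_TASKS = 500


def resolved_tasks(score_pct: float, num_tasks: int = NUM_TASKS) -> int:
    """Number of tasks resolved at a given pass@1 percentage."""
    return round(score_pct / 100 * num_tasks)


# Absolute gain is measured in percentage points, relative gain as a ratio.
absolute_gain = round(FINETUNED_PCT - BASELINE_PCT, 1)
relative_gain = round(FINETUNED_PCT / BASELINE_PCT, 2)

print(f"Resolved tasks: {resolved_tasks(BASELINE_PCT)} -> {resolved_tasks(FINETUNED_PCT)}")
print(f"Absolute gain: {absolute_gain} points, relative gain: {relative_gain}x")
```

Under that assumption, the jump corresponds to 115 → 297 resolved tasks, a gain of 36.4 percentage points or roughly 2.58x the baseline.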

Business implications: CoderForge-Preview shows that open-source coding data can achieve near-closed-source results, which will accelerate decentralized production of coding Agent data. The key directions for differentiation are Agent behavioral trajectories from real enterprise codebases (rather than synthetic environments) and cross-language SWE task data (the direction of SWE-rebench V2). For data service providers, "real human developer debugging and fixing processes" are more valuable than synthetic code tasks.

P2 Apple 'Intelligence Cannot Be Separated from Judgment' Paper + Google Gemini 3.1 Flash-Lite: Alignment Theory and Efficiency Models Advance on Dual Tracks

Apple Machine Learning Research published "On the Impossibility of Separating Intelligence from Judgment: The Computational Intractability of Filtering for AI Alignment," which argues from computational complexity theory that AI alignment filtering is theoretically inseparable from intelligence itself; in other words, you cannot perfectly filter harmful outputs without affecting model intelligence. Apple also released work on Hallucination Span Detection reasoning, EMBridge gesture EMG cross-modal transfer, UI component variant instantiation, and App Store search LLM enhancement. Google released Gemini 3.1 Flash-Lite (the fastest, lowest-cost model in the Gemini 3 series) and the Nano Banana 2 image generation model. The HN post "Open-Source Article 12 Logging for EU AI Act" (35 votes) shows AI compliance tooling is going open-source.

Business implications: Apple's paper provides rigorous theoretical backing for the claim that "human judgment is irreplaceable in AI systems": if filtering and alignment are computationally inseparable from intelligence, then "having humans make judgments" is not a temporary stopgap but a long-term structural necessity. With Gemini 3.1 Flash-Lite and GPT-5.3 Instant both pushing "low-cost efficient inference," demand for lightweight model evaluation data is growing rapidly. The open-sourcing of EU AI Act compliance tools signals that compliance evaluation data will emerge as a new category.

Demand Signals

Training data demand inferred from model releases

Data Type | Intensity | Trend | Related Signals
Video Understanding / Tracking Data | Critical | ↑ New | Allen AI Molmo2 29 video datasets in one week · Full pipeline coverage: video object tracking/segmentation/localization
Multimodal Visual Reasoning Data | Critical | → Continuing | Qwen 3.5 Small downloads hit 680K · 122B-A10B 150K · Community abliterating small models · InternLM Spatial-SSRL
Coding Agent Data | Critical | ↑ New | CoderForge-Preview SWE-Bench 23%→59.4% · Devstral 2 · SWE-rebench V2 cross-language tasks
Safety Evaluation / Alignment Data | High | ↑ New | OpenAI DoD contract safety red lines · Apple 'Intelligence Cannot Be Separated from Judgment' paper · PrivMedChat differential privacy RLHF
RLHF / Preference Alignment Data | High | → Continuing | Robometer trajectory contrastive reward model · RubricBench evaluation alignment · GRM breadth-depth synergy
Agent Tool / Planning Data | High | ↑ New | Qwen DeepPlanning long-horizon Agent planning · LOGIGEN verifiable Agent task generation · DigiData mobile control
Robotics / Tactile Data | High | ↑ New | BAAI ToucHD tactile dataset · NVIDIA NuRec robotics · Arena-GR1 manipulation
Synthetic Data Methodology | Moderate | ↑ New | CHIMERA compact synthetic reasoning data · CharacterFlywheel 15-generation iterative production optimization · VisNec visual necessity filtering
EU Compliance Evaluation Data | Moderate | ↑ New | HN: Open-source Article 12 logging infrastructure · AI safety review tools going open-source
Safety Adversarial / Evaluation Data | - | ↓ Dropped | Present in previous issue, absent this issue
Agent Terminal / Tool Data | - | ↓ Dropped | Present in previous issue, absent this issue
Coding / Code Reasoning Data | - | ↓ Dropped | Present in previous issue, absent this issue
Model Compression Evaluation Data | - | ↓ Dropped | Present in previous issue, absent this issue
Spatial Understanding / Embodied AI Data | - | ↓ Dropped | Present in previous issue, absent this issue
Speech / Multi-Speaker Understanding Data | - | ↓ Dropped | Present in previous issue, absent this issue
Synthetic Data Quality Evaluation | - | ↓ Dropped | Present in previous issue, absent this issue
Multilingual Data | - | ↓ Dropped | Present in previous issue, absent this issue

Download Movers

Datasets with the largest download changes this week

Dataset | Downloads | Weekly Growth
nvidia/Nemotron-Terminal-Corpus | 744 | +18500.0%
nvidia/HiLiftAeroML | 1,011 | +73.7%
google/WaxalNLP | 13,506 | +36.7%
allenai/asta-summary-citation-counts | 439 | +13.7%
microsoft/SYNUR | 122 | +0.8%
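
Percentage growth can mislead when the baseline is tiny. The sketch below backs out the implied previous-week download count for each mover, assuming weekly growth is computed as (current - previous) / previous; the brief does not state its formula, so treat the results as estimates. The function name is illustrative.

```python
def implied_previous(current: int, growth_pct: float) -> int:
    """Back out last week's count from this week's count and growth %.

    Assumes growth_pct = (current - previous) / previous * 100.
    """
    return round(current / (1 + growth_pct / 100))


# (dataset, downloads this week, weekly growth in percent), from the table.
movers = [
    ("nvidia/Nemotron-Terminal-Corpus", 744, 18500.0),
    ("nvidia/HiLiftAeroML", 1011, 73.7),
    ("google/WaxalNLP", 13506, 36.7),
    ("allenai/asta-summary-citation-counts", 439, 13.7),
    ("microsoft/SYNUR", 122, 0.8),
]

for name, current, growth in movers:
    previous = implied_previous(current, growth)
    # Show the absolute change next to the percentage for context.
    print(f"{name}: {previous} -> {current} ({current - previous:+d}, {growth:+.1f}%)")
```

Under that assumption, the +18500.0% for nvidia/Nemotron-Terminal-Corpus reflects a rise from about 4 to 744 downloads, whereas google/WaxalNLP's +36.7% is a much larger absolute increase (about 9,880 to 13,506).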

Deep Dive — DataRecipe

This week's 3 high-value datasets reverse-analyzed (auto-generated by DataRecipe)

togethercomputer/CoderForge-Preview
300 samples · 7 fields · Hard
6.0/10
🟢 Recommended for Replication

Data Structure

trajectory_id, finish_reason, image, messages, reward, tools, license

Risk Assessment

Medium Risk: Labeling quality may fluctuate → Establish rigorous QA processes with quality thresholds
Low Risk: Data may become outdated over time → Establish continuous update mechanisms

allenai/Dolci-Think-SFT-32B
300 samples · 3 fields · Hard
6.0/10
🟢 Recommended for Replication

Data Structure

messages, id, source

Risk Assessment

Medium Risk: Labeling quality may fluctuate → Establish rigorous QA processes with quality thresholds
Low Risk: Data may become outdated over time → Establish continuous update mechanisms

google/MapTrace
300 samples · 3 fields · Medium
6.5/10
🟢 Recommended for Replication

Data Structure

image, input, label

Risk Assessment

Medium Risk: Labeling quality may fluctuate → Establish rigorous QA processes with quality thresholds
Low Risk: Data may become outdated over time → Establish continuous update mechanisms
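
When replicating one of these datasets, a quick field check against the structures listed above can catch schema drift early. The following is a minimal sketch: the field names come from this brief, while the sample record and the helper name check_record are hypothetical and not taken from the actual datasets.

```python
# Field lists as reported in the Data Structure sections of this brief.
EXPECTED_FIELDS = {
    "togethercomputer/CoderForge-Preview": {
        "trajectory_id", "finish_reason", "image", "messages",
        "reward", "tools", "license",
    },
    "allenai/Dolci-Think-SFT-32B": {"messages", "id", "source"},
    "google/MapTrace": {"image", "input", "label"},
}


def check_record(dataset: str, record: dict) -> tuple:
    """Return (missing, extra) field names for one record of a dataset."""
    expected = EXPECTED_FIELDS[dataset]
    present = set(record)
    return expected - present, present - expected


# Hypothetical record for illustration only; real records will differ.
sample = {"messages": [], "id": "example-0", "notes": "unexpected field"}

missing, extra = check_record("allenai/Dolci-Think-SFT-32B", sample)
print("missing:", sorted(missing))
print("extra:", sorted(extra))
```

For the hypothetical record above, the check reports source as missing and notes as an extra field.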

Analyzed 3 datasets this week · 99.6% human effort

Want to discuss this issue?

Kai, Founder & CEO
苏文, AI Documentation & Release Engineer
陆明哲, AI Product Manager

Auto-generated by AI Dataset Radar · Updated weekly
