Radar Brief Week 13, 2026 · 2026-02-19 — 2026-02-26

Qwen 3.5 Full-Size Coverage
Safety Adversarial Data Demand Emerges

This week's scan covered 86 HF orgs · 50 GitHub orgs · 71 blogs · 125 X accounts

One-line Summary

Qwen 3.5 family ships 3 models on 2/24, Chinese open-source VLM enters full-size coverage phase [P0]; Anthropic RSP v3.0 + distillation attack detection + claude-code-security [P0]; NVIDIA Nemotron-Terminal-Corpus opens new terminal Agent SFT dataset category (2/19) [P1]. Top data demand signal this week: Multimodal Visual Reasoning Data.

Key Findings

This week's five high-commercial-value findings

P0 Qwen 3.5 Family Ships 3 Models on 2/24: Chinese Open-Source VLM Enters Full-Size Coverage Phase

Following the Qwen3.5-397B-A17B flagship (2/16, 483K downloads, 1,052 likes) and its FP8 quantized version (2/18, 93K downloads), Alibaba released three mid-to-small variants on 2/24: Qwen3.5-35B-A3B (21K downloads, 365 likes, Azure deployment supported), Qwen3.5-27B (6,875 downloads, 254 likes), and Qwen3.5-122B-A10B (3,320 downloads, 225 likes). All three new models feature an image-text-to-text multimodal architecture, with 35B-A3B and 122B-A10B using MoE sparse activation and 27B being dense. Reddit community reaction was extremely enthusiastic: the r/LocalLLaMA posts "Qwen3-30B-A3B vs Qwen3.5-35B-A3B on RTX 5090" (135 votes), "Qwen 3.5 craters on hard coding tasks" (128 votes), "Qwen 3.5 benchmark comparison" (89 votes), and "Vision language benchmarks of qwen3.5" (40 votes) appeared in rapid succession. Meanwhile, the Qwen blog kept up its ecosystem output: Qwen3Guard real-time safety filtering, Qwen-Image-Edit image editing, Qwen-Image native text rendering, GSPO scalable RL training, and Qwen-MT multilingual translation.

Business implications: Compared to last week's single flagship 397B, Qwen 3.5 now spans 5 sizes (27B/35B-A3B/122B-A10B/397B-A17B/397B-FP8), covering from consumer GPUs (RTX 5090 can run 35B-A3B) to data center scenarios. This means visual reasoning training data demand shifts from "preparing data for one large model" to "preparing data of varying complexity for an entire product line" — small models need refined, high-SNR data, large models need complex reasoning chain data, MoE models may need specialized domain routing data. For Knowlyr, the commercial value of Chinese visual reasoning data has upgraded from last week's "clear demand" to "highly certain scaled demand."
P0 Anthropic Safety Infrastructure Triple Release: RSP v3.0 + Distillation Attack Detection + claude-code-security

Anthropic released three safety-related results this week: (1) Responsible Scaling Policy v3.0, an updated responsible scaling framework defining stricter model deployment safety thresholds; (2) Detecting and preventing distillation attacks, proposing methods for detecting model distillation attacks, with Nathan Lambert providing an in-depth analysis on the Interconnects blog ("How much does distillation really matter for Chinese LLMs?") and the Zhiyuan community reposting it as "One Anthropic blog post, IBM drops 13%"; (3) claude-code-security-review (GitHub, 3,318 stars), an AI-powered code security review GitHub Action. Anthropic also released Persona Selection Model interpretability research (simultaneously published on the Alignment Forum) and an AI Fluency Index education report, and updated its research team page to showcase the Economic Research, Interpretability, and Societal Impacts directions.

Business implications: Anthropic's distillation attack detection directly names the distillation issue in Chinese LLMs, triggering industry shockwaves (IBM down 13%). This signals two data needs: (1) Adversarial safety data — model distillation detection requires large-scale "original output vs distilled output" paired datasets for training detectors; (2) Safety audit data — RSP v3.0's stricter thresholds mean more models must pass safety evaluations before deployment, making safety evaluation datasets essential. For Knowlyr, the irreplaceability of "human judgment" in safety evaluation is further strengthened — whether a model can be safely deployed ultimately requires expert human judgment.
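The paired-dataset idea behind point (1) can be sketched with a toy detector: given matched prompts, compare a candidate model's outputs against a reference model's outputs and flag suspiciously high n-gram overlap. This is purely an illustrative assumption, not Anthropic's published method; every function name and threshold below is hypothetical.

```python
# Toy sketch of distillation screening over "original output vs distilled
# output" pairs. Real detectors would use far richer signals; this only
# illustrates the paired-data shape. All thresholds are assumptions.

def ngram_set(text: str, n: int = 3) -> set:
    """Collect word n-grams from a response."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap_score(reference: str, candidate: str, n: int = 3) -> float:
    """Jaccard overlap of word n-grams between two responses."""
    a, b = ngram_set(reference, n), ngram_set(candidate, n)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def flag_pairs(pairs: list, threshold: float = 0.3) -> list:
    """Mark (reference, candidate) pairs whose overlap exceeds a threshold."""
    return [overlap_score(ref, cand) >= threshold for ref, cand in pairs]
```

In practice the dataset work is exactly what the paragraph above describes: collecting large numbers of such prompt-matched response pairs, labeled by whether the candidate was actually trained on the reference model's outputs, so a learned detector can replace the crude overlap heuristic.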
P1 NVIDIA Nemotron-Terminal-Corpus Opens New Terminal Agent SFT Dataset Category (2/19)

NVIDIA released Nemotron-Terminal-Corpus (2/19, cc-by-4.0) and the companion Nemotron-Terminal-Synthetic-Tasks. The former is a large-scale terminal-interaction SFT dataset, synthesized through the Terminal-Task-Gen pipeline, designed to train LLMs in Linux terminal operation; the latter provides skill-based synthetic task structures for evaluating and training autonomous terminal Agents. A companion paper, "On Data Engineering for Scaling LLM Terminal Capabilities" (authors Renjie Pi, Grace Lam, M. Shoeybi), was published simultaneously. Concurrent NVIDIA ecosystem signals: the Nemotron-3-Nano-30B-A3B series is maintaining high downloads (BF16 version 852K, FP8 version 1.159M), alongside the nemotron-colembed-vl-4b-v2 visual document retrieval model (54K downloads) and the Isaac-GR00T robotics foundation model (6,248 stars).

Business implications: Terminal-Corpus marks Agent SFT data expansion from Web/API scenarios to system administration — a critical scenario for enterprise AI Agent deployment (ops automation, DevOps). The cc-by-4.0 license enables commercial use, but synthetic data limitations (Terminal-Task-Gen generated) mean human-annotated data from real operational scenarios still holds value. Combined with Reddit hot post "Your coding agent sessions are sitting on your machine right now" (46 votes), the industry is recognizing the value of Agent behavioral trajectory data — whoever can systematically collect real terminal operation data builds a differentiated moat.
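To make the "terminal interaction SFT" category concrete, here is a hypothetical record shape for one sample: a task description plus the shell trajectory that solves it. The field names are assumptions for illustration, not NVIDIA's actual schema.

```python
# Hypothetical terminal-interaction SFT record: one task paired with the
# command/output trajectory that accomplishes it. Field names are assumed.
import json

def make_terminal_record(task, steps):
    """Bundle a task description and its shell trajectory into one SFT record."""
    return {
        "task": task,
        "trajectory": [
            {"command": cmd, "stdout": out, "exit_code": code}
            for cmd, out, code in steps
        ],
        "num_steps": len(steps),
    }

record = make_terminal_record(
    "Find the largest log file under /var/log",
    [
        ("du -a /var/log 2>/dev/null | sort -nr | head -1",
         "10240 /var/log/syslog", 0),
    ],
)
print(json.dumps(record, indent=2))
```

The same shape works for human-collected data: capturing real operator sessions (commands, outputs, exit codes) yields exactly the trajectory records that synthetic pipelines like Terminal-Task-Gen approximate, which is where the differentiated-moat argument above comes from.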
P1 Cerebras REAP Compresses Step 3.5 Flash + InternLM Spatial Self-Supervision: Efficient Inference and Spatial Understanding on Dual Tracks (2/25)

Cerebras released Step-3.5-Flash-REAP-121B-A11B and Step-3.5-Flash-REAP-149B-A11B on 2/25, continuing its "large model slimming" technical approach (last week it compressed MiniMax-M2.5). The same day, InternLM released Spatial-SSRL-3B, a multimodal model specializing in spatial understanding and self-supervised learning. On Reddit, Unsloth's Q3 quantization benchmarks surpassing Q4 and MXFP4 (63 votes) and a discussion of Mercury 2 diffusion-model inference speed (16 votes) show that model compression and efficient inference remain active topics.

Business implications: With Cerebras applying REAP compression to a different vendor's model two weeks in a row (MiniMax last week, Step this week), REAP is emerging as a de facto "model compression as a service" standard, and quality evaluation of compressed models requires systematic comparative datasets. Spatial-SSRL-3B opens a new spatial self-supervision direction, echoing last week's MolmoSpaces embodied AI spatial resources: spatial understanding data demand is expanding from the visual domain into self-supervised pre-training. Quantization evaluation data ("how much quality is lost after compression") is evolving from a niche need to an essential one.
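In its simplest form, the "how much quality is lost after compression" question reduces to running the same eval prompts through the original and the compressed model and measuring agreement. A minimal sketch, with model calls stubbed out as answer lists (real pipelines would call inference endpoints and use graded metrics, not just exact match):

```python
# Minimal compressed-vs-original quality check: exact-match agreement on a
# shared eval set. Answer lists stand in for real model inference calls.

def agreement_rate(original_answers, compressed_answers):
    """Fraction of prompts where the compressed model matches the original."""
    assert len(original_answers) == len(compressed_answers), "eval sets must align"
    matches = sum(a == b for a, b in zip(original_answers, compressed_answers))
    return matches / len(original_answers)
```

The comparative datasets mentioned above are the fixed eval sets this loop runs over; their value comes from being held constant across every compressed variant so regressions are attributable to the compression step.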
P2 Apple ML Intensive Research Output + Embodied AI Capital Heats Up: Wayve $1.2B + AI2 Robotics Series B

Apple Machine Learning published 5 research papers in one week: (1) CoT reasoning dynamic analysis — revealing trace dynamics of chain-of-thought reasoning; (2) Speech understanding gap — LLM performance on speech input far below text, pointing to directions for bridging the gap; (3) HTML text extraction — re-examining HTML-to-Text extraction methods for LLM pre-training, discovering limitations of existing approaches; (4) AMUSE — audio-visual multi-speaker understanding benchmark and alignment framework, highlighting shortcomings of multimodal models (GPT-4o, Qwen3-Omni) in multi-speaker conversation scenarios; (5) depyf — PyTorch compiler debugging tool. In embodied AI, Wayve raised $1.2 billion Series D (planning supervised autonomous driving robotaxi trial operations in London in 2026, entering consumer market in 2027), AI2 Robotics completed Series B (valued over $1 billion, developing AlphaBot VLA model for semi-humanoid robots).

Business implications: Apple's 5 papers signal data needs: speech-text alignment data (bridging the speech understanding gap), multi-speaker conversation annotation data (gaps identified by AMUSE), high-quality HTML-to-Text training data (pre-training infrastructure improvement). The funding scale of Wayve and AI2 Robotics confirms embodied AI has entered the industrialization phase — $1.2 billion is for building products, not research. Implication for the data industry: high-quality annotation data demand for autonomous driving and robotic manipulation scenarios will scale from research-grade to product-grade volumes.

Demand Signals

Infer training data demands from model releases

Data Type | Intensity | Trend | Related Signals
Multimodal Visual Reasoning Data | Extreme | → Continuing | Qwen 3.5 family expands to 5 sizes (full coverage), GLM-4.6V open-sourced, InternLM Spatial-SSRL-3B spatial understanding
Safety Adversarial / Evaluation Data | Extreme | ↑ New | Anthropic RSP v3.0 + distillation attack detection + claude-code-security, CAMEL confidence-gated reward modeling, IR3 reward hacking detection
RLHF / Preference Alignment Data | Extreme | → Continuing | MARS self-refinement (ongoing), CAMEL confidence-gated reflection, IR3 contrastive inverse RL reward hacking detection, gradient regularization against reward hacking
Agent Terminal / Tool Data | Strong | ↑ New | NVIDIA Nemotron-Terminal-Corpus SFT, MagicAgent general Agent planning, Reddit "coding agent sessions" discussion
Coding / Code Reasoning Data | Strong | → Continuing | Devstral-2-123B (15K downloads), Devstral-Small-2-24B (416K downloads), Reddit "Qwen 3.5 craters on hard coding tasks"
Model Compression Evaluation Data | Strong | ↑ New | Cerebras REAP Step 3.5 Flash (two compressed versions), Unsloth Q3 beats Q4/MXFP4 (Reddit 63 votes), Mercury 2 diffusion model inference speed
Spatial Understanding / Embodied AI Data | Strong | ↑ New | InternLM Spatial-SSRL-3B, Wayve $1.2B funding, AI2 Robotics Series B, GEBench GUI interaction evaluation
Speech / Multi-Speaker Understanding Data | Medium | ↑ New | Apple "Closing the Gap Between Text and Speech", AMUSE multi-speaker benchmark, TinyTTS 9M parameter TTS (Reddit 21 votes)
Synthetic Data Quality Evaluation | Medium | ↑ New | "When Pretty Isn't Useful" (synthetic image training degradation study), ReSyn autonomous synthetic environment scaling
Multilingual Data | Medium | → Continuing | WaxalNLP African languages (9,883 downloads), BURMESE-SAN Myanmar benchmark, Qwen-MT multilingual translation
Agent Behavioral / Trajectory Data | n/a | ↓ Dropped | Present in previous issue, absent this issue
Complex Reasoning Evaluation Data | n/a | ↓ Dropped | Present in previous issue, absent this issue
Robotics / Embodied AI Data | n/a | ↓ Dropped | Present in previous issue, absent this issue
Document OCR Data | n/a | ↓ Dropped | Present in previous issue, absent this issue
Quantization / Compression Evaluation Data | n/a | ↓ Dropped | Present in previous issue, absent this issue
Safety / Alignment Audit Data | n/a | ↓ Dropped | Present in previous issue, absent this issue

Download Movers

Datasets with the largest download changes this week

Dataset | Downloads | Weekly Change
allenai/olmix | 272 | +14.3%
google/WaxalNLP | 9,883 | -6.6%
nvidia/HiLiftAeroML | 582 | -14.9%

Want to discuss this issue?

Kai · Founder & CEO
苏文 (Su Wen) · AI Documentation & Release Engineer
陆明哲 (Lu Mingzhe) · AI Product Manager

Auto-generated by AI Dataset Radar · Updated weekly

AI Dataset Radar →