Qwen 3.5 Full-Size Coverage
Safety Adversarial Data Demand Emerges
This week's scan covered 86 HF orgs · 50 GitHub orgs · 71 blogs · 125 X accounts
The Qwen 3.5 family ships three models on 2/24, moving the Chinese open-source VLM line into its full-size coverage phase [P0]; Anthropic releases RSP v3.0, distillation-attack detection, and claude-code-security-review [P0]; NVIDIA's Nemotron-Terminal-Corpus (2/19) opens a new category of terminal-Agent SFT datasets [P1]. Top data-demand signal this week: multimodal visual reasoning data.
Key Findings
This week's five highest-commercial-value findings
Following the Qwen3.5-397B-A17B flagship (2/16, 483K downloads, 1,052 likes) and its FP8 quantized version (2/18, 93K downloads), Alibaba released three mid-to-small variants on 2/24: Qwen3.5-35B-A3B (21K downloads, 365 likes, Azure deployment supported), Qwen3.5-27B (6,875 downloads, 254 likes), and Qwen3.5-122B-A10B (3,320 downloads, 225 likes). All three new models use an image-text-to-text multimodal architecture; 35B-A3B and 122B-A10B use MoE sparse activation, while 27B is dense. Reddit engagement was immediate and heavy: the r/LocalLLaMA threads "Qwen3-30B-A3B vs Qwen3.5-35B-A3B on RTX 5090" (135 votes), "Qwen 3.5 craters on hard coding tasks" (128 votes), "Qwen 3.5 benchmark comparison" (89 votes), and "Vision language benchmarks of qwen3.5" (40 votes) appeared in quick succession. Meanwhile, the Qwen blog kept up its ecosystem output: Qwen3Guard real-time safety filtering, Qwen-Image-Edit image editing, Qwen-Image native text rendering, GSPO scalable RL training, and Qwen-MT multilingual translation.
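Since all three new variants are image-text-to-text models, they should be reachable through the standard transformers multimodal pipeline. A minimal sketch, assuming the weights live under the usual Qwen org on Hugging Face (the repo id below is a guess, not confirmed above) and that the family is supported by a recent transformers release:

```python
from transformers import pipeline

# Hypothetical repo id -- check the Qwen org on Hugging Face for the real one.
pipe = pipeline("image-text-to-text", model="Qwen/Qwen3.5-27B")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/chart.png"},
            {"type": "text", "text": "Describe this image in one sentence."},
        ],
    }
]
out = pipe(text=messages, max_new_tokens=64)
print(out[0]["generated_text"])
```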
Anthropic released three safety-related results this week: (1) Responsible Scaling Policy v3.0, an updated responsible-scaling framework defining stricter safety thresholds for model deployment; (2) Detecting and preventing distillation attacks, proposing methods for detecting model distillation attacks, which Nathan Lambert analyzed in depth on his Interconnects blog ("How much does distillation really matter for Chinese LLMs?") and the Zhiyuan community reposted as "One Anthropic blog post, IBM drops 13%"; (3) claude-code-security-review (3,318 GitHub stars), an AI-powered code-security-review GitHub Action. Anthropic also published Persona Selection Model interpretability research (simultaneously posted to the Alignment Forum) and the AI Fluency Index education report, and updated its research team page to showcase the Economic Research, Interpretability, and Societal Impacts directions.
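The detection post's actual techniques are not detailed here. As a generic illustration only (not Anthropic's method), one family of signals compares a suspect model's next-token distributions against a candidate teacher's on probe prompts, since a distilled student tends to sit unusually close to its teacher. A toy numpy sketch with fully synthetic distributions; every name and number below is illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, N_PROMPTS = 50, 200

def kl(p, q, eps=1e-9):
    """KL divergence D(p || q) between two next-token distributions."""
    p, q = p + eps, q + eps
    return float(np.sum(p * np.log(p / q)))

def noisy_copy(base, noise):
    """A distribution near `base`, standing in for a model's output."""
    d = base + noise * rng.random(base.shape)
    return d / d.sum()

# One synthetic "teacher" distribution per probe prompt.
teacher = [noisy_copy(rng.random(VOCAB), 0.0) for _ in range(N_PROMPTS)]
# A distilled student stays close to the teacher; an independent model does not.
student = [noisy_copy(t, 0.1) for t in teacher]
independent = [noisy_copy(rng.random(VOCAB), 0.1) for _ in range(N_PROMPTS)]

print("mean KL(teacher || student):    ",
      np.mean([kl(t, s) for t, s in zip(teacher, student)]))
print("mean KL(teacher || independent):",
      np.mean([kl(t, s) for t, s in zip(teacher, independent)]))
# A suspect whose divergence sits far below the independent baseline is one
# (weak) distillation signal; real detectors combine many such statistics.
```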
NVIDIA released Nemotron-Terminal-Corpus (2/19, cc-by-4.0) and the companion Nemotron-Terminal-Synthetic-Tasks. The former is a large-scale terminal-interaction SFT dataset, synthesized through the Terminal-Task-Gen pipeline, designed to train LLMs to operate a Linux terminal; the latter provides skill-based synthetic task structures for evaluating and training autonomous terminal Agents. A companion paper, "On Data Engineering for Scaling LLM Terminal Capabilities" (Renjie Pi, Grace Lam, M. Shoeybi), was published simultaneously. Concurrent NVIDIA ecosystem signals: the Nemotron-3-Nano-30B-A3B series is sustaining high downloads (852K for the BF16 version, 1.159M for FP8), alongside the nemotron-colembed-vl-4b-v2 visual document retrieval model (54K downloads) and the Isaac-GR00T robotics foundation model (6,248 stars).
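For anyone wanting to inspect the new corpus, a minimal sketch using the datasets library. The repo id is inferred from the release name above and the schema is not described here, so treat both as assumptions and check the dataset card:

```python
from datasets import load_dataset

# Assumed repo id; verify on the NVIDIA org page before use.
ds = load_dataset("nvidia/Nemotron-Terminal-Corpus", split="train", streaming=True)

# Peek at a few records to learn the schema before wiring it into an SFT pipeline.
for example in ds.take(3):
    print(sorted(example.keys()))
```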
Cerebras released Step-3.5-Flash-REAP-121B-A11B and Step-3.5-Flash-REAP-149B-A11B on 2/25, continuing its "large-model slimming" approach (last week it compressed MiniMax-M2.5). The same day, InternLM released Spatial-SSRL-3B, a multimodal model specializing in spatial understanding and self-supervised learning (tagged "spatial understanding, self-supervised learning"). On Reddit, threads on Unsloth Q3 quantization benchmarks beating Q4 and MXFP4 (63 votes) and on Mercury 2 diffusion-model inference speed (16 votes) show that model compression and efficient inference remain active topics.
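REAP-style compression prunes experts out of an MoE model. Cerebras's exact algorithm is not described above; as a generic illustration, one common recipe scores each expert by its router-weighted activation over calibration data and drops the lowest scorers. A toy sketch on synthetic statistics (the scoring rule and all names are illustrative assumptions, not the REAP method):

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_EXPERTS, NUM_TOKENS, KEEP = 16, 4096, 12

# Stand-ins for statistics gathered on a calibration pass:
# per-token router probabilities and per-expert output magnitudes.
router_probs = rng.dirichlet(np.ones(NUM_EXPERTS), size=NUM_TOKENS)
output_norms = rng.random((NUM_TOKENS, NUM_EXPERTS))

# Score each expert by its router-weighted activation over the corpus.
scores = (router_probs * output_norms).mean(axis=0)
pruned = np.argsort(scores)[: NUM_EXPERTS - KEEP]
print("experts to prune:", sorted(pruned.tolist()))
# A real pipeline would then delete those experts' weights and renormalize
# the router over the surviving ones.
```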
Apple Machine Learning published five research papers in one week: (1) CoT reasoning dynamics, revealing the trace dynamics of chain-of-thought reasoning; (2) the speech understanding gap, showing LLM performance on speech input falls far below text and pointing to directions for closing it; (3) HTML text extraction, re-examining HTML-to-text extraction methods for LLM pre-training and finding limitations in existing approaches; (4) AMUSE, an audio-visual multi-speaker understanding benchmark and alignment framework highlighting where multimodal models (GPT-4o, Qwen3-Omni) fall short in multi-speaker conversation; (5) depyf, a PyTorch compiler debugging tool (usage sketch below). In embodied AI, Wayve raised a $1.2 billion Series D (planning supervised robotaxi trials in London in 2026 and a consumer-market entry in 2027), and AI2 Robotics closed a Series B at a valuation over $1 billion; it is developing the AlphaBot VLA model for semi-humanoid robots.
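On item (5): depyf dumps the Python source that torch.compile generates (guards, decompiled bytecode, backend kernels) into a directory for inspection. A minimal sketch, assuming a recent depyf release with its documented prepare_debug context manager:

```python
import torch
import depyf

@torch.compile
def toy(x):
    return torch.sin(x) + torch.cos(x)

# Writes the compiler's intermediate artifacts as readable .py files
# into ./depyf_dump for post-hoc inspection.
with depyf.prepare_debug("./depyf_dump"):
    toy(torch.randn(8))
```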
Demand Signals
Training-data demand inferred from this week's model releases
Download Movers
Datasets with the largest download changes this week
| Dataset | Downloads | Weekly Change |
|---|---|---|
| allenai/olmix | 272 | +14.3% |
| google/WaxalNLP | 9,883 | -6.6% |
| nvidia/HiLiftAeroML | 582 | -14.9% |
Auto-generated by AI Dataset Radar · Updated weekly