Radar Brief Week 8, 2026 · 2026-02-13 — 2026-02-20


Scanned this week: 86 HF orgs · 50 GitHub orgs · 71 blogs · 125 X accounts

One-line Summary

Allen AI releases five datasets plus the Olmix data-mixing framework, systematically defining pre-training data methodology; Meta open-sources a 200K+ multilingual, multi-turn preference dataset, upgrading the public supply of RLHF data; RLHF/alignment research sustains high-density output for a fourth consecutive week, with methodology moving toward personalization and decoupling. Top data demand signal this week: Multimodal Visual Reasoning Data.

Key Findings

This week's five high-commercial-value findings

P0 Allen AI Releases Five Datasets + Olmix Data Mixing Framework, Systematically Defining Pre-Training Data Methodology (2026-02-11 to 2026-02-17)

Allen AI released 5 datasets and 8 models this week, the highest single-week output of any research institution we track. Key highlights: allenai/olmix (2026-02-11, 238 downloads, 18 likes) — proxy-run swarm data for OLMo pre-training, systematically attacking the core pre-training question of what mixture of domain data produces optimal results; allenai/Dolci-Instruct-DPO (2,498 downloads) — 260K preference pairs for OLMo 3 Instruct 7B alignment training, ODC-BY license; allenai/olmOCR-bench (2,745 downloads, 58 likes) — 1,403 PDFs + 7,010 unit tests, establishing an evaluation standard for PDF-to-Markdown OCR systems; allenai/Molmo2-MultiImageQA (194 downloads) — a multi-image visual question answering instruction fine-tuning dataset; allenai/molmospaces (204 downloads, +39.7% weekly growth) — embodied AI 3DGUT/USD resource update in an Isaac Sim compatible format. Companion blog posts published simultaneously: an Olmix data-mixing framework deep dive, AutoDiscovery automated scientific discovery, a MolmoSpaces ecosystem introduction, and How2Everything real-world procedure evaluation.
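For hands-on inspection, the sketch below peeks at the Dolci-Instruct-DPO release with the Hugging Face `datasets` library. This is a minimal sketch, assuming a standard "train" split; the actual split and field names should be confirmed against the dataset card.

```python
# Minimal sketch: inspect one of this week's Allen AI releases.
# Assumption: the dataset exposes a "train" split; field names vary,
# so we print the schema before relying on any particular key.
from datasets import load_dataset

# Stream to avoid downloading all 260K preference pairs up front.
ds = load_dataset("allenai/Dolci-Instruct-DPO", split="train", streaming=True)

for i, example in enumerate(ds):
    print(sorted(example.keys()))  # e.g. prompt / chosen / rejected (unverified)
    if i >= 2:
        break
```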

Business implications: Allen AI has leaped from releasing individual datasets to exporting data methodology — Olmix's swarm-based data mixing method stands to transform the engineering practice of pre-training data ratios. Data service providers should watch: 1) Data mixing optimization as a service — helping clients find optimal training mixtures; 2) OCR evaluation benchmark standardization — olmOCR-bench may become the de facto standard in document AI, and data suppliers should calibrate document labeling quality against it; 3) Public supply of DPO preference data — 260K open-source DPO pairs compress the commercial space for low-quality preference data, so differentiated competition must focus on vertical domains.
P0 Meta Open-Sources 200K+ Multilingual Multi-Turn Preference Dataset, Upgrading RLHF Public Data Supply (2025-05-13 Initial Release, Entered Monitoring Scope This Week)

facebook/community-alignment-dataset (194 downloads, 39 likes, cc-by-4.0) — 200K+ LLM response comparisons from 3,000+ global annotators, covering multilingual and multi-turn conversation scenarios. This is Meta's largest-scale open-source multilingual preference dataset. Meta also released facebook/actionbench (2026-02-19, 2 downloads) — 128 paired video and animated point cloud samples for evaluating the ability to generate animated 3D meshes from video. Together the two datasets mark Meta's strategic positioning on two data fronts: text alignment and video-3D multimodal.

Business implications: the cc-by-4.0 license on community-alignment-dataset means anyone can use it freely for commercial training — a boon for small and mid-sized model companies, but a direct blow to preference data suppliers. Differentiation directions: 1) Vertical-industry preference data (medical, legal, financial, and other professional scenarios Meta's dataset does not cover); 2) Chinese preference data — the dataset is multilingual, but its Chinese coverage is shallow; 3) Continuous update services — open-source datasets are static, while clients need preference data that keeps pace as models iterate.
P1 RLHF/Alignment Research Sustains High-Density Output for a Fourth Consecutive Week, Methodology Moving Toward Personalization and Decoupling (2026-02-16 to 2026-02-19)

Five RLHF/alignment-related papers this week: MARS (2026-02-19) — Margin-Aware reward modeling + self-refined data augmentation, addressing the high cost of preference data; Learning Personalized Agents from Human Feedback (2026-02-18) — introduces the PersonaliZe framework, enabling agents to adapt to dynamic changes in individual preferences; Multi-Objective Alignment for Personalized Psychotherapy (2026-02-17) — multi-objective alignment in psychotherapy scenarios, balancing patient preferences with clinical safety; Interactionless IRL (2026-02-16) — proposes "interaction-free inverse reinforcement learning," decoupling safety objectives from policy to avoid "alignment waste"; Latency-aware HITL-RL (2026-02-17) — embedding human feedback and latency constraints in semantic communication. Common trend across all five papers: moving from "one-size-fits-all alignment" toward "personalized + decoupled + multi-objective + scenario-specific."
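To make "margin-aware reward modeling" concrete, below is a hedged PyTorch sketch of a margin variant of the Bradley-Terry reward-modeling loss. It illustrates the core idea — the loss only vanishes once the reward gap exceeds a per-pair margin — and is not the MARS objective itself; the paper's exact loss, margin source, and self-refinement loop may differ.

```python
# Generic margin-aware preference loss (NOT the exact MARS formulation).
import torch
import torch.nn.functional as F

def margin_aware_loss(r_chosen: torch.Tensor,
                      r_rejected: torch.Tensor,
                      margin: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry log-likelihood shifted by the margin:
    # -log sigmoid(r_chosen - r_rejected - margin), averaged over the batch.
    return -F.logsigmoid(r_chosen - r_rejected - margin).mean()

# Toy usage: rewards from a reward-model head; margins could come from,
# say, annotator agreement strength (a hypothetical choice).
r_c = torch.tensor([1.2, 0.3])
r_r = torch.tensor([0.1, 0.4])
m = torch.tensor([0.5, 0.5])
print(margin_aware_loss(r_c, r_r, m))
```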

Business implications: the refinement of alignment methodology directly changes data requirements: 1) Personalized preference data — no longer "humanity's preferences" but the preferences of specific user groups or individuals, requiring data collection that covers population diversity; 2) Multi-objective labeling — the same sample needs preference labels across multiple dimensions (safety, helpfulness, personalization, etc.), which raises labeling cost but also the value per data point (see the record sketch below); 3) Dynamic preference data — the PersonaliZe framework emphasizes that preferences change over time, so preference data needs periodic refreshing, and the one-time-labeling model will give way to continuous labeling services.
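A record sketch for point 2: the layout below shows what a multi-objective, personalization-aware preference sample might look like. Every field name is invented for illustration — this is a schema sketch, not any vendor's actual format.

```python
# Hypothetical multi-objective preference record (all field names invented).
record = {
    "prompt": "...",
    "response_a": "...",
    "response_b": "...",
    "labels": {                       # one verdict per alignment objective
        "safety": "a",
        "helpfulness": "b",
        "personalization": "tie",
    },
    "annotator": {
        "id": "anno-0042",
        "cohort": "zh-CN / 25-34",    # population segment, for coverage stats
    },
    "labeled_at": "2026-02-17",       # supports refresh/staleness policies
}
print(record["labels"])
```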
P1 Three Frontier Models Debut in the Same Week: Gemini 3.1 Pro, Sonnet 4.6, Qwen 3.5-397B — Multimodal Arms Race Reaches Fever Pitch (2026-02-16 to 2026-02-19)

Google released Gemini 3.1 Pro (2026-02-19; DeepMind blog: "A smarter model for your most complex tasks"), emphasizing complex-task reasoning; Anthropic released Claude Sonnet 4.6 (2026-02-19, "frontier performance across coding, agents, and professional work at scale"); and Qwen 3.5-397B-A17B arrived (2026-02-16, 105K downloads, 754 likes), an MoE-architecture vision-language model. Concurrently, MiniMax-M2.5 became a community favorite with 123K downloads and 814 likes, and Cerebras released REAP compressed versions (172B-A10B and 139B-A10B). The hot Reddit post "Qwen3.5 Plus, GLM 5, Gemini 3.1 Pro, Sonnet 4.6, three new open source agents" (57 votes) confirms the community's sense of release density.

Business implications: Three frontier models releasing in the same week signals a synchronized explosion in alignment and evaluation data demand. Key areas to watch: 1) Complex task reasoning data — Gemini 3.1 Pro targets "complex tasks," requiring multi-step reasoning and long-chain thinking evaluation and training data; 2) Coding/Agent data — Sonnet 4.6 emphasizes coding and agents, driving up demand for agent behavior trajectories and code reasoning data; 3) Vision-language multimodal data — Qwen 3.5 is a vision-language model, and at 397B scale, its consumption of visual reasoning data is enormous.
P2 GGML/llama.cpp Joins Hugging Face, Local AI Infrastructure Consolidation Accelerates (2026-02-19)

The Hugging Face blog announced "GGML and llama.cpp join HF to ensure the long-term progress of Local AI." GGML is the most widely used quantization format for local model inference, and llama.cpp is the community's most active local inference engine. Concurrent signals: the Reddit post "Free ASIC Llama 3.1 8B inference at 16,000 tok/s" (318 votes, highest this week) suggests dedicated-hardware local inference has crossed the usability threshold; "Kimi K2.5 better than Opus 4.6 on hallucination benchmark" (46 votes) shows local/open-source models challenging closed-source frontiers in specific domains; and Snorkel AI demonstrated a 4B model surpassing a 235B model through tool discipline.

Business implications: the consolidation of local AI infrastructure means: 1) Quantized model evaluation data demand — quality loss from quantization needs systematic measurement, creating a new category of pre/post-quantization comparison evaluation datasets (see the sketch below); 2) On-device fine-tuning data — 16K tok/s ASIC inference plus the GGML/HF integration moves edge deployment from technical validation to production readiness, so edge-specialized data demand will scale up; 3) Small-model alignment data — Snorkel AI's 4B case proves small models can outperform large ones through precise fine-tuning, provided they have high-quality vertical-domain alignment data.
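As a sketch of what point 1's pre/post-quantization comparison could look like in practice, the harness below measures how often a quantized model reproduces its full-precision source's output on a fixed prompt set. The two generation callables are placeholders for whatever inference stacks are being compared, and exact-match agreement is a deliberately crude stand-in for a real quality metric.

```python
# Hedged sketch: pre/post-quantization agreement on a fixed eval set.
# `generate_fp16` and `generate_quant` are placeholder callables for the
# full-precision and quantized inference stacks (not a real library API).
from typing import Callable, List

def agreement_rate(prompts: List[str],
                   generate_fp16: Callable[[str], str],
                   generate_quant: Callable[[str], str]) -> float:
    """Fraction of prompts where both models produce identical output."""
    matches = sum(
        generate_fp16(p).strip() == generate_quant(p).strip()
        for p in prompts
    )
    return matches / len(prompts)

# Usage sketch (both callables hypothetical):
# rate = agreement_rate(eval_prompts, fp16_model.answer, q4_model.answer)
# print(f"pre/post-quantization agreement: {rate:.1%}")
```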

Demand Signals

Training data demand inferred from model releases

Data Type | Intensity | Trend | Related Signals
Multimodal Visual Reasoning Data | Critical | ↑ New | Qwen 3.5-397B VLM · GLM-4.6V visual reasoning · Molmo2-MultiImageQA multi-image VQA
RLHF/Preference Alignment Data | Critical | ↑ New | Meta 200K+ preference pairs open-sourced · Allen AI 260K DPO pairs · MARS reward modeling self-refinement · PersonaliZe personalized alignment
Agent Behavior/Trajectory Data | High | ↑ New | Sonnet 4.6 agent performance · Snowflake AgentWorldModel-1K · Mistral Vibe CLI/Devstral 2 · OpenAI Codex 61K⭐
Complex Reasoning Evaluation Data | High | ↑ New | Gemini 3.1 Pro "complex tasks" · HLE-Verified (Humanity's Last Exam corrections) · MATEO temporal reasoning benchmark
Coding/Code Reasoning Data | High | ↑ New | Sonnet 4.6 coding performance · Qwen3 Coder Next · Reddit "surge in LLM coding capabilities" · TAROT code generation RL
Multilingual Data | High | → Continuing | ÜberWeb 20T multilingual curation · WaxalNLP African language speech · ParlaCAP 28 European parliaments · Crowdsourcing Piedmontese
Robotics/Embodied AI Data | Moderate | ↑ New | NVIDIA NuRec · MolmoSpaces +39.7% growth · Humanoid End-Effector Control · Isaac-GR00T 6.2K⭐
Document OCR Data | Moderate | ↑ New | olmOCR-bench · Mistral OCR 3 · PaddleOCR-VL in llama.cpp · amazon/doc_split
Quantization/Compression Evaluation Data | Moderate | ↑ New | Cerebras REAP compression of MiniMax · ASIC 16K tok/s inference · INT8 cross-chip precision variance
Safety/Alignment Audit Data | Moderate | ↑ New | EleutherAI misalignment-control-sft · Qwen3Guard real-time safety · OpenAI $7.5M alignment research grant

Dropped this issue (present in previous issue, absent this issue): Robotics Manipulation Data · Multimodal Preference Data · Speech/ASR Data · Code Data · Video Understanding Data · GUI/Agent Data

Download Movers

Datasets with the largest download changes this week

Dataset | Downloads | Weekly Growth
allenai/molmospaces | 204 | +39.7%

Deep Dive — DataRecipe

Two high-value datasets reverse-analyzed this week (auto-generated by DataRecipe)

facebook/EgoAVU_data · 300 samples · 6 fields · Medium difficulty · 6.0/10 · 🟢 Recommended for Replication

Data Structure

video_id · start_time · end_time · question · answer · category
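Given those six fields, one sample plausibly looks like the sketch below. All values and types are invented for illustration; check the dataset card for the real ones (e.g. whether timestamps are seconds or frame indices).

```python
# Hypothetical facebook/EgoAVU_data sample (values invented).
sample = {
    "video_id":   "ego_000123",
    "start_time": 12.4,   # assumed seconds into the clip
    "end_time":   18.9,
    "question":   "What sound does the person react to?",
    "answer":     "A doorbell ringing off-screen.",
    "category":   "audio-visual reasoning",
}
```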

Risk Assessment

Medium risk: labeling quality may fluctuate → establish rigorous QA processes with quality thresholds
Low risk: data may become outdated over time → establish continuous update mechanisms
allenai/olmix · 300 samples · 113 fields · Medium difficulty · 6.5/10 · 🟢 Recommended for Replication

Data Structure

run · name · index · arc_challenge:rc::olmes · arc_easy:rc::olmes · basic_skills:rc::olmes · basic_skills_arithmetic:rc::olmes · basic_skills_coding:rc::olmes · basic_skills_common_knowledge:rc::olmes (9 of 113 fields shown)
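A hedged sketch of exploring these proxy-run metrics, assuming the dataset loads as a flat table with the columns listed above. The "train" split and the "run" column are assumptions (the leading "run name index" fields may parse differently), so verify against the actual schema.

```python
# Hedged sketch: rank olmix proxy runs by one benchmark column.
# Assumptions: a "train" split exists, and "run" / the metric column
# are named as in the schema listed above.
from datasets import load_dataset

ds = load_dataset("allenai/olmix", split="train")
df = ds.to_pandas()

metric = "arc_challenge:rc::olmes"   # one of the 113 columns
top = df.sort_values(metric, ascending=False).head(5)
print(top[["run", metric]])          # which data mixes score best on the proxy
```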

Risk Assessment

Medium risk: labeling quality may fluctuate → establish rigorous QA processes with quality thresholds
Low risk: data may become outdated over time → establish continuous update mechanisms

Analyzed 2 datasets this week · 83.9% human effort · all Medium difficulty

Want to discuss this issue?

Kai · Founder & CEO
苏文 · AI Documentation & Release Engineer
陆明哲 · AI Product Manager

Auto-generated by AI Dataset Radar · Updated weekly
