Radar Brief Week 12, 2026 · 2026-02-13 — 2026-02-20

Multimodal Alignment Data Arms Race
Allen AI Defines Pre-training Data Methodology

This week's scan covered 86 HF orgs · 50 GitHub orgs · 71 blogs · 125 X accounts

One-line Summary

Allen AI releases 5 datasets + Olmix data mixing framework, systematically defining pre-training data methodology; Meta open-sources 200K+ multilingual multi-turn preference dataset, RLHF data public supply upgraded; RLHF/alignment research enters 4th consecutive week of high-density output, methodology moves toward personalization and decoupling. Top data demand signal this week: Multimodal Visual Reasoning Data.

Key Findings

This week's 5 high commercial value findings

P0 Allen AI Releases 5 Datasets + Olmix Data Mixing Framework, Systematically Defining Pre-training Data Methodology (2026-02-11 to 2026-02-17)

Allen AI released 5 datasets and 8 models this week, the highest single-week output among research institutions. Key highlights: allenai/olmix (2026-02-11, 238 downloads, 18 likes) — proxy-run swarm data for OLMo pre-training, systematically tackling the core pre-training question of "what ratio to mix different domain data for optimal results"; allenai/Dolci-Instruct-DPO (2,498 downloads) — 260K preference pairs for OLMo 3 Instruct 7B alignment training, ODC-BY license; allenai/olmOCR-bench (2,745 downloads, 58 likes) — 1,403 PDFs + 7,010 unit tests, establishing an evaluation benchmark for PDF-to-Markdown OCR systems; allenai/Molmo2-MultiImageQA (194 downloads) — multi-image visual QA instruction fine-tuning dataset; allenai/molmospaces (204 downloads, +39.7% week-over-week growth) — embodied AI 3DGUT/USD resources updated to an Isaac Sim-compatible format. Companion blog posts published simultaneously: Olmix data mixing framework details, AutoDiscovery automated scientific discovery, MolmoSpaces ecosystem introduction, How2Everything real-world procedure evaluation.

Business implications: Allen AI has leapt from "releasing individual datasets" to "outputting data methodology" — Olmix's swarm data mixing method will change the engineering practice of pre-training data ratio optimization. Data service providers should focus on: 1) Data mixing optimization as a service — helping clients find optimal training ratios; 2) OCR evaluation benchmark standardization — olmOCR-bench may become the de facto standard in the document AI field; data suppliers should calibrate document labeling quality accordingly; 3) DPO preference data public supply — 260K open-source DPO data compresses commercial space for low-quality preference data; differentiated competition must focus on vertical domains.
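The proxy-run idea behind data mixing optimization (train small models on candidate domain mixtures, then let their eval scores pick the ratio) can be sketched in a few lines. The run records, domain names, and score values below are invented for illustration and are not taken from allenai/olmix.

```python
# Hypothetical proxy-run records in the spirit of swarm-based data mixing:
# each run trains a small model on one candidate domain mixture and logs
# scores on an eval suite; the best-scoring mixture guides the full run.
proxy_runs = [
    {"mixture": {"web": 0.7, "code": 0.2, "math": 0.1},
     "scores": {"arc_challenge": 0.41, "arc_easy": 0.68, "basic_skills": 0.55}},
    {"mixture": {"web": 0.5, "code": 0.3, "math": 0.2},
     "scores": {"arc_challenge": 0.44, "arc_easy": 0.66, "basic_skills": 0.61}},
    {"mixture": {"web": 0.3, "code": 0.4, "math": 0.3},
     "scores": {"arc_challenge": 0.42, "arc_easy": 0.60, "basic_skills": 0.63}},
]

def mean_score(run):
    """Average the run's eval scores into a single selection criterion."""
    s = run["scores"]
    return sum(s.values()) / len(s)

best = max(proxy_runs, key=mean_score)
print(best["mixture"])  # {'web': 0.5, 'code': 0.3, 'math': 0.2}
```

Real mixing frameworks fit a model over many such runs rather than taking a simple argmax, but the data shape (mixture weights in, eval scores out) is the same.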
P0 Meta Open-Sources 200K+ Multilingual Multi-Turn Preference Dataset, RLHF Data Public Supply Upgraded (First Released 2025-05-13, Entered Monitoring Scope This Week)

facebook/community-alignment-dataset (194 downloads, 39 likes, cc-by-4.0) — 200K+ LLM response comparison data from 3,000+ global annotators, covering multilingual and multi-turn conversation scenarios. This is Meta's largest-scale multilingual preference dataset. Meta also released facebook/actionbench (2026-02-19, 2 downloads) — 128 video-animation point cloud paired samples for evaluating video-to-animated 3D mesh generation. Together, the two datasets stake out Meta's positions on both the "text alignment" and "video-3D multimodal" data fronts.

Business implications: community-alignment-dataset's cc-by-4.0 license means anyone can use it for commercial training — good news for small and mid-sized model vendors but a direct hit to preference data suppliers. Differentiation directions: 1) Vertical industry preference data (medical, legal, financial and other professional scenarios not covered by Meta's dataset); 2) Chinese preference data — while multilingual, the dataset's Chinese coverage depth is limited; 3) Continuous update service — open-source datasets are static, while clients need preference data that continuously updates alongside model iterations.
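The vertical-domain differentiation play above often starts as nothing more than filtering an open preference corpus down to a target vertical before DPO training. The record fields below ("prompt", "chosen", "rejected", "language", "domain") are illustrative only and do not reflect the actual community-alignment-dataset schema.

```python
# Minimal sketch: carve a vertical-domain subset out of a generic
# open preference dataset before running DPO on it.
records = [
    {"prompt": "Explain QT prolongation risks.", "chosen": "...", "rejected": "...",
     "language": "en", "domain": "medical"},
    {"prompt": "Best pizza toppings?", "chosen": "...", "rejected": "...",
     "language": "en", "domain": "general"},
    {"prompt": "What is the cap on contract penalty clauses?", "chosen": "...",
     "rejected": "...", "language": "zh", "domain": "legal"},
]

def vertical_subset(rows, domains, languages=None):
    """Keep only preference pairs from the target domains (and languages)."""
    return [r for r in rows
            if r["domain"] in domains
            and (languages is None or r["language"] in languages)]

medical_legal = vertical_subset(records, {"medical", "legal"})
print(len(medical_legal))  # 2
```

The same filter, pointed at language fields, is how a supplier would quantify the "Chinese coverage depth" gap the brief mentions.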
P1 RLHF/Alignment Research Enters 4th Consecutive Week of High-Density Output, Methodology Moves Toward Personalization and Decoupling (2026-02-16 to 2026-02-19)

5 RLHF/alignment papers this week: MARS (2026-02-19) — Margin-Aware reward modeling + self-refining data augmentation, addressing high cost of preference data; Learning Personalized Agents from Human Feedback (2026-02-18) — introducing PersonaliZe framework for Agents adapting to dynamic personal preference changes; Multi-Objective Alignment for Personalized Psychotherapy (2026-02-17) — multi-objective alignment in psychotherapy, balancing patient preferences with clinical safety; Interactionless IRL (2026-02-16) — proposing "interaction-free inverse reinforcement learning," decoupling safety objectives from policy to avoid "alignment waste"; Latency-aware HITL-RL (2026-02-17) — embedding human feedback and latency constraints in semantic communication. Common trend across all five papers: moving from "one-size-fits-all alignment" to "personalized + decoupled + multi-objective + scenario-specific."

Business implications: Refinement of alignment methodology directly changes data requirements: 1) Personalized preference data — no longer "all of humanity's preferences" but "preferences of specific user groups/individuals"; data collection needs to cover population diversity; 2) Multi-objective annotation — the same sample requires preference annotations across multiple dimensions (safety, helpfulness, personalization, etc.); annotation costs rise but per-sample data value increases; 3) Dynamic preference data — PersonaliZe framework emphasizes preferences change over time, meaning preference data needs periodic refreshing; "one-time labeling" models will be replaced by "continuous labeling services."
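Margin-aware reward modeling of the kind MARS points at can be illustrated with a Bradley-Terry preference loss that adds a fixed margin: the chosen response must beat the rejected one by at least the margin before the loss saturates. This is a generic textbook sketch, not the paper's exact formulation.

```python
import math

def margin_bt_loss(r_chosen, r_rejected, margin=0.5):
    """Bradley-Terry loss with an additive margin: penalize pairs whose
    reward gap is below `margin`, even if the chosen response wins."""
    gap = r_chosen - r_rejected - margin
    return -math.log(1.0 / (1.0 + math.exp(-gap)))

# A confidently separated pair incurs little loss; a narrow win is
# still penalized, pushing the reward model toward wider margins.
easy = margin_bt_loss(2.0, -1.0)   # gap 3.0, well above the margin
hard = margin_bt_loss(0.6, 0.4)    # gap 0.2, below the margin
print(easy < hard)  # True
```

The data-side consequence is the one the brief draws: near-tie pairs carry most of the gradient, so annotation budgets shift toward hard, fine-grained comparisons.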
P1 Three Frontier Models Debut in Same Week: Gemini 3.1 Pro, Sonnet 4.6, Qwen 3.5-397B, Multimodal Arms Race Reaches Fever Pitch (2026-02-16 to 2026-02-19)

Google releases Gemini 3.1 Pro (2026-02-19, DeepMind blog: "A smarter model for your most complex tasks"), emphasizing complex task reasoning; Anthropic releases Claude Sonnet 4.6 (2026-02-19, "frontier performance across coding, agents, and professional work at scale"); the Qwen team releases Qwen 3.5-397B-A17B (2026-02-16, 105K downloads, 754 likes), an MoE-architecture vision-language model. Meanwhile MiniMax-M2.5, at 123K downloads and 814 likes, becomes the community favorite; Cerebras releases REAP-compressed versions (172B-A10B and 139B-A10B). The Reddit post "Qwen3.5 Plus, GLM 5, Gemini 3.1 Pro, Sonnet 4.6, three new open source agents" (57 upvotes) confirms the community's sense of model release density.

Business implications: Three frontier models releasing in the same week means the next wave of alignment and evaluation data demand will surge simultaneously. Key areas: 1) Complex task reasoning data — Gemini 3.1 Pro targets "complex tasks," needing multi-step reasoning, long chain-of-thought evaluation and training data; 2) Coding/Agent data — Sonnet 4.6 emphasizes coding and agents; Agent behavior trajectory and code reasoning data demand rises; 3) Visual-language multimodal data — Qwen 3.5 is a vision-language model; 397B scale means massive visual reasoning data consumption.
P2 GGML/llama.cpp Join Hugging Face, Local AI Infrastructure Consolidation Accelerates (2026-02-19)

Hugging Face blog announces "GGML and llama.cpp join HF to ensure the long-term progress of Local AI." GGML is the most widely used quantization format for local model inference; llama.cpp is the community's most active local inference engine. Concurrent signals: Reddit "Free ASIC Llama 3.1 8B inference at 16,000 tok/s" (318 upvotes, week's highest), suggesting dedicated hardware-accelerated local inference has crossed the usability threshold; "Kimi K2.5 better than Opus 4.6 on hallucination benchmark" (46 upvotes) shows local/open-source models challenging closed-source frontier in specific domains; Snorkel AI demonstrates 4B model outperforming 235B model through tool discipline.

Business implications: Local AI infrastructure consolidation means: 1) Quantized model evaluation data demand — quality loss from quantization needs systematic evaluation; "pre/post-quantization comparison evaluation datasets" is a new category; 2) Edge scenario fine-tuning data — 16K tok/s ASIC inference + GGML/HF consolidation moves edge deployment from technical validation to production-ready, scaling edge-specific data demand; 3) Small model alignment data — Snorkel AI's 4B model case proves small models can outperform large ones through precise fine-tuning, but the prerequisite is high-quality vertical domain alignment data.
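A "pre/post-quantization comparison evaluation" can begin as nothing more than running identical prompts through the full-precision and quantized variants and tallying answer agreement per task category. The outputs below are stand-in strings, not real model generations.

```python
# Hypothetical eval outputs keyed by "category/item" for a full-precision
# model and its quantized counterpart on the same prompts.
fp16_out = {"math/1": "42", "math/2": "7",  "code/1": "ok", "code/2": "ok"}
int8_out = {"math/1": "42", "math/2": "9",  "code/1": "ok", "code/2": "ok"}

def agreement_by_category(a, b):
    """Fraction of items per category where both variants answer identically."""
    stats = {}
    for key in a:
        cat = key.split("/")[0]
        same, total = stats.get(cat, (0, 0))
        stats[cat] = (same + (a[key] == b[key]), total + 1)
    return {cat: same / total for cat, (same, total) in stats.items()}

print(agreement_by_category(fp16_out, int8_out))  # {'math': 0.5, 'code': 1.0}
```

Per-category breakdowns matter because quantization loss is rarely uniform; a dataset vendor selling this comparison would stratify by exactly such categories.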

Demand Signals

Infer training data demands from model releases

| Data Type | Related Signals | Trend | Intensity |
|---|---|---|---|
| Multimodal Visual Reasoning Data | Qwen 3.5-397B VLM (105K downloads), GLM-4.6V visual reasoning, Molmo2-MultiImageQA multi-image VQA | ↑ New | Very Strong |
| RLHF Preference Alignment Data | Meta 200K+ preference pairs open-sourced, Allen AI 260K DPO pairs, MARS reward modeling self-refinement, PersonaliZe personalized alignment | ↑ New | Very Strong |
| Agent Behavior/Trajectory Data | Sonnet 4.6 Agent performance, Snowflake AgentWorldModel-1K, Mistral Vibe CLI/Devstral 2, OpenAI Codex 61K stars | → Continuing | Strong |
| Complex Reasoning Evaluation Data | Gemini 3.1 Pro "complex tasks", HLE-Verified human ultimate exam corrections, MATEO temporal reasoning benchmark | ↑ New | Strong |
| Coding/Code Reasoning Data | Sonnet 4.6 coding performance, Qwen3 Coder Next, Reddit "surge in LLM coding capabilities", TAROT code generation RL | ↑ New | Strong |
| Multilingual Data | UberWeb 20T multilingual curation, WaxalNLP African language speech, ParlaCAP 28 European parliaments, Crowdsourcing Piedmontese | ↑ New | Strong |
| Robotics/Embodied AI Data | NVIDIA NuRec (849 downloads), MolmoSpaces +39.7% growth, Humanoid End-Effector Control, Isaac-GR00T 6.2K stars | ↑ New | Medium |
| Document OCR Data | olmOCR-bench (2,745 downloads), Mistral OCR 3, PaddleOCR-VL in llama.cpp, amazon/doc_split | ↑ New | Medium |
| Quantization/Compression Evaluation Data | Cerebras REAP compressed MiniMax, ASIC 16K tok/s inference, INT8 cross-chip precision variance (Reddit 251 upvotes) | ↑ New | Medium |
| Safety/Alignment Audit Data | EleutherAI misalignment-control-sft, Qwen3Guard real-time safety, OpenAI $7.5M alignment research grants | ↑ New | Medium |
| Robotics VLA Trajectory Data | Present in previous issue, absent this issue | ↓ Dropped | |
| RL Training/Alignment Data | Present in previous issue, absent this issue | ↓ Dropped | |
| Chinese LLM Alignment Data | Present in previous issue, absent this issue | ↓ Dropped | |
| Multilingual Evaluation Data | Present in previous issue, absent this issue | ↓ Dropped | |
| Real-time Safety Labeling Data | Present in previous issue, absent this issue | ↓ Dropped | |
| Visual Reasoning Data | Present in previous issue, absent this issue | ↓ Dropped | |
| Sim-to-Real Paired Data | Present in previous issue, absent this issue | ↓ Dropped | |
| Audio/Speech Data | Present in previous issue, absent this issue | ↓ Dropped | |
| Image Editing Instruction Data | Present in previous issue, absent this issue | ↓ Dropped | |

Download Movers

Datasets with the largest download changes this week

| Dataset | Downloads | Weekly Growth |
|---|---|---|
| allenai/molmospaces | 204 | +39.7% |
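For readers reconstructing the trend line: weekly growth here is current/previous - 1, so last week's count can be backed out from the current figure.

```python
# Back out last week's download count for allenai/molmospaces
# from this week's 204 downloads and the reported +39.7% growth.
current, growth = 204, 0.397
previous = round(current / (1 + growth))
print(previous)  # 146
```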

Deep Dive — DataRecipe

This week's 2 high-value datasets reverse-analyzed (auto-generated by DataRecipe)

facebook/EgoAVU_data
300 samples · 6 fields · Medium
6.0/10
🟢 Recommended to Replicate

Data Structure

video_id · start_time · end_time · question · answer · category

Risk Assessment

Medium Risk Labeling quality may fluctuate → Establish rigorous QA processes with quality thresholds
Low Risk Data may become outdated over time → Establish continuous update mechanisms
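The "rigorous QA processes" recommendation can be made concrete with a per-row gate over the six EgoAVU_data fields listed above; the specific thresholds here are illustrative, not DataRecipe's actual checks.

```python
# Per-row QA gate for a video-QA record with the six EgoAVU_data-style
# fields; returns a list of issues (empty list means the row passes).
def qa_check(row):
    issues = []
    if not row.get("video_id"):
        issues.append("missing video_id")
    if not (0 <= row["start_time"] < row["end_time"]):
        issues.append("bad time span")
    if len(row.get("question", "").strip()) < 5:
        issues.append("question too short")
    if not row.get("answer", "").strip():
        issues.append("empty answer")
    return issues

row = {"video_id": "v001", "start_time": 3.0, "end_time": 1.5,
       "question": "What is the person holding?", "answer": "a cup",
       "category": "object"}
print(qa_check(row))  # ['bad time span']
```

Aggregating the issue counts across a sample of rows gives exactly the kind of quality threshold the risk table asks for.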
allenai/olmix
300 samples · 113 fields · Medium
6.5/10
🟢 Recommended to Replicate

Data Structure

run name index arc_challenge:rc::olmes arc_easy:rc::olmes basic_skills:rc::olmes basic_skills_arithmetic:rc::olmes basic_skills_coding:rc::olmes basic_skills_common_knowledge:rc::olmes

Risk Assessment

Medium Risk Labeling quality may fluctuate → Establish rigorous QA processes with quality thresholds
Low Risk Data may become outdated over time → Establish continuous update mechanisms

2 datasets analyzed this week · 83.9% average human-labor share · both Medium difficulty

Want to discuss this issue?

Kai · Founder & CEO
苏文 · AI Documentation & Release Engineer
陆明哲 · AI Product Manager

Auto-generated by AI Dataset Radar · Updated weekly

AI Dataset Radar →