Radar Brief · Week 17, 2026 · 2026-03-13 to 2026-03-20

Allen AI Releases 4 MolmoPoint Datasets and Models in a Row
Fine-Grained Human Judgment Becomes Fuel for Multimodal Agents

This week's scan covered 86 HF orgs · 50 GitHub orgs · 71 blogs · 125 X accounts

One-line Summary

Allen AI released 4 MolmoPoint-related datasets/models in succession from 2026-03-15 to 2026-03-17, signaling intensive growth in video and GUI pointing data [P0]; NVIDIA disclosed both RL and SFT training data on 2026-03-18 and 2026-03-19, accelerating the treatment of post-training data as a reusable asset [P0]; NVIDIA's robotics and Physical AI datasets continue to lead in downloads, with teleoperation demonstrations becoming the strongest public demand signal [P1]. This week's strongest data demand signal: video understanding/tracking data.

Key Findings

This week's 5 findings with high commercial value

P0 Allen AI released 4 MolmoPoint-related datasets/models in succession from 2026-03-15 to 2026-03-17, signaling intensive growth in video and GUI pointing data

Allen AI released allenai/MolmoPoint-TrackSyn on 2026-03-15, with 94 downloads and 2 likes; on the same day, it also released allenai/MolmoPoint-TrackAny, with 108 downloads and 2 likes. On 2026-03-16, it released the model allenai/MolmoPoint-8B, with 289 downloads and 11 likes. On 2026-03-17, it released the models allenai/MolmoPoint-GUI-8B and allenai/MolmoPoint-Vid-4B, with 91 downloads each. The related dataset allenai/MolmoPoint-GUISyn had already been released on 2026-02-24 and has 265 downloads and 6 likes; allenai/Molmo2-VideoPoint has now reached 440 downloads, up 22 from the previous period.

Business implication → This indicates that multimodal agents have shifted from “image-based QA” to fine-grained execution capabilities such as “pointing, tracking, GUI grounding, and video grounding.” The core of training is no longer just massive volumes of raw content, but human judgment signals that carry spatial locations, temporal trajectories, and intent references. For Knowlyr, this is a high-value opportunity: building a task network around video point selection, object trajectory verification, GUI element alignment, and natural language reference resolution, where people can earn income by contributing judgment. Synthetic data alone still struggles to cover these data types reliably in ambiguous real-world settings.

P0 NVIDIA disclosed both RL and SFT training data on 2026-03-18 and 2026-03-19, accelerating the treatment of post-training data as a reusable asset

nvidia/Nemotron-Cascade-2-RL-data was released on 2026-03-18, with 15 downloads and 12 likes; nvidia/Nemotron-Cascade-2-SFT-Data was released on 2026-03-19, with 32 downloads and 10 likes. The corresponding paper, "Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation," was released on 2026-03-19. The dataset description explicitly includes instruction-following RL, multi-domain RL, on-policy distillation, and software engineering RL. During the same period, nvidia/Nemotron-RL-bixbench_hypothesis was released on 2026-03-14, with 2,534 downloads and 4 likes.

Business implication → Leading vendors are starting to open up their post-training data mixtures directly, meaning the competitive focus is shifting from “whether you have a model” to “whether you have reusable, auditable, and continuously updatable training recipe data.” While these datasets look like plain text on the surface, they are in essence compressed outputs of human judgment on preferences, refusal boundaries, task completion quality, code repair quality, and more. Knowlyr can prioritize RLHF/RLAIF data production and evaluation review services, especially in three judgment-intensive scenarios: code agents, complex instruction following, and safe refusal in high-risk domains.

P1 NVIDIA robotics and Physical AI datasets continue to lead in downloads, with teleoperation demonstrations becoming the strongest public demand signal

nvidia/PhysicalAI-Robotics-Open-H-Embodiment was released on 2026-02-06, with 37,433 downloads and 8 likes; nvidia/PhysicalAI-Robotics-Manipulation-Kitchen-Demos was released on 2026-02-10, with 20,849 downloads and 38 likes, and the dataset includes 600 hours of human teleoperation demonstrations, 316 tasks, and 55k trajectories. The larger-scale nvidia/PhysicalAI-Autonomous-Vehicles has reached 214,152 downloads and 785 likes. On Meta's side, facebook/ego-1k was released on 2026-01-29, with 5,903 downloads, further strengthening egocentric 3D/multiview data.
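To give a feel for the scale of the Kitchen-Demos figures quoted above, the short sketch below derives two rough averages from them. It assumes that "55k" means roughly 55,000 trajectories and that the 600 hours are spread across all trajectories; the function name `demo_scale` is illustrative only and does not come from the dataset card.

```python
def demo_scale(hours: float, tasks: int, trajectories: int) -> dict:
    """Derive rough per-trajectory and per-task averages from headline totals."""
    if tasks <= 0 or trajectories <= 0:
        raise ValueError("tasks and trajectories must be positive")
    return {
        # Average demonstration length, in seconds
        "avg_seconds_per_trajectory": hours * 3600 / trajectories,
        # Average number of demonstrations recorded per task
        "avg_trajectories_per_task": trajectories / tasks,
    }


# Figures from the dataset description; 55_000 is an assumed reading of "55k".
scale = demo_scale(hours=600, tasks=316, trajectories=55_000)
print(f"{scale['avg_seconds_per_trajectory']:.0f} s per trajectory")
print(f"{scale['avg_trajectories_per_task']:.0f} trajectories per task")
```

Under these assumptions the demonstrations average a little under 40 seconds each, with roughly 170 demonstrations per task, which suggests short, repeated manipulation episodes rather than long continuous recordings.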

Business implication → Better simulation has not reduced the value of real-world data in robotics and autonomous driving; instead, these fields rely even more on high-quality demonstrations, temporal alignment, and failure-case coverage. Signals such as “how humans operate,” “when to intervene,” and “what defines the boundary of executable actions” are fundamentally derived from human judgment. Knowlyr can structure its embodied data opportunities into three layers: teleoperation demonstration collection, temporal text explanation, and failure/risky action review, creating higher-margin data services than pure collection alone.

P1 Chinese open-source models are starting to directly release training datasets, making SFT and reward evaluation more transparent

stepfun-ai/Step-3.5-Flash-SFT was released on 2026-03-14, with 27,044 downloads and 260 likes, making it one of the highest-downloaded new SFT datasets this week. Its tags cover chat, sft, instruction-tuning, reasoning, and code. InternLM released internlm/VC-RewardBench on 2026-03-12, with 1,810 downloads and 6 likes, and simultaneously released the internlm/Visual-ERM model, whose tags directly reference dataset:internlm/VC-RewardBench. internlm/EndoCoT-Data was released on 2026-03-11, with 1,764 downloads and 6 likes, ranking first among this week's Download Movers.

Business implication → Chinese teams are shifting from “releasing models” to “releasing training data and evaluation data,” with greater emphasis on verifiable scenarios such as visual rewards, code, and chain-of-thought reasoning. This creates two opportunities for data service providers: first, taking on cleaning, slicing, and review work for model vendors' vertical-domain SFT data; second, building reward model benchmarks and post-render comparison datasets. In particular, visual coding, UI reconstruction, and image editing quality judgment still rely heavily on human judgment standards rather than automated scoring.

P2 Multiple papers from 2026-03-17 to 2026-03-19 shift toward “observational feedback, negative feedback, and slice governance,” signaling a change in preference data collection paradigms

CausalRM, published on 2026-03-19, proposes learning reward models from observational user feedback. MOSAIC, also published on 2026-03-19, discusses multi-objective slice-aware iterative curation. Efficient Exploration at Scale, published on 2026-03-18, emphasizes online updates to choice data. Via Negativa for AI Alignment, published on 2026-03-17, argues that negative-only feedback can approach or surpass standard RLHF. HIPO, also published on 2026-03-17, focuses on hierarchical instruction adherence. During the same period, Anthropic published news about large-scale qualitative user feedback from “81,000 people.”

Business implication → Preference data is no longer limited to traditional binary labeling, but is shifting toward real user behavior, negative constraints, scenario slices, and hierarchical rules. This means the barrier to entry in the data industry is rising: it is no longer about simply collecting feedback, but about designing feedback structures, filtering for high-information-density samples, and establishing consistency standards. Knowlyr can productize human judgment into services for preference experiment design, negative feedback collection, boundary case mining, and slice-level quality control.

Demand Signals

Training data demand inferred from model releases

Data Type | Intensity | Trend | Related Signals
Video understanding/tracking data | Very strong | ↑ New | Allen AI released MolmoPoint-TrackSyn and MolmoPoint-TrackAny on 2026-03-15, and released MolmoPoint-Vid-4B on 2026-03-17
GUI grounding and mobile operation data | Very strong | ↑ New | allenai/MolmoPoint-GUISyn has 265 downloads; facebook/DigiData targets mobile control agents, with 272 downloads
Post-training RL/preference data | Very strong | ↑ New | nvidia/Nemotron-Cascade-2-RL-data was released on 2026-03-18; the related paper was released on 2026-03-19
General SFT and code reasoning data | Strong | ↑ New | stepfun-ai/Step-3.5-Flash-SFT has 27,044 downloads, covering reasoning and code
Robotics teleoperation demonstration data | Very strong | ↑ New | nvidia/PhysicalAI-Robotics-Open-H-Embodiment has 37,433 downloads; Kitchen-Demos has 20,849 downloads
Visual reward and verifiable evaluation benchmarks | Strong | ↑ New | internlm/VC-RewardBench has 1,810 downloads and is directly referenced by internlm/Visual-ERM
Long-video audiovisual evaluation benchmarks | Strong | ↑ New | nvidia/MMOU has 504 downloads; the paper LVOmniBench was released on 2026-03-19
High-quality multilingual translation evaluation | Medium | ↑ New | facebook/bouquet has 1,721 downloads, covers 8 languages, and is handcrafted by linguists
Persona and social distribution simulation data | Medium | ↑ New | nvidia/Nemotron-Personas-France has 3,147 downloads and 62 likes, emphasizing grounded personas
Observational user feedback data | Strong | ↑ New | The paper CausalRM, published on 2026-03-19, proposed reward modeling based on clicks, copies, and upvotes
Robotics teleoperation demonstration data | - | ↓ Dropped | Present in previous issue, absent this issue
Cross-embodiment robot trajectories | - | ↓ Dropped | Present in previous issue, absent this issue
Code Agent trajectories and patch data | - | ↓ Dropped | Present in previous issue, absent this issue
Preference alignment and disagreement data | - | ↓ Dropped | Present in previous issue, absent this issue
Factuality and scientific agent evaluation | - | ↓ Dropped | Present in previous issue, absent this issue
Video pointing and temporal grounding data | - | ↓ Dropped | Present in previous issue, absent this issue
Multilingual speech data | - | ↓ Dropped | Present in previous issue, absent this issue
Retrieval and RAG synthetic data | - | ↓ Dropped | Present in previous issue, absent this issue
Medical reasoning and endoscopy data | - | ↓ Dropped | Present in previous issue, absent this issue
Privacy redaction and PII labeling data | - | ↓ Dropped | Present in previous issue, absent this issue

Download Movers

Datasets with the largest download changes this week

Dataset Downloads Weekly Growth
nvidia/HiLiftAeroML 1,200 +66.4%
laion/majestrino-data 7,837 +28.4%
allenai/asta-summary-citation-counts 509 +11.6%
allenai/Molmo2-VideoPoint 440 +5.3%
internlm/EndoCoT-Data 1,764 new
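
The Weekly Growth column can be read as the change in downloads relative to the previous period's count. The sketch below shows that arithmetic under this assumption; the brief does not document the exact formula the Radar uses, and the function name `weekly_growth_pct` is hypothetical. It uses the one row for which this issue states both the current count and the absolute change (allenai/Molmo2-VideoPoint: 440 downloads, up 22).

```python
def weekly_growth_pct(current: int, delta: int) -> float:
    """Percent growth relative to the previous period's download count."""
    previous = current - delta  # count at the end of the previous period
    if previous <= 0:
        # A dataset with no prior downloads has no defined growth rate;
        # the table labels such rows "new" instead of giving a percentage.
        raise ValueError("previous-period count must be positive")
    return 100.0 * delta / previous


# allenai/Molmo2-VideoPoint: 440 downloads this week, 22 more than last period
print(f"{weekly_growth_pct(440, 22):+.1f}%")  # +5.3%
```

The result matches the +5.3% shown in the table, which is consistent with growth being measured against the previous period (418 downloads) rather than the current total.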

Want to discuss this issue?

Kai Founder & CEO
苏文 AI Documentation & Release Engineer
陆明哲 AI Product Manager

Auto-generated by AI Dataset Radar · Updated weekly
