Reinforcement Learning
Loop Solutions

More than just labeling data — we help clients train their models

TIER 1
Know-what
Foundational Data Collection & Labeling
  • General data labeling (text / image / video / audio)
  • Multilingual and cross-cultural content
  • Domain knowledge data production
  • Domain data cleaning and structuring
Teams needing general data production and multilingual labeling
Inquire About Basic Plan →
TIER 2 · Core
Know-how RL
End-to-end Reinforcement Learning Loop
  • RLHF — Reinforcement Learning from Human Feedback
  • Complex reasoning data production and labeling
  • Preference alignment and iterative optimization
  • Code / math / logic reasoning scenarios
  • Hallucination detection and correction
Teams doing RLHF / reasoning training that need expert-in-the-loop iteration
Book a Demo →
TIER 3
Know-why
Third-party Authoritative Evaluation
  • Agent benchmark evaluation and simulation environments
  • Crowdsourced evaluation
  • Vertical industry expert review
  • Model capability comparative analysis
Research institutions needing independent third-party evaluation and benchmark construction
Learn About Evaluation Services →

Core Products

From training loops to authoritative evaluation — covering the entire AI data pipeline

Training Loop

RLHF & Preference Alignment

Pairwise comparison, multi-dimensional scoring, continuous iteration — capturing the subtlest differences in human preferences to train Reward Models that distinguish 'good' from 'better'.

Preference Comparison · Reward Model · Iterative Optimization
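
To make the loop concrete, here is a minimal sketch of a pairwise preference record of the kind a Reward Model is trained on. All field names are illustrative, not a production schema.

```python
# A minimal, hypothetical pairwise preference record for reward-model training.
# Field names are illustrative only.
preference_record = {
    "prompt": "Refactor this function to improve readability.",
    "response_a": "def total(xs):\n    return sum(xs)",
    "response_b": "def t(x):\n    s = 0\n    for i in x:\n        s += i\n    return s",
    "preferred": "response_a",
    # Multi-dimensional scores (1-5), one entry per rubric dimension.
    "scores": {
        "response_a": {"readability": 5, "correctness": 5},
        "response_b": {"readability": 2, "correctness": 5},
    },
    "rationale": "A uses a built-in with a descriptive name; B is terse and opaque.",
}
```
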
Code / Math / Logic Reasoning

Expert teams with programming, math, and logic backgrounds produce code refactoring reviews, mathematical proof chains, and multi-step inference data — verifiable chains of thought, not just correct answers.

Code Reasoning · Mathematical Proofs · Chain of Thought
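
A small sketch makes "verifiable chains of thought" concrete: each reasoning step carries a machine-checkable claim, so reviewers grade the reasoning itself, not only the final answer. The record shape below is hypothetical.

```python
# Hypothetical "verifiable chain of thought" record: every step carries a
# machine-checkable claim in addition to its prose.
record = {
    "problem": "What is the sum of the first 100 positive integers?",
    "final_answer": 5050,
    "steps": [
        {"claim": "1..100 forms 50 pairs", "check": lambda: 100 // 2 == 50},
        {"claim": "each pair (k, 101-k) sums to 101",
         "check": lambda: all(k + (101 - k) == 101 for k in range(1, 51))},
        {"claim": "the total is 50 * 101 = 5050", "check": lambda: 50 * 101 == 5050},
    ],
}

# Every step must verify, and the answer must match an independent computation.
assert all(step["check"]() for step in record["steps"])
assert record["final_answer"] == sum(range(1, 101))
```
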
Hallucination Detection

Multi-dimensional hallucination taxonomy — fabrication, date confusion, numerical misattribution, logical inference errors — with root cause analysis and evidence chains for every hallucination.

Fact-checking · Root Cause Analysis · Evidence Chains
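
A single hallucination annotation might look like the sketch below: one taxonomy label, an evidence chain pointing back into the references, and a root-cause note. Names and fields are hypothetical.

```python
# Hypothetical shape of one hallucination annotation.
hallucination_annotation = {
    "model_claim": "The treaty was signed in 1995.",
    # One of: fabrication, date_confusion, numerical_misattribution,
    # factual_misattribution, logical_inference_error.
    "label": "date_confusion",
    "evidence_chain": [
        {"source": "reference_doc_3", "span": "signed on 7 February 1992"},
    ],
    "root_cause": "The model substituted the date of a related later event for the signing date.",
}
```
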

Evaluation & Expert

Evaluation & Benchmarking

From ARC-AGI 2 to Humanity's Last Exam, producing high-difficulty evaluation datasets for leading institutions. Agent evaluation, crowdsourced evaluation, expert review.

Agent Evaluation · Benchmark Datasets · Expert Question Design
Expert Domain Data

A cross-disciplinary expert network covering high-barrier domains such as medicine, law, finance, math, and physics — real practitioners, not generalist annotators.

Domain Experts · Vertical Industries · Cross-disciplinary
Custom Solutions

Multimodal labeling, multilingual localization, novel task types — we define the problem, design the workflow, and deliver results together with you.

Custom · Multimodal · Multilingual

Engagement Process

From requirements to scale delivery — every step is measurable

01
Requirements Discussion
1–2 days

Deep understanding of business scenarios and model objectives, clarifying data types, quality standards, and delivery timelines.

02
Solution Design
2–3 days

Defining labeling specifications, quality metrics, and acceptance criteria; designing task workflows and expert team configuration.

03
Pilot Validation
3–5 days

Small-batch pilot labeling to align on standards and quality expectations. Once confirmed, scale production begins.

Both parties sign off on quality standards
04
Scale Production
On-demand

Expert teams working in parallel with multi-layer QA and real-time monitoring, ensuring delivery speed and data consistency.

05
Continuous Iteration
Long-term

Continuously optimizing data strategy based on model training feedback, forming a data → training → evaluation closed loop.

Data → Training → Evaluation loop

Customer Cases

From leading tech giants to world-class AI research institutions — we let our delivery record speak for itself

Agent Evaluation · Exclusive
A Leading AI Research Lab

The client was rapidly iterating complex Agent applications (data analysis, writing, presentations), benchmarking against top products, and needed an evaluation-grade data team capable of fine-grained, logic-heavy assessment.

Solutions
Evaluation Co-creation
  • Deeply involved in defining evaluation standards
  • Full coverage from visual layout to deep logical reasoning
  • Fine-grained evaluation insights driving product leadership
Delivery Results
  • Sole data vendor handling all high-priority core evaluations
  • Accumulated thousands of high-value alignment records
  • Effectively supported new strategy iteration and launch
Sole exclusive data vendor · Thousands of high-value alignment records · Powering an internal star product
HLE · ICL · Multi-attachment
A Leading AI Research Lab

As its foundation model pushed into uncharted territory, the client urgently needed high-quality data for HLE (Humanity's Last Exam), complex multi-attachment processing, and ICL (In-Context Learning).

Solutions
Frontier Scenarios
  • HLE (Humanity's Last Exam) evaluation data construction
  • Complex multi-attachment scenario processing
  • ICL high-difficulty logic chain data orchestration
Delivery Results
  • Filled gaps in the client's frontier evaluation sets
  • Advanced long-context understanding and complex instruction following
  • Demonstrated exceptional business understanding
Filled frontier evaluation gaps · Cracked high-difficulty ICL logic chains · Advanced long-context understanding
3D Skeleton Annotation · Research-grade
A National Research Laboratory

For primate behavioral research, the lab needed high-precision 3D skeleton keypoint annotation of complex videos of primates grasping objects: research-grade 3D spatial annotation with extremely low tolerance for error.

Solutions
Annotation Approach
  • Primate 3D motion skeleton keypoint annotation
  • High-precision 3D spatial annotation
  • Strict adherence to research-grade precision standards
Delivery Results
  • Sole designated vendor for a national-level research project
  • Consistently delivered high-precision data
  • Directly contributed to major research publications
Sole designated data vendor · Research-grade spatial precision · Directly contributed to top-journal publications
Rapid Response · English Expert Team
An Overseas AI Data Service Provider

Facing a sudden surge in overseas business volume, the client needed to rapidly assemble a large-scale professional English annotation team with stringent language requirements (TEM-4/TEM-8 certified).

Solutions
Rapid Response
  • 100+ TEM-8 certified annotators onboarded in 3 days
  • Mobilized 200+ full-time and 1,000+ crowdsource reserves
  • Response speed far exceeded client expectations
Quality Assurance
  • Exceptionally high delivery consistency
  • Data rework rate strictly controlled under 5%
  • Significantly reduced client's secondary QA effort
100+ full-time English annotators onboarded in 3 days · 1,000+ crowdsource reserve · Rework rate < 5%
Code Refactoring · RLHF
A Leading AI Research Lab

The client needed human expert review to score AI-generated code refactoring proposals on preference dimensions, improving model code readability and refactoring quality.

Solutions
Task Design
  • Assembled expert teams with programming backgrounds
  • Designed multi-dimensional preference evaluation framework
  • Established quantitative code readability scoring criteria
RL Loop
  • Experts performed pairwise preference comparison and scoring
  • Continuous iterative training to optimize the Reward Model
  • Significant improvement in code readability scores
Code readability score improved by 23% · Reward Model convergence rounds reduced by 40%
Hallucination Detection
A Leading AI Research Lab

Addressing common hallucination issues in large language models: generated content is cross-validated against reference materials along multiple dimensions to identify hallucinations and analyze their root causes.

Solutions
Hallucination Taxonomy
  • Fabrication
  • Date confusion
  • Numerical misattribution
  • Factual misattribution
  • Logical inference errors
Reasoning Chain Data
  • Labeling hallucinations with reasoning evidence
  • Labeling contradictions between reference materials
  • Labeling consistency between grounded content and references
  • Producing hard cases where hallucination causes are difficult to distinguish
Hallucination detection accuracy 94.7% · 5 hallucination root cause categories covered
Abstract Reasoning · ARC-AGI 2
ARC-AGI 2 Abstract Reasoning Dataset

Producing abstract reasoning evaluation datasets that measure AI systems' general intelligence — ARC-AGI 2 is regarded as one of the benchmarks closest to AGI.

Solutions
Data Design
  • Designing visual and logical reasoning tasks
  • Constructing multi-level abstract reasoning problems
  • Ensuring problems pose genuine challenges for AI
Quality Control
  • Human expert cross-validation
  • Ensuring logical consistency and unambiguity
  • Multi-round iterative selection of high-quality samples
400+ abstract reasoning tasks covered · One of the world's top benchmarks
Extreme Evaluation · HLE
Humanity's Last Exam Dataset

Contributing to the 'Humanity's Last Exam' dataset — questions created by world-class experts, specifically designed to test the upper limits of LLM capabilities.

Solutions
Expert Network
  • Organizing cross-disciplinary domain experts
  • Covering high-barrier fields: mathematics, physics, law, and more
  • Ensuring problems exceed current strongest model capabilities
Data Standards
  • Standardized answers and scoring rubrics
  • Multi-round expert review to eliminate disputes
  • Producing high-quality reasoning process data
30+ disciplines · 100+ expert contributors · Current best models score < 10%
Agent Evaluation
Agent Evaluation & Simulation Environments

Building automated Agent evaluation pipelines for clients, systematically assessing task completion and tool-calling accuracy in simulation environments.

Solutions
Evaluation Framework
  • Building automated evaluation workflows
  • Designing multi-dimensional Agent capability metrics
  • Constructing reproducible simulation test environments
Continuous Iteration
  • Feeding evaluation results back into model training
  • Continuously expanding evaluation scenario coverage
  • Human-AI collaborative calibration of evaluation standards
50+ evaluation scenarios · 3 categories of Agent tool-calling accuracy tracking
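
As an illustration of the tool-calling accuracy tracking above, here is a minimal scoring sketch. The function and record layout are hypothetical, not a client pipeline.

```python
# Minimal sketch: fraction of expected tool calls the agent reproduced exactly
# (same tool name and same arguments). Record layout is hypothetical.
def tool_call_accuracy(predicted_calls, expected_calls):
    norm = lambda c: (c["tool"], tuple(sorted(c["args"].items())))
    expected = {norm(c) for c in expected_calls}
    predicted = {norm(c) for c in predicted_calls}
    return len(expected & predicted) / len(expected) if expected else 1.0

expected = [{"tool": "search", "args": {"query": "quarterly revenue"}}]
predicted = [
    {"tool": "search", "args": {"query": "quarterly revenue"}},
    {"tool": "open_url", "args": {"url": "https://example.com"}},
]
assert tool_call_accuracy(predicted, expected) == 1.0
```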

Talk to the Right Person

Dedicated contacts for every area — human + AI employees responding together

李东耕" onerror="var d=document.createElement('div');d.innerHTML=this.dataset.fallback;this.replaceWith(d.firstChild)" />
李东耕
Delivery Manager
陆明哲" onerror="var d=document.createElement('div');d.innerHTML=this.dataset.fallback;this.replaceWith(d.firstChild)" />
陆明哲 AI
产品经理
林锐" onerror="var d=document.createElement('div');d.innerHTML=this.dataset.fallback;this.replaceWith(d.firstChild)" />
林锐 AI
代码审查与重构顾问
程薇" onerror="var d=document.createElement('div');d.innerHTML=this.dataset.fallback;this.replaceWith(d.firstChild)" />
程薇 AI
测试工程师
罗清河" onerror="var d=document.createElement('div');d.innerHTML=this.dataset.fallback;this.replaceWith(d.firstChild)" />
罗清河 AI
数据工程师
赵云帆" onerror="var d=document.createElement('div');d.innerHTML=this.dataset.fallback;this.replaceWith(d.firstChild)" />
赵云帆 AI
后端工程师

FAQ

How are you different from regular data labeling companies?

We don't just label data — we help clients train their models. Through RLHF preference alignment, chain-of-thought labeling, and RL loops, we directly participate in model training iteration, not just data production.

What does the RLHF data labeling process look like?

Our expert teams perform pairwise comparisons and multi-dimensional scoring of model outputs, generating preference data to train Reward Models. Through continuous iteration, we progressively optimize model performance.
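
For readers who want the underlying math: a common way to train a Reward Model on such pairwise preferences is the Bradley-Terry objective. The sketch below is a generic illustration, not our internal code.

```python
import torch

# Bradley-Terry pairwise loss: maximize the probability that the preferred
# response receives the higher scalar reward.
def preference_loss(reward_chosen: torch.Tensor, reward_rejected: torch.Tensor) -> torch.Tensor:
    # -log sigmoid(r_chosen - r_rejected), averaged over the batch.
    return -torch.nn.functional.logsigmoid(reward_chosen - reward_rejected).mean()

# Toy usage: scalar rewards for four preference pairs.
chosen = torch.tensor([1.2, 0.7, 2.0, 0.1])
rejected = torch.tensor([0.3, 0.9, 1.5, -0.4])
print(preference_loss(chosen, rejected))  # lower means stronger separation
```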

What languages and domains do you support?

We support multilingual labeling including Chinese, English, Japanese, Korean, and more, covering 40+ vertical domains such as code, math, law, medicine, and finance. Our AntGather Community includes 10,000+ labeling experts with professional backgrounds.

How do you ensure data quality?

Multi-layer quality control: expert cross-validation, consistency checks, automated anomaly detection, and continuous iterative training. All data undergoes at least two rounds of human review.
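
One standard consistency check is inter-annotator agreement. Below is a minimal Cohen's kappa sketch for two annotators labeling the same items; it is illustrative, not our exact QA stack.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators on the same items.
    Assumes the annotators are not in perfect chance agreement (expected < 1)."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

print(cohens_kappa(["ok", "bad", "ok", "ok"], ["ok", "bad", "bad", "ok"]))  # 0.5
```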

What is Knowlyr?

Knowlyr (集识光年) is an AI data infrastructure company headquartered in Shanghai, founded in 2025. It provides RLHF training data, expert evaluation, and human feedback services for frontier AI models. Knowlyr operates an expert network of 10,000+ professionals across 40+ domains and offers 8 open-source tools with 110 MCP endpoints.

How is Knowlyr different from Scale AI or Surge AI?

While Scale AI and Surge AI focus primarily on data labeling at scale, Knowlyr specializes in human judgment infrastructure — the harder problems that require deep domain expertise. Knowlyr provides end-to-end RLHF training loops (not just annotation), independent third-party AI evaluation, and a fully open-source MCP-native toolchain. The core difference: Knowlyr participates in model training iteration, not just data production.

What is RLHF and how does Knowlyr support it?

RLHF (Reinforcement Learning from Human Feedback) is a technique for training AI models using human preference data. Knowlyr provides the complete RLHF loop: expert teams perform pairwise comparisons and multi-dimensional scoring of model outputs, generating preference data to train Reward Models. This iterative process progressively aligns model behavior with human values. Knowlyr covers code, math, reasoning, and alignment scenarios.

A Leading AI Research Lab · JD.com · Baidu · Vipshop · ATRenew