Reinforcement Learning Loop Solutions
Judgment infrastructure deployed in model training
- General data labeling (text / image / video / audio)
- Multilingual and cross-cultural content
- Domain knowledge data production
- Domain data cleaning and structuring
- RLHF — Reinforcement Learning from Human Feedback
- Complex reasoning data production and labeling
- Preference alignment and iterative optimization
- Code / math / logic reasoning scenarios
- Hallucination detection and correction
- Extreme challenge questions beyond current strongest models
- Abstract reasoning and complex scenario construction
- Frontier evaluation dataset production (HLE / ARC-AGI)
- Humanity's Last Exam (HLE) and in-context learning (ICL)
- Agent benchmark evaluation and simulation environments
- Multi-dimensional capability metrics
- Automated evaluation workflows
- Human-AI collaborative evaluation standard calibration
Core Products
From training loops to authoritative evaluation — covering the entire AI data pipeline
Training Loop
Pairwise comparison, multi-dimensional scoring, continuous iteration — capturing the subtlest differences in human preferences to train Reward Models to distinguish 'good' from 'better'.
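For illustration, a single pairwise comparison record might look like the sketch below. This is a minimal, hypothetical schema; the field names are our assumptions, not Knowlyr's actual data format.

```python
# Hypothetical preference-comparison record (illustrative schema only).
preference_record = {
    "prompt": "Refactor this function to improve readability.",
    "response_a": "...",  # model output A (elided)
    "response_b": "...",  # model output B (elided)
    "preference": "a",    # which response the expert preferred overall
    "scores": {           # multi-dimensional scoring, e.g. a 1-5 scale
        "correctness": {"a": 5, "b": 5},
        "readability": {"a": 4, "b": 2},
        "conciseness": {"a": 3, "b": 4},
    },
    "rationale": "A extracts a helper function and names variables clearly.",
    "annotator_id": "expert-0042",
}
```

Records in this shape are what a Reward Model is later fit to (see the loss sketch in the FAQ below).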
Expert teams with programming, math, and logic backgrounds producing code refactoring reviews, mathematical proof chains, and multi-step inference data — verifiable chains of thought, not just correct answers.
Multi-dimensional hallucination taxonomy — fabrication, date confusion, numerical misattribution, logical inference errors — with root cause analysis and evidence chains for every hallucination.
Evaluation & Expert
From ARC-AGI 2 to Humanity's Last Exam, producing high-difficulty evaluation datasets for leading institutions. Agent evaluation, crowdsourced evaluation, expert review.
A cross-disciplinary expert network covering high-barrier domains such as medicine, law, finance, math, and physics — real practitioners, not generalist annotators.
Multimodal labeling, multilingual localization, novel task types — we define the problem, design the workflow, and deliver results together with you.
Engagement Process
From requirements to scale delivery — every step is measurable
Deep understanding of business scenarios and model objectives, clarifying data types, quality standards, and delivery timelines.
Defining labeling specifications, quality metrics, and acceptance criteria; designing task workflows and expert team configuration.
Small-batch pilot labeling to align on standards and quality expectations. Once confirmed, scale production begins.
Expert teams working in parallel with multi-layer QA and real-time monitoring, ensuring delivery speed and data consistency.
Continuously optimizing data strategy based on model training feedback, forming a data → training → evaluation closed loop.
Customer Cases
From leading tech giants to world-class AI research institutions — we let our delivery speak for itself
The client was rapidly iterating complex Agent applications (data analysis, writing, presentations), benchmarking against top products, and needed an evaluation-grade data team capable of fine-grained, logic-heavy assessment.
- Deeply involved in defining evaluation standards
- Full coverage from visual layout to deep logical reasoning
- Fine-grained evaluation insights driving product leadership
- Sole data vendor handling all high-priority core evaluations
- Accumulated thousands of high-value alignment records
- Effectively supported new strategy iteration and launch
As the foundation model pushed into uncharted territory, the client urgently needed high-quality data for HLE (Humanity's Last Exam), complex multi-attachment processing, and ICL (In-Context Learning).
- HLE evaluation data construction
- Complex multi-attachment scenario processing
- ICL high-difficulty logic chain data orchestration
- Filled gaps in the client's frontier evaluation sets
- Advanced long-context understanding and complex instruction following
- Demonstrated exceptional business understanding
For primate behavioral research, the lab needed 3D skeleton keypoint annotation of complex videos showing primates grasping objects: research-grade 3D spatial annotation with extremely low tolerance for error.
- Primate 3D motion skeleton keypoint annotation
- High-precision 3D spatial annotation
- Strict adherence to research-grade precision standards
- Sole designated vendor for a national-level research project
- Consistently delivered high-precision data
- Directly contributed to major research publications
Facing a sudden surge in overseas business volume, the client needed to rapidly assemble a large-scale professional English annotation team with stringent language requirements (TEM-4/TEM-8 certified).
- 100+ TEM-8 certified annotators onboarded in 3 days
- Mobilized 200+ full-time annotators and a 1,000+ crowdsourced reserve pool
- Response speed far exceeded client expectations
- Exceptionally high delivery consistency
- Data rework rate strictly controlled under 5%
- Significantly reduced client's secondary QA effort
The client needed human expert review to score AI-generated code refactoring proposals on preference dimensions, improving model code readability and refactoring quality.
- Assembled expert teams with programming backgrounds
- Designed multi-dimensional preference evaluation framework
- Established quantitative code readability scoring criteria
- Experts performed pairwise preference comparison and scoring
- Continuous iterative training to optimize the Reward Model
- Significant improvement in code readability scores
Addressing common hallucination issues in large language models: generated content is cross-validated against reference materials along multiple dimensions to identify hallucinations and trace their root causes (see the record sketch after this list).
- Fabrication
- Date confusion
- Numerical misattribution
- Factual misattribution
- Logical inference errors
- Labeling hallucinations with reasoning evidence
- Labeling contradictions between reference materials
- Labeling consistency between real content and references
- Producing data on hard-to-distinguish hallucination causes
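For illustration, one labeled hallucination might be recorded as below. This is a minimal sketch with hypothetical field names, not the actual production schema.

```python
# Hypothetical hallucination annotation record (field names are assumptions).
hallucination_label = {
    "claim": "The treaty was signed in 1952.",
    "reference_span": "signed on 8 September 1951",
    "verdict": "hallucination",
    "category": "date_confusion",  # fabrication | date_confusion |
                                   # numerical_misattribution |
                                   # factual_misattribution |
                                   # logical_inference_error
    "root_cause": "Model conflated the signing date with the entry-into-force date.",
    "evidence_chain": [
        "Reference states the signing date as 1951-09-08.",
        "Generated text asserts 1952 with no supporting source.",
    ],
}
```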
Producing abstract reasoning evaluation datasets that measure AI systems' general intelligence, one of the benchmarks considered closest to AGI (format sketch after this list).
- Designing visual and logical reasoning tasks
- Constructing multi-level abstract reasoning problems
- Ensuring problems pose genuine challenges for AI
- Human expert cross-validation
- Ensuring logical consistency and unambiguity
- Multi-round iterative selection of high-quality samples
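For context, public ARC-AGI tasks are distributed as JSON: small integer grids (color codes 0-9) split into train and test pairs. The structure below follows that public format; the toy task itself ("mirror each row") is our own illustration.

```python
# Public ARC-AGI task structure; the task ("mirror each row") is a toy example.
arc_task = {
    "train": [
        {"input": [[1, 0], [2, 0]], "output": [[0, 1], [0, 2]]},
        {"input": [[3, 0, 0]],      "output": [[0, 0, 3]]},
    ],
    "test": [
        {"input": [[0, 4], [0, 5]], "output": [[4, 0], [5, 0]]},
    ],
}
```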
Contributing to the 'Humanity's Last Exam' dataset — questions created by world-class experts, specifically designed to test the upper limits of LLM capabilities.
- Organizing cross-disciplinary domain experts
- Covering high-barrier fields: mathematics, physics, law, and more
- Ensuring problems exceed current strongest model capabilities
- Standardized answers and scoring rubrics
- Multi-round expert review to eliminate disputes
- Producing high-quality reasoning process data
Building automated Agent evaluation pipelines for clients, systematically assessing task completion and tool-calling accuracy in simulation environments (a minimal harness sketch follows this list).
- Building automated evaluation workflows
- Designing multi-dimensional Agent capability metrics
- Constructing reproducible simulation test environments
- Feeding evaluation results back into model training
- Continuously expanding evaluation scenario coverage
- Human-AI collaborative calibration of evaluation standards
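A minimal sketch of what such a harness can look like is below; the agent interface, case schema, and metrics are our assumptions, not a specific client pipeline.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class AgentEvalCase:
    """One reproducible test case in a simulated environment."""
    task_prompt: str
    expected_tool_calls: list[str]        # tool names the agent should invoke, in order
    success_check: Callable[[str], bool]  # validates the agent's final answer

def evaluate(run_agent: Callable[[str], tuple[str, list[str]]],
             cases: list[AgentEvalCase]) -> dict[str, float]:
    """Score task completion and tool-calling accuracy across all cases."""
    completed = tool_correct = 0
    for case in cases:
        answer, tool_calls = run_agent(case.task_prompt)
        completed += case.success_check(answer)
        tool_correct += tool_calls == case.expected_tool_calls
    n = len(cases)
    return {"task_completion": completed / n, "tool_accuracy": tool_correct / n}
```

Scores in this form can be logged per model checkpoint and fed back into training, closing the evaluation loop.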
Talk to the Right Person
Dedicated contacts for every area — human + AI employees responding together
FAQ
How are you different from regular data labeling companies?
We don't just label data — we help clients train their models. Through RLHF preference alignment, chain-of-thought labeling, and RL loops, we directly participate in model training iteration, not just data production.
What does the RLHF data labeling process look like?
Our expert teams perform pairwise comparisons and multi-dimensional scoring of model outputs, generating preference data to train Reward Models. Through continuous iteration, we progressively optimize model performance.
What languages and domains do you support?
We support multilingual labeling including Chinese, English, Japanese, Korean, and more, covering 40+ vertical domains such as code, math, law, medicine, and finance. Our AntGather Community includes 10,000+ labeling experts with professional backgrounds.
How do you ensure data quality?
Multi-layer quality control: expert cross-validation, consistency checks, automated anomaly detection, and continuous iterative training. All data undergoes at least two rounds of human review.
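To make "consistency checks" concrete: one standard measure is inter-annotator agreement, such as Cohen's kappa between two reviewers labeling the same items. A minimal sketch (our illustration, not the exact production pipeline):

```python
from collections import Counter

def cohens_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Agreement between two annotators on the same items, corrected for chance."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / n**2
    return (observed - expected) / (1 - expected)

# A batch whose kappa falls below a threshold (say 0.8) gets flagged for re-review.
```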
What types of projects have you delivered?
Code refactoring RLHF, hallucination detection, HLE extreme evaluation, ARC-AGI abstract reasoning, Agent evaluation, 3D skeleton annotation, and more. We have real delivery cases from basic labeling to frontier evaluation.
What is the typical data delivery timeline?
Pilot validation takes 3-5 days. Scale production depends on data volume. We have onboarded 100+ full-time annotators for clients within 3 days — rapid response is one of our core strengths.
How is your expert team assembled?
Our AntGather Community has 10,000+ judgment nodes covering 40+ professional domains. 85% hold bachelor's degrees or higher, with an average age of 29. We match experts to task requirements, typically completing the match within 3 days.
Can I purchase only part of your services?
Yes. Our four-tier judgment services can be purchased individually or combined. From basic data production to complete RLHF loops, we configure flexibly based on your needs.
Are your open-source tools free to use?
Yes. All 8 projects and their 130 MCP endpoints are fully open source, supporting both CLI and MCP modes. You can integrate them directly into Claude, VS Code, or custom Agents.
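For example, registering an MCP server with Claude Desktop is a small `mcpServers` config entry; the server name and package below are placeholders, not one of the actual projects:

```json
{
  "mcpServers": {
    "knowlyr-example": {
      "command": "npx",
      "args": ["-y", "example-mcp-server"]
    }
  }
}
```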
How do I get started?
Contact us to book a demo. We respond within 1 business day. After understanding your needs, we provide a solution design within 2-3 days, then move to pilot validation.
What is Knowlyr?
Knowlyr is an AI data infrastructure company headquartered in Shanghai, founded in 2025. We provide RLHF training data, expert evaluation, and human feedback services for frontier AI models. Knowlyr operates an expert network of 10,000+ professionals across 40+ domains and offers 8 open-source tools with 130 MCP endpoints.
How is Knowlyr different from Scale AI or Surge AI?
While Scale AI and Surge AI focus primarily on data labeling at scale, Knowlyr specializes in human judgment infrastructure — the harder problems that require deep domain expertise. We provide end-to-end RLHF training loops, independent third-party AI evaluation, and a fully open-source MCP-native toolchain. The core difference: we participate in model training iteration, not just data production.
What is RLHF and how does Knowlyr support it?
RLHF (Reinforcement Learning from Human Feedback) trains AI models using human preference data. Knowlyr provides the complete RLHF loop: expert teams perform pairwise comparisons and multi-dimensional scoring of model outputs, generating preference data to train Reward Models. This iterative process aligns model behavior with human values. We cover code, math, reasoning, and alignment scenarios.
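Concretely, a Reward Model is typically fit to those preference pairs with a Bradley-Terry style objective. A minimal PyTorch sketch of the standard pairwise loss (illustrative; not Knowlyr's actual training code):

```python
import torch
import torch.nn.functional as F

def reward_model_loss(score_chosen: torch.Tensor,
                      score_rejected: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry pairwise loss: push the scalar reward of the
    preferred response above that of the rejected response."""
    return -F.logsigmoid(score_chosen - score_rejected).mean()

# Example: a batch of 3 preference pairs scored by the reward model.
chosen = torch.tensor([2.1, 0.3, 1.7])
rejected = torch.tensor([1.4, 0.9, -0.2])
loss = reward_model_loss(chosen, rejected)  # shrinks as chosen pulls ahead
```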