Reinforcement Learning
Loop Solutions
More than just labeling data — we help clients train their models
- General data labeling (text / image / video / audio)
- Multilingual and cross-cultural content
- Domain knowledge data production
- Domain data cleaning and structuring
- RLHF — Reinforcement Learning from Human Feedback
- Complex reasoning data production and labeling
- Preference alignment and iterative optimization
- Code / math / logic reasoning scenarios
- Hallucination detection and correction
- Agent benchmark evaluation and simulation environments
- Crowdsourced evaluation
- Vertical industry expert review
- Model capability comparative analysis
Core Products
From training loops to authoritative evaluation — covering the entire AI data pipeline
Training Loop
Pairwise comparison, multi-dimensional scoring, continuous iteration — capturing the subtlest differences in human preferences to train Reward Models to distinguish 'good' from 'better'.
Expert teams with programming, math, and logic backgrounds producing code refactoring reviews, mathematical proof chains, and multi-step inference data — verifiable chains of thought, not just correct answers.
Multi-dimensional hallucination taxonomy — fabrication, date confusion, numerical misattribution, logical inference errors — with root cause analysis and evidence chains for every hallucination.
Evaluation & Expert
From ARC-AGI 2 to Humanity's Last Exam, producing high-difficulty evaluation datasets for leading institutions. Agent evaluation, crowdsourced evaluation, expert review.
A cross-disciplinary expert network covering high-barrier domains such as medicine, law, finance, math, and physics — real practitioners, not generalist annotators.
Multimodal labeling, multilingual localization, novel task types — we define the problem, design the workflow, and deliver results together with you.
Engagement Process
From requirements to scale delivery — every step is measurable
Deep understanding of business scenarios and model objectives, clarifying data types, quality standards, and delivery timelines.
Defining labeling specifications, quality metrics, and acceptance criteria; designing task workflows and expert team configuration.
Small-batch pilot labeling to align on standards and quality expectations. Once confirmed, scale production begins.
Expert teams working in parallel with multi-layer QA and real-time monitoring, ensuring delivery speed and data consistency.
Continuously optimizing data strategy based on model training feedback, forming a data → training → evaluation closed loop.
Customer Cases
From leading tech giants to world-class AI research institutions — we let our delivery speak for itself
The client was rapidly iterating complex Agent applications (data analysis, writing, presentations), benchmarking against top products, and needed an evaluation-grade data team capable of fine-grained, logic-heavy assessment.
- Deeply involved in defining evaluation standards
- Full coverage from visual layout to deep logical reasoning
- Fine-grained evaluation insights driving product leadership
- Sole data vendor handling all high-priority core evaluations
- Accumulated thousands of high-value alignment records
- Effectively supported new strategy iteration and launch
As the client's foundation model pushed into uncharted territory, they urgently needed high-quality data for HLE (Humanity's Last Exam), complex multi-attachment processing, and ICL (In-Context Learning).
- HLE benchmark-grade evaluation data construction
- Complex multi-attachment scenario processing
- ICL high-difficulty logic chain data orchestration
- Filled gaps in the client's frontier evaluation sets
- Advanced long-context understanding and complex instruction following
- Demonstrated exceptional business understanding
For primate behavioral research, the lab needed high-precision 3D skeleton keypoint annotation of complex videos showing primates grasping objects. Research-grade 3D spatial annotation with extremely low tolerance for error.
- Primate 3D motion skeleton keypoint annotation
- High-precision 3D spatial annotation
- Strict adherence to research-grade precision standards
- Sole designated vendor for a national-level research project
- Consistently delivered high-precision data
- Directly contributed to major research publications
Facing a sudden surge in overseas business volume, the client needed to rapidly assemble a large-scale professional English annotation team with stringent language requirements (TEM-4/TEM-8 certified).
- 100+ TEM-8 certified annotators onboarded in 3 days
- Mobilized 200+ full-time and 1,000+ crowdsource reserves
- Response speed far exceeded client expectations
- Exceptionally high delivery consistency
- Data rework rate strictly controlled under 5%
- Significantly reduced client's secondary QA effort
The client needed human expert review to score AI-generated code refactoring proposals on preference dimensions, improving model code readability and refactoring quality.
- Assembled expert teams with programming backgrounds
- Designed multi-dimensional preference evaluation framework
- Established quantitative code readability scoring criteria
- Experts performed pairwise preference comparison and scoring
- Continuous iterative training to optimize the Reward Model
- Significant improvement in code readability scores
We address common hallucination issues in large language models by cross-validating generated content against reference materials across multiple dimensions, identifying hallucinations and analyzing their root causes.
- Fabrication
- Date confusion
- Numerical misattribution
- Factual misattribution
- Logical inference errors
- Labeling hallucinations with reasoning evidence
- Labeling contradictions between reference materials
- Labeling consistency between real content and references
- Producing hard-to-distinguish hallucination cause data
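As an illustration, the taxonomy and evidence-chain labeling described above could be captured in a record like the following. This is a hypothetical sketch: the names (`HallucinationLabel`, `HallucinationType`) and fields are illustrative, not the actual annotation format.

```python
from dataclasses import dataclass
from enum import Enum

class HallucinationType(Enum):
    # Categories mirror the multi-dimensional taxonomy above
    FABRICATION = "fabrication"
    DATE_CONFUSION = "date_confusion"
    NUMERICAL_MISATTRIBUTION = "numerical_misattribution"
    FACTUAL_MISATTRIBUTION = "factual_misattribution"
    LOGICAL_INFERENCE_ERROR = "logical_inference_error"

@dataclass
class HallucinationLabel:
    span: str                           # offending text in the model output
    hallucination_type: HallucinationType
    evidence: list[str]                 # reference passages contradicting the span
    root_cause: str                     # annotator's root cause analysis

# Example record (hypothetical content):
label = HallucinationLabel(
    span="The treaty was signed in 1952.",
    hallucination_type=HallucinationType.DATE_CONFUSION,
    evidence=["Reference doc: the treaty was signed in 1951."],
    root_cause="Model conflated the signing date with the ratification date.",
)
```

Structuring each label with its evidence chain is what makes root-cause data usable downstream, e.g. for training hallucination detectors.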
Producing abstract reasoning evaluation datasets to measure AI systems' general intelligence — widely regarded as among the benchmarks that come closest to testing for AGI.
- Designing visual and logical reasoning tasks
- Constructing multi-level abstract reasoning problems
- Ensuring problems pose genuine challenges for AI
- Human expert cross-validation
- Ensuring logical consistency and unambiguity
- Multi-round iterative selection of high-quality samples
Contributing to the 'Humanity's Last Exam' dataset — questions created by world-class experts, specifically designed to test the upper limits of LLM capabilities.
- Organizing cross-disciplinary domain experts
- Covering high-barrier fields: mathematics, physics, law, and more
- Ensuring problems exceed current strongest model capabilities
- Standardized answers and scoring rubrics
- Multi-round expert review to eliminate disputes
- Producing high-quality reasoning process data
Building automated Agent evaluation pipelines for clients, systematically assessing task completion and tool-calling accuracy in simulation environments.
- Building automated evaluation workflows
- Designing multi-dimensional Agent capability metrics
- Constructing reproducible simulation test environments
- Feeding evaluation results back into model training
- Continuously expanding evaluation scenario coverage
- Human-AI collaborative calibration of evaluation standards
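A tool-calling accuracy check of the kind described above can be sketched as follows. All names here (`ToolCall`, `EvalResult`, `evaluate_episode`) are hypothetical; this is a minimal sketch of the idea, not a production pipeline.

```python
from dataclasses import dataclass

@dataclass
class ToolCall:
    name: str
    args: dict

@dataclass
class EvalResult:
    task_completed: bool
    tool_call_accuracy: float

def evaluate_episode(expected_calls, actual_calls, task_completed):
    """Score one Agent episode: exact-match accuracy over expected tool calls."""
    matched = sum(
        1 for exp, act in zip(expected_calls, actual_calls)
        if exp.name == act.name and exp.args == act.args
    )
    accuracy = matched / len(expected_calls) if expected_calls else 1.0
    return EvalResult(task_completed, accuracy)

# Example episode: the agent picked the right tools but wrong args once.
expected = [ToolCall("search", {"q": "weather"}), ToolCall("report", {"city": "Shanghai"})]
actual   = [ToolCall("search", {"q": "weather"}), ToolCall("report", {"city": "Beijing"})]
result = evaluate_episode(expected, actual, task_completed=True)  # accuracy: 0.5
```

Running many such episodes in a reproducible simulation environment yields the per-capability metrics that feed back into training.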
Talk to the Right Person
Dedicated contacts for every area — human + AI employees responding together
FAQ
How are you different from regular data labeling companies?
We don't just label data — we help clients train their models. Through RLHF preference alignment, chain-of-thought labeling, and RL loops, we directly participate in model training iteration, not just data production.
What does the RLHF data labeling process look like?
Our expert teams perform pairwise comparisons and multi-dimensional scoring of model outputs, generating preference data to train Reward Models. Through continuous iteration, we progressively optimize model performance.
What languages and domains do you support?
We support multilingual labeling including Chinese, English, Japanese, Korean, and more, covering 40+ vertical domains such as code, math, law, medicine, and finance. Our AntGather Community includes 10,000+ labeling experts with professional backgrounds.
How do you ensure data quality?
Multi-layer quality control: expert cross-validation, consistency checks, automated anomaly detection, and continuous iterative training. All data undergoes at least two rounds of human review.
What is Knowlyr?
Knowlyr (集识光年) is an AI data infrastructure company headquartered in Shanghai, founded in 2025. It provides RLHF training data, expert evaluation, and human feedback services for frontier AI models. Knowlyr operates an expert network of 10,000+ professionals across 40+ domains and offers 8 open-source tools with 110 MCP endpoints.
How is Knowlyr different from Scale AI or Surge AI?
While Scale AI and Surge AI focus primarily on data labeling at scale, Knowlyr specializes in human judgment infrastructure — the harder problems that require deep domain expertise. Knowlyr provides end-to-end RLHF training loops (not just annotation), independent third-party AI evaluation, and a fully open-source MCP-native toolchain. The core difference: Knowlyr participates in model training iteration, not just data production.
What is RLHF and how does Knowlyr support it?
RLHF (Reinforcement Learning from Human Feedback) is a technique for training AI models using human preference data. Knowlyr provides the complete RLHF loop: expert teams perform pairwise comparisons and multi-dimensional scoring of model outputs, generating preference data to train Reward Models. This iterative process progressively aligns model behavior with human values. Knowlyr covers code, math, reasoning, and alignment scenarios.
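The pairwise comparison step described above is typically trained with a Bradley-Terry style objective: the Reward Model should score the annotator-preferred response higher than the rejected one. A minimal sketch, assuming scalar reward scores (the function name and example values are illustrative, not Knowlyr's actual pipeline):

```python
import math

def pairwise_preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry pairwise loss: -log(sigmoid(r_chosen - r_rejected)).

    Low when the Reward Model agrees with the human preference
    (chosen response scored higher), high when it disagrees.
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(margin)) rewritten via log1p for numerical stability
    return math.log1p(math.exp(-margin))

# Annotators preferred response A over response B.
loss_agree = pairwise_preference_loss(2.0, -1.0)     # model agrees: small loss
loss_disagree = pairwise_preference_loss(-1.0, 2.0)  # model disagrees: large loss
```

Minimizing this loss over many labeled pairs is what teaches the Reward Model to distinguish 'good' from 'better'; the trained model then scores candidate outputs during the RL stage.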