Lightweight collaborative labeling framework. It quantifies labeling consistency with inter-annotator agreement (IAA) metrics, including Cohen's kappa, Fleiss' kappa, and Krippendorff's alpha. Conflict detection and multi-strategy fusion are built in, and a self-contained HTML annotation interface allows labeling with zero deployment.

IAA Quantification · Conflict Fusion · Zero-Deployment Labeling

Quick Start

Install

pip install knowlyr-datalabel

Usage

# CLI: generate the annotation interface from a schema
knowlyr-datalabel create schema.json tasks.json -o annotator.html

MCP Tools

12 callable endpoints

generate_annotator -- Generate HTML annotation interface from DataRecipe analysis results
create_annotator -- Create HTML annotation interface from Schema and task data
merge_annotations -- Merge annotation results from multiple annotators
calculate_iaa -- Calculate Inter-Annotator Agreement (IAA)
validate_schema -- Validate DataLabel Schema and task data format correctness
export_results -- Export annotation results in JSON/JSONL/CSV format
import_tasks -- Import task data from JSON/JSONL/CSV and convert to DataLabel format
generate_dashboard -- Generate annotation progress dashboard HTML from annotation result files
llm_prelabel -- Use LLM to auto-prelabel task data
llm_quality_analysis -- Use LLM to analyze annotation quality, detect suspicious annotations and disagreements
llm_gen_guidelines -- Use LLM to auto-generate annotation guidelines from Schema and examples
adjudicate -- Adjudicate annotation conflicts: arbitrate disagreements and output final labels
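
To make the agreement metric behind calculate_iaa concrete, here is a minimal pure-Python sketch of Cohen's kappa for two annotators. It illustrates the standard formula only and is not DataLabel's implementation; the function name `cohen_kappa` and the sample labels are made up for this example.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items in the same order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item set")
    n = len(labels_a)
    # Observed agreement: fraction of items where both labels match
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label frequencies
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels from two annotators on six tasks
ann1 = ["pos", "pos", "neg", "neg", "pos", "neg"]
ann2 = ["pos", "neg", "neg", "neg", "pos", "pos"]
print(round(cohen_kappa(ann1, ann2), 3))  # 0.333
```

Here the annotators agree on 4 of 6 items (0.667) while chance agreement is 0.5, so kappa is about 0.333: noticeably lower than the raw agreement suggests.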

Documentation

English | 中文

DataLabel

Name: DataLabel
Author: Knowlyr

Serverless Human-in-the-Loop Annotation Framework
with LLM Pre-Labeling and Inter-Annotator Agreement

Generate self-contained HTML files for offline annotation. No server, no network, no deployment.

GitHub · PyPI · knowlyr.com

The Problem

Annotation tools today force a painful choice: heavyweight platforms (Label Studio, Prodigy) that require servers and databases, or throwaway scripts with zero quality guarantees. Neither provides statistical agreement metrics or LLM-assisted acceleration out of the box.

DataLabel takes a different approach: generate a single HTML file, send it to annotators, get results back. No server. No Docker. No network required.
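
The single-file idea can be sketched in a few lines of Python: serialize the schema and tasks to JSON and inline them into an HTML template, so the page needs nothing beyond a browser. This is an illustrative sketch, not DataLabel's actual generator; the template, the `__SCHEMA__` / `__TASKS__` placeholders, the helper names, and the sample schema fields are all assumptions.

```python
import json
from pathlib import Path

# Hypothetical minimal template; a real annotator would inline its CSS and UI logic too
TEMPLATE = """<!DOCTYPE html>
<html>
<head><meta charset="utf-8"><title>Annotator</title></head>
<body>
<div id="app"></div>
<script>
const SCHEMA = __SCHEMA__;
const TASKS = __TASKS__;
document.getElementById("app").textContent = TASKS.length + " tasks loaded";
</script>
</body>
</html>
"""


def embed_json(value):
    # Escape "</" so task text can never close the <script> element early
    return json.dumps(value, ensure_ascii=False).replace("</", "<\\/")


def build_annotator_html(schema, tasks):
    """Return one self-contained HTML document with schema and tasks baked in."""
    return (TEMPLATE
            .replace("__SCHEMA__", embed_json(schema))
            .replace("__TASKS__", embed_json(tasks)))


# Hypothetical schema and tasks, for illustration only
schema = {"type": "single_choice", "options": ["pos", "neg"]}
tasks = [{"id": 1, "text": "Great product </script> really"}]
html = build_annotator_html(schema, tasks)
Path("annotator_demo.html").write_text(html, encoding="utf-8")
```

Because the data travels inside the file, the annotator can open it from disk with no network access; results then have to be exported back out of the browser, for example as a downloaded JSON file.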

What You Get

Serverless HTML Annotation -- self-contained files with all styles, logic, and data baked in. Works offline, supports dark mode and keyboard shortcuts
LLM Pre-Labeling -- Kimi (Moonshot) / OpenAI / Anthropic generate initial labels, so annotators review and correct drafts instead of starting from scratch
Inter-Annotator Agreement -- Cohen's kappa, Fleiss' kappa, Krippendorff's alpha with pairwise agreement matrices and disagreement reports
Multi-Strategy Merging -- majority vote, average, or strict consensus with automatic conflict flagging
5 Annotation Types -- scoring, single choice, multi choice, free text, and ranking (with Borda count merging)
Visual Dashboard -- standalone HTML report with progress tracking, distribution charts, and agreement heatmaps
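
For the ranking type, Borda count merging can be sketched as follows. This is a textbook Borda count under the assumption that every annotator ranks the same items; the function name `borda_merge`, the tie-breaking rule, and the sample rankings are illustrative and not taken from DataLabel's source.

```python
from collections import defaultdict


def borda_merge(rankings):
    """Merge several rankings of the same items into one consensus ranking."""
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, item in enumerate(ranking):
            # First place earns n - 1 points, last place earns 0
            scores[item] += n - 1 - position
    # Highest total first; ties are broken by item name so output is deterministic
    return sorted(scores, key=lambda item: (-scores[item], item))


# Three hypothetical annotators ranking the same three candidates
rankings = [
    ["A", "B", "C"],
    ["B", "A", "C"],
    ["A", "C", "B"],
]
merged = borda_merge(rankings)
print(merged)  # ['A', 'B', 'C']
```

A gets 2 + 1 + 2 = 5 points, B gets 3, and C gets 1, so the merged order is A, B, C even though the annotators disagree on every position below first.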

Quick Start

pip install knowlyr-datalabel

# Create annotation interface
knowlyr-datalabel create schema.json tasks.json -o annotator.html

# Optional: LLM pre-labeling
knowlyr-datalabel prelabel schema.json tasks.json -o pre.json -p moonshot

# Merge results + compute agreement
knowlyr-datalabel merge ann1.json ann2.json ann3.json -o merged.json

# Generate analytics dashboard
knowlyr-datalabel dashboard ann1.json ann2.json -o dashboard.html

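import json
from pathlib import Path
from datalabel import AnnotatorGenerator, ResultMerger

# Assumption: schema and tasks are the parsed contents of the CLI's JSON inputs
schema = json.loads(Path("schema.json").read_text(encoding="utf-8"))
tasks = json.loads(Path("tasks.json").read_text(encoding="utf-8"))

# Generate a self-contained HTML annotation interface
gen = AnnotatorGenerator()
gen.generate(schema=schema, tasks=tasks, output_path="annotator.html")

# Merge annotator results by majority vote and report agreement
merger = ResultMerger()
result = merger.merge(["ann1.json", "ann2.json"], strategy="majority")
print(f"Agreement: {result.agreement_rate:.1%}")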
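
To show what a majority strategy with conflict flagging might look like, here is a standalone sketch that does not depend on the library. The input shape (task ID mapped to one label per annotator), the function name `majority_merge`, and the definition of agreement rate as the share of unanimously labeled tasks are all assumptions; DataLabel's ResultMerger may define them differently.

```python
from collections import Counter


def majority_merge(annotations):
    """annotations: {task_id: [label from each annotator]} (hypothetical shape)."""
    merged, conflicts = {}, []
    unanimous = 0
    for task_id, labels in annotations.items():
        top_label, top_count = Counter(labels).most_common(1)[0]
        if top_count == len(labels):
            unanimous += 1
        if top_count * 2 > len(labels):
            # A strict majority exists, so accept its label
            merged[task_id] = top_label
        else:
            # No label is held by more than half the annotators: flag for adjudication
            merged[task_id] = None
            conflicts.append(task_id)
    # Assumed definition: fraction of tasks where every annotator agreed
    agreement_rate = unanimous / len(annotations) if annotations else 0.0
    return merged, conflicts, agreement_rate


# Hypothetical labels from three annotators on four tasks
annotations = {
    "t1": ["pos", "pos", "pos"],
    "t2": ["pos", "neg", "pos"],
    "t3": ["pos", "neg", "neutral"],
    "t4": ["neg", "neg", "neg"],
}
merged, conflicts, agreement_rate = majority_merge(annotations)
print(merged)     # {'t1': 'pos', 't2': 'pos', 't3': None, 't4': 'neg'}
print(conflicts)  # ['t3']
print(f"Agreement: {agreement_rate:.1%}")  # Agreement: 50.0%
```

Leaving flagged tasks unresolved rather than picking an arbitrary label keeps the merged output honest; a separate adjudication pass can then settle them.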

Annotation Pipeline

graph LR
    S["Schema"] --> P["LLM Pre-Label"]
    P --> G["HTML Generator"]
    G --> B["Browser Annotation"]
    B --> R["Results"]
    R --> M["Merge + IAA"]
    M --> D["Dashboard"]

    style G fill:#0969da,color:#fff,stroke:#0969da
    style M fill:#8b5cf6,color:#fff,stroke:#8b5cf6
    style D fill:#2da44e,color:#fff,stroke:#2da44e
    style S fill:#1a1a2e,color:#e0e0e0,stroke:#444
    style P fill:#1a1a2e,color:#e0e0e0,stroke:#444
    style B fill:#1a1a2e,color:#e0e0e0,stroke:#444
    style R fill:#1a1a2e,color:#e0e0e0,stroke:#444

MCP Integration

12 MCP tools, 6 resources, and 3 prompt templates for seamless AI IDE integration -- create annotations, merge results, compute IAA, and generate dashboards directly from your editor.

{
  "mcpServers": {
    "knowlyr-datalabel": {
      "command": "uv",
      "args": ["--directory", "/path/to/data-label", "run", "python", "-m", "datalabel.mcp_server"]
    }
  }
}

Ecosystem

DataLabel is part of the knowlyr data infrastructure:

Layer	Project	Role
Discovery	AI Dataset Radar	Dataset intelligence and trend analysis
Analysis	DataRecipe	Reverse analysis, schema extraction, cost estimation
Production	DataSynth / DataLabel	LLM batch synthesis / serverless annotation
Quality	DataCheck	Rule validation, anomaly detection, auto-fix
Audit	ModelAudit	Distillation detection, model fingerprinting