Open Source Python MIT

DataLabel

Data Labeling

Updated 2026-02-25
Lightweight collaborative labeling framework. Supports inter-annotator agreement (IAA) metrics, including Cohen's kappa, Fleiss' kappa, and Krippendorff's alpha, to quantify labeling consistency. Includes built-in conflict detection and multi-strategy fusion, with zero-deployment labeling through a self-contained HTML annotation interface.
IAA Quantification · Conflict Fusion · Zero-Deployment Labeling
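Fleiss' kappa extends chance-corrected agreement to more than two annotators. The sketch below is a plain-Python illustration of the textbook formula, not DataLabel's `calculate_iaa` implementation. The function name `fleiss_kappa`, the input layout (one list of labels per item, with the same number of annotators on every item), and the choice to return 1.0 when every label is identical are all assumptions.

```python
from collections import Counter


def fleiss_kappa(ratings: list[list[str]]) -> float:
    """Fleiss' kappa; ratings[i] holds every annotator's label for item i."""
    if not ratings:
        raise ValueError("no items to score")
    n_items, n_raters = len(ratings), len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("each item needs the same number (>= 2) of labels")

    counts = [Counter(item) for item in ratings]  # n_ij: votes per category per item
    categories = {label for c in counts for label in c}

    # Observed agreement: share of agreeing annotator pairs per item, averaged
    p_bar = sum(
        (sum(v * v for v in c.values()) - n_raters) / (n_raters * (n_raters - 1))
        for c in counts
    ) / n_items

    # Expected (chance) agreement from the overall category proportions
    total = n_items * n_raters
    p_e = sum((sum(c[cat] for c in counts) / total) ** 2 for cat in categories)

    if p_e == 1.0:  # assumption: one shared label everywhere counts as perfect
        return 1.0
    return (p_bar - p_e) / (1 - p_e)


ratings = [
    ["pos", "pos", "neg"],  # one dissenting annotator
    ["neg", "neg", "neg"],
    ["pos", "pos", "pos"],
]
print(round(fleiss_kappa(ratings), 2))  # 0.55
```

Raw pairwise agreement here is 7/9, but chance agreement from the label proportions is already about 0.51, so kappa drops to 0.55.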

Quick Start

Install
pip install knowlyr-datalabel
Usage
# CLI: generate an annotation interface from a schema
knowlyr-datalabel create schema.json tasks.json -o annotator.html
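MCP tools:
  • generate_annotator -- generate an HTML annotation interface from DataRecipe analysis results
  • create_annotator -- create an HTML annotation interface from a schema and task data
  • merge_annotations -- merge annotation results from multiple annotators
  • calculate_iaa -- compute inter-annotator agreement (IAA)
  • validate_schema -- validate the format of a DataLabel schema and its task data
  • export_results -- export annotation results as JSON, JSONL, or CSV
  • import_tasks -- import task data from JSON, JSONL, or CSV and convert it to the DataLabel format
  • generate_dashboard -- generate an HTML annotation-progress dashboard from result files
  • llm_prelabel -- pre-label task data automatically with an LLM
  • llm_quality_analysis -- analyze annotation quality with an LLM, flagging suspicious labels and disagreements
  • llm_gen_guidelines -- generate annotation guidelines automatically from the schema and examples using an LLM
  • adjudicate -- resolve annotation conflicts by arbitrating disputed results and outputting final labels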
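The `import_tasks` tool accepts JSON, JSONL, or CSV input. As a rough illustration of the reading half of that job, here is a stdlib-only loader. `load_tasks` is a hypothetical name; it returns plain dicts and deliberately skips the conversion into the DataLabel task format, whose exact shape this page does not specify.

```python
import csv
import json
from pathlib import Path


def load_tasks(path: str) -> list[dict]:
    """Read task records from a .json, .jsonl, or .csv file into a list of dicts.

    Hypothetical helper: it does not reproduce DataLabel's own format conversion.
    """
    p = Path(path)
    suffix = p.suffix.lower()
    if suffix == ".jsonl":
        with p.open(encoding="utf-8") as f:
            # One JSON object per line; blank lines are skipped
            return [json.loads(line) for line in f if line.strip()]
    if suffix == ".json":
        with p.open(encoding="utf-8") as f:
            data = json.load(f)
        # Accept a top-level list of tasks or a single task object
        return data if isinstance(data, list) else [data]
    if suffix == ".csv":
        with p.open(encoding="utf-8", newline="") as f:
            # The header row becomes the dict keys; all values stay strings
            return list(csv.DictReader(f))
    raise ValueError(f"unsupported task file: {path}")
```

Note that CSV values come back as strings, so numeric fields need explicit conversion before they are used as scores.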

Documentation


DataLabel

Serverless Human-in-the-Loop Annotation Framework
with LLM Pre-Labeling and Inter-Annotator Agreement

Generate self-contained HTML files for offline annotation. No server, no network, no deployment.

GitHub · PyPI · knowlyr.com

The Problem

Annotation tools today force a painful choice: heavyweight platforms (Label Studio, Prodigy) that require servers and databases, or throwaway scripts with zero quality guarantees. Neither provides statistical agreement metrics or LLM-assisted acceleration out of the box.

DataLabel takes a different approach: generate a single HTML file, send it to annotators, get results back. No server. No Docker. No network required.
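The single-file idea can be sketched in a few dozen lines: serialize the tasks as JSON inside a `<script type="application/json">` element, inline the CSS and JavaScript, and let the browser hand the answers back as a downloadable JSON file. This is a minimal sketch under stated assumptions, not DataLabel's `AnnotatorGenerator`; `build_annotator`, the `id`/`text` task fields, and the `results.json` download name are all hypothetical.

```python
import html
import json

# Hypothetical page template: styles, data, and logic all live in one file.
# Literal braces are doubled because the template goes through str.format.
PAGE = """<!DOCTYPE html>
<html><head><meta charset="utf-8"><title>{title}</title>
<style>body{{font-family:sans-serif;max-width:40em;margin:2em auto}}</style>
</head><body>
<h1>{title}</h1>
<div id="tasks"></div>
<button id="save">Download results</button>
<script id="task-data" type="application/json">{data}</script>
<script>
const tasks = JSON.parse(document.getElementById("task-data").textContent);
const answers = {{}};
const root = document.getElementById("tasks");
for (const t of tasks) {{
  const p = document.createElement("p");
  p.textContent = t.text;
  const input = document.createElement("input");
  input.oninput = () => {{ answers[t.id] = input.value; }};
  root.append(p, input);
}}
document.getElementById("save").onclick = () => {{
  const blob = new Blob([JSON.stringify(answers, null, 2)], {{type: "application/json"}});
  const a = document.createElement("a");
  a.href = URL.createObjectURL(blob);
  a.download = "results.json";
  a.click();
}};
</script></body></html>"""


def build_annotator(tasks: list[dict], title: str = "Annotation") -> str:
    """Return one HTML string with the tasks, styles, and logic all inlined."""
    # Escape every "<" as \\u003c so task text cannot close the <script> early;
    # JSON.parse in the browser decodes it back to "<"
    data = json.dumps(tasks, ensure_ascii=False).replace("<", "\\u003c")
    return PAGE.format(title=html.escape(title), data=data)
```

Writing the result to disk, for example with `Path("annotator.html").write_text(build_annotator(tasks), encoding="utf-8")`, produces a file an annotator can open offline with no server.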

What You Get

  • Serverless HTML Annotation -- self-contained files with all styles, logic, and data baked in. Works offline, supports dark mode and keyboard shortcuts
  • LLM Pre-Labeling -- Kimi / OpenAI / Anthropic models generate initial labels, so annotators review and correct a draft instead of starting from scratch
  • Inter-Annotator Agreement -- Cohen's kappa, Fleiss' kappa, Krippendorff's alpha with pairwise agreement matrices and disagreement reports
  • Multi-Strategy Merging -- majority vote, average, or strict consensus with automatic conflict flagging
  • 5 Annotation Types -- scoring, single choice, multi choice, free text, and ranking (with Borda count merging)
  • Visual Dashboard -- standalone HTML report with progress tracking, distribution charts, and agreement heatmaps
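For two annotators, Cohen's kappa compares observed agreement with the agreement expected from each annotator's own label frequencies; computing it for every pair yields a pairwise agreement matrix like the one described above. The sketch assumes both annotators labeled the same items in the same order. `cohens_kappa`, `pairwise_kappa`, and the annotator names are hypothetical, and DataLabel's own report format may differ.

```python
from collections import Counter
from itertools import combinations


def cohens_kappa(a: list[str], b: list[str]) -> float:
    """Cohen's kappa for two annotators who labeled the same items in the same order."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
    ca, cb = Counter(a), Counter(b)
    # Chance agreement: product of the two annotators' marginal label rates
    p_e = sum(ca[label] * cb[label] for label in ca) / (n * n)
    if p_e == 1.0:  # assumption: both used one identical label, treat as perfect
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def pairwise_kappa(labels: dict[str, list[str]]) -> dict[tuple[str, str], float]:
    """Cohen's kappa for every annotator pair, keyed by (name_a, name_b)."""
    return {
        (x, y): cohens_kappa(labels[x], labels[y])
        for x, y in combinations(sorted(labels), 2)
    }


labels = {
    "alice": ["pos", "pos", "neg", "neg"],
    "bob": ["pos", "neg", "neg", "neg"],
    "carol": ["pos", "pos", "neg", "neg"],
}
for pair, kappa in pairwise_kappa(labels).items():
    print(pair, round(kappa, 2))
# ('alice', 'bob') 0.5
# ('alice', 'carol') 1.0
# ('bob', 'carol') 0.5
```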

Quick Start

pip install knowlyr-datalabel

# Create annotation interface
knowlyr-datalabel create schema.json tasks.json -o annotator.html

# Optional: LLM pre-labeling
knowlyr-datalabel prelabel schema.json tasks.json -o pre.json -p moonshot

# Merge results + compute agreement
knowlyr-datalabel merge ann1.json ann2.json ann3.json -o merged.json

# Generate analytics dashboard
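knowlyr-datalabel dashboard ann1.json ann2.json -o dashboard.html

import json
from datalabel import AnnotatorGenerator, ResultMerger

# Load the schema and tasks used in the CLI example (assumes generate() takes parsed JSON)
with open("schema.json", encoding="utf-8") as f:
    schema = json.load(f)
with open("tasks.json", encoding="utf-8") as f:
    tasks = json.load(f)

gen = AnnotatorGenerator()
gen.generate(schema=schema, tasks=tasks, output_path="annotator.html")

merger = ResultMerger()
result = merger.merge(["ann1.json", "ann2.json"], strategy="majority")
print(f"Agreement: {result.agreement_rate:.1%}")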
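The `strategy` argument in the snippet above selects how disagreeing labels are combined. Below is a hypothetical sketch of the three strategies named in the feature list (majority vote, average, strict consensus) applied to a single item, each returning a conflict flag. `merge_item`, the tie rule for majority vote, and the one-point spread threshold for averages are assumptions rather than `ResultMerger` internals, and the strategy names other than `"majority"` may not match the library's.

```python
from collections import Counter
from statistics import mean


def merge_item(labels: list, strategy: str = "majority") -> tuple[object, bool]:
    """Merge one item's labels and return (merged_label, conflict_flag).

    Hypothetical helper illustrating three merge rules; not DataLabel's code.
    """
    if not labels:
        raise ValueError("no labels to merge")
    if strategy == "majority":
        counts = Counter(labels).most_common()
        # A tie for first place has no clear winner, so flag it as a conflict
        tied = len(counts) > 1 and counts[0][1] == counts[1][1]
        return counts[0][0], tied
    if strategy == "average":
        # Numeric scores only; assumed rule: flag when scores span more than 1 point
        return mean(labels), max(labels) - min(labels) > 1
    if strategy == "consensus":
        # Strict: accept only a unanimous label, otherwise leave the item unresolved
        unanimous = len(set(labels)) == 1
        return (labels[0] if unanimous else None), not unanimous
    raise ValueError(f"unknown strategy: {strategy}")
```

Flagged items are the natural input for an adjudication step, since they are exactly the cases the chosen rule could not settle on its own.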

Annotation Pipeline

Schema → LLM Pre-Label → HTML Generator → Browser Annotation → Results → Merge + IAA → Dashboard
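For ranking tasks, the feature list says the merge step combines annotator rankings with Borda count. A minimal sketch of plain Borda scoring, assuming every annotator ranks the same full set of items best-first; `borda_merge` and its alphabetical tie-break are illustrative assumptions rather than DataLabel's code.

```python
from collections import defaultdict


def borda_merge(rankings: list[list[str]]) -> list[str]:
    """Combine several full rankings (best first) into one consensus ranking.

    Plain Borda count: in a ranking of n items, 0-based position p earns n - 1 - p points.
    """
    scores: dict[str, int] = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, item in enumerate(ranking):
            scores[item] += n - 1 - position
    # Highest total first; ties broken alphabetically so the output is deterministic
    return sorted(scores, key=lambda item: (-scores[item], item))


print(borda_merge([["A", "B", "C"], ["B", "A", "C"], ["A", "C", "B"]]))  # ['A', 'B', 'C']
```

Here A scores 5, B scores 3, and C scores 1, so A wins even though one annotator ranked B first.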

MCP Integration

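12 MCP tools, 6 resources, and 3 prompt templates let MCP-capable AI IDEs create annotation interfaces, merge results, compute IAA, and generate dashboards directly from the editor. To register the server, add an entry like the following to your MCP client configuration, replacing /path/to/data-label with the path to your local copy of the repository: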

{
  "mcpServers": {
    "knowlyr-datalabel": {
      "command": "uv",
      "args": ["--directory", "/path/to/data-label", "run", "python", "-m", "datalabel.mcp_server"]
    }
  }
}

Ecosystem

DataLabel is part of the knowlyr data infrastructure:

  • Discovery -- AI Dataset Radar: dataset intelligence and trend analysis
  • Analysis -- DataRecipe: reverse analysis, schema extraction, cost estimation
  • Production -- DataSynth / DataLabel: LLM batch synthesis / serverless annotation
  • Quality -- DataCheck: rule validation, anomaly detection, auto-fix
  • Audit -- ModelAudit: distillation detection, model fingerprinting


knowlyr -- serverless annotation with LLM pre-labeling and inter-annotator agreement

Want to discuss this project? Reach out to

Kai -- Founder & CEO
程薇 -- AI Test Engineer