Quick Start

```shell
pip install knowlyr-modelaudit
```

```python
from modelaudit import AuditEngine

engine = AuditEngine()
results = engine.detect(["Hello! I'd be happy to help..."])
```
MCP Tools

- `detect_text_source`: Detect the source of a text, i.e. infer which LLM likely generated it
- `verify_model`: Verify model identity by checking whether the model behind an API is what it claims to be
- `compare_models`: Compare the fingerprint similarity of two models to assess a possible distillation or derivation relationship
- `compare_models_whitebox`: White-box comparison of two local models using the REEF CKA method on hidden-state similarity (requires model weights)
- `audit_memorization`: Detect whether a model has memorized training data, assessed via prefix completion and verbatim checks
- `audit_report`: Generate a complete model audit report aggregating results from all audit tools
- `audit_watermark`: Detect AI watermarks in text (statistical features and pattern matching)
- `audit_distillation`: Full distillation audit combining fingerprint comparison with style analysis to produce a detailed report
ModelAudit
LLM Distillation Detection and Model Fingerprinting via Statistical Forensics
Detect unauthorized model distillation through behavioral probing,
stylistic fingerprinting, and representation similarity analysis.
Statistical Forensics · Behavioral Signatures · Cross-Model Lineage Inference
The Problem
Large language model distillation has become a core threat to model IP protection. Student models can replicate teacher model capabilities by mimicking output distributions -- without authorization. Existing detection methods either require white-box weight access (often unavailable) or only analyze surface-level text features (easily evaded).
The Solution
ModelAudit is a multi-method distillation detection framework based on statistical forensics. It extracts model fingerprints through behavioral probing, applies hypothesis testing to determine distillation relationships, and combines four complementary methods to form a complete black-box to white-box audit chain.
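As an illustration of the statistical core, DLI-style lineage inference compares two models' output distributions with Jensen-Shannon divergence: a distilled student tracks its teacher's distribution far more closely than an unrelated model does. A minimal sketch, with made-up probe distributions (not ModelAudit's actual implementation):

```python
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    def kl(a, b):
        return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Next-token distributions from three models on the same probe (illustrative)
teacher = [0.70, 0.20, 0.10]
student = [0.65, 0.25, 0.10]     # tracks the teacher: low divergence
unrelated = [0.10, 0.30, 0.60]   # different behavior: high divergence

print(js_divergence(teacher, student))    # small
print(js_divergence(teacher, unrelated))  # large
```

A hypothesis test then asks whether the observed divergence is lower than expected for independently trained models.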
Four Complementary Detection Methods
| Method | Type | Mechanism |
|---|---|---|
| LLMmap | Black-box | 20 behavioral probes, Pearson correlation on response patterns |
| DLI | Black-box | Behavioral signatures + Jensen-Shannon divergence lineage inference |
| REEF | White-box | CKA layer-wise hidden state similarity |
| StyleAnalysis | Stylistic | 12 model family style signatures + language detection |
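Of these, REEF's white-box comparison rests on centered kernel alignment (CKA), which scores representation similarity while ignoring rotations and rescalings of the hidden space. A minimal linear-CKA sketch, with random matrices standing in for layer activations (an illustration under those assumptions, not ModelAudit's actual implementation):

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between activation matrices (n_samples x n_features).
    Returns ~1.0 when representations match up to an orthogonal transform."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    hsic = np.linalg.norm(Y.T @ X, "fro") ** 2
    return hsic / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro"))

rng = np.random.default_rng(0)
acts = rng.standard_normal((64, 32))      # hidden states for 64 probe tokens
Q = np.linalg.qr(rng.standard_normal((32, 32)))[0]
rotated = acts @ Q                        # same information, different basis
other = rng.standard_normal((64, 32))     # an unrelated model's activations

print(linear_cka(acts, rotated))  # ~1.0: same underlying representation
print(linear_cka(acts, other))    # much lower
```

Because CKA is basis-invariant, it can flag a derived model even when its weights have been fine-tuned or permuted.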
10-Dimensional Behavioral Probing
Go beyond simple text statistics. ModelAudit probes 10 cognitive dimensions -- self-awareness, safety boundaries, injection testing, reasoning, creative writing, multilingual, format control, role-playing, code generation, and summarization -- capturing deep behavioral differences that persist even after RLHF alignment.
Cross-Provider Audit Chain
Audit across providers seamlessly. Teacher and student models can come from different APIs:
```shell
knowlyr-modelaudit audit \
  --teacher claude-opus --teacher-provider anthropic \
  --student kimi-k2.5 --student-provider openai \
  --student-api-base https://api.moonshot.cn/v1 \
  -o report.md
```
Get Started
```shell
pip install knowlyr-modelaudit

# Detect text source
knowlyr-modelaudit detect texts.jsonl

# Verify model identity
knowlyr-modelaudit verify gpt-4o --provider openai

# Full distillation audit
knowlyr-modelaudit audit --teacher gpt-4o --student my-model -o report.md
```

```python
from modelaudit import AuditEngine

engine = AuditEngine()
audit = engine.audit("claude-opus", "suspect-model")
print(f"{audit.verdict} (confidence: {audit.confidence:.3f})")
```
MCP Integration
ModelAudit ships with 8 MCP tools for seamless integration into AI workflows:
detect_text_source · verify_model · compare_models · compare_models_whitebox · audit_distillation · audit_memorization · audit_report · audit_watermark
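For illustration, a generic MCP `tools/call` request targeting one of these tools might look as follows; the request shape follows the standard MCP JSON-RPC convention, but the `texts` argument name is an assumption, not documented API:

```python
import json

# Hypothetical JSON-RPC payload for invoking the detect_text_source MCP tool
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "detect_text_source",
        "arguments": {"texts": ["Hello! I'd be happy to help..."]},  # assumed key
    },
}
print(json.dumps(request, indent=2))
```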
Built-in Benchmark
100% detection accuracy across 6 model families (14 samples). Supports 12 model families: GPT-4 · GPT-3.5 · Claude · LLaMA · Gemini · Qwen · DeepSeek · Mistral · Yi · Phi · Cohere · ChatGLM.
Want to discuss this project? Reach out to