Quick Start

```shell
pip install knowlyr-modelaudit
```

```python
from modelaudit import AuditEngine

engine = AuditEngine()
results = engine.detect(["Hello! I'd be happy to help..."])
```
MCP Tools

- `detect_text_source`: Detect the source of a text, i.e. infer which LLM likely generated it
- `verify_model`: Verify model identity by checking whether the model behind an API is what it claims to be
- `compare_models`: Compare the fingerprint similarity of two models to assess a possible distillation or derivation relationship
- `compare_models_whitebox`: White-box comparison of two local models using the REEF CKA method on hidden-state similarity (requires model weights)
- `audit_memorization`: Detect whether a model has memorized training data, assessed via prefix completion and verbatim checks
- `audit_report`: Generate a complete model audit report aggregating results from all audit tools
- `audit_watermark`: Detect AI watermarks in text (statistical features and pattern matching)
- `audit_distillation`: Full distillation audit combining fingerprint comparison with style analysis to produce a detailed report
ModelAudit
LLM Distillation Detection and Model Fingerprinting via Statistical Forensics
Detect unauthorized model distillation through behavioral probing,
stylistic fingerprinting, and representation similarity analysis.
Statistical Forensics · Behavioral Signatures · Cross-Model Lineage Inference
The Problem
Large language model distillation has become a core threat to model IP protection. Student models can replicate teacher model capabilities by mimicking output distributions -- without authorization. Existing detection methods either require white-box weight access (often unavailable) or only analyze surface-level text features (easily evaded).
The Solution
ModelAudit is a multi-method distillation detection framework based on statistical forensics. It extracts model fingerprints through behavioral probing, applies hypothesis testing to determine distillation relationships, and combines four complementary methods to form a complete black-box to white-box audit chain.
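As an illustration of the statistical core, DLI-style lineage inference compares two models' output distributions with Jensen-Shannon divergence: a distilled student tracks its teacher's distribution far more closely than an unrelated model does. A minimal sketch, with made-up probe distributions (not ModelAudit's actual implementation):

```python
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    def kl(a, b):
        return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Next-token distributions from three models on the same probe (illustrative)
teacher = [0.70, 0.20, 0.10]
student = [0.65, 0.25, 0.10]     # tracks the teacher: low divergence
unrelated = [0.10, 0.30, 0.60]   # different behavior: high divergence

print(js_divergence(teacher, student))    # small
print(js_divergence(teacher, unrelated))  # large
```

A hypothesis test then asks whether the observed divergence is lower than expected for independently trained models.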
Four Complementary Detection Methods
| Method | Type | Mechanism |
|---|---|---|
| LLMmap | Black-box | 20 behavioral probes, Pearson correlation on response patterns |
| DLI | Black-box | Behavioral signatures + Jensen-Shannon divergence lineage inference |
| REEF | White-box | CKA layer-wise hidden state similarity |
| StyleAnalysis | Stylistic | 12 model family style signatures + language detection |
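Of these, REEF's white-box comparison rests on centered kernel alignment (CKA), which scores representation similarity while ignoring rotations and rescalings of the hidden space. A minimal linear-CKA sketch, with random matrices standing in for layer activations (an illustration under those assumptions, not ModelAudit's actual implementation):

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between activation matrices (n_samples x n_features).
    Returns ~1.0 when representations match up to an orthogonal transform."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    hsic = np.linalg.norm(Y.T @ X, "fro") ** 2
    return hsic / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro"))

rng = np.random.default_rng(0)
acts = rng.standard_normal((64, 32))      # hidden states for 64 probe tokens
Q = np.linalg.qr(rng.standard_normal((32, 32)))[0]
rotated = acts @ Q                        # same information, different basis
other = rng.standard_normal((64, 32))     # an unrelated model's activations

print(linear_cka(acts, rotated))  # ~1.0: same underlying representation
print(linear_cka(acts, other))    # much lower
```

Because CKA is basis-invariant, it can flag a derived model even when its weights have been fine-tuned or permuted.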
10-Dimensional Behavioral Probing
Go beyond simple text statistics. ModelAudit probes 10 cognitive dimensions -- self-awareness, safety boundaries, injection testing, reasoning, creative writing, multilingual, format control, role-playing, code generation, and summarization -- capturing deep behavioral differences that persist even after RLHF alignment.
Cross-Provider Audit Chain
Audit across providers seamlessly. Teacher and student models can come from different APIs:
```shell
knowlyr-modelaudit audit \
  --teacher claude-opus --teacher-provider anthropic \
  --student kimi-k2.5 --student-provider openai \
  --student-api-base https://api.moonshot.cn/v1 \
  -o report.md
```
Get Started
```shell
pip install knowlyr-modelaudit

# Detect text source
knowlyr-modelaudit detect texts.jsonl

# Verify model identity
knowlyr-modelaudit verify gpt-4o --provider openai

# Full distillation audit
knowlyr-modelaudit audit --teacher gpt-4o --student my-model -o report.md
```

```python
from modelaudit import AuditEngine

engine = AuditEngine()
audit = engine.audit("claude-opus", "suspect-model")
print(f"{audit.verdict} (confidence: {audit.confidence:.3f})")
```
MCP Integration
ModelAudit ships with 8 MCP tools for seamless integration into AI workflows:
detect_text_source · verify_model · compare_models · compare_models_whitebox · audit_distillation · audit_memorization · audit_report · audit_watermark
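For illustration, a generic MCP `tools/call` request targeting one of these tools might look as follows; the request shape follows the standard MCP JSON-RPC convention, but the `texts` argument name is an assumption, not documented API:

```python
import json

# Hypothetical JSON-RPC payload for invoking the detect_text_source MCP tool
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "detect_text_source",
        "arguments": {"texts": ["Hello! I'd be happy to help..."]},  # assumed key
    },
}
print(json.dumps(request, indent=2))
```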
Built-in Benchmark
100% detection accuracy across 6 model families (14 samples). Supports 12 model families: GPT-4 · GPT-3.5 · Claude · LLaMA · Gemini · Qwen · DeepSeek · Mistral · Yi · Phi · Cohere · ChatGLM.
Want to discuss this project? Reach out to