Open Source Python MIT

Knowlyr Crew

Structured Dialectical Deliberation Engine

★ 2 ⑂ 0 Updated 2026-03-01
Structured dialectical deliberation engine — declarative employee definitions, MCP-native protocol, continuous experience accumulation. Supports 9 structured interaction modes, Ebbinghaus exponential decay memory, RLHF-style evaluation loops, and cost-aware orchestration.
Dialectical Deliberation Persistent Memory Evaluation Loop

Quick Start

Install
pip install knowlyr-crew[mcp]
Usage
# CLI
knowlyr-crew list
knowlyr-crew run code-reviewer main
list_employees 列出所有可用的数字员工
get_employee 获取数字员工的完整定义
run_employee 加载数字员工并生成可执行的 prompt
get_work_log 查看数字员工的工作日志
detect_project 检测当前项目类型、框架、包管理器等信息
list_pipelines 列出所有可用的流水线
run_pipeline 执行流水线 — 支持 prompt-only 模式和 execute 模式(自动调用 LLM 串联执行)
list_discussions 列出所有可用的讨论会
run_discussion 生成讨论会 prompt — 支持预定义 YAML 或即席讨论(employees+topic)
add_memory 为员工添加一条持久化记忆
query_memory 查询员工的持久化记忆
track_decision 记录一个待评估的决策(来自会议或日常工作)
evaluate_decision 评估一个决策 — 记录实际结果并将经验写入员工记忆
list_meeting_history 查看讨论会历史记录
get_meeting_detail 获取某次讨论会的完整记录
list_tool_schemas 列出所有可用的工具定义(名称和描述)
get_permission_matrix 查看员工权限矩阵 — 每位员工的有效工具集和权限策略
get_audit_log 查询工具调用审计日志
get_tool_metrics 查询 MCP Tool 使用率统计 — 调用次数、成功/失败、平均耗时等(支持持久化历史数据)
query_events 查询统一埋点事件 — 支持按 event_type / event_name / 时间范围过滤

Documentation

English | 中文

knowlyr-crew

Structured Dialectical Deliberation Engine
for AI Workforces

Declarative AI workforce engine — structured dialectical deliberation, protocol-native interoperability, evolving through experience

GitHub · Full Documentation · PyPI · knowlyr.com

Abstract

The primary failure modes of multi-agent collaboration systems are threefold: groupthink (Janis, 1972), shared information bias (Stasser & Titus, 1985), and framework lock-in. knowlyr-crew proposes a declarative multi-agent deliberation framework that breaks information sampling bias through structured dialectical protocols, achieves cognitive accumulation and natural attrition through exponentially decaying persistent memory (inspired by the Ebbinghaus forgetting curve), and eliminates toolchain coupling through protocol-native MCP integration.

The system implements a self-correcting closed loop of "define -> deliberate -> decide -> evaluate -> update memory", feeding human feedback directly into agents' persistent memory -- functionally isomorphic to the core mechanism of RLHF (Christiano et al., 2017): human evaluation outcomes shape subsequent inference behavior.

knowlyr-crew formalizes AI workforce capabilities as declarative specifications (YAML + Markdown), implements structured dialectical deliberation with 9 interaction modes and devil's advocacy constraints, and provides persistent semantic memory with exponential confidence decay. The system exposes 20 MCP tools across 3 transport protocols, routes across 7 LLM providers, and maintains a complete evaluation-to-memory feedback loop.

Problem Statement

The failure mechanisms of multi-agent collaboration have a solid empirical foundation in cognitive psychology and organizational decision-making research:

Root Problem Research Basis Limitations of Existing Frameworks Crew's Approach
Groupthink Groups under pressure converge toward consensus and suppress dissent (Janis, 1972); even incorrect minority opinions improve majority decision quality (Nemeth, 1994) CrewAI / AutoGen lack mandatory dissent mechanisms -- agents "supplement" rather than "challenge" each other Structured dialectical deliberation: 9 interaction modes + disagreement quota $\rho_{max}$ + tension seed injection
Shared Information Bias In group discussions, commonly known information is exchanged at significantly higher rates than individually held information (Stasser & Titus, 1985); task-focused cognitive conflict improves decision quality (Amason, 1996) Unstructured multi-agent conversations reinforce known information, drowning out individual perspectives Role-based participants + focus constraints + must_challenge forcing cross-perspective exchange
Stateless Inference Each session starts from scratch; the same cognitive errors recur $\forall t: s_t \perp s_{t-1}$ LangChain memory is a sliding-window buffer, not semantically structured persistent storage Exponentially decaying persistent memory + evaluation loop: decision -> execution -> retrospective -> correction -> evolution
Framework Lock-in Agent definitions are bound to specific SDKs/IDEs; migration cost $\propto$ definition complexity Each framework uses its own incompatible format -- switching IDEs renders definitions useless Protocol-native MCP: declarative YAML/Markdown, zero-modification cross-IDE portability

Crew is not yet another orchestration framework. It is the capability definition layer and experience accumulation layer for AI digital employees -- "who does what, how they deliberate, and what they've learned" -- while delegating identity management and runtime interactions to knowlyr-id.

Formal Framework

Employee Specification

Each AI employee is a declarative specification $e \in \mathcal{E}$, decoupled from code, version-trackable, and IDE-agnostic:

$$e = \langle \text{name}, \text{model}, \text{tools}, \text{prompt}, \text{args}, \text{output} \rangle$$

Where:

  • $\text{model} \in \mathcal{M}$ = {claude-*, gpt-*, deepseek-*, kimi-*, gemini-*, glm-*, qwen-*} -- unified routing across 7 providers
  • $\text{tools} \subseteq \mathcal{T}$ -- available tool set, constrained by PermissionPolicy
  • $\text{prompt}: \Sigma^* \to \Sigma^*$ -- Markdown template function supporting variable substitution and context injection

Structured Dialectical Deliberation

The deliberation process is formalized as a 4-tuple $D = \langle P, R, \Phi, \Psi \rangle$:

Symbol Definition Description
$P = {p_1, \ldots, p_n}$ Participant set $p_i = (\text{employee}, \text{role}, \text{stance}, \text{focus})$
$R = [r_1, \ldots, r_k]$ Round sequence $r_j \in$ {round-robin, cross-examine, steelman-then-attack, debate, vote, ...}
$\Phi$ Disagreement constraint function $\text{must_challenge}(p_i) \subseteq P \setminus {p_i}$; $\text{max_agree_ratio}(p_i) \in [0, 1]$
$\Psi$ Tension seed set Pre-seeded points of contention, forcing diversity in the issue space

Key constraint: When $\Phi$ defines $\text{max_agree_ratio}(p_i) = \rho$, participant $p_i$ may not agree with others' views more than proportion $\rho$ throughout the entire deliberation, forcing cognitive conflict rather than groupthink. This corresponds to the Devil's Advocacy method in organizational decision-making research (Schwenk, 1990).

Memory Evolution Model

The effective confidence of each memory entry $m$ decays over time, following the exponential model of the Ebbinghaus forgetting curve:

$$C_{\text{eff}}(t) = C_0 \cdot \left(\frac{1}{2}\right)^{t / \tau}$$

Where $C_0$ is the initial confidence (default 1.0), $t$ is memory age (in days), and $\tau$ is the half-life (default 90 days). Entries are ranked by $C_{\text{eff}}$ at retrieval time; those below threshold $C_{\min}$ are automatically pruned.

Semantic retrieval uses a hybrid vector-keyword scoring function:

$$\text{score}(q, m) = \alpha \cdot \cos(\mathbf{v}_q, \mathbf{v}_m) + (1 - \alpha) \cdot \text{keyword}(q, m), \quad \alpha = 0.7$$

Correction chains implement cognitive self-correction, a computational model of memory reconsolidation: $\text{correct}(m_{\text{old}}, m_{\text{new}})$ marks $m_{\text{old}}$ as superseded ($C \leftarrow 0$) and creates a new correction-type entry ($C \leftarrow 1.0$).

Evaluation Feedback Loop

Drawing on the core mechanism of RLHF -- human feedback directly shaping agent behavior (Christiano et al., 2017):

track(employee, category, prediction) -> Decision d
    |
    v  Execute + observe actual outcome
evaluate(d, outcome, evaluation) -> MemoryEntry m_correction
    |
    v  m_correction is automatically injected into the employee's subsequent inference context
employee.next_inference(context ∪ {m_correction})

Three decision categories: estimate / recommendation / commitment. Evaluation conclusions are automatically written as correction entries into persistent memory, forming a closed loop of decision -> execution -> retrospective -> improvement.

Architecture

graph LR
    E["Employee Spec<br/>(YAML + Markdown)"] -->|Prompts| S["MCP Server<br/>stdio / SSE / HTTP"]
    E -->|Resources| S
    E -->|Tools| S
    S -->|stdio| IDE["AI IDE<br/>(Claude / Cursor)"]
    S -->|SSE / HTTP| Remote["Remote Client<br/>Webhook / API"]
    IDE -->|agent-id| ID["knowlyr-id<br/>Identity Runtime"]
    ID -->|GET prompt| E
    E -->|sync push| ID

    style E fill:#1a1a2e,color:#e0e0e0,stroke:#444
    style S fill:#0969da,color:#fff,stroke:#0969da
    style IDE fill:#2da44e,color:#fff,stroke:#2da44e
    style Remote fill:#8b5cf6,color:#fff,stroke:#8b5cf6
    style ID fill:#e5534b,color:#fff,stroke:#e5534b

Layered Architecture

Layer Module Responsibility
Specification Parser · Discovery · Models Declarative employee definition parsing; YAML/Markdown dual format; priority-based discovery
Protocol MCP Server · Skill Converter 20 Tools + Prompts + Resources; stdio/SSE/HTTP triple-protocol support
Deliberation Discussion Engine 9 structured interaction modes; cognitive conflict constraints; topologically sorted execution plans
Orchestration Pipeline · Route · Task Registry Parallel/sequential/conditional/loop orchestration; checkpoint recovery; multi-model routing
Memory Memory Store · Semantic Index Semantic search; exponential decay; importance ranking; access tracking; cross-employee pattern sharing; multi-backend embedding fallback
Evaluation Evaluation Engine Decision tracking; retrospective evaluation; automatic memory correction
Execution Providers · Output Sanitizer · Cost Tracker Unified invocation across 7 providers; retry/fallback/per-task cost metering; dual-layer output sanitization (source + egress)
Integration ID Client · Webhook · Cron Identity federation (circuit breaker); GitHub event routing; scheduled tasks (patrol/retrospective/KPI/knowledge digest); trigger-based auto-delegation
Observability Trajectory · Metrics · Audit Zero-intrusion trajectory recording (contextvars); permission matrix queries; tool invocation audit logs; post-deployment CI audit; Feishu alerting on audit failure
CLI cli/ modular package (8 submodules) employee · pipeline · route · discuss · memory · server · ops; lazy command registration

MCP Primitive Mapping

MCP Primitive Purpose Count
Prompts Each employee = one callable prompt template with typed parameters 1 per employee
Resources Raw Markdown definitions, directly readable by AI IDEs 1 per employee
Tools Employee/discussion/pipeline/memory/evaluation/permission/audit/metrics/project detection, etc. 20
Full list of 20 MCP Tools
Tool Description
list_employees List all employees (filterable by tag)
get_employee Get complete employee definition
run_employee Generate an executable prompt
get_work_log View employee work logs
detect_project Detect project type, framework, and package manager
list_pipelines List all pipelines
run_pipeline Execute a pipeline
list_discussions List all discussion meetings
run_discussion Generate a discussion meeting prompt
add_memory Add persistent memory for an employee (supports pattern type)
query_memory Query an employee's persistent memory
track_decision Record a decision pending evaluation
evaluate_decision Evaluate a decision and write lessons learned to employee memory
list_meeting_history View discussion meeting history
get_meeting_detail Get complete meeting transcript
list_tool_schemas List all available tool definitions (filterable by role)
get_permission_matrix View employee permission matrix and policies
get_audit_log Query tool invocation audit logs
get_tool_metrics Query tool invocation statistics
query_events Query system event stream

Transport Protocols

knowlyr-crew mcp                                # stdio (default, local IDE)
knowlyr-crew mcp -t sse --port 9000             # SSE (remote connection)
knowlyr-crew mcp -t http --port 9001            # Streamable HTTP
knowlyr-crew mcp -t sse --api-token SECRET      # Enable Bearer authentication

Key Innovations

1. Structured Dialectical Deliberation

The central challenge of multi-agent collaboration lies in maintaining epistemic diversity. Stasser & Titus (1985) demonstrated experimentally that in unstructured group discussions, commonly shared information is discussed at significantly higher rates than individually held information, causing optimal decisions to be systematically overlooked. Nemeth (1994) further found that even incorrect minority opinions, when persistently expressed, improve majority decision quality -- because they force the majority to more carefully examine their own assumptions.

Crew implements 9 structured interaction modes, each imposing distinct argumentative constraints on participants:

Mode Description Mechanism
round-robin Round-robin speaking Equal expression rights, preventing discourse imbalance
challenge Challenge Each participant must raise evidence-based challenges to at least one other's conclusions
response Response & defense Structured responses; vague evasion prohibited; must explicitly accept/partially accept/rebut
cross-examine Cross-examination Three-dimensional deep examination: factual challenge / logical derivation / alternative proposals
steelman-then-attack Steelman then attack First construct the strongest form of the opposing argument (steel-manning), then attack its residual weaknesses
debate Structured debate Adversarial pro/con format requiring citation of specific facts and data
brainstorm Brainstorm Suspend judgment, maximize creative space
vote Vote Force explicit stance + brief rationale
free Free discussion Open-ended exchange without structural constraints

Dialectical Constraints -- a computational implementation of Schwenk's (1990) Devil's Advocacy methodology:

  • stance -- Pre-assigned position, forcing participants to argue from a specific perspective
  • must_challenge -- Must challenge designated participants, counteracting shared information bias
  • max_agree_ratio -- Disagreement quota $\rho_{max} \in [0, 1]$, quantitatively controlling cognitive conflict density
  • tension_seeds -- Controversy seed injection, ensuring the issue space covers critical tension dimensions
  • min_disagreements -- Minimum number of disagreements per round, quantifying deliberation output

Discussion -> Execution bridging: Setting action_output: true automatically generates a structured ActionPlan JSON, which is converted via pipeline_from_action_plan() into an executable Pipeline through dependency-aware topological sorting.

Discussion YAML example
name: architecture-review
topic: Review $target design
goal: Produce improvement decisions
mode: auto
participants:
  - employee: product-manager
    role: moderator
    focus: 需求完整性
    stance: 偏用户体验
  - employee: code-reviewer
    role: speaker
    focus: 安全性
    must_challenge: [product-manager]
    max_agree_ratio: 0.6
tension_seeds:
  - 安全性 vs 开发效率
rounds:
  - name: 各抒己见
    interaction: round-robin
  - name: 交叉盘问
    interaction: cross-examine
    require_direct_reply: true
    min_disagreements: 2
  - name: Decision
    interaction: vote
output_format: decision
# Pre-defined discussion
knowlyr-crew discuss run architecture-review --arg target=auth.py

# Ad-hoc discussion (no YAML required)
knowlyr-crew discuss adhoc -e "code-reviewer,test-engineer" -t "auth 模块质量"

# Orchestrated mode: each participant reasons independently
knowlyr-crew discuss run architecture-review --orchestrated

2. Persistent Memory with Exponential Decay

Ebbinghaus (1885) demonstrated that memory strength decays exponentially over time, and that spaced repetition effectively counteracts forgetting. Crew incorporates this cognitive science principle into the knowledge persistence mechanism of agent systems:

Five memory categories:

Category Description Example
decision Decision record "Chose JWT over Session-based approach"
estimate Estimation record "CSS refactoring estimated at 2 days"
finding Discovery record "main.css has 2,057 lines, exceeding maintainability threshold"
correction Correction record "CSS refactoring actually took 5 days; cross-module dependencies were underestimated"
pattern Work pattern "API changes must be accompanied by SDK documentation updates" (automatically shared across employees)

Embedding fallback chain (Graceful Degradation):

OpenAI text-embedding-3-small -> Gemini text-embedding-004 -> TF-IDF (zero-dependency fallback)

When any upstream provider is unavailable, the system automatically degrades to the next tier, ensuring semantic search remains functional even in environments without API keys.

Importance & access tracking: Each memory entry carries an importance weight (1-5) and a last_accessed timestamp. Queries support importance-based sorting and minimum importance filtering; API calls automatically update access timestamps.

Cross-employee work patterns (pattern): Reusable work patterns distilled from individual experience, automatically marked as shared (shared: true), with configurable trigger conditions (trigger_condition) and applicability scope (applicability). Other employees automatically acquire these patterns in matching scenarios.

Correction chains correspond to the reconsolidation mechanism in memory science: $\text{correct}(m_{\text{old}}, m_{\text{new}})$ does not delete the old memory but instead zeros its confidence and creates a new entry with a provenance link, preserving the cognitive evolution trajectory.

Self-check learning loop: Through the shared template _templates/selfcheck.md, employees automatically output a self-check checklist at the end of each task. The system extracts self-check results from the output, writes them as correction memories, and automatically injects them on the next execution -- forming a continuous learning loop of execution -> self-check -> memory -> improvement.

Auto-memory (auto_memory: true): After task execution, employees automatically save a summary to persistent memory (category=finding) without manual invocation.

knowlyr-crew memory add code-reviewer finding "main.css 有 2057 行,超出维护阈值"
knowlyr-crew memory show code-reviewer
knowlyr-crew memory correct code-reviewer <old_id> "CSS 拆分实际花了 5 天"

Storage: .crew/memory/{employee}.jsonl (memories) + .crew/memory/embeddings.db (vector index, SQLite WAL)

3. Evaluation Feedback Loop

Track decision quality, and after retrospective evaluation, automatically write lessons learned into employee memory -- functionally isomorphic to the core mechanism of RLHF (Christiano et al., 2017): human preference feedback directly influences subsequent model behavior; here, human evaluation results directly influence subsequent inference context:

graph LR
    D["track()<br/>记录决策"] --> E["执行"]
    E --> O["观察<br/>实际结果"]
    O --> V["evaluate()<br/>回溯评估"]
    V --> M["correction<br/>写入记忆"]
    M --> I["下次推理<br/>自动注入"]
    I --> D

    style D fill:#0969da,color:#fff,stroke:#0969da
    style V fill:#8b5cf6,color:#fff,stroke:#8b5cf6
    style M fill:#2da44e,color:#fff,stroke:#2da44e

Three decision categories: estimate / recommendation / commitment. Evaluation conclusions are automatically written as correction entries into the employee's persistent memory and automatically injected during subsequent inference -- the agent updates its cognition from its own decision errors.

# Record a decision
knowlyr-crew eval track pm estimate "CSS 拆分需要 2 天"

# Evaluate (conclusions automatically written to memory)
knowlyr-crew eval run <id> "实际花了 5 天" \
  --evaluation "低估了跨模块依赖的复杂度,未来 ×2.5"

4. Declarative Employee Specification

By analogy with Infrastructure as Code (Morris, 2016) -- Terraform uses declarative HCL to define infrastructure, Kubernetes uses YAML to define desired service state -- Crew uses declarative specifications to define the capability boundaries of AI employees. Configuration is separated from prompts, version-trackable, and IDE-agnostic:

Directory format (recommended):

security-auditor/
├── employee.yaml    # Metadata, parameters, tools, output format
├── prompt.md        # Role definition + core instructions
├── workflows/       # Scenario-specific workflows
│   ├── scan.md
│   └── report.md
└── adaptors/        # Project-type adaptors (python / nodejs / ...)
    └── python.md
# employee.yaml
name: security-auditor
display_name: Security Auditor
character_name: Alex Morgan
version: "1.0"
model: claude-opus-4-6
tags: [security, audit]
triggers: [audit, sec]
tools: [file_read, bash, grep]
context: [pyproject.toml, src/]
auto_memory: true                    # Automatically save task summary to persistent memory
kpi:                                 # KPI metrics (auto-evaluated in weekly KPI report)
  - OWASP 覆盖率
  - 建议可操作性
  - 零误报率
args:
  - name: target
    description: 审计目标
    required: true
  - name: severity
    description: 最低严重等级
    default: medium
output:
  format: markdown
  filename: "audit-{date}.md"

Single-file format: Suitable for simple employees -- YAML frontmatter + Markdown body.

Discovery & priority:

Priority Location Description
Highest private/employees/ Custom employees within the repository
Medium .claude/skills/ Claude Code Skills compatibility layer
Low Built-in package Default employees

Smart context (--smart-context): Automatically detects project type (Python / Node.js / Go / Rust / Java), framework, package manager, and test framework, injecting adaptation information into the prompt.

Built-in employees
Employee Trigger Purpose
product-manager pm Requirements analysis, user stories, roadmaps
code-reviewer review Code review: quality, security, performance
test-engineer test Write or supplement unit tests
refactor-guide refactor Code structure analysis, refactoring recommendations
doc-writer doc Documentation generation (README / API / CHANGELOG)
pr-creator pr Analyze changes, create Pull Requests
Prompt variable substitution
Variable Description
$target, $severity Named parameter values
$1, $2 Positional parameters
{date}, {datetime} Current date/time
{cwd}, {git_branch} Working directory / Git branch
{project_type}, {framework} Project type / Framework
{test_framework}, {package_manager} Test framework / Package manager

5. Pipeline Orchestration

Multi-employee DAG (Directed Acyclic Graph) orchestration with four step types:

Step Type Description
Sequential Serial execution; {prev} references previous step output
Parallel Group asyncio.gather concurrent execution with 600s timeout
Conditional contains / matches / equals conditional branching
Loop Loop execution with state passing

Multi-Provider Routing:

Provider Model Prefix Example
Anthropic claude- claude-opus-4-6, claude-sonnet-4-5
OpenAI gpt-, o1-, o3- gpt-4o, o3-mini
DeepSeek deepseek- deepseek-chat, deepseek-reasoner
Moonshot kimi-, moonshot- kimi-k2.5
Google gemini- gemini-2.5-pro
Zhipu glm- glm-4-plus
Alibaba qwen- qwen-max

Automatic routing to the corresponding provider API by model name prefix, with primary model + fallback support.

Feature Description
Output passing {prev} (previous step), {steps.<id>.output} (reference by ID)
Checkpoint recovery Resume from last completed step after mid-run failure (pipeline checkpoint resume)
Fallback Automatic switch to fallback model after primary model retries are exhausted
Mermaid visualization Automatic flowchart generation from pipeline definitions
# Generate per-step prompts
knowlyr-crew pipeline run review-test-pr --arg target=main

# Execute mode: automatically invoke LLMs for chained execution
knowlyr-crew pipeline run full-review --execute --model claude-opus-4-6

6. Organization Governance & Adaptive Authority

Declarative organization structure defining team groupings, authority levels, and collaboration routing templates -- enabling delegation decisions to be evidence-based rather than reliant on AI guesswork. The permission system features adaptive degradation capability:

# private/organization.yaml
teams:
  engineering:
    label: 工程组
    members: [code-reviewer, test-engineer, backend-engineer]
  data:
    label: 数据组
    members: [data-engineer, dba, mlops-engineer]

authority:
  A:
    label: 自主执行
    members: [code-reviewer, test-engineer, doc-writer]
  B:
    label: 需确认
    members: [product-manager, solutions-architect]
  C:
    label: 看场景
    members: [backend-engineer, devops-engineer]

routing_templates:
  code_change:
    steps:
      - role: implement
        team: engineering
      - role: review
        employee: code-reviewer
      - role: test
        employees: [test-engineer, e2e-tester]
Feature Description
Three-tier authority A (autonomous execution) / B (requires confirmation) / C (context-dependent); delegation lists are automatically annotated
Auto-degradation 3 consecutive task failures -> authority downgrades from A/B to C, persisted to JSON
Routing templates The route tool expands templates into delegate_chain, supporting multi-workflow lines, CI step annotations, human judgment nodes, and repository bindings
KPI metrics Each employee declares KPI metrics; weekly report cron auto-evaluates and generates A/B/C/D ratings
Manual recovery One-click API to restore degraded authority

7. Cost-Aware Orchestration

Built-in model pricing tables (7 providers) with per-task cost calculation, supporting aggregation by employee / model / time period for ROI per Decision analysis:

Feature Description
Per-task metering Each execution automatically records input/output tokens + cost_usd
Quality pre-scoring Parses {"score": N} JSON at the end of output, associating it with task results
Multi-dimensional aggregation Aggregate by employee / model / time period / trigger source
A/B testing Primary model + fallback model; compare the Pareto frontier of cost vs. quality
# MCP / Agent tools
query_cost(days=7)
query_cost(days=30, employee="code-reviewer")

# HTTP API
curl /api/cost/summary?days=7

8. Output Sanitization -- Defense in Depth

Raw LLM output may contain internal reasoning tags (<thinking>, <reflection>, <inner_monologue>) and tool invocation XML blocks -- these are the model's "working drafts" and should not be exposed to end users. The Output Sanitizer implements defense in depth:

Defense Layer Location Responsibility
Source sanitization webhook_executor LLM return values are sanitized before entering business logic
Egress sanitization webhook_handlers · webhook_feishu Secondary sanitization before messages are sent to users/callbacks

Sanitization rules cover 5 tag pattern categories (regex matching + content removal), handling nested tags and multiline residual whitespace. When either layer misses something, the other provides a safety net -- drawing on the defense-in-depth principle from cybersecurity (Schneier, 2000).

9. Zero-Intrusion Trajectory Recording

Zero-intrusion trajectory recording via contextvars.ContextVar -- no modification to any business code required; automatically captures agent reasoning, tool invocations, execution results, and token consumption:

Crew produces trajectories -> agentrecorder standard format -> knowlyr-gym PRM scoring -> SFT / DPO / GRPO training

This is the data bridge connecting Crew (collaboration layer) and knowlyr-gym (training layer) -- real interaction trajectories produced during Crew runtime can be directly used for agent reinforcement learning training.

Quick Start

pip install knowlyr-crew[mcp]

# 1. View all available employees
knowlyr-crew list

# 2. Run a code review (auto-detects project type)
knowlyr-crew run review main --smart-context

# 3. Initiate a multi-employee structured discussion
knowlyr-crew discuss adhoc -e "code-reviewer,test-engineer" -t "auth 模块安全性"

# 4. Track and evaluate decisions
knowlyr-crew eval track pm estimate "重构需要 3 天"
# ... after execution ...
knowlyr-crew eval run <id> "实际花了 7 天" --evaluation "低估跨模块依赖"

# 5. View employee memory (including evaluation corrections)
knowlyr-crew memory show product-manager

MCP configuration (Claude Desktop / Claude Code / Cursor):

{
  "mcpServers": {
    "crew": {
      "command": "knowlyr-crew",
      "args": ["mcp"]
    }
  }
}

Once configured, the AI IDE can directly invoke code-reviewer for code review, test-engineer for writing tests, run_pipeline for chaining multi-employee pipelines, and run_discussion for initiating multi-employee discussions.

Async Delegation & Meeting Orchestration

AI employees can delegate in parallel to multiple colleagues for task execution, or organize multi-person meetings for asynchronous deliberation:

User -> 姜墨言: "Have code-reviewer review the PR and test-engineer write tests simultaneously"

姜墨言:
  ① delegate_async -> code-reviewer (task_id: 20260216-143022-a3f5b8c2)
  ② delegate_async -> test-engineer (task_id: 20260216-143022-b7d4e9f1)
  ③ "Both tasks are now executing in parallel"
  ④ check_task -> check progress/results
Tool Description
delegate_async Asynchronous delegation; returns task_id immediately
delegate_chain Sequential chain delegation; {prev} references previous step output
check_task / list_tasks Query task status and results
organize_meeting Multi-employee async discussion; each round runs asyncio.gather for parallel inference
schedule_task / list_schedules Dynamic cron scheduled tasks
run_pipeline Trigger a pre-defined pipeline (async execution)
agent_file_read / agent_file_grep Path-safe file operations
query_data Fine-grained business data queries
find_free_time Feishu availability query; find common free time across multiple people

Proactive patrol & self-driven operations: Scheduled tasks configured via .crew/cron.yaml:

Schedule Description
Daily 9:00 Morning patrol -- business data, to-dos, calendar, system status -> Feishu briefing
Daily 23:00 AI diary -- personal diary based on the day's work and memories
Thursday 16:00 Team knowledge digest -- cross-team work output + common issues + best practices -> Feishu document
Friday 17:00 KPI weekly report -- per-employee rating + anomaly auto-delegation (D-grade -> HR follow-up, consecutive improvement items -> team attention)
Friday 18:00 Weekly retrospective -- highlights, issues, and recommendations for next week

Production Server

Crew can run as an HTTP server, receiving external events and automatically triggering pipeline / employee execution:

pip install knowlyr-crew[webhook]
knowlyr-crew serve --port 8765 --token YOUR_SECRET

API Endpoints

Path Method Description
/health GET Health check (no authentication required)
/webhook/github POST GitHub webhook (HMAC-SHA256 signature verification)
/webhook/openclaw POST OpenClaw message events
/run/pipeline/{name} POST Trigger pipeline (async/sync/SSE streaming)
/run/employee/{name} POST Trigger employee (supports SSE streaming)
/api/employees/{id}/prompt GET Employee capability definition (includes team, authority, 7-day cost)
/api/employees/{id}/state GET Runtime state (personality, memory, notes)
/api/employees/{id} PUT Update configuration (model/temperature/max_tokens)
/api/employees/{id}/authority/restore POST Restore auto-degraded authority
/api/cost/summary GET Cost summary
/api/project/status GET Project status overview
/api/memory/ingest POST Import external discussion data into employee memory
/tasks/{task_id} GET Query task status and results
/metrics GET Invocation/latency/token/error statistics
/cron/status GET Cron scheduler status
Production features
Feature Description
Bearer authentication --api-token, timing-safe comparison
CORS --cors-origin, multi-origin support
Rate limiting 60 requests/minute/IP
Request size limit Default 1MB
Circuit breaker knowlyr-id pauses for 30 seconds after 3 consecutive failures
Cost tracking Per-task token metering + model pricing
Auto-degradation Consecutive failures automatically lower employee authority
CI audit Post-deployment automatic permission audit script; Feishu alert on failure
Trace ID Unique trace_id per task
Concurrency safety fcntl.flock file locks + SQLite WAL
Task persistence .crew/tasks.jsonl, recoverable after restart
Periodic heartbeat Heartbeat to knowlyr-id every 60 seconds
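
The circuit-breaker row above (3 consecutive failures, 30-second pause) can be sketched as a small state machine. This is a minimal illustration of the general technique under those two numbers, not Crew's actual implementation; the class name, method names, and the injectable `clock` parameter are assumptions.

```python
import time


class CircuitBreaker:
    """Open after `max_failures` consecutive failures; retry after `cooldown` seconds."""

    def __init__(self, max_failures=3, cooldown=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.cooldown = cooldown
        self.clock = clock          # injectable so the behaviour can be tested without sleeping
        self.failures = 0
        self.opened_at = None       # None means the circuit is closed (calls allowed)

    def allow(self) -> bool:
        """Return True if a call may be attempted right now."""
        if self.opened_at is None:
            return True
        if self.clock() - self.opened_at >= self.cooldown:
            # Cooldown elapsed: close the circuit and start counting again.
            self.opened_at = None
            self.failures = 0
            return True
        return False

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = self.clock()


# Demo with a fake clock (hypothetical values).
now = [0.0]
breaker = CircuitBreaker(clock=lambda: now[0])
breaker.record_failure()
breaker.record_failure()
allowed_after_two = breaker.allow()     # still closed
breaker.record_failure()
allowed_after_three = breaker.allow()   # open: calls are paused
now[0] = 30.0
allowed_after_cooldown = breaker.allow()
```

A caller would check `allow()` before each knowlyr-id request and report the outcome with `record_success()` or `record_failure()`.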

Webhook Configuration

.crew/webhook.yaml defines event routing rules (GitHub HMAC-SHA256 signature verification); .crew/cron.yaml defines scheduled tasks (croniter parsing). The KPI weekly report cron includes built-in anomaly auto-delegation rules -- employees rated D (no output) are automatically escalated to HR, and consecutive self-check improvement items trigger team attention notifications.
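
For readers unfamiliar with the GitHub signature check mentioned above, here is a minimal sketch of HMAC-SHA256 verification. GitHub sends the digest in the `X-Hub-Signature-256` header as `sha256=<hex>`; the function name and the sample secret are assumptions, and this is not taken from Crew's source.

```python
import hashlib
import hmac


def verify_github_signature(secret: str, body: bytes, signature_header: str) -> bool:
    """Check an X-Hub-Signature-256 header against the raw request body."""
    digest = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    expected = f"sha256={digest}"
    # compare_digest runs in constant time, which avoids leaking how many
    # leading characters matched. Bytes are used so any header content is accepted.
    return hmac.compare_digest(expected.encode("utf-8"), signature_header.encode("utf-8"))


# Demo with a hypothetical secret and payload.
secret = "YOUR_SECRET"
body = b'{"action": "opened"}'
good_header = "sha256=" + hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
ok = verify_github_signature(secret, body, good_header)
tampered = verify_github_signature(secret, body + b" ", good_header)
```

The signature must be computed over the raw bytes of the request body, before any JSON parsing or re-serialization.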

Integrations

knowlyr-id -- Identity & Runtime Federation

Crew defines "who does what"; knowlyr-id manages identity, conversations, and runtime. The two collaborate but can each be used independently:

┌──────────────────────────────────────┐
│        Crew (Capability Authority)    │
│  prompt · model · tools · avatar     │
│  temperature · bio · tags            │
└──────────────┬───────────────────────┘
     API fetch prompt │ sync push all fields
┌──────────────┴───────────────────────┐
│      knowlyr-id (Identity + Runtime)  │
│  user accounts · conversations ·     │
│  memory · heartbeat · scheduling ·   │
│  messaging · API keys · work logs    │
└──────────────────────────────────────┘

knowlyr-id fetches employee prompt / model / temperature / team / authority / cost via CREW_API_URL (5-minute cache), falling back to DB cache when unavailable. The connection is optional -- Crew runs independently when not configured. The admin dashboard displays each employee's authority badge, team membership, and 7-day cost in real time, and supports one-click restoration of auto-degraded authority.
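
The fetch-with-cache behaviour described above (5-minute cache, fall back to the last stored copy when Crew is unreachable) follows a common "serve stale on error" pattern. The sketch below illustrates that pattern only; the class name, the in-memory dictionary standing in for the DB cache, and the fake upstream are all assumptions rather than knowlyr-id code.

```python
import time


class CachedFetcher:
    """TTL cache that serves the last known value when a refresh fails."""

    def __init__(self, fetch, ttl=300.0, clock=time.monotonic):
        self._fetch = fetch      # callable: key -> value, may raise
        self._ttl = ttl          # 300 s corresponds to the 5-minute cache
        self._clock = clock
        self._store = {}         # key -> (fetched_at, value); stands in for the DB cache

    def get(self, key):
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]                  # fresh hit
        try:
            value = self._fetch(key)
        except Exception:
            if entry is not None:
                return entry[1]              # upstream down: serve the stale copy
            raise                            # nothing cached: surface the error
        self._store[key] = (now, value)
        return value


# Demo with a fake clock and a fake upstream (both hypothetical).
now = [0.0]
upstream = {"up": True, "calls": 0}


def fake_fetch(name):
    if not upstream["up"]:
        raise ConnectionError("Crew API unreachable")
    upstream["calls"] += 1
    return {"employee": name, "version": upstream["calls"]}


cache = CachedFetcher(fake_fetch, clock=lambda: now[0])
first = cache.get("code-reviewer")      # fetched from upstream
now[0] = 299.0
second = cache.get("code-reviewer")     # still fresh, no second fetch
now[0] = 301.0
upstream["up"] = False
third = cache.get("code-reviewer")      # expired and upstream down: stale value
```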

Employee status sync (agent_status): Crew maintains a three-state lifecycle -- active (normal operation) / frozen (configuration preserved, execution skipped) / inactive (deactivated). Status changes are synced to knowlyr-id in both directions; frozen employees are automatically skipped during pipeline execution.
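
The three-state lifecycle can be modelled as an enum plus a filter applied before pipeline execution. This is an illustrative sketch: the status values come from the text above, while the function name, the step dictionary shape, the `doc-writer` employee, and the choice to also exclude inactive or unknown employees are assumptions.

```python
from enum import Enum


class AgentStatus(str, Enum):
    ACTIVE = "active"
    FROZEN = "frozen"
    INACTIVE = "inactive"


def runnable_steps(steps, statuses):
    """Return the pipeline steps whose employee is currently active.

    Assumption: employees that are inactive or missing from `statuses`
    are excluded as well, not only frozen ones.
    """
    return [
        step for step in steps
        if statuses.get(step["employee"], AgentStatus.INACTIVE) is AgentStatus.ACTIVE
    ]


steps = [
    {"employee": "code-reviewer"},
    {"employee": "security-auditor"},
    {"employee": "doc-writer"},          # hypothetical employee name
]
statuses = {
    "code-reviewer": AgentStatus.ACTIVE,
    "security-auditor": AgentStatus.FROZEN,
    "doc-writer": AgentStatus.INACTIVE,
}
kept = runnable_steps(steps, statuses)
```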

Field mapping
Crew Employee knowlyr-id Direction
name crew_name push ->
character_name nickname push ->
display_name title push ->
bio bio push ->
description capabilities push ->
tags domains push ->
rendered prompt system_prompt push ->
avatar.webp avatar_base64 push ->
model model push ->
temperature temperature <->
max_tokens max_tokens push ->
memory-id.md memory <- pull

# One-click deployment (rsync -> restart -> sync knowlyr-id)
make push
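
To make the push direction of the field mapping concrete, the sketch below renames Crew employee fields to their knowlyr-id counterparts. The key pairs are copied from the table; the function name and the dictionary-based employee record are assumptions. The rendered prompt and avatar rows are left out because they need rendering and base64 encoding first, and `memory` is omitted because it is pulled rather than pushed.

```python
# Crew field -> knowlyr-id field, push direction only (from the mapping table).
PUSH_FIELDS = {
    "name": "crew_name",
    "character_name": "nickname",
    "display_name": "title",
    "bio": "bio",
    "description": "capabilities",
    "tags": "domains",
    "model": "model",
    "temperature": "temperature",
    "max_tokens": "max_tokens",
}


def to_knowlyr_id_payload(employee: dict) -> dict:
    """Rename the fields that are present; skip anything not in the mapping."""
    return {dst: employee[src] for src, dst in PUSH_FIELDS.items() if src in employee}


# Hypothetical employee record.
employee = {
    "name": "code-reviewer",
    "display_name": "Code Reviewer",
    "tags": ["review", "quality"],
    "temperature": 0.2,
    "internal_note": "not part of the mapping",
}
payload = to_knowlyr_id_payload(employee)
```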

Claude Code Skills Interoperability

Crew employees and Claude Code native Skills are bidirectionally convertible: tools <-> allowed-tools, args <-> argument-hint, metadata round-trips via HTML comments.

knowlyr-crew export code-reviewer    # -> .claude/skills/code-reviewer/SKILL.md
knowlyr-crew sync --clean            # Sync + clean orphaned directories
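
The round trip described above can be sketched as a pair of functions: one writes a SKILL.md-style file with `allowed-tools` and `argument-hint` in the front matter and extra metadata in an HTML comment, the other reads it back. This illustrates the idea only. The `crew-meta` comment marker, the dictionary layout, and the hand-rolled front-matter parsing are assumptions, and the actual file produced by `knowlyr-crew export` may differ.

```python
import json
import re

# Hypothetical marker for metadata carried in an HTML comment.
META_RE = re.compile(r"<!--\s*crew-meta:\s*(.*?)\s*-->", re.DOTALL)


def employee_to_skill(employee: dict) -> str:
    """Render an employee dict as SKILL.md-style text."""
    front = [
        "---",
        f"name: {employee['name']}",
        f"description: {employee['description']}",
        f"allowed-tools: {', '.join(employee['tools'])}",
        f"argument-hint: {' '.join('<' + a + '>' for a in employee['args'])}",
        "---",
    ]
    meta = json.dumps(employee.get("metadata", {}), sort_keys=True)
    return "\n".join(front) + f"\n<!-- crew-meta: {meta} -->\n\n{employee['body']}\n"


def skill_to_employee(text: str) -> dict:
    """Parse text produced by employee_to_skill back into a dict."""
    _, front, rest = text.split("---\n", 2)
    fields = {}
    for line in front.splitlines():
        key, _, value = line.partition(": ")
        fields[key] = value
    match = META_RE.search(rest)
    return {
        "name": fields["name"],
        "description": fields["description"],
        "tools": [t for t in fields["allowed-tools"].split(", ") if t],
        "args": re.findall(r"<([^>]+)>", fields["argument-hint"]),
        "metadata": json.loads(match.group(1)) if match else {},
        "body": META_RE.sub("", rest, count=1).strip(),
    }


# Hypothetical employee definition.
employee = {
    "name": "code-reviewer",
    "description": "Reviews code changes",
    "tools": ["Read", "Grep"],
    "args": ["target"],
    "metadata": {"version": "1.0"},
    "body": "Review the target branch.",
}
skill_text = employee_to_skill(employee)
restored = skill_to_employee(skill_text)
```

Carrying metadata in an HTML comment keeps it invisible when the Markdown is rendered while still surviving the conversion in both directions. A production version would need to escape any `-->` sequence inside the metadata.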

Avatar Generation

Tongyi Wanxiang (DashScope) generates realistic professional portrait avatars, 768x768 -> 256x256 webp:

pip install knowlyr-crew[avatar]
knowlyr-crew avatar security-auditor

CLI Reference

Complete CLI command list

Core

knowlyr-crew list [--tag TAG] [--layer LAYER] [-f json]  # List employees
knowlyr-crew show <name>                                  # View details
knowlyr-crew run <name> [ARGS] [--smart-context] [--agent-id ID] [--copy] [-o FILE]
knowlyr-crew init [--employee NAME] [--dir-format] [--avatar]
knowlyr-crew validate <path>
knowlyr-crew check --json                                 # Quality radar

Discussions

knowlyr-crew discuss list
knowlyr-crew discuss run <name> [--orchestrated] [--arg key=val]
knowlyr-crew discuss adhoc -e "employee1,employee2" -t "topic"
knowlyr-crew discuss history [-n 20]
knowlyr-crew discuss view <meeting_id>

Memory

knowlyr-crew memory list
knowlyr-crew memory show <employee> [--category ...]
knowlyr-crew memory add <employee> <category> <text>
knowlyr-crew memory correct <employee> <old_id> <text>

Evaluation

knowlyr-crew eval track <employee> <category> <text>
knowlyr-crew eval list [--status pending]
knowlyr-crew eval run <decision_id> <outcome> [--evaluation TEXT]
knowlyr-crew eval prompt <decision_id>

Pipeline

knowlyr-crew pipeline list
knowlyr-crew pipeline run <name> [--execute] [--model MODEL] [--arg key=val]
knowlyr-crew pipeline checkpoint list
knowlyr-crew pipeline checkpoint resume <task_id>

Route

knowlyr-crew route list [-f json]                                  # List collaboration routing templates
knowlyr-crew route show <name>                                     # View route details
knowlyr-crew route run <name> <task> [--execute] [--remote]        # Execute collaboration route

Server & MCP

knowlyr-crew serve --port 8765 --token SECRET [--no-cron] [--cors-origin URL]
knowlyr-crew mcp [-t stdio|sse|http] [--port PORT] [--api-token TOKEN]

Agent Management

knowlyr-crew register <name> [--dry-run]
knowlyr-crew agents list
knowlyr-crew agents status <id>
knowlyr-crew agents sync <name>
knowlyr-crew agents sync-all [--push-only|--pull-only] [--force] [--dry-run]

Templates & Export

knowlyr-crew template list
knowlyr-crew template apply <template> --employee <name> [--var key=val]
knowlyr-crew export <name>                                # -> SKILL.md
knowlyr-crew export-all
knowlyr-crew sync [--clean]                               # -> .claude/skills/

Other

knowlyr-crew avatar <name>                                # Avatar generation
knowlyr-crew log list [--employee NAME] [-n 20]           # Work logs
knowlyr-crew log show <session_id>

Ecosystem

Architecture Diagram
graph LR
    Radar["Radar<br/>Discovery"] --> Recipe["Recipe<br/>Analysis"]
    Recipe --> Synth["Synth<br/>Generation"]
    Recipe --> Label["Label<br/>Annotation"]
    Synth --> Check["Check<br/>Quality"]
    Label --> Check
    Check --> Audit["Audit<br/>Model Audit"]
    Crew["Crew<br/>Deliberation Engine"]
    Agent["Agent<br/>RL Framework"]
    ID["ID<br/>Identity Runtime"]
    Crew -.->|capability definitions| ID
    ID -.->|identity + memory| Crew
    Crew -.->|trajectories + rewards| Agent
    Agent -.->|optimized policy| Crew
    Ledger["Ledger<br/>Accounting"]
    Crew -.->|AI employee accounts| Ledger
    Ledger -.->|光粒 settlement| Crew

    style Crew fill:#0969da,color:#fff,stroke:#0969da
    style Ledger fill:#d29922,color:#fff,stroke:#d29922
    style ID fill:#2da44e,color:#fff,stroke:#2da44e
    style Agent fill:#8b5cf6,color:#fff,stroke:#8b5cf6
    style Radar fill:#1a1a2e,color:#e0e0e0,stroke:#444
    style Recipe fill:#1a1a2e,color:#e0e0e0,stroke:#444
    style Synth fill:#1a1a2e,color:#e0e0e0,stroke:#444
    style Label fill:#1a1a2e,color:#e0e0e0,stroke:#444
    style Check fill:#1a1a2e,color:#e0e0e0,stroke:#444
    style Audit fill:#1a1a2e,color:#e0e0e0,stroke:#444

Layer Project Description Repo
Discovery AI Dataset Radar Dataset competitive intelligence and trend analysis GitHub
Analysis DataRecipe Reverse engineering, schema extraction, cost estimation GitHub
Production DataSynth / DataLabel LLM batch synthesis / lightweight annotation GitHub · GitHub
Quality DataCheck Rule validation, deduplication, distribution analysis GitHub
Audit ModelAudit Distillation detection, model fingerprinting GitHub
Identity knowlyr-id Identity system + AI employee runtime GitHub
Accounting knowlyr-ledger Unified ledger · double-entry bookkeeping · row-lock safety · idempotent transactions GitHub
Deliberation Crew Structured dialectical deliberation · persistent memory accumulation · MCP-native You are here
Agent Training knowlyr-gym Gymnasium-style RL framework · process reward model · SFT/DPO/GRPO GitHub

References

  • Model Context Protocol (MCP) -- Anthropic, 2024. Open standard protocol for agent-tool interaction
  • Multi-Agent Systems -- Wooldridge, M., 2009. An Introduction to MultiAgent Systems. Wiley
  • Groupthink -- Janis, I.L., 1972. Victims of Groupthink. Houghton Mifflin
  • Shared Information Bias -- Stasser, G. & Titus, W., 1985. Pooling of Unshared Information in Group Decision Making. JPSP, 48(6)
  • Minority Influence -- Nemeth, C.J., 1994. The Value of Minority Dissent. In S. Moscovici et al. (Eds.), Minority Influence. Nelson-Hall
  • Devil's Advocacy -- Schwenk, C.R., 1990. Effects of devil's advocacy and dialectical inquiry on decision making. Organizational Behavior and Human Decision Processes, 47(1)
  • Cognitive Conflict -- Amason, A.C., 1996. Distinguishing the Effects of Functional and Dysfunctional Conflict. Academy of Management Journal, 39(1)
  • RLHF -- Christiano, P. et al., 2017. Deep RL from Human Preferences. arXiv:1706.03741
  • Ebbinghaus Forgetting Curve -- Ebbinghaus, H., 1885. Über das Gedächtnis -- The inspiration for the memory decay model
  • Defense in Depth -- Schneier, B., 2000. Secrets and Lies: Digital Security in a Networked World. Wiley -- Source of the multi-layer defense principle
  • Infrastructure as Code -- Morris, K., 2016. Infrastructure as Code. O'Reilly -- Paradigm source for declarative specifications
  • Gymnasium -- Towers et al., 2024. Gymnasium: A Standard Interface for RL Environments. arXiv:2407.17032
