Agent-Friendly Knowledge Base
This proposal addresses the fundamental mismatch between traditional knowledge management systems (Confluence, SharePoint, Google Docs) and the requirements of AI-driven development workflows. Wikis built for human readers become AI-hostile environments: they fragment context across pages and formats, and they impede autonomous agent operations.
Key Insight: Knowledge systems designed for human browsing actively obstruct AI agents. The shift to agent-friendly infrastructure makes knowledge machine-readable while remaining human-editable.
Problem Statement
Current State: Human-Centric Knowledge Silos
Evidence from Team Feedback
| Pain Point | Impact on AI Workflow |
|---|---|
| Proprietary formats | AI agents cannot read/write directly; requires API translation |
| No Git integration | Spec changes decoupled from code changes |
| Rich text bloat | Copy-paste introduces formatting artifacts, breaks parsing |
| Attachment-heavy | Images/files cannot be indexed by AI agents |
| Permission complexity | AI agents struggle with access token management |
| Search limitations | Full-text search insufficient for semantic queries |
The Fundamental Mismatch
Traditional wikis optimize for:
- Visual presentation
- Point-and-click navigation
- Real-time collaboration
- Rich media embedding
AI agents require:
- Plain text parsing
- Programmatic access
- Version control integration
- Structured metadata
Proposed Solution: Git-Based Markdown Knowledge System
Target Architecture
Core Principles
- Markdown First: All content in plain Markdown with optional extensions (Mermaid, frontmatter)
- Git Native: Version control is the source of truth, not a sync target
- Structured Metadata: Every document includes machine-readable frontmatter
- Flat Hierarchy: Minimize nesting; prefer tags and cross-references
- Agent-Addressable: Every document has a stable, predictable path
Knowledge Base Architecture
Directory Structure
knowledge-base/
├── CLAUDE.md                    # AI agent instructions for this repo
├── INDEX.md                     # Human-readable table of contents
├── .kb/
│   ├── schema.json              # Frontmatter schema definition
│   ├── glossary.json            # Term definitions for AI context
│   └── relationships.json       # Document relationship graph
│
├── products/
│   ├── vsaas/
│   │   ├── _product.md          # Product overview
│   │   ├── requirements/
│   │   │   ├── device-management.md
│   │   │   ├── video-playback.md
│   │   │   └── analytics.md
│   │   └── specs/
│   │       ├── api-v2.md
│   │       └── data-models.md
│   │
│   └── vortex/
│       ├── _product.md
│       └── ...
│
├── domains/
│   ├── device-identity.md       # Cross-product domain concepts
│   ├── authentication.md
│   └── video-streaming.md
│
├── glossary/
│   ├── terms.md                 # Ubiquitous language definitions
│   └── acronyms.md
│
├── architecture/
│   ├── decisions/               # ADRs (Architecture Decision Records)
│   │   ├── 001-use-grpc.md
│   │   └── 002-event-sourcing.md
│   └── patterns/
│       ├── repository-pattern.md
│       └── clean-architecture.md
│
└── guides/
    ├── onboarding.md
    └── contribution.md
Frontmatter Schema
Every document includes structured metadata:
---
id: req-device-management-001
title: Device Management Requirements
type: requirement
product: vsaas
domain: device-identity
status: approved
version: 2.1.0
created: 2025-01-15
updated: 2025-11-28
authors:
  - alice@company.com
reviewers:
  - bob@company.com
tags:
  - device
  - management
  - crud
related:
  - spec-api-v2
  - domain-device-identity
ai_summary: |
  Core requirements for device lifecycle management including
  registration, configuration, monitoring, and decommissioning.
---
# Device Management Requirements
...content...
Key Metadata Fields
| Field | Purpose | AI Usage |
|---|---|---|
| id | Unique identifier | Cross-referencing, stable linking |
| type | Document classification | Filtering, context loading |
| product | Product ownership | Scoping agent context |
| domain | Business domain | Semantic grouping |
| status | Lifecycle state | Filtering active vs archived |
| tags | Flexible categorization | Discovery, search |
| related | Explicit relationships | Context expansion |
| ai_summary | Pre-computed summary | Fast context loading |
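As a minimal sketch of how an agent might use these fields together, the following walks the `related` links breadth-first, filters by `status`, and collects each `ai_summary` instead of the full document. The `INDEX` dictionary, the `expand_context` function, and the two shorter summaries are hypothetical illustrations, not part of the proposal; a real index would be built from the frontmatter on disk.

```python
from collections import deque

# Hypothetical in-memory index: document id -> frontmatter fields.
# Only the first summary paraphrases the proposal; the others are sample data.
INDEX = {
    "req-device-management-001": {
        "type": "requirement",
        "status": "approved",
        "related": ["spec-api-v2", "domain-device-identity"],
        "ai_summary": "Core requirements for device lifecycle management.",
    },
    "spec-api-v2": {
        "type": "specification",
        "status": "approved",
        "related": ["domain-device-identity"],
        "ai_summary": "API v2 specification (sample summary).",
    },
    "domain-device-identity": {
        "type": "domain",
        "status": "archived",
        "related": [],
        "ai_summary": "Device identity concepts (sample summary).",
    },
}


def expand_context(index, start_id, max_depth=1, statuses=("approved",)):
    """Breadth-first walk over `related`, returning (id, ai_summary) pairs.

    Documents whose status is not in `statuses` are skipped in the output,
    but their ids are still marked as seen so they are not revisited.
    """
    seen = {start_id}
    queue = deque([(start_id, 0)])
    result = []
    while queue:
        doc_id, depth = queue.popleft()
        meta = index.get(doc_id)
        if meta is None:
            continue  # dangling reference: the id is not in the index
        if meta.get("status") in statuses:
            result.append((doc_id, meta.get("ai_summary", "")))
        if depth < max_depth:
            for related_id in meta.get("related", []):
                if related_id not in seen:
                    seen.add(related_id)
                    queue.append((related_id, depth + 1))
    return result
```

Calling `expand_context(INDEX, "req-device-management-001")` returns the requirement and the API spec, and leaves out the archived domain document. Loading summaries first keeps the agent's context small; full documents can be fetched only for the ids that turn out to matter.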
Agent Integration Patterns
1. MCP Resource Provider
Expose knowledge base as MCP resources:
// MCP server for knowledge base (illustrative shape rather than a specific SDK's API).
// `documents`, `readDocument`, `searchDocuments`, and `findRelatedDocuments`
// are placeholders for the repository's own loading and indexing code.
const knowledgeBaseServer = {
  resources: {
    list: async () => {
      // Return all documents with metadata
      return documents.map(doc => ({
        uri: `kb://products/${doc.product}/${doc.id}`,
        name: doc.title,
        mimeType: 'text/markdown',
        metadata: doc.frontmatter
      }))
    },
    read: async (uri: string) => {
      // Return document content
      return { content: await readDocument(uri) }
    }
  },
  tools: {
    search: async ({ query, filters }) => {
      // Semantic search across knowledge base
      return searchDocuments(query, filters)
    },
    getRelated: async ({ documentId }) => {
      // Return related documents
      return findRelatedDocuments(documentId)
    }
  }
}
2. CLAUDE.md Integration
## Knowledge Base
### Accessing Requirements
- **Location**: `knowledge-base/products/{product}/requirements/`
- **Format**: Markdown with YAML frontmatter
- **Search**: Use MCP tool `kb_search` for semantic queries
### Before Implementing Features
1. Search knowledge base for existing requirements: `kb_search("feature name")`
2. Check related specifications: `kb_getRelated("requirement-id")`
3. Review domain concepts: `knowledge-base/domains/`
### Updating Documentation
When implementing features, update related knowledge base documents:
1. Add implementation notes to requirement files
2. Update status if requirement is fulfilled
3. Create new specs for API changes
3. Context7 Registry
Register knowledge base for AI documentation lookup:
{
  "name": "company-knowledge-base",
  "description": "Product requirements, specifications, and architecture decisions",
  "source": "github.com/company/knowledge-base",
  "topics": [
    "requirements",
    "specifications",
    "architecture",
    "domain-concepts"
  ]
}
Migration Strategy
Phase 1: Parallel Run (Months 1-3)
Activities:
- [ ] Set up Git knowledge base repository
- [ ] Define frontmatter schema
- [ ] Create CLAUDE.md with AI instructions
- [ ] Write all new specifications in knowledge base
- [ ] Add deprecation notices to Confluence
Success Criteria:
- 100% new documents in Git KB
- AI agents can read all new specs
Phase 2: Active Migration (Months 3-6)
Activities:
- [ ] Identify active Confluence pages (accessed in last 6 months)
- [ ] Convert to Markdown format
- [ ] Add structured frontmatter
- [ ] Create redirect mapping
- [ ] Update cross-references
Success Criteria:
- 80% active content migrated
- All product requirements in Git KB
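The 80% target depends on how "active" is defined. A minimal sketch, assuming a page counts as active when it was last accessed within roughly six months (183 days) and that access dates can be exported per page. The function names and the sample page titles are hypothetical.

```python
from datetime import date, timedelta


def active_pages(last_access, today, window_days=183):
    """Return the titles of pages accessed within the window (about 6 months)."""
    cutoff = today - timedelta(days=window_days)
    return {page for page, accessed in last_access.items() if accessed >= cutoff}


def migration_coverage(active, migrated):
    """Fraction of active pages that already exist in the Git KB."""
    if not active:
        return 1.0  # nothing active is left to migrate
    return len(active & migrated) / len(active)


# Hypothetical export: page title -> last access date.
last_access = {
    "Device Management": date(2025, 11, 1),
    "Video Playback": date(2025, 9, 15),
    "Old Roadmap": date(2024, 12, 1),
}
active = active_pages(last_access, today=date(2025, 11, 28))
coverage = migration_coverage(active, migrated={"Device Management"})
```

Here two of the three pages are active and one of them is migrated, so `coverage` is `0.5`, below the Phase 2 threshold.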
Phase 3: Archive & Complete (Months 6-9)
Activities:
- [ ] Export remaining content as static archive
- [ ] Disable Confluence editing
- [ ] Implement MCP resource provider
- [ ] Complete Context7 registration
- [ ] Train teams on new workflow
Success Criteria:
- Confluence read-only archive
- AI agents have full knowledge base access
- Teams exclusively use Git KB for new content
Confluence to Markdown Conversion
Automated Conversion Pipeline
#!/bin/bash
# confluence-to-markdown.sh
# Assumes `confluence-export`, `add-frontmatter.py`, and `validate-links.py`
# are available on PATH, and that `pandoc` is installed.
set -euo pipefail  # stop on the first failed step
# Export from Confluence API
confluence-export --space "PROJ" --format html --output ./export/
# Convert HTML to Markdown
for file in ./export/*.html; do
  pandoc "$file" -f html -t gfm -o "${file%.html}.md"
done
# Add frontmatter
for file in ./export/*.md; do
  add-frontmatter.py "$file" --template ./templates/frontmatter.yaml
done
# Validate and fix links
validate-links.py ./export/ --fix
Manual Review Checklist
For each migrated document:
- [ ] Frontmatter complete and accurate
- [ ] Internal links converted to relative paths
- [ ] Images extracted and referenced correctly
- [ ] Tables render properly in Markdown
- [ ] Code blocks have language tags
- [ ] Mermaid diagrams converted or recreated
- [ ] Related documents cross-referenced
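Two of these checks are mechanical and could run before a human review. The sketch below flags code fences without a language tag and links that still point at the old wiki. The function name `review_markdown` and the host `confluence.example.com` are assumptions for illustration; the remaining checklist items still need a person.

```python
import re

# An opening or closing fence: three or more backticks or tildes, then an optional tag.
FENCE = re.compile(r"^(`{3,}|~{3,})(.*)$")
# The target of an inline Markdown link: [text](target)
LINK = re.compile(r"\[[^\]]*\]\(([^)\s]+)")


def review_markdown(text, wiki_host="confluence.example.com"):
    """Return (line_number, message) pairs for mechanically detectable issues."""
    problems = []
    in_block = False
    for number, line in enumerate(text.splitlines(), start=1):
        match = FENCE.match(line.strip())
        if match:
            # Only an opening fence needs a language tag.
            if not in_block and not match.group(2).strip():
                problems.append((number, "code block has no language tag"))
            in_block = not in_block
            continue
        if in_block:
            continue  # do not inspect links inside code
        for target in LINK.findall(line):
            if wiki_host in target:
                problems.append((number, f"link still points at wiki: {target}"))
    return problems
```

The function returns an empty list for a clean document, so it can be wired into the same review step as the link validator.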
Comparison: Confluence vs Git Knowledge Base
| Aspect | Confluence | Git Knowledge Base |
|---|---|---|
| AI Read Access | API required, rate limited | Direct file read |
| AI Write Access | Complex API, permissions | Git commit |
| Version History | Built-in but separate | Git log, blame |
| Branching | Not supported | Native |
| Code Proximity | Completely separate | Same repo or linked |
| Offline Access | Limited | Full |
| Search | Full-text only | Semantic + structured |
| Collaboration | Real-time | PR-based review |
| Cost | Per-user licensing | Free (Git hosting) |
| Lock-in | High (proprietary) | None (plain text) |
Platform Options
Recommended: Raw Markdown in Git
For maximum AI agent compatibility:
Pros:
+ Full control, no vendor lock-in
+ AI agents read/write directly
+ Same workflow as code
+ No sync issues
Cons:
- No real-time collaboration
- Less polished visual editing experience
- Requires comfort with Git
Alternative: Git-Synced Platforms
| Platform | Git Sync | Best For |
|---|---|---|
| Outline | Native GitHub/GitLab sync | Teams wanting wiki UX |
| GitBook | Bi-directional sync | Public documentation |
| Docusaurus | Native (docs in repo) | Developer docs |
| VitePress | Native (docs in repo) | Technical documentation |
CLAUDE.md Template for Knowledge Base
# Knowledge Base - AI Instructions
## Repository Purpose
This repository contains product requirements, specifications,
architecture decisions, and domain knowledge for [Company] products.
## Document Types
- `requirement` - Product requirements and user stories
- `specification` - Technical specifications and API docs
- `adr` - Architecture Decision Records
- `domain` - Domain concept definitions
- `guide` - How-to guides and tutorials
## Finding Information
1. Check `INDEX.md` for table of contents
2. Search by product: `products/{product-name}/`
3. Search by domain: `domains/{domain-name}.md`
4. Use frontmatter tags for filtering
## Before Creating New Documents
1. Search for existing content on the topic
2. Check if updating existing doc is appropriate
3. Use correct template from `templates/`
4. Add complete frontmatter metadata
## Document Relationships
- Use `related` frontmatter field for explicit links
- Reference documents by ID: `[See Device Management](req-device-management-001)`
- Check `.kb/relationships.json` for dependency graph
## Updating Documents
When updating specifications:
1. Increment version in frontmatter
2. Update `updated` date
3. Add yourself to `authors` if significant change
4. Update related documents if needed
Success Metrics
| Metric | Target | How to Measure |
|---|---|---|
| AI context loading time | <2s | Measure MCP resource fetch |
| Spec-to-code correlation | >80% | Track commits with spec refs |
| Document freshness | <90 days avg | Monitor updated dates |
| Cross-reference accuracy | 100% | Validate related links |
| Migration completion | 100% active | Audit Confluence usage |
| Agent query success rate | >90% | Track MCP tool results |
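Two of these metrics can be computed directly from frontmatter. A minimal sketch, assuming the metadata has already been loaded into a dictionary keyed by document `id`, with `updated` parsed to a `date`. The function names and the sample `updated` date for `spec-api-v2` are hypothetical.

```python
from datetime import date


def average_age_days(docs, today):
    """Document freshness: mean number of days since each `updated` date."""
    ages = [(today - doc["updated"]).days for doc in docs.values()]
    return sum(ages) / len(ages) if ages else 0.0


def cross_reference_accuracy(docs):
    """Share of `related` entries that resolve to a known document id."""
    links = [target for doc in docs.values() for target in doc.get("related", [])]
    if not links:
        return 1.0  # no links means nothing can be broken
    valid = sum(1 for target in links if target in docs)
    return valid / len(links)


# Sample metadata; `domain-device-identity` is deliberately missing.
docs = {
    "req-device-management-001": {
        "updated": date(2025, 11, 28),
        "related": ["spec-api-v2", "domain-device-identity"],
    },
    "spec-api-v2": {"updated": date(2025, 10, 29), "related": []},
}
freshness = average_age_days(docs, today=date(2025, 11, 28))
accuracy = cross_reference_accuracy(docs)
```

With this sample, `freshness` is `15.0` days and `accuracy` is `0.5`, because one of the two `related` entries does not resolve. Running the same functions in CI would turn the 100% cross-reference target into a failing check rather than a periodic audit.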
Anti-Patterns to Avoid
1. Recreating Wiki Features
Problem: Adding rich editing, real-time collaboration, complex permissions.
Why It Fails: Diverges from Git simplicity, creates sync issues, reduces AI compatibility.
Instead: Accept PR-based collaboration as the model.
2. Deep Nesting
Problem: Creating 5+ level folder hierarchies.
Why It Fails: Hard to navigate, paths become unwieldy, search becomes necessary anyway.
Instead: Flat structure with tags and cross-references.
3. Ignoring Frontmatter
Problem: Writing Markdown without structured metadata.
Why It Fails: Loses machine-readability, prevents filtering, breaks relationships.
Instead: Enforce frontmatter via pre-commit hooks.
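A pre-commit check along these lines could be sketched as follows. It assumes frontmatter is delimited by `---` lines (the usual YAML frontmatter convention) and checks only for top-level keys; the `REQUIRED` tuple is an assumed subset of the schema, and a real hook would validate against `.kb/schema.json` instead.

```python
import re

# Assumed minimum set of fields; the real list belongs in .kb/schema.json.
REQUIRED = ("id", "title", "type", "status")
# A top-level YAML key: starts at column 0 and is followed by a colon.
KEY = re.compile(r"^([A-Za-z_][\w-]*):")


def missing_fields(text, required=REQUIRED):
    """Return the required frontmatter fields that are absent from `text`."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return list(required)  # no frontmatter block at all
    present = set()
    for line in lines[1:]:
        if line.strip() == "---":
            break  # closing delimiter found
        match = KEY.match(line)
        if match:
            present.add(match.group(1))
    else:
        return list(required)  # frontmatter was never closed
    return [name for name in required if name not in present]


def main(paths):
    """Check each file; return 1 if any is incomplete, so the commit is blocked."""
    failed = False
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            missing = missing_fields(handle.read())
        if missing:
            failed = True
            print(f"{path}: missing frontmatter fields: {', '.join(missing)}")
    return 1 if failed else 0

# As a hook entry point, a script would call: sys.exit(main(sys.argv[1:]))
```

Pre-commit tools typically pass the staged file paths as arguments, so `main` takes a list of paths and reports every incomplete file before failing.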
4. Partial Migration
Problem: Running Confluence and the Git KB in parallel indefinitely.
Why It Fails: Split knowledge, confusion about source of truth, double maintenance.
Instead: Set firm migration deadline, archive Confluence.
Related Principles
- G1: Single Source of Truth - Knowledge base as canonical source
- G2: Version-Controlled Documentation - Git-based documentation
- C1: Context Engineering Competency - Maintaining clean AI context
- C3: AI First - AI-native infrastructure
References
- Model Context Protocol (MCP) - Anthropic's protocol for AI-tool integration
- Context7 - AI documentation lookup service
- Claude Code Documentation - Official documentation for CLAUDE.md AI guidance files
- Docs as Code - Write the Docs guide to documentation in version control