
Agent-Friendly Knowledge Base

This proposal addresses the fundamental mismatch between traditional knowledge management systems (Confluence, SharePoint, Google Docs) and the requirements of AI-driven development workflows. Human-readable wikis become AI-hostile environments that fragment context and impede autonomous agent operations.

Key Insight: Knowledge systems designed for human browsing actively obstruct AI agents. Agent-friendly infrastructure makes knowledge machine-readable while keeping it human-editable.

Problem Statement

Current State: Human-Centric Knowledge Silos

Evidence from Team Feedback

| Pain Point | Impact on AI Workflow |
| --- | --- |
| Proprietary formats | AI agents cannot read/write directly; requires API translation |
| No Git integration | Spec changes decoupled from code changes |
| Rich text bloat | Copy-paste introduces formatting artifacts, breaks parsing |
| Attachment-heavy | Images/files cannot be indexed by AI agents |
| Permission complexity | AI agents struggle with access token management |
| Search limitations | Full-text search insufficient for semantic queries |

The Fundamental Mismatch

Traditional wikis optimize for:

  • Visual presentation
  • Point-and-click navigation
  • Real-time collaboration
  • Rich media embedding

AI agents require:

  • Plain text parsing
  • Programmatic access
  • Version control integration
  • Structured metadata

Proposed Solution: Git-Based Markdown Knowledge System

Target Architecture

Core Principles

  1. Markdown First: All content in plain Markdown with optional extensions (Mermaid, frontmatter)
  2. Git Native: Version control is the source of truth, not a sync target
  3. Structured Metadata: Every document includes machine-readable frontmatter
  4. Flat Hierarchy: Minimize nesting; prefer tags and cross-references
  5. Agent-Addressable: Every document has a stable, predictable path

Knowledge Base Architecture

Directory Structure

knowledge-base/
├── CLAUDE.md                    # AI agent instructions for this repo
├── INDEX.md                     # Human-readable table of contents
├── .kb/
│   ├── schema.json             # Frontmatter schema definition
│   ├── glossary.json           # Term definitions for AI context
│   └── relationships.json      # Document relationship graph

├── products/
│   ├── vsaas/
│   │   ├── _product.md         # Product overview
│   │   ├── requirements/
│   │   │   ├── device-management.md
│   │   │   ├── video-playback.md
│   │   │   └── analytics.md
│   │   └── specs/
│   │       ├── api-v2.md
│   │       └── data-models.md
│   │
│   └── vortex/
│       ├── _product.md
│       └── ...

├── domains/
│   ├── device-identity.md      # Cross-product domain concepts
│   ├── authentication.md
│   └── video-streaming.md

├── glossary/
│   ├── terms.md                # Ubiquitous language definitions
│   └── acronyms.md

├── architecture/
│   ├── decisions/              # ADRs (Architecture Decision Records)
│   │   ├── 001-use-grpc.md
│   │   └── 002-event-sourcing.md
│   └── patterns/
│       ├── repository-pattern.md
│       └── clean-architecture.md

└── guides/
    ├── onboarding.md
    └── contribution.md
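
The tree lists `.kb/relationships.json` but does not specify its format. As a minimal sketch, assuming each document declares the IDs it relates to in a `related` metadata list (as the frontmatter schema in this proposal does), the graph could be derived rather than hand-maintained. The function name and the output shape (ID mapped to a sorted list of neighbor IDs) are assumptions for illustration.

```python
import json


def build_relationships(docs):
    """Map each document ID to the sorted IDs it is linked with, in both directions."""
    graph = {doc["id"]: set() for doc in docs}
    for doc in docs:
        for target in doc.get("related", []):
            if target in graph:  # references to unknown IDs are skipped here
                graph[doc["id"]].add(target)
                graph[target].add(doc["id"])
    return {doc_id: sorted(links) for doc_id, links in graph.items()}


# Hypothetical metadata, already parsed from three documents
docs = [
    {"id": "req-device-management-001",
     "related": ["spec-api-v2", "domain-device-identity"]},
    {"id": "spec-api-v2", "related": []},
    {"id": "domain-device-identity"},
]
relationships = build_relationships(docs)
print(json.dumps(relationships, indent=2))  # candidate content for .kb/relationships.json
```

Making the links bidirectional means an agent that starts from a spec can find the requirements that point at it, even if the spec itself declares no `related` entries.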

Frontmatter Schema

Every document includes structured metadata:

markdown
---
id: req-device-management-001
title: Device Management Requirements
type: requirement
product: vsaas
domain: device-identity
status: approved
version: 2.1.0
created: 2025-01-15
updated: 2025-11-28
authors:
  - alice@company.com
reviewers:
  - bob@company.com
tags:
  - device
  - management
  - crud
related:
  - spec-api-v2
  - domain-device-identity
ai_summary: |
  Core requirements for device lifecycle management including
  registration, configuration, monitoring, and decommissioning.
---

# Device Management Requirements

...content...
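
To illustrate how an agent or script might read this metadata, here is a minimal sketch that splits a document into metadata and body, assuming the frontmatter is delimited by `---` lines. It handles only the YAML subset used in the example (scalars, `- item` lists, and `|` block text); a real pipeline would more likely use a full YAML parser. The function name is hypothetical.

```python
def parse_frontmatter(text: str) -> tuple[dict, str]:
    """Return (metadata, body); metadata is empty if no frontmatter block is found."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    end = next((i for i, line in enumerate(lines[1:], start=1)
                if line.strip() == "---"), None)
    if end is None:
        return {}, text

    meta: dict = {}
    key = None
    for line in lines[1:end]:
        if not line.strip():
            continue
        if line[0].isspace() and key is not None:
            # Indented line: a list item or a continuation of block text
            item = line.strip()
            if isinstance(meta[key], list):
                meta[key].append(item.removeprefix("- ").strip())
            else:
                meta[key] = f"{meta[key]}\n{item}".lstrip("\n")
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if value == "|":
            meta[key] = ""      # block text follows on indented lines
        elif value == "":
            meta[key] = []      # list follows on indented lines
        else:
            meta[key] = value   # plain scalar, kept as a string
    return meta, "\n".join(lines[end + 1:])


sample = """---
id: req-device-management-001
status: approved
tags:
  - device
  - crud
ai_summary: |
  Core requirements for device
  lifecycle management.
---
# Device Management Requirements
"""
meta, body = parse_frontmatter(sample)
print(meta["id"], meta["tags"])
```

All values are kept as strings, so dates and version numbers would still need converting by the caller.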

Key Metadata Fields

| Field | Purpose | AI Usage |
| --- | --- | --- |
| id | Unique identifier | Cross-referencing, stable linking |
| type | Document classification | Filtering, context loading |
| product | Product ownership | Scoping agent context |
| domain | Business domain | Semantic grouping |
| status | Lifecycle state | Filtering active vs archived |
| tags | Flexible categorization | Discovery, search |
| related | Explicit relationships | Context expansion |
| ai_summary | Pre-computed summary | Fast context loading |
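
The "AI Usage" column can be made concrete with a small sketch: given already-parsed metadata, scope the context to one product, keep only active documents, and load the pre-computed summaries instead of full bodies. The function name, the `archived` status value, and the sample records are assumptions for illustration.

```python
def select_context(docs, product, statuses=("approved",), tags=()):
    """Return (id, ai_summary) pairs for documents matching product, status, and all tags."""
    selected = []
    for doc in docs:
        if doc.get("product") != product:
            continue  # scope the agent's context to one product
        if doc.get("status") not in statuses:
            continue  # skip archived or otherwise inactive documents
        if not set(tags) <= set(doc.get("tags", [])):
            continue  # every requested tag must be present
        selected.append((doc["id"], doc.get("ai_summary", "").strip()))
    return selected


# Hypothetical parsed frontmatter records
docs = [
    {"id": "req-device-management-001", "product": "vsaas", "status": "approved",
     "tags": ["device", "crud"], "ai_summary": "Device lifecycle requirements.\n"},
    {"id": "req-old-playback", "product": "vsaas", "status": "archived",
     "tags": ["video"], "ai_summary": "Superseded playback requirements.\n"},
    {"id": "req-vortex-core", "product": "vortex", "status": "approved",
     "tags": ["device"], "ai_summary": "Vortex core requirements.\n"},
]
context = select_context(docs, product="vsaas", tags=["device"])
print(context)
```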

Agent Integration Patterns

1. MCP Resource Provider

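Expose the knowledge base as MCP resources:

typescript
// Sketch of the handlers an MCP server for the knowledge base would expose.
// The object shape is illustrative, not a specific MCP SDK API.
interface KbDocument {
  id: string
  title: string
  product: string
  frontmatter: Record<string, unknown>
}

// Helpers assumed to be implemented elsewhere, e.g. on top of the Git checkout
declare const documents: KbDocument[]
declare function readDocument(uri: string): Promise<string>
declare function searchDocuments(query: string, filters?: Record<string, string>): Promise<KbDocument[]>
declare function findRelatedDocuments(documentId: string): Promise<KbDocument[]>

const knowledgeBaseServer = {
  resources: {
    list: async () => {
      // Return all documents with metadata
      return documents.map(doc => ({
        uri: `kb://products/${doc.product}/${doc.id}`,
        name: doc.title,
        mimeType: 'text/markdown',
        metadata: doc.frontmatter
      }))
    },
    read: async (uri: string) => {
      // Return document content
      return { content: await readDocument(uri) }
    }
  },
  tools: {
    // Tool names match the `kb_search` / `kb_getRelated` calls used in CLAUDE.md
    kb_search: async ({ query, filters }: { query: string; filters?: Record<string, string> }) => {
      // Semantic search across knowledge base
      return searchDocuments(query, filters)
    },
    kb_getRelated: async ({ documentId }: { documentId: string }) => {
      // Return related documents
      return findRelatedDocuments(documentId)
    }
  }
}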
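
The server above delegates to `findRelatedDocuments` without showing it. One plausible implementation, sketched here in Python under the assumption that relationships are available as an ID-to-neighbors mapping, is a breadth-first walk with a hop limit so that context expansion stays bounded. The function name, the `max_depth` parameter, and the sample graph are hypothetical.

```python
from collections import deque


def find_related(graph, start, max_depth=2):
    """Return IDs reachable from `start` within `max_depth` hops, nearest first."""
    seen = {start}
    order = []
    queue = deque([(start, 0)])
    while queue:
        doc_id, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(doc_id, []):
            if neighbor not in seen:  # guards against cycles in the graph
                seen.add(neighbor)
                order.append(neighbor)
                queue.append((neighbor, depth + 1))
    return order


# Hypothetical relationship graph
graph = {
    "req-device-management-001": ["spec-api-v2"],
    "spec-api-v2": ["domain-device-identity", "req-device-management-001"],
    "domain-device-identity": ["adr-001-use-grpc"],
    "adr-001-use-grpc": [],
}
print(find_related(graph, "req-device-management-001", max_depth=2))
```

A hop limit matters because every extra level of related documents adds to the agent's context; one or two hops is usually where relevance drops off, though the right value is a tuning decision.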

2. CLAUDE.md Integration

markdown
## Knowledge Base

### Accessing Requirements
- **Location**: `knowledge-base/products/{product}/requirements/`
- **Format**: Markdown with YAML frontmatter
- **Search**: Use MCP tool `kb_search` for semantic queries

### Before Implementing Features
1. Search knowledge base for existing requirements: `kb_search("feature name")`
2. Check related specifications: `kb_getRelated("requirement-id")`
3. Review domain concepts: `knowledge-base/domains/`

### Updating Documentation
When implementing features, update related knowledge base documents:
1. Add implementation notes to requirement files
2. Update status if requirement is fulfilled
3. Create new specs for API changes

3. Context7 Registry

Register the knowledge base for AI documentation lookup:

json
{
  "name": "company-knowledge-base",
  "description": "Product requirements, specifications, and architecture decisions",
  "source": "github.com/company/knowledge-base",
  "topics": [
    "requirements",
    "specifications",
    "architecture",
    "domain-concepts"
  ]
}

Migration Strategy

Phase 1: Parallel Run (Months 1-3)

Activities:

  • [ ] Set up Git knowledge base repository
  • [ ] Define frontmatter schema
  • [ ] Create CLAUDE.md with AI instructions
  • [ ] Write all new specifications in knowledge base
  • [ ] Add deprecation notices to Confluence

Success Criteria:

  • 100% new documents in Git KB
  • AI agents can read all new specs

Phase 2: Active Migration (Months 4-6)

Activities:

  • [ ] Identify active Confluence pages (accessed in last 6 months)
  • [ ] Convert to Markdown format
  • [ ] Add structured frontmatter
  • [ ] Create redirect mapping
  • [ ] Update cross-references

Success Criteria:

  • 80% active content migrated
  • All product requirements in Git KB

Phase 3: Archive & Complete (Months 7-9)

Activities:

  • [ ] Export remaining content as static archive
  • [ ] Disable Confluence editing
  • [ ] Implement MCP resource provider
  • [ ] Complete Context7 registration
  • [ ] Train teams on new workflow

Success Criteria:

  • Confluence read-only archive
  • AI agents have full knowledge base access
  • Teams exclusively use Git KB for new content

Confluence to Markdown Conversion

Automated Conversion Pipeline

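bash
#!/usr/bin/env bash
# confluence-to-markdown.sh
# Note: confluence-export, add-frontmatter.py, and validate-links.py are
# assumed to be in-house tools available on PATH, not published CLIs.
set -euo pipefail   # stop on the first failed step
shopt -s nullglob   # skip the loops cleanly if the export produced no files

# Export from Confluence API
confluence-export --space "PROJ" --format html --output ./export/

# Convert HTML to Markdown
for file in ./export/*.html; do
  pandoc "$file" -f html -t gfm -o "${file%.html}.md"
done

# Add frontmatter
for file in ./export/*.md; do
  add-frontmatter.py "$file" --template ./templates/frontmatter.yaml
done

# Validate and fix links
validate-links.py ./export/ --fix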
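
The pipeline calls `add-frontmatter.py` without describing what it does. As a rough sketch of what such a step might look like (not the actual script), the function below prepends a minimal metadata block to converted Markdown, taking the title from the first heading. The field set, the `draft` status, and the function signature are assumptions.

```python
import json
import re
from datetime import date


def add_frontmatter(markdown, doc_id, doc_type="requirement", today=None):
    """Prepend minimal frontmatter unless the document already starts with a block."""
    if markdown.startswith("---\n"):
        return markdown
    heading = re.search(r"^#\s+(.+)$", markdown, flags=re.MULTILINE)
    title = heading.group(1).strip() if heading else doc_id
    stamp = (today or date.today()).isoformat()
    header = "\n".join([
        "---",
        f"id: {doc_id}",
        f"title: {json.dumps(title)}",  # quoted so titles containing ':' stay valid YAML
        f"type: {doc_type}",
        "status: draft",                # assumed default; reviewers promote it later
        f"created: {stamp}",
        f"updated: {stamp}",
        "---",
        "",
    ])
    return header + markdown


converted = "# Device Management\n\nBody text.\n"
result = add_frontmatter(converted, "req-device-management-001",
                         today=date(2025, 11, 28))
print(result)
```

Fields such as `product`, `domain`, `tags`, and `ai_summary` are deliberately left out here, since they need human or agent judgment and belong in the manual review step.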

Manual Review Checklist

For each migrated document:

  • [ ] Frontmatter complete and accurate
  • [ ] Internal links converted to relative paths
  • [ ] Images extracted and referenced correctly
  • [ ] Tables render properly in Markdown
  • [ ] Code blocks have language tags
  • [ ] Mermaid diagrams converted or recreated
  • [ ] Related documents cross-referenced
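
Parts of this checklist can be pre-screened automatically before a human looks at the file. The sketch below covers two items only: code blocks without a language tag, and links that still point at the old wiki. The host name `confluence.example.com` and the function name are placeholders, and the link pattern is a simplification of Markdown link syntax.

```python
import re

TICKS = "`" * 3  # fence marker, built this way to keep the example easy to embed
FENCE = re.compile("^" + TICKS + "(.*)$")
LINK = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def review_issues(markdown, wiki_host="confluence.example.com"):
    """Return (line_number, message) pairs for untagged code fences and leftover wiki links."""
    issues = []
    in_code = False
    for number, line in enumerate(markdown.splitlines(), start=1):
        fence = FENCE.match(line.strip())
        if fence:
            if not in_code and not fence.group(1).strip():
                issues.append((number, "code block has no language tag"))
            in_code = not in_code
            continue
        if in_code:
            continue  # ignore link-like text inside code blocks
        for target in LINK.findall(line):
            if wiki_host in target:
                issues.append((number, f"link still points at the wiki: {target}"))
    return issues


sample = "\n".join([
    "# Spec",
    "See [old page](https://confluence.example.com/display/PROJ/Spec).",
    "See [new page](../specs/api-v2.md).",
    TICKS,
    "echo untagged",
    TICKS,
    TICKS + "bash",
    "echo tagged",
    TICKS,
])
issues = review_issues(sample)
for number, message in issues:
    print(f"line {number}: {message}")
```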

Comparison: Confluence vs Git Knowledge Base

| Aspect | Confluence | Git Knowledge Base |
| --- | --- | --- |
| AI Read Access | API required, rate limited | Direct file read |
| AI Write Access | Complex API, permissions | Git commit |
| Version History | Built-in but separate | Git log, blame |
| Branching | Not supported | Native |
| Code Proximity | Completely separate | Same repo or linked |
| Offline Access | Limited | Full |
| Search | Full-text only | Semantic + structured |
| Collaboration | Real-time | PR-based review |
| Cost | Per-user licensing | Free (Git hosting) |
| Lock-in | High (proprietary) | None (plain text) |

Platform Options

For maximum AI agent compatibility, keep the knowledge base as plain Markdown files in a Git repository:

Pros:
+ Full control, no vendor lock-in
+ AI agents read/write directly
+ Same workflow as code
+ No sync issues

Cons:
- No real-time collaboration
- Less visual editing experience
- Requires Git comfort

Alternative: Git-Synced Platforms

| Platform | Git Sync | Best For |
| --- | --- | --- |
| Outline | Native GitHub/GitLab sync | Teams wanting wiki UX |
| GitBook | Bi-directional sync | Public documentation |
| Docusaurus | Native (docs in repo) | Developer docs |
| VitePress | Native (docs in repo) | Technical documentation |

CLAUDE.md Template for Knowledge Base

markdown
# Knowledge Base - AI Instructions

## Repository Purpose
This repository contains product requirements, specifications,
architecture decisions, and domain knowledge for [Company] products.

## Document Types
- `requirement` - Product requirements and user stories
- `specification` - Technical specifications and API docs
- `adr` - Architecture Decision Records
- `domain` - Domain concept definitions
- `guide` - How-to guides and tutorials

## Finding Information
1. Check `INDEX.md` for table of contents
2. Search by product: `products/{product-name}/`
3. Search by domain: `domains/{domain-name}.md`
4. Use frontmatter tags for filtering

## Before Creating New Documents
1. Search for existing content on the topic
2. Check if updating existing doc is appropriate
3. Use correct template from `templates/`
4. Add complete frontmatter metadata

## Document Relationships
- Use `related` frontmatter field for explicit links
- Reference documents by ID: `[See Device Management](req-device-management-001)`
- Check `.kb/relationships.json` for dependency graph

## Updating Documents
When updating specifications:
1. Increment version in frontmatter
2. Update `updated` date
3. Add yourself to `authors` if significant change
4. Update related documents if needed
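
The first two update steps in the template are mechanical and could be scripted. This is a minimal sketch, assuming `---`-delimited frontmatter with a three-part `version` and an `updated` line as in the schema example; which part of the version to increment is not specified in the template, so it is a parameter here. The function name is hypothetical.

```python
import re
from datetime import date


def bump_document(text, part="minor", today=None):
    """Increment `version` and refresh `updated` inside the frontmatter block."""
    if not text.startswith("---\n"):
        raise ValueError("document has no frontmatter")
    head, sep, body = text[4:].partition("\n---\n")
    if not sep:
        raise ValueError("frontmatter block is not closed")

    def bump(match):
        major, minor, patch = (int(n) for n in match.group(1).split("."))
        if part == "major":
            major, minor, patch = major + 1, 0, 0
        elif part == "minor":
            minor, patch = minor + 1, 0
        else:
            patch += 1
        return f"version: {major}.{minor}.{patch}"

    head = re.sub(r"^version: (\d+\.\d+\.\d+)$", bump, head,
                  count=1, flags=re.MULTILINE)
    stamp = (today or date.today()).isoformat()
    head = re.sub(r"^updated: .*$", f"updated: {stamp}", head,
                  count=1, flags=re.MULTILINE)
    return f"---\n{head}\n---\n{body}"


doc = ("---\nid: req-device-management-001\nversion: 2.1.0\n"
       "updated: 2025-11-28\n---\n# Device Management Requirements\n")
bumped = bump_document(doc, today=date(2025, 12, 1))
print(bumped)
```

Editing the raw text with targeted substitutions, rather than parsing and re-serializing the YAML, leaves every other line of the frontmatter byte-for-byte unchanged, which keeps Git diffs small.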

Success Metrics

| Metric | Target | How to Measure |
| --- | --- | --- |
| AI context loading time | <2s | Measure MCP resource fetch |
| Spec-to-code correlation | >80% | Track commits with spec refs |
| Document freshness | <90 days avg | Monitor updated dates |
| Cross-reference accuracy | 100% | Validate related links |
| Migration completion | 100% active | Audit Confluence usage |
| Agent query success rate | >90% | Track MCP tool results |
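
Two of these metrics can be computed directly from frontmatter. The sketch below shows one way to calculate document freshness and cross-reference accuracy, assuming the metadata has already been parsed into dictionaries with `id`, `updated`, and `related` fields. The function names and sample records are illustrative.

```python
from datetime import date


def freshness_days(docs, today):
    """Average age in days since each document's `updated` date."""
    ages = [(today - date.fromisoformat(doc["updated"])).days for doc in docs]
    return sum(ages) / len(ages)


def cross_reference_accuracy(docs):
    """Share of `related` entries that point at an existing document ID."""
    known = {doc["id"] for doc in docs}
    refs = [ref for doc in docs for ref in doc.get("related", [])]
    if not refs:
        return 1.0  # nothing to validate counts as fully accurate
    return sum(ref in known for ref in refs) / len(refs)


# Hypothetical parsed metadata
docs = [
    {"id": "req-a", "updated": "2025-11-28", "related": ["spec-b", "spec-missing"]},
    {"id": "spec-b", "updated": "2025-09-29", "related": ["req-a"]},
]
average_age = freshness_days(docs, today=date(2025, 12, 8))
accuracy = cross_reference_accuracy(docs)
print(f"average age: {average_age:.0f} days, reference accuracy: {accuracy:.0%}")
```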

Anti-Patterns to Avoid

1. Recreating Wiki Features

Problem: Adding rich editing, real-time collaboration, complex permissions.

Why It Fails: Diverges from Git simplicity, creates sync issues, reduces AI compatibility.

Instead: Accept PR-based collaboration as the model.

2. Deep Nesting

Problem: Creating 5+ level folder hierarchies.

Why It Fails: Hard to navigate, paths become unwieldy, search becomes necessary anyway.

Instead: Flat structure with tags and cross-references.
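
A depth limit is easy to check mechanically. As a small sketch, assuming repository-relative paths with forward slashes and treating five or more folder levels as too deep, a CI step could flag offending files like this (the function name and threshold parameter are hypothetical):

```python
from pathlib import PurePosixPath


def too_deep(paths, max_depth=4):
    """Return paths that sit under more than `max_depth` folders."""
    return [p for p in paths if len(PurePosixPath(p).parts) - 1 > max_depth]


paths = [
    "products/vsaas/requirements/device-management.md",  # 3 folders: fine
    "products/vsaas/requirements/device/lifecycle/registration.md",  # 5 folders
]
flagged = too_deep(paths)
print(flagged)
```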

3. Ignoring Frontmatter

Problem: Writing Markdown without structured metadata.

Why It Fails: Loses machine-readability, prevents filtering, breaks relationships.

Instead: Enforce frontmatter via pre-commit hooks.
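
A pre-commit check for this can be quite small. The sketch below assumes `---`-delimited frontmatter and a hypothetical set of required fields; in practice the list would come from `.kb/schema.json`. It only checks that the keys are present, not that their values are valid.

```python
import re
import sys

REQUIRED = ("id", "title", "type", "status")  # assumption; derive from .kb/schema.json


def missing_fields(text):
    """Return the required frontmatter keys that are absent from a document."""
    if not text.startswith("---\n"):
        return list(REQUIRED)
    head, sep, _ = text[4:].partition("\n---")
    if not sep:
        return list(REQUIRED)
    present = set(re.findall(r"^([A-Za-z_]+):", head, flags=re.MULTILINE))
    return [field for field in REQUIRED if field not in present]


def main(paths):
    """Check each file; return 1 if any is missing fields so the commit is rejected."""
    failed = False
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            missing = missing_fields(handle.read())
        if missing:
            print(f"{path}: missing frontmatter fields: {', '.join(missing)}")
            failed = True
    return 1 if failed else 0


if __name__ == "__main__" and sys.argv[1:]:
    sys.exit(main(sys.argv[1:]))
```

Hook runners typically pass the staged file names as arguments, so the script exits non-zero (blocking the commit) as soon as any staged Markdown file lacks a required field.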

4. Partial Migration

Problem: Running Confluence and the Git KB in parallel indefinitely.

Why It Fails: Split knowledge, confusion about source of truth, double maintenance.

Instead: Set a firm migration deadline, then archive Confluence.

