# Reverse-Engineering Spec Extraction
Existing codebases contain implicit specifications embedded in their implementations. When AI agents work on legacy systems, they lack structured context about how the system actually behaves, leading to repeated code exploration, wasted context window, and inconsistent understanding across sessions. This proposal introduces Spec Extraction, a methodology for reverse-engineering specifications from code to create "existing-fact" context.
**Key Insight**: Specifications extracted from implementations serve as compressed, authoritative context. A well-written spec is 5-20x smaller than the code it describes, dramatically reducing context window usage while preserving the essential knowledge AI agents need.
## Problem Statement
### Current State: Ad-hoc Code Exploration

With no spec layer, every AI session starts from raw source code and rebuilds its understanding file by file.
### Pain Points from Missing Specs
| Issue | Impact on AI Development |
|---|---|
| No formal spec | AI re-explores code every session |
| Tribal knowledge | Implementation decisions locked in developers' heads |
| Context window waste | Must load entire files to understand behavior |
| Inconsistent understanding | Different AI sessions interpret code differently |
| Onboarding friction | New AI agents (and humans) start from zero |
### Real-World Examples
**Repeated Discovery**:
AI agent needs to modify authentication flow. Each session, it reads 15 files to understand the flow. The same ~2000 lines are loaded repeatedly because no spec exists documenting the authentication architecture.
**Lost Institutional Knowledge**:
Original developer leaves. The codebase has no documentation. New team members and AI agents must reverse-engineer intent from implementation, often guessing incorrectly.
**Context Overflow**:
Feature touches 8 modules. Loading all code exceeds context window. Without specs, AI cannot get a high-level view and makes changes that break undocumented invariants.
## Proposal: Existing-Fact Specifications
### Target State

Each high-priority module has a validated existing-fact spec that AI agents load as context instead of the underlying source files.
### What is an Existing-Fact Spec?
An existing-fact specification documents verified, implemented behavior extracted from existing code. Unlike forward-looking requirements that describe what should be built, existing-fact specs describe what is built.
### Key Distinction: Existing-Fact vs Requirement
| Aspect | Requirement Spec | Existing-Fact Spec |
|---|---|---|
| Source | Business needs, user stories | Implemented code |
| Purpose | Define what to build | Document what exists |
| Authority | Normative (should match) | Descriptive (does match) |
| Creation | Before implementation | After implementation |
| Validation | Tests verify implementation | Implementation is the truth |
| Use Case | Greenfield development | Legacy system understanding |
## Existing-Fact Spec Format
### Frontmatter Schema
```yaml
id: ef-auth-oauth-flow
title: OAuth 2.0 Authentication Flow
type: existing-fact
status: verified
version: 1.0.0
created: 2025-12-02
updated: 2025-12-02
extracted_from:
  - src/auth/oauth.ts
  - src/auth/token-manager.ts
  - src/middleware/auth.ts
extraction_method: ai-assisted
confidence: high
verified_by:
  - test-suite
  - human-review
compression_ratio: "12:1"  # quoted so YAML does not misparse the colon
authors:
  - extraction-agent
reviewers:
  - senior-developer@company.com
tags:
  - authentication
  - oauth
  - security
ai_summary: |
  OAuth 2.0 PKCE flow implementation for web clients.
  Handles authorization, token exchange, refresh, and logout.
  Integrates with identity provider via standard endpoints.
```
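For tooling that parses or validates this frontmatter, the same schema can be expressed as a typed structure. Below is a minimal TypeScript sketch mirroring the fields above; the type names and string-union values are illustrative assumptions, not an established schema.

```typescript
// Hypothetical typed model of the existing-fact frontmatter shown above.
// Field names mirror the YAML example; none of this is a published schema.

type ExtractionMethod = "ai-assisted" | "manual" | "hybrid";
type Confidence = "high" | "medium" | "low";
type SpecStatus = "draft" | "needs-review" | "verified";

interface ExistingFactSpec {
  id: string;                      // e.g. "ef-auth-oauth-flow"
  title: string;
  type: "existing-fact";           // distinguishes from requirement specs
  status: SpecStatus;
  version: string;                 // semver string
  created: string;                 // ISO date
  updated: string;                 // ISO date; drives freshness checks
  extracted_from: string[];        // source files the spec summarizes
  extraction_method: ExtractionMethod;
  confidence: Confidence;          // governs how AI agents may use the spec
  verified_by: string[];           // e.g. ["test-suite", "human-review"]
  compression_ratio: string;       // e.g. "12:1" (code lines : spec lines)
  authors: string[];
  reviewers: string[];
  tags: string[];
  ai_summary: string;              // short summary loaded as cheap context
}
```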
### Content Structure

```markdown
# [Feature Name] - Existing-Fact Specification
## Overview
[1-2 paragraph summary of what this code does]
## System Boundaries
### Entry Points
- [API endpoints, function signatures]
### External Dependencies
- [Services, APIs, databases this code interacts with]
### Data Flow
[Mermaid diagram showing high-level flow]
## Behavioral Specification
### Core Behaviors
#### Behavior: [Name]
- **Trigger**: [What initiates this behavior]
- **Preconditions**: [Required state]
- **Process**: [What happens]
- **Postconditions**: [Resulting state]
- **Extracted from**: `file.ts:line`
### Error Handling
#### Error: [Name]
- **Condition**: [When this error occurs]
- **Response**: [How system responds]
- **Recovery**: [How to recover, if applicable]
## Constraints & Invariants
- [Rules that must always hold]
- [Limits and thresholds]
## Known Technical Debt
- [Acknowledged issues not to replicate]
## Verification Status
| Aspect | Status | Evidence |
|--------|--------|----------|
| Core flow | Verified | Unit tests, integration tests |
| Error handling | Partial | Some edge cases untested |
| Performance | Unverified | No benchmarks available |
```

## Layered Extraction Methodology
### Four-Layer Approach
Extract specifications in four layers of increasing granularity, starting coarse and adding detail only where it is needed.
### Layer Details
| Layer | What to Extract | Compression Target | When Needed |
|---|---|---|---|
| L1: Boundaries | API contracts, interface definitions, external integration points | 10:1 | Always |
| L2: Structure | Component responsibilities, inter-module dependencies, data flow | 20:1 | Most features |
| L3: Behaviors | Algorithmic patterns, state machines, error handling strategies | 5:1 | Complex logic |
| L4: Edge Cases | Validation rules, constraints, limits, known quirks | 3:1 | Critical paths |
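Where extraction tooling needs these layer definitions at runtime, the table can be encoded directly as data. A minimal TypeScript sketch; the identifiers and the numeric encoding of the ratios are illustrative choices:

```typescript
// The four extraction layers from the table above, encoded as data.
interface ExtractionLayer {
  id: "L1" | "L2" | "L3" | "L4";
  name: string;
  extracts: string;
  compressionTarget: number; // e.g. 10 means a 10:1 code-to-spec ratio
  whenNeeded: string;
}

const LAYERS: ExtractionLayer[] = [
  { id: "L1", name: "Boundaries", extracts: "API contracts, interfaces, integration points", compressionTarget: 10, whenNeeded: "Always" },
  { id: "L2", name: "Structure", extracts: "Component responsibilities, dependencies, data flow", compressionTarget: 20, whenNeeded: "Most features" },
  { id: "L3", name: "Behaviors", extracts: "Algorithms, state machines, error handling", compressionTarget: 5, whenNeeded: "Complex logic" },
  { id: "L4", name: "Edge Cases", extracts: "Validation rules, constraints, known quirks", compressionTarget: 3, whenNeeded: "Critical paths" },
];
```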
### Incremental Value

Layer 1 alone provides significant value; each additional layer adds precision where it is needed.
## Extraction Workflow

### Three-Phase Process

Extraction proceeds through three phases, each gated by human review: discovery, extraction, and validation.
#### Phase 1: Discovery

**Objective**: Understand codebase structure and identify extraction targets.

**AI Tasks**:
- Analyze directory structure and file organization
- Identify entry points (APIs, CLI commands, event handlers)
- Map module dependencies
- Detect architectural patterns (MVC, Clean Architecture, etc.)
- Generate discovery report for human review
**Output**:

```markdown
## Discovery Report: [Codebase Name]
### Architecture Overview
[High-level description]
### Key Modules
| Module | Purpose | Dependencies | Priority |
|--------|---------|--------------|----------|
| auth | Authentication | db, identity-provider | High |
| api | REST endpoints | auth, services | High |
| ... | ... | ... | ... |
### Recommended Extraction Order
1. [Module] - [Reason]
2. [Module] - [Reason]
### Complexity Assessment
- Estimated extraction effort: [Hours/Days]
- High-complexity areas: [List]
```
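To make discovery concrete, a small script can walk the repository, size each top-level module, and flag likely entry points for the report above. The sketch below is a hypothetical Node.js/TypeScript starting point; the `src/` layout and the entry-point patterns are assumptions about a typical project, not part of this proposal.

```typescript
// Minimal discovery sketch: sizes each top-level module under src/ and
// flags files whose names suggest entry points. Heuristics are illustrative.
import { readdirSync, statSync, readFileSync } from "node:fs";
import { join } from "node:path";

const ENTRY_HINTS = [/routes?/i, /controller/i, /handler/i, /cli/i, /main/i];

function countLines(file: string): number {
  return readFileSync(file, "utf8").split("\n").length;
}

function walk(dir: string): string[] {
  return readdirSync(dir).flatMap((name) => {
    const path = join(dir, name);
    return statSync(path).isDirectory() ? walk(path) : [path];
  });
}

// Produce one row per module for the discovery report above.
for (const module of readdirSync("src")) {
  const dir = join("src", module);
  if (!statSync(dir).isDirectory()) continue;
  const files = walk(dir).filter((f) => f.endsWith(".ts"));
  const lines = files.reduce((sum, f) => sum + countLines(f), 0);
  const entryPoints = files.filter((f) => ENTRY_HINTS.some((re) => re.test(f)));
  console.log(`${module}: ${files.length} files, ${lines} lines, ` +
              `${entryPoints.length} likely entry points`);
}
```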
#### Phase 2: Extraction

**Objective**: Generate draft specifications from code.

**AI Tasks**:
- Read module code thoroughly
- Generate spec following layer methodology
- Cross-reference with tests for behavior verification
- Document uncertainties and assumptions
**Human Tasks**:
- Review generated specs for accuracy
- Correct misunderstandings
- Add context AI cannot infer (business rationale, historical decisions)
- Approve or request refinement
**Iteration Pattern**: The AI drafts, humans review and correct, and the cycle repeats until the spec is approved.
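The extraction step itself is prompt-driven; the roadmap below calls for an extraction prompt library. A parameterized template for this phase might look like the following sketch (the wording is illustrative, not a validated prompt):

```typescript
// Illustrative extraction prompt template; the wording is a sketch, not a
// tested prompt from this proposal's prompt library.
function extractionPrompt(module: string, files: string[]): string {
  return [
    `Read these files thoroughly: ${files.join(", ")}.`,
    `Write an existing-fact specification for the "${module}" module,`,
    "following the Content Structure template (Overview, System Boundaries,",
    "Behavioral Specification, Constraints & Invariants, Known Technical",
    "Debt, Verification Status).",
    "Describe only behavior you can point to in the code, citing each",
    "behavior as file.ts:line. List uncertainties and assumptions",
    "separately instead of guessing.",
  ].join("\n");
}
```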
#### Phase 3: Validation

**Objective**: Establish spec trustworthiness for AI context consumption.

**Validation Sources**:
| Source | Confidence Boost | Evidence |
|---|---|---|
| Automated tests pass | +20% | Test coverage report |
| Human review complete | +30% | Reviewer sign-off |
| Production behavior matches | +30% | Monitoring/logs comparison |
| Original developer confirms | +20% | Developer approval |
**Confidence Levels**:
| Level | Criteria | AI Usage Guidance |
|---|---|---|
| High | Test-validated + human-reviewed | Use as authoritative context |
| Medium | Human-reviewed, partial test coverage | Use with caution, verify critical paths |
| Low | AI-generated draft, awaiting validation | Treat as hypothesis, verify before use |
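The two tables translate directly into a scoring rule: boosts accumulate per validation source, while the level follows the criteria column. A minimal TypeScript sketch, with illustrative identifiers:

```typescript
// Maps the Validation Sources and Confidence Levels tables to code.
// Source identifiers are illustrative; boost values come from the table.
type ValidationSource =
  | "tests-pass"
  | "human-review"
  | "production-match"
  | "developer-confirmed";

const BOOSTS: Record<ValidationSource, number> = {
  "tests-pass": 20,
  "human-review": 30,
  "production-match": 30,
  "developer-confirmed": 20,
};

// Aggregate score, useful for dashboards (sums to 100 when all apply).
function confidenceScore(sources: ValidationSource[]): number {
  return sources.reduce((sum, s) => sum + BOOSTS[s], 0);
}

// Level follows the criteria column, not the raw score: "high" requires
// test validation plus human review; review alone yields "medium".
function confidenceLevel(sources: ValidationSource[]): "high" | "medium" | "low" {
  const reviewed = sources.includes("human-review");
  const tested = sources.includes("tests-pass");
  if (reviewed && tested) return "high";
  if (reviewed) return "medium";
  return "low";
}
```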
## Implementation Roadmap

### Phase 1: Pilot Extraction (Weeks 1-2)

**Goal**: Extract specs for 1-2 high-value modules.

**Deliverables**:
- [ ] Select pilot modules (high-change-frequency, well-tested)
- [ ] Run discovery phase
- [ ] Generate Layer 1 + Layer 2 specs
- [ ] Validate with module owners
- [ ] Measure compression ratio
**Success Criteria**:
- Compression ratio >10:1
- Module owner confirms accuracy
- AI agent can use spec instead of reading code
### Phase 2: Tooling & Templates (Weeks 3-4)

**Goal**: Establish a repeatable extraction process.

**Deliverables**:
- [ ] Existing-fact spec template
- [ ] Extraction prompt library
- [ ] Validation checklist
- [ ] CI integration for spec freshness checks
### Phase 3: Systematic Extraction (Weeks 5-8)

**Goal**: Extract specs for all high-priority modules.

**Deliverables**:
- [ ] Prioritized module list
- [ ] Extraction schedule
- [ ] Progress tracking dashboard
- [ ] Spec-to-code freshness monitoring
### Phase 4: Continuous Maintenance (Ongoing)

**Goal**: Keep specs synchronized with code.

**Deliverables**:
- [ ] Change detection triggers re-extraction
- [ ] Spec diff on code changes
- [ ] Staleness alerts
- [ ] Periodic full refresh schedule
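One way to implement the staleness alert is to compare each spec's `updated` date against the last commit touching its `extracted_from` files. A minimal sketch, assuming a git checkout and already-parsed frontmatter; the helper names are illustrative:

```typescript
// Staleness check sketch: flags specs whose source files changed after the
// spec's `updated` date. Assumes a git checkout; helper names illustrative.
import { execFileSync } from "node:child_process";

interface SpecMeta {
  id: string;
  updated: string;          // ISO date from frontmatter
  extracted_from: string[];
}

// Last commit date (ISO 8601) touching a file, via `git log`.
function lastCommitDate(file: string): Date {
  const out = execFileSync(
    "git", ["log", "-1", "--format=%cI", "--", file],
    { encoding: "utf8" },
  ).trim();
  return new Date(out);
}

function isStale(spec: SpecMeta): boolean {
  const specDate = new Date(spec.updated);
  return spec.extracted_from.some((file) => lastCommitDate(file) > specDate);
}

// CI usage: fail (or alert) when any spec has drifted from its sources.
// const stale = allSpecs.filter(isStale);
// if (stale.length > 0) process.exit(1);
```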
## CLAUDE.md Integration

Add to the project `CLAUDE.md`:

```markdown
## Existing-Fact Specifications
### Purpose
Existing-fact specs document verified, implemented behavior.
Use these specs instead of reading source code when available.
### Location
- `docs/specs/existing-facts/` - Extracted specifications
- Each spec includes `extracted_from` references to source files
### Usage Priority
1. Check for existing-fact spec first
2. If spec exists and confidence is high, use spec as context
3. If spec is medium confidence, verify critical assumptions
4. If no spec exists or confidence is low, read source code
### When to Trigger Re-Extraction
If you modify code covered by an existing-fact spec:
1. Note the spec may be stale
2. Update the spec if change is significant
3. Mark spec as needs-review if uncertain
### Spec Quality Indicators
- `confidence: high` - Trust as authoritative
- `confidence: medium` - Verify critical paths
- `confidence: low` - Treat as hypothesis
- `compression_ratio` - Higher means better context efficiency
```

## Context Compression Analysis
### Compression Targets by Layer
| Layer | Code Example | Spec Equivalent | Ratio |
|---|---|---|---|
| L1: Boundaries | 500 lines of API handlers | 50 lines of endpoint specs | 10:1 |
| L2: Structure | 2000 lines across 10 modules | 100 lines of architecture | 20:1 |
| L3: Behaviors | 300 lines of algorithm | 60 lines of behavioral spec | 5:1 |
| L4: Edge Cases | 200 lines of validation | 70 lines of constraint spec | 3:1 |
### Real-World Example

#### Before: Loading Source Code

```text
Context tokens for authentication module:
- oauth.ts: 450 lines (~2000 tokens)
- token-manager.ts: 280 lines (~1200 tokens)
- auth-middleware.ts: 180 lines (~800 tokens)
- auth.test.ts: 520 lines (~2300 tokens)
Total: 1430 lines (~6300 tokens)
```

#### After: Loading Existing-Fact Spec

```text
Context tokens for authentication spec:
- ef-auth-oauth-flow.md: 120 lines (~500 tokens)
Total: 120 lines (~500 tokens)
Compression: 12:1
Token savings: 5800 tokens per session
```

## Success Metrics
| Metric | Target | How to Measure |
|---|---|---|
| Compression ratio (avg) | >10:1 | code_lines / spec_lines |
| Spec accuracy | >95% | Human review pass rate |
| Context reduction | >50% | Token usage before/after |
| AI task success rate | +20% | Compare with/without specs |
| Extraction efficiency | <2hr/module | Time tracking |
| Spec freshness | <30 days | `updated` date monitoring |
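The compression and token metrics can be computed mechanically from line counts. The sketch below uses the roughly 4.4 tokens per line implied by the example above (~6300 tokens / 1430 lines); the real figure varies by tokenizer and code style.

```typescript
// Compression metrics sketch. TOKENS_PER_LINE is a rough constant implied
// by the example above; real values depend on the tokenizer and code style.
const TOKENS_PER_LINE = 4.4;

function compressionRatio(codeLines: number, specLines: number): string {
  return `${Math.round(codeLines / specLines)}:1`;
}

function approxTokenSavings(codeLines: number, specLines: number): number {
  return Math.round((codeLines - specLines) * TOKENS_PER_LINE);
}

// Numbers from the authentication example:
console.log(compressionRatio(1430, 120));   // "12:1"
console.log(approxTokenSavings(1430, 120)); // ~5764 tokens, close to 5800
```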
## Anti-Patterns to Avoid

### 1. Extracting Everything

**Problem**: Attempting to spec every line of code.

**Solution**: Focus on high-value, high-change-frequency modules first. Not all code needs specs.
### 2. Specs Without Validation

**Problem**: Generating specs and assuming they're correct.

**Solution**: Every spec requires human review. Confidence levels communicate trustworthiness.
### 3. Stale Specs

**Problem**: Specs drift from implementation over time.

**Solution**: CI checks, freshness monitoring, change-triggered re-extraction.
### 4. Over-Detailed Specs

**Problem**: Specs that are as long as the code they describe.

**Solution**: Focus on abstraction. If the compression ratio is below 3:1, the spec is too detailed.
### 5. Ignoring Technical Debt

**Problem**: Extracting specs for code that "shouldn't be this way."

**Solution**: Document known debt in specs. Don't legitimize bad patterns by specifying them.
## Frequently Asked Questions

### When should we extract specs vs write new ones?

**Extract when**:
- Code exists but documentation doesn't
- Original developers unavailable
- Need to understand legacy system quickly
- Want to reduce context window usage
**Write new when**:
- Building new features
- Redesigning existing features
- Code doesn't exist yet
### How do we handle code that violates its own patterns?

Document inconsistencies in the spec:

```markdown
## Known Inconsistencies
- Module A uses pattern X
- Module B uses pattern Y for same purpose
- **Note**: This is legacy debt, not intentional design
```

### Should existing-fact specs live with code or separately?
**Recommended**: A separate `docs/specs/existing-facts/` directory.

**Rationale**:
- Specs aggregate multiple files
- Easier to find and load as context
- Can have different review cadence
- Clear distinction from code comments
### How often should we re-extract?
| Code Change Type | Re-extraction Needed |
|---|---|
| Bug fix | No (unless changes behavior) |
| Refactor (same behavior) | Maybe (update file references) |
| Feature addition | Yes (add to spec) |
| Behavior change | Yes (update spec) |
| Major rewrite | Yes (full re-extraction) |
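This decision table can also back an automated hook that flags specs on relevant changes. A small sketch; the change-type labels follow the table, while the function and type names are illustrative.

```typescript
// The re-extraction decision table as code; labels follow the table above,
// function and type names are illustrative.
type ChangeType =
  | "bug-fix"
  | "refactor"
  | "feature-addition"
  | "behavior-change"
  | "major-rewrite";

type Action = "none" | "update-references" | "update-spec" | "full-re-extraction";

function reExtractionAction(change: ChangeType, behaviorChanged: boolean): Action {
  switch (change) {
    case "bug-fix":
      return behaviorChanged ? "update-spec" : "none";
    case "refactor":
      return "update-references"; // same behavior; file paths may move
    case "feature-addition":
    case "behavior-change":
      return "update-spec";
    case "major-rewrite":
      return "full-re-extraction";
  }
}
```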
## Related Principles
- G1: Single Source of Truth - Extracted specs as canonical source
- G2: Version-Controlled Documentation - Specs in git
- C1: Context Engineering Competency - Context compression techniques
## Related Proposals
- AI-DLC Mob Elaboration - Forward-looking spec creation
- Agent-Friendly Knowledge Base - Where specs should live
- Continuous Context Cleanup - Maintaining spec freshness
- Frontmatter Spec Coordination - Metadata schema for specs
## References
- Claude Code Documentation - Official documentation for CLAUDE.md AI guidance files
- Effective Context Engineering for AI Agents - Anthropic's guide to context optimization
- OpenSpec - Specification-driven development framework
- Legacy Code Documentation Patterns - Write the Docs community resources