
# Reverse-Engineering Spec Extraction

Existing codebases contain implicit specifications embedded in their implementations. When AI agents work on legacy systems, they lack structured context about what the system is, leading to repeated code exploration, wasted context window, and inconsistent understanding across sessions. Spec Extraction is a methodology for reverse-engineering specifications from code to create "existing-fact" context.

**Key Insight**: Specifications extracted from implementations serve as compressed, authoritative context. A well-written spec is 5-20x smaller than the code it describes, dramatically reducing context window usage while preserving the essential knowledge AI agents need.

## Problem Statement

### Current State: Ad-hoc Code Exploration

### Pain Points from Missing Specs

| Issue | Impact on AI Development |
|-------|--------------------------|
| No formal spec | AI re-explores code every session |
| Tribal knowledge | Implementation decisions locked in developers' heads |
| Context window waste | Must load entire files to understand behavior |
| Inconsistent understanding | Different AI sessions interpret code differently |
| Onboarding friction | New AI agents (and humans) start from zero |

### Real-World Examples

**Repeated Discovery:**

An AI agent needs to modify the authentication flow. Each session it reads 15 files to understand the flow; the same ~2000 lines are loaded repeatedly because no spec documents the authentication architecture.

**Lost Institutional Knowledge:**

The original developer leaves and the codebase has no documentation. New team members and AI agents must reverse-engineer intent from the implementation, often guessing incorrectly.

**Context Overflow:**

A feature touches 8 modules, and loading all of the code exceeds the context window. Without specs, the AI cannot get a high-level view and makes changes that break undocumented invariants.

## Proposal: Existing-Fact Specifications

### Target State

### What is an Existing-Fact Spec?

An existing-fact specification documents verified, implemented behavior extracted from existing code. Unlike forward-looking requirements that describe what *should be* built, existing-fact specs describe what *is* built.

### Key Distinction: Existing-Fact vs Requirement

| Aspect | Requirement Spec | Existing-Fact Spec |
|--------|------------------|--------------------|
| Source | Business needs, user stories | Implemented code |
| Purpose | Define what to build | Document what exists |
| Authority | Normative (code should match) | Descriptive (code does match) |
| Creation | Before implementation | After implementation |
| Validation | Tests verify the implementation | The implementation is the truth |
| Use case | Greenfield development | Legacy system understanding |

## Existing-Fact Spec Format

### Frontmatter Schema

```yaml
id: ef-auth-oauth-flow
title: OAuth 2.0 Authentication Flow
type: existing-fact
status: verified
version: 1.0.0
created: 2025-12-02
updated: 2025-12-02
extracted_from:
  - src/auth/oauth.ts
  - src/auth/token-manager.ts
  - src/middleware/auth.ts
extraction_method: ai-assisted
confidence: high
verified_by:
  - test-suite
  - human-review
compression_ratio: 12:1
authors:
  - extraction-agent
reviewers:
  - senior-developer@company.com
tags:
  - authentication
  - oauth
  - security
ai_summary: |
  OAuth 2.0 PKCE flow implementation for web clients.
  Handles authorization, token exchange, refresh, and logout.
  Integrates with identity provider via standard endpoints.
```
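
As a sketch of how tooling could consume this schema, the following TypeScript loader parses and checks the required fields. It is illustrative only: it assumes a Node.js toolchain with the `js-yaml` package and specs that wrap their frontmatter in `---` fences.

```typescript
// Illustrative frontmatter loader; field names mirror the schema above,
// everything else (paths, error wording) is an assumption.
import { readFileSync } from "node:fs";
import { load } from "js-yaml";

interface ExistingFactFrontmatter {
  id: string;
  title: string;
  type: "existing-fact";
  confidence: "high" | "medium" | "low";
  extracted_from: string[];
}

function parseFrontmatter(specPath: string): ExistingFactFrontmatter {
  const text = readFileSync(specPath, "utf8");
  // Frontmatter is assumed to sit between leading `---` fences.
  const match = text.match(/^---\n([\s\S]*?)\n---/);
  if (!match) throw new Error(`${specPath}: no frontmatter block found`);
  const data = load(match[1]) as Partial<ExistingFactFrontmatter>;
  for (const field of ["id", "title", "type", "confidence", "extracted_from"] as const) {
    if (data[field] === undefined) {
      throw new Error(`${specPath}: missing required field "${field}"`);
    }
  }
  return data as ExistingFactFrontmatter;
}
```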

### Content Structure

```markdown
# [Feature Name] - Existing-Fact Specification

## Overview
[1-2 paragraph summary of what this code does]

## System Boundaries
### Entry Points
- [API endpoints, function signatures]

### External Dependencies
- [Services, APIs, databases this code interacts with]

### Data Flow
[Mermaid diagram showing high-level flow]

## Behavioral Specification

### Core Behaviors
#### Behavior: [Name]
- **Trigger**: [What initiates this behavior]
- **Preconditions**: [Required state]
- **Process**: [What happens]
- **Postconditions**: [Resulting state]
- **Extracted from**: `file.ts:line`

### Error Handling
#### Error: [Name]
- **Condition**: [When this error occurs]
- **Response**: [How system responds]
- **Recovery**: [How to recover, if applicable]

## Constraints & Invariants
- [Rules that must always hold]
- [Limits and thresholds]

## Known Technical Debt
- [Acknowledged issues not to replicate]

## Verification Status
| Aspect | Status | Evidence |
|--------|--------|----------|
| Core flow | Verified | Unit tests, integration tests |
| Error handling | Partial | Some edge cases untested |
| Performance | Unverified | No benchmarks available |
```
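
A completeness check can enforce this structure mechanically. The sketch below is hypothetical; the required heading names are taken from the template above:

```typescript
// Report which required sections a spec file is missing.
import { readFileSync } from "node:fs";

const REQUIRED_SECTIONS = [
  "Overview",
  "System Boundaries",
  "Behavioral Specification",
  "Constraints & Invariants",
  "Verification Status",
];

function missingSections(specPath: string): string[] {
  const text = readFileSync(specPath, "utf8");
  // A section counts as present if a `##` heading starts with its name.
  return REQUIRED_SECTIONS.filter(
    (name) => !new RegExp(`^## ${name}`, "m").test(text)
  );
}
```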

## Layered Extraction Methodology

### Four-Layer Approach

Extract specifications in four layers of increasing granularity, adding detail only where it is needed.

### Layer Details

| Layer | What to Extract | Compression Target | When Needed |
|-------|-----------------|--------------------|-------------|
| L1: Boundaries | API contracts, interface definitions, external integration points | 10:1 | Always |
| L2: Structure | Component responsibilities, inter-module dependencies, data flow | 20:1 | Most features |
| L3: Behaviors | Algorithmic patterns, state machines, error-handling strategies | 5:1 | Complex logic |
| L4: Edge Cases | Validation rules, constraints, limits, known quirks | 3:1 | Critical paths |
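
One way to make the "When Needed" column operational is to derive a module's layer list from a few traits. A minimal sketch; the trait names and mapping are illustrative assumptions:

```typescript
// Deriving the extraction plan for a module from a few traits.
// The layer list mirrors the table above (L1 is always extracted).
type Layer = "L1-boundaries" | "L2-structure" | "L3-behaviors" | "L4-edge-cases";

interface ModuleTraits {
  spansMultipleModules: boolean; // "Most features" -> L2
  complexLogic: boolean;         // "Complex logic" -> L3
  criticalPath: boolean;         // "Critical paths" -> L4
}

function layersToExtract(t: ModuleTraits): Layer[] {
  const layers: Layer[] = ["L1-boundaries"];
  if (t.spansMultipleModules) layers.push("L2-structure");
  if (t.complexLogic) layers.push("L3-behaviors");
  if (t.criticalPath) layers.push("L4-edge-cases");
  return layers;
}
```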

### Incremental Value

Layer 1 alone provides significant value; each additional layer adds precision where it is needed.

## Extraction Workflow

### Three-Phase Process

#### Phase 1: Discovery

**Objective**: Understand the codebase structure and identify extraction targets.

**AI Tasks:**

1. Analyze the directory structure and file organization
2. Identify entry points (APIs, CLI commands, event handlers)
3. Map module dependencies
4. Detect architectural patterns (MVC, Clean Architecture, etc.)
5. Generate a discovery report for human review

**Output:**

```markdown
## Discovery Report: [Codebase Name]

### Architecture Overview
[High-level description]

### Key Modules
| Module | Purpose | Dependencies | Priority |
|--------|---------|--------------|----------|
| auth | Authentication | db, identity-provider | High |
| api | REST endpoints | auth, services | High |
| ... | ... | ... | ... |

### Recommended Extraction Order
1. [Module] - [Reason]
2. [Module] - [Reason]

### Complexity Assessment
- Estimated extraction effort: [Hours/Days]
- High-complexity areas: [List]
```
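
Parts of discovery are mechanical and can be scripted. A minimal sketch, assuming a Node.js project with modules under `src/`, that emits file and line counts for the Key Modules table:

```typescript
// Walk each top-level module directory and report its size, as raw
// input for the discovery report. Illustrative only; real discovery
// would also map imports and entry points.
import { readdirSync, readFileSync, statSync } from "node:fs";
import { join } from "node:path";

function countLines(dir: string): { files: number; lines: number } {
  let files = 0;
  let lines = 0;
  for (const entry of readdirSync(dir)) {
    const full = join(dir, entry);
    if (statSync(full).isDirectory()) {
      const sub = countLines(full);
      files += sub.files;
      lines += sub.lines;
    } else if (/\.(ts|js)$/.test(entry)) {
      files += 1;
      lines += readFileSync(full, "utf8").split("\n").length;
    }
  }
  return { files, lines };
}

// Emit one markdown table row per module under src/.
for (const moduleName of readdirSync("src")) {
  const path = join("src", moduleName);
  if (!statSync(path).isDirectory()) continue;
  const { files, lines } = countLines(path);
  console.log(`| ${moduleName} | ${files} files | ${lines} lines |`);
}
```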

#### Phase 2: Extraction

**Objective**: Generate draft specifications from code.

**AI Tasks:**

1. Read the module code thoroughly
2. Generate a spec following the layer methodology
3. Cross-reference with tests to verify behavior
4. Document uncertainties and assumptions

**Human Tasks:**

1. Review generated specs for accuracy
2. Correct misunderstandings
3. Add context the AI cannot infer (business rationale, historical decisions)
4. Approve or request refinement

**Iteration Pattern**: Extraction and review alternate until the spec is approved; each pass corrects misunderstandings and raises confidence.

#### Phase 3: Validation

**Objective**: Establish spec trustworthiness for AI context consumption.

**Validation Sources:**

| Source | Confidence Boost | Evidence |
|--------|------------------|----------|
| Automated tests pass | +20% | Test coverage report |
| Human review complete | +30% | Reviewer sign-off |
| Production behavior matches | +30% | Monitoring/logs comparison |
| Original developer confirms | +20% | Developer approval |

**Confidence Levels:**

| Level | Criteria | AI Usage Guidance |
|-------|----------|-------------------|
| High | Test-validated + human-reviewed | Use as authoritative context |
| Medium | Human-reviewed, partial test coverage | Use with caution; verify critical paths |
| Low | AI-generated draft, awaiting validation | Treat as a hypothesis; verify before use |
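
The two tables compose naturally into a scoring function. The sketch below is one way to operationalize them; the requirement that "high" include both tests and human review follows the criteria table, while the numeric threshold is an illustrative assumption:

```typescript
// Sum the evidence boosts, then map the total to a confidence level.
type Evidence = "tests" | "human-review" | "production-match" | "developer-confirms";

const BOOST: Record<Evidence, number> = {
  tests: 20,
  "human-review": 30,
  "production-match": 30,
  "developer-confirms": 20,
};

function confidenceLevel(evidence: Evidence[]): "high" | "medium" | "low" {
  const score = evidence.reduce((sum, e) => sum + BOOST[e], 0);
  // "High" requires test validation plus human review, per the criteria table.
  if (score >= 50 && evidence.includes("tests") && evidence.includes("human-review")) {
    return "high";
  }
  return evidence.includes("human-review") ? "medium" : "low";
}

console.log(confidenceLevel(["tests", "human-review"])); // "high"
console.log(confidenceLevel(["human-review"]));          // "medium"
```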

## Implementation Roadmap

### Phase 1: Pilot Extraction (Weeks 1-2)

**Goal**: Extract specs for 1-2 high-value modules.

**Deliverables:**

- [ ] Select pilot modules (high change frequency, well tested)
- [ ] Run the discovery phase
- [ ] Generate Layer 1 + Layer 2 specs
- [ ] Validate with module owners
- [ ] Measure the compression ratio

**Success Criteria:**

- Compression ratio >10:1
- Module owner confirms accuracy
- AI agent can use the spec instead of reading the code

### Phase 2: Tooling & Templates (Weeks 3-4)

**Goal**: Establish a repeatable extraction process.

**Deliverables:**

- [ ] Existing-fact spec template
- [ ] Extraction prompt library
- [ ] Validation checklist
- [ ] CI integration for spec freshness checks

### Phase 3: Systematic Extraction (Weeks 5-8)

**Goal**: Extract specs for all high-priority modules.

**Deliverables:**

- [ ] Prioritized module list
- [ ] Extraction schedule
- [ ] Progress tracking dashboard
- [ ] Spec-to-code freshness monitoring

### Phase 4: Continuous Maintenance (Ongoing)

**Goal**: Keep specs synchronized with code.

**Deliverables:**

- [ ] Change detection that triggers re-extraction
- [ ] Spec diff on code changes
- [ ] Staleness alerts
- [ ] Periodic full-refresh schedule
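
For the staleness alerts in Phase 4, a simple check compares each spec's `updated` date against its `extracted_from` sources. A sketch assuming filesystem mtimes; a real setup might compare git commit timestamps instead:

```typescript
// Flag a spec as stale when any source file it was extracted from
// changed after the spec's `updated` date. Field names follow the
// frontmatter schema; everything else is illustrative.
import { statSync } from "node:fs";

interface SpecMeta {
  id: string;
  updated: string;        // ISO date, e.g. "2025-12-02"
  extracted_from: string[];
}

function isStale(spec: SpecMeta): boolean {
  const specUpdated = new Date(spec.updated).getTime();
  return spec.extracted_from.some(
    (file) => statSync(file).mtimeMs > specUpdated
  );
}
```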

## CLAUDE.md Integration

Add to the project's CLAUDE.md:

```markdown
## Existing-Fact Specifications

### Purpose
Existing-fact specs document verified, implemented behavior.
Use these specs instead of reading source code when available.

### Location
- `docs/specs/existing-facts/` - Extracted specifications
- Each spec includes `extracted_from` references to source files

### Usage Priority
1. Check for existing-fact spec first
2. If spec exists and confidence is high, use spec as context
3. If spec is medium confidence, verify critical assumptions
4. If no spec exists or confidence is low, read source code

### When to Trigger Re-Extraction
If you modify code covered by an existing-fact spec:
1. Note the spec may be stale
2. Update the spec if the change is significant
3. Mark spec as needs-review if uncertain

### Spec Quality Indicators
- `confidence: high` - Trust as authoritative
- `confidence: medium` - Verify critical paths
- `confidence: low` - Treat as hypothesis
- `compression_ratio` - Higher means better context efficiency
```
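
The usage-priority rules above can also be encoded in an agent harness. A hypothetical sketch; the return shape is an assumption:

```typescript
// The usage-priority rules, expressed as a decision function.
type Confidence = "high" | "medium" | "low";

function contextStrategy(spec?: { path: string; confidence: Confidence }) {
  if (!spec || spec.confidence === "low") {
    return { load: "source-code", note: "no trustworthy spec available" };
  }
  if (spec.confidence === "medium") {
    return { load: spec.path, note: "verify critical assumptions against code" };
  }
  return { load: spec.path, note: "use spec as authoritative context" };
}
```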

## Context Compression Analysis

### Compression Targets by Layer

| Layer | Code Example | Spec Equivalent | Ratio |
|-------|--------------|-----------------|-------|
| L1: Boundaries | 500 lines of API handlers | 50 lines of endpoint specs | 10:1 |
| L2: Structure | 2000 lines across 10 modules | 100 lines of architecture | 20:1 |
| L3: Behaviors | 300 lines of algorithm | 60 lines of behavioral spec | 5:1 |
| L4: Edge Cases | 200 lines of validation | 70 lines of constraint spec | 3:1 |

### Real-World Example

**Before: Loading Source Code**

```
Context tokens for the authentication module:
- oauth.ts: 450 lines (~2000 tokens)
- token-manager.ts: 280 lines (~1200 tokens)
- auth-middleware.ts: 180 lines (~800 tokens)
- auth.test.ts: 520 lines (~2300 tokens)
Total: 1430 lines (~6300 tokens)
```

**After: Loading Existing-Fact Spec**

```
Context tokens for the authentication spec:
- ef-auth-oauth-flow.md: 120 lines (~500 tokens)
Total: 120 lines (~500 tokens)
```

**Compression**: 12:1. **Token savings**: ~5800 tokens per session.

## Success Metrics

| Metric | Target | How to Measure |
|--------|--------|----------------|
| Compression ratio (avg) | >10:1 | `code_lines / spec_lines` |
| Spec accuracy | >95% | Human review pass rate |
| Context reduction | >50% | Token usage before/after |
| AI task success rate | +20% | Compare with/without specs |
| Extraction efficiency | <2 hr/module | Time tracking |
| Spec freshness | <30 days | `updated` date monitoring |
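
The first metric is straightforward to compute from a spec's `extracted_from` list. A minimal sketch, assuming Node.js:

```typescript
// Compression ratio as code_lines / spec_lines, per the table above.
import { readFileSync } from "node:fs";

function lineCount(path: string): number {
  return readFileSync(path, "utf8").split("\n").length;
}

function compressionRatio(specPath: string, sourcePaths: string[]): number {
  const codeLines = sourcePaths.reduce((sum, p) => sum + lineCount(p), 0);
  return codeLines / lineCount(specPath);
}

// Example with the authentication spec used above:
// compressionRatio("docs/specs/existing-facts/ef-auth-oauth-flow.md",
//                  ["src/auth/oauth.ts", "src/auth/token-manager.ts"]);
```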

## Anti-Patterns to Avoid

### 1. Extracting Everything

**Problem**: Attempting to spec every line of code.

**Solution**: Focus on high-value, high-change-frequency modules first. Not all code needs specs.

### 2. Specs Without Validation

**Problem**: Generating specs and assuming they're correct.

**Solution**: Every spec requires human review. Confidence levels communicate trustworthiness.

### 3. Stale Specs

**Problem**: Specs drift from the implementation over time.

**Solution**: CI checks, freshness monitoring, and change-triggered re-extraction.

### 4. Over-Detailed Specs

**Problem**: Specs that are as long as the code they describe.

**Solution**: Focus on abstraction. If the compression ratio is below 3:1, the spec is too detailed.

### 5. Ignoring Technical Debt

**Problem**: Extracting specs for code that "shouldn't be this way."

**Solution**: Document known debt in specs. Don't legitimize bad patterns by specifying them.

## Frequently Asked Questions

### When should we extract specs vs. write new ones?

Extract when:

- Code exists but documentation doesn't
- The original developers are unavailable
- You need to understand a legacy system quickly
- You want to reduce context window usage

Write new when:

- Building new features
- Redesigning existing features
- The code doesn't exist yet

### How do we handle code that violates its own patterns?

Document inconsistencies in the spec:

```markdown
## Known Inconsistencies
- Module A uses pattern X
- Module B uses pattern Y for the same purpose
- **Note**: This is legacy debt, not intentional design
```

### Should existing-fact specs live with code or separately?

**Recommended**: A separate `docs/specs/existing-facts/` directory.

Rationale:

- Specs aggregate content from multiple files
- Easier to find and load as context
- Can follow a different review cadence
- Clear distinction from code comments

### How often should we re-extract?

| Code Change Type | Re-extraction Needed |
|------------------|----------------------|
| Bug fix | No (unless it changes behavior) |
| Refactor (same behavior) | Maybe (update file references) |
| Feature addition | Yes (add to spec) |
| Behavior change | Yes (update spec) |
| Major rewrite | Yes (full re-extraction) |
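
For tooling that reacts to commits, the table can be encoded directly. A sketch; the change-type labels and action strings are illustrative:

```typescript
// The re-extraction decision table as a function.
type ChangeType =
  | "bug-fix"
  | "refactor"
  | "feature-addition"
  | "behavior-change"
  | "major-rewrite";

function reExtractionAction(change: ChangeType, behaviorChanged = false): string {
  switch (change) {
    case "bug-fix":
      return behaviorChanged ? "update spec" : "none";
    case "refactor":
      return "update file references";
    case "feature-addition":
    case "behavior-change":
      return "update spec";
    case "major-rewrite":
      return "full re-extraction";
  }
}
```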
