Harness Engineering

This proposal introduces harness engineering as a first-class discipline within the AI Workflow Conduction framework. A harness is everything around an LLM that is not the model itself: the tools it can call, the loop that decides when to stop, the memory it keeps between steps, the guardrails that block dangerous actions, and the sensors that verify its output. The working identity is:

Agent = Model + Harness

Core Insight: As frontier models converge in raw capability, the harness around them becomes the differentiator. A capable model with a poor harness stalls, picks the wrong tool, or runs past its permission boundary. A well-designed harness gives the same model dramatically higher reliability on the same task.

Problem Statement

The Ceiling of "More Context"

Context §1.2 already establishes that specification chaos, review velocity gaps, and role confusion limit AI effectiveness. The response across AI-First Context Infrastructure and Agent-Friendly Knowledge Base has been to give agents better context.

Better context raises the floor. It does not fix every failure mode.

A capable model with a full, well-curated context still fails in ways that context alone cannot address:

Failure mode	Symptom	Why context alone will not fix it
Agent stall	Loop never terminates; agent repeats the same tool call	The harness, not the context, controls loop termination
Wrong tool	Agent picks a generic tool when a specific one exists	Tool granularity is a harness design choice
Permission breach	Agent takes a destructive action without confirmation	Permission boundaries live in the harness, not the prompt
Silent drift	Output looks plausible; nobody catches the regression	Sensors (tests, reviewers) are a harness concern
Over-long session	Context window exhausted mid-task	Memory management and summarization are harness features
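The first row can be made concrete. The following is a minimal sketch, not a reference implementation: `run_loop`, `next_call`, the `(tool, arguments)` tuple shape, and both limits are hypothetical names chosen for illustration. It shows that termination is decided by code around the model, using a step budget and a check for identical consecutive tool calls.

```python
from collections import deque
from typing import Callable, Optional

# Hypothetical shape for one tool call: (tool name, serialized arguments).
ToolCall = tuple[str, str]


def run_loop(next_call: Callable[[], Optional[ToolCall]],
             max_steps: int = 20,
             repeat_limit: int = 3) -> str:
    """Drive an agent until it finishes, stalls, or exhausts its step budget."""
    recent: deque = deque(maxlen=repeat_limit)
    for _ in range(max_steps):
        call = next_call()
        if call is None:  # the model signalled that it is done
            return "done"
        recent.append(call)
        # Stall: the last `repeat_limit` calls are all identical.
        if len(recent) == repeat_limit and len(set(recent)) == 1:
            return "stalled"
    return "budget_exhausted"
```

No amount of extra context changes either exit path; both live entirely in the harness.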

Three Paradigms, Not One

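Prompt engineering, context engineering, and harness engineering address three different failure surfaces: how the model is asked, what the model knows, and what the system around the model lets it do and verifies afterward. They compound; they do not replace each other.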

Knowledge compounds across layers. Yesterday's prompt techniques live inside today's SDKs. Today's context expertise informs tomorrow's harness design. A team jumping straight to "adopt harness engineering" without a working context layer will hit the same problems the context-engineering proposals were written to solve.

Definition

Agent Equals Model Plus Harness

Martin Fowler frames the identity directly in Harness engineering for coding agent users:

"Harness engineering refers to everything in an AI coding agent except the model itself."

Parallel Web Systems extends the definition beyond coding:

"The harness is what connects an AI model to the outside world, enabling it to use tools, remember information between steps, and interact with complex environments."

The harness has five observable components:

Tools — the callable surface the model reaches through. Coarse-grained or fine-grained, domain-general or domain-specific.
Loop control — stop conditions, escalation triggers, budget limits, multi-agent coordination.
Memory and state — working context, session log, long-term memory, summarization and retrieval.
Guardrails — permission boundaries, schema validation, safety filters.
Sensors — tests, linters, type checkers, review agents, runtime monitors that observe output after the agent acts.
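One way to picture how the five components fit together is a single container object. This is a sketch under stated assumptions: the `Harness` class, its field names, and the returned strings are all hypothetical and do not describe any particular runtime.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Harness:
    """Hypothetical container mirroring the five components listed above."""
    tools: dict[str, Callable[[str], str]]                    # Tools
    max_steps: int = 10                                       # Loop control
    memory: list[str] = field(default_factory=list)           # Memory and state
    guardrails: list[Callable[[str, str], bool]] = field(default_factory=list)
    sensors: list[Callable[[str], bool]] = field(default_factory=list)

    def call(self, tool: str, arg: str) -> str:
        """Run one tool call through guardrails, then sensors, and log it."""
        if len(self.memory) >= self.max_steps:
            return "stopped: step budget exhausted"
        # Guardrails act before the tool runs; any refusal blocks the call.
        if not all(allow(tool, arg) for allow in self.guardrails):
            self.memory.append(f"blocked {tool}")
            return "blocked by guardrail"
        output = self.tools[tool](arg)
        # Sensors inspect the output after the tool has run.
        passed = all(check(output) for check in self.sensors)
        self.memory.append(f"{tool} -> {'ok' if passed else 'flagged'}")
        return output if passed else "flagged by sensor"
```

The model appears nowhere in this class, which is the point of the identity above: everything here is harness.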

Guides and Sensors

Fowler splits a harness into two control types based on when they act:

Guides shape output before the agent acts. They raise the probability of a good first attempt.

Sensors observe output after the agent acts. They catch bad attempts and feed the signal back into the loop.
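The timing difference between the two control types can be sketched in a few lines. The function below is illustrative only: `attempt_with_feedback`, `generate`, and `sensor` are hypothetical names, and `generate` stands in for whatever call produces the agent's output.

```python
from typing import Callable, Optional


def attempt_with_feedback(generate: Callable[[str], str],
                          sensor: Callable[[str], Optional[str]],
                          guide: str,
                          task: str,
                          max_attempts: int = 3) -> tuple[Optional[str], int]:
    """Return (accepted output or None, number of attempts used)."""
    prompt = f"{guide}\n\n{task}"  # the guide acts before the first attempt
    for attempt in range(1, max_attempts + 1):
        output = generate(prompt)
        problem = sensor(output)   # the sensor acts after the attempt
        if problem is None:
            return output, attempt
        # Feed the sensor's finding back into the next attempt.
        prompt = f"{prompt}\n\nPrevious attempt failed: {problem}"
    return None, max_attempts
```

A sensor that only reports to a dashboard would stop at the `sensor(output)` line; writing the finding back into the prompt is what closes the loop.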

Computational and Inferential Controls

A second axis cuts across Guides and Sensors: how the control itself runs.

Type	Latency	Cost	Determinism	Examples
Computational	Milliseconds to seconds	Near zero	Deterministic	Linters, type checkers, unit tests, schema validators
Inferential	Seconds to minutes	LLM call	Non-deterministic	Review agents, LLM-as-judge, semantic-drift detectors

Computational controls catch mechanical problems reliably. Inferential controls add semantic judgment where mechanical rules cannot express intent. A mature harness uses both.
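Because computational controls are cheap and deterministic, a reasonable ordering is to run them first and spend an inferential call only on output that survives. The sketch below assumes a hypothetical output format (a JSON object with a `summary` key) and a hypothetical `judge` callable standing in for an LLM-as-judge call.

```python
import json
from typing import Callable


def check_output(output: str, judge: Callable[[str], bool]) -> str:
    """Run the cheap deterministic check first, the costly judge second."""
    # Computational control: the output must be a JSON object with "summary".
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return "rejected: not valid JSON"
    if not isinstance(data, dict) or "summary" not in data:
        return "rejected: missing 'summary'"
    # Inferential control: only reached when the mechanical checks pass.
    return "accepted" if judge(data["summary"]) else "rejected: judge"
```

Malformed output never reaches the judge, so the expensive, non-deterministic control is reserved for the semantic questions that a schema check cannot answer.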

Operating System Analogy

A common framing, borrowed from the Chinese-language analysis by KodeLAB, maps the parts of an agent onto the parts of a computer:

Model is the CPU — raw computation.
Context window is RAM — working memory, bounded.
Harness is the operating system — schedules work, manages resources, mediates access to tools and memory.

A naked LLM is a CPU without an OS. It can compute. It cannot do useful work on its own.
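The "bounded RAM" part of the analogy implies that the harness, like an operating system, has to reclaim working memory. A minimal sketch of that idea follows. It assumes the budget is counted in history entries rather than tokens, and `compact` and `summarize` are hypothetical names; `summarize` would typically be a model call in practice.

```python
from typing import Callable


def compact(history: list[str],
            budget: int,
            keep_recent: int,
            summarize: Callable[[list[str]], str]) -> list[str]:
    """Collapse older entries into one summary once history exceeds budget."""
    if len(history) <= budget:
        return history
    split = max(len(history) - keep_recent, 0)
    older, recent = history[:split], history[split:]
    if not older:  # nothing old enough to summarize
        return history
    # Replace everything before `split` with a single summary entry.
    return [summarize(older)] + recent
```

For example, with `budget=3` and `keep_recent=2`, a four-entry history becomes one summary entry followed by the two most recent entries.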

Proposed Solution

Adopt a shared organizational harness layered over the model. Three layers:

Layer 1: Base Harness

A standard agent runtime configuration shared across all teams.

Element	Recommended Default
Agent runtime	Claude Code (or an equivalent with skills, hooks, MCP support)
Permission defaults	Minimal — no write or execute without explicit allow-list
Hooks	Pre-tool-use hook enforcing the project `AGENTS.md` boundary
Settings	Committed `.claude/settings.json` per project
Model	Latest generally available frontier model, unless a project pins otherwise

The point of Layer 1 is not to pick the right runtime once and stop. It is to make the choice explicit and the configuration shared, so that every team inherits the same defaults and diverges only with justification.
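The "minimal" permission default in the table can be read as a deny-by-default rule. The sketch below expresses that rule as a plain function. It is not the hook API of Claude Code or any other runtime: `decide`, the three action labels, and the tool names in the usage note are hypothetical, and treating reads as always allowed is an assumption drawn from the table's wording.

```python
from typing import Literal

# Hypothetical classification of what a tool call does.
Action = Literal["read", "write", "execute"]


def decide(tool: str,
           action: Action,
           allow_list: frozenset = frozenset()) -> str:
    """Deny write and execute actions unless the tool is explicitly allow-listed."""
    if action == "read":
        return "allow"  # assumption: read-only calls need no allow-list entry
    return "allow" if tool in allow_list else "deny"
```

With an empty allow-list, `decide("git_push", "execute")` returns `"deny"`; a project diverges from the default only by adding an entry, which keeps the justification visible in review.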

Layer 2: Organizational Guides and Sensors

The existing proposals in this chapter are the organization's guides and sensors. Harness engineering is the framing that ties them together.

Role	Component	Existing Proposal
Guide	AI-accessible context surface	AI-First Context Infrastructure
Guide	Markdown knowledge base	Agent-Friendly Knowledge Base
Guide	Shared Claude Skills	Claude Skills Adoption
Guide	Specification retrieval	Internal Spec Platform
Guide	Terminology constraints	Ubiquitous Language
Guide	Existing-fact specs	Spec Extraction
Guide	Requirement source of truth	Global Requirement Store
Guide	Component inventory	Design System
Guide	Component ownership model	shadcn/ui Foundation
Guide	Spec hierarchy	Multi-Product Spec Management
Guide	Document graph	Frontmatter Spec Coordination
Guide	Project AI guidance file	CLAUDE.md Standards (planned)
Sensor	Linters and type checkers	Tooling baseline
Sensor	Continuous cleanup review	Continuous Context Cleanup
Sensor	Tech stack alignment	Tech Radar and Roadmaps
Loop Control	AI-first decision points	AI-First Decision Making
Loop Control	Elaboration sessions	AI-DLC Mob Elaboration

Layer 3: The Steering Loop

When the same failure mode recurs, iterate the harness, not the prompt.

Diagnostic rubric:

Symptom	Likely layer
Wrong output shape, formatting drift	Prompt
Missing fact, outdated reference	Context
Agent stall, wrong tool, permission breach, silent regression	Harness
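The rubric is small enough to encode directly. The following sketch assumes issues are tagged with the symptom strings from the table; `RUBRIC`, `layers_to_revisit`, and the recurrence threshold are hypothetical names and values.

```python
from collections import Counter

# Symptom-to-layer mapping taken from the diagnostic rubric above.
RUBRIC = {
    "wrong output shape": "prompt",
    "formatting drift": "prompt",
    "missing fact": "context",
    "outdated reference": "context",
    "agent stall": "harness",
    "wrong tool": "harness",
    "permission breach": "harness",
    "silent regression": "harness",
}


def layers_to_revisit(symptoms: list[str], min_recurrences: int = 2) -> dict[str, str]:
    """Map each recurring symptom to the layer the rubric says to change."""
    counts = Counter(s.strip().lower() for s in symptoms)
    return {symptom: RUBRIC.get(symptom, "unclassified")
            for symptom, n in counts.items() if n >= min_recurrences}
```

A one-off symptom is ignored; only recurrence triggers a change, which matches the rule that the loop steers on repeated failure modes rather than on single incidents.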

The steering loop has a named owner. Harness changes go through a lightweight review, the same as any other infrastructure change.

Implementation Roadmap

Four phases, staged to avoid the "shipped the harness, nobody uses it" failure.

Phase 1: Baseline the Harness

Deliverables:

Standard .claude/settings.json committed in a reference repository.
AGENTS.md template published.
Permission hook blocking unauthorized write or execute actions.
One-page "What runs on your machine" doc for every engineer.

Exit criteria:

Every active project has an AGENTS.md file.
Default permission boundary is enforced by a pre-tool-use hook.
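The first exit criterion is mechanically checkable. As a sketch, assuming every active project is an immediate subdirectory of one checkout root (an assumption about repository layout, not something the proposal specifies), a CI step could list the projects that still lack the file. `projects_missing_agents_md` is a hypothetical helper name.

```python
from pathlib import Path


def projects_missing_agents_md(root: Path) -> list[str]:
    """Names of immediate subdirectories of `root` that lack an AGENTS.md file."""
    return sorted(project.name
                  for project in root.iterdir()
                  if project.is_dir() and not (project / "AGENTS.md").is_file())
```

An empty list means the criterion is met; a non-empty list gives the phase owner a concrete follow-up queue.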

Phase 2: Seed the Guides

Deliverables:

Shared skill library (see Claude Skills Adoption).
Knowledge base migrated to Git-backed Markdown (see Agent-Friendly Knowledge Base).
Ubiquitous language glossary published (see Ubiquitous Language).

Exit criteria:

Two or more teams are consuming shared skills.
Agents can retrieve domain knowledge without manual paste.

Phase 3: Seed the Sensors

Deliverables:

CI-integrated linters and type checkers on every repository.
LLM-as-judge review pipeline for PRs above a size threshold.
Continuous context cleanup process running (see Continuous Context Cleanup).

Exit criteria:

Sensor signal is written back into the agent loop, not only surfaced on human dashboards.
At least one class of regression has been caught by sensors pre-merge.

Phase 4: Institutionalize the Steering Loop

Deliverables:

Named owner for the organizational harness.
Monthly harness retro reviewing recurring failure modes.
Documented cycle-time target, measured from the issue being opened to the harness change being merged.

Exit criteria:

Harness changes are tracked in the same backlog as product work.
Recurrence rate of named failure modes is trending down month over month.

Success Metrics

Metric	Target	How to Measure
Recurrence rate of named failure modes	Declining month over month	Track tagged issues per failure class
Skill reuse rate across teams	> 50% of active skills used by more than one team	Skill invocation logs
Agent sessions requiring human rescue	< 10% of sessions per week	Session telemetry or self-report
Issue-to-harness-update cycle time	< 1 sprint (median)	Timestamp from issue open to harness change merged
Shared AGENTS.md adoption	100% of active projects	File presence check in CI
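Three of these metrics reduce to simple arithmetic once the raw counts exist. The helpers below are a sketch of one possible reading of the targets. The function names and input shapes are hypothetical, and "declining" is interpreted strictly (every month lower than the one before), which a team may want to relax.

```python
def is_declining(monthly_counts: list[int]) -> bool:
    """True when each month has strictly fewer tagged issues than the previous one."""
    if len(monthly_counts) < 2:
        return False  # a trend needs at least two data points
    return all(later < earlier
               for earlier, later in zip(monthly_counts, monthly_counts[1:]))


def skill_reuse_rate(teams_by_skill: dict[str, set[str]]) -> float:
    """Share of active skills (used by at least one team) used by more than one team."""
    active = [teams for teams in teams_by_skill.values() if teams]
    if not active:
        return 0.0
    return sum(len(teams) > 1 for teams in active) / len(active)


def rescue_rate(sessions: int, rescued: int) -> float:
    """Fraction of agent sessions in a week that needed human rescue."""
    return rescued / sessions if sessions else 0.0
```

For instance, three rescued sessions out of forty gives a rate under the 10% target, while two active skills of which one is shared gives a reuse rate of exactly 0.5, which does not clear the "more than 50%" bar.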

Anti-Patterns

These are failure patterns observed in teams adopting agent frameworks without a harness discipline.

Anti-pattern	What it looks like	Why it backfires
Prompt stuffing	Every new failure is answered with more text in the system prompt	Prompts grow unreadable; context budget shrinks; root cause is usually a missing tool or sensor
Context bloat	Every new failure is answered with more documents piped into context	Signal-to-noise drops; model output quality declines
Harness sprawl	Multiple competing skills, hooks, or MCP servers that overlap	Agents pick the wrong one; maintenance burden compounds
Orphan harness	Harness exists, no named owner, nobody updates it	Drift accumulates silently; teams quietly stop using it
Single-layer thinking	Treating harness as a replacement for context engineering	Missing knowledge still produces wrong code; the three layers compound and do not substitute for one another

CLAUDE.md Integration

A project's CLAUDE.md (or AGENTS.md) is the primary Guide at the project level. At minimum it declares:

Which skills are in scope for this project.
Which tools the agent is permitted to invoke without confirmation.
Where to find the project's specification, design system, and knowledge base.
What the steering-loop owner expects to be notified about.

See CLAUDE.md Standards (planned) for the full schema.
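Until that schema exists, the four minimum declarations can still be checked mechanically. The sketch below assumes the guidance file has already been parsed into a dictionary. The key names are hypothetical placeholders for the four items above and are not the planned schema.

```python
# Hypothetical keys, one per minimum declaration listed above.
REQUIRED_DECLARATIONS = (
    "skills_in_scope",
    "tools_without_confirmation",
    "reference_locations",
    "owner_notifications",
)


def missing_declarations(guide: dict) -> list[str]:
    """Return the required declarations that are absent or empty."""
    return [key for key in REQUIRED_DECLARATIONS if not guide.get(key)]
```

An empty value counts as missing, so a project cannot satisfy the check by adding the keys without filling them in.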

Related Proposals

Role in Harness	Proposal
Guide	AI-First Context Infrastructure
Guide	Agent-Friendly Knowledge Base
Guide	Claude Skills Adoption
Guide	Internal Spec Platform
Guide	Ubiquitous Language
Guide	Spec Extraction
Guide	Global Requirement Store
Guide	Design System
Guide	shadcn/ui Foundation
Guide	Multi-Product Spec Management
Guide	Frontmatter Spec Coordination
Sensor	Continuous Context Cleanup
Sensor	Tech Radar and Roadmaps
Loop Control	AI-First Decision Making
Loop Control	AI-DLC Mob Elaboration

References

Martin Fowler, Harness engineering for coding agent users. https://martinfowler.com/articles/exploring-gen-ai/harness-engineering.html
Parallel Web Systems, What is an agent harness in the context of large-language models? https://parallel.ai/articles/what-is-an-agent-harness
KodeLAB, Harness Engineering: AI Agent 從提示詞工程、上下文工程演進的新顯學 [Harness Engineering: The New Discipline for AI Agents, Evolved from Prompt Engineering and Context Engineering]. https://klab.tw/2026/04/from-prompt-to-harness-engineering/
ABMedia, Harness Engineering 是什麼? AI 的下一個戰場不是模型，而是模型外面的那層架構 [What Is Harness Engineering? AI's Next Battleground Is Not the Model but the Architecture Layer Around It]. https://abmedia.io/harness-engineering-ai-agent-framework-explained
awesome-harness-engineering. https://github.com/ai-boost/awesome-harness-engineering
YouTube talk, Harness Engineering: 有時候語言模型不是不夠聰明，只是沒有人類好好引導 [Harness Engineering: Sometimes the Language Model Is Smart Enough and Simply Lacks Good Human Guidance]. https://www.youtube.com/watch?v=R6fZR_9kmIw
Anthropic, Effective Context Engineering for AI Agents. https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents
