Harness Engineering for Enterprise AI

From sandbox definition to audit evidence

Seven layers. One control loop. ContactLab is not just a sandbox runner — it is the control loop for governed AI agent operations. This is how every run moves from informal usage to traceable, governed execution.

Seven architecture layers

The platform separates the application plane from the agent execution plane. Each layer has defined responsibilities and isolation boundaries. Together, they form the complete governance stack for AI agent operations.

Tenant application plane

Each customer receives isolated application infrastructure: dashboard, governance catalog, settings, users, roles, and usage visibility — all scoped to organizational boundaries. The application plane manages sandbox definitions, team policies, reviewer workflows, and evidence access. It never runs agent code.

Governance catalog

Reusable resources that eliminate the cold-start problem: base images, 600+ skills, prompt templates, 17 managed egress profiles, and connector definitions. Security pre-approves catalog resources. Teams self-serve within governed boundaries. Adoption starts in hours, not weeks.

Sandbox definition layer

A governed profile that defines what an agent can use before execution begins: runtime, base image, tools, skills, prompt templates, managed files, connector integrations, scoped secrets, and approved egress destinations. Every parameter is defined upfront. Nothing is left to runtime discretion.
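
To make the idea concrete, here is a minimal sketch of what such a profile could look like as an immutable record. The class name, field names, and sample values are illustrative assumptions, not the platform's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SandboxDefinition:
    """Hypothetical sandbox profile; every field name here is an assumption."""
    name: str
    runtime: str                       # e.g. "claude-code" or "codex-cli"
    base_image: str
    tools: frozenset = frozenset()
    skills: frozenset = frozenset()
    prompt_templates: frozenset = frozenset()
    managed_files: frozenset = frozenset()
    connectors: frozenset = frozenset()
    secret_scopes: frozenset = frozenset()
    egress_destinations: frozenset = frozenset()

    def permits_tool(self, tool: str) -> bool:
        # Anything not declared up front is refused.
        return tool in self.tools


# Sample values are made up for illustration.
definition = SandboxDefinition(
    name="repo-refactor",
    runtime="claude-code",
    base_image="python-3.12-base",
    tools=frozenset({"git", "pytest"}),
    egress_destinations=frozenset({"pypi.org"}),
)
```

Freezing the record mirrors the "defined upfront" principle: once a run starts, the profile cannot be mutated, and unset fields default to empty rather than to "allow".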

Agent execution plane

A separate execution namespace for AI agents — Claude Code and Codex CLI — fully isolated from the application plane. Agents run in ephemeral runners that exist only for the task duration. The execution plane has no persistent access to data, credentials, or infrastructure beyond what the sandbox definition explicitly permits.

Event and policy pipeline

Execution events, policy decisions, tool invocations, file access, and runtime metadata are streamed through an event-driven pipeline into normalized session history. Every agent action becomes a queryable event. Policy blocks and approval requirements are captured in real time alongside the execution flow.
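
As a rough illustration of normalization, the sketch below maps differently shaped raw events onto one common record so they can be queried uniformly. The raw event keys, the `SessionEvent` fields, and the `normalize` function are assumptions made for this example, not the platform's real event format.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SessionEvent:
    """Hypothetical normalized event record."""
    session_id: str
    sequence: int
    kind: str                        # tool_invocation, file_access, policy_decision, ...
    subject: str
    decision: Optional[str] = None   # only set for policy decisions


def normalize(session_id: str, sequence: int, raw: dict) -> SessionEvent:
    """Map a raw runner event (assumed shapes) onto the common schema."""
    if "command" in raw:
        return SessionEvent(session_id, sequence, "tool_invocation", raw["command"])
    if "path" in raw:
        return SessionEvent(session_id, sequence, "file_access", raw["path"])
    if "policy" in raw:
        # A policy event without an explicit decision is treated as denied.
        return SessionEvent(session_id, sequence, "policy_decision",
                            raw["policy"], raw.get("decision", "denied"))
    # Anything else is kept as runtime metadata rather than dropped.
    return SessionEvent(session_id, sequence, "runtime_metadata",
                        json.dumps(raw, sort_keys=True))


raw_stream = [
    {"command": "pytest -q"},
    {"path": "src/app.py"},
    {"policy": "egress:example.com", "decision": "denied"},
]
history = [normalize("run-001", i, raw) for i, raw in enumerate(raw_stream)]
blocked = [event for event in history if event.decision == "denied"]
```

Once every action shares one shape, a question such as "which policy blocks occurred in this session" becomes a simple filter over the history.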

Evidence layer

Artifacts, logs, manifests, diffs, audit events, and review outcomes are retained in tenant-scoped storage with configurable retention. Evidence survives the ephemeral runner. Reviewers see what changed, what policies applied, and who approved. Auditors get structured records without disrupting workflows.
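
One way to picture a tenant-scoped evidence record with configurable retention is a small manifest that lists each artifact and an expiry date. This is a sketch under assumptions: the manifest fields, the `build_manifest` helper, and the use of SHA-256 digests to identify artifacts are illustrative choices, not a description of how the platform stores evidence.

```python
import hashlib
from datetime import datetime, timedelta, timezone


def build_manifest(tenant: str, session_id: str, artifacts: dict,
                   retention_days: int, created_at: datetime) -> dict:
    """Build a hypothetical evidence manifest for one run.

    `artifacts` maps a file name to its bytes. Digests are an assumed way
    to let a reviewer confirm an artifact has not changed since capture.
    """
    expires_at = created_at + timedelta(days=retention_days)
    return {
        "tenant": tenant,
        "session_id": session_id,
        "created_at": created_at.isoformat(),
        "expires_at": expires_at.isoformat(),
        "artifacts": {
            name: hashlib.sha256(data).hexdigest()
            for name, data in sorted(artifacts.items())
        },
    }


manifest = build_manifest(
    tenant="acme",
    session_id="run-001",
    artifacts={"diff.patch": b"--- a/app.py\n+++ b/app.py\n"},
    retention_days=90,
    created_at=datetime(2025, 1, 1, tzinfo=timezone.utc),
)
```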

Data flow: from agent run to evidence trail

Every governed run follows the same event-driven flow. The agent executes inside boundaries. Events are normalized in real time. Evidence is stored. The reviewer decides.

1. Sandbox definition loaded from catalog
2. Ephemeral runner provisioned
3. Agent executes inside boundaries
4. Events normalized in real time
5. Policy blocks and approvals captured
6. Artifacts captured and stored
7. Runner destroyed
8. Evidence retained for review
9. Usage metrics and audit trail updated
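
The ordering of the steps above can be sketched with a context manager that guarantees the runner is torn down even if the agent fails, with artifacts copied out before teardown. All names here (`ephemeral_runner`, `governed_run`, the lifecycle labels) are hypothetical and exist only to illustrate the sequence.

```python
from contextlib import contextmanager


@contextmanager
def ephemeral_runner(definition_name: str, lifecycle: list):
    """Provision a stand-in runner and always destroy it afterwards."""
    lifecycle.append(f"provision:{definition_name}")
    runner = {"definition": definition_name, "artifacts": []}
    try:
        yield runner
    finally:
        runner.clear()               # no runtime state survives the run
        lifecycle.append("destroy")


def governed_run(definition_name: str, agent_task) -> dict:
    lifecycle = [f"load:{definition_name}"]
    events, artifacts = [], []
    with ephemeral_runner(definition_name, lifecycle) as runner:
        for event in agent_task(runner):      # events arrive as the agent works
            events.append(event)
        lifecycle.append("events_recorded")
        artifacts = list(runner["artifacts"])  # copy out before teardown
        lifecycle.append("artifacts_captured")
    lifecycle.append("evidence_retained")
    lifecycle.append("usage_updated")
    return {"events": events, "artifacts": artifacts, "lifecycle": lifecycle}


def sample_task(runner):
    """A made-up agent task that emits two events and one artifact."""
    yield {"kind": "tool_invocation", "subject": "pytest -q"}
    runner["artifacts"].append("diff.patch")
    yield {"kind": "policy_decision", "subject": "egress:example.com",
           "decision": "denied"}


result = governed_run("repo-refactor", sample_task)
```

The `finally` clause is the important design choice: teardown does not depend on the agent behaving well, and only the copied artifacts and recorded events outlive the runner.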

What gets captured

Session metadata: agent type, sandbox template, timestamp, duration, effort level. Tool invocations: every file read, every command executed, every network request. Policy decisions: what was allowed, what was denied, what triggered a violation, what required human approval. Artifacts: diffs, manifests, generated files. Review outcomes: approval, rejection, escalation. Usage signals: token consumption, cost metrics, team activity.

What does not persist

The ephemeral runner is destroyed after execution. No residual credentials. No persistent access. No cross-session contamination. Only artifacts and audit-relevant output survive in tenant-scoped storage. The evidence trail is complete without retaining any runtime state.

Security model

Execution isolation

Every agent run is treated as an untrusted workload. The execution plane is separate from the application plane. Runners are ephemeral — provisioned, used, destroyed. No agent code runs on shared infrastructure. No persistent state between runs. No cross-tenant access at any layer.

Network and credential boundaries

Default-deny networking. Scoped cloud identity per workload. Secrets injected only during execution and never persist beyond the session. Artifact-only retention — no runtime state survives. The security model assumes the agent is untrusted and enforces boundaries at the platform level, not at the agent level.
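
Default-deny means a destination is refused unless it is explicitly approved. The sketch below shows that check for outbound URLs using exact host matching; the `egress_allowed` function and the sample host list are assumptions for illustration, and a real enforcement point would sit at the network layer rather than in application code.

```python
from urllib.parse import urlsplit


def egress_allowed(url: str, approved_hosts: frozenset) -> bool:
    """Return True only when the URL's host is on the approved list."""
    try:
        host = urlsplit(url).hostname   # lowercased by urlsplit, None if absent
    except ValueError:
        return False                    # malformed URLs are denied
    # Exact match only: "pypi.org.evil.example" does not match "pypi.org".
    return host is not None and host in approved_hosts


approved = frozenset({"pypi.org", "api.github.com"})

allowed = egress_allowed("https://pypi.org/simple/requests/", approved)
lookalike = egress_allowed("https://pypi.org.evil.example/", approved)
unlisted = egress_allowed("https://example.com/", approved)
```

Here `allowed` is `True`, while `lookalike` and `unlisted` are both `False`, because nothing passes unless it matches an approved host exactly.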

Human control layer

Governance is not just isolation — it's control. Approval queues, cancellation controls, reviewer workflows, and role-based access ensure that humans stay in the loop where it matters.

Approval queues

Route sensitive execution steps through human approval before the agent continues. Security and platform teams define what requires review. No agent action bypasses your approval workflow.
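
A minimal sketch of such a gate, assuming a simple in-memory queue: sensitive actions pause as pending until a named reviewer decides, and everything else proceeds. The `ApprovalQueue` class, its methods, and the sample action are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class ApprovalQueue:
    """Hypothetical approval gate for actions marked as sensitive."""
    sensitive_actions: frozenset
    status: dict = field(default_factory=dict)      # request_id -> Status
    decided_by: dict = field(default_factory=dict)  # request_id -> reviewer

    def may_proceed(self, request_id: str, action: str) -> bool:
        if action not in self.sensitive_actions:
            return True
        # First sight of a sensitive action files a pending request.
        current = self.status.setdefault(request_id, Status.PENDING)
        return current is Status.APPROVED

    def decide(self, request_id: str, reviewer: str, approve: bool) -> None:
        if self.status.get(request_id) is not Status.PENDING:
            raise ValueError("no pending request with that id")
        self.status[request_id] = Status.APPROVED if approve else Status.REJECTED
        self.decided_by[request_id] = reviewer


queue = ApprovalQueue(sensitive_actions=frozenset({"git push"}))
before = queue.may_proceed("req-1", "git push")   # pending, so blocked
queue.decide("req-1", reviewer="alice", approve=True)
after = queue.may_proceed("req-1", "git push")    # approved by alice
```

Recording the reviewer alongside the decision keeps each approval attributable, which is what the evidence trail relies on.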

Cancellation controls

Cancel runaway runs in real time. Keep execution inside predefined boundaries. Monitor live run status, tool activity, and policy blocks. Intervene when behavior deviates from expectations.

Role-based access

Control who can define sandboxes, launch runs, approve actions, and review evidence. Tenant-scoped login, permissions, and user administration. Every action is attributable to a named user with defined authority.
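
The sketch below illustrates a tenant-scoped permission check that returns an attributable decision record. The role names, the permission table, and the `authorize` function are assumptions chosen for the example; the four actions come from the description above.

```python
from dataclasses import dataclass

# Hypothetical role-to-permission table.
ROLE_PERMISSIONS = {
    "platform_admin": frozenset({"define_sandbox", "launch_run",
                                 "approve_action", "review_evidence"}),
    "developer": frozenset({"launch_run"}),
    "reviewer": frozenset({"approve_action", "review_evidence"}),
    "auditor": frozenset({"review_evidence"}),
}


@dataclass(frozen=True)
class User:
    name: str
    tenant: str
    role: str


def authorize(user: User, action: str, resource_tenant: str) -> dict:
    """Allow only same-tenant actions granted to the user's role."""
    allowed = (
        user.tenant == resource_tenant
        and action in ROLE_PERMISSIONS.get(user.role, frozenset())
    )
    # The returned record names the user, so the decision is attributable.
    return {"user": user.name, "action": action,
            "tenant": resource_tenant, "allowed": allowed}


dana = User(name="dana", tenant="acme", role="developer")
launch = authorize(dana, "launch_run", "acme")
approve = authorize(dana, "approve_action", "acme")
cross_tenant = authorize(dana, "launch_run", "globex")
```

In this example `launch` is allowed, while `approve` and `cross_tenant` are denied: the first because the role lacks the permission, the second because the resource belongs to another tenant.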

Discuss your architecture requirements

Book a 30-minute architecture review. We'll walk through the reference architecture, map it to your environment, and identify the governance gaps that matter most. Whether your teams are in engineering, legal, finance, or operations, the architecture scales.