Lab Notes
May 2026 · agents · policy

The Control Boundary for Agent Systems

Published by Membrane Labs

Most people are still thinking about AI safety at the wrong boundary.

They ask: Did the model say something bad?

But agents create a different problem: Did the agent do something bad?

That shift matters. An LLM response is text. An agent action can be:

  • send an email
  • refund a customer
  • run SQL
  • write a file
  • call a model
  • read memory
  • query a vector store
  • invoke an MCP tool
  • open a pull request
  • run shell commands
  • use a credential
  • delegate to another agent

So the control boundary should not sit only around prompts and outputs. It should sit around capability use. The simplest useful model has five primitives:

  • Capability
  • AgentAction
  • PolicyContext
  • Policy
  • Decision

In compressed form:

Capability + AgentAction + PolicyContext → Policy → Decision

That is the policy-control loop.

A capability is what the agent can use.
An AgentAction is one attempted use of that capability.
PolicyContext is the policy author's view of the action.
Policy is the rule system that evaluates that context.
Decision is what the runtime enforces.

The question is not: Can this tool ever be called?

The real question is: Can this agent, acting for this principal, in this tenant, in this environment, use this capability, with these arguments, right now?

That is the boundary agent systems need.


Capability Is The Governed Thing

A capability is anything an agent can use. Not just a tool.

A capability can be:

  • tool.refund_customer
  • tool.send_email
  • model.gpt_4
  • database.customer_readonly
  • memory.support_cases
  • retriever.customer_docs
  • mcp.github.create_pr
  • sandbox.python_network_off
  • secret.stripe_api_key
  • agent.billing_specialist

The mental shift is simple: Do not give agents arbitrary tools. Give them governed capabilities.

A capability describes the thing behind the door:

  • name
  • type
  • risk
  • tenant
  • environment
  • required scopes
  • data namespace
  • primary effect
  • side effects

Example:

refund_customer
type: tool
risk: high
primary effect: financial state change
required scope: refunds:create
data namespace: billing.refunds

That metadata matters.

Without this layer, an agent just calls code.
With this layer, the runtime knows: This is a high-risk financial capability. This touches billing state. This requires delegated authority. This may need approval.

The metadata turns a function into a security object.

That is the first primitive.
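
As a rough illustration, the refund_customer example above could be written down as a small record. This is a minimal sketch, not Brane's actual schema: the Capability class and its field names are assumptions chosen to mirror the metadata list in this section.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Capability:
    """Hypothetical capability record; fields mirror the metadata list above."""
    name: str
    type: str                                # e.g. "tool", "model", "secret"
    risk: str                                # e.g. "low", "medium", "high"
    primary_effect: str
    required_scopes: tuple[str, ...] = ()
    data_namespace: Optional[str] = None
    side_effects: tuple[str, ...] = ()
    tenant: Optional[str] = None             # None: not bound to one tenant
    environment: Optional[str] = None        # None: not bound to one environment


# The refund_customer example from the text, expressed as data.
refund_customer = Capability(
    name="refund_customer",
    type="tool",
    risk="high",
    primary_effect="financial state change",
    required_scopes=("refunds:create",),
    data_namespace="billing.refunds",
)

print(refund_customer.risk, refund_customer.required_scopes)
```

Keeping the record immutable is a design choice in this sketch: the runtime reads capability metadata on every action, so it should not change underneath a running policy.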


Effects And Side Effects Matter

A capability has an effect. It may also have side effects.

The effect is the primary expected outcome.
The side effects are secondary consequences.

Example:

send_email
primary effect: send a message
side effects:
  create CRM activity
  notify customer
  expose human-visible content

Example:

run_shell
primary effect: execute command
side effects:
  write files
  access network
  consume compute

This distinction matters because policy does not always need to say allow or deny.

Sometimes policy should say: Allow the main effect, but disable this side effect.

send_email(..., log_to_crm=false)
run_code(..., network=false)
search(..., persist_history=false)

That is a better control surface.
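
One way to picture this: the runtime rewrites the call arguments before the tool runs. The sketch below is an assumption about how that could look. The SIDE_EFFECT_SWITCHES table and the disable_side_effects helper are hypothetical names; only the flags (log_to_crm, network, persist_history) come from the examples above.

```python
# Hypothetical mapping: side effect -> (argument name, value that disables it).
SIDE_EFFECT_SWITCHES = {
    "create CRM activity": ("log_to_crm", False),
    "access network": ("network", False),
    "persist search history": ("persist_history", False),
}


def disable_side_effects(args: dict, blocked: list) -> dict:
    """Return a copy of args with each blocked side effect switched off."""
    rewritten = dict(args)  # never mutate the caller's arguments
    for effect in blocked:
        arg_name, off_value = SIDE_EFFECT_SWITCHES[effect]
        rewritten[arg_name] = off_value
    return rewritten


original = {"to": "customer@example.com", "body": "Your refund is on its way."}
safe = disable_side_effects(original, blocked=["create CRM activity"])

# The main effect (sending the email) is untouched; only the CRM logging is off.
print(safe)
```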

The runtime should not only know what tool is being called.
It should know what the capability does. And what else it might cause.


AgentAction Is The Attempted Use

A capability is what exists.
An AgentAction is one attempted use of it.

That distinction matters.

The question is not: Can refund_customer ever be called?

The real question is: Can this agent, acting for this user (principal), in this tenant, in prod, refund this amount, right now?

That requires an action record. An AgentAction should answer:

  • Which agent?
  • Acting for which principal?
  • For which tenant?
  • In which environment?
  • Using which capability?
  • With what input?
  • Inside which trace?

Example:

support-agent
acting for user_123
in tenant_acme
in prod
wants to use refund_customer
with amount=249

That is much more useful than a tool name. Policy cannot make good decisions from:

tool_name = refund_customer

It needs the attempted use.

The action is the thing the runtime can inspect, authorize, modify, audit, or block.

That is the second primitive.
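
The questions above map naturally onto a record with one field per question. The following is a minimal sketch under that assumption. The AgentAction field names and the generated trace_id are illustrative, not a fixed format.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class AgentAction:
    """Hypothetical record of one attempted capability use."""
    agent_id: str                  # which agent?
    principal_id: str              # acting for which principal?
    tenant_id: str                 # for which tenant?
    environment: str               # in which environment?
    capability: str                # using which capability?
    input: dict[str, Any]          # with what input?
    # Inside which trace? Generated here only so the sketch is self-contained.
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)


action = AgentAction(
    agent_id="support-agent",
    principal_id="user_123",
    tenant_id="tenant_acme",
    environment="prod",
    capability="refund_customer",
    input={"amount": 249},
)

print(action.capability, action.input["amount"], action.environment)
```

In a real runtime the trace ID would more likely be inherited from the surrounding request than minted per action.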


PolicyContext Is The Policy Author's View

PolicyContext is not the same thing as raw action data. It is the policy author's view of the action.

That difference matters.

The runtime may have a large internal record:

  • trace IDs
  • tool arguments
  • agent metadata
  • tenant metadata
  • workflow state
  • identity claims
  • credential bindings
  • environment flags
  • risk scores
  • side-effect metadata

The policy author should not have to deal with all of that directly.

PolicyContext is the clean interface. It should make policy feel natural:

ctx.agent_id
ctx.principal_id
ctx.tenant_id
ctx.capability
ctx.arg("amount")
ctx.is_prod
ctx.is_high_risk
ctx.has_side_effects
ctx.agent_has_scope("refunds:create")

The goal is not to make security teams write giant rule engines on day one.
The goal is to let developers express real application policy:

if ctx.is_prod and ctx.is_high_risk:
    require_review()

if ctx.arg("amount") > tenant_limit:
    deny()

if ctx.capability.data_namespace == "customer.pii":
    require_pii_scope()

Agent policy often needs business context:

  • tenant plan
  • customer tier
  • refund amount
  • data namespace
  • environment
  • current workflow state
  • side effects
  • principal identity
  • cost so far
  • approval history

That is hard to express if policy only sees:

tool_name + args

A good PolicyContext makes the right policy easy to write.

That is the third primitive.
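
To make the ctx accessors listed above concrete, here is one possible shape for the object behind them. Treat it as a sketch under assumptions: the constructor fields are invented for illustration, and capability is a plain name here rather than the richer object that ctx.capability.data_namespace would imply.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class PolicyContext:
    """Hypothetical read-only view handed to policy code."""
    agent_id: str
    principal_id: str
    tenant_id: str
    capability: str
    environment: str
    risk: str
    args: dict[str, Any] = field(default_factory=dict)
    side_effects: tuple[str, ...] = ()
    agent_scopes: frozenset = frozenset()

    def arg(self, name: str, default: Any = None) -> Any:
        """Look up one call argument without exposing the raw record."""
        return self.args.get(name, default)

    @property
    def is_prod(self) -> bool:
        return self.environment == "prod"

    @property
    def is_high_risk(self) -> bool:
        return self.risk == "high"

    @property
    def has_side_effects(self) -> bool:
        return bool(self.side_effects)

    def agent_has_scope(self, scope: str) -> bool:
        return scope in self.agent_scopes


ctx = PolicyContext(
    agent_id="support-agent",
    principal_id="user_123",
    tenant_id="tenant_acme",
    capability="refund_customer",
    environment="prod",
    risk="high",
    args={"amount": 249},
    agent_scopes=frozenset({"refunds:create"}),
)

print(ctx.is_prod and ctx.is_high_risk, ctx.arg("amount"))
```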


Policy Is The Rule System

Policy is the thing that evaluates the PolicyContext. It is where the application says what is allowed.

Not in the prompt.
Not scattered across every tool implementation.
Not hidden inside ad hoc if-statements.

Policy should be a separate control layer.

It answers:

  • Is this agent allowed to use this capability?
  • Is this principal allowed to authorize this action?
  • Is this tenant allowed to access this namespace?
  • Is this environment safe for this capability?
  • Is the risk low enough to proceed?
  • Does this require approval?
  • Should this be sandboxed?
  • Should the input or output be transformed?

This is where agent control becomes real application logic.

Example:

support-agent can refund customers
but only in the support workflow
only for the current tenant
only up to the tenant limit
only if the principal has the right scope
only if the action is audited
and, above a threshold, only with human approval

That is not a prompt instruction. That is policy.

Policies can start simple. Exact matches. Wildcards. Local functions. Inline rules. A small policy file.

The first version does not need to be a giant governance platform. But policy should still be explicit. Because once policy is explicit, it can be tested. It can be audited. It can be reused. It can move from local code to a central policy system later.

That is the fourth primitive.
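
The support-agent refund example can be written as a plain function, which is about as simple as explicit policy gets. This is a sketch, not a prescribed policy format: RefundContext, its fields, and the string return values are all assumptions made for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RefundContext:
    """Hypothetical context carrying just what the refund policy needs."""
    agent_id: str
    workflow: str
    tenant_id: str            # tenant the agent is acting in
    target_tenant_id: str     # tenant that owns the order being refunded
    amount: float
    tenant_limit: float
    approval_threshold: float
    principal_scopes: frozenset
    audited: bool


def refund_policy(ctx: RefundContext) -> str:
    """Return "allow", "deny", or "approval_required" for one refund attempt."""
    if ctx.agent_id != "support-agent" or ctx.workflow != "support":
        return "deny"
    if ctx.target_tenant_id != ctx.tenant_id:
        return "deny"
    if "refunds:create" not in ctx.principal_scopes:
        return "deny"
    if not ctx.audited:
        return "deny"
    if ctx.amount > ctx.tenant_limit:
        return "deny"
    if ctx.amount > ctx.approval_threshold:
        return "approval_required"
    return "allow"


base = RefundContext(
    agent_id="support-agent", workflow="support",
    tenant_id="tenant_acme", target_tenant_id="tenant_acme",
    amount=50, tenant_limit=500, approval_threshold=100,
    principal_scopes=frozenset({"refunds:create"}), audited=True,
)

print(refund_policy(base))
print(refund_policy(replace(base, amount=249)))
print(refund_policy(replace(base, amount=900)))
```

Because the policy is an ordinary function over an explicit context, each rule can be covered by a unit test that changes one field at a time.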


Decision Is The Runtime Contract

A Decision is the output of policy.

Start with:

  • allow
  • deny

That is enough to build the first control loop.

The agent asks to use a capability. The runtime creates an AgentAction. The runtime builds a PolicyContext. Policy evaluates it. Policy returns a Decision. The runtime enforces it.

The first version should be boring:

action comes in
policy runs
decision comes out
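
A boring first version of that loop might look like the sketch below. Everything in it is hypothetical: the dict-based action, build_context, run_action, and the ActionDenied exception are illustrative names, and the sketch assumes the runtime should fail closed on any decision other than "allow".

```python
from typing import Any, Callable


class ActionDenied(Exception):
    """Raised when policy does not allow an action."""


def build_context(action: dict) -> dict:
    """Reduce the raw action record to the view that policy sees."""
    return {
        "agent_id": action["agent_id"],
        "capability": action["capability"],
        "is_prod": action["environment"] == "prod",
        "args": action["input"],
    }


def run_action(action: dict, policy: Callable[[dict], str],
               tools: dict) -> Any:
    """Action comes in, policy runs, decision comes out, runtime enforces it."""
    ctx = build_context(action)
    decision = policy(ctx)
    if decision != "allow":  # fail closed on anything unexpected
        raise ActionDenied(f"{action['capability']}: {decision}")
    return tools[action["capability"]](**action["input"])


def small_refunds_only(ctx: dict) -> str:
    return "allow" if ctx["args"]["amount"] <= 100 else "deny"


tools = {"refund_customer": lambda amount: f"refunded {amount}"}
action = {
    "agent_id": "support-agent",
    "capability": "refund_customer",
    "environment": "prod",
    "input": {"amount": 50},
}

print(run_action(action, small_refunds_only, tools))
```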

But the decision space will grow. Eventually policy needs more than allow or deny:

  • approval_required
  • redact
  • transform_input
  • transform_output
  • route
  • sandbox
  • log_only

Examples:

  • allow: execute the action
  • deny: block the action
  • approval_required: pause until a human approves
  • redact: hide sensitive fields before returning output
  • route: use a different model, tool, or capability
  • sandbox: execute with constrained network, filesystem, or time
  • log_only: allow but record the action for later review

The important part is that the decision is structured. A structured decision can be audited, composed, explained, and eventually served from a central policy system.
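
What "structured" could mean in practice: a decision that carries its kind, a reason, the policy that produced it, and any parameters the runtime needs to enforce it. The sketch below assumes that shape. DecisionKind, Decision, and to_audit_record are hypothetical names. Only the decision values themselves come from the list above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any


class DecisionKind(str, Enum):
    ALLOW = "allow"
    DENY = "deny"
    APPROVAL_REQUIRED = "approval_required"
    REDACT = "redact"
    TRANSFORM_INPUT = "transform_input"
    TRANSFORM_OUTPUT = "transform_output"
    ROUTE = "route"
    SANDBOX = "sandbox"
    LOG_ONLY = "log_only"


@dataclass(frozen=True)
class Decision:
    """Hypothetical structured decision returned by policy."""
    kind: DecisionKind
    reason: str
    policy_id: str
    params: dict[str, Any] = field(default_factory=dict)

    def to_audit_record(self) -> dict[str, Any]:
        """Flatten the decision into a plain dict suitable for logging."""
        return {
            "decision": self.kind.value,
            "reason": self.reason,
            "policy_id": self.policy_id,
            "params": dict(self.params),
        }


decision = Decision(
    kind=DecisionKind.SANDBOX,
    reason="shell commands run without network access in prod",
    policy_id="shell.prod.no_network",
    params={"network": False},
)

print(decision.to_audit_record())
```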

That is the fifth primitive.

Capability + AgentAction + PolicyContext → Policy → Decision

That is the control loop.


Access And Policy Are Two Gates

There are two different questions:

  • Access: Can this agent ever use this capability?
  • Policy: Should this specific use be allowed right now?

Access is more static.

support-agent can use refund_customer
research-agent cannot

Policy is contextual.

refund <= 100: allow
100 < refund <= 500: require review
refund > 500: deny

Do not collapse these into one thing too early.

A clean architecture separates:

  • Capability Registry: what exists
  • Access / Grants: who can use what
  • Policy Engine: what is allowed in this context

The registry defines the governed surface area.
Grants define who can reach which capabilities.
Policy decides whether this specific action should proceed.

That separation matters because agent systems change over time. New tools get added. New tenants get onboarded. New workflows appear. New approval rules emerge. If everything is hardcoded into tool wrappers, the system becomes brittle.

If capabilities, grants, and policies are separate, the runtime can evolve.
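
The three pieces can stay separate even in a very small system. The following sketch keeps the registry, the grants, and the policy as three independent objects and checks them in order. REGISTRY, GRANTS, and authorize are hypothetical names; the thresholds reuse the refund example above, and "approval_required" stands in for "require review".

```python
# Capability Registry: what exists.
REGISTRY = {
    "refund_customer": {"type": "tool", "risk": "high"},
}

# Access / Grants: who can use what.
GRANTS = {
    "support-agent": {"refund_customer"},
    "research-agent": set(),
}


# Policy Engine: what is allowed in this context.
def refund_policy(amount: float) -> str:
    if amount > 500:
        return "deny"
    if amount > 100:
        return "approval_required"
    return "allow"


def authorize(agent_id: str, capability: str, amount: float) -> str:
    """Check the access gate first, then the contextual policy gate."""
    if capability not in REGISTRY:
        return "deny"  # not part of the governed surface
    if capability not in GRANTS.get(agent_id, set()):
        return "deny"  # access gate: this agent can never use it
    return refund_policy(amount)  # policy gate: this specific use


print(authorize("support-agent", "refund_customer", 50))
print(authorize("support-agent", "refund_customer", 249))
print(authorize("research-agent", "refund_customer", 50))
```

Adding a new agent touches only GRANTS, and changing a threshold touches only the policy function, which is the kind of independent evolution this separation is meant to allow.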


Put a programmable membrane around capability use.

The agent can still be powerful.
But every consequential action passes through a runtime that understands:

  • identity
  • tenant
  • environment
  • capability
  • effect
  • side effects
  • scope
  • data namespace
  • policy
  • decision
  • audit

That is how agent systems become governable.

The control boundary for agents is not the text they produce.
It is the capabilities they use.

That is what we are building at Membrane Labs: the policy-control layer for governing agent capabilities at runtime.

This is the kind of problem we're solving with Brane — policy-as-code for AI agents. Write Python policies that run before every agent action. Block, allow, or rewrite, with a full traced decision.

ro@membranelabs.org · @MLabsResearch