May 2026

Runtime guardrails for AI agents

Guardrails for agents need to move from advice to enforcement. The enforcement point is the moment before an agent action becomes a side effect.

A chatbot can be made safer by improving prompts, classifiers, and output filters. A production agent needs something more specific: a runtime decision before it touches a tool, model, database, memory store, MCP server, file, secret, or another agent.

This is the distinction between model safety and agent safety. Model safety asks whether a model response is acceptable. Agent safety asks whether the system should allow a particular action by a particular agent, for a particular principal and tenant, in a particular environment, with particular arguments.

The useful boundary is capability execution

The last safe moment to block a dangerous action is before the capability executes. After execution, the email may already be sent, the row may already be deleted, the refund may already be issued, or the secret may already have left the trust boundary.

agent attempts action
  -> runtime builds PolicyContext
  -> policy returns Decision
  -> action runs or is blocked

Brane names this loop explicitly: Capability + AgentAction + PolicyContext → Policy → Decision. That formula matters because it gives teams a vocabulary for where enforcement lives.
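Here is a minimal sketch of that loop in plain Python. `Decision`, `PolicyContext`, `ActionBlocked`, and `execute` are hypothetical stand-ins written for this illustration, not Brane's API. The point is the ordering: build the context first, then evaluate policy, then either run the capability or block it.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass(frozen=True)
class Decision:
    type: str  # "allow" or "deny"
    reason: str = ""


@dataclass(frozen=True)
class PolicyContext:
    capability: str
    args: dict[str, Any] = field(default_factory=dict)


Policy = Callable[[PolicyContext], Decision]


class ActionBlocked(Exception):
    """Raised when a policy denies an action; the capability never runs."""


def execute(capability: str, fn: Callable[..., Any], args: dict[str, Any],
            policies: list[Policy]) -> Any:
    # 1. The runtime builds a PolicyContext describing the attempted action.
    ctx = PolicyContext(capability=capability, args=args)
    # 2. Each policy returns a Decision; the first deny wins.
    for policy in policies:
        decision = policy(ctx)
        if decision.type == "deny":
            # 3a. Blocked: fn is never called, so there is no side effect.
            raise ActionBlocked(decision.reason)
    # 3b. Every policy allowed the action, so the capability runs.
    return fn(**args)


def no_large_transfers(ctx: PolicyContext) -> Decision:
    if ctx.args.get("amount", 0) > 1000:
        return Decision("deny", "Transfer exceeds limit")
    return Decision("allow")


sent = []  # stands in for an external payment system


def transfer(amount: int) -> str:
    sent.append(amount)
    return "sent"


print(execute("transfer", transfer, {"amount": 50}, [no_large_transfers]))  # sent
try:
    execute("transfer", transfer, {"amount": 5000}, [no_large_transfers])
except ActionBlocked as exc:
    print("blocked:", exc)  # blocked: Transfer exceeds limit
print(sent)  # [50]
```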

Prompt guardrails, output filters, and runtime guardrails

The word guardrails often gets compressed into one category, but the controls sit at different points in the agent lifecycle. Prompt guardrails shape the model's behavior before generation. Output filters inspect text after generation. Runtime guardrails evaluate the attempted action at the point where the system would otherwise touch a capability.

Layer	Runs when	Good at	Weak at
Prompt guardrails	Before generation	Stating intended behavior and constraints	Guaranteeing tool behavior under pressure
Output filters	After generation	Checking text before it reaches a user	Preventing side effects that already happened
IAM and app auth	Before resource access	Coarse identity and resource permissions	Understanding agent-specific intent and arguments
Runtime guardrails	Before or after capability execution	Deciding whether an attempted agent action should execute	Replacing lower-level security controls

The point is not to replace prompts, IAM, secrets management, or database permissions. The point is to add the layer that understands the agent action itself. Brane's docs describe this as AI agent guardrails and capability control, with a direct comparison in Brane vs Prompt Guardrails.

A concrete failure scenario

Imagine a support agent with a prompt that says: only issue refunds up to $100 unless a human has approved the request. A user complains about a $349 purchase. The model reasons that the customer is unhappy, decides a refund is appropriate, and calls refund_customer with amount_usd=349.

An output filter might later prevent the model from saying something unsafe, but the refund already happened. A runtime policy can block the action before the payment provider is called:

@runtime.before_capability("refund_customer")
def refund_limit(ctx):
    if ctx.arg("amount_usd", 0) > 100:
        return Decision(type="deny", reason="Refund requires approval")
    return Decision(type="allow")

This is the practical difference. Prompt guardrails tell the model what should happen. Runtime guardrails decide what is allowed to happen at the action boundary.
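A runnable sketch of the refund scenario, assuming a toy runtime: `MiniRuntime`, `Ctx`, and the in-memory `issued` ledger are hypothetical stand-ins for Brane's runtime and a real payment provider. The `refund_limit` policy is the one from the example above. The thing to check is that a denied refund never reaches `refund_customer`.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Decision:
    type: str
    reason: str = ""


@dataclass(frozen=True)
class Ctx:
    capability: str
    args: dict

    def arg(self, name: str, default: Any = None) -> Any:
        return self.args.get(name, default)


class MiniRuntime:
    """Hypothetical stand-in for a runtime with before-capability hooks."""

    def __init__(self) -> None:
        self._before: dict[str, list[Callable[[Ctx], Decision]]] = {}

    def before_capability(self, name: str):
        # Decorator that registers a policy for one capability name.
        def register(policy):
            self._before.setdefault(name, []).append(policy)
            return policy
        return register

    def call(self, name: str, fn: Callable[..., Any], **kwargs: Any) -> Decision:
        ctx = Ctx(name, kwargs)
        for policy in self._before.get(name, []):
            decision = policy(ctx)
            if decision.type == "deny":
                return decision  # fn is never invoked
        fn(**kwargs)
        return Decision(type="allow")


runtime = MiniRuntime()


@runtime.before_capability("refund_customer")
def refund_limit(ctx):
    if ctx.arg("amount_usd", 0) > 100:
        return Decision(type="deny", reason="Refund requires approval")
    return Decision(type="allow")


issued = []  # stands in for the payment provider's ledger


def refund_customer(amount_usd: int) -> None:
    issued.append(amount_usd)


print(runtime.call("refund_customer", refund_customer, amount_usd=349))
# Decision(type='deny', reason='Refund requires approval')
print(issued)  # []
```

Returning the `Decision` instead of raising is only one design choice; a runtime could instead raise an exception or hand the action to an approval step.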

What a runtime guardrail needs to know

A useful runtime decision cannot be based only on a function name. A policy needs context:

Which agent is acting?
Which user or service is it acting for?
Which tenant owns the data?
Is this development, staging, or production?
What capability is being attempted?
How risky is that capability?
What arguments are being passed?
What output came back, for after-execution checks?
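One way to carry those answers is a single immutable context object. The field names below map one-to-one onto the questions above, and the `is_prod`, `is_high_risk`, and `arg()` helpers mirror the accessors used in this post's policy examples. The class itself, including the risk labels in `HIGH_RISK`, is a hypothetical sketch, not Brane's `PolicyContext`.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Optional

HIGH_RISK = {"high", "critical"}  # hypothetical risk labels


@dataclass(frozen=True)
class PolicyContext:
    agent_id: str        # which agent is acting
    principal: str       # user or service it acts for
    tenant: str          # tenant that owns the data
    environment: str     # "development", "staging", or "production"
    capability: str      # capability being attempted
    risk: str            # risk label attached to the capability
    args: dict[str, Any] = field(default_factory=dict)  # arguments being passed
    output: Optional[Any] = None  # returned output, for after-execution checks

    @property
    def is_prod(self) -> bool:
        return self.environment == "production"

    @property
    def is_high_risk(self) -> bool:
        return self.risk in HIGH_RISK

    def arg(self, name: str, default: Any = None) -> Any:
        return self.args.get(name, default)


ctx = PolicyContext(
    agent_id="support-agent",
    principal="user:42",
    tenant="acme",
    environment="production",
    capability="refund_customer",
    risk="high",
    args={"amount_usd": 349},
)
print(ctx.is_prod, ctx.is_high_risk, ctx.arg("amount_usd"))  # True True 349
```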

This is why Brane wraps attempts as AgentActions and exposes a PolicyContext instead of asking every tool implementation to invent its own safety model.

Example policies

A runtime guardrail can be simple and still valuable:

@runtime.before_capability("*")
def block_high_risk_in_prod(ctx):
    if ctx.is_prod and ctx.is_high_risk:
        return Decision(type="deny", reason="High-risk action blocked in prod")
    return Decision(type="allow")

Another common pattern is data access:

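@runtime.before_capability("database.customer_query")
def read_only_sql(ctx):
    query = ctx.arg("query", "").strip().lower()
    # Allow a single SELECT statement only. A trailing semicolon is fine, but
    # stacked statements such as "select 1; delete from customers" are denied.
    # This is a coarse string check; the database role should still be read-only.
    if not query.startswith("select") or ";" in query.rstrip(";"):
        return Decision(type="deny", reason="Only SELECT queries are allowed")
    return Decision(type="allow")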

These are not prompt rules. They are application policies evaluated at the boundary where the action would otherwise execute. See before-capability policies and after-capability policies for the concrete runtime stages.
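As an illustration of the after-capability stage, the sketch below inspects a capability's returned output and denies it if it looks like a credential. `AfterContext`, `block_secret_leak`, and the regular expression are assumptions made for this example; a production system would use a dedicated secret scanner. An after-execution deny cannot undo the capability, which has already run; it can only keep the output from flowing back to the model or the user.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Decision:
    type: str
    reason: str = ""


@dataclass(frozen=True)
class AfterContext:
    capability: str
    output: Any  # whatever the capability returned


# Hypothetical, deliberately simple credential pattern (e.g. "sk-" + 16+ chars).
SECRET_PATTERN = re.compile(r"\b(?:sk|api)[_-][A-Za-z0-9]{16,}")


def block_secret_leak(ctx: AfterContext) -> Decision:
    """After-capability check: deny output that looks like it contains a credential."""
    if SECRET_PATTERN.search(str(ctx.output)):
        return Decision(type="deny", reason="Output appears to contain a secret")
    return Decision(type="allow")


print(block_secret_leak(AfterContext("files.read", "api_key=sk-abcdefghijklmnopqrst")).type)  # deny
print(block_secret_leak(AfterContext("files.read", "quarterly report, no credentials")).type)  # allow
```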

Where this fits today

Brane Core is currently a local, framework-independent Python runtime with synchronous callables, allow/deny decisions, and local audit events. The current implementation is intentionally small; the direction is broader runtime control across frameworks, MCP, memory, retrieval, model routing, approvals, and cloud audit. The implementation status is tracked in the Current Status page.

The threat model changes when systems become agents

A traditional application exposes a known set of routes, jobs, queues, and service calls. A production agent has a less predictable execution path. It can choose which tool to call, decide whether to retrieve data, loop through intermediate steps, call another model, or delegate to another agent. That flexibility is the value of agents, and also the reason static authorization alone does not capture the whole risk.

OWASP's agentic AI work frames the problem as a new attack surface around tool use, memory, planning, multi-agent coordination, privilege misuse, and cascading behavior. NIST's AI Risk Management Framework similarly pushes teams to govern, map, measure, and manage AI risk rather than treating safety as a one-time model choice. Runtime guardrails are how those governance ideas become an enforcement point in a running agent system.

Runtime guardrails are not one thing

The phrase can mean several different controls. A useful production architecture usually combines them:

Input guardrails: validate user requests before an agent starts work.
Planning guardrails: inspect a proposed plan before execution.
Tool guardrails: decide whether a specific tool call should run.
Capability guardrails: govern all action surfaces, including tools, models, memory, retrieval, MCP, files, and secrets.
Output guardrails: inspect returned content before it reaches a user or downstream system.
Audit guardrails: record enough context to explain and review decisions later.

Brane is deliberately centered on capability guardrails. Tool calls are included, but the abstraction is broader because agents do more than call tools. A model call can create cost or data residency risk. A memory write can poison future context. A retrieval call can cross a tenant boundary. An MCP tool can hide a powerful external operation behind a friendly name.
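For example, the retrieval case can be written as a capability guardrail that compares the tenant a call targets with the tenant the agent is acting for. The `retrieval.` and `memory.` name prefixes and the `tenant` argument are hypothetical conventions assumed for this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class Decision:
    type: str
    reason: str = ""


@dataclass(frozen=True)
class PolicyContext:
    capability: str
    tenant: str  # tenant the agent is acting for
    args: dict[str, Any] = field(default_factory=dict)


def tenant_boundary(ctx: PolicyContext) -> Decision:
    """Deny retrieval or memory calls that target another tenant's data."""
    # Hypothetical convention: these capabilities carry a "tenant" argument
    # naming whose index or memory store they read or write.
    if ctx.capability.startswith(("retrieval.", "memory.")):
        target = ctx.args.get("tenant")
        if target != ctx.tenant:
            return Decision("deny", f"{ctx.capability} crosses tenant boundary ({ctx.tenant} -> {target})")
    return Decision("allow")


ctx = PolicyContext("retrieval.search", tenant="acme", args={"tenant": "globex", "query": "invoices"})
print(tenant_boundary(ctx))
# Decision(type='deny', reason='retrieval.search crosses tenant boundary (acme -> globex)')
```

A retrieval or memory call that omits `tenant` is also denied, because `None` never matches the acting tenant. Failing closed is usually the safer default for boundary checks.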

What mature teams should require

A runtime guardrail layer should satisfy a practical checklist:

Policies are explicit code, not hidden prompt text.
Every decision has an action id and a policy reason.
Policies can inspect arguments, agent identity, tenant, environment, and capability risk.
High-risk actions can be blocked before execution.
After-execution policies can inspect returned output.
The system is framework-independent enough to survive agent framework changes.
Audit events are useful to security, engineering, and compliance reviewers.

If a guardrail system cannot answer why an action was allowed, which policy evaluated it, and what context the policy saw, then it is not yet a production control. It is a hint.
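A sketch of an audit record that can answer those three questions. The schema is hypothetical, not Brane's audit event format; the fields simply line up with the checklist: an action id, the policy and its reason, and the context the policy saw.

```python
from __future__ import annotations

import json
import uuid
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Any


@dataclass(frozen=True)
class AuditEvent:
    action_id: str        # unique id for this attempted action
    capability: str       # what was attempted
    agent_id: str         # who attempted it
    tenant: str
    environment: str
    args: dict[str, Any]  # what the policy saw
    policy: str           # which policy produced the decision
    decision: str         # "allow" or "deny"
    reason: str           # why
    timestamp: str        # when, in UTC ISO 8601


def audit_event(capability: str, agent_id: str, tenant: str, environment: str,
                args: dict[str, Any], policy: str, decision: str, reason: str) -> AuditEvent:
    return AuditEvent(
        action_id=str(uuid.uuid4()),
        capability=capability,
        agent_id=agent_id,
        tenant=tenant,
        environment=environment,
        args=dict(args),  # shallow copy so later edits to the caller's dict don't change the record
        policy=policy,
        decision=decision,
        reason=reason,
        timestamp=datetime.now(timezone.utc).isoformat(),
    )


event = audit_event("refund_customer", "support-agent", "acme", "production",
                    {"amount_usd": 349}, "refund_limit", "deny", "Refund requires approval")
print(json.dumps(asdict(event), indent=2))
```

Logging raw arguments can itself leak sensitive data, so real systems typically redact or hash selected fields before writing the event.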

Choosing the right control for the failure

If the failure is that a model produces offensive or irrelevant text, use prompt design, model-side safety, evaluations, and output filters. If the failure is that an agent calls the wrong tool with the wrong arguments, use runtime policy. If the failure is that an infrastructure credential has too much power, fix IAM and secret scoping. If the failure is that nobody can explain what happened, improve audit and traces.

Confusing these layers leads to brittle systems. A prompt cannot fix an over-permissioned database credential. A database permission cannot decide whether a support agent should refund a specific customer in a specific conversation. An output filter cannot undo a side effect.

Where to put the boundary in code

The cleanest implementation point is the function or adapter boundary just before external work happens. That can be a Python function exposed as an agent tool, an MCP adapter call, a database helper, a model router, a retrieval function, or a workflow executor. The point is to wrap the capability once and let policies live centrally.

This keeps the agent planner flexible while making the action layer governed. The model can still propose work. The runtime decides what actually executes.
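A sketch of wrapping a capability once, assuming a plain decorator and two module-level policy lists. `capability`, `BEFORE`, `AFTER`, and `PolicyDenied` are hypothetical names for illustration. The wrapped function knows nothing about policy, and adding a policy means appending to a central list rather than editing every tool.

```python
from __future__ import annotations

import functools
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Decision:
    type: str
    reason: str = ""


class PolicyDenied(Exception):
    pass


# Central registries consulted by every wrapped capability.
BEFORE: list[Callable[[str, dict], Decision]] = []
AFTER: list[Callable[[str, dict, Any], Decision]] = []


def capability(name: str) -> Callable:
    """Wrap a function once so every call passes through the central policies."""
    def wrap(fn: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(fn)
        def guarded(**kwargs: Any) -> Any:
            for policy in BEFORE:
                decision = policy(name, kwargs)
                if decision.type == "deny":
                    raise PolicyDenied(f"{name}: {decision.reason}")
            result = fn(**kwargs)
            for policy in AFTER:
                decision = policy(name, kwargs, result)
                if decision.type == "deny":
                    # The side effect already happened; only the output is withheld.
                    raise PolicyDenied(f"{name}: {decision.reason}")
            return result
        return guarded
    return wrap


@capability("database.customer_query")
def customer_query(query: str) -> list:
    return [("alice",)]  # stand-in for a real database call


def read_only(name: str, args: dict) -> Decision:
    # Simplified prefix check for illustration.
    if name == "database.customer_query" and not args["query"].strip().lower().startswith("select"):
        return Decision("deny", "only SELECT is allowed")
    return Decision("allow")


BEFORE.append(read_only)
print(customer_query(query="SELECT name FROM customers"))  # [('alice',)]
```

`guarded` accepts keyword arguments only, which keeps the argument dict the policies inspect identical to what the wrapped function receives.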

A maturity model for runtime guardrails

Teams usually move through four stages. The first stage is informal: prompts tell the agent not to do dangerous things, and engineers watch demos closely. The second stage adds tool-level checks: individual functions refuse obviously invalid requests. The third stage centralizes policy: every action passes through one runtime policy layer. The fourth stage adds organization-level control: policies are versioned, audited, tested, promoted across environments, and reviewed like other production controls.

Brane Core is aimed at the third stage today. It gives teams a local runtime for registering capabilities, writing before and after policies, enforcing allow/deny decisions, and recording audit events. The cloud and dashboard direction is aimed at the fourth stage: shared policy bundles, approval workflows, cross-agent traces, and centralized review.
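Because a policy is an ordinary function of its context, testing it needs no agent, model, or framework. The sketch below reuses the `block_high_risk_in_prod` policy from earlier in this post with a hypothetical `FakeContext` that supplies only the two flags the policy reads, and checks every combination. Under pytest, a function named `test_...` like this one would be collected automatically; here it is simply called.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    type: str
    reason: str = ""


@dataclass(frozen=True)
class FakeContext:
    is_prod: bool
    is_high_risk: bool


def block_high_risk_in_prod(ctx) -> Decision:
    if ctx.is_prod and ctx.is_high_risk:
        return Decision(type="deny", reason="High-risk action blocked in prod")
    return Decision(type="allow")


# Every combination of environment and risk, with the expected outcome.
CASES = [
    (True, True, "deny"),
    (True, False, "allow"),
    (False, True, "allow"),
    (False, False, "allow"),
]


def test_block_high_risk_in_prod() -> None:
    for is_prod, is_high_risk, expected in CASES:
        got = block_high_risk_in_prod(FakeContext(is_prod, is_high_risk)).type
        assert got == expected, (is_prod, is_high_risk, got)


test_block_high_risk_in_prod()
```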

What not to put in runtime policy

Runtime guardrails should not become a dumping ground for every security concern. Database permissions should still live in the database. Network boundaries should still live in infrastructure. Secrets should still be managed by a secret manager. Application authorization should still enforce user and service permissions.

The runtime policy layer should focus on the agent-specific question: given all of those lower-level controls, should this agent action be attempted in this context? Keeping that boundary clear makes the policy layer understandable instead of turning it into another overloaded authorization system.

This is the kind of problem we're solving with Brane — policy-as-code for AI agents. Write Python policies that run before agent actions execute. Block or allow high-risk capability use, with an audit-ready decision trace.
