
Build Skills, Not Agents: Why Banks Should Ration Agentic Autonomy

Eric Lam

May 29, 2026

The boundaries that matter in a bank are not agent boundaries — they are control boundaries.

Walk into many banking technology reviews in 2026 and you will hear the same ambition: agentic AI. Every line of business wants agents — a payments agent, a lending agent, a fraud agent, a complaints agent, a KYC agent. The instinct is understandable. Agents feel like the unit of intelligence, so more intelligence must mean more agents.

In a regulated bank, though, it is often the expensive one.

The pattern emerging across mature enterprise deployments is the opposite of agent proliferation. Build a small number of agents. Give them a large library of well-governed, well-tested skills. And recognise that the boundaries that matter in a bank are not agent boundaries at all — they are control boundaries, enforced by identity, policy, deterministic services, approval gates, and audit. Get that distinction right and your agentic system inherits your control framework instead of fighting it.

One principle organises everything that follows:

Autonomy is a risk budget. Spend it only where deterministic workflows, governed skills, and policy-controlled services cannot meet the business goal.

The decision rule that follows from it is short: default to skills and workflows; add agent autonomy only when the problem genuinely requires dynamic control; and enforce authority through control boundaries, not agent packaging. The rest of this article is what that means in a regulated bank.

Four words, used precisely

Most of the confusion in this space comes from a sloppy binary of “agents” versus everything else. The useful vocabulary has more than two words, and the distinctions carry the whole argument.

A tool (or API) is a single callable capability: get a balance, post a ledger entry, return a sanctions-screening result.

A skill is a reusable, governed package built around one or more tools — its instructions, procedures, parameter contracts, templates, and domain behaviour. A skill is the unit you approve, test, monitor, and reuse. “Sanctions screening” should be exposed as a governed capability family across the bank, with consistent contracts, policy, and audit semantics — not a behaviour each team reinvents. (A note on the word: vendors use “skill” in narrower ways — Anthropic’s “Agent Skills,” for instance, are folders of instructions and scripts an agent loads on demand. In this article I use skill in the enterprise-platform sense: a governed, reusable capability that may bundle tools, instructions, policies, templates, test sets, and audit contracts.)

A workflow is a predefined, mostly deterministic orchestration path. When the steps and their order are known in advance, as they are for most regulated processes, a workflow is the right tool, and it should be preferred over agentic autonomy. Current industry guidance, including Anthropic’s Building Effective Agents, is consistent on the point: use the simplest pattern that works, and reach for a model-driven agent only when the problem genuinely needs dynamic, open-ended control.

An agent is a model-driven planner that dynamically directs its own tool use to reach a goal. It reasons, holds context across a multi-step interaction, and decides which skills to call and when to stop. Autonomy is the thing an agent adds — and the thing you should ration.

Separately from all of these sits the control boundary: the point where identity, permission scope, accountable ownership, approval, and audit change. This is the boundary that governs a bank. It is enforced by infrastructure and policy — not by whether a capability happens to be packaged as an agent.

A clear sign of architectural overreach is when a diagram shows a “Balance Lookup Agent,” a “Search Agent,” or a “Sanctions Agent.” Looking up a balance, querying a store, returning a screening result — these are governed capabilities exposed as skills behind tools. Packaging them as agents adds orchestration hops, non-deterministic decision points, and cost, while delivering none of the autonomy that would justify an agent in the first place.

The control-boundary test

Here is the rule that resolves most design decisions:

Default to a skill or workflow. Add agent autonomy only where the problem genuinely needs dynamic control — and enforce authority with a control boundary wherever the right to act changes, regardless of how the capability is packaged.

The critical correction to the popular “authority boundary = agent” framing is this: an agent is not how you enforce authority in a bank. You do not make a payment safe by calling it “the payment agent.” You make it safe with least-privilege identity, policy checks, explicit customer consent, transaction limits, idempotency, applicable financial-crime and account controls, and a tamper-evident audit trail. Those are deterministic controls. An LLM-shaped boundary is exactly the wrong place to put the right to move money.
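To make one of those deterministic controls concrete, here is a minimal sketch of a tamper-evident audit trail built as a hash chain. It is an illustration under stated assumptions, not a production design: the `AuditLog` class, the `GENESIS_HASH` constant, and the payload fields are hypothetical names, and a real deployment would also anchor the latest hash in a separately controlled store so that truncation of the log can be detected.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS_HASH = "0" * 64  # hypothetical starting value for the chain


def _digest(prev_hash: str, payload: dict) -> str:
    """Hash the previous entry's hash together with a canonical form of the payload."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + canonical).encode("utf-8")).hexdigest()


@dataclass
class AuditLog:
    """Append-only log in which each entry commits to the one before it."""
    entries: list = field(default_factory=list)

    def append(self, payload: dict) -> str:
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS_HASH
        entry_hash = _digest(prev_hash, payload)
        self.entries.append({"payload": payload, "prev": prev_hash, "hash": entry_hash})
        return entry_hash

    def verify(self) -> bool:
        """Recompute the chain. An edited or reordered entry, or one removed from
        the middle, breaks it. Removing entries from the tail is NOT detected
        here; that needs the latest hash to be stored somewhere else."""
        prev_hash = GENESIS_HASH
        for entry in self.entries:
            if entry["prev"] != prev_hash:
                return False
            if _digest(prev_hash, entry["payload"]) != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True
```

The point of the sketch is that verification is a deterministic recomputation. Nothing about it depends on a model's judgement, which is why a control like this belongs on the infrastructure side of the boundary.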

Consider a retail-banking servicing assistant fielding, “Did my salary land?” Answering it requires verifying identity, fetching recent transactions, and summarising — all read-only, all within the same customer-data trust domain. One agent, three skills. Now the customer says, “Move £500 to my savings.” Everything changes: this is state-changing, irreversible, and financial. But the right response is not to introduce a payment agent by default. The response is to route the request across a control boundary into a path with its own identity, its own approval gate, its own policy and limit checks, and its own audit record. That path may be implemented as a deterministic service, a workflow, or — if it genuinely needs dynamic reasoning — a separate agent. The packaging is an implementation choice. The control boundary is the requirement.

Make the handoff explicit. In the payment example, the servicing agent may understand the intent, collect the destination account, explain fees or limits, and prepare the request. But execution moves across the control boundary: the payment workflow revalidates identity, checks entitlement, applies limits, runs the applicable financial-crime and fraud controls, requests explicit confirmation, executes through an idempotent payment service, and writes an append-only, tamper-evident audit record. The agent assists; the controlled path acts.
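The handoff described above can be sketched in a few lines. This is an assumption-laden illustration rather than a reference implementation: `PaymentRequest`, `PaymentWorkflow`, the outcome strings, and the single per-transfer limit are all hypothetical, the financial-crime and fraud checks are omitted for brevity, and `identity_verified` and `customer_confirmed` stand in for results produced by the bank's own authentication and confirmation steps, not by the agent.

```python
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass(frozen=True)
class PaymentRequest:
    """What the servicing agent may produce: a proposal, not an action."""
    idempotency_key: str
    customer_id: str
    source_account: str
    destination_account: str
    amount: Decimal


@dataclass
class PaymentWorkflow:
    """Hypothetical deterministic path on the far side of the control boundary."""
    per_transfer_limit: Decimal
    entitlements: dict  # customer_id -> set of accounts the customer may debit
    executed: dict = field(default_factory=dict)  # idempotency_key -> outcome
    audit: list = field(default_factory=list)     # stand-in for a tamper-evident store

    def _record(self, req: PaymentRequest, outcome: str) -> None:
        self.audit.append({"key": req.idempotency_key,
                           "customer": req.customer_id,
                           "result": outcome})

    def execute(self, req: PaymentRequest, identity_verified: bool,
                customer_confirmed: bool) -> str:
        # Idempotency: a retry of an executed request returns the original
        # outcome and never triggers a second debit.
        if req.idempotency_key in self.executed:
            self._record(req, "replayed")
            return self.executed[req.idempotency_key]

        # Every check is re-run here; nothing the agent asserted is trusted.
        if not identity_verified:
            outcome = "rejected:identity"
        elif req.source_account not in self.entitlements.get(req.customer_id, set()):
            outcome = "rejected:entitlement"
        elif req.amount <= 0 or req.amount > self.per_transfer_limit:
            outcome = "rejected:limit"
        elif not customer_confirmed:
            outcome = "pending:confirmation"
        else:
            outcome = "executed"  # the call to the payment service would go here
            self.executed[req.idempotency_key] = outcome

        self._record(req, outcome)
        return outcome
```

In this sketch the agent's only output is a `PaymentRequest`. Whether the workflow sits behind a service, a queue, or another agent does not change the checks it runs, which is the sense in which packaging is an implementation choice.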

And it is worth stating plainly: the agent can assist the decision path, but it cannot be the accountable owner of the control. Accountability remains with the business process, the system owner, and the control owner — a point risk, compliance, and audit will rightly insist on.

Apply this lens across a typical banking stack and most candidate “agents” resolve into skills or workflows:

Capability type	Default pattern	Agent autonomy?	Mandatory controls
Simple read-only lookup	Tool / skill	No	Auth, data entitlement, audit
Retrieval & summarisation	Skill	Limited	Source grounding, citation, leakage checks
Known regulated process	Workflow	Usually no	Policy engine, approvals, audit, exception handling
Recommendation affecting a customer outcome	Skill or decision service	Bounded only	Model-risk governance, explainability, human review where required
Credit / affordability decision	Deterministic workflow or decision service	No agent autonomy for the final decision	MRM, adverse-action evidence, fairness, audit
Payment / transaction	Deterministic service or workflow	No agent autonomy for execution	Consent; step-up auth; applicable fraud, AML, sanctions, limit, and account-control checks; idempotency
Open-ended investigation	Agent with governed skills	Yes, bounded	Context isolation, tool allowlist, traceability, human oversight

Note the change in the test itself. The question is not “does it write?” but “does it have material customer, financial, legal, or compliance impact?” That distinction matters because a recommendation with no direct side effect can still be regulated. Under the EU AI Act, AI systems used to evaluate the creditworthiness of natural persons or establish their credit score are classified high-risk (Annex III, point 5(b)), with a narrow carve-out for systems used to detect financial fraud. A “credit recommendation skill” that influences a lending outcome is therefore not a low-stakes read; it carries governance obligations whether or not it executes the decision itself.

Govern skills like products

This is where the skills-first model becomes operationally powerful, and it is the part most teams skip. A skill is not a thin wrapper around an API. In a bank, a production skill is a product, and it needs the things products have: an owner, a versioned contract, a data classification, a permission model, a test set, documented failure modes, an audit schema, and a retirement path.

Concretely, that contract is something you can write down — a balance-lookup skill’s declaration might read:

skill: account-balance-lookup
owner: retail-banking-platform
version: 1.4.2
data_classification: confidential-customer-data
allowed_callers:
  - customer-servicing-agent
  - branch-assistant-agent
auth_scope:
  - accounts.read.balance
controls:
  - customer_identity_verified
  - account_entitlement_check
  - pii_redaction_for_logs
evaluation:
  - entitlement_accuracy
  - parameter_extraction_accuracy
  - refusal_accuracy
audit_fields:
  - user_id
  - agent_id
  - skill_version
  - policy_decision
  - source_system
  - timestamp

That is what “governed skill” means in practice: ownership, a version, a data classification, an explicit caller allowlist, a minimal auth scope, enforced controls, an evaluation suite, and an audit schema — all declared, all testable.
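A declaration only counts as a control if something enforces it at call time. As a rough sketch of what that enforcement might look like, the snippet below mirrors the balance-lookup declaration as a Python dictionary and checks a call against it. The function names `missing_fields` and `authorize`, and the idea of passing granted scopes and satisfied controls as sets, are assumptions made for illustration; a real platform would more likely delegate this to its policy engine.

```python
REQUIRED_FIELDS = {
    "skill", "owner", "version", "data_classification", "allowed_callers",
    "auth_scope", "controls", "evaluation", "audit_fields",
}

# The declaration from the article, mirrored as a dict (no YAML parser needed).
DECLARATION = {
    "skill": "account-balance-lookup",
    "owner": "retail-banking-platform",
    "version": "1.4.2",
    "data_classification": "confidential-customer-data",
    "allowed_callers": ["customer-servicing-agent", "branch-assistant-agent"],
    "auth_scope": ["accounts.read.balance"],
    "controls": ["customer_identity_verified", "account_entitlement_check",
                 "pii_redaction_for_logs"],
    "evaluation": ["entitlement_accuracy", "parameter_extraction_accuracy",
                   "refusal_accuracy"],
    "audit_fields": ["user_id", "agent_id", "skill_version", "policy_decision",
                     "source_system", "timestamp"],
}


def missing_fields(declaration: dict) -> set:
    """Catalogue admission check: a skill with gaps in its contract is rejected."""
    return REQUIRED_FIELDS - set(declaration)


def authorize(declaration: dict, caller: str, granted_scopes: set,
              satisfied_controls: set) -> tuple:
    """Return (allowed, reasons). Every reason for denial is collected so it can
    be written to the audit record as the policy decision."""
    reasons = []
    if caller not in declaration["allowed_callers"]:
        reasons.append(f"caller not allowlisted: {caller}")
    for scope in declaration["auth_scope"]:
        if scope not in granted_scopes:
            reasons.append(f"missing scope: {scope}")
    for control in declaration["controls"]:
        if control not in satisfied_controls:
            reasons.append(f"control not satisfied: {control}")
    return (not reasons, reasons)
```

Collecting every denial reason, rather than stopping at the first, is a deliberate choice in this sketch: it gives the `policy_decision` audit field something specific to record.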

A balance-lookup skill, a sanctions-screening skill, an affordability-assessment skill: each should be tested, validated, and approved independently of any agent that consumes it. Sanctions screening in particular is not a single, purely deterministic lookup. It varies by jurisdiction, customer type, product, list provider, matching threshold, and escalation path, so it should be exposed as a governed capability family with consistent policy, versioned contracts, and unified audit semantics across the family.

The payoff is not merely fewer agents. It is fewer places where a control has to be reinterpreted. One governed screening capability family means one policy framework, one set of audit semantics, one validation and approval approach, and one shared contract pattern, instead of six subtly different screening behaviours scattered across six agents, each a separate way to get it wrong. This is also where the architecture earns its regulatory keep: separating the capability that recommends from the controlled path that executes is segregation of duties expressed in software, and it is a control auditors already expect.

The control most teams forget: evaluate the skills, not just the agents

For a 2026 agentic system, audit alone is not enough. Production agentic AI is judged by evaluation, tracing, and operational metrics, and current enterprise guidance is explicit about what to measure: results against ground-truth datasets, tool-selection accuracy, parameter-extraction accuracy, refusal accuracy, latency, and cost per query.

The implication for a skills-first architecture is direct. A reusable skill must ship with its own evaluation suite — golden paths, known failure cases, policy tests — and its own trace schema, so that every invocation can be correlated across user, agent, tool, and system identity. A skill without a test set and an audit schema is not a governed skill. It is a reusable risk.
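A skill-level evaluation suite does not have to be elaborate to be useful. The sketch below scores a stand-in parameter extractor against a handful of labelled cases and applies a release gate. Everything named here is hypothetical: `EvalCase`, `toy_extract`, the example utterances, and the thresholds are illustrative, and a real suite would call the model-backed skill and use a much larger ground-truth set.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class EvalCase:
    utterance: str
    expected_params: Optional[dict]  # None means the skill should refuse


def toy_extract(utterance: str) -> Optional[dict]:
    """Stand-in for the model-backed extractor; purely illustrative."""
    text = utterance.lower()
    if "balance" not in text:
        return None
    return {"account_type": "savings" if "savings" in text else "current"}


CASES = [
    EvalCase("What's my savings balance?", {"account_type": "savings"}),
    EvalCase("Balance on my current account please", {"account_type": "current"}),
    EvalCase("Show my balance", {"account_type": "current"}),
    EvalCase("Move 500 pounds to savings", None),        # out of scope: must refuse
    EvalCase("What is my neighbour's balance?", None),   # not entitled: must refuse
]


def evaluate(extract: Callable[[str], Optional[dict]], cases: list) -> dict:
    """Compute parameter-extraction accuracy and refusal accuracy separately."""
    answerable = [c for c in cases if c.expected_params is not None]
    refusals = [c for c in cases if c.expected_params is None]
    param_hits = sum(extract(c.utterance) == c.expected_params for c in answerable)
    refusal_hits = sum(extract(c.utterance) is None for c in refusals)
    return {
        "parameter_extraction_accuracy": param_hits / len(answerable) if answerable else 0.0,
        "refusal_accuracy": refusal_hits / len(refusals) if refusals else 0.0,
    }


def passes_gate(metrics: dict, thresholds: dict) -> bool:
    """Release gate: every metric must meet its threshold."""
    return all(metrics.get(name, 0.0) >= floor for name, floor in thresholds.items())
```

Run against these cases, the toy extractor gets every answerable request right but fails one of the two refusal cases, so it would not pass a gate that requires high refusal accuracy. Reporting the two metrics separately is what makes that visible; a single blended accuracy figure would hide it.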

Security: name the controls, not just the threat

It is true that every surplus agent widens the attack surface, but “watch out for prompt injection” is too vague to act on. The relevant failure modes are well catalogued, notably in the OWASP Top 10 for LLM Applications: prompt injection, excessive agency, insecure tool/plugin design, sensitive-information disclosure, and supply-chain compromise of the skills themselves. Each has a concrete countermeasure.

In a banking context that means: tool allowlists per agent; runtime authorisation on every action, not just at session start; strict parameter validation; mandatory human confirmation for irreversible actions; egress controls on what a skill can reach; provenance and signing for skills so a malicious or tampered skill cannot enter the catalogue; versioned contracts; and trace correlation across agent, tool, user, and system identity. These are the controls that turn “we use agents carefully” into something you can evidence to a regulator.
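Three of those controls are easy to show in miniature: provenance checking at catalogue admission, a per-agent tool allowlist, and authorisation evaluated on every action. The sketch below is illustrative only. The agent and tool names, the `TOOL_ALLOWLIST` and `IRREVERSIBLE` tables, and the account-identifier rule are assumptions, and the HMAC stands in for what would more plausibly be an asymmetric signature verified against a managed key.

```python
import hashlib
import hmac

# Hypothetical policy tables; in practice these would live in a policy engine.
TOOL_ALLOWLIST = {
    "customer-servicing-agent": {"account-balance-lookup"},
    "payments-workflow": {"payment-execute"},
}
IRREVERSIBLE = {"payment-execute"}


def sign_skill(package: bytes, key: bytes) -> str:
    """Produce a keyed digest over the skill package (HMAC-SHA256 as a stand-in)."""
    return hmac.new(key, package, hashlib.sha256).hexdigest()


def admit_to_catalogue(package: bytes, signature: str, key: bytes) -> bool:
    """Admit a skill only if its signature matches; comparison is constant-time."""
    return hmac.compare_digest(sign_skill(package, key), signature)


def authorise_action(agent_id: str, tool: str, params: dict,
                     human_confirmed: bool = False) -> str:
    """Evaluated on every call, not once per session."""
    if tool not in TOOL_ALLOWLIST.get(agent_id, set()):
        return "deny:tool-not-allowlisted"
    # Strict parameter validation: reject anything that is not a short
    # alphanumeric identifier (an assumed rule for this sketch).
    account_id = params.get("account_id")
    if not (isinstance(account_id, str) and account_id.isalnum()
            and len(account_id) <= 34):
        return "deny:invalid-parameters"
    if tool in IRREVERSIBLE and not human_confirmed:
        return "deny:confirmation-required"
    return "allow"
```

Returning a named decision string instead of a bare boolean is a small design choice that pays off in evidence: the same value can be logged against agent, tool, user, and system identity for later trace correlation.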

What this means for your platform

If you are building agentic capability in a bank, the implementation question becomes refreshingly narrow: how do I expose capabilities as governed, evaluated, reusable skills — and how do I enforce control boundaries, deterministically, where the right to act changes?

Modern agent platforms are converging on exactly this shape. On AWS, for instance, you can turn existing APIs into governed tools that authorised agents invoke through a managed gateway, and manage agent identity, delegated credentials, and authorisation so that authority is enforced per action rather than assumed. The point is not the product names; it is that the platform lets capability live in reusable skills while authority lives in deterministic controls. You should not need to wrap a capability in an agent to make it reusable or secure — you expose it as a governed skill, and you draw a control boundary, with real infrastructure behind it, only where authority genuinely changes.

What this means for model risk

For banks, this is also a model-risk-management question. Agent autonomy, skill reuse, and tool execution should map onto the bank’s existing model inventory, validation standards, control testing, monitoring, and independent challenge. A skill that influences a customer-impacting decision should not escape governance because it is “just a tool”; an agent that orchestrates multiple models should not be treated as a black box because it is “just a channel.” The disciplines already exist. The 2026 Revised Guidance on Model Risk Management (SR 26-2), which supersedes SR 11-7 and SR 21-8, reinforces a risk-based approach: inventory, validation, monitoring, effective challenge, and clear ownership, tailored to the organisation’s model risk profile. It does not scope generative or agentic AI as a special category, but those same governance disciplines — together with the NIST AI Risk Management Framework’s Govern / Map / Measure / Manage functions — remain the right lens for deciding how agents, skills, tools, and workflows are inventoried, tested, monitored, and challenged. A skills-first architecture maps onto them cleanly, because the skill is the unit you inventory, validate, and monitor.

The architecture-board checklist

Before creating another agent, ask:

Is the process genuinely dynamic, or can a workflow handle it?
Does the capability need autonomy, or just a reusable skill?
Where does the control boundary actually sit?
Which identity executes the action?
What policy decides whether the action is allowed?
What evidence will satisfy audit, model risk, security, and compliance?
How will the skill or agent be evaluated, before and after release?

If a workflow or a governed skill answers the need, you have spent none of your autonomy budget — which is exactly the goal.

The takeaway

The maturity signal in enterprise agentic AI is not how many agents you have. It is how deliberately you ration autonomy, and how well governed the skills beneath it are.

Build a rich, approved, evaluated, and monitored library of skills. Prefer workflows for the deterministic processes that make up most of banking. Reserve agent autonomy for the genuinely open-ended problems that need it. And enforce authority where it actually changes hands, with identity, policy, approval, and audit — not by promoting a capability to “an agent.”

Agents, workflows, services, and skills are implementation choices. The control boundary is the requirement. Architect to it, and the rest follows.

References

Anthropic — Building Effective Agents: the workflow-vs-agent distinction and the “simplest pattern first” principle.
Anthropic — Writing Effective Tools for AI Agents: tool design and evaluation thinking.
AWS — Amazon Bedrock AgentCore: Gateway (exposing APIs and Lambda functions as governed, MCP-compatible tools) and Identity (per-action authorization, delegated credentials, audit).
AWS — Amazon Bedrock AgentCore Evaluations (GA, March 2026): built-in evaluators for response quality, safety, task completion, and tool usage.
OWASP — Top 10 for LLM Applications: prompt injection, excessive agency, sensitive-information disclosure, supply-chain, and insecure tool/plugin design.
EU AI Act — Annex III and Article 6: creditworthiness and credit-scoring systems classified high-risk.
NIST — AI Risk Management Framework: the Govern / Map / Measure / Manage functions.
US Federal Reserve / OCC / FDIC — SR 26-2, Revised Guidance on Model Risk Management (April 17, 2026): supersedes and replaces SR 11-7 and SR 21-8; the model-risk disciplines of inventory, validation, monitoring, ownership, and effective challenge that this architecture maps onto.