
Skills vs. MCP Servers in Claude Code: Context Costs, Tool Search, and Token Economics
How Claude Code extensions load into context — persistent MCP tool definitions vs. lazy-loaded skills — with real-world cost comparisons and configuration tips
Claude Code’s context window is finite — 200K tokens for Opus. Every turn, the entire conversation is re-sent to the model (see LLM Token Economics for background). That means anything loaded into context costs tokens on every single request, not just the first one.
Skills and MCP servers are the two main ways to extend Claude Code. Both add capabilities, but they load into context very differently — and that difference has a real impact on your token bill and how quickly you hit the context limit.
MCP (Model Context Protocol) servers connect Claude Code to external tools — databases, GitHub, Slack, monitoring dashboards, filesystem utilities. Each MCP server exposes tool definitions: JSON schemas describing what each tool does, its parameters, and return types.
When you start a Claude Code session, all tool definitions from all connected MCP servers are loaded into context. They stay there for every request in the session, whether you use them or not.
Session with 3 MCP servers (GitHub, Slack, Filesystem):
Request 1: [system prompt] + [MCP tools: ~3,000 tokens] + [your message]
Request 2: [system prompt] + [MCP tools: ~3,000 tokens] + [conversation history] + [your message]
Request 3: [system prompt] + [MCP tools: ~3,000 tokens] + [conversation history] + [your message]
...
Request 50: [system prompt] + [MCP tools: ~3,000 tokens] + [conversation history] + [your message]
Those ~3,000 tokens are present in EVERY request.
Over 50 requests, that's ~150,000 tokens of overhead — just for tool definitions.
You can see the actual cost with the /mcp command, which shows token overhead per connected server.
Think of it like carrying every tool in the toolbox to every room — even if you only need a screwdriver.
Claude Code has a built-in optimization called Tool Search. As of March 2026, Tool Search is enabled by default — tool definitions are deferred and discovered on-demand rather than loaded upfront. Claude uses a search mechanism to find and load only the tools it needs for each request.
Without Tool Search:
All tool definitions loaded → every request pays the cost
With Tool Search (default):
Tool definitions deferred → Claude searches for tools when needed
Only discovered tools loaded into context
You can configure this behavior:
# Auto mode: only defer when tools exceed 5% of context
ENABLE_TOOL_SEARCH=auto:5 claude
# Force tool search on — always defer (current default)
ENABLE_TOOL_SEARCH=true claude
# Force tool search off — always load all upfront
ENABLE_TOOL_SEARCH=false claude
Source: Claude Code MCP docs: “Claude Code automatically enables Tool Search when your MCP tool descriptions would consume more than 10% of the context window.” Note: the default is now true (always enabled), with auto:N available for threshold-based triggering.
Skills are markdown-based extensions that teach Claude domain knowledge, provide reference material, or define reusable workflows. They live in .claude/skills/<skill-name>/SKILL.md files.
Skills use a fundamentally different loading strategy. Only skill descriptions (a few sentences each) load at session start. The full content loads only when the skill is actually used.
Session with 3 skills (deploy, review, conventions):
Request 1: [system prompt] + [skill descriptions: ~300 tokens] + [your message]
→ Claude reads descriptions, decides none are needed
Request 2: [system prompt] + [skill descriptions: ~300 tokens] + [conversation] + [your message]
→ Claude decides to use "conventions" skill
→ Full skill content (~1,500 tokens) loads for THIS request only
Request 3: [system prompt] + [skill descriptions: ~300 tokens] + [conversation] + [your message]
→ Back to just descriptions
Idle cost: ~300 tokens per request (descriptions only)
Active cost: ~1,800 tokens (descriptions + one full skill)
Think of it like a table of contents — you see the chapter titles on every page, but you only open the chapter you need.
Skills have a disable-model-invocation setting that controls context behavior:
| Setting | Claude can invoke? | You can invoke? | Context behavior |
|---|---|---|---|
| false (default) | Yes | Yes (/<name>) | Description in context every request; full content loads when used |
| true | No | Yes (/<name>) | Nothing in context until you invoke manually |
Setting disable-model-invocation: true is powerful for workflows you only trigger yourself (like /deploy or /ai-pulse). The idle context cost drops to zero.
# .claude/skills/deploy/SKILL.md
---
name: deploy
description: Deploy the application to production
disable-model-invocation: true
---
# Deploy steps...
Source: Claude Code Skills docs: “Set to true to prevent Claude from automatically loading this skill. Use for workflows you only want to trigger manually.”
Skill descriptions share a character budget equal to 2% of the context window (fallback: 16,000 characters). If you have many skills, some descriptions may be excluded. Check with /context and raise the limit if needed:
SLASH_COMMAND_TOOL_CHAR_BUDGET=32000 claude
Skills behave differently in subagents. Instead of lazy loading, skills passed to a subagent are fully preloaded into its context at launch. They aren’t inherited from the parent session — you must list them explicitly.
Source: Claude Code Features Overview: “In subagents: Skills work differently in subagents. Instead of on-demand loading, skills passed to a subagent are fully preloaded into its context at launch.”
The official Claude Code features overview provides this comparison:
| Feature | When it loads | What loads | Context cost |
|---|---|---|---|
| CLAUDE.md | Session start | Full content | Every request |
| Skills | Session start + when used | Descriptions at start, full content when used | Low (descriptions every request) |
| MCP servers | Session start | All tool definitions and schemas | Every request |
| Subagents | When spawned | Fresh context with specified skills | Isolated from main session |
| Hooks | On trigger | Nothing (runs externally) | Zero |
Source: Claude Code Features Overview — context loading comparison table.
Let’s compare the approaches with 5 extensions, assuming each MCP server adds ~600 tokens of tool definitions and each skill has a ~100 token description and ~1,500 tokens of full content:
5 MCP servers (below Tool Search threshold):
Idle overhead: 5 × 600 = 3,000 tokens per request
Over 50 turns: 3,000 × 50 = 150,000 tokens of tool definitions
Used or not: Same cost either way
5 Skills (default, model-invocable):
Idle overhead: 5 × 100 = 500 tokens per request (descriptions)
Over 50 turns: 500 × 50 = 25,000 tokens of descriptions
If 2 skills used once each: + 2 × 1,500 = 3,000 tokens (only on those turns)
Total: ~28,000 tokens
5 Skills (manual-only, disable-model-invocation: true):
Idle overhead: 0 tokens
Over 50 turns: 0 tokens until invoked
If 2 skills invoked once each: 2 × 1,500 = 3,000 tokens (only on those turns)
Total: ~3,000 tokens
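The arithmetic above can be sketched in a few lines of shell, using the same assumed figures (600 tokens per MCP server, 100-token skill descriptions, 1,500-token skill bodies, 50 turns):

```shell
# Back-of-envelope overhead totals; all figures are the article's assumptions.
turns=50
mcp=$((5 * 600 * turns))                # tool defs re-sent on every turn
skills=$((5 * 100 * turns + 2 * 1500))  # descriptions every turn + 2 skill uses
manual=$((2 * 1500))                    # manual-only: pay only when invoked
echo "mcp=$mcp skills=$skills manual=$manual"
# → mcp=150000 skills=28000 manual=3000
```

The gap grows with session length: the MCP term scales linearly with turns, while the manual-only term depends only on how often you invoke a skill.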
Context window usage over a session:
MCP servers: [████████████████████░░░░░░░░░░] 150K tokens (tool defs alone)
Skills (auto): [██░░░░░░░░░░░░░░░░░░░░░░░░░░░░] 28K tokens
Skills (manual):[░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░] 3K tokens
█ = token overhead from extensions
░ = available for conversation
The difference is dramatic. MCP servers consume 5-50x more context than equivalent skills, depending on usage patterns.
| Scenario | Use | Why |
|---|---|---|
| External API integration (GitHub, Jira, databases) | MCP server | Skills can’t make external API calls; MCP servers provide real tool execution |
| Coding conventions / style guide | Skill | Reference material, only needed when writing code in that style |
| Deployment workflow (/deploy) | Skill (manual-only) | Triggered explicitly, zero idle cost |
| File system operations beyond CWD | MCP server | Built-in tools are scoped to CWD; MCP can extend reach |
| Code review checklist | Skill | On-demand reference, not needed every turn |
| Database queries during debugging | MCP server | Needs live connection; frequent use justifies the overhead |
| Large prompt template / boilerplate | Skill (manual-only) | Only loaded when you need it |
Key insight: If the same capability exists as both a CLI tool and an MCP server, prefer the CLI tool. Claude can run gh, aws, gcloud, and sentry-cli directly via Bash without any persistent context overhead.
Source: Claude Code Costs docs: “Prefer CLI tools when available: Tools like gh, aws, gcloud, and sentry-cli are more context-efficient than MCP servers because they don’t add persistent tool definitions.”
Run /mcp in any session to see token costs per server. Disconnect servers you aren’t actively using. Each idle server silently consumes tokens on every request.
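A quick audit loop for this — assuming the standard claude mcp subcommands; the server name here is just an example:

```shell
# List configured servers, then drop one you no longer use.
claude mcp list
claude mcp remove slack
```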
If a CLI exists for the service, use it. gh pr list costs zero idle tokens. A GitHub MCP server’s tool definitions cost tokens on every request, whether you create a PR or not.
Set disable-model-invocation: true on skills you only trigger yourself. This eliminates all idle context cost — the skill loads only when you type /<name>.
CLAUDE.md loads in full on every request. The official guidance recommends keeping it under ~500 lines (per the costs and features-overview docs). Files over 200 lines may start to reduce adherence to instructions. If yours is larger, move specialized sections (coding standards for a specific language, deployment procedures, review checklists) into skills. They’ll load only when relevant.
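For example, a language-specific style section pulled out of CLAUDE.md might become a skill like this (the path, name, and contents are hypothetical, following the SKILL.md format shown earlier):

```markdown
# .claude/skills/python-style/SKILL.md
---
name: python-style
description: Python coding conventions for this repository
---
# Python conventions
- Prefer pathlib over os.path
- ...
```

These conventions now cost only a one-line description per request instead of loading in full every turn.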
Source: Claude Code Costs docs: “Aim to keep CLAUDE.md under ~500 lines by including only essentials.” Memory docs: “Files over 200 lines consume more context and may reduce adherence.”
Tool Search is enabled by default. If you want threshold-based triggering instead (only defer when tools are large), use the auto mode:
ENABLE_TOOL_SEARCH=auto:5 claude
This triggers deferred loading only when tool definitions exceed 5% of the context window. The default (true) always defers regardless of size.
Use both commands during sessions:
/context — visual grid showing context usage with optimization suggestions
/cost — token usage statistics and cost breakdown
Large MCP tool outputs can flood your context. Claude Code warns at 10,000 tokens per output and enforces a hard limit of 25,000 tokens (configurable):
# Increase if your MCP tools legitimately return large payloads
MAX_MCP_OUTPUT_TOKENS=50000 claude
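These knobs can live together in a shell profile or launcher script — a sketch, where the variable names are the documented ones and the values are examples to tune for your own setup:

```shell
# Context-related environment knobs in one place (example values).
export ENABLE_TOOL_SEARCH=auto:5             # defer tool defs past 5% of context
export SLASH_COMMAND_TOOL_CHAR_BUDGET=32000  # room for more skill descriptions
export MAX_MCP_OUTPUT_TOKENS=50000           # raise the per-tool-output cap
# exec claude "$@"                           # then launch with settings applied
```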
| Concept | Key Takeaway |
|---|---|
| MCP servers | Tool definitions load at session start, present every request — persistent overhead |
| Tool Search | Enabled by default — defers MCP tools and discovers on-demand (configurable via ENABLE_TOOL_SEARCH) |
| Skills (default) | Descriptions load at start (~low cost), full content only when used |
| Skills (manual-only) | Zero context cost until you invoke with /<name> |
| Subagent skills | Fully preloaded at launch, not lazy-loaded — explicit opt-in only |
| CLI vs MCP | CLI tools have zero idle overhead — prefer when available |
| CLAUDE.md | Loads fully every request — keep under ~500 lines, overflow to skills |
| Monitoring | /mcp for server costs, /context for usage grid, /cost for token stats |
The core principle: load what you need, when you need it. Skills do this by default. MCP servers don’t — but Tool Search helps at scale. Choose the right extension type for the job, audit your context regularly, and your token budget (and context window) will go much further.
Last verified: March 2026. Claude Code features evolve rapidly — always check the official documentation for current behavior.
Questions or feedback? Reach out on LinkedIn