Agent Skills Writing Guide

A comprehensive guide to writing AI Agent Skills, based on best practices from the Vercel AI SDK, Anthropic Claude, and OpenAI ecosystems.

Posted by Loscoy on 2026-02-28
Estimated Reading Time 12 Minutes
Words 2k In Total

This guide synthesizes best practices from the Vercel AI SDK, Anthropic Claude, and OpenAI ecosystems to systematically explain the definition, structural design, writing standards, and quality assurance framework for AI Agent Skills.


1. What is an Agent Skill

A Skill is a reusable functional module that an Agent can invoke — the core building block that extends a large language model’s generative capabilities into autonomous execution.

It is not a simple set of text instructions, but rather a decoupled composite package containing metadata, natural language instructions, executable scripts, and reference resources.

By encapsulating domain-specific expertise (such as code optimization rules, design specifications) into independent modules loaded on demand, Skills effectively address the context bloat and hallucination problems that plague LLMs during long-running tasks.


2. Three-Layer Progressive Disclosure Architecture

High-quality Skills employ a Progressive Disclosure three-layer architecture to maximize context window utilization:

Layer 1: Metadata (~100 tokens, always loaded at startup)

Layer 2: SKILL.md body (≤5000 tokens / 500 lines, loaded when Skill is activated)

Layer 3: Resource layer (scripts/ references/ assets/, loaded on demand)
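The three-layer loading behavior above can be sketched as a lazy loader. A minimal, runnable illustration (the `SkillLoader` class and its methods are invented for this guide, not part of any SDK):

```python
from pathlib import Path

class SkillLoader:
    """Illustrative three-layer loader: metadata always, body on
    activation, resources only when a step asks for them."""

    def __init__(self, root: str):
        self.root = Path(root)
        self._body = None

    def metadata(self) -> str:
        # Layer 1: only the YAML frontmatter (~100 tokens) is read at startup.
        text = (self.root / "SKILL.md").read_text()
        parts = text.split("---")
        return parts[1].strip() if len(parts) >= 3 else ""

    def activate(self) -> str:
        # Layer 2: load the body once, when the skill is routed to.
        if self._body is None:
            text = (self.root / "SKILL.md").read_text()
            self._body = text.split("---", 2)[-1].strip()
        return self._body

    def resource(self, rel_path: str) -> str:
        # Layer 3: read scripts/references/assets only on demand.
        return (self.root / rel_path).read_text()
```

Only `metadata()` is cheap enough to call for every installed skill at startup; `activate()` and `resource()` defer the heavier layers until they are actually needed.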

2.1 Layer 1: Metadata

  • Loading timing: Always loaded during startup/initialization (~100 tokens)
  • Purpose: Serves as the LLM’s "decision boundary" for semantic matching and routing
---
name: analyzing-spreadsheets # 1-64 chars, lowercase gerund form
description: >
  A skill for analyzing and processing Excel/CSV spreadsheet data.
  Use when data needs cleaning, sorting, or statistical aggregation.
  Do not invoke this skill when modifying system config or when data exceeds 1GB.
license: MIT # optional
compatibility: ">=1.0.0" # optional
metadata: # optional
  author: "Data Team"
  version: "1.1.0"
allowed-tools: # optional - least privilege
  - read_file
  - write_file
  - python_interpreter
---

2.2 Layer 2: SKILL.md Body

  • Loading timing: Loaded when the Skill is activated
  • Core requirement: Strongly recommended to stay under 500 lines (or below 5000 tokens)
  • Role: The navigation hub of the entire Skill

2.3 Layer 3: Resources

  • Loading timing: Loaded on demand per instructions

| Directory | Purpose | Example Contents |
|---|---|---|
| scripts/ | Executable code and validation scripts | validate_json.js, run_tests.sh |
| references/ | Detailed docs and dense rules | api_endpoints.md, error_codes.md |
| assets/ | Templates and large config files | config_template.json |

Reference rule: Always use relative paths, maintain one level of depth, and avoid deeply nested reference chains.
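The reference rule can be enforced mechanically. A heuristic sketch that scans markdown link targets for absolute paths and references deeper than one level (the regex is a rough approximation, not a full markdown parser):

```python
import re

LINK_RE = re.compile(r"\]\(([^)#]+)")  # markdown link targets

def lint_references(skill_md: str) -> list[str]:
    """Flag link targets that are absolute or nested deeper than one directory."""
    problems = []
    for target in LINK_RE.findall(skill_md):
        if target.startswith(("/", "http://", "https://")):
            problems.append(f"absolute path: {target}")
        elif target.count("/") > 1:
            problems.append(f"too deep (keep one level): {target}")
    return problems
```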


3. Directory Structure Template

skill-root/
├── SKILL.md                 # Body instructions (navigation hub)
├── scripts/
│   ├── validate_json.js     # Validate LLM output
│   ├── calculate_tax.py     # Complex calculations (LLMs are error-prone)
│   └── run_tests.sh         # TDD quality gate script
├── references/
│   ├── api_endpoints.md     # Full API endpoint listing
│   ├── ui_guidelines.md     # Color system and accessibility specs
│   └── error_codes.md       # Troubleshooting and error code table
└── assets/
    └── config_template.json # Config template
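A small sketch that checks a skill root against this layout (here all three resource directories are treated as required, which a real skill may relax):

```python
from pathlib import Path

EXPECTED_DIRS = ("scripts", "references", "assets")

def check_skill_layout(root: str) -> list[str]:
    """Return layout problems for a skill directory (empty list = conforms)."""
    root_path = Path(root)
    problems = []
    if not (root_path / "SKILL.md").is_file():
        problems.append("missing SKILL.md navigation hub")
    for d in EXPECTED_DIRS:
        # Each resource directory from the template above must exist.
        if not (root_path / d).is_dir():
            problems.append(f"missing {d}/ directory")
    return problems
```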

4. Complete SKILL.md Template

---
name: analyzing-spreadsheets
description: >
  A skill for analyzing and processing Excel/CSV spreadsheet data.
  Use when data needs cleaning, sorting, or statistical aggregation.
  Do not invoke this skill when modifying system config or when data exceeds 1GB.
metadata:
  author: "Data Team"
  version: "1.1.0"
allowed-tools: ["read_file", "write_file", "python_interpreter"]
---

# Analyzing Spreadsheets

## 1. Core Objective and Intent
This skill helps agents clean and analyze structured tabular data in a deterministic manner.

## 2. Applicable Scenarios (When to Apply)
- Cleaning CSV files with null values or formatting errors
- Grouping and aggregating data by specific columns

## 3. Workflow and Execution Steps
1. **Read data**: First use the `read_file` tool to read the target table
2. **Clean data**: Call `scripts/clean_data.py` to handle missing values
3. **Output analysis**: Generate a structured (JSON) analysis report per user needs

## 4. Key Rules and Priority
- **[CRITICAL]** Never delete original files; always output results to a new file
- **[HIGH]** Output must be in JSON format for downstream system parsing

## 5. Code and Usage Examples
python scripts/clean_data.py --input raw.csv --output cleaned.csv

## 6. Detailed Reference Navigation
- Data structure spec → references/data_structures.md
- Common errors & FAQ → references/error_codes.md

5. Writing Standards

5.1 name Field Naming Conventions

| Rule | Description | Example |
|---|---|---|
| Length | 1-64 characters | processing-pdfs |
| Characters | Lowercase letters, digits, hyphens only | analyzing-spreadsheets |
| Part of speech | Must use gerund form (verb + -ing) | building-apis |
| Prohibited | No vague generic names | helper, utils, tools |
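These conventions can be linted automatically. A sketch in Python; note that the `-ing` suffix check is only a crude approximation of "gerund form":

```python
import re

NAME_RE = re.compile(r"^[a-z0-9][a-z0-9-]{0,63}$")  # 1-64 chars, lowercase/digits/hyphens
VAGUE = {"helper", "utils", "tools"}

def check_skill_name(name: str) -> list[str]:
    """Return a list of violations for a skill `name` field (empty = OK)."""
    problems = []
    if not NAME_RE.fullmatch(name):
        problems.append("must be 1-64 chars: lowercase letters, digits, hyphens")
    # Heuristic: the leading word should be a gerund (verb + -ing).
    if not name.split("-")[0].endswith("ing"):
        problems.append("should start with a gerund (verb + -ing)")
    if name in VAGUE:
        problems.append("vague generic names are prohibited")
    return problems
```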

5.2 description Writing Standards

The description is the LLM’s decision boundary — it determines whether a Skill is correctly routed and activated. Maximum 1024 characters.

Must include:

  • Third-person, objective professional tone ("This Skill" / "This tool")
  • Specific key terms
  • Clear answers to "what", "when to use", and "when not to use"
  • 3-5 negative examples ("Do not invoke this Skill when…")

Bad example:

❌ I’m a tool that can help you optimize your code. If you think your code runs slow, you can come to me and I’ll tell you how to make it better.

Why it’s wrong: Uses first person "I", extremely vague description, no key terms, no trigger boundaries, no negative examples.

Good example:

✅ A skill for comprehensive performance optimization of React and Next.js applications. Use when resolving async waterfalls, reducing JavaScript bundle size, or optimizing RSC server-side rendering. Do not invoke when handling backend database migrations, configuring Nginx servers, or making pure CSS style modifications.

Adding negative examples and boundary cases can reduce false trigger rates by over 20%.
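The standards in this section lend themselves to a pre-flight lint. A heuristic sketch; plain string matching is only a rough proxy for the discovery validation described in Section 8:

```python
def lint_description(desc: str) -> list[str]:
    """Flag common description problems: length, first person, missing triggers."""
    problems = []
    if len(desc) > 1024:
        problems.append("exceeds 1024 characters")
    lowered = desc.lower()
    # First-person phrasing breaks the third-person, objective tone rule.
    if lowered.startswith(("i ", "i'm", "i am")) or " i " in f" {lowered} ":
        problems.append("uses first person; write in third person")
    if "use when" not in lowered:
        problems.append("missing positive trigger ('Use when ...')")
    if "do not invoke" not in lowered and "do not use" not in lowered:
        problems.append("missing negative triggers ('Do not invoke ... when')")
    return problems
```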

5.3 Trigger Condition Writing

**When to use this skill:**
- When analyzing financial data in .csv or .xlsx format
- When users request trend prediction reports based on tabular data
- When extracting intersection data from multiple tables

**When NOT to use (do not invoke this skill when):**
- Input data is unstructured text (e.g., PDF papers)
- Real-time web scraping is needed (use the web-scraping skill instead)
- The operation involves directly modifying underlying database records

5.4 Conciseness Principle

  • The context window is an extremely precious shared resource
  • Assume the Agent is already smart by default — don’t re-explain concepts it already knows
  • Remove redundant modifiers; only add information specific to the Skill
  • Every token must justify its existence

5.5 Information Organization Tips

  • Use tables to organize complex information
  • Use prefix systems (e.g., async-, bundle-) to tag related items
  • Provide clear priority markers (CRITICAL, HIGH, LOW) to guide LLM focus
  • Include complete code snippets and input/output examples to reduce hallucination

6. Key Design Patterns

6.1 Decoupling Declarative Instructions from Imperative Tools

Core principle: Intent belongs to the agent, execution belongs to tools

  • Use natural language to precisely define task boundaries (declarative)
  • Delegate complex logic, regex matching, etc. to dedicated scripts (imperative)
  • Eliminate LLM’s computational weaknesses

6.2 Matching Freedom to Task Fragility

Adjust instruction specificity based on the risk level of the task:

| Freedom Level | Applicable Scenarios | Implementation |
|---|---|---|
| High | Creative tasks, data analysis | Provide principles, allow flexible implementation |
| Medium | Code style, best practices | Recommend specific methods, allow some flexibility |
| Low | Database migrations, security configs | Exact commands, strict steps, direct script calls |

6.3 Dynamic Capability Acquisition

Don’t require all rules to be hardcoded. Use dynamic fetching tools (like WebFetch) to retrieve the latest specifications at runtime, turning the Skill into a "rules gateway":

## Step 1: Fetch latest design specs
Before reviewing any UI component, first call the `fetch_url` tool:
fetch_url https://design.company.com/api/latest-guidelines.json

## Step 2: Execute review
Compare local code against fetched specs, output in this format:
{file_path}:{line_number} - {violation_description} (ref: {spec_ID})

6.4 Mandatory Workflows (Quality Gates)

Instead of trusting a single LLM output, set up non-negotiable quality gates:

#!/bin/bash
# scripts/tdd_gate.sh - Mandatory TDD quality gate
TEST_FILE=$1
CODE_FILE=$2

if [ ! -f "$TEST_FILE" ]; then
  echo "[ERROR] Quality gate blocked: must write test file $TEST_FILE first!"
  exit 1
fi

npm run test -- "$TEST_FILE"
if [ $? -ne 0 ]; then
  echo "[ERROR] Tests failed. Please revisit the logic in $CODE_FILE."
  exit 1
fi

echo "[SUCCESS] Tests passed, changes approved for commit."

7. Three Ecosystem Paradigm Comparison

| Dimension | Vercel AI SDK | Anthropic Claude | OpenAI |
|---|---|---|---|
| Focus | Knowledge-driven best practice distribution | File-system level integration & sandboxed ops | Long-running agents & complex workflow control |
| Highlights | Dynamic capability acquisition, seamless IDE integration | Freedom matching, strict naming conventions | Context compression, error recovery |
| Output | file:line format for IDE integration | Sandboxed code execution | Persistent execution |

Points of consensus (high industry alignment):

  • Progressive disclosure three-layer loading architecture
  • Conciseness principle (assume the Agent is already smart)
  • Systematic verification and quality gates

8. Four-Step Systematic Verification Process

Every Skill must pass all four verification steps before release:

1. Discovery Validation → 2. Logic Validation → 3. Boundary Testing → 4. Architecture Optimization

Step 1: Discovery Validation

  • Create 3 prompts that should trigger the Skill
  • Create 3 prompts that should NOT trigger the Skill
  • Test whether the LLM’s routing interpretation of the description is 100% accurate
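Step 1 can be scripted as a small harness. The `route_prompt` stub below stands in for whatever routing call your agent framework exposes; it is faked here with keyword matching so the harness shape stays runnable:

```python
def route_prompt(prompt: str, skill_keywords: tuple[str, ...]) -> bool:
    """Hypothetical router stub: a real system asks the LLM to match
    the prompt against the skill's description."""
    return any(kw in prompt.lower() for kw in skill_keywords)

def run_discovery_validation(positives, negatives, keywords) -> bool:
    """Routing must be 100% accurate: every positive prompt triggers
    the skill, and no negative prompt does."""
    hits = [route_prompt(p, keywords) for p in positives]
    misses = [route_prompt(n, keywords) for n in negatives]
    return all(hits) and not any(misses)
```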

Step 2: Logic Validation

  • Simulate the Agent executing the full workflow
  • Flag any execution blockers
  • Ensure steps are deterministic and don’t force the model to hallucinate

Step 3: Boundary Case Testing

  • Input unsupported configurations
  • Test extreme input data
  • Simulate network disconnection failure states
  • Verify the Agent can gracefully exit or report errors

Step 4: Architecture Optimization

  • Verify SKILL.md is under 500 lines
  • Confirm dense rules have been moved to references/
  • Confirm large configs have been moved to assets/
  • Verify token usage is efficient
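The size limits in Step 4 are easy to gate in CI. A sketch using a crude four-characters-per-token estimate (real counts depend on the model's tokenizer):

```python
def check_skill_budget(skill_md_text: str, max_lines: int = 500,
                       max_tokens: int = 5000) -> list[str]:
    """Flag SKILL.md bodies that exceed the 500-line / 5000-token budget."""
    problems = []
    lines = skill_md_text.count("\n") + 1
    est_tokens = len(skill_md_text) // 4  # rough heuristic, not a real tokenizer
    if lines > max_lines:
        problems.append(f"{lines} lines exceeds {max_lines}")
    if est_tokens > max_tokens:
        problems.append(f"~{est_tokens} tokens exceeds {max_tokens}")
    return problems
```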

9. Security and Reliability Strategies

9.1 Security and Permission Governance

  • Least privilege: Use short-lived, narrowly scoped credentials
  • Output sanitization: Strictly filter output to prevent indirect prompt injection
  • Audit trail: Log the complete payload and reasoning trace for every tool call
  • Human-in-the-loop (HITL): Operations involving money/deletion/production deployment must require secondary confirmation
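The HITL rule can be wrapped around tool dispatch. A sketch with an injectable confirmation callback; the tool names and return shape are illustrative:

```python
# Illustrative high-risk tool names; a real deployment configures its own set.
HIGH_RISK = {"delete_file", "transfer_funds", "deploy_production"}

def dispatch_tool(name: str, run, confirm=lambda n: False):
    """Run a tool, but require explicit human confirmation for
    high-risk operations (default: block)."""
    if name in HIGH_RISK and not confirm(name):
        return {"status": "blocked", "reason": f"{name} requires human approval"}
    return {"status": "ok", "result": run()}
```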

9.2 Fault Tolerance and Exception Handling

| Strategy | Description |
|---|---|
| Type safety | Use strong constraint validation like Zod Schema for parameter types |
| Retry with backoff | Build exponential backoff retry logic for unstable APIs |
| Degradation strategy | Auto-switch to fallback models when primary model is rate-limited |
| Loop prevention | Enforce maxSteps (e.g., 20 steps) and timeout thresholds |
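The retry-with-backoff row can be sketched as a small helper (the delays are kept tiny here purely for illustration):

```python
import time

def call_with_backoff(fn, retries: int = 3, base_delay: float = 0.01):
    """Retry `fn` with exponential backoff; re-raise after the final attempt."""
    for attempt in range(retries):
        try:
            return fn()
        except Exception:
            if attempt == retries - 1:
                raise
            # Sleep base_delay, 2*base_delay, 4*base_delay, ...
            time.sleep(base_delay * (2 ** attempt))
```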

9.3 State Visibility and Transparency

  • Real-time decision logs: Show currently invoked APIs and reasoning stages
  • Progress indicators: Provide step-by-step task checklists for long chain tasks
  • Diff preview: Generate diff views or Git worktree previews before modifications
  • Universal undo: Support one-click rollback; failed attempts don’t pollute the main codebase

10. Token Budget Management Strategies

  1. Progressive disclosure: Initialize with ~100 tokens metadata → load ≤5000 tokens body on activation → read resource layer on demand
  2. Eliminate redundancy: Remove unnecessary modifiers and repeated concepts
  3. Context compaction: Periodically summarize and clear early reasoning traces during long-running tasks to prevent token exhaustion crashes
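Compaction (point 3) can be sketched as: once the history exceeds its budget, fold the oldest turns into a one-line summary. The summary string below is a stub for what would be an LLM summarization call:

```python
def compact(history: list[str], budget_chars: int, keep_recent: int = 2) -> list[str]:
    """If the history exceeds its budget, replace older turns with a summary,
    keeping the most recent turns verbatim."""
    if sum(len(t) for t in history) <= budget_chars:
        return history
    old, recent = history[:-keep_recent], history[-keep_recent:]
    summary = f"[summary of {len(old)} earlier turns]"  # stub for an LLM call
    return [summary] + recent
```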

11. Version and Lifecycle Management

Version Management

  • Explicitly include a version field in metadata
  • Maintain a changelog to record iteration history
  • Use Git branching strategies; publish new versions for breaking changes with backward compatibility guides

Lifecycle

Discovery & Distribution → Deployment & Monitoring → Data-Driven Optimization → (cycle)

  • Discovery & distribution: Package via standardized specs (e.g., agentskills.io), one-click install and cross-platform distribution
  • Production monitoring: Observe tool call accuracy, token consumption distribution, hallucination trigger points
  • Continuous improvement: Collect runtime reasoning traces for annotation, continuously optimize description positive and negative examples

12. Quick Reference Checklist

Pre-release checklist for Skills:

  • [ ] name uses lowercase gerund form, 1-64 characters
  • [ ] description includes positive and negative trigger conditions
  • [ ] SKILL.md does not exceed 500 lines / 5000 tokens
  • [ ] Complex calculations are delegated to scripts in scripts/
  • [ ] Dense rules have been moved to the references/ directory
  • [ ] Resource references use relative paths with one level of depth
  • [ ] Passed discovery validation (3 positive + 3 negative prompt tests)
  • [ ] Passed logic validation (no execution blockers)
  • [ ] Passed boundary case testing
  • [ ] Token usage is efficient with no redundant information
  • [ ] High-risk operations have human-in-the-loop gates
  • [ ] Version number is annotated in metadata

If you like this blog or find it useful, you are welcome to comment on it or share it so that more people can benefit. If any images used in this blog infringe your copyright, please contact the author to have them deleted. Thank you!