Agent Skills Writing Guide

A comprehensive guide to writing AI Agent Skills, based on best practices from the Vercel AI SDK, Anthropic Claude, and OpenAI ecosystems.

Posted by Loscoy on 2026-02-28
Estimated Reading Time 12 Minutes
Words 2k In Total

This guide synthesizes best practices from the Vercel AI SDK, Anthropic Claude, and OpenAI ecosystems to systematically explain the definition, structural design, writing standards, and quality assurance framework for AI Agent Skills.


1. What is an Agent Skill

A Skill is a reusable functional module that an Agent can invoke — the core building block that extends a large language model’s generative capabilities into autonomous execution.

It is not a simple set of text instructions, but rather a decoupled composite package containing metadata, natural language instructions, executable scripts, and reference resources.

By encapsulating domain-specific expertise (such as code optimization rules, design specifications) into independent modules loaded on demand, Skills effectively address the context bloat and hallucination problems that plague LLMs during long-running tasks.


2. Three-Layer Progressive Disclosure Architecture

High-quality Skills employ a Progressive Disclosure three-layer architecture to maximize context window utilization:

Layer 1: Metadata (~100 tokens, always loaded at startup)

Layer 2: SKILL.md body (≤5000 tokens / 500 lines, loaded when Skill is activated)

Layer 3: Resource layer (scripts/ references/ assets/, loaded on demand)
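The three-layer loading behavior above can be sketched as a lazy loader. A minimal, runnable illustration (the `SkillLoader` class and its methods are invented for this guide, not part of any SDK):

```python
from pathlib import Path

class SkillLoader:
    """Illustrative three-layer loader: metadata always, body on
    activation, resources only when a step asks for them."""

    def __init__(self, root: str):
        self.root = Path(root)
        self._body = None

    def metadata(self) -> str:
        # Layer 1: only the YAML frontmatter (~100 tokens) is read at startup.
        text = (self.root / "SKILL.md").read_text()
        parts = text.split("---")
        return parts[1].strip() if len(parts) >= 3 else ""

    def activate(self) -> str:
        # Layer 2: load the body once, when the skill is routed to.
        if self._body is None:
            text = (self.root / "SKILL.md").read_text()
            self._body = text.split("---", 2)[-1].strip()
        return self._body

    def resource(self, rel_path: str) -> str:
        # Layer 3: read scripts/references/assets only on demand.
        return (self.root / rel_path).read_text()
```

Only `metadata()` is cheap enough to call for every installed skill at startup; `activate()` and `resource()` defer the heavier layers until they are actually needed.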

2.1 Layer 1: Metadata

  • Loading timing: Always loaded during startup/initialization (~100 tokens)
  • Purpose: Serves as the LLM’s "decision boundary" for semantic matching and routing
---
name: analyzing-spreadsheets # 1-64 chars, lowercase gerund form
description: >
  A skill for analyzing and processing Excel/CSV spreadsheet data.
  Use when data needs cleaning, sorting, or statistical aggregation.
  Do not invoke this skill when modifying system config or when data exceeds 1GB.
license: MIT # optional
compatibility: ">=1.0.0" # optional
metadata: # optional
  author: "Data Team"
  version: "1.1.0"
allowed-tools: # optional - least privilege
  - read_file
  - write_file
  - python_interpreter
---

2.2 Layer 2: SKILL.md Body

  • Loading timing: Loaded when the Skill is activated
  • Core requirement: Strongly recommended to stay under 500 lines (or below 5000 tokens)
  • Role: The navigation hub of the entire Skill

2.3 Layer 3: Resources

  • Loading timing: Loaded on demand per instructions

| Directory | Purpose | Example Contents |
|---|---|---|
| scripts/ | Executable code and validation scripts | validate_json.js, run_tests.sh |
| references/ | Detailed docs and dense rules | api_endpoints.md, error_codes.md |
| assets/ | Templates and large config files | config_template.json |

Reference rule: Always use relative paths, maintain one level of depth, and avoid deeply nested reference chains.
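The reference rule can be enforced mechanically. A heuristic sketch that scans markdown link targets for absolute paths and references deeper than one level (the regex is a rough approximation, not a full markdown parser):

```python
import re

LINK_RE = re.compile(r"\]\(([^)#]+)")  # markdown link targets

def lint_references(skill_md: str) -> list[str]:
    """Flag link targets that are absolute or nested deeper than one directory."""
    problems = []
    for target in LINK_RE.findall(skill_md):
        if target.startswith(("/", "http://", "https://")):
            problems.append(f"absolute path: {target}")
        elif target.count("/") > 1:
            problems.append(f"too deep (keep one level): {target}")
    return problems
```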


3. Directory Structure Template

skill-root/
├── SKILL.md                 # Body instructions (navigation hub)
├── scripts/
│   ├── validate_json.js     # Validate LLM output
│   ├── calculate_tax.py     # Complex calculations (LLMs are error-prone)
│   └── run_tests.sh         # TDD quality gate script
├── references/
│   ├── api_endpoints.md     # Full API endpoint listing
│   ├── ui_guidelines.md     # Color system and accessibility specs
│   └── error_codes.md       # Troubleshooting and error code table
└── assets/
    └── config_template.json # Config template
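A small sketch that checks a skill root against this layout (here all three resource directories are treated as required, which a real skill may relax):

```python
from pathlib import Path

EXPECTED_DIRS = ("scripts", "references", "assets")

def check_skill_layout(root: str) -> list[str]:
    """Return layout problems for a skill directory (empty list = conforms)."""
    root_path = Path(root)
    problems = []
    if not (root_path / "SKILL.md").is_file():
        problems.append("missing SKILL.md navigation hub")
    for d in EXPECTED_DIRS:
        # Each resource directory from the template above must exist.
        if not (root_path / d).is_dir():
            problems.append(f"missing {d}/ directory")
    return problems
```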

4. Complete SKILL.md Template

---
name: analyzing-spreadsheets
description: >
  A skill for analyzing and processing Excel/CSV spreadsheet data.
  Use when data needs cleaning, sorting, or statistical aggregation.
  Do not invoke this skill when modifying system config or when data exceeds 1GB.
metadata:
  author: "Data Team"
  version: "1.1.0"
allowed-tools: ["read_file", "write_file", "python_interpreter"]
---

# Analyzing Spreadsheets

## 1. Core Objective and Intent
This skill helps agents clean and analyze structured tabular data in a deterministic manner.

## 2. Applicable Scenarios (When to Apply)
- Cleaning CSV files with null values or formatting errors
- Grouping and aggregating data by specific columns

## 3. Workflow and Execution Steps
1. **Read data**: First use the `read_file` tool to read the target table
2. **Clean data**: Call `scripts/clean_data.py` to handle missing values
3. **Output analysis**: Generate a structured (JSON) analysis report per user needs

## 4. Key Rules and Priority
- **[CRITICAL]** Never delete original files; always output results to a new file
- **[HIGH]** Output must be in JSON format for downstream system parsing

## 5. Code and Usage Examples
python scripts/clean_data.py --input raw.csv --output cleaned.csv

## 6. Detailed Reference Navigation
- Data structure spec → references/data_structures.md
- Common errors & FAQ → references/error_codes.md

5. Writing Standards

5.1 name Field Naming Conventions

| Rule | Description | Example |
|---|---|---|
| Length | 1-64 characters | processing-pdfs |
| Characters | Lowercase letters, digits, hyphens only | analyzing-spreadsheets |
| Part of speech | Must use gerund form (verb + -ing) | building-apis |
| Prohibited | No vague generic names | helper, utils, tools |
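These conventions can be linted automatically. A sketch in Python; note that the `-ing` suffix check is only a crude approximation of "gerund form":

```python
import re

NAME_RE = re.compile(r"^[a-z0-9][a-z0-9-]{0,63}$")  # 1-64 chars, lowercase/digits/hyphens
VAGUE = {"helper", "utils", "tools"}

def check_skill_name(name: str) -> list[str]:
    """Return a list of violations for a skill `name` field (empty = OK)."""
    problems = []
    if not NAME_RE.fullmatch(name):
        problems.append("must be 1-64 chars: lowercase letters, digits, hyphens")
    # Heuristic: the leading word should be a gerund (verb + -ing).
    if not name.split("-")[0].endswith("ing"):
        problems.append("should start with a gerund (verb + -ing)")
    if name in VAGUE:
        problems.append("vague generic names are prohibited")
    return problems
```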

5.2 description Writing Standards

The description is the LLM’s decision boundary — it determines whether a Skill is correctly routed and activated. Maximum 1024 characters.

Must include:

  • Third-person, objective professional tone ("This Skill" / "This tool")
  • Specific key terms
  • Clear answers to "what", "when to use", and "when not to use"
  • 3-5 negative examples ("Do not invoke this Skill when…")

Bad example:

❌ I’m a tool that can help you optimize your code. If you think your code runs slow, you can come to me and I’ll tell you how to make it better.

Why it’s wrong: Uses first person "I", extremely vague description, no key terms, no trigger boundaries, no negative examples.

Good example:

✅ A skill for comprehensive performance optimization of React and Next.js applications. Use when resolving async waterfalls, reducing JavaScript bundle size, or optimizing RSC server-side rendering. Do not invoke when handling backend database migrations, configuring Nginx servers, or making pure CSS style modifications.

Adding negative examples and boundary cases can reduce false trigger rates by over 20%.
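The standards in this section lend themselves to a pre-flight lint. A heuristic sketch; plain string matching is only a rough proxy for the discovery validation described in Section 8:

```python
def lint_description(desc: str) -> list[str]:
    """Flag common description problems: length, first person, missing triggers."""
    problems = []
    if len(desc) > 1024:
        problems.append("exceeds 1024 characters")
    lowered = desc.lower()
    # First-person phrasing breaks the third-person, objective tone rule.
    if lowered.startswith(("i ", "i'm", "i am")) or " i " in f" {lowered} ":
        problems.append("uses first person; write in third person")
    if "use when" not in lowered:
        problems.append("missing positive trigger ('Use when ...')")
    if "do not invoke" not in lowered and "do not use" not in lowered:
        problems.append("missing negative triggers ('Do not invoke ... when')")
    return problems
```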

5.3 Trigger Condition Writing

**When to use this skill:**
- When analyzing financial data in .csv or .xlsx format
- When users request trend prediction reports based on tabular data
- When extracting intersection data from multiple tables

**When NOT to use (do not invoke this skill when):**
- Input data is unstructured text (e.g., PDF papers)
- Real-time web scraping is needed (use the web-scraping skill instead)
- The operation involves directly modifying underlying database records

5.4 Conciseness Principle

  • The context window is an extremely precious shared resource
  • Assume the Agent is already smart by default — don’t re-explain concepts it already knows
  • Remove redundant modifiers; only add information specific to the Skill
  • Every token must justify its existence

5.5 Information Organization Tips

  • Use tables to organize complex information
  • Use prefix systems (e.g., async-, bundle-) to tag related items
  • Provide clear priority markers (CRITICAL, HIGH, LOW) to guide LLM focus
  • Include complete code snippets and input/output examples to reduce hallucination

6. Key Design Patterns

6.1 Decoupling Declarative Instructions from Imperative Tools

Core principle: Intent belongs to the agent, execution belongs to tools

  • Use natural language to precisely define task boundaries (declarative)
  • Delegate complex logic, regex matching, etc. to dedicated scripts (imperative)
  • Eliminate LLM’s computational weaknesses

6.2 Matching Freedom to Task Fragility

Adjust instruction specificity based on the risk level of the task:

| Freedom Level | Applicable Scenarios | Implementation |
|---|---|---|
| High | Creative tasks, data analysis | Provide principles, allow flexible implementation |
| Medium | Code style, best practices | Recommend specific methods, allow some flexibility |
| Low | Database migrations, security configs | Exact commands, strict steps, direct script calls |

6.3 Dynamic Capability Acquisition

Don’t require all rules to be hardcoded. Use dynamic fetching tools (like WebFetch) to retrieve the latest specifications at runtime, turning the Skill into a "rules gateway":

## Step 1: Fetch latest design specs
Before reviewing any UI component, first call the `fetch_url` tool:
fetch_url https://design.company.com/api/latest-guidelines.json

## Step 2: Execute review
Compare local code against fetched specs, output in this format:
{file_path}:{line_number} - {violation_description} (ref: {spec_ID})

6.4 Mandatory Workflows (Quality Gates)

Instead of trusting a single LLM output, set up non-negotiable quality gates:

#!/bin/bash
# scripts/tdd_gate.sh - Mandatory TDD quality gate
TEST_FILE=$1
CODE_FILE=$2

if [ ! -f "$TEST_FILE" ]; then
  echo "[ERROR] Quality gate blocked: must write test file $TEST_FILE first!"
  exit 1
fi

npm run test -- "$TEST_FILE"
if [ $? -ne 0 ]; then
  echo "[ERROR] Tests failed. Please revisit the logic in $CODE_FILE."
  exit 1
fi

echo "[SUCCESS] Tests passed, changes approved for commit."

7. Three Ecosystem Paradigm Comparison

| Dimension | Vercel AI SDK | Anthropic Claude | OpenAI |
|---|---|---|---|
| Focus | Knowledge-driven best practice distribution | File-system level integration & sandboxed ops | Long-running agents & complex workflow control |
| Highlights | Dynamic capability acquisition, seamless IDE integration | Freedom matching, strict naming conventions | Context compression, error recovery |
| Output | file:line format for IDE integration | Sandboxed code execution | Persistent execution |

Points of consensus (high industry alignment):

  • Progressive disclosure three-layer loading architecture
  • Conciseness principle (assume the Agent is already smart)
  • Systematic verification and quality gates

8. Four-Step Systematic Verification Process

Every Skill must pass all four verification steps before release:

1. Discovery Validation → 2. Logic Validation → 3. Boundary Testing → 4. Architecture Optimization

Step 1: Discovery Validation

  • Create 3 prompts that should trigger the Skill
  • Create 3 prompts that should NOT trigger the Skill
  • Test whether the LLM’s routing interpretation of the description is 100% accurate
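Step 1 can be scripted as a small harness. The `route_prompt` stub below stands in for whatever routing call your agent framework exposes; it is faked here with keyword matching so the harness shape stays runnable:

```python
def route_prompt(prompt: str, skill_keywords: tuple[str, ...]) -> bool:
    """Hypothetical router stub: a real system asks the LLM to match
    the prompt against the skill's description."""
    return any(kw in prompt.lower() for kw in skill_keywords)

def run_discovery_validation(positives, negatives, keywords) -> bool:
    """Routing must be 100% accurate: every positive prompt triggers
    the skill, and no negative prompt does."""
    hits = [route_prompt(p, keywords) for p in positives]
    misses = [route_prompt(n, keywords) for n in negatives]
    return all(hits) and not any(misses)
```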

Step 2: Logic Validation

  • Simulate the Agent executing the full workflow
  • Flag any execution blockers
  • Ensure steps are deterministic and don’t force the model to hallucinate

Step 3: Boundary Case Testing

  • Input unsupported configurations
  • Test extreme input data
  • Simulate network disconnection failure states
  • Verify the Agent can gracefully exit or report errors

Step 4: Architecture Optimization

  • Verify SKILL.md is under 500 lines
  • Confirm dense rules have been moved to references/
  • Confirm large configs have been moved to assets/
  • Verify token usage is efficient
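The size limits in Step 4 are easy to gate in CI. A sketch using a crude four-characters-per-token estimate (real counts depend on the model's tokenizer):

```python
def check_skill_budget(skill_md_text: str, max_lines: int = 500,
                       max_tokens: int = 5000) -> list[str]:
    """Flag SKILL.md bodies that exceed the 500-line / 5000-token budget."""
    problems = []
    lines = skill_md_text.count("\n") + 1
    est_tokens = len(skill_md_text) // 4  # rough heuristic, not a real tokenizer
    if lines > max_lines:
        problems.append(f"{lines} lines exceeds {max_lines}")
    if est_tokens > max_tokens:
        problems.append(f"~{est_tokens} tokens exceeds {max_tokens}")
    return problems
```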

9. Security and Reliability Strategies

9.1 Security and Permission Governance

  • Least privilege: Use short-lived, narrowly scoped credentials
  • Output sanitization: Strictly filter output to prevent indirect prompt injection
  • Audit trail: Log the complete payload and reasoning trace for every tool call
  • Human-in-the-loop (HITL): Operations involving money/deletion/production deployment must require secondary confirmation
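The HITL rule can be wrapped around tool dispatch. A sketch with an injectable confirmation callback; the tool names and return shape are illustrative:

```python
# Illustrative high-risk tool names; a real deployment configures its own set.
HIGH_RISK = {"delete_file", "transfer_funds", "deploy_production"}

def dispatch_tool(name: str, run, confirm=lambda n: False):
    """Run a tool, but require explicit human confirmation for
    high-risk operations (default: block)."""
    if name in HIGH_RISK and not confirm(name):
        return {"status": "blocked", "reason": f"{name} requires human approval"}
    return {"status": "ok", "result": run()}
```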

9.2 Fault Tolerance and Exception Handling

| Strategy | Description |
|---|---|
| Type safety | Use strong constraint validation like Zod Schema for parameter types |
| Retry with backoff | Build exponential backoff retry logic for unstable APIs |
| Degradation strategy | Auto-switch to fallback models when primary model is rate-limited |
| Loop prevention | Enforce maxSteps (e.g., 20 steps) and timeout thresholds |
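The retry-with-backoff row can be sketched as a small helper (the delays are kept tiny here purely for illustration):

```python
import time

def call_with_backoff(fn, retries: int = 3, base_delay: float = 0.01):
    """Retry `fn` with exponential backoff; re-raise after the final attempt."""
    for attempt in range(retries):
        try:
            return fn()
        except Exception:
            if attempt == retries - 1:
                raise
            # Sleep base_delay, 2*base_delay, 4*base_delay, ...
            time.sleep(base_delay * (2 ** attempt))
```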

9.3 State Visibility and Transparency

  • Real-time decision logs: Show currently invoked APIs and reasoning stages
  • Progress indicators: Provide step-by-step task checklists for long chain tasks
  • Diff preview: Generate diff views or Git worktree previews before modifications
  • Universal undo: Support one-click rollback; failed attempts don’t pollute the main codebase

10. Token Budget Management Strategies

  1. Progressive disclosure: Initialize with ~100 tokens metadata → load ≤5000 tokens body on activation → read resource layer on demand
  2. Eliminate redundancy: Remove unnecessary modifiers and repeated concepts
  3. Context compaction: Periodically summarize and clear early reasoning traces during long-running tasks to prevent token exhaustion crashes
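Compaction (point 3) can be sketched as: once the history exceeds its budget, fold the oldest turns into a one-line summary. The summary string below is a stub for what would be an LLM summarization call:

```python
def compact(history: list[str], budget_chars: int, keep_recent: int = 2) -> list[str]:
    """If the history exceeds its budget, replace older turns with a summary,
    keeping the most recent turns verbatim."""
    if sum(len(t) for t in history) <= budget_chars:
        return history
    old, recent = history[:-keep_recent], history[-keep_recent:]
    summary = f"[summary of {len(old)} earlier turns]"  # stub for an LLM call
    return [summary] + recent
```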

11. Version and Lifecycle Management

Version Management

  • Explicitly include a version field in metadata
  • Maintain a changelog to record iteration history
  • Use Git branching strategies; publish new versions for breaking changes with backward compatibility guides

Lifecycle

Discovery & Distribution → Deployment & Monitoring → Data-Driven Optimization → (cycle)

  • Discovery & distribution: Package via standardized specs (e.g., agentskills.io), one-click install and cross-platform distribution
  • Production monitoring: Observe tool call accuracy, token consumption distribution, hallucination trigger points
  • Continuous improvement: Collect runtime reasoning traces for annotation, continuously optimize description positive and negative examples

12. Quick Reference Checklist

Pre-release checklist for Skills:

  • [ ] name uses lowercase gerund form, 1-64 characters
  • [ ] description includes positive and negative trigger conditions
  • [ ] SKILL.md does not exceed 500 lines / 5000 tokens
  • [ ] Complex calculations are delegated to scripts in scripts/
  • [ ] Dense rules have been moved to the references/ directory
  • [ ] Resource references use relative paths with one level of depth
  • [ ] Passed discovery validation (3 positive + 3 negative prompt tests)
  • [ ] Passed logic validation (no execution blockers)
  • [ ] Passed boundary case testing
  • [ ] Token usage is efficient with no redundant information
  • [ ] High-risk operations have human-in-the-loop gates
  • [ ] Version number is annotated in metadata

If you like this blog or find it useful, you are welcome to comment on it or share it so that more people can benefit. If any images used in this blog infringe your copyright, please contact the author to have them deleted. Thank you!