This guide synthesizes best practices from the Vercel AI SDK, Anthropic Claude, and OpenAI ecosystems to systematically explain the definition, structural design, writing standards, and quality assurance framework for AI Agent Skills.
1. What is an Agent Skill
A Skill is a reusable functional module that an Agent can invoke — the core building block that extends a large language model’s generative capabilities into autonomous execution.
It is not a simple set of text instructions, but rather a decoupled composite package containing metadata, natural language instructions, executable scripts, and reference resources.
By encapsulating domain-specific expertise (such as code optimization rules or design specifications) into independent modules loaded on demand, Skills effectively address the context bloat and hallucination problems that plague LLMs during long-running tasks.
2. Three-Layer Progressive Disclosure Architecture
High-quality Skills employ a Progressive Disclosure three-layer architecture to maximize context window utilization:
2.1 Layer 1: Metadata
- Loading timing: Always loaded during startup/initialization (~100 tokens)
- Purpose: Serves as the LLM’s "decision boundary" for semantic matching and routing
A minimal Layer 1 example (YAML frontmatter at the top of SKILL.md):

```yaml
---
name: optimizing-react-performance
description: A skill for performance optimization of React and Next.js applications. Use when resolving async waterfalls or reducing bundle size. Do not invoke for backend work or pure CSS changes.
---
```
2.2 Layer 2: SKILL.md Body
- Loading timing: Loaded when the Skill is activated
- Core requirement: Strongly recommended to stay under 500 lines (or below 5000 tokens)
- Role: The navigation hub of the entire Skill
2.3 Layer 3: Resources
- Loading timing: Loaded on demand per instructions
| Directory | Purpose | Example Contents |
|---|---|---|
| scripts/ | Executable code and validation scripts | validate_json.js, run_tests.sh |
| references/ | Detailed docs and dense rules | api_endpoints.md, error_codes.md |
| assets/ | Templates and large config files | config_template.json |
Reference rule: Always use relative paths, maintain one level of depth, and avoid deeply nested reference chains.
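As a sketch, the relative-path rule above could be enforced mechanically; the function name and heuristics here are illustrative, not part of any official spec:

```typescript
// Checks that a resource reference follows the rules above:
// a relative path, no upward traversal, at most one directory level deep.
function isValidResourceRef(ref: string): boolean {
  if (ref.startsWith("/") || /^[a-zA-Z]+:/.test(ref)) return false; // absolute path or URL
  if (ref.includes("..")) return false;                             // no upward traversal
  const depth = ref.split("/").length - 1;                          // count directory levels
  return depth <= 1;
}
```

Such a check fits naturally into a pre-release lint step alongside the verification process in section 8.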
3. Directory Structure Template
```
skill-root/
├── SKILL.md          # Layer 1 metadata + Layer 2 body
├── scripts/          # Executable code and validation scripts
├── references/       # Detailed docs and dense rules
└── assets/           # Templates and large config files
```
4. Complete SKILL.md Template
```markdown
---
name: skill-name-in-gerund-form
description: What this Skill does, when to use it, and when NOT to use it.
version: 1.0.0
---

# Skill Title

**When to use this skill:**
- Trigger condition 1
- Trigger condition 2

## Workflow
1. Step one
2. Step two

## Resources
- Dense rules: see `references/`
- Validation: run the scripts in `scripts/`
```
5. Writing Standards
5.1 name Field Naming Conventions
| Rule | Description | Example |
|---|---|---|
| Length | 1-64 characters | processing-pdfs |
| Characters | Lowercase letters, digits, hyphens only | analyzing-spreadsheets |
| Part of speech | Must use gerund form (verb + -ing) | building-apis |
| Prohibited | No vague generic names | helper, utils, tools |
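The naming rules in the table can be expressed as a small validator. This is a sketch; the gerund check in particular is a rough heuristic (first word ends in "-ing"), not an official rule engine:

```typescript
// Validates a Skill name against the conventions above:
// 1-64 characters, lowercase letters / digits / hyphens only,
// first word in gerund form (rough "-ing" suffix heuristic).
function isValidSkillName(name: string): boolean {
  if (name.length < 1 || name.length > 64) return false;
  if (!/^[a-z0-9-]+$/.test(name)) return false; // lowercase, digits, hyphens only
  const firstWord = name.split("-")[0];
  return firstWord.endsWith("ing");
}
```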
5.2 description Writing Standards
The description is the LLM’s decision boundary — it determines whether a Skill is correctly routed and activated. Maximum 1024 characters.
Must include:
- Third-person, objective professional tone ("This Skill" / "This tool")
- Specific key terms
- Clear answers to "what", "when to use", and "when not to use"
- 3-5 negative examples ("Do not invoke this Skill when…")
Bad example:
❌ I’m a tool that can help you optimize your code. If you think your code runs slow, you can come to me and I’ll tell you how to make it better.
Why it’s wrong: Uses first person "I", extremely vague description, no key terms, no trigger boundaries, no negative examples.
Good example:
✅ A skill for comprehensive performance optimization of React and Next.js applications. Use when resolving async waterfalls, reducing JavaScript bundle size, or optimizing RSC server-side rendering. Do not invoke when handling backend database migrations, configuring Nginx servers, or making pure CSS style modifications.
Adding negative examples and boundary cases can reduce false trigger rates by over 20%.
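The standards above lend themselves to an automated lint pass. The sketch below uses simple regex heuristics (for first-person wording and the presence of negative examples); these approximations are illustrative, not an official validator:

```typescript
// Rough lint for a Skill description per the standards above.
// Returns a list of detected problems (empty = passes these checks).
function lintDescription(desc: string): string[] {
  const problems: string[] = [];
  if (desc.length > 1024) problems.push("exceeds 1024 characters");
  if (/\b(I|me|my)\b/.test(desc)) problems.push("uses first person");
  if (!/\b(do not|don't)\b/i.test(desc)) problems.push("missing negative examples");
  if (!/\buse when\b/i.test(desc)) problems.push('missing "when to use" trigger');
  return problems;
}
```

Run against the good and bad examples above, the bad one is flagged for first-person wording while the good one passes all four checks.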
5.3 Trigger Condition Writing
```markdown
**When to use this skill:**
- The user asks to resolve async waterfalls or reduce bundle size
- The user asks to optimize RSC server-side rendering

**When NOT to use this skill:**
- Backend database migrations, server configuration, or pure CSS changes
```
5.4 Conciseness Principle
- The context window is an extremely precious shared resource
- Assume the Agent is already smart by default — don’t re-explain concepts it already knows
- Remove redundant modifiers; only add information specific to the Skill
- Every token must justify its existence
5.5 Information Organization Tips
- Use tables to organize complex information
- Use prefix systems (e.g., `async-`, `bundle-`) to tag related items
- Provide clear priority markers (`CRITICAL`, `HIGH`, `LOW`) to guide LLM focus
- Include complete code snippets and input/output examples to reduce hallucination
6. Key Design Patterns
6.1 Decoupling Declarative Instructions from Imperative Tools
Core principle: Intent belongs to the agent, execution belongs to tools
- Use natural language to precisely define task boundaries (declarative)
- Delegate complex logic, regex matching, etc. to dedicated scripts (imperative)
- Eliminate LLM’s computational weaknesses
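As a concrete illustration of the imperative side of this decoupling, the Skill's instructions can delegate an exact-counting task (something LLMs frequently get wrong in-context) to a deterministic script. The function name is illustrative:

```typescript
// Deterministic helper a Skill would instruct the agent to call
// instead of "eyeballing" the answer itself: count occurrences of
// a pattern in source text.
function countMatches(source: string, pattern: RegExp): number {
  // matchAll requires the global flag, so ensure it is set.
  const flags = pattern.flags.includes("g") ? pattern.flags : pattern.flags + "g";
  return [...source.matchAll(new RegExp(pattern.source, flags))].length;
}
```

The declarative side stays in SKILL.md ("flag every `await` inside a loop"); the counting itself runs in `scripts/`, where it is exact and reproducible.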
6.2 Matching Freedom to Task Fragility
Adjust instruction specificity based on the risk level of the task:
| Freedom Level | Applicable Scenarios | Implementation |
|---|---|---|
| High | Creative tasks, data analysis | Provide principles, allow flexible implementation |
| Medium | Code style, best practices | Recommend specific methods, allow some flexibility |
| Low | Database migrations, security configs | Exact commands, strict steps, direct script calls |
6.3 Dynamic Capability Acquisition
Don’t require all rules to be hardcoded. Use dynamic fetching tools (like WebFetch) to retrieve the latest specifications at runtime, turning the Skill into a "rules gateway":
```markdown
## Step 1: Fetch latest design specs
Use WebFetch to retrieve the current specification document before applying any rules.

## Step 2: Apply the fetched rules
Treat the fetched document as the source of truth; do not rely on memorized or hardcoded versions.
```
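A minimal sketch of such a "rules gateway" in code: fetch the spec at runtime and extract rule lines instead of hardcoding them into the Skill. The URL parameter and the `- RULE:` line format are hypothetical placeholders, not a real spec format:

```typescript
// Pure parsing step: pull rule lines out of a fetched spec document.
function extractRules(specText: string): string[] {
  return specText
    .split("\n")
    .filter((line) => line.trim().startsWith("- RULE:"))
    .map((line) => line.trim().slice("- RULE:".length).trim());
}

// Runtime fetch step (Node 18+ global fetch): retrieve the latest
// spec so the Skill never ships stale, hardcoded rules.
async function fetchLatestRules(url: string): Promise<string[]> {
  const res = await fetch(url);
  if (!res.ok) throw new Error(`spec fetch failed: ${res.status}`);
  return extractRules(await res.text());
}
```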
6.4 Mandatory Workflows (Quality Gates)
Instead of trusting a single LLM output, set up non-negotiable quality gates:
```markdown
## Quality gate (non-negotiable)
1. After making changes, run `scripts/run_tests.sh`.
2. If any test fails, fix the code and re-run.
3. Do NOT report the task as complete until all tests pass.
```
7. Three Ecosystem Paradigm Comparison
| Dimension | Vercel AI SDK | Anthropic Claude | OpenAI |
|---|---|---|---|
| Focus | Knowledge-driven best practice distribution | File-system level integration & sandboxed ops | Long-running agents & complex workflow control |
| Highlights | Dynamic capability acquisition, seamless IDE integration | Freedom matching, strict naming conventions | Context compression, error recovery |
| Output | file:line format for IDE integration | Sandboxed code execution | Persistent execution |
Points of consensus (high industry alignment):
- Progressive disclosure three-layer loading architecture
- Conciseness principle (assume the Agent is already smart)
- Systematic verification and quality gates
8. Four-Step Systematic Verification Process
Every Skill must pass all four verification steps before release:
```
1. Discovery Validation → 2. Logic Validation → 3. Boundary Testing → 4. Architecture Optimization
```
Step 1: Discovery Validation
- Create 3 prompts that should trigger the Skill
- Create 3 prompts that should NOT trigger the Skill
- Test whether the LLM's routing interpretation of the `description` is 100% accurate
Step 2: Logic Validation
- Simulate the Agent executing the full workflow
- Flag any execution blockers
- Ensure steps are deterministic and don’t force the model to hallucinate
Step 3: Boundary Case Testing
- Input unsupported configurations
- Test extreme input data
- Simulate network disconnection failure states
- Verify the Agent can gracefully exit or report errors
Step 4: Architecture Optimization
- Verify SKILL.md is under 500 lines
- Confirm dense rules have been moved to `references/`
- Confirm large configs have been moved to `assets/`
- Verify token usage is efficient
9. Security and Reliability Strategies
9.1 Security and Permission Governance
- Least privilege: Use short-lived, narrowly scoped credentials
- Output sanitization: Strictly filter output to prevent indirect prompt injection
- Audit trail: Log the complete payload and reasoning trace for every tool call
- Human-in-the-loop (HITL): Operations involving money/deletion/production deployment must require secondary confirmation
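The HITL rule above can be sketched as a hard gate in code. The types and names here are illustrative, not from any specific SDK:

```typescript
// High-risk operation categories that must never auto-execute.
type RiskyOp = { kind: "payment" | "delete" | "deploy"; confirmedByHuman: boolean };

// Runs the operation only if a human has explicitly confirmed it;
// otherwise returns a blocked status instead of executing.
function executeWithGate(op: RiskyOp, run: () => string): string {
  if (!op.confirmedByHuman) {
    return `BLOCKED: ${op.kind} requires human confirmation`;
  }
  return run();
}
```

The important design choice is that the gate sits outside the LLM's control: no phrasing in the model's output can flip `confirmedByHuman` to true.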
9.2 Fault Tolerance and Exception Handling
| Strategy | Description |
|---|---|
| Type safety | Use strong constraint validation like Zod Schema for parameter types |
| Retry with backoff | Build exponential backoff retry logic for unstable APIs |
| Degradation strategy | Auto-switch to fallback models when primary model is rate-limited |
| Loop prevention | Enforce maxSteps (e.g., 20 steps) and timeout thresholds |
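Two of the tactics in the table, sketched as pure helpers: exponential backoff delay computation and a hard `maxSteps` guard for agent loops. The constants are illustrative defaults, not values mandated by any SDK:

```typescript
// Exponential backoff: delay doubles per attempt, capped at capMs.
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 30_000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Runs `step` until it reports completion, aborting after maxSteps
// to prevent infinite tool-call loops. Returns the number of steps used.
function runAgentLoop(step: (i: number) => boolean, maxSteps = 20): number {
  for (let i = 0; i < maxSteps; i++) {
    if (step(i)) return i + 1;
  }
  throw new Error(`aborted: exceeded maxSteps (${maxSteps})`);
}
```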
9.3 State Visibility and Transparency
- Real-time decision logs: Show currently invoked APIs and reasoning stages
- Progress indicators: Provide step-by-step task checklists for long chain tasks
- Diff preview: Generate diff views or Git worktree previews before modifications
- Universal undo: Support one-click rollback; failed attempts don’t pollute the main codebase
10. Token Budget Management Strategies
- Progressive disclosure: Initialize with ~100 tokens metadata → load ≤5000 tokens body on activation → read resource layer on demand
- Eliminate redundancy: Remove unnecessary modifiers and repeated concepts
- Context compaction: Periodically summarize and clear early reasoning traces during long-running tasks to prevent token exhaustion crashes
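The compaction strategy above can be sketched as follows: once the transcript exceeds a budget, everything except the most recent messages collapses into a single summary placeholder. In a real system an LLM would write the summary; here it is a simple count marker, purely for illustration:

```typescript
type Msg = { role: string; content: string };

// Keeps the last `keepRecent` messages intact and replaces all earlier
// ones with one summary placeholder, bounding transcript growth.
function compact(history: Msg[], keepRecent: number): Msg[] {
  if (history.length <= keepRecent) return history;
  const dropped = history.length - keepRecent;
  const summary: Msg = { role: "system", content: `[summary of ${dropped} earlier messages]` };
  return [summary, ...history.slice(-keepRecent)];
}
```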
11. Version and Lifecycle Management
Version Management
- Explicitly include a `version` field in metadata
- Maintain a changelog to record iteration history
- Use Git branching strategies; publish new versions for breaking changes with backward compatibility guides
Lifecycle
```
Discovery & Distribution → Deployment & Monitoring → Data-Driven Optimization → (cycle)
```
- Discovery & distribution: Package via standardized specs (e.g., agentskills.io) for one-click install and cross-platform distribution
- Production monitoring: Observe tool call accuracy, token consumption distribution, and hallucination trigger points
- Continuous improvement: Collect runtime reasoning traces for annotation; continuously optimize the `description`'s positive and negative examples
12. Quick Reference Checklist
Pre-release checklist for Skills:
- [ ] `name` uses lowercase gerund form, 1-64 characters
- [ ] `description` includes positive and negative trigger conditions
- [ ] SKILL.md does not exceed 500 lines / 5000 tokens
- [ ] Complex calculations are delegated to scripts in `scripts/`
- [ ] Dense rules have been moved to the `references/` directory
- [ ] Resource references use relative paths with one level of depth
- [ ] Passed discovery validation (3 positive + 3 negative prompt tests)
- [ ] Passed logic validation (no execution blockers)
- [ ] Passed boundary case testing
- [ ] Token usage is efficient with no redundant information
- [ ] High-risk operations have human-in-the-loop gates
- [ ] Version number is annotated in metadata
If you found this post helpful, you are welcome to comment on it or share it so that more people can benefit. If any images used in this post infringe your copyright, please contact the author to have them removed. Thank you!