Context
Use this skill as part of a technical SEO audit or when indexation issues appear in GSC. Crawlability is the foundation of all SEO: if crawlers cannot access and index content, no other optimization matters. With the rise of AI answer engines, crawler access now extends beyond traditional Googlebot to include GPTBot, ClaudeBot, PerplexityBot, and Google-Extended.
Procedure
- Audit robots.txt line by line: verify each Allow/Disallow rule is intentional and correct.
- Check for common robots.txt mistakes: blocking CSS/JS, blocking entire directories accidentally, wildcard overreach.
- Document AI crawler status: for each known AI crawler (GPTBot, ChatGPT-User, PerplexityBot, ClaudeBot, Google-Extended), plus Bingbot (whose index also feeds Microsoft Copilot), confirm whether it is allowed, blocked, or not mentioned.
- Validate sitemap: check URL count, last modified dates, HTTP status of listed URLs, presence of orphan URLs.
- Cross-reference GSC indexing report: identify pages submitted but not indexed, and diagnose the crawl/index reason.
- Assess crawl budget signals: page count vs. crawl rate, server response times, duplicate content that wastes crawl budget.
- Produce fix plan with specific file changes and expected impact.
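The robots.txt and AI-crawler steps above can be sketched with the standard library's `urllib.robotparser`. The sample robots.txt body and the test URL are hypothetical; in a real audit you would fetch the live file:

```python
# Sketch of the AI-crawler access check, assuming the crawler list from the
# procedure above. SAMPLE_ROBOTS and the probe URL are illustrative only.
from urllib.robotparser import RobotFileParser

AI_CRAWLERS = ["GPTBot", "ChatGPT-User", "PerplexityBot",
               "ClaudeBot", "Google-Extended", "Bingbot"]

SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /admin/

User-agent: GPTBot
Disallow: /
"""

def crawler_status(robots_txt: str, url: str = "https://example.com/blog/post"):
    """Return {crawler: 'allowed' | 'blocked'} for one representative URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {ua: ("allowed" if parser.can_fetch(ua, url) else "blocked")
            for ua in AI_CRAWLERS}

status = crawler_status(SAMPLE_ROBOTS)
# GPTBot matches its own Disallow: / group; crawlers without a dedicated
# group (ClaudeBot, Bingbot, ...) fall back to the User-agent: * rules.
```

Note that `urllib.robotparser` uses first-match semantics per group, which can differ from Google's longest-match rule on files that mix Allow and Disallow; treat the output as a first pass, not a verdict.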
Output Format
# Crawlability Audit: [Domain]
## robots.txt Assessment
| Line | Rule | Assessment | Action |
|------|------|-----------|--------|
| 1 | User-agent: * | | |
| 2 | Disallow: /admin/ | Correct | None |
| 3 | Disallow: /blog/ | ERROR: blocks all blog content | Remove |
## AI Crawler Access
| Crawler | Status | Recommended | Action |
|---------|--------|-------------|--------|
| GPTBot | Blocked | Allow | Add Allow rule |
| ChatGPT-User | Not mentioned | Allow | Add User-agent + Allow |
| PerplexityBot | Allowed | Allow | None |
| ClaudeBot | Not mentioned | Allow | Add User-agent + Allow |
| Google-Extended | Blocked | Allow | Change to Allow |
| Bingbot | Allowed | Allow | None |
## Proposed robots.txt
```
User-agent: *
Disallow: /admin/
Allow: /

User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: Google-Extended
Allow: /

Sitemap: https://[domain]/sitemap.xml
```
## Sitemap Health
| Metric | Value | Status |
|--------|-------|--------|
| Total URLs in sitemap | | |
| URLs returning 200 | | OK/Issue |
| URLs returning 404/301 | | Fix needed |
| Last modified freshness | | OK/Stale |
| Orphan URLs (in sitemap, no internal links) | | |
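The sitemap-health metrics above can be computed from the sitemap XML itself. A minimal sketch, assuming a 90-day staleness threshold (an arbitrary choice) and a hypothetical sample sitemap; HTTP status checks would require fetching each `<loc>` and are omitted here:

```python
# Hedged sketch of the sitemap URL count and lastmod-freshness checks.
import xml.etree.ElementTree as ET
from datetime import date

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

SAMPLE_SITEMAP = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc><lastmod>2024-01-15</lastmod></url>
  <url><loc>https://example.com/blog/post</loc><lastmod>2021-06-01</lastmod></url>
</urlset>"""

def sitemap_health(xml_text: str, today: date, stale_days: int = 90):
    root = ET.fromstring(xml_text)
    urls, stale = [], []
    for node in root.findall("sm:url", NS):
        loc = node.findtext("sm:loc", namespaces=NS)
        lastmod = node.findtext("sm:lastmod", namespaces=NS)
        urls.append(loc)
        # lastmod may be a full W3C datetime; the date is its first 10 chars.
        if lastmod and (today - date.fromisoformat(lastmod[:10])).days > stale_days:
            stale.append(loc)
    return {"total_urls": len(urls), "stale_urls": stale}

report = sitemap_health(SAMPLE_SITEMAP, today=date(2024, 3, 1))
```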
## Indexation Issues (from GSC)
| URL | GSC Status | Reason | Fix |
|-----|-----------|--------|-----|
| | Not indexed | [Crawled - currently not indexed / Discovered - currently not indexed / Blocked by robots.txt] | |
## Crawl Budget Recommendations
| Issue | Impact | Fix |
|-------|--------|-----|
| | High/Med/Low | |
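One common crawl-budget waster is the same page reachable under many parameter variants. A small sketch of detecting such duplicates by stripping tracking parameters; the parameter list and sample URLs are assumptions for illustration:

```python
# Group crawled URLs that collapse to the same canonical form once
# common tracking parameters (an assumed list) are removed.
from urllib.parse import urlparse, parse_qsl, urlencode, urlunparse
from collections import defaultdict

TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "fbclid", "gclid"}

def canonicalize(url: str) -> str:
    parts = urlparse(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in TRACKING_PARAMS]
    return urlunparse(parts._replace(query=urlencode(kept)))

def duplicate_groups(urls):
    groups = defaultdict(list)
    for u in urls:
        groups[canonicalize(u)].append(u)
    return {canon: dupes for canon, dupes in groups.items() if len(dupes) > 1}

crawled = [
    "https://example.com/p?id=1",
    "https://example.com/p?id=1&utm_source=news",
    "https://example.com/p?id=2",
]
dupes = duplicate_groups(crawled)
```

Groups with more than one member are candidates for canonical tags or parameter handling; in a real audit the URL list would come from server logs or a crawl export.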
QA Rubric (scored)
- robots.txt accuracy (0-5): every rule assessed, mistakes identified, fix provided.
- AI crawler coverage (0-5): all major AI crawlers documented with clear allow/block status.
- Sitemap validation (0-5): URL count, freshness, and error rate checked.
- Fix plan specificity (0-5): exact file changes provided, not vague recommendations.
Examples (good/bad)
- Good: "robots.txt line 7 'Disallow: /api/' blocks /api/products/ which is needed for structured data testing. Recommendation: change to 'Disallow: /api/internal/' to block only internal endpoints."
- Bad: "Fix your robots.txt." (no specific issue, no line reference, no proposed change)
Variants
- Quick check variant: robots.txt + AI crawler access audit only (15-minute turnaround).
- Full audit variant: robots.txt + sitemap + GSC indexation + crawl budget + server log analysis.