Robots.txt & Sitemap Validator
Validate your SEO files for errors and best practices.
Paste your robots.txt content and click "Validate" to check for issues
Paste your sitemap.xml content and click "Validate" to check for issues
🔍 Complete Guide to SEO File Validation
Search engine optimization relies on properly configured technical files that communicate with search engine crawlers. Two of the most critical files, robots.txt and sitemap.xml, directly influence how search engines discover, crawl, and index your website. Errors in these files can prevent pages from being indexed, waste crawl budget, or even accidentally hide your entire site from search engines. Our validator helps you catch these issues before they impact your rankings.
While these files seem simple, subtle syntax errors or misconfigurations are surprisingly common. A misplaced character, incorrect date format, or invalid directive can silently cause problems that are difficult to diagnose without proper validation tools.
🤖 Understanding Robots.txt
| Directive | Purpose | Example |
|---|---|---|
| User-agent | Specifies which crawler the rules apply to | User-agent: Googlebot |
| Disallow | Blocks crawling of specified paths | Disallow: /admin/ |
| Allow | Explicitly permits crawling (overrides Disallow) | Allow: /admin/public/ |
| Sitemap | Points crawlers to your XML sitemap | Sitemap: https://site.com/sitemap.xml |
| Crawl-delay | Requests delay between requests (seconds) | Crawl-delay: 10 |
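Putting these directives together, a small but complete robots.txt might look like this (the domain and paths are placeholders for illustration):

```
# Rules for all crawlers
User-agent: *
Disallow: /admin/
Allow: /admin/public/
Crawl-delay: 10

# Stricter rules for a specific crawler
User-agent: Googlebot
Disallow: /search/

# Sitemap location (absolute URL, can appear anywhere in the file)
Sitemap: https://example.com/sitemap.xml
```

Note that each `User-agent` line starts a new group, and the `Disallow`/`Allow` rules that follow apply only to that group.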
🗺️ Understanding Sitemap.xml
| Element | Required | Purpose |
|---|---|---|
| <loc> | Yes | Full URL of the page |
| <lastmod> | No (recommended) | Last modification date (W3C format) |
| <changefreq> | No | How often the page changes |
| <priority> | No | Relative importance (0.0 to 1.0) |
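A minimal valid sitemap using all four elements looks like this (example.com and the dates are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2024-01-15</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
```

Only `<loc>` is required per URL; the other three elements are optional hints that crawlers may ignore.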
✅ What Our Validator Checks
Our validator performs comprehensive checks on both file types:
- Robots.txt Validation:
- Syntax correctness (directive: value format)
- Valid directive names (User-agent, Disallow, Allow, Sitemap, Crawl-delay)
- Proper ordering (rules after User-agent)
- Sitemap URL format validation
- Reasonable crawl-delay values
- Best practice recommendations
- Sitemap.xml Validation:
- Valid XML structure and parsing
- Correct namespace declaration
- Required <loc> elements present
- Valid URL formats (absolute URLs with protocol)
- W3C date format compliance for <lastmod>
- Valid <priority> values (0.0-1.0)
- Valid <changefreq> values
- File size limits (50,000 URLs, 50MB max)
- Sitemap index structure validation
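To make a few of the sitemap field checks above concrete, here is a simplified sketch in Python. This is an illustration of the rules, not our validator's actual code, and the function names are made up for this example:

```python
from datetime import datetime

# Values the sitemap protocol allows for <changefreq>
VALID_CHANGEFREQ = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}

def is_w3c_date(value: str) -> bool:
    """True if value is a W3C date: YYYY-MM-DD or full ISO 8601 with offset."""
    for fmt in ("%Y-%m-%d", "%Y-%m-%dT%H:%M:%S%z"):
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            continue
    return False

def is_valid_priority(value: str) -> bool:
    """True if value parses as a float in the 0.0-1.0 range."""
    try:
        return 0.0 <= float(value) <= 1.0
    except ValueError:
        return False

def is_valid_changefreq(value: str) -> bool:
    """True if value is one of the enumerated change frequencies."""
    return value.lower() in VALID_CHANGEFREQ
```

A full validator would apply checks like these to every `<url>` entry after parsing the XML and verifying the namespace.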
💡 Pro Tip: After validating here, always check your live robots.txt in Google Search Console (its robots.txt report replaced the standalone robots.txt Tester). Our tool validates syntax and best practices, but Search Console shows you exactly how Googlebot fetches and interprets your file. Use both together for comprehensive validation.
⚠️ Common Mistakes to Avoid
| Mistake | Problem | Solution |
|---|---|---|
| Disallow: / | Blocks your entire site from crawling | Only use on staging/dev sites; for production, remove the rule or use an empty Disallow: |
| Relative sitemap URL | Crawlers can't find your sitemap | Always use absolute URLs: https://site.com/sitemap.xml |
| Wrong date format | Dates ignored by search engines | Use W3C format: YYYY-MM-DD or full ISO 8601 |
| Missing namespace | Sitemap may not be parsed correctly | Include xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" |
| Over 50,000 URLs | Sitemap rejected by search engines | Split into multiple sitemaps with a sitemap index |
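For the last row in the table, a sitemap index is simply an XML file that lists your child sitemaps. It follows the same namespace and looks like this (the URLs are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-pages.xml</loc>
    <lastmod>2024-01-15</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-posts.xml</loc>
  </sitemap>
</sitemapindex>
```

Each child sitemap is still subject to the 50,000-URL and 50MB limits; the index itself may list up to 50,000 sitemaps.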