
Robots.txt & Sitemap Validator

Validate your SEO files for errors and best practices.

πŸ“ Enter your robots.txt content
πŸ“‹ Load Example Template
πŸ€–

Paste your robots.txt content and click "Validate" to check for issues

πŸ“ Enter your sitemap.xml content
πŸ“‹ Load Example Template
πŸ—ΊοΈ

Paste your sitemap.xml content and click "Validate" to check for issues

📖 Complete Guide to SEO File Validation

Search engine optimization relies on properly configured technical files that communicate with search engine crawlers. Two of the most critical files, robots.txt and sitemap.xml, directly influence how search engines discover, crawl, and index your website. Errors in these files can prevent pages from being indexed, waste crawl budget, or even accidentally hide your entire site from search engines. Our validator helps you catch these issues before they impact your rankings.

While these files look simple, subtle syntax errors and misconfigurations are surprisingly common. A misplaced character, an incorrect date format, or an invalid directive can silently cause problems that are difficult to diagnose without proper validation tools.

🤖 Understanding Robots.txt

| Directive | Purpose | Example |
|---|---|---|
| User-agent | Specifies which crawler the rules apply to | User-agent: Googlebot |
| Disallow | Blocks crawling of specified paths | Disallow: /admin/ |
| Allow | Explicitly permits crawling (overrides Disallow) | Allow: /admin/public/ |
| Sitemap | Points crawlers to your XML sitemap | Sitemap: https://site.com/sitemap.xml |
| Crawl-delay | Requests a delay between requests, in seconds | Crawl-delay: 10 |
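Combined, these directives form a complete file. The following example is illustrative (the domain and paths are placeholders):

```
# Rules for all crawlers
User-agent: *
Disallow: /admin/
Allow: /admin/public/
Crawl-delay: 10

# Stricter rules for Googlebot only
User-agent: Googlebot
Disallow: /search/

# Sitemap location (always an absolute URL)
Sitemap: https://site.com/sitemap.xml
```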

πŸ—ΊοΈ Understanding Sitemap.xml

| Element | Required | Purpose |
|---|---|---|
| <loc> | Yes | Full URL of the page |
| <lastmod> | No (recommended) | Last modification date (W3C format) |
| <changefreq> | No | How often the page changes |
| <priority> | No | Relative importance (0.0 to 1.0) |
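Putting these elements together, a minimal valid sitemap entry looks like this (URL and date are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://site.com/page/</loc>
    <lastmod>2024-01-15</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
```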

✅ What Our Validator Checks

Our validator performs comprehensive checks on both file types:

  • Robots.txt Validation:
    • Syntax correctness (directive: value format)
    • Valid directive names (User-agent, Disallow, Allow, Sitemap, Crawl-delay)
    • Proper ordering (rules after User-agent)
    • Sitemap URL format validation
    • Reasonable crawl-delay values
    • Best practice recommendations
  • Sitemap.xml Validation:
    • Valid XML structure and parsing
    • Correct namespace declaration
    • Required <loc> elements present
    • Valid URL formats (absolute URLs with protocol)
    • W3C date format compliance for <lastmod>
    • Valid <priority> values (0.0-1.0)
    • Valid <changefreq> values
    • File size limits (50,000 URLs, 50MB max)
    • Sitemap index structure validation
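To make the robots.txt checks above concrete, here is a minimal sketch in Python. It is not this tool's actual implementation; the directive set and messages mirror the list above, but the details are assumptions:

```python
# Minimal sketch of a robots.txt syntax check -- illustrative only,
# not the implementation behind this validator.
VALID_DIRECTIVES = {"user-agent", "disallow", "allow", "sitemap", "crawl-delay"}

def check_robots(text):
    """Return a list of (line_number, message) issues for robots.txt text."""
    issues = []
    seen_user_agent = False
    for n, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if ":" not in line:
            issues.append((n, "missing ':' between directive and value"))
            continue
        name, value = (part.strip() for part in line.split(":", 1))
        directive = name.lower()
        if directive not in VALID_DIRECTIVES:
            issues.append((n, f"unknown directive '{name}'"))
        elif directive == "user-agent":
            seen_user_agent = True
        elif directive in ("disallow", "allow") and not seen_user_agent:
            issues.append((n, "rule appears before any User-agent line"))
        elif directive == "sitemap" and not value.startswith(("http://", "https://")):
            issues.append((n, "Sitemap URL must be absolute"))
        elif directive == "crawl-delay":
            try:
                if float(value) < 0:
                    issues.append((n, "Crawl-delay must be non-negative"))
            except ValueError:
                issues.append((n, "Crawl-delay must be a number"))
    return issues
```

A real validator also has to handle edge cases this sketch ignores, such as `#` characters inside sitemap URLs and wildcard patterns in paths.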

💡 Pro Tip: After validating here, also check your live robots.txt in Google Search Console's robots.txt report (which replaced the standalone robots.txt Tester). Our tool validates syntax and best practices, but Search Console shows you exactly how Googlebot fetches and interprets your file, and its URL Inspection tool reveals whether a specific URL is blocked or allowed. Use both together for comprehensive validation.

⚠️ Common Mistakes to Avoid

| Mistake | Problem | Solution |
|---|---|---|
| Disallow: / | Blocks your entire site from crawling | Only use on staging/dev sites; use Allow: / for production |
| Relative sitemap URL | Crawlers can't find your sitemap | Always use absolute URLs: https://site.com/sitemap.xml |
| Wrong date format | Dates ignored by search engines | Use W3C format: YYYY-MM-DD or full ISO 8601 |
| Missing namespace | Sitemap may not be parsed correctly | Include xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" |
| Over 50,000 URLs | Sitemap rejected by search engines | Split into multiple sitemaps with a sitemap index |
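Several of the mistakes in this table can be caught mechanically. The sketch below, using only Python's standard library, is an illustration of that kind of check rather than this tool's actual code:

```python
# Minimal sketch of sitemap.xml validation -- illustrative only.
import re
import xml.etree.ElementTree as ET

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
# W3C Datetime: YYYY-MM-DD with an optional time and timezone part.
W3C_DATE = re.compile(
    r"^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2}))?$"
)

def check_sitemap(xml_text):
    """Return a list of human-readable issues for sitemap.xml text."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"XML parse error: {exc}"]
    issues = []
    if not root.tag.startswith(NS):
        issues.append("missing or wrong sitemap namespace declaration")
    urls = root.findall(f"{NS}url")
    if len(urls) > 50_000:
        issues.append("more than 50,000 URLs; split into a sitemap index")
    for url in urls:
        loc = url.find(f"{NS}loc")
        if loc is None or not (loc.text or "").startswith(("http://", "https://")):
            issues.append("each <url> needs an absolute <loc> URL")
        lastmod = url.find(f"{NS}lastmod")
        if lastmod is not None and not W3C_DATE.match((lastmod.text or "").strip()):
            issues.append(f"invalid <lastmod> date: {lastmod.text}")
        prio = url.find(f"{NS}priority")
        if prio is not None:
            try:
                in_range = 0.0 <= float(prio.text) <= 1.0
            except (TypeError, ValueError):
                in_range = False
            if not in_range:
                issues.append(f"<priority> must be between 0.0 and 1.0: {prio.text}")
    return issues
```

Note that the 50 MB size limit and sitemap-index handling from the checklist above are omitted here to keep the sketch short.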