XML sitemap best practices
An XML sitemap lists the URLs you want crawled, with optional lastmod, changefreq, and priority hints, helping search and AI engines discover and re-crawl your content efficiently. Best practice: include only canonical, indexable, 200-status URLs; keep lastmod accurate; reference the sitemap from robots.txt; split into a sitemap index once you exceed 50,000 URLs or 50 MB uncompressed; and never list redirected, noindexed, or duplicate pages. A messy sitemap wastes crawl budget and erodes trust in your signals.
What belongs in a sitemap
A sitemap is a list of the pages you actually want indexed, and nothing more. Every URL in it should be canonical, return a 200 status, and be eligible for indexing. Including junk teaches engines to distrust your sitemap and wastes the crawl budget you want spent on real content.
- Only canonical, indexable, 200-status URLs: no redirects, no noindex, no duplicates.
- Use absolute URLs with your preferred protocol and host (https, www-or-not, consistently).
- Set lastmod to the real last-modified date, in W3C datetime format (for example 2024-05-01), so engines know what to re-crawl.
- Keep priority and changefreq honest, or omit them; they are hints at most, and Google has said it ignores them.
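The inclusion rules above can be sketched as a small filter plus generator. This is a minimal illustration, not a definitive implementation: the `Page` record and its fields are hypothetical stand-ins for whatever your CMS or crawler already knows about each page, and the example.com URLs are placeholders.

```python
from dataclasses import dataclass
from datetime import date
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


@dataclass
class Page:
    """Hypothetical record of what you know about one page."""
    url: str             # absolute URL as served
    status: int          # HTTP status observed for that URL
    canonical: str       # target of the page's rel=canonical
    noindex: bool        # True if the page carries a noindex directive
    last_modified: date  # real date the content last changed


def belongs_in_sitemap(page: Page) -> bool:
    # Canonical, indexable, 200-status pages only.
    return page.status == 200 and not page.noindex and page.canonical == page.url


def build_sitemap(pages: list[Page]) -> str:
    # priority and changefreq are deliberately omitted in this sketch.
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        if not belongs_in_sitemap(page):
            continue
        url_el = ET.SubElement(urlset, "url")
        ET.SubElement(url_el, "loc").text = page.url
        ET.SubElement(url_el, "lastmod").text = page.last_modified.isoformat()
    raw = ET.tostring(urlset, encoding="utf-8", xml_declaration=True)
    return raw.decode("utf-8")
```

Passing in a redirected URL, a noindex page, and a parameterized duplicate alongside one clean page should yield a sitemap containing only the clean page.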
Structure and scale
A single sitemap holds up to 50,000 URLs and 50 MB uncompressed. Past that, use a sitemap index file that points to multiple sitemaps. Many sites split deliberately by section (posts, products, pages) so they can spot crawl issues per area.
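As a rough sketch of the splitting step, the code below chunks a section's URLs into files of at most 50,000 entries and builds an index that points at them. The file naming scheme (`sitemap-<section>-<n>.xml`) and the helper names are assumptions made for illustration, and the sketch only enforces the URL count; the 50 MB size limit would need a separate check on the serialized files.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000


def plan_sitemaps(base_url: str, section: str, urls: list[str]) -> dict[str, list[str]]:
    """Map each (hypothetical) sitemap file URL to the page URLs it should list."""
    plan = {}
    for n, start in enumerate(range(0, len(urls), MAX_URLS_PER_SITEMAP), start=1):
        chunk = urls[start:start + MAX_URLS_PER_SITEMAP]
        plan[f"{base_url}/sitemap-{section}-{n}.xml"] = chunk
    return plan


def build_sitemap_index(sitemap_urls: list[str]) -> str:
    """Serialize a sitemap index that references each child sitemap."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for sitemap_url in sitemap_urls:
        entry = ET.SubElement(index, "sitemap")
        ET.SubElement(entry, "loc").text = sitemap_url
    raw = ET.tostring(index, encoding="utf-8", xml_declaration=True)
    return raw.decode("utf-8")


# Usage: 120,001 post URLs need three files.
post_urls = [f"https://example.com/posts/{i}" for i in range(120_001)]
plan = plan_sitemaps("https://example.com", "posts", post_urls)
index_xml = build_sitemap_index(list(plan))
```

Splitting per section means each child sitemap can be inspected on its own in the webmaster tools, which is what makes per-area crawl issues visible.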
Reference your sitemap from robots.txt with a `Sitemap:` line and submit it in Google Search Console and Bing Webmaster Tools. AI crawlers that read robots.txt can discover sitemaps the same way, so a discoverable, clean sitemap helps GEO (generative engine optimization) too.
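To illustrate what the `Sitemap:` line looks like and how a crawler might read it, here is a minimal sketch using Python's standard `urllib.robotparser` (its `site_maps()` method requires Python 3.8 or later). The robots.txt content and the example.com URL are placeholders, and `sitemaps_declared` is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt content; the Sitemap line takes an absolute URL.
ROBOTS_TXT = """\
User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml
"""


def sitemaps_declared(robots_txt: str) -> list[str]:
    """Return the sitemap URLs declared in a robots.txt body, if any."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []


declared = sitemaps_declared(ROBOTS_TXT)
```

An empty result from a check like this is a quick signal that the sitemap is only discoverable through manual submission.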
Common sitemap mistakes
Sitemap problems are quiet — pages just don't get indexed, and you rarely get an error. The fixes are simple once you know what to look for.
- Listing redirected or 404 URLs: drop them; they waste crawl budget and signal staleness.
- Listing noindex or canonicalized-away pages: this contradicts your own signals.
- Stale lastmod dates that never change: engines learn to ignore them.
- Forgetting to reference the sitemap from robots.txt.
- Not regenerating the sitemap when content changes.
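Several of the mistakes above can be caught by comparing the sitemap against crawl data you already have. The sketch below assumes a hypothetical `CrawlResult` record per URL (status, noindex flag, canonical target) collected by your own crawler; it does no network requests itself and is meant as a starting point rather than a complete audit.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


@dataclass
class CrawlResult:
    """Hypothetical per-URL facts gathered by your own crawl."""
    status: int
    noindex: bool
    canonical: str


def audit_sitemap(sitemap_xml: str, crawl: dict[str, CrawlResult]) -> dict[str, list[str]]:
    """Return {url: [issues]} for sitemap entries that break the rules above."""
    problems: dict[str, list[str]] = {}
    root = ET.fromstring(sitemap_xml.encode("utf-8"))
    for url_el in root.findall("sm:url", NS):
        loc = url_el.findtext("sm:loc", default="", namespaces=NS).strip()
        issues = []
        result = crawl.get(loc)
        if result is None:
            issues.append("not crawled")
        else:
            if result.status != 200:
                issues.append(f"status {result.status}")
            if result.noindex:
                issues.append("noindex")
            if result.canonical != loc:
                issues.append(f"canonical points to {result.canonical}")
        if url_el.find("sm:lastmod", NS) is None:
            issues.append("missing lastmod")
        if issues:
            problems[loc] = issues
    return problems
```

Running a check like this as part of the step that regenerates the sitemap keeps redirected, noindexed, or canonicalized-away URLs from slipping back in.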
Check your own page
seocheck scores your on-page SEO, GEO (AI answer engines), and sitemap health in seconds — free, no account.
FAQ
- Do priority and changefreq actually matter?
- Google has said it ignores priority and changefreq values; at best, other engines treat them as weak hints. lastmod is the most useful field when it's accurate. Don't over-invest in tuning priority; focus on listing only clean, canonical URLs.
- Should I include images and videos in my sitemap?
- You can use image and video sitemap extensions if media discovery matters for your site, but for most sites a clean URL sitemap is the priority. Add media extensions only when you have substantial media you want indexed.
- How do I know if my sitemap is healthy?
- Run your site through seocheck. It discovers your robots.txt and sitemap, counts the URLs, and flags sitemap health as part of the audit, so you can catch missing or broken sitemaps fast.