XML sitemap best practices

SEO · April 21, 2026 · 6 min read

An XML sitemap lists the URLs you want crawled, with optional lastmod, changefreq, and priority hints, helping search and AI engines discover and re-crawl your content efficiently. Best practice: include only canonical, indexable, 200-status URLs; keep lastmod accurate; reference the sitemap from robots.txt; split into multiple sitemaps under a sitemap index once a single file would exceed 50,000 URLs or 50 MB uncompressed; and never list redirected, noindexed, or duplicate pages. A messy sitemap wastes crawl budget and erodes trust in your signals.

What belongs in a sitemap

A sitemap is a list of the pages you actually want indexed, nothing more. Every URL in it should be canonical, return a 200 status, and be eligible for indexing. Including junk teaches engines to distrust your sitemap and wastes the crawl budget you want spent on real content.

  • Only canonical, indexable, 200-status URLs: no redirects, no noindex, no duplicates.
  • Use absolute URLs with your preferred protocol and host (https, www-or-not, consistently).
  • Set lastmod to the real last-modified date so engines know what to re-crawl.
  • Keep priority and changefreq honest, or omit them; engines treat them as weak hints at best.
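
As a rough illustration of the rules above, here is a minimal Python sketch (Python 3.8+) that writes a sitemap from a list of already-vetted pages. The function name `build_sitemap`, the `origin` parameter, and the example.com URLs are assumptions made for this example, not part of any particular tool. It omits priority and changefreq and refuses any URL that does not start with the preferred protocol and host.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries, origin):
    """Return sitemap XML as UTF-8 bytes.

    entries: iterable of (absolute_url, last_modified_date) pairs that the
             caller has already confirmed are canonical, indexable, and 200.
    origin:  preferred protocol and host, e.g. "https://www.example.com"
             (a hypothetical value for this sketch).
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, last_modified in entries:
        # Enforce one consistent protocol and host across the whole file.
        if not loc.startswith(origin + "/"):
            raise ValueError(f"{loc!r} does not start with {origin}/")
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        # lastmod uses the real modification date in YYYY-MM-DD form.
        ET.SubElement(url, "lastmod").text = last_modified.isoformat()
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    pages = [
        ("https://www.example.com/", date(2026, 4, 21)),
        ("https://www.example.com/blog/xml-sitemaps", date(2026, 4, 20)),
    ]
    print(build_sitemap(pages, origin="https://www.example.com").decode("utf-8"))
```

The sketch deliberately leaves the vetting step (status, noindex, canonical) to the caller, since that depends on how your site stores or crawls its pages.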

Structure and scale

A single sitemap holds up to 50,000 URLs and 50 MB uncompressed. Past either limit, use a sitemap index file that points to multiple sitemaps. Many sites also split deliberately by section (posts, products, pages), even below the limits, so they can spot crawl issues per area.
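
The splitting step can be sketched in Python (3.8+) as follows. The helper names `chunk_urls` and `build_sitemap_index` and the example.com file names are hypothetical, and the sketch only enforces the 50,000-URL limit; a real generator would also need to check each file against the 50 MB size limit.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # per-file URL limit described above


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into consecutive chunks of at most `size`."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(sitemap_urls):
    """Return a sitemap index (UTF-8 bytes) pointing at each child sitemap."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc in sitemap_urls:
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = loc
    return ET.tostring(index, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    # 120,001 placeholder URLs need three child sitemaps.
    urls = [f"https://www.example.com/products/{n}" for n in range(120_001)]
    chunks = chunk_urls(urls)
    children = [
        f"https://www.example.com/sitemap-products-{i}.xml"
        for i in range(1, len(chunks) + 1)
    ]
    print(len(chunks))
    print(build_sitemap_index(children).decode("utf-8"))
```

Splitting by section works the same way: build one list of URLs per section, chunk each list, and list every resulting file in the index.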

Reference your sitemap from robots.txt with a `Sitemap:` line and submit it in Google Search Console and Bing Webmaster Tools. Other crawlers, including those that feed AI answer engines, can find the sitemap through the same robots.txt reference, so a discoverable, clean sitemap helps GEO (optimization for AI answer engines) too.
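
To check that the robots.txt reference is in place, one option is Python's standard `urllib.robotparser` module, whose `site_maps()` method (Python 3.8+) returns the URLs from any `Sitemap:` lines. The sketch below parses a robots.txt body held in a string; the wrapper `sitemaps_from_robots` and the sample file contents are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser


def sitemaps_from_robots(robots_txt):
    """Return the sitemap URLs declared in a robots.txt body (may be empty)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap: line is present.
    return parser.site_maps() or []


# Hypothetical robots.txt content for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

Sitemap: https://www.example.com/sitemap_index.xml
"""

if __name__ == "__main__":
    print(sitemaps_from_robots(ROBOTS_TXT))
```

An empty result for your live robots.txt means crawlers that rely on it have no pointer to your sitemap.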

Common sitemap mistakes

Sitemap problems are quiet: pages just don't get indexed, and you rarely get an error. The fixes are simple once you know what to look for.

  • Listing redirected or 404 URLs: drop them; they waste crawl budget and signal staleness.
  • Listing noindex or canonicalized-away pages: this contradicts your own signals.
  • Stale lastmod dates that never change — engines learn to ignore them.
  • Forgetting to reference the sitemap from robots.txt.
  • Not regenerating the sitemap when content changes.
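
Several of these mistakes can be caught mechanically. The following Python sketch assumes you already have, for each URL in the sitemap, the HTTP status, whether the page is noindex, and its canonical URL (for instance from your own crawl). The `CrawledPage` record and both function names are hypothetical; stale lastmod values and the robots.txt reference are not checked here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CrawledPage:
    """What a crawl found for one URL listed in the sitemap (assumed shape)."""
    url: str
    status: int
    noindex: bool = False
    canonical: Optional[str] = None  # None means no canonical tag was found


def audit_entry(page):
    """Return the reasons this URL should not be in the sitemap."""
    problems = []
    if 300 <= page.status < 400:
        problems.append("redirects")
    elif page.status != 200:
        problems.append(f"returns {page.status}")
    if page.noindex:
        problems.append("noindex")
    if page.canonical is not None and page.canonical != page.url:
        problems.append("canonicalized to another URL")
    return problems


def audit_sitemap(pages):
    """Map each problematic URL to its list of problems, including duplicates."""
    report = {}
    seen = set()
    for page in pages:
        problems = audit_entry(page)
        if page.url in seen:
            problems.append("listed more than once")
        seen.add(page.url)
        if problems:
            report.setdefault(page.url, []).extend(problems)
    return report


if __name__ == "__main__":
    pages = [
        CrawledPage("https://www.example.com/", 200),
        CrawledPage("https://www.example.com/old", 301),
        CrawledPage("https://www.example.com/a?ref=x", 200,
                    canonical="https://www.example.com/a"),
    ]
    for url, problems in audit_sitemap(pages).items():
        print(url, problems)
```

Running a check like this whenever the sitemap is regenerated keeps bad entries from accumulating silently.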

Check your own page

seocheck scores your on-page SEO, GEO (AI answer engines), and sitemap health in seconds — free, no account.

FAQ

Do priority and changefreq actually matter?
Google's documentation says it ignores priority and changefreq values, and other engines treat them as weak hints at best. The most useful field is lastmod, provided it's accurate. Don't over-invest in tuning priority; focus on listing only clean, canonical URLs.

Should I include images and videos in my sitemap?
You can use image and video sitemap extensions if media discovery matters for your site, but for most sites a clean URL sitemap is the priority. Add media extensions only when you have substantial media you want indexed.

How do I know if my sitemap is healthy?
Run your site through seocheck: it discovers your robots.txt and sitemap, counts the URLs, and flags sitemap health as part of the audit, so you can catch missing or broken sitemaps fast.
