XML Sitemap: complete guide — attributes, types, localisation and robots.txt

Complete XML sitemap guide: required and optional attributes, sitemap index, specialised maps for images, video and news, hreflang for localised sites, the single-line robots.txt rule — with links to Google and Yandex documentation.
An XML sitemap is a file you hand directly to the search crawler instead of waiting for it to discover pages by following links. You explicitly say: "here is a list of URLs I want indexed, here is when they were last updated". It is not a guarantee of indexing, but it significantly speeds up discovery and reduces crawl budget waste.
Google introduced the Sitemaps protocol in 2005. In 2006 Google, Yahoo and Microsoft agreed on it as a shared standard published at sitemaps.org, and other search engines, including Yandex, followed. Today all major search engines support the standard.
What is an XML sitemap and why you need it
A search bot discovers pages in two ways: by following internal links and by reading sitemaps. Link-based crawling works well for pages with many inbound links. But pages without any links, recently published content, or sections with sparse internal linking may be missed or discovered too slowly. That is exactly where a sitemap helps.
Speeds up crawling
The bot receives a URL list directly and doesn't have to traverse internal links page by page.
Signals updates
The lastmod attribute tells the search engine a page has changed and needs to be re-crawled.
Supports multilingual sites
Through xhtml:link you can declare hreflang relationships directly in the sitemap, without duplicating tags in HTML.
Basic file structure
A minimal valid sitemap is an XML file with an encoding declaration, a root urlset element, and url elements inside it. Each url must contain the single required child element, loc (this guide calls the child elements of url "attributes" for short).
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>https://example.com/</loc>
<lastmod>2026-05-15</lastmod>
<changefreq>daily</changefreq>
<priority>1.0</priority>
</url>
<url>
<loc>https://example.com/about</loc>
<lastmod>2026-04-01</lastmod>
<changefreq>monthly</changefreq>
<priority>0.7</priority>
</url>
<url>
<loc>https://example.com/blog/article-slug</loc>
<lastmod>2026-05-10</lastmod>
<changefreq>monthly</changefreq>
<priority>0.75</priority>
</url>
</urlset>
The file must be UTF-8 encoded. Special characters in URLs are XML-escaped: & becomes &amp;, < becomes &lt;. A single sitemap file is limited to 50,000 URLs and 50 MB uncompressed.
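The structure above can be generated rather than written by hand. The following is a minimal sketch using Python's standard library; the function name `build_sitemap` and the sample URLs are assumptions made for illustration. `xml.etree.ElementTree` escapes `&` and `<` in element text when it serialises, so the URLs are passed in unescaped.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return sitemap XML (bytes) for an iterable of (loc, lastmod) pairs.

    lastmod may be None, in which case the element is omitted.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        # Text is XML-escaped on output, e.g. & becomes &amp;
        ET.SubElement(url, "loc").text = loc
        if lastmod:
            ET.SubElement(url, "lastmod").text = lastmod
    # xml_declaration=True adds the <?xml ...?> line with the UTF-8 encoding
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


xml_bytes = build_sitemap([
    ("https://example.com/", "2026-05-15"),
    ("https://example.com/search?q=a&page=2", None),  # hypothetical URL with "&"
])
print(xml_bytes.decode("utf-8"))
```

The sketch does not enforce the 50,000 URL or 50 MB limits; a real generator would need to check both before writing the file.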
Attributes: loc, lastmod, changefreq, priority
| Attribute | Required | Format | Description |
|---|---|---|---|
| loc | Yes | Absolute URL | Full page address including protocol and domain. Must be shorter than 2,048 characters. |
| lastmod | No | W3C Datetime (YYYY-MM-DD, optionally with time and time zone) | Last modification date. Google only trusts it when the value is stable and accurate. |
| changefreq | No | always / hourly / daily / weekly / monthly / yearly / never | Hint about update frequency. Google's documentation states that it ignores this value. |
| priority | No | 0.0 to 1.0 (default 0.5) | Relative importance of the URL within your site. Does not affect rankings in search results; Google's documentation states that it ignores this value. |
loc — the only required attribute
The URL must be absolute and match what the server actually returns: if the page is served over HTTPS, loc must use HTTPS, and if the site uses www, loc must include www. When loc redirects or differs from the page's canonical URL, the search engine may skip the entry or index the canonical address instead.
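These rules are easy to check before a URL goes into the file. The sketch below is one possible check, not an official validator: the function name `check_loc` and the `expected_scheme` and `expected_host` parameters are hypothetical, and it only inspects the string; it does not request the page to see what the server returns.

```python
from urllib.parse import urlsplit

# sitemaps.org requires loc to be shorter than 2,048 characters
MAX_LOC_LENGTH = 2048


def check_loc(loc, expected_scheme="https", expected_host="example.com"):
    """Return a list of problems found in a loc value (empty list means OK)."""
    parts = urlsplit(loc)
    if not parts.scheme or not parts.netloc:
        return ["not an absolute URL"]
    problems = []
    if parts.scheme != expected_scheme:
        problems.append(f"scheme is {parts.scheme}, expected {expected_scheme}")
    if parts.hostname != expected_host:
        problems.append(f"host is {parts.hostname}, expected {expected_host}")
    if len(loc) >= MAX_LOC_LENGTH:
        problems.append("URL is too long")
    return problems


print(check_loc("https://example.com/about"))    # []
print(check_loc("/about"))                       # ['not an absolute URL']
print(check_loc("http://www.example.com/about"))
```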
lastmod — the most valuable optional attribute
The date must reflect real content changes, not template updates or sitemap regeneration. If you update lastmod on every deploy without changing actual content, Google stops trusting the field and ignores it. Accepted formats: 2026-05-15, 2026-05-15T10:30:00+03:00, 2026-05-15T07:30:00Z.
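The accepted formats map directly onto ISO 8601 output from Python's `datetime` module. A minimal sketch, assuming the real content-change time is already known; the helper name `format_lastmod` is hypothetical:

```python
from datetime import date, datetime, timedelta, timezone


def format_lastmod(value):
    """Format a date or a timezone-aware datetime as a W3C Datetime string."""
    # datetime is a subclass of date, so it has to be checked first
    if isinstance(value, datetime):
        if value.tzinfo is None:
            raise ValueError("datetime must be timezone-aware")
        return value.replace(microsecond=0).isoformat()
    if isinstance(value, date):
        return value.isoformat()
    raise TypeError("expected date or datetime")


utc_plus_3 = timezone(timedelta(hours=3))
print(format_lastmod(date(2026, 5, 15)))                               # 2026-05-15
print(format_lastmod(datetime(2026, 5, 15, 10, 30, tzinfo=utc_plus_3)))  # 2026-05-15T10:30:00+03:00
print(format_lastmod(datetime(2026, 5, 15, 7, 30, tzinfo=timezone.utc)))  # 2026-05-15T07:30:00+00:00
```

Python writes UTC as `+00:00` rather than `Z`; both spellings are valid W3C Datetime values.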
changefreq — a hint, not a directive
changefreq does not control the crawler's schedule; it is only a hint. Google's documentation states that it ignores the value, and Yandex treats the field as informational too. If you still fill it in for other consumers, practical defaults are: homepage daily, blog posts monthly, legal pages yearly.
priority — relative importance within your site
Priority from 0.0 to 1.0 tells the search engine which pages you consider more important relative to your own site, not compared to other sites. Setting 1.0 for every page is meaningless: it amounts to no prioritisation at all. Google's documentation states that it ignores priority, so treat the field as a hint for other search engines. A sensible scheme: homepage 1.0, section hubs 0.85, articles and products 0.7–0.75, utility pages 0.5.
Sitemap Index: when one file isn't enough
If you have more than 50,000 URLs or the file exceeds 50 MB uncompressed, you need a Sitemap Index. This is an XML file that references child sitemaps rather than URLs directly. Each child file is a regular urlset, capped at 50,000 entries, and one index can reference up to 50,000 child sitemaps.
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
<loc>https://example.com/sitemap/blog.xml</loc>
<lastmod>2026-05-15T00:00:00Z</lastmod>
</sitemap>
<sitemap>
<loc>https://example.com/sitemap/products.xml</loc>
<lastmod>2026-05-14T00:00:00Z</lastmod>
</sitemap>
<sitemap>
<loc>https://example.com/sitemap/static.xml</loc>
<lastmod>2026-04-01T00:00:00Z</lastmod>
</sitemap>
</sitemapindex>
You can split by content type (blog, products, static pages) or by locale (ru.xml, en.xml). Both approaches are valid. Splitting by locale makes it easy to monitor indexation status per language separately.
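When the split is driven purely by the 50,000 URL cap, it can be automated. The following sketch chunks a URL list and builds the matching index; the function names, the `pages-N.xml` file naming and the generated URLs are assumptions for illustration, and the 50 MB size limit is not checked here.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000


def chunk_urls(urls, size=MAX_URLS_PER_FILE):
    """Split a list of URLs into consecutive chunks of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_index(child_locs):
    """Return sitemap index XML (bytes) referencing the given child sitemap URLs."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc in child_locs:
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = loc
    return ET.tostring(index, encoding="UTF-8", xml_declaration=True)


# Hypothetical site with 120,000 pages
urls = [f"https://example.com/p/{n}" for n in range(120_000)]
chunks = chunk_urls(urls)
child_locs = [
    f"https://example.com/sitemap/pages-{i + 1}.xml" for i in range(len(chunks))
]
index_xml = build_index(child_locs)
print(len(chunks), [len(c) for c in chunks])  # 3 [50000, 50000, 20000]
```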
Image sitemap
Images can appear in Google Images and bring additional traffic. To help the crawler discover and understand them, add the image extension to your regular urlset. Each url entry can contain up to 1,000 images.
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
<url>
<loc>https://example.com/gallery/city</loc>
<image:image>
<image:loc>https://example.com/images/city-skyline.jpg</image:loc>
<image:title>City skyline at sunset</image:title>
<image:caption>Downtown panorama taken from the observation deck, 2025</image:caption>
<image:geo_location>New York, USA</image:geo_location>
<image:license>https://creativecommons.org/licenses/by/4.0/</image:license>
</image:image>
</url>
</urlset>
Image tags inside the <url> tag and allowed values:
| Tag | Required | Description |
|---|---|---|
| image:loc | Yes | Absolute image URL. Can be on a different domain (e.g. CDN). |
| image:title | No | Image title. Equivalent to the img title attribute. |
| image:caption | No | Image caption. Equivalent to the img alt attribute. |
| image:geo_location | No | Geographic location of the subject in the image. |
| image:license | No | URL of the image license. |
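Images in the sitemap do not replace the alt attribute in HTML: both work together. The sitemap helps the crawler discover images; alt describes their meaning. Note that Google announced in 2022 that it no longer uses image:title, image:caption, image:geo_location and image:license, so for Google only image:loc matters; check other search engines' documentation before relying on the optional tags.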
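For sites with many images the extension is usually generated from a page-to-images mapping. A minimal sketch, assuming such a mapping is already available; `build_image_sitemap` and the CDN URL are hypothetical, and only the required image:loc tag is written.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"
MAX_IMAGES_PER_URL = 1000

# Register prefixes so the output uses a default namespace plus image: tags
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("image", IMAGE_NS)


def build_image_sitemap(pages):
    """pages maps a page URL to the list of image URLs shown on that page."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_loc, image_locs in pages.items():
        if len(image_locs) > MAX_IMAGES_PER_URL:
            raise ValueError(f"{page_loc}: more than {MAX_IMAGES_PER_URL} images")
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_loc
        for image_loc in image_locs:
            image = ET.SubElement(url, f"{{{IMAGE_NS}}}image")
            ET.SubElement(image, f"{{{IMAGE_NS}}}loc").text = image_loc
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


image_xml = build_image_sitemap({
    "https://example.com/gallery/city": [
        "https://example.com/images/city-skyline.jpg",
        "https://cdn.example.com/images/city-night.jpg",  # images may live on a CDN
    ],
})
print(image_xml.decode("utf-8"))
```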
Video sitemap
A video sitemap enables your content to appear in Google Video search and in rich snippets with video previews. The extension's tags use the video namespace prefix.
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:video="http://www.google.com/schemas/sitemap-video/1.1">
<url>
<loc>https://example.com/tutorials/getting-started</loc>
<video:video>
<video:thumbnail_loc>https://example.com/thumbnails/tutorial-1.jpg</video:thumbnail_loc>
<video:title>Getting started with the product</video:title>
<video:description>Step-by-step guide for first-time users</video:description>
<video:content_loc>https://example.com/videos/tutorial-1.mp4</video:content_loc>
<video:duration>183</video:duration>
<video:publication_date>2026-03-10T12:00:00+00:00</video:publication_date>
<video:family_friendly>yes</video:family_friendly>
</video:video>
</url>
</urlset>
video:duration is in seconds. video:content_loc must point to a playable file (mp4, webm), not a player page. For YouTube or Vimeo hosted videos, use video:player_loc instead of video:content_loc.
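The two rules above (duration in whole seconds, and either a file URL or a player URL) can be captured in a small builder. This is a sketch under assumptions: `build_video_element` and its parameters are hypothetical names, and only the tags shown in the example are covered.

```python
import xml.etree.ElementTree as ET
from datetime import timedelta

VIDEO_NS = "http://www.google.com/schemas/sitemap-video/1.1"
ET.register_namespace("video", VIDEO_NS)


def build_video_element(title, description, thumbnail_loc, duration,
                        content_loc=None, player_loc=None):
    """Build a video:video element; duration is a timedelta."""
    if not (content_loc or player_loc):
        raise ValueError("either content_loc or player_loc is required")
    video = ET.Element(f"{{{VIDEO_NS}}}video")

    def add(tag, text):
        ET.SubElement(video, f"{{{VIDEO_NS}}}{tag}").text = text

    add("thumbnail_loc", thumbnail_loc)
    add("title", title)
    add("description", description)
    if content_loc:
        add("content_loc", content_loc)  # direct link to a playable file
    if player_loc:
        add("player_loc", player_loc)    # player page, e.g. for hosted videos
    add("duration", str(int(duration.total_seconds())))  # whole seconds
    return video


video = build_video_element(
    title="Getting started with the product",
    description="Step-by-step guide for first-time users",
    thumbnail_loc="https://example.com/thumbnails/tutorial-1.jpg",
    duration=timedelta(minutes=3, seconds=3),
    content_loc="https://example.com/videos/tutorial-1.mp4",
)
print(ET.tostring(video, encoding="unicode"))
```

A duration of 3 minutes 3 seconds is written as 183, matching the example entry.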
News sitemap
Google News Sitemap is a special format for publishers participating in Google News. It should list only articles published within the last 48 hours. Remove older articles from the news sitemap; they can remain in a regular sitemap.
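The 48-hour window is simple to apply at generation time. A minimal sketch, assuming each article is available as a (URL, publication time) pair with a timezone-aware timestamp; `recent_articles` and the sample data are hypothetical.

```python
from datetime import datetime, timedelta, timezone

NEWS_WINDOW = timedelta(hours=48)


def recent_articles(articles, now=None):
    """Keep only (url, published_at) pairs published within the last 48 hours."""
    now = now or datetime.now(timezone.utc)
    return [
        (url, published)
        for url, published in articles
        # Skip future-dated items as well as items older than the window
        if timedelta(0) <= now - published <= NEWS_WINDOW
    ]


now = datetime(2026, 5, 15, 12, 0, tzinfo=timezone.utc)
articles = [
    ("https://example.com/news/tech-breakthrough-2026", now - timedelta(hours=3)),
    ("https://example.com/news/older-story", now - timedelta(hours=49)),
]
print([url for url, _ in recent_articles(articles, now)])
```

Entries that pass the filter are then written in the news format: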
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:news="http://www.google.com/schemas/sitemap-news/0.9">
<url>
<loc>https://example.com/news/tech-breakthrough-2026</loc>
<news:news>
<news:publication>
<news:name>Example Daily</news:name>
<news:language>en</news:language>
</news:publication>
<news:publication_date>2026-05-15T09:00:00+00:00</news:publication_date>
<news:title>Technology breakthrough set to transform the industry</news:title>
</news:news>
</url>
</urlset>
Localisation: hreflang in sitemaps
If your site exists in multiple languages or for multiple regions, you need to connect the corresponding pages through hreflang. This can be done three ways: via link rel tags in HTML, via HTTP Link headers, or directly in the sitemap. The sitemap approach is most practical for large sites because it requires no changes to page templates.
In the sitemap, hreflang is declared through the xhtml namespace. Each page must appear as a url entry, and each entry must list all language variants — including itself.
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:xhtml="http://www.w3.org/1999/xhtml">
<!-- Russian version -->
<url>
<loc>https://example.com/blog/seo-guide</loc>
<xhtml:link rel="alternate" hreflang="ru" href="https://example.com/blog/seo-guide"/>
<xhtml:link rel="alternate" hreflang="en" href="https://example.com/en/blog/seo-guide"/>
<xhtml:link rel="alternate" hreflang="x-default" href="https://example.com/blog/seo-guide"/>
</url>
<!-- English version -->
<url>
<loc>https://example.com/en/blog/seo-guide</loc>
<xhtml:link rel="alternate" hreflang="ru" href="https://example.com/blog/seo-guide"/>
<xhtml:link rel="alternate" hreflang="en" href="https://example.com/en/blog/seo-guide"/>
<xhtml:link rel="alternate" hreflang="x-default" href="https://example.com/blog/seo-guide"/>
</url>
</urlset>
The x-default value indicates the fallback page, the one shown to users who don't match any explicit language setting. This is typically the site's primary language version.
| hreflang code | Use case |
|---|---|
| ru | Russian language, any region |
| en | English language, any region |
| en-US | English language, US region |
| ru-RU | Russian language, Russia region |
| x-default | Default page for undetermined language/region |
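Because every url entry has to repeat the full set of alternates, including a link to itself, hreflang sitemaps are a good candidate for generation. The sketch below assumes each piece of content is described by a dict mapping hreflang code to URL; the function name `build_hreflang_sitemap` and that input shape are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("xhtml", XHTML_NS)


def build_hreflang_sitemap(groups):
    """groups: list of dicts mapping hreflang code -> URL for one piece of content."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for variants in groups:
        # One url entry per distinct page; x-default only contributes a link
        page_urls = []
        for code, href in variants.items():
            if code != "x-default" and href not in page_urls:
                page_urls.append(href)
        for page_url in page_urls:
            url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
            ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
            # Every entry lists all variants, including the page itself
            for code, href in variants.items():
                ET.SubElement(url, f"{{{XHTML_NS}}}link",
                              rel="alternate", hreflang=code, href=href)
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


hreflang_xml = build_hreflang_sitemap([{
    "ru": "https://example.com/blog/seo-guide",
    "en": "https://example.com/en/blog/seo-guide",
    "x-default": "https://example.com/blog/seo-guide",
}])
print(hreflang_xml.decode("utf-8"))
```

Generating both entries from one mapping keeps the links reciprocal, which is the property that hand-edited files tend to lose.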
Robots.txt: one line is enough
To let search engines find your sitemap, add a single Sitemap directive to robots.txt. If you use a Sitemap Index, point to it. The crawler discovers all child files through the index automatically.
User-agent: *
Disallow: /admin/
Disallow: /api/
Allow: /
Sitemap: https://example.com/sitemap.xml
If you have multiple root index files (e.g. separate ones for ru and en), you can list multiple Sitemap directives:
Sitemap: https://example.com/sitemap/index-ru.xml
Sitemap: https://example.com/sitemap/index-en.xml
Google and Yandex documentation
Official Google sitemap documentation
Official Yandex Webmaster Sitemap documentation
Yandex supports the sitemaps.org standard and additionally supports the news extension. Yandex reads changefreq and priority but recommends focusing on accurate lastmod — it matters more for re-crawling updated pages.
Checklist
- File is UTF-8 encoded with <?xml version="1.0" encoding="UTF-8"?> on the first line
- All URLs are absolute (with protocol and domain) and match each page's canonical URL
- HTTPS in loc when the site serves HTTPS
- lastmod reflects the real content change date — not the build date
- File does not exceed 50,000 URLs or 50 MB uncompressed
- For large sites — Sitemap Index with child files
- For multilingual pages — xhtml:link with hreflang including x-default
- For media content — appropriate extensions (image, video, news)
- robots.txt contains a Sitemap directive with an absolute URL
- Sitemap submitted in Google Search Console and Yandex Webmaster
- Pages with noindex are not included in the sitemap
- Pages with canonical pointing to a different URL are not included
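Some checklist items can be verified automatically. As one example, the robots.txt item can be checked with Python's standard `urllib.robotparser`, whose `site_maps()` method (Python 3.8+) returns the Sitemap directives it found. This is a sketch, not a complete audit: the helper names are hypothetical, and the robots.txt content is passed in as a string rather than fetched from a server.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /api/
Allow: /
Sitemap: https://example.com/sitemap.xml
"""


def sitemap_directives(robots_txt):
    """Return the list of Sitemap URLs declared in a robots.txt string."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap directive is present
    return parser.site_maps() or []


def is_absolute(url):
    """True if the URL has an http(s) scheme and a host."""
    parts = urlsplit(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


found = sitemap_directives(ROBOTS_TXT)
print(found)                               # ['https://example.com/sitemap.xml']
print(all(is_absolute(u) for u in found))  # True
```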