Robots.txt
A root-level text file that tells crawlers which paths may be fetched and where sitemaps live. It steers crawling—it is not a reliable way to hide URLs from search results.
Robots.txt implements the Robots Exclusion Protocol—User-agent groups, Disallow/Allow prefix rules, optional Sitemap lines, and occasional vendor extensions. One mis-scoped rule can block large parts of a site.
Purpose
Crawlers fetch robots.txt to learn path-level fetch permissions before large-scale crawling. Each hostname (and scheme) needs its own file at the web root; CDNs and staging hosts are easy to misconfigure.
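To make the per-origin scoping concrete, here is a minimal sketch that derives the robots.txt location governing a given page URL. The function name robots_url_for and the example hostnames are illustrative assumptions, not part of any site's code.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url_for(page_url: str) -> str:
    """Return the robots.txt URL that governs page_url (same scheme, host, and port)."""
    parts = urlsplit(page_url)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"expected an absolute URL, got {page_url!r}")
    # Path, query, and fragment are dropped: the file always lives at the root.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical URLs: each distinct scheme/host pair maps to its own file.
for url in (
    "https://www.example.com/blog/post?id=1",
    "http://www.example.com/blog/post",
    "https://staging.example.com/",
):
    print(url, "->", robots_url_for(url))
```

The three sample URLs resolve to three different files, which is why a staging host or an http variant can end up serving rules nobody reviewed.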
Directives
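Disallow rules block fetches of any URL path that starts with the given prefix; Allow can reopen nested paths. Under RFC 9309, which Google follows, the longest matching rule wins and Allow wins a tie, but other crawlers may resolve overlaps differently. List sitemaps as absolute URLs, one Sitemap line per file. Wildcard support is limited: the major crawlers recognize * (any run of characters) and a trailing $ (end of the URL path). Verify against Google's spec before relying on them.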
- Order rules thoughtfully when multiple prefixes overlap; some parsers apply the first matching rule rather than the longest one.
- Never rely on robots.txt for secrets—use authentication.
- Treat staging domains as first-class citizens with explicit policies.
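How overlapping Allow and Disallow rules resolve can be sketched in a few lines. The code below is a simplified illustration of longest-match precedence as described in RFC 9309 (the longest matching pattern wins, and Allow wins a tie), not the implementation any particular crawler uses. The names pattern_to_regex and is_allowed and the sample rule list are assumptions made for this example; percent-encoding normalization and User-agent group selection are left out.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt pattern: '*' matches any run of characters,
    and a trailing '$' anchors the match at the end of the path."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(piece) for piece in body.split("*"))
    return re.compile(regex + ("$" if anchored else ""))


def is_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """rules holds ("allow" | "disallow", pattern) pairs from one User-agent group."""
    best_len = -1
    allowed = True  # no matching rule means the path may be fetched
    for kind, pattern in rules:
        if not pattern:
            continue  # an empty pattern matches nothing
        # re.match anchors at the start, so patterns behave as prefixes.
        if pattern_to_regex(pattern).match(path):
            is_allow = kind == "allow"
            # The longer pattern wins; on equal length, Allow wins.
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len = len(pattern)
                allowed = is_allow
    return allowed


# Hypothetical rule set: block a tree, reopen a branch, block PDFs by suffix.
RULES = [
    ("allow", "/"),
    ("disallow", "/api/"),
    ("disallow", "/admin/"),
    ("allow", "/admin/public/"),
    ("disallow", "/*.pdf$"),
]

for path in ("/api/users", "/admin/settings", "/admin/public/faq", "/files/report.pdf"):
    print(path, "allowed" if is_allowed(RULES, path) else "blocked")
```

Under these assumptions, /admin/public/faq stays crawlable because the 14-character Allow pattern outranks the 7-character Disallow on /admin/. A first-match parser would stop at the leading "allow /" rule and permit every path, which is why rule order still matters for some tools.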
Example file
Below is a teaching snippet with comments and a Sitemap line. This site's live rules are always at /robots.txt on the origin you are browsing; the path is identical on local and production hosts, and only the hostname in the address bar changes. That file shows what is disallowed from crawling (here, /api/). For absolute URLs elsewhere (e.g. the Sitemap line emitted by Next.js), set NEXT_PUBLIC_SITE_URL in your deploy environment.
User-agent: *
Allow: /
Disallow: /api/
# Teaching: block a tree but reopen a branch
# Disallow: /admin/
# Allow: /admin/public/
Sitemap: https://www.example.com/sitemap.xml
Common mistakes
- Blocking CSS/JS so Googlebot cannot render the page faithfully.
- Typos in paths that silently fail to match intended URLs.
- Post-migration robots regressions that block entire sections.
- Confusing robots.txt with meta robots or X-Robots-Tag semantics.
Practice
- Keep rendering assets crawlable unless you accept partial rendering.
- Reference sitemap indexes from robots.txt and retest after releases.
- Track robots.txt changes in version control alongside deploy notes.
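Retesting after a release can be automated with a small check like the one below. It is a sketch under stated assumptions: the robots.txt body, the MUST_ALLOW and MUST_BLOCK path lists, and the function name check_robots are all hypothetical, and a real check would read the deployed file instead of a string. It uses Python's standard urllib.robotparser, which appears to apply rules in file order rather than by longest match, so results can differ from Google's for overlapping Allow and Disallow rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body; a release check would load the deployed file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /api/
Sitemap: https://www.example.com/sitemap.xml
"""

# Hypothetical expectations: rendering assets stay crawlable, the API does not.
MUST_ALLOW = ["/", "/assets/app.css", "/assets/app.js"]
MUST_BLOCK = ["/api/users"]


def check_robots(body: str, agent: str = "Googlebot") -> list[str]:
    """Return human-readable problems; an empty list means the check passed."""
    parser = RobotFileParser()
    parser.parse(body.splitlines())
    problems = []
    for path in MUST_ALLOW:
        if not parser.can_fetch(agent, path):
            problems.append(f"{path} should be crawlable but is blocked")
    for path in MUST_BLOCK:
        if parser.can_fetch(agent, path):
            problems.append(f"{path} should be blocked but is crawlable")
    # site_maps() returns None when the file has no Sitemap lines.
    if not parser.site_maps():
        problems.append("no Sitemap line found")
    return problems


for problem in check_robots(ROBOTS_TXT):
    print("robots.txt regression:", problem)
```

Running a check like this in the deploy pipeline, next to the version-controlled robots.txt, would catch a post-migration "Disallow: /" before crawlers do.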