Does robots.txt remove pages from Google?

No—it manages crawling. Use noindex, appropriate HTTP statuses, authenticated walls, or Search Console removals depending on intent.

Do I need multiple robots.txt files?

Each hostname needs its own robots.txt at its root; subdomains are separate hosts.
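
To make the per-host rule concrete, here is a minimal sketch that derives the robots.txt location governing any page URL. The function name robots_txt_url and the example hostnames are illustrative assumptions, not part of any crawler's API.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL that governs page_url (hypothetical helper)."""
    parts = urlsplit(page_url)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"expected an absolute URL, got {page_url!r}")
    # Keep scheme and host (netloc includes any port); drop path, query, fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("https://www.example.com/blog/post?id=1"))
print(robots_txt_url("https://shop.example.com/cart"))
```

The two calls return different files because www.example.com and shop.example.com are separate hosts, even though they share a registered domain.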

Does Bing follow the same rules?

Core REP syntax aligns, but always verify vendor-specific notes for wildcards and crawl-delay-style directives. For example, Bing honors Crawl-delay while Google ignores it.

Should Disallow end with a slash?

For directories, usually yes. Prefix matching is literal: Disallow: /admin also blocks /administration, while Disallow: /admin/ only blocks URLs under that directory. Test in crawl reports.
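
The effect of the trailing slash can be sketched with a plain string prefix check. This is a simplified illustration that ignores wildcards and Allow rules; is_blocked is a hypothetical helper, not a crawler's real matcher.

```python
def is_blocked(path: str, disallow_prefix: str) -> bool:
    """True when path begins with the Disallow value (plain prefix match only)."""
    return path.startswith(disallow_prefix)


for prefix in ("/admin", "/admin/"):
    for path in ("/admin", "/admin/users", "/administration"):
        print(f"Disallow: {prefix:<8} {path:<16} blocked={is_blocked(path, prefix)}")
```

Note that Disallow: /admin/ leaves the bare /admin URL (no trailing slash) fetchable, which may or may not be what you intend.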

Robots.txt — What is it?

A root-level text file that tells crawlers which paths may be fetched and where sitemaps live. It steers crawling—it is not a reliable way to hide URLs from search results.

Purpose

Crawlers fetch robots.txt to learn path-level fetch permissions before large-scale crawling. Each hostname (and scheme) needs its own file at the web root; CDNs and staging hosts are easy to misconfigure.

Blocking a URL with robots.txt prevents crawling but not necessarily indexing—Google may still list the URL if links exist, sometimes without a snippet.
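
When the goal is to keep a page out of results rather than to stop crawling, a noindex signal is the usual tool, and the page has to stay crawlable so the crawler can see that signal. The sketch below sends it as an X-Robots-Tag response header from a tiny WSGI app. The /drafts/ prefix and both function names are assumptions made for illustration.

```python
from wsgiref.util import setup_testing_defaults

NOINDEX_PREFIX = "/drafts/"  # hypothetical path, not taken from the article


def app(environ, start_response):
    """Tiny WSGI app that marks some pages as noindex via a response header."""
    path = environ.get("PATH_INFO", "/")
    headers = [("Content-Type", "text/html; charset=utf-8")]
    if path.startswith(NOINDEX_PREFIX):
        headers.append(("X-Robots-Tag", "noindex"))
    start_response("200 OK", headers)
    return [b"<!doctype html><title>demo</title>"]


def response_headers(path: str) -> dict:
    """Call the app in-process (no network) and return its response headers."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers):
        captured.update(headers)

    app(environ, start_response)
    return captured


print(response_headers("/drafts/post-1").get("X-Robots-Tag"))
print(response_headers("/blog/post-1").get("X-Robots-Tag"))
```

If /drafts/ were also disallowed in robots.txt, a compliant crawler would never fetch the page and so would never see the header.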

Directives

Disallow prefixes block fetches; Allow can reopen nested paths. Under Google's rules and RFC 9309, the longest matching path wins, and Allow wins a tie. List absolute sitemap URLs, repeating the Sitemap line when there is more than one. Wildcards are limited to * (any sequence of characters) and a trailing $ (end of URL); verify against Google's spec before relying on them.

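Order rules thoughtfully when multiple prefixes overlap: Google picks the most specific match regardless of order, but some simpler parsers apply the first rule that matches.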
Never rely on robots.txt for secrets—use authentication.
Treat staging domains as first-class citizens with explicit policies.
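
The interplay of Allow, Disallow, and wildcards can be sketched as a small matcher. It follows the longest-match idea described in Google's documentation and RFC 9309, approximating specificity by pattern length, and is not any crawler's actual implementation. The function names and the .pdf rule are assumptions added for illustration.

```python
import re


def _to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern: '*' = any characters, trailing '$' = end of URL."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(path, rules):
    """rules: ("allow" | "disallow", pattern) pairs from one user-agent group."""
    best_len, allowed = -1, True  # no matching rule means the path may be fetched
    for kind, pattern in rules:
        if not pattern:  # an empty value matches nothing
            continue
        if _to_regex(pattern).match(path):
            is_allow = kind == "allow"
            # Longest pattern wins; on equal length, allow beats disallow.
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len, allowed = len(pattern), is_allow
    return allowed


rules = [
    ("disallow", "/admin/"),
    ("allow", "/admin/public/"),
    ("disallow", "/*.pdf$"),  # hypothetical wildcard rule
]
for path in ("/admin/settings", "/admin/public/help", "/docs/guide.pdf", "/docs/guide.pdf?v=2"):
    print(path, is_allowed(path, rules))
```

Because the choice depends on pattern length rather than position, reordering the rules list does not change the result here; a first-match parser would behave differently.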

Example file

Below is a teaching snippet with comments and a Sitemap line. This site's live rules are always at /robots.txt on the origin you are browsing; the path is identical on local and production hosts, and only the hostname in the address bar changes. The live file shows what is disallowed from crawling (here, /api/). For absolute URLs elsewhere (for example, the Sitemap line emitted by Next.js), set NEXT_PUBLIC_SITE_URL in your deploy environment.

User-agent: *
Allow: /
Disallow: /api/

# Teaching: block a tree but reopen a branch
# Disallow: /admin/
# Allow: /admin/public/

Sitemap: https://www.example.com/sitemap.xml
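
To show how a parser might read the snippet above, here is a simplified sketch that splits it into per-user-agent rules and sitemap URLs. It skips comments and blank lines and ignores edge cases that real parsers handle; parse_robots is a hypothetical helper.

```python
EXAMPLE = """\
User-agent: *
Allow: /
Disallow: /api/

# Teaching: block a tree but reopen a branch
# Disallow: /admin/
# Allow: /admin/public/

Sitemap: https://www.example.com/sitemap.xml
"""


def parse_robots(text):
    """Return (groups, sitemaps); groups maps each user-agent to (directive, value) rules."""
    groups, sitemaps = {}, []
    current_agents, reading_rules = [], False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if reading_rules:  # a user-agent line after rules starts a new group
                current_agents, reading_rules = [], False
            current_agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field in ("allow", "disallow"):
            reading_rules = True
            for agent in current_agents:
                groups[agent].append((field, value))
        elif field == "sitemap":
            sitemaps.append(value)  # sitemap lines are not tied to any group
    return groups, sitemaps


groups, sitemaps = parse_robots(EXAMPLE)
print(groups)
print(sitemaps)
```

The commented-out /admin/ lines produce no rules, so only the Allow: / and Disallow: /api/ entries end up in the wildcard group.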

Common mistakes

Blocking CSS/JS so Googlebot cannot render the page faithfully.
Typos in paths that silently fail to match intended URLs.
Post-migration robots regressions that block entire sections.
Confusing robots.txt with meta robots or X-Robots-Tag semantics.

Practice

Keep rendering assets crawlable unless you accept partial rendering.
Reference sitemap indexes from robots.txt and retest after releases.
Track robots.txt changes in version control alongside deploy notes.
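
One way to retest after releases is a small check that runs in CI against the robots.txt about to ship. The sketch below flags any user-agent group containing a bare Disallow: /, a typical sign that a staging file reached production. The function name and the two sample files are assumptions, and a real check would likely also assert that key URLs stay fetchable.

```python
def blanket_disallow_agents(robots_txt):
    """Return user-agents whose group contains a bare 'Disallow: /'."""
    flagged, agents, reading_rules = [], [], False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # ignore comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if reading_rules:  # a new group begins
                agents, reading_rules = [], False
            agents.append(value)
        elif field in ("allow", "disallow"):
            reading_rules = True
            if field == "disallow" and value == "/":
                flagged.extend(a for a in agents if a not in flagged)
    return flagged


staging_copy = "User-agent: *\nDisallow: /\n"  # hypothetical leaked staging file
production = "User-agent: *\nAllow: /\nDisallow: /api/\n"
print(blanket_disallow_agents(staging_copy))
print(blanket_disallow_agents(production))
```

Failing the build when the returned list is non-empty turns a silent crawl regression into a visible deploy error.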
