Pagination in SEO

How pagination affects crawl budget and indexing, when to use noindex vs canonical, with code examples for HTML, PHP and sitemap.

Pagination splits large content lists into separate pages with unique URLs: /catalog/page/2, /catalog/page/3. Simple in concept, but on large sites pagination is a common source of wasted crawl budget. Googlebot spends requests on hundreds of templated listing pages instead of product pages, and part of the catalog never makes it into the index.

What is pagination

Pagination splits a large set of content — products, articles, reviews — into sequential pages with unique URLs. Common patterns: /catalog/?page=2, /catalog/page/2/, /catalog/2/. Search engines treat each URL as a separate document.
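To make the three URL patterns concrete, here is a minimal Python sketch that extracts the page number from any of them. The function name page_number and the regular expression are illustrative assumptions, not part of any library; a trailing numeric segment is treated as a page number, which would be wrong on sites where it is a product ID.

```python
import re
from urllib.parse import urlparse, parse_qs

# Matches /page/2/ or a bare trailing numeric segment such as /catalog/2/.
# Assumption: a trailing number is a page number, not a product ID.
_PATH_PAGE = re.compile(r"/(?:page/)?(\d+)/?$")


def page_number(url: str) -> int:
    """Return the pagination page number for a URL, defaulting to 1."""
    parts = urlparse(url)
    query = parse_qs(parts.query)
    # Pattern 1: /catalog/?page=2
    if "page" in query and query["page"][0].isdecimal():
        return int(query["page"][0])
    # Patterns 2 and 3: /catalog/page/2/ and /catalog/2/
    match = _PATH_PAGE.search(parts.path)
    return int(match.group(1)) if match else 1
```

A helper like this is handy when grouping crawl logs or analytics rows by pagination depth.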

Figure: Googlebot crawls pagination pages sequentially, consuming crawl budget. On large catalogs, deep pages receive fewer crawls, so new products get indexed with significant delays.

Three pagination formats behave fundamentally differently from an SEO perspective:

Format 1: Numbered pages

Buttons «1 2 3 ... 100», each page has a unique URL. Flexible SEO control: noindex, canonical, or full indexing.

Format 2: Infinite scroll

Content loads on scroll without URL changes. Googlebot renders pages but does not scroll like a user, so it typically sees only the initially loaded items. Requires an HTML fallback with crawlable URLs.

Format 3: "Load more" button

The URL doesn't change; content is appended on click. Same crawling problem: Googlebot does not click buttons, so only the initial content is visible to it.
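One way to provide an HTML fallback for infinite scroll and "Load more" is to back every batch of items with its own URL, so the server can render a plain link to the next batch. The sketch below assumes a ?page=N URL scheme under /catalog/ and a default of 20 items per batch; both are illustrative choices, and page_slice is a hypothetical helper name.

```python
from math import ceil


def page_slice(items, page, per_page=20):
    """Return the items for one batch plus the URL of the next batch, if any."""
    total_pages = max(1, ceil(len(items) / per_page))
    if not 1 <= page <= total_pages:
        raise ValueError(f"page must be between 1 and {total_pages}")
    start = (page - 1) * per_page
    batch = items[start:start + per_page]
    # Assumed URL scheme; the front end can still load this URL on scroll or
    # click, but a crawler can reach it through an ordinary <a href> link.
    next_url = f"/catalog/?page={page + 1}" if page < total_pages else None
    return batch, next_url
```

The design point is that the scroll or click handler and the crawlable link both resolve to the same URL, so users and crawlers see the same batches.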

Crawl budget and duplicate content

A category with 500 products at 20 per page generates 25 pagination URLs. With 50 categories of that size, that's 1,250 listing URLs Googlebot has to work through on the way to individual product pages.
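The arithmetic is a ceiling division multiplied by the number of categories. A small sketch, with pagination_urls as a hypothetical helper name and the simplifying assumption that every category has the same number of products:

```python
from math import ceil


def pagination_urls(products_per_category, per_page, categories):
    """Count listing URLs, assuming every category has the same size."""
    pages_per_category = ceil(products_per_category / per_page)
    return pages_per_category * categories


total = pagination_urls(500, 20, 50)  # 25 pages per category * 50 categories = 1250
```

The count includes page 1 of each category, so the number of pages 2+ is 1,250 minus the 50 first pages.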

Real case. Electronics e-commerce store, 8,000 SKUs. Pagination without noindex — 400+ category URLs. New products took 3–4 weeks to get indexed. After adding noindex, follow to pages 2+, indexing time dropped to 3–5 days.

Three specific problems pagination creates for SEO:

Duplicate content. Title and Description for the same category are identical across /page/1 and /page/15. The section heading repeats. Google sees similar pages and doesn't know which one to rank.
PageRank dilution. Internal links are distributed across dozens of pagination URLs. Products on page 10+ receive minimal internal authority.
Crawl budget waste. On large sites, Googlebot burns crawl budget on templated listing pages instead of priority content.

Pagination format comparison

Format	Crawlability	Duplicates	SEO solution
Numbered pages	Full crawl of all URLs	Yes (Title/Description)	noindex / canonical / full indexing
Infinite scroll	First screen only (no JS)	No (single URL)	HTML fallback + rendering
"Load more"	Visible content only	No (single URL)	SSR or JS rendering

SEO configuration strategies

Strategy selection comes down to one question: do pages 2+ have standalone search value? If users never land on /catalog/page/7 from organic search — there's no point spending crawl budget on it.

Figure: decision tree for choosing a pagination SEO strategy (noindex, canonical or full indexing). Blogs close pagination with noindex; large e-commerce sites index fully with unique meta tags; mid-size catalogs use a canonical pointing to page 1.

Strategy 1 — noindex pages 2+. Best for blogs, news and content sites. Pagination pages have no standalone search value — noindex, follow removes them from the index while preserving link crawling. Crawl budget flows to actual content.

Strategy 2 — rel=canonical to page 1. Softer than noindex: pages 2+ stay accessible (for direct links, ads) but canonical points to /page/1. Google treats the series as one section with page 1 as the canonical version.

Strategy 3 — full indexing. For large e-commerce where each pagination page contains unique products that need to be indexed. Requires unique Title and Description per page — e.g. «Buy Laptops — Page 3» — plus a self-referencing canonical on each page.
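For the full indexing strategy, the per-page Title, Description and self-referencing canonical can be generated from the page number. This is a minimal sketch: listing_meta is a hypothetical helper, the wording of the Title and Description is a placeholder, and the /page/N/ URL scheme is assumed from the examples in this article.

```python
def listing_meta(category, base_url, page):
    """Build Title, Description and self-referencing canonical for one listing page."""
    if page < 1:
        raise ValueError("page must be >= 1")
    # Page 1 keeps the clean Title and URL; pages 2+ get a page-number suffix.
    suffix = "" if page == 1 else f" - Page {page}"
    canonical = base_url if page == 1 else f"{base_url}page/{page}/"
    return {
        "title": f"Buy {category}{suffix}",
        "description": f"Browse {category.lower()} in our catalog{suffix}.",
        "canonical": canonical,
    }


meta = listing_meta("Laptops", "https://example.com/catalog/laptops/", 3)
```

Here meta["title"] is "Buy Laptops - Page 3" and meta["canonical"] points at the page's own URL rather than at page 1.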

Quick test: open GSC → Performance and filter URLs by /page/. If pagination pages drive any organic traffic — think twice before applying noindex.
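The same check can be run offline on exported performance data. The sketch below works on plain dictionaries; the keys "page" and "clicks" are assumptions about how the rows were loaded and do not describe any particular export format.

```python
def paginated_with_traffic(rows, marker="/page/"):
    """Return (url, clicks) pairs for pagination URLs that received clicks."""
    hits = [
        (row["page"], row["clicks"])
        for row in rows
        if marker in row["page"] and row["clicks"] > 0
    ]
    # Most-clicked pagination URLs first.
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical sample rows for illustration only.
rows = [
    {"page": "https://example.com/catalog/", "clicks": 120},
    {"page": "https://example.com/catalog/page/2/", "clicks": 7},
    {"page": "https://example.com/catalog/page/9/", "clicks": 0},
]
traffic = paginated_with_traffic(rows)
```

An empty result suggests pages 2+ have no standalone search value; a non-empty one is a reason to look closer before applying noindex.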

Technical implementations

Most tasks require just three constructs: <meta name="robots"> and <link rel="canonical"> in <head>, plus sitemap entries. Here are code examples for each.

rel=next/prev: deprecated attribute

Before 2019, Google supported rel="next" and rel="prev" to identify pagination series. In March 2019, Google announced it had stopped using these attributes. Adding them for Google is pointless. Bing formally supports them, but Bing's share in most markets is marginal.

Source: Google's March 2019 announcement that rel=prev/next had been retired as an indexing signal.

noindex for pagination pages

Add to <head> on pages 2+. Key detail: use "noindex, follow" rather than "noindex, nofollow". With nofollow, Googlebot won't follow links from pagination pages, so products or articles that are linked only from those pages may never be discovered or indexed.

HTML

<!-- Page /catalog/page/2/ and above -->
<head>
  <meta name="robots" content="noindex, follow">
</head>

<!-- Page /catalog/ (first page) — no noindex -->
<head>
  <meta name="robots" content="index, follow">
</head>

In WordPress, implement this via a wp_head hook checking is_paged():

PHP

// functions.php
add_action('wp_head', function() {
    if (is_paged()) {
        echo '<meta name="robots" content="noindex, follow">' . "\n";
    }
});
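Outside WordPress, the same rule reduces to a single comparison on the page number. A minimal Python sketch, with robots_meta as a hypothetical helper name; the important detail is the strict "greater than 1" check, so that page 1 never receives noindex.

```python
def robots_meta(page: int) -> str:
    """Return the robots meta tag for a listing page under the noindex strategy."""
    if page < 1:
        raise ValueError("page must be >= 1")
    # Strictly greater than 1: the first page must stay indexable.
    content = "noindex, follow" if page > 1 else "index, follow"
    return f'<meta name="robots" content="{content}">'
```

Rejecting page numbers below 1 up front avoids silently treating a malformed request as the first page.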

rel=canonical to page 1

When pages 2+ must remain accessible but shouldn't duplicate the section, canonical points to the first page. Page 1 gets a self-referencing canonical.

HTML

<!-- /catalog/page/3/ -->
<head>
  <link rel="canonical" href="https://example.com/catalog/">
</head>

<!-- /catalog/ — self-canonical (required) -->
<head>
  <link rel="canonical" href="https://example.com/catalog/">
</head>

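Canonical is a hint, not a directive. Pages 2+ list different items than page 1, so Google may decide they are not duplicates, ignore the hint, and index page 3 as a standalone document. Google's own pagination guidance recommends a self-referencing canonical on each page rather than pointing the series at page 1. For a more predictable outcome, use noindex.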
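Whichever canonical target is chosen, the tag should be built from the real section URL so that it never points at a non-existent /page/1/. The sketch below is one possible helper; canonical_tag and its strategy values "first_page" and "self" are illustrative names, and the /page/N/ scheme is assumed from the examples in this article.

```python
def canonical_tag(section_url: str, page: int, strategy: str = "first_page") -> str:
    """Return the <link rel="canonical"> tag for a listing page.

    strategy "first_page": every page points at the section root.
    strategy "self": each page points at its own URL.
    """
    if page < 1:
        raise ValueError("page must be >= 1")
    if strategy not in ("first_page", "self"):
        raise ValueError("strategy must be 'first_page' or 'self'")
    root = section_url.rstrip("/") + "/"
    if page == 1 or strategy == "first_page":
        href = root  # never /page/1/, which may not exist
    else:
        href = f"{root}page/{page}/"
    return f'<link rel="canonical" href="{href}">'
```

Page 1 gets the same self-referencing canonical under both strategies, which matches the HTML example for /catalog/.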

Pagination in sitemap.xml

Rule: only include pages in sitemap.xml that you want indexed. Pages with noindex in the sitemap create conflicting signals and waste crawl budget.

XML

<!-- sitemap.xml — indexable pages only -->
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">

  <!-- Page 1 — always include -->
  <url>
    <loc>https://example.com/catalog/</loc>
    <changefreq>daily</changefreq>
    <priority>0.8</priority>
  </url>

  <!-- Pages 2+ — only with full indexing strategy -->
  <url>
    <loc>https://example.com/catalog/page/2/</loc>
    <changefreq>weekly</changefreq>
    <priority>0.4</priority>
  </url>

</urlset>
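If the sitemap is generated by a script, the "indexable pages only" rule can be enforced at generation time. This is a minimal sketch using Python's standard xml.etree.ElementTree; build_sitemap and the (url, indexable) tuple format are assumptions made for illustration, and optional elements such as changefreq and priority are left out.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Serialize a sitemap containing only pages marked indexable.

    `pages` is an iterable of (url, indexable) tuples.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, indexable in pages:
        if not indexable:
            continue  # noindex pages stay out of the sitemap
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


xml_text = build_sitemap([
    ("https://example.com/catalog/", True),
    ("https://example.com/catalog/page/2/", False),
])
```

Driving the sitemap from the same indexable flag that controls the robots meta tag keeps the two signals from drifting apart.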

Common pagination mistakes

Most mistakes come from inattentive implementation rather than conceptual misunderstanding. CMS sitemap generators and meta tag templates often fail to account for pagination:

Mistake 1: noindex lands on page 1

The page > 1 condition is written incorrectly, so the main category page gets noindex too. Section traffic can drop to zero within weeks.

Mistake 2: noindex pages included in sitemap

The sitemap generator automatically adds all URLs. Google receives conflicting signals: the sitemap says "index it", meta robots says "no". Crawl budget gets wasted resolving the conflict.

Mistake 3: Canonical points to a non-existent URL

Page 1 is at /catalog/ but canonicals from pages 2+ point to /catalog/page/1/ (which returns 404). Google gets a canonical that leads to a dead URL.

Mistake 4: Infinite scroll without HTML fallback

All pagination is JavaScript-only, with no static URLs. Googlebot sees only the first N items, and the rest of the catalog never gets indexed.

Mistake 5: Identical Title across all pagination pages

Under the full indexing strategy, all pages share one Title tag. Google may treat the pages as duplicates and show fewer of them in search. Each page needs a unique Title and Description that include the page number.

Most of these errors surface in Google Search Console under Indexing → Pages (formerly Coverage): pages excluded by noindex, other excluded URLs, and canonical conflicts appear in the report once Google has recrawled the affected URLs.
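Several of these mistakes can also be caught before deployment with a small audit over crawl data. The sketch below is one possible shape for such a check; the ListingPage fields and the audit function are hypothetical, the input is assumed to come from your own crawler or CMS export, and the infinite scroll mistake is not covered because it cannot be detected from metadata alone.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ListingPage:
    url: str
    page: int          # pagination page number, 1 for the section root
    title: str
    noindex: bool
    canonical: str
    in_sitemap: bool


def audit(pages, known_urls):
    """Return a list of (url, problem) pairs for the mistakes listed above."""
    problems = []
    # Titles are only compared across pages that are open for indexing.
    titles = Counter(p.title for p in pages if not p.noindex)
    for p in pages:
        if p.page == 1 and p.noindex:
            problems.append((p.url, "noindex on page 1"))
        if p.noindex and p.in_sitemap:
            problems.append((p.url, "noindex page in sitemap"))
        if p.canonical not in known_urls:
            problems.append((p.url, "canonical points to unknown URL"))
        if not p.noindex and titles[p.title] > 1:
            problems.append((p.url, "duplicate title"))
    return problems
```

Here known_urls is the set of URLs that are known to resolve, for example every URL that returned 200 in the last crawl.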

FAQ

Should pagination pages be closed with noindex?

It depends on site type. For blogs and content sites, yes: pages 2+ offer no standalone search value and consume crawl budget. For large e-commerce, no: each page contains unique products that need to be indexed.

Does Google still use rel=next and rel=prev?

No. Since March 2019, Google has officially stopped processing rel=next and rel=prev. Bing formally supports them, but for Google these attributes have no effect on indexing.

What is the difference between noindex and canonical for pagination?

noindex is a directive: once Google crawls the page and sees the tag, the page is dropped from the index. The page must not be blocked in robots.txt, or the tag is never read. Canonical is a hint that Google may ignore. If you need a predictable result, use noindex. If pages need to remain accessible for direct links or ad campaigns, canonical is the softer option.

How does infinite scroll affect SEO?

Negatively, unless there is an HTML fallback. Googlebot does not scroll like a user, so it typically sees only the initially loaded content. Solution: implement static URLs for each content batch, or use server-side rendering (SSR).

Should pagination pages be included in sitemap.xml?

Only if they are open for indexing. Pages with noindex in the sitemap create conflicting signals and waste crawl budget. Only include URLs in the sitemap that you want Google to index.