Canonicals, Duplicates, and Parameters
Control which version of your pages Google indexes.
Duplicate content does not cause penalties. But it does waste crawl budget and dilute signals. Canonicals and parameter handling give you control.
Duplicate content is not a penalty
There is a persistent myth that duplicate content causes a Google penalty. For ordinary duplicates, it does not. Google only takes action when duplication is meant to deceive or manipulate search results, such as scraped copies of other sites. Otherwise, it chooses one version to index and filters out the others.
The problem is not punishment. The problem is control. If you do not tell Google which version is the right one, Google decides for you. And Google does not always pick the version you want.
How duplicates happen
URL parameters. Sorting, filtering, tracking, and session parameters create multiple URLs for the same content. /products?sort=price and /products?sort=name show the same products in different orders. Google may treat each as a separate page.
WWW vs non-WWW. example.com and www.example.com are technically different URLs. If both resolve to the same content, you have duplicates.
HTTP vs HTTPS. Same issue. If both protocols serve the same content, you have duplicates.
Trailing slashes. /about and /about/ are different URLs. If both serve the same content, you have duplicates.
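The WWW, protocol, and trailing-slash variants can all be collapsed by a single normalization step. The sketch below assumes, purely for illustration, that the preferred form is HTTPS on `www.example.com` with a trailing slash on directory-style paths. The opposite choices are equally valid as long as you pick one and apply it everywhere. `preferred_url` is a hypothetical helper, not part of any tool.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical site preference: HTTPS, the www host, and a trailing slash
# on directory-style paths. Adjust these to match your own site.
PREFERRED_SCHEME = "https"
PREFERRED_HOST = "www.example.com"


def preferred_url(url: str) -> str:
    """Collapse protocol, host, and trailing-slash variants into one URL."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    # Treat example.com and www.example.com as the same site.
    if host in ("example.com", "www.example.com"):
        host = PREFERRED_HOST
    path = parts.path or "/"
    # Add a trailing slash unless the last segment looks like a file (has a dot).
    last_segment = path.rsplit("/", 1)[-1]
    if not path.endswith("/") and "." not in last_segment:
        path += "/"
    # The query is kept as-is; the fragment is dropped.
    return urlunsplit((PREFERRED_SCHEME, host, path, parts.query, ""))
```

In practice, each non-preferred variant should 301-redirect to the result, and the canonical tag on the preferred page should name the same URL.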
Pagination. Page 1 of a paginated list often shows the same content as the unpaginated version, under a different URL: /blog and /blog?page=1 may be identical.
Syndication. If your content is republished on other sites, Google needs to know which version is the original.
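One way to see how many of these variants your own site produces is to crawl it and group URLs whose bodies are identical. The sketch below assumes you already have a mapping of URL to fetched HTML from whatever crawler you use. `content_fingerprint` and `find_duplicate_groups` are illustrative names. It only catches exact duplicates after collapsing whitespace. The sort-order variants described under URL parameters produce different HTML, so they will not be grouped. Search engines use near-duplicate detection, which this does not attempt.

```python
import hashlib
import re
from collections import defaultdict


def content_fingerprint(html: str) -> str:
    """Hash the page body after collapsing whitespace, so trivial
    formatting differences do not hide a duplicate."""
    normalized = re.sub(r"\s+", " ", html).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_duplicate_groups(pages: dict[str, str]) -> list[list[str]]:
    """Group URLs whose bodies produce the same fingerprint.

    `pages` maps URL -> fetched HTML. Only groups with more than one
    URL are returned.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for url, html in pages.items():
        groups[content_fingerprint(html)].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]
```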
The canonical tag
The rel="canonical" tag tells Google which URL is the preferred version of a page. Place it in the head of every page, pointing to the URL you want indexed.
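A minimal sketch of generating the tag server-side, assuming you already know the preferred absolute URL for the page. Google recommends absolute rather than relative URLs here. `canonical_link_tag` is an illustrative name. The only real requirement is that the output lands inside `<head>`.

```python
from html import escape


def canonical_link_tag(preferred_url: str) -> str:
    """Return the <link> element to place inside <head>."""
    # Escape the URL so characters such as & are valid inside an HTML attribute.
    return f'<link rel="canonical" href="{escape(preferred_url, quote=True)}">'
```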
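Self-referencing canonicals. Every page should have a canonical tag pointing to its own preferred URL, without tracking or session parameters. When a variant such as /about?utm_source=newsletter is requested, the canonical still names /about, so the variant does not compete with the original.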
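One way to compute a self-referencing canonical from the incoming request is to drop parameters known not to change the content. The `TRACKING_PARAMS` set below is a hypothetical example. Build yours from the parameters your analytics and session handling actually add.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list of parameters that never change page content on this site.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "gclid", "sessionid"}


def self_canonical(request_url: str) -> str:
    """Canonical for the requested page: same path, minus tracking/session params."""
    parts = urlsplit(request_url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS
    ]
    # urlunsplit omits the "?" entirely when no parameters remain.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))
```

A denylist keeps any parameter you have not listed, so unexpected parameters still produce distinct canonicals. An allowlist of content-changing parameters is the stricter alternative.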
Cross-domain canonicals. If your content is syndicated on another site, the syndicated version should have a canonical pointing back to your original.
Important: canonical is a hint, not a directive. Google can and does ignore canonical tags when it disagrees. If the canonical points to a page with completely different content, Google will ignore it. If the canonical page is not indexable, Google will ignore it.
Parameter handling
For URL parameters that create duplicate content:
Option 1: Canonical tags. Set the canonical on parameterized URLs to point to the clean URL: /products?sort=price declares /products as its canonical.
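A sketch of Option 1 using an allowlist: keep only parameters that change which content is shown, and drop everything else, including `sort`. It also covers the pagination case by dropping `page=1`. Later pages keep their own parameter, because page 2 is not a duplicate of page 1. `CONTENT_PARAMS` and `canonical_for` are hypothetical names.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical allowlist: the only parameters that change which content is shown.
CONTENT_PARAMS = {"category", "page"}


def canonical_for(url: str) -> str:
    """Canonical URL for a parameterized page, keeping only allowlisted params."""
    parts = urlsplit(url)
    kept = []
    for key, value in parse_qsl(parts.query):
        if key not in CONTENT_PARAMS:
            continue  # sort, tracking, session params, etc. are dropped
        if key == "page" and value == "1":
            continue  # page 1 is the same content as the unpaginated URL
        kept.append((key, value))
    # Stable parameter order, so ?a=1&b=2 and ?b=2&a=1 map to one URL.
    kept.sort()
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))
```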
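Option 2: Robots.txt. Block crawling of parameterized URL patterns. This saves crawl budget but does not consolidate signals. It also means Google cannot see any canonical or noindex tag on the blocked URLs, and a blocked URL can still be indexed, without its content, if other pages link to it.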
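Option 2 is usually written with wildcard rules such as `Disallow: /*?*sort=`. Google's robots.txt matching supports `*` (any sequence of characters) and a trailing `$` (end of URL). A small matcher like the one below can help you check which URLs a rule would catch before you deploy it. It is a simplified sketch: it only evaluates Disallow patterns and ignores Allow rules and the longest-match precedence Google applies. The example rules are hypothetical.

```python
import re


def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern using * and a trailing $ into a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape everything literally, then turn each escaped * back into ".*".
    regex = re.escape(pattern).replace(r"\*", ".*")
    return re.compile(regex + ("$" if anchored else ""))


def is_disallowed(path_and_query: str, disallow_patterns: list[str]) -> bool:
    """True if any Disallow pattern matches at the start of the path.

    Simplification: Allow rules and longest-match precedence are ignored.
    """
    return any(
        robots_pattern_to_regex(p).match(path_and_query) for p in disallow_patterns
    )


# Hypothetical rules blocking sort and session parameters.
RULES = ["/*?*sort=", "/*?*sessionid="]
```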
Option 3: Noindex. Add noindex to parameterized pages. This prevents indexing but still allows crawling.
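A sketch of Option 3 for a server you control. It sends an `X-Robots-Tag: noindex` response header, which Google treats like a robots meta tag, on URLs that carry certain parameters. `NOINDEX_PARAMS` and `robots_headers` are hypothetical names. The header only takes effect if the URL is crawlable, so do not also block the same pattern in robots.txt.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical: parameters whose variants should stay out of the index.
NOINDEX_PARAMS = {"sort", "sessionid"}


def robots_headers(url: str) -> dict[str, str]:
    """Extra response headers for a URL; parameterized variants get noindex."""
    params = parse_qs(urlsplit(url).query, keep_blank_values=True)
    if NOINDEX_PARAMS.intersection(params):
        # Same effect as <meta name="robots" content="noindex">, but set in
        # the HTTP response, so it also works for non-HTML files.
        return {"X-Robots-Tag": "noindex"}
    return {}
```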
Option 4: Clean URL design. Avoid parameters entirely by using clean URL patterns. Instead of /products?category=shoes, use /products/shoes/.
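For Option 4, existing parameterized URLs usually need to redirect to their clean equivalents. Below is a sketch of the mapping for the /products?category=shoes example. The `/products/<category>/` pattern and `clean_url` are illustrative assumptions, and other query parameters are dropped for simplicity.

```python
from typing import Optional
from urllib.parse import parse_qs, quote, urlsplit, urlunsplit


def clean_url(url: str) -> Optional[str]:
    """Map /products?category=X to /products/X/.

    Returns None when the URL does not match this hypothetical pattern
    (for example, when it is already clean).
    """
    parts = urlsplit(url)
    if parts.path.rstrip("/") != "/products":
        return None
    categories = parse_qs(parts.query).get("category")
    if not categories:
        return None
    # Percent-encode the value so it is safe as a single path segment.
    segment = quote(categories[0], safe="")
    return urlunsplit((parts.scheme, parts.netloc, f"/products/{segment}/", "", ""))
```

Serving a 301 from the old URL to the clean one moves its signals to the new URL, and Google treats a redirect as a strong signal that the target should be canonical.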
The best approach depends on your site. For most sites, self-referencing canonicals plus clean URL design handles the majority of cases.
When Google ignores your canonical
Google may override your canonical when:
- The canonical URL returns an error or redirect
- The canonical URL has a noindex tag
- The content on the two URLs is substantially different
- Google believes a different URL is a better canonical based on its own signals
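The first three conditions in the list above can be checked locally before you blame Google. The sketch assumes you have already fetched the canonical target (status code, headers, HTML) and, optionally, the HTML of the duplicate page that points at it. The similarity check uses `difflib` as a rough stand-in, and the 0.5 cutoff is an arbitrary placeholder, because Google does not publish a threshold. The fourth condition, Google's own signals, cannot be checked this way. All names are hypothetical.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser
from typing import Optional


class RobotsMetaParser(HTMLParser):
    """Collects the content attribute of every <meta name="robots"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attr = dict(attrs)
            if (attr.get("name") or "").lower() == "robots":
                self.directives.append((attr.get("content") or "").lower())


def canonical_problems(
    status: int,
    headers: dict[str, str],
    html: str,
    duplicate_html: Optional[str] = None,
    min_similarity: float = 0.5,  # hypothetical cutoff; Google publishes none
) -> list[str]:
    """Reasons from the list above that Google might ignore a canonical target."""
    problems = []
    if 300 <= status < 400:
        problems.append("canonical URL redirects")
    elif status >= 400:
        problems.append(f"canonical URL returns an error ({status})")

    # noindex can come from the X-Robots-Tag header or a robots meta tag.
    header_value = {k.lower(): v for k, v in headers.items()}.get("x-robots-tag", "")
    parser = RobotsMetaParser()
    parser.feed(html)
    if any("noindex" in d for d in [header_value.lower(), *parser.directives]):
        problems.append("canonical URL is noindex")

    if duplicate_html is not None:
        similarity = SequenceMatcher(None, duplicate_html, html).ratio()
        if similarity < min_similarity:
            problems.append("content differs substantially from the duplicate")
    return problems
```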
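If Google is choosing the wrong canonical, check the URL Inspection tool in Google Search Console (GSC). It shows both the user-declared canonical and the Google-selected canonical, so you can see whether Google overrode yours. It does not spell out why, so work through the conditions above to find the cause.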
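For more than a handful of URLs, the same two values are available programmatically through the Search Console URL Inspection API. Its index status result includes `userCanonical` and `googleCanonical` fields. The sketch below only compares those fields in an already-parsed response and leaves out authentication and the HTTP request. Confirm the exact response shape against the API reference before relying on it. `canonical_mismatch` is a hypothetical name.

```python
from typing import Optional


def canonical_mismatch(inspection: dict) -> Optional[tuple[str, str]]:
    """Return (declared, selected) when Google chose a different canonical.

    `inspection` is the parsed JSON body of a URL Inspection API response.
    Returns None when the two agree or when either field is missing.
    """
    index_status = inspection.get("inspectionResult", {}).get("indexStatusResult", {})
    declared = index_status.get("userCanonical")
    selected = index_status.get("googleCanonical")
    if declared and selected and declared != selected:
        return declared, selected
    return None
```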
Takeaway
Add self-referencing canonical tags to every page on your site. Handle URL parameters with either canonical tags or clean URL design. Check GSC periodically to verify Google is respecting your canonicals. If it is not, the canonical tag is probably misconfigured or the content relationship is unclear.