What Is Duplicate Content and Why You Should Avoid It



What is duplicate content? 


Duplicate content is content that appears in more than one place on the internet. Unique content appears at only one web address; if the same content appears at more than one web address, it is duplicated. Duplicate content doesn't technically incur a penalty, but it can sometimes affect search engine rankings. Google's own term for content that closely matches other content is 'appreciably similar'. When the same content appears at several addresses, it is difficult for search engines to decide which version is the most relevant to a search query.




What is the issue with Duplicate Content?


For search engines:

Duplicate content can cause these three main issues for search engines:


1. Search engines can't work out which version(s) to include/exclude from their indices.


2. Search engines aren't sure whether to direct the link metrics (trust, authority, link equity, anchor text) to one webpage, or keep them separate across the multiple webpages that carry the duplicated content.


3. Search engines don't know which version(s) to rank in the results for a search query. This means they have to decide for themselves which page is the most relevant to the query.


For site owners:

When duplicate content is detected, site owners can experience a drop in rankings and a decrease in traffic. These issues often originate from two problems:


1. Search engines rarely show multiple versions of the same content, so they are forced to decide which page is most likely the best result for the query. This reduces the visibility of each of the duplicates and, therefore, the traffic to every one of them.


2. Link equity (aka link juice) can be diluted because other sites have to choose between the duplicates too. In an ideal situation, all inbound links would point to one piece of content on one page. Instead, they point to multiple pages, spreading the link equity among the duplicates. Because inbound links are a ranking factor, this affects the search visibility of the content.


How does duplicate content happen?

In most cases, website owners don't create duplicate content on purpose. However, that doesn't mean it isn't out there.


1. URL variations

Many factors can create variations of a URL, such as click tracking and analytics codes, and these can cause duplicate content issues. The problem can be caused not only by the additional tracking codes/parameters themselves but also by the order in which the parameters appear in the URL.


For example:

[Image: duplicate content URL example]


Likewise, session IDs are a common cause of duplicate content. This happens when each user who visits a website is assigned a different session ID that is stored in the URL.
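To make the URL variation problem concrete, here is a minimal Python sketch that collapses several variants of one address into a single form. It is an illustration under stated assumptions, not a definitive tool: the example.com URLs are made up, and the list of parameters treated as tracking-only (including the "sessionid" and "sid" names) is hypothetical and would need to match whatever your own site uses.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumption: these parameters only track visits and never change the page content.
# "sessionid" and "sid" are hypothetical names; use the ones your site really emits.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid", "sid"}


def normalise_url(url: str) -> str:
    """Drop tracking parameters and sort the rest so equivalent URLs compare equal."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS
    ]
    kept.sort()  # parameter order no longer matters
    # The fragment is dropped because it is not sent to the server.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


# Four made-up addresses that all serve the same page.
variants = [
    "https://www.example.com/shoes?colour=red&size=9",
    "https://www.example.com/shoes?size=9&colour=red",
    "https://www.example.com/shoes?colour=red&size=9&utm_source=newsletter",
    "https://www.example.com/shoes?sessionid=abc123&size=9&colour=red",
]

unique = {normalise_url(url) for url in variants}
print(unique)
```

A search engine crawling these four addresses sees four pages, while the sketch reduces them to one. Running something similar over a crawl export or server log is one way to spot which parameters are producing duplicates.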


2. WWW vs non-WWW pages or HTTP vs HTTPS pages

Having separate versions of a site, such as "www.website.com" and "website.com", that are both live and serve the same content means you've unknowingly created duplicates of each of those pages. The same applies to sites that are available at both http:// and https://. If both versions of a page are live and visible to search engines, you're likely to run into duplicate content issues.
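The four possible versions of a page can be mapped onto one preferred version, as this short Python sketch shows. It assumes the site owner has picked https://www.website.com as the preferred version; that choice, and the helper name preferred_version, are purely illustrative.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: https + www is the version the site owner wants indexed.
PREFERRED_SCHEME = "https"
PREFERRED_HOST = "www.website.com"
KNOWN_HOSTS = {"website.com", "www.website.com"}


def preferred_version(url: str) -> str:
    """Map any http/https, www/non-www variant of a page to the preferred URL."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host not in KNOWN_HOSTS:
        return url  # leave URLs on other domains untouched
    # Any explicit port is dropped in this sketch.
    return urlunsplit(
        (PREFERRED_SCHEME, PREFERRED_HOST, parts.path or "/", parts.query, parts.fragment)
    )


versions = [
    "http://website.com/about",
    "https://website.com/about",
    "http://www.website.com/about",
    "https://www.website.com/about",
]
print({preferred_version(url) for url in versions})
```

On a live site this mapping is normally enforced with server-side redirects rather than in application code; the sketch only shows which addresses should end up pointing where.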


3. Scraped or copied content

Content includes not only blog posts and editorial content but also product information pages. Scrapers republishing your unique blog content on their own sites may be the more familiar source of duplicate content.


There's a common problem for many e-commerce sites as well: product descriptions. If multiple websites sell the same items, it's likely that they all use the manufacturer's descriptions of those items. Using the same product description as every other website creates duplicate content in multiple locations across the internet.


How to fix duplicate content

Fixing duplicate content comes down to one thing: specifying which of the duplicate pages is the right one. When the content on a site can be found at multiple URLs, it should be canonicalised for search engines. There are three main ways to do this: using a 301 redirect, adding a rel=canonical attribute, or using the parameter handling tool found in Google Search Console.


301 redirect

In many scenarios, the best way to combat duplicate content is to set up a 301 redirect from the non-preferred URL (the "duplicate" page) to the preferred URL (the original page). When multiple pages on the same site compete with each other, combining them with 301 redirects to a single page not only stops them competing with one another; it also makes the remaining page stronger, because the SEO value of the duplicates is consolidated rather than split. This improves the "correct" page's ability to rank well.
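As an illustration of what a 301 redirect looks like at the server, here is a minimal sketch of a Python WSGI application. The paths in REDIRECTS are hypothetical, and most sites would configure redirects in their web server or CMS instead; the point is only to show the status line and Location header that the duplicate URL should return.

```python
# Hypothetical mapping of duplicate paths to the preferred path on the same site.
REDIRECTS = {
    "/products/blue-widget-copy": "/products/blue-widget",
    "/blue-widget": "/products/blue-widget",
}


def app(environ, start_response):
    """WSGI app: answer duplicate paths with a permanent redirect."""
    path = environ.get("PATH_INFO", "/")
    target = REDIRECTS.get(path)
    if target is not None:
        # 301 tells browsers and crawlers that the move is permanent.
        start_response("301 Moved Permanently", [("Location", target)])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [f"Content for {path}".encode("utf-8")]
```

To try it locally, the app can be served with the standard library's wsgiref.simple_server.make_server. A request for /blue-widget should then come back with a 301 status and a Location header pointing at /products/blue-widget.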



Meta Robots Noindex


One meta tag that can deal with duplicate content is meta robots, used with the values "noindex, follow". Often called Meta Noindex,Follow and technically written as content="noindex,follow", this meta robots tag is added to the HTML head of each individual page that should be excluded from a search engine's index.


The meta robots tag allows search engines to crawl the links on a page while keeping them from including the page itself in their indices. It's important that the duplicate page can still be crawled, even though you're telling Google not to index it, because Google cautions against restricting crawl access to duplicate content. Search engines like to see everything in case there is an error in your code, which allows them to make a "judgment call" in ambiguous situations.
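A quick way to confirm that a duplicate page really carries the tag is to parse its HTML head. The Python sketch below uses only the standard library; the sample page and the helper name robots_directives are made up for illustration.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def robots_directives(html: str) -> set:
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Made-up duplicate page carrying the tag described above.
page = """<html><head>
<title>Duplicate page</title>
<meta name="robots" content="noindex,follow">
</head><body>...</body></html>"""

directives = robots_directives(page)
print("noindex" in directives, "follow" in directives)
```

If "noindex" is missing from the result for a page you meant to exclude, the tag is absent or malformed.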


Applying meta robots is a particularly good solution for duplicate content issues related to pagination.


Depending on the URL structure and the cause of your duplicate content, setting up a preferred domain or parameter handling may provide a solution.


The main disadvantage of relying on parameter handling as your primary method for dealing with duplicate content is that the changes you make only work for Google. Rules set up in Google Search Console do not affect how other search engines' crawlers interpret your website, so for this method to work across all search engines you'll need to use the webmaster tools of the other search engines as well. Note also that Google retired the URL Parameters tool from Search Console in 2022, so this option may no longer be available to you.




Rel=canonical

Another option for dealing with duplicate content is the rel=canonical attribute. A rel=canonical tells search engines that a given page should be treated as a copy of a specified URL, so that the links, content metrics, and ranking power that search engines apply to the page are credited to the original URL instead.


The rel=canonical attribute needs to be added to the HTML head of each duplicate version of a page, with its href set to the URL of the original (canonical) page. The attribute passes roughly the same amount of link equity as a 301 redirect. However, because it's implemented on the page rather than at the server, it typically takes less time to implement.
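The tag itself is a single link element. As a sketch, the Python below builds one for a hypothetical canonical URL and then reads it back out of a page's HTML, which is a handy check when auditing duplicates. The function names and the example.com address are assumptions made for illustration.

```python
from html import escape
from html.parser import HTMLParser


def canonical_link_tag(canonical_url: str) -> str:
    """Build the tag to place in the <head> of each duplicate page."""
    return f'<link rel="canonical" href="{escape(canonical_url, quote=True)}">'


class CanonicalParser(HTMLParser):
    """Remember the href of the first <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rels and self.canonical is None:
            self.canonical = attr.get("href")


def find_canonical(html: str):
    parser = CanonicalParser()
    parser.feed(html)
    return parser.canonical


# Hypothetical original page that the duplicates should point to.
original = "https://www.example.com/products/blue-widget"
tag = canonical_link_tag(original)
duplicate_page = f"<html><head><title>Blue widget</title>{tag}</head><body>...</body></html>"

print(tag)
print(find_canonical(duplicate_page))
```

Every duplicate version of the page would carry the same tag, each pointing at the one original URL.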


[Image: rel=canonical example]



Duplicate content isn't something you should ignore. It confuses search engines, decreases potential traffic and can negatively affect search rankings when not addressed. It requires regular monitoring, but it's usually straightforward to fix and the results are rewarding. Keeping your site free of duplicate content is an important part of overall site health and helps make sure your pages aren't at a disadvantage in search results.

