HOW TO SOLVE DUPLICATE CONTENT ISSUES: THE COMPLETE GUIDE

It is no secret that duplicate content is one of the most difficult website-architecture problems to eliminate in SEO.

Too many content management systems and piss-poor developers build sites that are great at displaying content but give little consideration to how that content functions from a search-engine-friendly perspective.

This often leads to SEO problems with duplicate content.

There are two types of duplicate content. Both can cause problems.

  • Onsite Duplication refers to duplicate content on multiple URLs within your site. This is usually something that the site administrator and web developers can control.
  • Offsite Duplication refers to multiple websites publishing the same content. This problem can’t be managed directly and requires collaboration with third parties and the owners of the offending sites.

Is duplicate content a problem?

It is best to explain why duplicate content can be harmful by first explaining why unique-content works well.

Unique content is a great way to stand out from the rest. If your content is original, your website is one of a kind.

However, if you use the same descriptions for your products or services as everyone else, or have your content republished elsewhere, you lose that uniqueness.

In the event of duplicate content, individual pages lose their uniqueness.

Take a look at the following illustration. Content A is duplicated across two pages, and pages B through Q link to those copies, so the link value is split between them.

Now imagine pages B through Q all linking to a single page A. Instead of splitting the value of each link, all of the value would flow to one URL, increasing the likelihood of that content ranking in search.
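To make the arithmetic concrete, here is a toy sketch. The numbers are purely illustrative, not a real ranking model:

```python
# Toy model of link-value splitting. Real ranking signals are far more
# complex; this only illustrates the dilution argument above.

def value_per_url(total_links: int, duplicate_urls: int) -> float:
    """Link value each copy collects when links split evenly across copies."""
    return total_links / duplicate_urls

# Pages B through Q (16 pages) each pass one unit of link value:
print(value_per_url(16, 1))  # one canonical URL collects all 16 units
print(value_per_url(16, 2))  # two duplicate URLs collect only 8 units each
```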

All duplicate content, onsite and offsite, competes with itself. While each version might attract links and eyeballs, none will be as valuable as the original.

Valuable, unique content that lives at only one URL is far more likely to be found, because that URL is the sole collector of the authority signals related to the content.

Let’s now look at the problems and solutions to duplicate content.

Offsite Duplication

There are three primary sources of offsite duplication:

  • Third-party content that you republish on your site, typically generic product descriptions provided by the manufacturer.
  • Your content that you have approved for republication on third-party sites, often in the form of article distribution.
  • Your content, stolen from your website and republished without your permission.

Content scrapers & thieves

Some of the most prolific offenders in duplicate content creation are content scrapers: spammers and criminals who build tools to grab content from other websites and publish it as their own.

These sites mainly use your content to drive traffic and get people to click on their ads.

There is little you can do to stop this except send a copyright violation report to Google in the hope that the offending site will be removed from its search index. Filing these reports, however, can become a full-time job.

You can also ignore the scraped content and hope Google can distinguish your site from the scraper’s. That’s a risky strategy, as scraped content sometimes ranks higher than its originating source.

To counter the negative effects of scraped material, use absolute links (full URLs) in your content that point back to your website. People who steal content don’t usually bother to clean it up, so visitors can follow those links back to your site.

Adding a self-referencing canonical tag to the source page is also a good idea. If scrapers capture your code wholesale, the canonical tag will indicate to Google that you are the originator.

Article Distribution

Many years ago, seemingly every SEO republished content on “ezines” to build links. The practice was abandoned when Google imposed stricter guidelines around link schemes and content quality.

With the right focus, it can still be a solid marketing strategy. Notice I said “marketing” strategy, not “SEO” strategy.

Generally speaking, if you publish content on other websites, they will want exclusive rights.

Why? They don’t want multiple copies of the same content on the internet, which would devalue what the publisher has to say.

Google has become better at assigning credit to content originators (better, but not perfect), so many publishers now allow content to be reused on the author’s own site.

Is this a problem with duplicate content? It can be in a limited way. There are two versions of the content, each with the potential to generate links.

However, if duplicate copies are controlled and limited, the impact will also be limited. The primary problem actually falls on the author, not the secondary publisher.

In general, the first published version is considered the authoritative one. In all but a few instances, the publisher will get more value from the content than the author’s website that republishes it.

Generic Product Descriptions

Product descriptions are the most common form of duplicated content. Almost all sellers reuse them.

Many online retailers sell exactly the same products as other stores. Usually, the manufacturer provides the product descriptions, which are then uploaded to each site’s database and displayed on its product pages.

Although the layouts of the pages may differ, most product page content (product description) will remain the same.

Multiply that number by millions of products and hundreds of thousands of websites selling them, and you’ll have a lot of content that is not unique.

How is a search engine supposed to distinguish between them?

At a content-analysis level, it can’t. The search engine must consider other signals to determine which page should rank.

Links are one of those signals. If you have more links, you can win the boring-content sweepstakes.

If you are up against a stronger competitor, it may be difficult to catch them in link building. This brings us back to the search for another competitive advantage.

It is worth the effort to create unique descriptions for each product. Depending on how many products you have, this can be a difficult task, but it will pay off in the end.

Below is an illustration. The yellow pages are the same product but with different product descriptions.

Which one would you choose to rank higher if you were Google?

Unique content will automatically give a page an advantage over duplicate or similar content. This may not be enough to rank higher than your competitors, but it is the foundation to stand out to Google and your customers.

Onsite Duplication

Google treats all duplicate content as the same. Therefore, onsite duplicate content is no different from offsite.

However, onsite duplication is not something you can ignore. Fixing it is how you get your SEO efforts off the ground.

Bad site architecture is usually responsible for duplicate content. Or, more likely, bad website development!

Strong websites are built on a strong site architecture.

If developers don’t adhere to search-friendly best practices, they could lose valuable opportunities to get your content ranked.

Some people argue against good architecture. They cite Google propaganda about how Google “figures it out.”

Google’s algorithms can often figure out that duplicate URLs should be treated as the same content. However, there is no guarantee that they will.

Another way to look at it: knowing someone smart doesn’t mean they’ll protect you from your own stupidity. If you trust Google to do the right thing and Google fails, you’re in trouble.

Let’s now look at some common duplicate content issues and their solutions.

The Problem with Product Categorization Duplication

Many ecommerce websites suffer from this type of duplication. It is often caused by content management systems that organize products by category and allow a single product to be tagged in multiple categories.

That is a good thing in and of itself, but it generates a unique URL for every category in which a single product appears.

Suppose you are looking for a book about installing bathroom flooring on a home-repair site. Any of these navigation paths might lead you to it:

  • Home > flooring > bathroom > books
  • Home > bathroom > books > flooring
  • Home > books > flooring > bathroom

Each of these navigation paths is viable, but the problem is when each path has a unique URL:

  • https://www.myfakesite.com/flooring/bathroom/books/fake-book-by-fake-author
  • https://www.myfakesite.com/bathroom/books/flooring/fake-book-by-fake-author
  • https://www.myfakesite.com/books/flooring/bathroom/fake-book-by-fake-author

Sites like this can create up to ten URLs for each product, turning a website with 5,000 products into a site with 50,000 pages, 45,000 of them duplicates. This is a problem.
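The explosion is easy to reproduce. This sketch counts the distinct URLs created when one product’s path can use any ordering of its three categories (the site and slug are the hypothetical examples above):

```python
from itertools import permutations

# One product tagged in three categories, reachable under every ordering
# of those categories in the URL path (site and slug are hypothetical).
categories = ("flooring", "bathroom", "books")
slug = "fake-book-by-fake-author"

urls = {
    "https://www.myfakesite.com/" + "/".join(path) + "/" + slug
    for path in permutations(categories)
}

print(len(urls))  # 3! = 6 distinct URLs for a single product
```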

If the product in the example above attracted ten links, those links would be split three ways.

However, if a competitor’s page for the same product has the same number of links but only one URL, which URL will perform better in search results?

The competitor’s!

Search engines also limit the crawl bandwidth they spend on each site, prioritizing unique and valuable content.

If your site contains too many duplicate pages, the engine may stop crawling before it has indexed more than a small fraction of your original content.

That means hundreds of pages never appear in search results, and the pages that are indexed compete with each other.

The Solution: Master URL Categorization

One way to solve this problem is to limit each product to a single category, avoiding multiple tags. That eliminates the duplication, but it’s a bad solution for shoppers because it removes the alternate navigation paths they could use to find the product. Cross that option off your list.

You can instead remove categorization from URLs entirely, ensuring the product URL remains the same no matter which navigation path is used to reach it. It might look something like this:

  • https://www.myfakesite.com/products/fake-book-by-fake-author

This solves the duplication without changing how visitors navigate the products. The downside is that you lose the category keywords from the URL. That’s a minor benefit to give up, but it can make a difference in SEO.

You can take the solution to the next stage, achieving the highest optimization value while maintaining the user experience, by allowing each product to be assigned a master category in addition to its other categories.

With a master category in place, the product can still be found via multiple navigation paths, but the product page is always accessed through a single URL built from the master category.

Depending on the master category chosen, the URL would be one of these:

  • https://www.myfakesite.com/flooring/fake-book-by-fake-author
  • https://www.myfakesite.com/bathroom/fake-book-by-fake-author
  • https://www.myfakesite.com/books/fake-book-by-fake-author

Although this solution is the most effective overall, it requires some programming. There is a second, more straightforward solution, but it’s not permanent.
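As a rough sketch of what that programming involves (the field names and URL scheme here are assumptions, not any specific CMS’s API): the product stays browsable under every category, but its page URL is always built from the designated master category.

```python
def product_url(slug: str, categories: list[str], master: str) -> str:
    """Build the single page URL for a product from its master category."""
    if master not in categories:
        raise ValueError("master category must be one of the product's categories")
    return f"https://www.myfakesite.com/{master}/{slug}"

# However the shopper navigated (flooring, bathroom, or books),
# the product page lives at exactly one URL:
url = product_url("fake-book-by-fake-author",
                  ["flooring", "bathroom", "books"],
                  master="books")
print(url)  # https://www.myfakesite.com/books/fake-book-by-fake-author
```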

Band-Aid Solution: Canonical Tags

Because the master-categorization option isn’t available in every out-of-the-box CMS or ecommerce solution, there are alternative options that “help” solve the duplicate content problem.

One is to block search engines from indexing the non-canonical URLs, which keeps the duplicate pages out of the search index. However, it does not solve the split in page authority: any link value sent to the non-indexable URLs is simply lost.

Canonical tags are a better solution. This works similarly to choosing a master category but requires very little programming, if any.

Simply add a field to each product that allows you to assign a canonical URL, which is a fancy way of saying “the URL you want to appear in search.”

This is the canonical tag:

  <link rel="canonical" href="https://www.myfakesite.com/books/fake-book-by-fake-author" />

Regardless of which URL the visitor arrives at, the hidden canonical tag on each duplicate URL points back to the same canonical URL.

This tells search engines not to index the non-canonical URLs and to assign all value metrics to the canonical one.

Although this usually works, search engines treat canonical tags only as a signal; they then decide whether to apply or ignore it.

All the link authority might be passed to the page you want, or it might not. The non-canonical pages might be kept out of the index, or they might not.

I recommend using canonical tags, but they are unreliable and should be considered a placeholder until a more permanent solution can be implemented.

Redundant URL Duplication

How browsers access your pages is one of the most fundamental issues in website architecture.

Almost every page of your website can be accessed through multiple URLs. Left unchecked, several URLs will all lead to the same page with the same content.

Your homepage alone can typically be accessed at four URLs:

  • http://site.com
  • http://www.site.com
  • https://site.com
  • https://www.site.com

Internal pages pick up an extra version of every URL through the presence or absence of a trailing slash:

  • http://site.com/page
  • http://site.com/page/
  • http://www.site.com/page
  • http://www.site.com/page/
  • Etc.

That’s up to eight alternate URLs per page, and Google is left to figure out that they should all be treated as the same page.

The Solution: 301 Redirects and Internal Link Consistency

Beyond the canonical tags discussed above, the solution is to ensure that all alternate URLs 301-redirect to the canonical URL.

This isn’t just a homepage issue; it applies to every URL on the site, so the redirects should be implemented globally.

Each redirect should go straight to the canonical URL. For instance, if the canonical URL is https://www.site.com, each redirect should point there in a single hop. Many people make the mistake of chaining redirect hops, like this:

  • site.com > https://site.com > https://www.site.com
  • site.com > www.site.com > https://www.site.com

Instead, redirects should look something like this:

  • http://site.com > https://www.site.com/
  • http://www.site.com > https://www.site.com/
  • https://site.com > https://www.site.com/
  • https://www.site.com > https://www.site.com/
  • http://site.com/ > https://www.site.com/
  • http://www.site.com/ > https://www.site.com/
  • https://site.com/ > https://www.site.com/

Single-hop redirects speed up page loading, reduce server bandwidth, and leave less room for something to go wrong.
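The single-hop rule can be sketched as one normalization function. The hostnames are the article’s placeholders; a real site would implement this at the web-server or application layer:

```python
from urllib.parse import urlsplit, urlunsplit

CANONICAL_HOST = "www.site.com"

def canonical_target(url: str) -> str:
    """Resolve any URL variant straight to the https/www/trailing-slash form."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if host in ("site.com", "www.site.com"):
        host = CANONICAL_HOST
    path = parts.path if parts.path.endswith("/") else parts.path + "/"
    return urlunsplit(("https", host, path, parts.query, ""))

# Every variant resolves in one hop, with no intermediate redirects:
print(canonical_target("http://site.com/page"))    # https://www.site.com/page/
print(canonical_target("https://site.com/page/"))  # https://www.site.com/page/
```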

You will also need to ensure that all links within the site point to this canonical version.

The redirect should resolve the duplicate problem. However, redirects may fail if there is an issue on the server side or in the implementation.

Even if that happens only temporarily, linking internally to canonical URLs alone can prevent a sudden surge in duplicate content issues.

The Problem with URL Parameters and Query Strings

Session IDs were a problem for SEOs years ago.

Session IDs are almost obsolete today, thanks to modern technology. But, there is another problem: URL parameters.

Parameters are used to pull different content from the server, usually based on a filter or sort selection.

These two examples show alternate URLs for the single URL site.com/shirts/. The first displays shirts filtered by style, color, and size. The second displays shirts sorted by price, with a set number of products shown per page.

  • Site.com/shirts/?color=red&size=small&style=long_sleeve
  • Site.com/shirts/?sort=price&display=12

To a search engine, those filters alone create three URLs. But the order of the parameters can also change depending on the order in which they were selected, making even more URLs accessible:

  • Site.com/shirts/?size=small&color=red&style=long_sleeve
  • Site.com/shirts/?size=small&style=long_sleeve&color=red
  • Site.com/shirts/?display=12&sort=price

Or this:

  • Site.com/shirts/?size=small&color=red&style=long_sleeve&display=12&sort=price
  • Site.com/shirts/?display=12&size=small&color=red&sort=price
  • Site.com/shirts/?size=small&display=12&sort=price&color=red&style=long_sleeve
  • Etc.

That adds up to a lot of URLs, most of which pull no unique content.
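One common defensive measure, sketched here in a platform-agnostic way, is to normalize parameter order so that every ordering collapses to a single form. The parameter names come from the examples above:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def normalize_params(url: str) -> str:
    """Sort query parameters so equivalent URL variants collapse to one form."""
    parts = urlsplit(url)
    params = sorted(parse_qsl(parts.query))
    return urlunsplit(parts._replace(query=urlencode(params)))

a = normalize_params("https://site.com/shirts/?color=red&size=small&style=long_sleeve")
b = normalize_params("https://site.com/shirts/?size=small&style=long_sleeve&color=red")
print(a == b)  # True: both orderings collapse to one canonical form
```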

Style may be the only parameter worth its own dedicated sales page. The rest? Not so much.

The Solution: Parameters for Filters, Not Legitimate Landing Pages

Planning your URL structure and navigation strategically is crucial to avoid duplicate content issues.

That includes learning the difference between a legitimate landing page and a URL that merely lets visitors filter their results.

Then treat each type of URL accordingly.

This is how landing page URLs (and canonical URLs) should look:

  • Site.com/shirts/long-sleeve/
  • Site.com/shirts/v-neck/
  • Site.com/shirts/collared/

The URLs for filtered results would look something like the following:

  • Site.com/shirts/long-sleeve/?size=small&color=red&display=12&sort=price
  • Site.com/shirts/v-neck/?color=red
  • Site.com/shirts/collared/?size=small&display=12&sort=price&color=red

With correctly constructed URLs, you can do two things:

  • Add the correct canonical tag (everything before the “?” in the URL).
  • Tell Google to ignore these parameters in Google Search Console.

As long as a parameter is used only for filtering or sorting content, you won’t have to worry about Google crawling it as if it were valuable.
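Deriving the canonical value for the first step is mechanical: keep everything before the “?” and drop the query string. A small sketch, using one of the example URLs above:

```python
from urllib.parse import urlsplit, urlunsplit

def canonical_of(url: str) -> str:
    """Return the URL with its query string (and fragment) removed."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))

print(canonical_of(
    "https://site.com/shirts/long-sleeve/?size=small&color=red&display=12&sort=price"
))  # https://site.com/shirts/long-sleeve/
```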

Because the canonical tag is only a signal, you must still complete the second step to get the best results. And that step applies only to Google; you’ll need to do the same for Bing.

Pro Developer Tip: Search engines typically ignore everything to the right of the pound symbol (“#”) in a URL.

If you place the “#” in every URL before any parameters, you don’t have to worry about the canonical tag being only a temporary fix.

  • Site.com/shirts/long-sleeve/#?size=small&color=red&display=12&sort=price
  • Site.com/shirts/v-neck/#?color=red
  • Site.com/shirts/collared/#?size=small&display=12&sort=price&color=red

If a search engine accessed any of these URLs, it would index only the canonical part of the URL and ignore the rest.
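Standard URL parsing shows why: everything after the “#” is a fragment, which browsers never send to the server and crawlers generally discard. Python’s urllib illustrates the split:

```python
from urllib.parse import urlsplit

# Everything after "#" lands in the fragment, leaving a clean path.
parts = urlsplit("https://site.com/shirts/v-neck/#?color=red")

print(parts.path)      # /shirts/v-neck/
print(parts.fragment)  # ?color=red
```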

The Problem with Ad Landing Pages & A/B Testing Duplication

Marketers often create multiple versions of the same content for landing pages or A/B/multivariate testing purposes.

While this can provide valuable data and feedback, it can also lead to duplicate content issues if the pages are available for search engines to index and spider.

NoIndex is the Solution

Instead of pointing a canonical tag at the master page, it is better to add a noindex tag (<meta name="robots" content="noindex">) to each test page to keep it out of the search engines’ index.

These pages are often orphans, with no links from within the site. Search engines will still find them, however.

A canonical tag would transfer page authority and value to the primary page, but these test pages shouldn’t be collecting valuable signals in the first place, so keeping them out of the index entirely is best.

Duplicate Content isn’t (Much) of a Problem

A duplicate content penalty is one of the most popular SEO myths.

There is no such penalty.

Then again, there is no penalty for failing to put gas in your car, either. The car just won’t run.

Google may not actively penalize duplicate content, but there are natural consequences.

That gives marketers the freedom to choose which consequences they will accept, without fear of penalties.

You should eliminate, not just patch up, all duplicate content on your own site. Offsite duplication, however, may provide more value than its consequences cost.

Republishing valuable content offsite can build brand recognition in a way that publishing it yourself can’t, because offsite publishers often reach a wider audience and have greater social reach.

Content that thousands might see on your own site could reach hundreds of thousands when published offsite.

While many publishers expect exclusive rights to the content you provide, some allow you to republish it on your own website after a brief waiting period. That gives you the publisher’s exposure up front and a chance to build your own audience with the content later.

This type of article distribution isn’t for everyone, though. The value of your content decreases exponentially if it is sent to hundreds of sites for republishing.

Nor is it likely to help your brand, as sites that publish duplicate content in mass quantities carry little authority.

Consider the pros and cons of publishing your content in multiple places.

If the branding value outweighs the authority you would gain from keeping the content unique to your site, consider a measured republishing strategy.

The keyword, however, is “measured.”

You don’t want a site made up of nothing but duplicate content.

That’s when you start to lose the brand’s value.

Understanding the issues, the solutions, and sometimes the value of duplicate content will help you eliminate the duplication you don’t want and pursue the duplication you do.

You want your site to be known for strong, unique content, and then to make use of that content to maximize its value.
