Identifying and Fixing Duplicate Content
Duplicate content most commonly refers to identical, or very similar, pieces of content appearing at more than one URL, either within a single domain or across domains. The majority of the time, duplicate content is not intended and is overlooked by many organisations.
However, there are a few cases in which duplicate content is used deliberately across domains in order to manipulate search engine rankings and increase exposure. Search engines such as Google regularly penalise sites involved in this kind of activity, as duplicate content substantially degrades the user experience. Below are a few examples of common unintended duplicate content:
– URL parameters
– Two pages with the same content, one with ‘.php’ still in the URL
– Printer-only versions of web pages
– Session ID pages
Spotting Duplicate Content
It is good practice to spot duplicate content before Google has the chance to index your site. Below I have listed a few tools you can use:
Google Webmaster tools
Google Webmaster Tools is an extremely easy way of finding pages that share the same page titles and meta descriptions. All you have to do is click on Search Appearance at the top of the page, then click through to HTML Improvements. From there, Webmaster Tools will show you the number of duplicates, and which pages have duplicate page titles and meta descriptions.
Screaming Frog
Screaming Frog is another great tool, similar to Google Webmaster Tools, for finding duplicate content. Once downloaded, the software will crawl up to 500 of your site's URLs (in the free version) and analyse them. Like Webmaster Tools, Screaming Frog lets you see duplicate meta descriptions and page titles. However, by clicking on the URL tab and selecting Duplicate, it also shows you the exact duplicate URLs.
Fixing the Duplicate Content
There are many ways to prevent or fix duplicate page issues. Below I have listed four great ways to stop search engines from finding duplicate content on your site:
Apply a 301 redirect
A 301 redirect permanently points both users and search engines away from an old URL and towards the new one, which stops search engine spiders and bots from perceiving the two URLs as duplicate content. If you use the tools above and find that you have duplicate URLs, try implementing some 301 redirects. For more information on how to create 301 redirects, visit Google support.
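As a minimal sketch, assuming an Apache server and the hypothetical URLs shown, a 301 redirect can be set up in your site's .htaccess file:

```apache
# Permanently redirect one old URL to the preferred URL
Redirect 301 /keyword-x.php http://example.com/keyword-x/

# Or, with mod_rewrite enabled, redirect every .php page
# to its extensionless equivalent in one rule
RewriteEngine On
RewriteRule ^(.+)\.php$ http://example.com/$1/ [R=301,L]
```

Once this is in place, anyone requesting the old URL, including search engine crawlers, receives a 301 status code and is sent to the preferred URL, so only that URL ends up indexed.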
Use Canonical tags
To enable 301 redirects, you will need access to the server settings. I know that not everyone is tech savvy enough to do this themselves, so another way of preventing duplicate content without making a server change is to use canonical tags.
A canonical tag, like a 301 redirect, is a piece of code entered in the back end of the website that tells search engines such as Google, Bing and Yahoo which URL is the preferred version. Unlike a redirect, a canonical tag doesn't remove the duplicate content; it simply lets search engines know that the duplicate URL should be treated as pointing to the correct URL. Below is an example of a canonical link:
<link rel="canonical" href="http://example.com/keyword-x/" />
All you need to do for this process is place the preferred URL within the ‘href’ attribute of the link on each duplicate page. Search engines will then see that you want the current page treated as a duplicate of that URL, and will transfer most of the link value to your canonical page. For more information on canonical tags, see Google's support documentation.
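To make the placement concrete, here is a sketch of a hypothetical printer-only duplicate page (one of the common cases listed earlier), with the canonical tag in its head pointing back at the main page; the URLs are illustrative only:

```html
<!DOCTYPE html>
<html>
<head>
  <title>Keyword X (printer version)</title>
  <!-- Tell search engines that the main page is the preferred URL -->
  <link rel="canonical" href="http://example.com/keyword-x/" />
</head>
<body>
  <!-- Printer-friendly copy of the main page's content goes here -->
</body>
</html>
```

The visitor still sees the printer version, but search engines consolidate indexing and link value onto the main URL.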
Google Webmaster Tools and meta tags indicating preferred indexed pages
Within Google Webmaster Tools, there is an option to set preferred URLs. To access this, click configuration > site link > preferred domain; here you can choose which pages and URLs you would like to be preferred and which should be dropped. You can also use a similar method to set URL parameters to remove duplicate content from Google's indexing; however, I would recommend not doing so if you are not comfortable using Webmaster Tools, as the slightest mistake could result in major pages being dropped from Google's index.
You can also add certain meta tags to your code to do the same thing. For example, including the following in a page's head tells search engines that the page can still be seen by users, but should not be indexed (and, with ‘nofollow’, that its links should not be followed):
<meta name="robots" content="noindex, nofollow">
Minimize similar content
As simple as it sounds, make sure you are not producing too many pages with similar content, both on and off the page. If you do find that some of your pages share similar content, try expanding one of the pages, or even merge the two pages into one.
For instance, if you have a website recommending restaurants and have dedicated a separate page to each of two different yet similar Indian restaurants, try merging the two pages into one that recommends each restaurant in its own right. Alternatively, keep the pages separate but go deeper into describing each restaurant, making sure you provide distinct content on each page.
When polishing up the on-page content, make sure you also check the meta descriptions and page titles to ensure they aren't too similar either.
Duplicate content happens more frequently than you may think; you may even have some on your own site right now. Keep a keen eye out, review your site regularly for duplicate content, and implement methods such as the ones mentioned above to get yourself back on track.