If you are looking for a tool to detect duplicate, copied, or plagiarized content, it is very likely because you have already noticed that information replication is a frequent problem on the Internet. Sometimes it happens intentionally, but it can also occur unintentionally for various reasons. Either way, it can have detrimental consequences in several respects, which is why a number of tools have emerged to find duplicate information relatively easily.
Below are some of the best tools available to find content from your website duplicated externally on other Internet sites, or internally between the different URLs of your own website, so that you can act afterwards to resolve it.
Consequences of Duplicate Content for SEO
Focusing on SEO, duplicate content can be quite detrimental to rankings in search engines such as Google, Bing, or Yandex. The URLs of your website that are accessible to search engine robots should contain mostly unique, original content that generates a positive response from users, so it is advisable to monitor your site's information periodically to keep this under control.
If you detect a high percentage of duplicate information for any reason, try to resolve it as soon as possible, both to improve positioning in the SERPs and to avoid penalties for content considered low quality or “Thin Content”. You can find more information on this Google help page.
Detect duplicate content on your website (On-Site Duplicity)
Internal duplicate content is one of the most common cases when a content manager (CMS) or eCommerce platform such as PrestaShop, WooCommerce + WordPress, Magento, Joomla!, or Drupal is not properly configured. Even a static website that uses none of these platforms can generate duplicate content without explicitly copying or replicating information: tracking parameters from Google Analytics or other web analytics platforms can create URLs with duplicate content, and search engine bots may crawl and index those parameterized addresses, producing duplicate information.
This is especially common in online stores, where several parameters are often appended to URLs, for example to sort or filter product listings, generating multiple additional URLs with the same titles, descriptions, and other page content. There are many more possible causes of duplication for search engines: missing “canonical” tags, the domain resolving both with and without www, poorly designed tag/category architecture, incorrect handling of multiple languages and translations, and so on.
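To see how parameterized URLs end up counting as the same page, here is a minimal sketch that normalizes URLs by stripping parameters that do not change the content. The parameter list is a hypothetical example; you would adjust it to your own site's tracking and sorting parameters:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical list of parameters that do not change the page content;
# adapt it to the tracking/filter parameters your own site actually uses.
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "fbclid", "orderby", "sort"}

def canonicalize(url: str) -> str:
    """Return a normalized URL with tracking/sorting parameters removed."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k.lower() not in IGNORED_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

a = canonicalize("https://example.com/shop?utm_source=news&orderby=price&page=2")
b = canonicalize("https://example.com/shop?page=2")
# Both normalize to the same address, so they represent one logical page.
```

This is the same idea behind a `rel="canonical"` tag: many addresses, one canonical content page.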
With all this in mind, it is important to monitor periodically whether content with a high percentage of duplication is being generated at different URLs of your website and whether those URLs are accessible to search engine robots. Fortunately, several tools are available to make these tasks easier:
- SemRush: use the Tools menu -> Site Audit.
- Virante Tools
- Hive Digital Duplicate Content Tool
- Screaming Frog SEO Spider Tool: install it, crawl your site, and sort the results to find duplicates by title, description, etc.
- Google Advanced Search Operators: use the `site:` operator followed by your domain to see everything Google has indexed from your website and look for possible duplicates.
- Google Webmaster Tools / Search Console: monitor the Search Appearance menu -> HTML Improvements in your Google Search Console panel (formerly called GWMT).
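The core of what crawlers like Screaming Frog do for duplicate detection can be sketched in a few lines: extract each page's `<title>` and group URLs that share one. This is a minimal illustration with hypothetical in-memory pages; a real check would crawl the live site first:

```python
from collections import defaultdict
from html.parser import HTMLParser

class TitleParser(HTMLParser):
    """Minimal parser that records the contents of the <title> tag."""
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True
    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False
    def handle_data(self, data):
        if self.in_title:
            self.title += data

def find_duplicate_titles(pages):
    """pages: {url: html}. Returns {title: [urls]} for titles used more than once."""
    groups = defaultdict(list)
    for url, html in pages.items():
        p = TitleParser()
        p.feed(html)
        groups[p.title.strip()].append(url)
    return {t: urls for t, urls in groups.items() if len(urls) > 1}

# Hypothetical crawled pages; in practice a crawler would fetch these.
pages = {
    "/shop?orderby=price": "<html><head><title>Shop</title></head></html>",
    "/shop": "<html><head><title>Shop</title></head></html>",
    "/about": "<html><head><title>About us</title></head></html>",
}
print(find_duplicate_titles(pages))  # {'Shop': ['/shop?orderby=price', '/shop']}
```

The same grouping works for meta descriptions or H1s; shared titles across URLs are usually the first symptom of on-site duplication.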
Detect plagiarized or duplicate content across the Internet
Content from our website that is duplicated, copied, or plagiarized on other Internet sites is, in most cases, beyond our control and not easy to detect. Even so, we can use some utilities to try to locate it on other websites and act accordingly in each case: requesting mentions or attribution links, making sure search engine robots index our original content first, taking legal action over stolen copyrighted content, and so on:
- Dupli Checker
- Dupe Free Pro
- Seotoolnetwork Plagiarism Checker
- SEOmatica Plagiarism Detector
- PlagiumBot: software to install that automates quoted-text searches in Google, saving you from doing them manually with advanced search operators, as described in the next item.
- Google Advanced Search Operators: simply search manually for some original phrases from your content, enclosing them in quotation marks.
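The quoted-phrase technique above is easy to semi-automate: pick a few distinctive phrases from your text and wrap them in quotation marks as exact-match queries. A minimal sketch (the phrase length and sampling positions are arbitrary choices, not part of any tool's official behavior):

```python
import re

def quoted_queries(text, phrase_len=8):
    """Return exact-match search queries built from phrases taken at the
    start, middle, and end of the text, ready to paste into Google."""
    words = re.findall(r"\w+", text)
    starts = [0,
              max(0, len(words) // 2 - phrase_len // 2),
              max(0, len(words) - phrase_len)]
    # Deduplicate start positions for very short texts.
    return ['"' + " ".join(words[s:s + phrase_len]) + '"'
            for s in sorted(set(starts))]

for q in quoted_queries("Your original article text goes here ..."):
    print(q)  # paste each query into a search engine to hunt for copies
```

Sampling from several positions matters because scrapers often copy only part of an article.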
Detect duplicate, same or similar images on the Internet
If you want to focus exclusively on detecting photos duplicated or copied on other Internet sites, there are also very useful tools for this. Specifically, we can use what is called “Reverse Image Search”, and as a bonus in this article, here are my favorite tools to find duplicate photos on the Internet, all of which I have used on more than one occasion:
- Google Images: Use the camera icon that appears in the search box.
- Yandex Images: search by image using the camera icon that appears in the search box.
- Bing Search By Image: Use the “Image Matching” option next to the search box.
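Reverse image search engines typically rely on perceptual hashing: reduce each image to a tiny fingerprint, then compare fingerprints. This is a minimal average-hash sketch on hand-made thumbnails (real code would downscale and grayscale actual images with a library such as Pillow first):

```python
def average_hash(pixels):
    """pixels: 2D list of grayscale values (e.g. an 8x8 thumbnail).
    Returns a bit string: 1 where the pixel is brighter than the mean."""
    flat = [v for row in pixels for v in row]
    mean = sum(flat) / len(flat)
    return "".join("1" if v > mean else "0" for v in flat)

def hamming(a, b):
    """Number of differing bits between two hashes of equal length."""
    return sum(x != y for x, y in zip(a, b))

# Hypothetical 4x4 thumbnails standing in for downscaled images.
img1 = [[10, 200, 10, 200]] * 4
img2 = [[12, 198, 11, 199]] * 4   # same picture, slightly re-encoded
img3 = [[200, 10, 200, 10]] * 4   # a visually different image

h1, h2, h3 = (average_hash(i) for i in (img1, img2, img3))
# Near-duplicates end up at a tiny Hamming distance; different images far apart.
```

A small distance threshold (a few bits) catches recompressed or lightly edited copies that a byte-for-byte comparison would miss.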
Important Considerations When Checking Copy or Duplicate Content
These tools are very helpful for finding duplicate content, but they are not 100% effective, and some cases may go undetected. It is advisable to try several of the tools listed above to verify results more reliably. The disadvantage, as you may have noticed, is that some of them offer a limited free version and require a paid or premium plan to use their full functionality on a regular basis. Note also that some of this software requires installation, configuration, or manual processes that take additional time, but ideally you should initially try all the options to see which one best meets your needs.
Obviously, these tools may differ in their criteria for what counts as duplicate content; quoting a fragment of text or briefly summarizing a source does not necessarily mean “duplicate content” if the page contributes much more original, value-added information that generates a positive user response. Mainly, I would say: keep the percentage of differentiated, valuable content as high as possible on each of your pages, implement mechanisms to prevent high percentages of duplicate or copied content, and act quickly to resolve any incidents detected.
Taking all of the above into account, I would personally highlight SemRush as the tool that has saved me the most time and money of all those mentioned. It does not directly check for duplicate content externally across the Internet, but the SemRush Site Audit Tool does a good analysis of possible duplicate content internally between the different URLs of a website, periodically reporting the issues it detects. It also offers advice on resolving them and unifies many other essential SEO features in a single platform, for a monthly fee that really pays off considering what it would cost to obtain all those features from separate tools.
SemRush's strongest point is keyword research for SEO / SEM and PPC, but it also handles SEO audits of a website and many other essential Online Marketing functions in a single platform: monitoring a site's keyword positions in the SERPs, analyzing a site's links, etc.
What is your favorite tool for analyzing duplicate content? Share your experience, or leave a comment if you know of other options not mentioned here.