Content Scraping is Not a Good Idea

Content scraping is when a website copies content directly from another site and republishes it as its own. Scrapers sometimes acknowledge the source, but most of the time they don’t.

Examples of scraping:

  • Sites that directly copy content
  • Sites that copy content, modify it slightly and then republish it
  • Sites that reproduce feeds from other sites
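
To make the second case concrete: a lightly modified copy still shares most of its word sequences with the original. The sketch below is only an illustration, not how any particular detection service works. It compares two texts by their overlapping three-word sequences ("shingles") and returns a score between 0 and 1. The function names and the shingle size of 3 are assumptions chosen for this example.

```python
import re


def shingles(text: str, size: int = 3) -> set:
    """Return the set of overlapping word sequences (shingles) in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Text shorter than one shingle: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(a: str, b: str, size: int = 3) -> float:
    """Jaccard similarity of the two shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


original = "Content scraping is when a website copies content from another site"
modified = "Content scraping is when a site copies content from another site"
print(similarity(original, modified))
```

A score near 1 suggests a direct copy, and a score well above what unrelated pages produce suggests a reworded one. Where to draw the line is a judgment call that depends on your content.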

It is not a good practice because it is:

Sometimes scraping is done manually, but most of the time bots collect the content by crawling websites. A recent, notable case is LinkedIn, which is suing an undisclosed competitor for scraping user profile information in order to create a competing site. Scraping has also hit eBay, where a competitor scraped auction information from its website, as well as airlines and their discount sellers and, in many cases, blogs.

If you currently scrape content from other blogs, please consider other methods that give positive credit and acknowledgement to the original source:

  • Write a meaningful blog post with your own perspective. If you want to cite a source or raise points that other websites have made, link to their content
  • Share the link on your social networks

How to Detect if Your Site is Being Scraped:

  • Copyscape – paste in your URL to see how many duplicates are out there
  • Webmaster Tools – look at the sites linking to you. If one site in particular (that is not a social media site) links to you a fair amount, check it out; it may be scraping your content
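
The second check can be partly automated if you export the list of pages linking to you. The sketch below assumes you already have those backlink URLs as a plain list (how you export them depends on your tool). It counts links per domain, skips social media sites, and reports the domains that link to you unusually often. The `SOCIAL_DOMAINS` set and the `min_links` threshold are assumptions to adjust for your own site.

```python
from collections import Counter
from urllib.parse import urlparse

# Assumption: social sites to ignore. Extend this set as needed.
SOCIAL_DOMAINS = {"facebook.com", "twitter.com", "linkedin.com"}


def domain_of(url: str) -> str:
    """Return the lowercase host of a URL, without a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def suspicious_referrers(backlinks, min_links=20):
    """Return (domain, link_count) pairs for heavy, non-social linkers."""
    counts = Counter(domain_of(url) for url in backlinks)
    return [
        (domain, n)
        for domain, n in counts.most_common()
        if domain and domain not in SOCIAL_DOMAINS and n >= min_links
    ]
```

A domain on this list is not proof of scraping. It is only a shortlist of sites worth visiting to see what they are doing with your content.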

You can contact the scraper and tell them to take it down, but this can be tricky because:

  • Copyright laws vary from country to country
  • A scraper site can easily disappear and reappear as something completely different
  • You may have a difficult time finding contact info for the culprit

Other Ways to Stop It

  1. Install anti-scraping software
  2. Put as many internal links and links to other sites in your copy as possible
  3. Catch scraper bots with JavaScript
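
One possible reading of item 3: many simple scraper bots download HTML without running JavaScript. If every page includes a small script that requests a dedicated "beacon" URL, then a client that fetches many pages but never requests the beacon is probably not a normal browser. The sketch below shows only the server-side half, written in Python, and assumes you can read your access log as (IP address, path) pairs. The `/js-beacon` path and the `min_pages` threshold are hypothetical values for illustration.

```python
from collections import defaultdict

# Hypothetical URL that a small script on every page requests.
BEACON_PATH = "/js-beacon"


def likely_bots(requests, min_pages=10):
    """Return IPs that fetched at least min_pages pages but never hit the beacon.

    requests is an iterable of (ip, path) pairs taken from an access log.
    """
    pages = defaultdict(int)  # page requests per IP, beacon hits excluded
    ran_js = set()            # IPs that requested the beacon at least once
    for ip, path in requests:
        if path == BEACON_PATH:
            ran_js.add(ip)
        else:
            pages[ip] += 1
    return sorted(ip for ip, n in pages.items() if n >= min_pages and ip not in ran_js)
```

Treat the result as a signal rather than a verdict: visitors with JavaScript disabled and legitimate search engine crawlers can look the same in the log, so check a flagged address before blocking it.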