The scraping threats

Scraping explained

Web scraping, also known as screen scraping, data scraping, web data harvesting or just scraping, is a constantly growing menace on the Internet. Scrapers, or the organizations that commission scraping, do so because they know it is easier and cheaper to copy and steal data than to work hard to compile it.

There are dozens of commercial scraping packages available online, and most offer support and even guaranteed anonymity. Scraping is theft, plain and simple, when it contravenes a site's Terms and Conditions of Usage. The technique may differ, and so may the value of individual records, but the effect is the same: the data advertised on your website will no longer belong to you once it has been scraped.

Between July 2008 and December 2010, ScrapeSentry saw a tenfold increase in scraping activity on the domains it monitored. The graph below illustrates the number of blockings required at a typical site. It could be argued that this is a reflection of the increase in criminal activity through the downturn in the economy.

Your business is at risk from scrapers

Your data is valuable to scrapers and competitors. Today, entire business models are being built on scraping because it is easier to copy data from other sites than to create it.

Volumes of scraping are increasing

Growing volumes and more sophisticated ways of avoiding detection are adding to the pressure on your infrastructure. More importantly, scrapers put your business model at risk.

Scrapers are determined

Scrapers are determined to get to your data. They use increasingly sophisticated automated scripts, bots and human scraping banks that mimic human behaviour. This makes it extremely difficult to tell good requests from bad ones. Even CAPTCHA challenges are being broken, both by optical character recognition and by determined humans. Unless they are tracked 24/7, scrapers will find ways to get around automatic blocking techniques.
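As an illustration of the kind of automatic blocking technique mentioned above, here is a minimal sketch of a per-client sliding-window request counter. The class name, the thresholds and the choice of IP address as client key are assumptions made for this example, not a description of any particular product. A scraper that spreads its requests over many addresses or paces itself like a human would stay under such a limit, which is exactly the weakness described above.

```python
from collections import defaultdict, deque


class SlidingWindowCounter:
    """Counts requests per client key within the last `window_seconds`.

    Hypothetical helper for illustration; thresholds are arbitrary.
    """

    def __init__(self, window_seconds=60.0, max_requests=120):
        self.window_seconds = window_seconds
        self.max_requests = max_requests
        self._hits = defaultdict(deque)  # client key -> request timestamps

    def record(self, client_key, timestamp):
        """Record one request; return True if the client is over the limit."""
        hits = self._hits[client_key]
        hits.append(timestamp)
        # Drop timestamps that have fallen out of the window.
        while hits and hits[0] <= timestamp - self.window_seconds:
            hits.popleft()
        return len(hits) > self.max_requests


# Example: flag a client that sends more than 3 requests in 10 seconds.
counter = SlidingWindowCounter(window_seconds=10, max_requests=3)
flags = [counter.record("198.51.100.7", t) for t in (0, 1, 2, 3)]
# Only the fourth request pushes the client over the limit.
```

A fixed threshold like this is easy to implement but just as easy to evade, so it is best treated as one signal among several rather than a complete defence.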

It’s difficult to differentiate between good and bad requests

Scrapers use bots which mimic search engine requests, and it is very difficult to tell the two apart. Benevolent bots, such as search engine bots, are the ones you want on your site: they index it and give you your search engine ranking. Scraper bots decrease your site's performance and cause you to lose control over your data. Web scrapers can be divided into three categories:

Manual scrapers: People who download data manually and/or use it in direct breach of the terms and conditions of your site. These can be single individuals or groups of people, for example a call centre using the site commercially.

Scripted scrapers: To get large amounts of data quickly, or to perform transactions automatically, scrapers find it most convenient to use a script or program rather than scrape manually. Scripted web scrapers can use a single IP address or many, making it seem that they are in fact a group of legitimate users.

Bots: Bots are divided into benevolent bots and malign bots. Benevolent bots, such as search engine bots, are the ones you want on your site: they index it and give you your search engine ranking.
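One widely used way to check whether a request that claims to come from a search engine really does is forward-confirmed reverse DNS. You look up the hostname for the client's IP address, check that it belongs to a domain the search engine uses, then resolve that hostname again and confirm it maps back to the same address. The sketch below shows the idea. The function name and the suffix list are assumptions for illustration, so check each search engine's own documentation for the current domains. The default forward lookup is IPv4 only.

```python
import socket

# Assumed suffixes for illustration only; verify against each
# search engine's published crawler documentation.
TRUSTED_SUFFIXES = (".googlebot.com", ".google.com", ".search.msn.com")


def is_verified_crawler(ip, reverse_lookup=None, forward_lookup=None,
                        trusted_suffixes=TRUSTED_SUFFIXES):
    """Return True if `ip` passes forward-confirmed reverse DNS.

    The lookup functions can be injected, which keeps the check testable
    without network access. By default the standard socket calls are used.
    """
    reverse_lookup = reverse_lookup or (
        lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (
        lambda host: socket.gethostbyname_ex(host)[2])

    try:
        hostname = reverse_lookup(ip)  # step 1: IP -> hostname
    except OSError:
        return False

    # Step 2: the hostname must sit under a trusted crawler domain.
    if not hostname.lower().rstrip(".").endswith(trusted_suffixes):
        return False

    try:
        # Step 3: hostname -> IPs must include the original address.
        return ip in forward_lookup(hostname)
    except OSError:
        return False
```

A check like this only separates genuine search engine crawlers from impostors using their name. It says nothing about scrapers that pose as ordinary browsers, and results would normally be cached to avoid a DNS lookup on every request.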

Our Service

Customer experience

We offer effective solutions to companies in several sectors. Our clients, many of whom are long-term, are testament to our commitment.