FAQs

 

If you have questions about scraping or how to handle it, feel free to send an email to [email protected] and we will answer. All information is kept confidential; nothing that could be used to identify you or your company will be published.

Q: What is the scraped data used for?

A: This depends very much on the data, so it is impossible to give a general answer, but some examples are:

  • Launching competing services
  • Building telemarketing databases (specific to yellow/white pages sites)
  • Building link farms
  • Reselling services
  • Content for AdSense pages

There are other uses as well that are not listed here.

 

Q: Why is data scraping difficult to counter?

A: Parasitic scrapers use tools that behave like the legitimate bots run by spiders, crawlers and search engines such as Google and Bing. This makes it difficult to differentiate between good and bad scrapers, and it is imperative to avoid blocking the legitimate ones.
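
One common way to tell a genuine search engine crawler from an impostor is forward-confirmed reverse DNS: look up the hostname for the client IP, check that it belongs to a domain the search engine uses, then resolve that hostname and confirm it points back to the same IP. The following is a minimal sketch of that idea, not a complete solution. The function name, the injectable lookup parameters and the list of hostname suffixes are assumptions made for illustration; consult each search engine's own documentation for the domains it actually uses.

```python
import socket

# Hostname suffixes are illustrative assumptions, not an authoritative list.
TRUSTED_SUFFIXES = (".googlebot.com", ".google.com", ".search.msn.com")


def is_verified_crawler(ip, reverse_lookup=None, forward_lookup=None):
    """Return True if ip reverse-resolves to a trusted host that resolves back to ip.

    The lookup functions can be replaced (for example in tests); by default
    they use the system resolver, which covers IPv4 only in this sketch.
    """
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        host = reverse_lookup(ip).rstrip(".").lower()
        # Step 1: the reverse name must end in a trusted domain.
        if not host.endswith(TRUSTED_SUFFIXES):
            return False
        # Step 2: the forward lookup of that name must include the original IP.
        return ip in forward_lookup(host)
    except OSError:
        # DNS failures are treated as "not verified".
        return False
```

A client that merely sends a search engine's User-Agent string fails this check, because the User-Agent is never consulted. DNS lookups are slow, so in practice the result would normally be cached per IP.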

 

Q: What is the impact of data scraping?

A: Data scraping can impact a site in three main ways:

  • The uniqueness of your intellectual property is compromised
  • The sheer volume of scrapers may slow down your site or even cause a denial of service
  • Scraping by unknown parties may have a legal impact on your partner content

 

Q: Why don't you just use a captcha test? That will block all scripts!

A: Yes and no. In some environments a captcha test can be very useful, for example for a one-off action such as a registration, but in other places it is more or less useless. Take, for instance, a large database in which users are expected to run several searches: giving each user a captcha test is not an option in most cases. Even if you use it in conjunction with rate limiting to detect site scrapers, you will still have problems with large gateways and spiders.
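
To make the rate limiting mentioned above concrete, here is a minimal sketch of a per-IP sliding-window counter. The class name, parameters and thresholds are hypothetical and chosen only for illustration; a real deployment would need shared storage and tuning.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per client IP within window_seconds."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # client IP -> timestamps of recent requests

    def allow(self, client_ip, now=None):
        now = time.monotonic() if now is None else now
        hits = self._hits[client_ip]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over the limit: candidate for a captcha or a block
        hits.append(now)
        return True
```

Because the counter is keyed on the IP address alone, every user behind one large gateway shares a single budget and can exceed it through entirely normal use, and a legitimate spider will trip it as well. That is the problem described in the answer above.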

Q: Is Screen Scraping Legal?

A: There are two major problems with using legal action to stop web scraping.

  • The first is that, since the scraping is performed over the Internet, the scraper may be located anywhere in the world and may not be bound by the laws of the country where the site is located.
  • The second problem is the sheer scale of scraping and the fact that identifying the scrapers is rarely trivial. If you have a large site with valuable information or business logic that attracts scrapers, there will probably be hundreds of offenders each month, and pursuing legal action against them all would be very costly.

You may think that pursuing one or two of them would be enough to deter the others, but in our experience most scrapers care little about that risk and hide behind open proxy servers or other anonymizing services that make them close to impossible to identify.

The article looks at the case Southwest Airlines vs. Outtask and gives some helpful pointers on which laws may be used, at least in the US.

Q: How can I block an IP from accessing my site?

A: There are three main ways of doing this:

  1. In a firewall or other packet filtering device
  2. In the web server by using .htaccess or similar
  3. In the application itself

Of these three, the second is probably the simplest. Information about how to write a correct .htaccess file can be found here.
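
For the third option, blocking in the application itself, the following is a minimal sketch for a Python WSGI application. The names `BLOCKED_NETWORKS`, `is_blocked` and `block_ips` are hypothetical, and the addresses come from ranges reserved for documentation. It also assumes that `REMOTE_ADDR` holds the real client address, which is not the case behind a reverse proxy.

```python
import ipaddress

# Placeholder entries from documentation-only address ranges.
BLOCKED_NETWORKS = [
    ipaddress.ip_network(net) for net in ("203.0.113.7/32", "198.51.100.0/24")
]


def is_blocked(client_ip, networks=BLOCKED_NETWORKS):
    """Return True if client_ip falls inside any blocked network."""
    try:
        addr = ipaddress.ip_address(client_ip)
    except ValueError:
        # Unparseable addresses are not matched here; handle them separately.
        return False
    return any(addr in net for net in networks)


def block_ips(app, networks=BLOCKED_NETWORKS):
    """Wrap a WSGI app so that blocked clients receive 403 Forbidden."""

    def middleware(environ, start_response):
        if is_blocked(environ.get("REMOTE_ADDR", ""), networks):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)

    return middleware
```

Wrapping an existing application is then a single call, for example `application = block_ips(application)`. Blocking at this level still lets each request reach the application process, so it is the most flexible of the three options but also the most expensive per blocked request.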
