Blog Post

17
JAN
2014

Why is data scraping difficult to counter?

Tags : scraping techniques
Posted By : Martin Zetterlund

Not all scraping is hard to counter. The simplest case is a well-behaved web robot that adheres to the Robots Exclusion Standard and follows the rules you have entered in your website's robots.txt file. Generic robots that ignore the protocol and scrape indiscriminately are usually easy to stop by blocking their user-agent or the IP address or IP range they originate from.
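The two easy cases can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the robots.txt rule, the bot names, the blocked user-agent substrings and the IP ranges (taken from the reserved documentation ranges) are all hypothetical, and `is_blocked` is a made-up helper rather than part of any product.

```python
import ipaddress
from urllib.robotparser import RobotFileParser

# Case 1: a well-behaved robot reads robots.txt and obeys it.
# The rule below is a hypothetical example, not a recommendation.
rules = RobotFileParser()
rules.parse([
    "User-agent: *",
    "Disallow: /prices/",
])
polite_bot_may_fetch_prices = rules.can_fetch(
    "PoliteBot", "https://www.example.com/prices/list")
polite_bot_may_fetch_about = rules.can_fetch(
    "PoliteBot", "https://www.example.com/about")

# Case 2: a generic robot ignores robots.txt, so the server blocks it
# by user-agent substring or by the network it originates from.
# Both lists are assumptions; real entries would come from your own logs.
BLOCKED_AGENT_SUBSTRINGS = ("examplescraper", "badbot")
BLOCKED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),    # a whole range
    ipaddress.ip_network("198.51.100.17/32"),  # a single address
]


def is_blocked(remote_ip: str, user_agent: str) -> bool:
    """Return True if the request matches a blocked agent or network."""
    agent = (user_agent or "").lower()
    if any(part in agent for part in BLOCKED_AGENT_SUBSTRINGS):
        return True
    try:
        address = ipaddress.ip_address(remote_ip)
    except ValueError:
        # Unparsable address: this check alone cannot decide.
        return False
    return any(address in network for network in BLOCKED_NETWORKS)
```

A check like this only stops robots that announce themselves honestly or stay on a fixed set of addresses, which is why it covers the easy cases and not the targeted scrapers discussed next.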

Scrapers that are harder to stop are usually financially motivated and have specific targets. Their persistence is directly related to the financial gain or loss at stake. Once successful, they will spend a lot of time and resources overcoming any obstacles put in place to stop them. In most cases the sensible approach is to aim at winning the war rather than any single battle. Any single countermeasure will sooner or later be circumvented by those who are sufficiently motivated. To win the war you need to be prepared to continuously evaluate the effectiveness of your countermeasures and develop new ones once the previous ones have been evaded. By continuously making life more difficult for the scrapers, you raise the cost of using your data and degrade their service to the point where scraping your site is no longer cost-effective.

Scraping as a service is big business in itself. The industry is maturing much faster than the anti-scraping business. The services on offer range from renting a coder or a piece of simple software to fully managed services complete with an SLA. The more serious scraping operations have access to vast numbers of IP addresses that they use to hide themselves, and many of them make no more than a single search from each IP address before switching to the next. Taking data from sources that are open to the public has proven to be much easier than stopping the practice. Anonymization seems to run counter to most scraping service providers' legal disclaimers, which brings up the next question: the legality of scraping.
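To see why address rotation defeats simple blocking, consider a minimal Python sketch. It assumes a request log reduced to (IP address, path) pairs; the traffic is synthetic, and the helper names and the limit of 10 requests are hypothetical choices made for the example.

```python
from collections import Counter


def ips_over_limit(requests, limit):
    """Return the set of IPs that made more than `limit` requests."""
    counts = Counter(ip for ip, _path in requests)
    return {ip for ip, n in counts.items() if n > limit}


def single_request_share(requests):
    """Return the fraction of all requests that came from IPs seen once."""
    if not requests:
        return 0.0
    counts = Counter(ip for ip, _path in requests)
    singles = sum(1 for n in counts.values() if n == 1)
    return singles / len(requests)


# Synthetic traffic: one client sending 50 searches from a single address,
# and a rotating scraper sending 50 searches from 50 different addresses.
heavy_client = [("192.0.2.1", "/search")] * 50
rotating_scraper = [(f"198.51.100.{i}", "/search") for i in range(1, 51)]
log = heavy_client + rotating_scraper

flagged = ips_over_limit(log, limit=10)
share = single_request_share(log)
```

With this data the per-IP limit flags only `192.0.2.1`, while the rotating scraper's 50 addresses all stay under it. The share of requests from single-use addresses (0.5 here) hints that something is off, but it is only a signal: many legitimate visitors also make a single request, so it cannot serve as a blocking rule on its own.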



Copyright © 2014 ScrapeSentry. All rights reserved