

13 Jan 2014

How can I block an IP from accessing my site?

Tags: blocking, prevention
Posted by: Martin Zetterlund

The hard part of stopping screen scrapers is generally not blocking them but finding them in the first place. Once you have identified a scraper, block it as quickly as possible to stop the activity from its current source. When designing your blocking, bear in mind that scrapers often spread their traffic over thousands or even millions of IP addresses to hide themselves, so any solution should be able to handle large lists of IP addresses and IP ranges. Keeping those lists up to date is just as important. There is rarely a reason to block or allow an IP address indefinitely, and whitelists and blacklists that are not actively maintained tend to grow until they become unmanageable.
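To make the point about list maintenance concrete, here is a minimal sketch of a blocklist in which every entry carries an expiry time. The class name ExpiringBlocklist and its methods are hypothetical names chosen for this illustration, and the lookup is a simple linear scan; a list with millions of entries would need a faster structure, such as a prefix tree or the firewall's own set type.

```python
import ipaddress
import time


class ExpiringBlocklist:
    """Hypothetical blocklist of IPs and CIDR ranges; every entry expires."""

    def __init__(self):
        # Maps an ip_network object to its expiry time (seconds since epoch).
        self._entries = {}

    def block(self, target, ttl_seconds, now=None):
        """Block one address ("203.0.113.7") or a range ("198.51.100.0/24")."""
        now = time.time() if now is None else now
        network = ipaddress.ip_network(target, strict=False)
        self._entries[network] = now + ttl_seconds

    def purge_expired(self, now=None):
        """Drop expired entries so the list does not grow without bound."""
        now = time.time() if now is None else now
        self._entries = {
            net: expiry for net, expiry in self._entries.items() if expiry > now
        }

    def is_blocked(self, address, now=None):
        """Return True if the address falls inside any unexpired entry."""
        now = time.time() if now is None else now
        ip = ipaddress.ip_address(address)
        return any(
            expiry > now and ip.version == net.version and ip in net
            for net, expiry in self._entries.items()
        )

    def __len__(self):
        return len(self._entries)
```

Calling purge_expired on a schedule (for example from a cron job) keeps the list from accumulating stale entries, and the optional now argument exists only to make the behavior easy to test.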

Blocking of IP addresses can be done in various parts of the web infrastructure. Depending on how your site is built and which parts you have control over, you may choose to perform blocking through one or more of the following:

Firewall

Pros

  • Firewalls are typically built for the purpose of blocking IP addresses and can handle long lists of IPs without any noticeable performance impact.
  • There is normally a change process in place to manage firewall changes without interrupting normal site operations.
  • Blocking IP addresses in the firewall normally doesn’t require any development work.

Cons

  • A firewall change may take an unacceptably long time to get through the change process. Once you have identified a scraper it is important to get a block in place quickly to stop the unwanted behavior.
  • Firewalls normally operate only at the network and transport layers (TCP/IP), which means that you cannot target a scraper behind a proxy server or a big gateway without affecting all users of that gateway.
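As an illustration of what a firewall-level block can look like, the sketch below assumes a Linux host using iptables together with ipset, which is only one of many possible setups. It builds the commands as strings rather than running them; the function name ipset_commands and the set name used in the usage note are hypothetical, and the sketch is limited to IPv4 because a hash:net set is IPv4 unless created with a different address family.

```python
import ipaddress


def ipset_commands(set_name, targets, ttl_seconds):
    """Return shell commands that block the given IPv4 addresses or ranges.

    Assumes Linux with ipset and iptables installed. The commands are only
    generated here; running them requires root and is left to the operator.
    """
    commands = [
        # Create a set of networks with timeout support (0 = no default expiry).
        f"ipset create {set_name} hash:net timeout 0",
        # One firewall rule drops traffic from every member of the set.
        f"iptables -I INPUT -m set --match-set {set_name} src -j DROP",
    ]
    for target in targets:
        network = ipaddress.ip_network(target, strict=False)
        if network.version != 4:
            raise ValueError(f"this sketch only handles IPv4: {target}")
        # Each entry gets its own timeout, so blocks expire automatically.
        commands.append(f"ipset add {set_name} {network} timeout {ttl_seconds}")
    return commands
```

For example, ipset_commands("scrapers", ["203.0.113.7", "198.51.100.0/24"], 3600) yields one create command, one iptables rule and two add commands. Because the rule matches on the set rather than on individual addresses, the list can change without touching the rule itself.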

Load balancer

Pros

  • Load balancers often operate at the HTTP layer, which means you can block scrapers by user agent or cookie, giving greater flexibility than blocking on IP alone.
  • Load balancers can often handle blocks without noticeable performance impact.

Cons

  • Depending on the brand and model of load balancer it may require complex changes to the configuration.
  • Many companies are reluctant to make changes to their load balancers outside maintenance windows.
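The actual rule syntax depends entirely on the load balancer in use, so the following is only a language-neutral sketch, written in Python, of the decision an HTTP-layer rule makes. The rule values (BLOCKED_NETWORKS, BLOCKED_AGENT_SUBSTRINGS, BLOCKED_COOKIES) and the function should_block are hypothetical, and the headers are assumed to arrive as a plain dict with canonical header names.

```python
import ipaddress
from http.cookies import SimpleCookie

# Hypothetical rule set, for illustration only.
BLOCKED_NETWORKS = [ipaddress.ip_network("198.51.100.0/24")]
BLOCKED_AGENT_SUBSTRINGS = ["examplescraper"]   # matched case-insensitively
BLOCKED_COOKIES = {"session": {"abc123"}}       # cookie name -> blocked values


def should_block(remote_ip, headers):
    """Decide whether to block a request based on IP, user agent or cookie."""
    ip = ipaddress.ip_address(remote_ip)
    if any(ip.version == net.version and ip in net for net in BLOCKED_NETWORKS):
        return True

    agent = headers.get("User-Agent", "").lower()
    if any(fragment in agent for fragment in BLOCKED_AGENT_SUBSTRINGS):
        return True

    cookies = SimpleCookie()
    cookies.load(headers.get("Cookie", ""))
    for name, blocked_values in BLOCKED_COOKIES.items():
        if name in cookies and cookies[name].value in blocked_values:
            return True

    return False
```

The user agent and cookie checks are what make it possible to single out one client behind a shared gateway while leaving the gateway's IP address unblocked.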

.htaccess or similar

A description of the .htaccess functionality can be found here: http://en.wikipedia.org/wiki/Htaccess

Pros

  • A simple way of blocking bots and scrapers at the IP level in the web server.
  • Normally only requires a small change to the web server configuration file.
  • It is possible to use more advanced functions in the web server to access parts of the HTTP header for blocking.

Cons

  • It may have a performance impact, since the web server evaluates the rules on every request.
  • It may require development work.
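For a sense of what such a configuration change involves, the sketch below renders an .htaccess fragment from a list of addresses. It assumes Apache 2.4, where access control uses the Require directives, and a server configuration that allows these directives in .htaccess files; other web servers use different syntax. The function name render_htaccess_block is hypothetical.

```python
import ipaddress


def render_htaccess_block(targets):
    """Render an Apache 2.4 fragment denying the given IPs and CIDR ranges.

    Each target is parsed with the ipaddress module first, so malformed
    input raises ValueError instead of ending up in the server config.
    """
    lines = ["<RequireAll>", "    Require all granted"]
    for target in targets:
        network = ipaddress.ip_network(target, strict=False)
        if network.num_addresses == 1:
            # A single host: emit the bare address without a prefix length.
            spec = str(network.network_address)
        else:
            spec = str(network)
        lines.append(f"    Require not ip {spec}")
    lines.append("</RequireAll>")
    return "\n".join(lines) + "\n"
```

Generating the fragment from a managed list, rather than editing the file by hand, makes it easier to remove entries again once a block is no longer needed.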

In the application

Pros

  • Incorporating blocking functionality in the web application is often the most flexible way of blocking.
  • Correctly written, blocking in the application will allow you to place blocks immediately.

Cons

  • Requires development work.
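How blocking is built into an application depends on the framework. As one possible shape, here is a minimal sketch for a Python WSGI application; the class IPBlockMiddleware, the demo application and the in-memory blocked set are all hypothetical. It reads the client address from REMOTE_ADDR, which will be the proxy's address if the application sits behind a proxy or load balancer.

```python
class IPBlockMiddleware:
    """Hypothetical WSGI middleware that rejects requests from blocked IPs."""

    def __init__(self, app, is_blocked):
        self.app = app
        # Callable taking an IP string and returning True if it is blocked.
        self.is_blocked = is_blocked

    def __call__(self, environ, start_response):
        remote = environ.get("REMOTE_ADDR", "")
        if remote and self.is_blocked(remote):
            body = b"Forbidden\n"
            start_response(
                "403 Forbidden",
                [("Content-Type", "text/plain"),
                 ("Content-Length", str(len(body)))],
            )
            return [body]
        # Not blocked: hand the request to the wrapped application.
        return self.app(environ, start_response)


# In-memory set for illustration; a real deployment would share this
# between processes, for example through a database or cache.
blocked = set()


def demo_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]


app = IPBlockMiddleware(demo_app, lambda ip: ip in blocked)
```

Because the check consults the set on every request, adding an address with blocked.add("203.0.113.7") takes effect on the very next request, with no restart or configuration reload.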

Update: @miss_sudo kindly provided this script that can be added to a website to block Tor nodes from accessing the site.

http://wiki.stopabuseonline.org/tiki-index.php?page=PHP_Block_TOR_POST




Copyright © 2014 ScrapeSentry. All rights reserved