
Blog Post

15 JAN 2014

Will a CAPTCHA test stop scraping?

Tags: captcha, prevention, scraping
Posted by: Martin Zetterlund

Yes and no. CAPTCHA tests can be highly effective in the right place, as long as the data is not too valuable to scrapers. There are two main ways of circumventing CAPTCHA tests: using OCR (optical character recognition) software, or paying low-cost labor in other countries to solve them manually.

OCR is the traditional way of cracking CAPTCHA tests. Using increasingly complex algorithms, programmers have managed to reach a 5% success rate even against reCAPTCHA, one of the hardest CAPTCHA challenges out there. An interesting side effect is that deploying reCAPTCHA may significantly increase the scraping-related traffic to your website: at a 5% success rate, a scraper needs on average 20 searches instead of one. Less skilled programmers will need even more attempts.
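The figure of 20 searches follows from treating each attempt as an independent trial with success probability p: the number of tries until the first success is geometrically distributed with mean 1/p. The Python sketch below assumes independent attempts and one request per attempt (neither is stated in the post), computes that mean, and cross-checks it with a small simulation; `pages_wanted` is a hypothetical figure.

```python
import random


def expected_attempts(success_rate: float) -> float:
    """Mean number of attempts until the first success.

    Assumes every attempt is an independent trial with the same success
    probability, so the attempt count is geometrically distributed with
    mean 1 / success_rate.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return 1 / success_rate


def simulate_attempts(success_rate: float, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the same mean, as a sanity check."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        attempts = 1
        # Keep retrying until one attempt succeeds.
        while rng.random() >= success_rate:
            attempts += 1
        total += attempts
    return total / trials


if __name__ == "__main__":
    rate = 0.05            # OCR success rate quoted in the post
    pages_wanted = 10_000  # hypothetical size of a scraping run
    per_page = expected_attempts(rate)
    print(per_page)                   # 20.0
    print(pages_wanted * per_page)    # 200000.0 requests instead of 10,000
    print(simulate_attempts(rate, 20_000))
```

The simulated value lands close to 20; the point is that every failed OCR attempt is another request hitting your servers.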

Hiring people in low-cost countries to crack CAPTCHA tests is surprisingly cheap; we have seen prices as low as $1 per 1,000 successfully solved CAPTCHA tests. This method is of course slower, but that can be countered by using more people. Depending on how the CAPTCHA challenge is implemented, it is in some cases also possible to pre-solve CAPTCHA tests to speed up the process further.
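The economics of manual solving can be worked out with simple arithmetic. This sketch uses the $1 per 1,000 solves price quoted above; the 10 seconds per CAPTCHA per worker and the hourly target are hypothetical assumptions, not figures from the post. It illustrates why slowness is a weak obstacle: throughput grows linearly with headcount.

```python
import math

PRICE_PER_1000_SOLVES_USD = 1.00  # lowest price quoted in the post


def solving_cost_usd(solves: int,
                     price_per_1000: float = PRICE_PER_1000_SOLVES_USD) -> float:
    """Total cost of having `solves` CAPTCHA tests solved by hand."""
    return solves * price_per_1000 / 1000


def workers_needed(solves_per_hour: int, seconds_per_solve: float) -> int:
    """Smallest workforce that can sustain the target hourly solve rate."""
    solves_per_worker_per_hour = 3600 / seconds_per_solve
    return math.ceil(solves_per_hour / solves_per_worker_per_hour)


if __name__ == "__main__":
    print(solving_cost_usd(100_000))   # 100.0
    # Hypothetical: each worker needs 10 seconds per CAPTCHA.
    print(workers_needed(10_000, 10))  # 28
```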

However, the fact that CAPTCHA challenges can be circumvented is not the primary objection to using them; the primary objection is that they degrade the usability of a website. The harder CAPTCHA challenges are troublesome even for humans to solve, and if used in the wrong place on a website they may significantly lower visitor numbers.

There are ways of limiting these effects by using CAPTCHA in conjunction with other means of detecting scraping. The most basic example is to send CAPTCHA tests only to clients that make more than a certain number of requests; this helps most users of the website, scrapers included, since they will not have to fill out as many CAPTCHA tests. Another method is to send CAPTCHA challenges to IP addresses geographically located in places where you normally do not have many visitors. Many websites are country- or language-specific, so you can block off countries that typically harbor the open proxies or anonymizing services that scrapers use.
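Both triggers above can be combined in a single decision function. The following is a minimal sketch assuming a single-process Python service; the class name `CaptchaGate`, the thresholds, and the `lookup_country` callback (a stand-in for whatever IP geolocation database you use) are all hypothetical. The request threshold is counted over a sliding time window so that a client's count decays over time.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Optional, Set


class CaptchaGate:
    """Hypothetical policy deciding which requests get a CAPTCHA.

    Combines two heuristics: too many requests from one client within a
    sliding time window, or a client located in a country the site rarely
    sees visitors from.
    """

    def __init__(
        self,
        max_requests: int,
        window_seconds: float,
        expected_countries: Set[str],
        lookup_country: Callable[[str], Optional[str]],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.expected_countries = expected_countries
        self.lookup_country = lookup_country  # stand-in for a GeoIP lookup
        self.clock = clock
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def needs_captcha(self, client_ip: str) -> bool:
        now = self.clock()
        hits = self._hits[client_ip]
        hits.append(now)
        # Forget requests that have slid out of the time window.
        while hits and now - hits[0] > self.window_seconds:
            hits.popleft()
        if len(hits) > self.max_requests:
            return True  # high-volume client
        country = self.lookup_country(client_ip)
        # Unknown or unexpected origin is challenged (a cautious choice).
        return country is None or country not in self.expected_countries


if __name__ == "__main__":
    # Hypothetical GeoIP data using documentation address ranges.
    fake_geo = {"203.0.113.5": "SE", "198.51.100.7": "BR"}
    gate = CaptchaGate(max_requests=3, window_seconds=60,
                       expected_countries={"SE"}, lookup_country=fake_geo.get)
    print([gate.needs_captcha("203.0.113.5") for _ in range(4)])
    # [False, False, False, True]
    print(gate.needs_captcha("198.51.100.7"))  # True
```

Challenging clients whose country cannot be determined is a deliberately strict choice; you may prefer to let them through. In a multi-server deployment the per-client counters would normally live in a shared store rather than in process memory.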

All implementations of CAPTCHA tests naturally come with the challenge of keeping whitelists up to date, since almost all websites have partners, friendly bots, and other allowed automated users.
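One way to keep a whitelist manageable is to combine static IP ranges for known partners with a DNS check for friendly crawlers, whose User-Agent strings are easy to spoof. The sketch below uses only Python's standard `ipaddress` and `socket` modules; the partner ranges are placeholder documentation networks, and the reverse-then-forward DNS check is a verification method some crawler operators (Google, for example) document for their own bots, not something this post prescribes.

```python
import ipaddress
import socket
from typing import Iterable

# Hypothetical partner ranges; these are RFC 5737 documentation networks
# used as placeholders for your real partners' addresses.
PARTNER_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_allowlisted(ip: str, networks=PARTNER_NETWORKS) -> bool:
    """True if the address falls inside any whitelisted partner network."""
    addr = ipaddress.ip_address(ip)
    # Membership is simply False when IPv4 and IPv6 are mixed.
    return any(addr in net for net in networks)


def verify_crawler(ip: str, allowed_suffixes: Iterable[str]) -> bool:
    """Reverse-then-forward DNS check for a bot that claims to be friendly.

    Performs live DNS lookups. The reverse name must end in one of the
    operator's published suffixes and must resolve back to the same IP.
    """
    try:
        host, _host_aliases, _reverse_addrs = socket.gethostbyaddr(ip)
        _name, _name_aliases, forward_ips = socket.gethostbyname_ex(host)
    except OSError:  # socket.herror and socket.gaierror subclass OSError
        return False
    return host.endswith(tuple(allowed_suffixes)) and ip in forward_ips
```

For example, `verify_crawler(ip, (".googlebot.com", ".google.com"))` accepts an address only if its reverse DNS name ends in one of those suffixes and resolves back to that address; check each operator's documentation for its current hostnames. `gethostbyname_ex` returns IPv4 addresses only, so IPv6 crawlers would need `socket.getaddrinfo` instead.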



Recent Articles

  • Web scraping and the Computer Fraud and Abuse Act (September 02, 2020)
  • The Scraping Problem in Ticketing - View Slideshow (April 09, 2020)
  • Google Launches Google Scraper Report Form (April 07, 2020)
  • Competitor Used Scraper Bots in Order to Copy Linkedin Profiles (April 04, 2020)
  • Ticketmaster Sues Notorious Ticket Scraper Higs (March 28, 2020)



Head Office:

Sentor MSS AB
Björns Trädgårdsgränd 1
116 21 Stockholm
Sweden
+46 8 545 333 00

UK Office:

Sentor MSS UK
35-37 Blackstock Road
London N4 2JF, UK
+44 77 69 75 63 77

US Office:

ScrapeSentry Inc.
326 Commercial Street
Boston, MA 02109
1-800-351-1691
Copyright © 2014 ScrapeSentry. All rights reserved