Will a CAPTCHA test stop scraping?
Yes and no. CAPTCHA tests can be highly effective in the right place, provided the data is not too valuable to scrapers. There are two main ways of circumventing CAPTCHA tests: using OCR (optical character recognition) software, or using labor in low-cost countries to solve them manually.
OCR is the traditional way of cracking CAPTCHA tests. Using increasingly complex algorithms, programmers have managed to reach a 5% success rate even against reCAPTCHA, which is one of the hardest CAPTCHA challenges out there. An interesting side effect is that adding a reCAPTCHA test may significantly increase the scraping-related traffic to your website: at a 5% success rate, a scraper needs 20 attempts on average instead of one. Scrapers with less effective OCR will need even more attempts.
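The arithmetic behind the "20 instead of one" figure can be sketched as follows. This is a minimal illustration that assumes every attempt succeeds independently with the same probability; the function names are hypothetical and not taken from any scraping or CAPTCHA library.

```python
import math


def expected_attempts(success_rate: float) -> float:
    """Mean number of attempts until the first solved CAPTCHA.

    Assumes independent attempts with a fixed success rate
    (a geometric distribution), so the mean is 1 / success_rate.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return 1.0 / success_rate


def attempts_for_confidence(success_rate: float, confidence: float) -> int:
    """Smallest n such that at least one of n attempts succeeds
    with probability >= confidence."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be in (0, 1)")
    if success_rate == 1:
        return 1
    return math.ceil(math.log(1 - confidence) / math.log(1 - success_rate))


if __name__ == "__main__":
    print(expected_attempts(0.05))              # average attempts per page
    print(attempts_for_confidence(0.05, 0.95))  # attempts for a 95% chance
```

Note that the average hides a long tail: under the same assumptions, a scraper that wants a 95% chance of getting through needs considerably more than 20 attempts, which adds to the extra load on the site.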
Using human CAPTCHA solvers in low-cost countries is surprisingly cheap; we have seen prices as low as $1 per 1,000 successfully solved CAPTCHA tests. This method is of course slower, but that can be countered by using more people. Depending on how the CAPTCHA challenge is implemented, it is in some cases also possible to pre-solve CAPTCHA tests to speed up the process further.
The fact that CAPTCHA challenges can be circumvented is, however, not the primary objection to using them; the main problem is that they degrade the usability of a website. The harder CAPTCHA challenges are troublesome even for humans to solve, and when used in the wrong place on a website they may significantly lower visitor numbers.
There are ways of limiting these effects by using CAPTCHA in conjunction with other means of detecting scraping. The most basic example is to send CAPTCHA tests only to clients making more than a certain number of requests. This helps most users of the website, including scrapers, as they will not have to fill out as many CAPTCHA tests. Another method is to send CAPTCHA challenges to IP addresses geographically located in places where you normally do not have many visitors. Many websites are country- or language-specific, and you can block off countries that normally harbor the open proxies or anonymizing services that scrapers use.
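The request-threshold approach described above might look like the following minimal sketch. It assumes a single server process that keeps counters in memory, and it identifies clients by IP address only. The class name, method name, and limits are hypothetical; a real deployment would pick its own thresholds and would probably use shared storage.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class RequestRateGate:
    """Decide whether a client should be sent a CAPTCHA, based on how many
    requests it has made within a sliding time window."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # Per-client timestamps of recent requests, oldest first.
        self._hits = defaultdict(deque)

    def needs_captcha(self, client_ip: str, now: Optional[float] = None) -> bool:
        """Record one request and return True if the client is over the limit."""
        now = time.monotonic() if now is None else now
        hits = self._hits[client_ip]
        hits.append(now)
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] > self.window_seconds:
            hits.popleft()
        return len(hits) > self.max_requests


if __name__ == "__main__":
    gate = RequestRateGate(max_requests=3, window_seconds=60)
    for second in range(5):
        print(second, gate.needs_captcha("203.0.113.7", now=float(second)))
```

With these illustrative settings, the first three requests from a client inside a minute pass without a challenge, and the client starts seeing CAPTCHA tests from the fourth request until its request rate drops again. Ordinary visitors below the threshold never see a CAPTCHA at all.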
All implementations of CAPTCHA tests naturally come with the challenge of keeping whitelists up to date. Almost all websites have partners, friendly bots, and other permitted automated users that should not be sent CAPTCHA challenges.
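A whitelist check of this kind can be sketched with Python's standard `ipaddress` module. The network ranges below are reserved documentation ranges used as placeholders, and the names `WHITELISTED_NETWORKS` and `is_whitelisted` are hypothetical. In practice the list would be loaded from configuration that is reviewed regularly, which is exactly the maintenance burden mentioned above.

```python
import ipaddress

# Placeholder ranges (reserved for documentation); a real list would hold the
# address ranges of partners and friendly bots, and must be kept up to date.
WHITELISTED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def is_whitelisted(client_ip: str) -> bool:
    """Return True if the client address falls inside any whitelisted network."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in WHITELISTED_NETWORKS)


if __name__ == "__main__":
    for ip in ("192.0.2.17", "198.51.100.5", "2001:db8::1"):
        print(ip, is_whitelisted(ip))
```

A check like this would run before any CAPTCHA decision, so that whitelisted clients skip the challenge entirely.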