Why is data scraping difficult to counter?
Not all scraping is hard to counter. The simplest case is well-behaved web robots that adhere to the Robots Exclusion Standard, honoring whatever you have declared in your website's robots.txt file. Generic robots that ignore the protocol and scrape indiscriminately are usually easy to stop by blocking their user-agent string or the IP address or range they originate from.
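As a rough illustration of that kind of blocking, here is a minimal Python sketch; the blocklist entries and the is_blocked helper are hypothetical, and in practice this filtering would usually live in your web server, CDN, or WAF configuration rather than application code.

```python
import ipaddress

# Illustrative blocklists -- the specific entries are made up for this sketch.
BLOCKED_USER_AGENTS = {"badbot", "examplescraper"}
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]  # documentation range

def is_blocked(remote_ip: str, user_agent: str) -> bool:
    """Return True if the request should be rejected outright."""
    # Crude substring match on the User-Agent header.
    if any(bad in user_agent.lower() for bad in BLOCKED_USER_AGENTS):
        return True
    # Check whether the client address falls inside a blocked range.
    addr = ipaddress.ip_address(remote_ip)
    return any(addr in net for net in BLOCKED_NETWORKS)

# Examples: blocked range, blocked user-agent, and an ordinary visitor.
print(is_blocked("203.0.113.42", "Mozilla/5.0"))   # True
print(is_blocked("198.51.100.7", "BadBot/1.0"))    # True
print(is_blocked("198.51.100.7", "Mozilla/5.0"))   # False
```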
Scrapers that are harder to stop are usually financially motivated and have specific targets. Their persistence is directly related to financial gain or loss: once successful, they will spend a great deal of time and resources countering any obstacles put in place to stop them. In most cases the sensible way of looking at scraping is to aim at winning the war rather than the battle. Any single countermeasure will sooner or later be compromised by those who are sufficiently motivated. To win the war you need to be prepared to continuously evaluate the effectiveness of your countermeasures and develop new ones once the previous ones have been evaded. By continuously making life more difficult for the scrapers, you raise the cost of exploiting your data and degrade their service to the point where scraping your site is no longer cost-effective.
Scraping as a service is big business in itself, and the industry is maturing much faster than the anti-scraping business. The range of services offered stretches from renting a coder or a piece of simple software to fully managed services complete with SLAs. The more serious scraping operations have access to vast numbers of IP addresses that they use to hide themselves; taking data from sources that are open to the public has proven much easier than stopping the practice. Many of these services make no more than a single search from each IP before switching to the next address. This anonymization seems to run counter to most scraping service providers' legal disclaimers, which brings up the next question: the legality of scraping.
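To see why that rotation undermines simple per-IP defenses, consider a minimal sketch under assumed numbers (the request log and threshold below are hypothetical): if every request arrives from a fresh address, a per-IP counter never crosses any sensible limit.

```python
from collections import Counter

# Hypothetical request log: a rotating scraper sends each request from a fresh IP.
rotating_scraper_requests = [f"198.51.100.{i}" for i in range(1, 101)]  # 100 requests, 100 IPs

PER_IP_THRESHOLD = 10  # requests allowed per IP before blocking (illustrative)

counts = Counter(rotating_scraper_requests)
flagged = [ip for ip, n in counts.items() if n > PER_IP_THRESHOLD]

# 100 requests hit the site, yet no single IP exceeds the threshold,
# so a naive per-IP rate limiter flags nothing.
print(len(rotating_scraper_requests), "requests,", len(flagged), "IPs flagged")
```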