Short scraping news from 2007 and 2008
2020-07-15 - How CAPTCHA got trashed
Computerworld has an article about how CAPTCHA got trashed. It mentions products used to beat Craigslist's anti-spam mechanism and how Craigslist tries to fight back against spammers using phone verification.
2020-07-06 - Scraping case in Irish court
The Sunday Business Post reports that Ryanair is taking legal measures against Bravofly, arguing that screen scraping violates the terms and conditions of its website. Read more about this and other legal scraping information.
2020-04-14 - More news about broken captchas
The Register has an article about British researchers showing that the MSN captchas are crackable. Using a normal home computer, they manage to read them with a high success rate, largely defeating their purpose. This is especially interesting in light of the reports from Google that spammers have been able to create massive numbers of email accounts on its Gmail service.
2020-04-07 - Legal issues with web scraping
Article about the legal aspects of screen scraping. Before moving into the legal area, the article defines screen scraping and goes through a couple of the more common countermeasures.
2020-03-18 - Captchas do not stop scraping
Labor costs vary a lot around the globe, and this shows that a simple image test is not enough to stop scraping. Humans in low-cost countries are hired to break CAPTCHA images designed to protect free services from automatic signups.
2020-02-04 - Scraping and data theft
SC Magazine has an article about scraping and data theft. Some comments on the article.
2020-12-20 - Article about scraping in Wired
A lengthy article in Wired about scraping: Should Web Giants Let Startups Use the Information They Have About You? The article focuses on web 2.0 and scraping, showing both sides of it: the smaller start-ups that try to build services around someone else’s data, and the large existing services working to prevent them from scraping.
2020-12-17 - Facebook Sues Porn Site for scraping
The popular social networking website Facebook has filed a lawsuit against a Canadian company for “unauthorized attempts to access and harvest proprietary information” - scraping, that is.
The Canadian porn company Istra Holdings is said to have harvested information from the Facebook site on more than 200,000 occasions during a two-week period, with the help of automated spidering and scraping tools. Read more about the scraping lawsuit at NewsFactor.
2020-11-14 - easyJet threatens to take legal action against screen scrapers
More on easyJet: quite a lengthy article about the problem. Apparently they will make their data available through a couple of global distribution systems but add a fee for it (to make sure it’s always cheapest on the web site). The only way mentioned to stop the scrapers is to take legal action against them. In our experience this is expensive and ineffective.
2020-11-02 - easyJet prefers to take direct control over sales rather than selling through third-party scrapers
easyJet is trying to formalise the relationship with select companies that will be allowed to access the site through an API rather than the normal way, in order to take control of some of the third-party sales. Scraping seems to be the preferred method for unlicensed agents to get the details for the sales. They mention that between 6% and 10% of sales are made through unwanted third parties, which makes you wonder what percentage of the traffic is really made up of scrapers, since you’d probably have to fetch a great amount of data for every sale. Read more at the Travolution blog.
2020-10-11 - Users blocked from white pages website
Yellowpages New Zealand apparently blocked the AA from accessing the whitepages site. The article does not go into depth about why they were really blocked, so it is a bit hard to tell whether it was by mistake or not. This is, however, the classic problem with rate limiting users: sooner or later one of your big customers will trigger that threshold, so it is of great importance to properly investigate every alert.
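As a rough illustration of the rate-limiting problem described here, the sketch below counts requests per client in a sliding time window and records an alert for review instead of blocking outright. Everything in it is an assumption made for illustration: the class name RateLimitMonitor, the threshold of 3 requests per 10 seconds, and the client id "big-customer" are hypothetical and are not taken from the article or from any real site's setup.

```python
from collections import defaultdict, deque


class RateLimitMonitor:
    """Sliding-window request counter that raises alerts instead of blocking.

    Hypothetical sketch: names and thresholds are illustrative assumptions.
    """

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.requests = defaultdict(deque)  # client id -> recent timestamps
        self.alerts = []  # (client id, timestamp) pairs awaiting human review

    def record(self, client_id, now):
        """Record one request; return True if the client is over the threshold."""
        window = self.requests[client_id]
        window.append(now)
        # Drop timestamps that have fallen out of the sliding window.
        while window and window[0] <= now - self.window_seconds:
            window.popleft()
        over = len(window) > self.max_requests
        if over:
            # Queue the event for investigation rather than blocking the client.
            self.alerts.append((client_id, now))
        return over


# Usage: a busy but legitimate client trips the threshold once.
monitor = RateLimitMonitor(max_requests=3, window_seconds=10)
flags = [monitor.record("big-customer", t) for t in [0, 1, 2, 3, 20]]
```

Keeping the alert list separate from any blocking decision is one way to make sure a flagged client can be looked at before it is cut off, which is the point the paragraph above makes about investigating every alert.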