Many people may have innocently copied and pasted content from a website in direct breach of that site’s terms and conditions. Others undertake scraping activity deliberately, for different reasons.
There are a number of ways screen scraping is carried out and a variety of reasons behind it. Scripts and automated bots (programs that carry out scraping activity) use single or multiple IP addresses to disguise themselves as legitimate users. Malign bots, as distinct from search engine bots, can damage a site’s performance, affect its web ranking and cause firms to lose control of their data, while the scrapers benefit from free listings.
Whatever the method, scraping is an increasing concern for companies with web-facing assets, according to Marino Zini, managing director at Sentor Managed Security Services UK.
Screen scraping, or web data harvesting as it is often called, has grown exponentially and is a serious threat to legitimate online business models, Mr Zini explained. Help is at hand, though, for firms that are or could be affected by this form of data theft. Sentor, for example, has been getting more business in the scraping area in recent times and has seen particular success with Yell.com.
Mr Zini noted that the companies have been working together for almost four-and-a-half years to stop scraping, which affects Yell.com for “quite obvious” reasons.
“They’ve got a massive listing of 22 million records and a lot of those records get copied or scraped by automated tools called bots - they automatically harvest the data and then advertise it on competing websites,” the expert explained.
He noted that the Sentor system can detect scraping activity in real time by tracking user patterns, whereas network traffic-based technologies, such as firewalls and intrusion prevention devices, are unable to detect scraping.
Talking about blocking data scraping activity, Mr Zini continued: “You need a user-based analysis and that is what we do and that is quite unique in the security field. It’s a kind of nirvana of information security.”
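To give a rough sense of what tracking user patterns can mean in practice, here is a minimal sketch that counts each client's requests inside a sliding time window and flags anyone who exceeds a limit. It is an illustration only, not Sentor's system, whose workings are not described here. The class name, the client identifier, the 60-second window and the 30-request limit are all assumptions chosen for the example.

```python
from collections import defaultdict, deque


class SlidingWindowDetector:
    """Flags a client whose request count within a time window exceeds a limit.

    Hypothetical illustration: the window and limit are arbitrary assumptions,
    and timestamps are assumed to arrive in non-decreasing order per client.
    """

    def __init__(self, window_seconds=60.0, max_requests=30):
        self.window_seconds = window_seconds
        self.max_requests = max_requests
        # Maps a client identifier to the timestamps of its recent requests.
        self._hits = defaultdict(deque)

    def record(self, client_id, timestamp):
        """Record one request; return True if the client now looks automated."""
        hits = self._hits[client_id]
        hits.append(timestamp)
        # Discard requests that have fallen out of the window.
        while hits and timestamp - hits[0] > self.window_seconds:
            hits.popleft()
        return len(hits) > self.max_requests


detector = SlidingWindowDetector(window_seconds=60.0, max_requests=30)

# A bot requesting a listing every half second.
bot_flags = [detector.record("bot", i * 0.5) for i in range(40)]

# A person opening a page every 20 seconds.
human_flags = [detector.record("human", i * 20.0) for i in range(40)]
```

With these assumed settings the bot is flagged on its 31st request, while the slower visitor is never flagged. A real service would need to weigh far more than request rate, since scrapers can slow down or spread requests across many IP addresses.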
Other sectors that should consider protection include the insurance industry, realtors, B2B portals and in fact any sector with large online listings, he continued. It is a “highly competitive environment”, Mr Zini explained, as most companies place their pricing and product data on their websites.
These prices are taken by other companies and used for comparison, he noted. “It’s a very dynamic sector so it does make a difference if you can just lower the pricing by say £2. It’s known as ‘rate raping’ in the industry,” Mr Zini commented. While this term may be shocking for some, it reflects the effect scraping can have on firms.
The low-cost airline sector is also taking note of data scraping and the impact it is having on its business. Mr Zini pointed to a recent article he wrote entitled the Billion-Dollar Theft. Those within the industry estimate that scraping caused losses of about a billion dollars on 2008 revenues, he said, and the figure will be larger in 2009 unless something is done about it.
While there are concerns, Mr Zini has reassurances for firms worried about scraping: “The fact is that you can protect against it as we’ve demonstrated and you can detect it by tracking user patterns and using a real-time service.” There are even precedents of scrapers having been prosecuted.