Data scraping ‘can be simple or very hard’
Screen scraping can be either a simple task or a particularly difficult one, depending on how complex the source is, according to Martin Streicher, writing for Linux Magazine.
The tools used for scraping are largely the same whatever the task, Mr Streicher explained. He said that he scrapes sites himself, noting that he had probably scraped tens of sites in the past for purposes such as aggregating and analysing sales data.
Anyone looking to scrape data needs to work through a number of tasks, he explained. The first step is to identify the content they are interested in, and the next is to find the sites that hold the desired information, Mr Streicher asserted. Scrapers will then need to determine whether the data on the site is accessible, and then find or create tools to collect pages and extract data, he added.
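The last two steps, checking that the data is accessible and then collecting pages and extracting data, can be sketched in a few lines of Python using only the standard library. This is a minimal illustration rather than anything taken from Mr Streicher's article: the function names, the "demo-bot" user agent, the example.com addresses and the assumption that the data sits in a plain HTML table are all hypothetical.

```python
from html.parser import HTMLParser
from urllib import request, robotparser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check a robots.txt body to see whether user_agent may fetch url."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def fetch(url: str, user_agent: str) -> str:
    """Collect one page and return its text (assumes UTF-8 content)."""
    req = request.Request(url, headers={"User-Agent": user_agent})
    with request.urlopen(req, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


class CellExtractor(HTMLParser):
    """Gather the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, if any
        self._cell = None  # text fragments of the cell being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_rows(html: str) -> list:
    """Extract table rows from an HTML page as lists of cell strings."""
    parser = CellExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows
```

In use, a scraper would call is_allowed with the site's robots.txt before calling fetch, then pass the returned page to extract_rows. A robots.txt check covers only technical accessibility; a site's terms of use may still restrict scraping.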
Those who carry out scraping may run into trouble, however, as highlighted recently by Ryanair’s announcement that it has lodged proceedings in the High Court in Dublin against Travelviva AG, a German screen scraping ticket tout. The airline has claimed that Travelviva has been carrying out unauthorised screen scraping as well as reselling Ryanair’s flights with unjustified mark-ups. Ryanair said that it plans to take further action against other unauthorised screen scrapers in Europe in the coming weeks.
“Ryanair is determined to continue its crusade against screen scraping ticket-tout websites until the last screen scraper stops overcharging unsuspecting consumers and breaching Ryanair’s copyright and terms of use of www.ryanair.com,” said Ryanair’s Daniel de Carvalho.
“We are confident that unauthorised screen scraping and overcharging of consumers will eventually be outlawed throughout Europe, to the benefit of consumers and legitimate businesses,” he added.