The web scraping industry is maturing from both a technological and a business perspective; however, it still lacks proper regulation. For this reason, key market players are launching the Ethical Web Data Collection Initiative (EWDCI) to share best practices and advocate for common principles. These were some of the main takeaways from this year's edition of OxyCon, a prominent industry conference.
Organized by Oxylabs, a leading provider of public web data gathering solutions, OxyCon brought together web scraping experts from around the world for a two-day online event. From practical tips for engineers to high-level panel discussions, the speakers reviewed the most recent developments in the field.
Allen O'Neill, CEO and CTO of The DataWorks, argued that although the web scraping industry has developed rapidly over the years, much of its potential remains untapped:
“The web scraping industry hasn’t even scratched the surface with its potential yet. There will be many new unicorns in the industry in the upcoming ten years: those who will be able to harness the power of information extraction (not data extraction, but information extraction) and use that to gain insights that have never been seen before,” said Allen.
The industry's rapid growth was reflected in scaling being the hottest topic at OxyCon. Karsten Madsen, CEO of the SEO company Morningscore, described how his team went from handling small data requests to competing with SEO industry giants. According to him, success is not always about having the most data or the smartest data; it is about having smarter algorithms to manage it.
Glen De Cauwsemaecker, Lead Crawler Engineer at OTA Insight, had another tip for fast-growing data companies looking to scale their operations: “Be pragmatic and look for cost-reward balance.”
Besides the technical challenges of scaling, legal issues often rank near the top of the list of concerns. Participants in the panel discussion “Lawyers discuss scraping” emphasized the ambiguity and the many unclear areas that stem from the lack of proper industry regulation. As a result, the industry must be proactive in safeguarding itself from within and in sharing best practices among its members.
In this light, Christian Dawson, Executive Director of I2Coalition, announced a new web scraping industry initiative. I2Coalition, together with five public data aggregators (Oxylabs, Zyte, Smartproxy, Coresignal, and Sprious), has launched the Ethical Web Data Collection Initiative (EWDCI). The group aims to promote the industry's best practices and advocate for beneficial technical standards.