“Start with the problem, not with the data,” says Andreas Weigend on one of the many banners at this year’s Strata Data Conference in London. It is one of the recurring themes of the conference and something we at Bluemetrix have been shouting about for years.
We have assembled our key takeaways from the event below, covering everything from technology stacks to organisational change management. If you want the full slides from each speaker, head over to https://strataconf.com/slides.
Data science is not always the first step
Allison Nau from Coxautodata.com clearly illustrated the journey from problem to insight with this diagram in her presentation.
In fact, Allison Nau sets data science aside at the beginning: “Data science is not always the first step.”
Hollie Lubbock from Fjord presented the intersection of data where insight can be found. It lies in the overlap where you answer the questions who, what, where, when, why and how. That intersection sits between thick data (design research), big data (data science) and wide data (trends and strategy).
But this is not easy to achieve, as Kim Nilsson highlighted in this great diagram illustrating the challenges a data science team faces when combining so many different sources.
This is why it was great to see some real-life case studies presented on the day.
Carme Artigas of synergicpartners.com presented a case study on using data to increase retail business value for PepsiCo. Using big data, they were able to locate the points of sale (PoS) with the highest potential and segment those PoS by shopper behaviour. They used three algorithms to locate the highest-potential PoS:
- Linear regression
- Logistic regression
- Clustering model
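The talk did not share any code, so what follows is only a hypothetical sketch of the clustering step: plain k-means (Lloyd's algorithm) in Python, grouping invented points of sale by two assumed features (weekly shoppers in hundreds and average basket value). The features, the data and the `kmeans` helper are illustrative assumptions, not the model Synergic Partners actually used.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def kmeans(points: List[Point], k: int, iterations: int = 10) -> List[int]:
    """Assign each point to one of k clusters using plain Lloyd's algorithm."""
    # Deterministic initialisation: the first k points become the starting centroids.
    centroids: List[Point] = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return labels


# Hypothetical PoS features: (weekly shoppers in hundreds, average basket in EUR).
pos_features = [(1.0, 5.0), (1.2, 5.5), (8.0, 20.0), (8.5, 21.0), (1.1, 4.8), (7.9, 19.5)]
segments = kmeans(pos_features, k=2)
print(segments)
```

In practice you would scale the features first and use a library implementation, but the two alternating steps (assign, then re-centre) are the core of the technique.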
EasyJet’s data team shared a great slide on their journey from Excel spreadsheets to big data and advanced analytics.
“Data is the fuel, the gold, of this era” – Eva Kaili
They now use this data for revenue management, which consists of the following processes:
An algorithmic ecosystem of over 150 processes controlling the pricing lifecycle at flight level:
- Pre-live – analysis of historic flight performance (internal and external drivers).
- Live – forecasting and tracking demand evolution against plan & reacting to changes.
- Events – handling demand variations through the year.
Flight ticket and ancillary ticket:
- 500,000 flights and their ancillaries managed independently.
- 30,000 daily adjustments. (Note: they do not change the price after you check their website!)
Machine learning since 2010:
- Over 1000 productivity improvements in that time.
- Many millions in revenue improvements.
Paco Nathan of O’Reilly Media presented an interesting comparison of agile methodology and machine learning.
“AI sounds scary to the lay person. But what we call AI is actually decision automation, and that sounds (a bit) less scary.” – Amr Awadallah
A comparison of fraud detection techniques was also presented.
Current big data technology stacks
Harvinder Atwal did a quick analysis of the conference talk descriptions and plotted the most talked-about technologies. He noted some changes since last year: Spark is still on top, but Kafka has overtaken Hadoop; Druid is the highest new entry, while Kylin has dropped off the chart.
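Harvinder's exact method was not described, so this is only a rough sketch of how such a count could work: for each technology on an assumed watch-list, count how many talk descriptions mention it as a whole word, then rank the results ready for plotting. The watch-list, the sample descriptions and the helper names are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical watch-list; the real analysis may have used different terms.
TECHNOLOGIES = ["Spark", "Kafka", "Hadoop", "Druid", "Kylin"]


def count_mentions(descriptions: Iterable[str], terms: List[str]) -> Counter:
    """Count how many descriptions mention each term (case-insensitive, whole word)."""
    patterns = {t: re.compile(rf"\b{re.escape(t)}\b", re.IGNORECASE) for t in terms}
    counts: Counter = Counter()
    for text in descriptions:
        for term, pattern in patterns.items():
            if pattern.search(text):
                counts[term] += 1  # at most one count per description
    return counts


def ranking(counts: Counter) -> List[Tuple[str, int]]:
    """Most-mentioned technologies first, e.g. for a bar chart."""
    return counts.most_common()


# Hypothetical talk descriptions standing in for the real conference schedule.
talks = [
    "Streaming analytics with Kafka and Spark",
    "Tuning Spark at scale",
    "Migrating from Hadoop to Kafka",
    "Real-time OLAP with Druid",
]
mention_counts = count_mentions(talks, TECHNOLOGIES)
print(ranking(mention_counts))
```

Counting each technology once per description, rather than every occurrence, keeps one enthusiastic abstract from skewing the chart; running the same count on last year's descriptions gives the year-on-year comparison.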
A key quote from the day presented by Strata Data themselves:
“Big data is just the ability to gather information and query it in such a way that we are able to learn things about the world that were previously inaccessible to us.” – Hilary Mason.
Olaf Hein of Ordix presented an interesting comparison of big data storage solutions.
In terms of big data technology, it was interesting to hear from @cloudera and @Telekom_group as they work towards providing machine learning as a service, which should have a big impact on the industry over the coming years.
To speed up development of the machine learning pipeline, break it into pieces, suggested Harvinder Atwal, Head of Data Strategy and Advanced Analytics at Money Supermarket. Smaller pieces let you commit code regularly and build out your continuous integration.
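As a hypothetical illustration of that advice (not code from the talk), the Python sketch below splits a toy feature pipeline into small stage functions, each with its own unit test that a CI server could run on every commit, for example via pytest. The stage names, the `clicks`/`visits`/`target` fields and the test functions are all assumptions made up for this example.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[float]]
Stage = Callable[[List[Row]], List[Row]]


def drop_missing_target(rows: List[Row]) -> List[Row]:
    """Hypothetical stage 1: discard rows whose label is missing."""
    return [r for r in rows if r.get("target") is not None]


def add_click_rate(rows: List[Row]) -> List[Row]:
    """Hypothetical stage 2: derive clicks per visit, guarding against zero visits."""
    out = []
    for r in rows:
        visits = r.get("visits") or 0.0
        rate = (r.get("clicks") or 0.0) / visits if visits else 0.0
        out.append({**r, "click_rate": rate})
    return out


def run_pipeline(rows: List[Row], stages: List[Stage]) -> List[Row]:
    """Apply each small, independently testable stage in order."""
    for stage in stages:
        rows = stage(rows)
    return rows


# One small test per stage, cheap enough to run in CI on every commit.
def test_drop_missing_target() -> None:
    rows = [{"target": 1.0}, {"target": None}]
    assert drop_missing_target(rows) == [{"target": 1.0}]


def test_add_click_rate() -> None:
    rows = [{"clicks": 2.0, "visits": 4.0}, {"clicks": 1.0, "visits": 0.0}]
    assert [r["click_rate"] for r in add_click_rate(rows)] == [0.5, 0.0]
```

Because each stage is a plain function from rows to rows, a change to one stage can be committed and tested in isolation, and the full pipeline is simply the list of stages passed to `run_pipeline`.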
Organisational change management
One of the most challenging roadblocks to a data project is the organisation itself, shared Guillaume Salou.
There is no technical solution to a political problem.
A common theme throughout the day was the difference in language between stakeholders in an organisation. To move forward, you need to agree on a common language, as highlighted by https://dere-street.com/.
“Innovation is not organising a hackathon on a weekend giving your best ideas for a beer and a cold pizza.” – Louise Beaumont, Publicis Groupe
@kevinsigliano presented the key dimensions you need to focus on to prove the ROI of digital transformation when making a case to the organisation.
Han Yang (@HanYang1234) from Cisco highlighted one of the key challenges for organisations adopting data science: data scientists each have their favourite software stack, but they don’t believe corporate IT is savvy enough to deliver it. His suggested solution is a validated design, which works like a recipe for IT to follow.
Harvinder recommends organising around the ideal journey of your data instead of organising around your teams and organisation structure. It results in fewer roles, more end-to-end ownership and less friction.
A common complaint comes from the users of the data themselves, who are forced to use business intelligence (BI) dashboards. Users have hated BI dashboards for a long time, as shown in this dashboard about dashboards from @markmadsen and @superdupershant.
One of the key difficulties of big data and machine learning is the gap between research and what happens when that research hits the real world, brilliantly summarised in this slide from @mikiobraun.
That’s it from the team.
Were you at the conference? What were your key takeaways?