“Start with the problem, not with the data,” says Andreas Weigend on one of the many banners at this year’s Strata Data Conference in London. It is one of the recurring themes of the conference and something we at Bluemetrix have been shouting about for years.

We have assembled our key takeaways from the event below, covering everything from technology stacks to organisational change management. If you want the full slides, head over to https://strataconf.com/slides to get them from each speaker.

 

Data science is not always the first step

Allison-Nau-Coxautodata

Image c: Damien Berger @berger_data

Allison Nau from Coxautodata.com clearly illustrated the journey from problem to insight with this diagram during her presentation.

 

In fact, Allison Nau ignores data science in the beginning: “Data science is not always the first step.”

Hollie Lubbock from Fjord presented the intersection of data where you can find insight. It comes in the overlap when you answer the questions who, what, where, when, why and how. That intersection lies between thick data (design research), big data (data science) and wide data (trends and strategy).

Hollie Lubbock Fjord Data

But this is not easy to achieve, as Kim Nilsson highlighted in this great diagram illustrating the challenges a data science team faces when combining so many different sources.

Kim Nilsson  Data

Image c: Bence Arato @BenceArato

 

This is why it was great to see some real-life case studies presented on the day.

Carme Artigas of synergicpartners.com presented a case study on using data to increase retail business value for PepsiCo. Using big data, they were able to locate the points of sale (PoS) with the highest potential and to segment those PoS according to shopper behaviour. They used three algorithms to locate the PoS with the highest potential:

  • Linear regression
  • Logistic regression
  • Clustering model

“Data is the fuel, the gold, of this era” – Eva Kaili

Easyjet Data Science Journey

Image c: ClearPeaks @CLEARPEAKS

EasyJet’s data team shared a great slide on their journey from Excel sheets to big data and advanced analytics.

They now use this data for revenue management, which runs on an algorithmic ecosystem of over 150 processes controlling the pricing lifecycle at flight level:

  • Pre-live – analysis of historic flight performance (internal and external drivers).
  • Live – forecasting and tracking demand evolution against plan & reacting to changes.
  • Events – handling demand variations through the year.

Flight ticket and ancillary ticket:

  • 500,000 flights and their ancillaries managed independently.
  • 30,000 daily adjustments. (Note, they do not change the price after you check their website!)

Machine learning since 2010:

  • Over 1000 productivity improvements in that time.
  • Many millions in revenue improvements.

 

Paco Nathan of O’Reilly Media presented an interesting comparison of Agile methodology vs machine learning.

Paco Nathan - Comparison of Agile Methodology vs Machine Learning

Image c: Shaun McGirr @shaunmcgirr

“AI sounds scary to the lay person. But what we call AI is actually decision automation, and that sounds (a bit) less scary.” – Amr Awadallah

Fraud detection techniques comparison.


Image c: Jason @jasonbperkins

 

Current big data technology stacks

Harvinder Atwal Big Data Technology Stack

Harvinder Atwal did a quick analysis of the conference talk descriptions and plotted the most talked-about technologies at the conference. He notes some change since last year: Spark is still top, but Kafka overtakes Hadoop; Druid is the highest new entry, while Kylin disappears off the chart.
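For anyone who wants to try a similar count on their own conference programme, here is a minimal sketch in Python. It is not Harvinder's actual method: the talk descriptions and the technology list below are made-up placeholders, and the matching is a simple whole-word search.

```python
import re
from collections import Counter

# Hypothetical talk descriptions, used only for illustration.
TALKS = [
    "Streaming analytics with Kafka and Spark at scale",
    "Real-time OLAP using Druid and Kafka",
    "Migrating Hadoop workloads to Spark",
]

# Assumed list of technologies to look for.
TECHNOLOGIES = ["Spark", "Kafka", "Hadoop", "Druid", "Kylin"]


def count_mentions(descriptions, technologies):
    """Count how many descriptions mention each technology at least once."""
    counts = Counter({tech: 0 for tech in technologies})
    for text in descriptions:
        for tech in technologies:
            # Whole-word, case-insensitive match.
            if re.search(rf"\b{re.escape(tech)}\b", text, flags=re.IGNORECASE):
                counts[tech] += 1
    return counts


counts = count_mentions(TALKS, TECHNOLOGIES)
ranking = counts.most_common()  # technologies ordered by number of talks
```

Counting talks rather than raw word occurrences stops one keyword-heavy description from skewing the chart.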

A key quote from the day presented by Strata Data themselves:
“Big data is just the ability to gather information and query it in such a way that we are able to learn things about the world that were previously inaccessible to us.” – Hilary Mason.

Rob Passarella alphafeatures.com alternative data stack

Image c: Saeed @saeedamenfx

Olaf Hein of Ordix presented an interesting comparison of big data storage solutions.

Ordix comparison of big data storage solutions

Image c: Saeed @saeedamenfx

In terms of big data technology, it was interesting to hear from @cloudera and @Telekom_group as they work towards providing machine learning as a service, which should have a big impact on the industry over the coming years.

 

To speed up development of the machine learning pipeline, break it up into pieces, suggested Harvinder Atwal, Head of Data Strategy and Advanced Analytics for Money Supermarket. This enables you to commit code regularly and build out your continuous integration.
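As a rough illustration of what “breaking it up” can look like, here is a minimal Python sketch. The stage names, the sample rows and the threshold rule standing in for a model are all our own assumptions, not anything shown in the talk.

```python
# Each stage is a small pure function, so it can be committed and
# unit-tested on its own in a continuous integration run.


def clean(rows):
    """Drop rows with a missing price."""
    return [r for r in rows if r.get("price") is not None]


def add_features(rows):
    """Add a price-per-unit feature to each row."""
    return [{**r, "unit_price": r["price"] / r["quantity"]} for r in rows]


def score(rows, threshold=2.0):
    """Stand-in for a model: flag rows whose unit price exceeds a threshold."""
    return [{**r, "flag": r["unit_price"] > threshold} for r in rows]


def run_pipeline(rows):
    """Compose the stages; CI can test each stage and the whole chain."""
    return score(add_features(clean(rows)))


# Hypothetical sample data.
sample = [
    {"price": 10.0, "quantity": 4},
    {"price": None, "quantity": 1},
    {"price": 3.0, "quantity": 2},
]
result = run_pipeline(sample)
```

Because every stage takes and returns plain rows, a change to one stage can be tested and merged without touching the others.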

 

Organisation change management

One of the most challenging roadblocks to a data project is the organisation itself, shared Guillaume Salou.

There is no technical solution to a political problem.

A common theme throughout the day was the difference in language between different stakeholders in an organisation. To move forward you need to agree a common language as highlighted by https://dere-street.com/.

Data experts policy makers

Image c: albertod @albertod

“Innovation is not organising a hackathon on a weekend giving your best ideas for a beer and a cold pizza” – Louise Beaumont, Publicis Groupe

@kevinsigliano presented the key dimensions you need to focus on to prove the ROI of digital transformation when making a case to the organisation.

Data transformation key dimensions roi

Han Yang (@HanYang1234) from Cisco highlighted one of the key challenges for organisations adopting data science: data scientists have their favourite variant of the software stack, but they don’t believe corporate IT is savvy enough to deliver it. He suggested that the solution is a validated design, which works like a recipe for IT to follow.

Harvinder recommends organising around the ideal journey of your data instead of organising around your teams and organisation structure. It results in fewer roles, more end-to-end ownership and less friction.

 

A common complaint comes from users of the data itself, who are forced to use business intelligence dashboards. Users hate BI dashboards, and have for a long time, as presented in this dashboard about dashboards from @markmadsen & @superdupershant.

BI Business Intelligence Dashboard

Image c: Viktor Vojnovski @vojnovski

One of the key difficulties of big data and machine learning is the difference between research and what happens when that research hits the real world, brilliantly summarised in this slide from @mikiobraun.

Organisation of work

Image c: Jonny Daenen @JonnyDaenen

That’s it from the team.
Were you at the conference? What were your key takeaways?

Download the book on creating a new mandate for the Chief Data Officer.