How to make the biggest big data decision

Big data has become big business. IDC has forecast that revenue from sales of big data and business analytics applications, tools, and services will reach $187 billion (£132 billion) in 2019[1]. The analyst house also recently predicted that the amount of digital data created in 2025 will be 10 times greater than the amount created in 2016, with enterprises expected to create 60% of that data. With this in mind, most enterprises today are either already benefiting from their big data or looking to do so.

Forward-looking organisations across a range of industries are already using big data throughout their value chains, in applications ranging from developing more effective medicines more rapidly, to learning about customer preferences, to improving operational performance in oil extraction. But side by side with the successes that companies are keen to talk about are many failures that are, for obvious reasons, discussed much less openly.

There is plenty of analysis and advice about why big data projects fail[2]. Much of it rightly identifies a lack of clarity in a project’s aims as a leading cause, but big data projects also fail because of unsuitable technology choices. It is easy to understand why this happens: first-mover advantage is very significant in the digital world, and there are many different data analytics technology options available. Consequently, technology leaders are under pressure to make complex technology selection decisions quickly, and the risk is that the wrong choice is made.

Getting an existing live project back onto the right technology path is at best incredibly costly and inconvenient, and at worst impossible. In any case, it will involve uncomfortable discussions with business stakeholders. Getting it right the first time is therefore of the utmost importance.

Deciding on the right big data technologies for a project requires a clear understanding of a few key internal factors: what the business is aiming to achieve, what the current technology setup is, and what talent already exists within the organisation. Taking the time to assess these elements carefully before committing to a beguiling new technology is an invaluable step towards avoiding costly and potentially irreversible mistakes.

It is also important to realise that, sometimes, establishing clear project objectives and exploring alternative approaches can reduce the problem to one that can be solved with mature mainstream technologies such as relational databases. In his book “Big Data Analytics: Turning Big Data into Big Money”, Frank Ohlhorst captures a very important point: “Big Data defines a situation in which data sets have grown to such enormous sizes that conventional information technologies can no longer effectively handle either the size of the data set or the scale and growth of the data set.” In other words, we only have a big data project when we have a problem that can’t be solved with conventional approaches. Attempting to use big data technologies to address something that is truly a small-scale problem is at least as dangerous, in terms of lost time and money, as the reverse.
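
As a rough illustration of that assessment, the sketch below projects retained data volume from a few assumed inputs and compares it against a single-node comfort threshold. It is a back-of-the-envelope aid only: the input figures and the threshold are assumptions made for this example, not real limits of any particular database product.

```python
# Back-of-envelope sizing check, purely illustrative. The threshold and
# the input figures are assumptions for this example, not real limits
# of any particular relational database product.

def projected_size_tb(daily_gb: float, retention_days: int,
                      annual_growth: float, years: int) -> float:
    """Rough projection of retained data volume, in terabytes."""
    current_tb = daily_gb * retention_days / 1024
    return current_tb * ((1 + annual_growth) ** years)

SINGLE_NODE_COMFORT_TB = 5.0  # hypothetical comfort limit for one database node

if __name__ == "__main__":
    projected = projected_size_tb(daily_gb=20, retention_days=365,
                                  annual_growth=0.4, years=3)
    print(f"Projected retained volume: {projected:.1f} TB")
    if projected <= SINGLE_NODE_COMFORT_TB:
        print("Conventional, mainstream technology may well suffice.")
    else:
        print("Worth evaluating distributed big data options as well.")
```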

There are several aspects of a solution design that we recommend be explicitly considered when an organisation is deciding which big data technology to employ, and we group these into five deciding factors.
The five deciding factors

  • Data ingestion – How will data get into the system? How will data sources be connected to the system and how will the data travel between any two systems? What data formats will be used?
  • Streaming – Does anything need to be done to the data immediately as it flows between systems? If so, how will the data be processed? How quickly are the results really needed?
  • Batch processing – Can all the processing be done in real time, or will some data need to be processed offline? If so, how? How will stream- and batch-processed data be recombined at query time? (A minimal sketch of this recombination follows the list.)
  • Storage – Where, how and for how long should the data be stored? Do we need to distinguish between medium-term and long-term storage or archiving?
  • Actuation – Once the data is captured and processed, how will it be used, and who by? Will it be used by business users, data scientists, automated processes or machines? How will users search for data? How will users subsequently work with data?
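
To make the recombination question above more concrete, here is a minimal, purely illustrative Python sketch of merging batch- and stream-processed results at query time, in the style of a lambda architecture. The names used here (batch_view, recent_events, query_counts) are hypothetical placeholders for whatever batch store and stream processor an organisation actually selects.

```python
# Illustrative only: a toy, lambda-architecture-style merge of a
# precomputed batch view with counts derived from recently streamed
# events. All names are hypothetical placeholders.
from collections import Counter
from typing import Dict, Iterable, Tuple

def query_counts(batch_view: Dict[str, int],
                 recent_events: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Combine batch-computed counts per key with counts taken from
    events that arrived after the last batch run."""
    merged = Counter(batch_view)
    for key, count in recent_events:
        merged[key] += count
    return dict(merged)

if __name__ == "__main__":
    # Batch layer: counts computed offline over historical data.
    batch_view = {"product_a": 10_250, "product_b": 7_900}
    # Speed layer: counts extracted from events streamed since that run.
    recent_events = [("product_a", 12), ("product_c", 3)]
    print(query_counts(batch_view, recent_events))
    # {'product_a': 10262, 'product_b': 7900, 'product_c': 3}
```

In a real system the batch view would come from an offline job and the recent events from a stream processor or message bus, but the query-time merge step looks much the same; the key design question is how much of the workload truly needs the streaming path at all.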

In a world where many big data technologies began life in companies such as Google or Facebook, companies that don’t have the same challenges as many pre-digital organisations, it’s never been more important to ensure that big data technology choices at the beginning of a project are well thought through. Getting it wrong can mean a blunted competitive edge, disappointment and even embarrassment, but the outcomes of doing this correctly – turbo-charging R&D, gaining new insight into customers’ behaviour, being first to market with a new service or beating the competition to introduce a first-in-class life-saving drug – can be immeasurably great.

James Hinchliffe

James started working for Tessella in 2003 following the completion of his PhD in mathematics, ...
