What data science can learn from office logistics

When I first heard that we needed to upsize from our current base in Stevenage, I imagined that picking out a new office would be somewhat like choosing a new laptop. Most of the work would be upfront, checking through a list of requirements – perhaps location, floor space, parking, cost, to name a few. With that done, it would simply be a case of sorting all the paperwork, and one day we would just pack up and move. Of course, I was completely naïve about what is involved. Arrangements with lawyers, contractors, suppliers, movers and existing occupiers, along with future-proofing for continued double-digit growth, all require a lot of discussion. The reality is that there are many important and inter-related factors that cannot be finalised up front. Chatting with my colleagues, it became clear that a move like this can only work well when those involved remain in close communication. This made me wonder if people have similar misconceptions about what it takes to work effectively with data scientists.

Writing as a data scientist, my experience has taught me that a similar level of collaboration, through an ongoing partnership, is needed to deliver the sort of changes which bring real, sustained business value – whether the ultimate goal is to raise productivity, improve decision making, or just translate a specific business problem into a mathematical framework and provide appropriate training. The simple picture of giving a data scientist a data set, a rough set of instructions and expecting them to go off and “do the math” is rarely a recipe for success.

This won’t come as a surprise to everyone. Indeed, statisticians have always preferred to be involved throughout a process which spans from the design of experiments to the analysis of experimental results. But as we acquire ever larger, more complex and disparate data sets, we face a sharp increase in the need to work in close partnership with the owners and consumers of data. Otherwise, analytics can only take us so far.

Let me give you an example which shows some of the subtleties that lie under the surface: a data set that has more “variables” than “samples”. This is a common situation, in problems spanning from econometrics to clinical studies – I’ll focus here on the case of a chemical manufacturing process. Thousands of quantities are measured for every chemical production run. Things like material specifications, temperatures, acidity, a multitude of flow rates, concentrations and a plethora of quality metrics… each recorded at many stages in production, several at multiple times. Since production started, there have been just 50 runs, and a few have failed for an unknown reason. The product owners want to know why some runs result in failure, but with so few runs and so much potentially relevant data recorded for each, trying to identify a single cause of failures from the data alone might not be possible.
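To make the difficulty concrete, here is a minimal sketch in Python of what can happen when variables vastly outnumber runs. It is an illustration under assumed numbers, not an analysis of any real plant: the 2,000 variables and 5 failures stand in for the "thousands" of quantities and "a few" failed runs, every variable is pure random noise, and the function names are hypothetical. Even so, some variables end up looking strongly related to failure purely by chance.

```python
import random
import statistics


def correlation_with_failure(values, labels):
    """Pearson correlation between a numeric variable and a 0/1 failure label."""
    mean_v = statistics.fmean(values)
    mean_l = statistics.fmean(labels)
    cov = sum((v - mean_v) * (l - mean_l) for v, l in zip(values, labels))
    var_v = sum((v - mean_v) ** 2 for v in values)
    var_l = sum((l - mean_l) ** 2 for l in labels)
    return cov / (var_v * var_l) ** 0.5


def strongest_chance_correlations(n_runs=50, n_failures=5, n_variables=2000,
                                  top=10, seed=0):
    """Rank pure-noise variables by how strongly they appear linked to failure.

    All sizes are illustrative assumptions. No variable has any real
    relationship with the failure labels.
    """
    rng = random.Random(seed)

    # Mark a handful of the runs as failures, at random.
    labels = [1] * n_failures + [0] * (n_runs - n_failures)
    rng.shuffle(labels)

    scores = []
    for index in range(n_variables):
        # Each "measurement" is independent Gaussian noise.
        values = [rng.gauss(0.0, 1.0) for _ in range(n_runs)]
        scores.append((abs(correlation_with_failure(values, labels)), index))

    # Strongest apparent relationships first.
    scores.sort(reverse=True)
    return scores[:top]


if __name__ == "__main__":
    for score, index in strongest_chance_correlations():
        print(f"variable {index}: |r| = {score:.2f}")
```

The variables at the top of this ranking would look like promising leads if taken at face value, yet by construction none of them causes anything. Telling such coincidences apart from genuine causes takes more than the data alone.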

There are plenty of machine learning approaches in data scientists’ arsenals to model this kind of data set. Perhaps these approaches suggest a few tens of potential factors which could relate to process failures. A good start, but what next? Through conversations with the process engineers, it should be possible to use their deep understanding of the process to rule out, say, half of these factors straight away. However, a good partnership goes beyond this and aims for more than just an analysis of the data. Existing data alone will probably not prove that certain issues are causing production failures in the chemical plant. But by working together to design experiments for future production runs, the potential causes of failure can be proved or disproved in minimum time. From here, perhaps the data pipeline can be refined to flag failures sooner, thus saving time. Or even better, it may be that real-time corrections can be introduced, to prevent the failures altogether.

Looking beyond making chemicals, across all domains it’s clear to me that data scientists have a bigger impact as part of a close partnership with data owners and end users. If moving office is a sufficiently complex business problem that it takes more than handing over a wish list to get a solution, it’s no surprise that challenges on the scale data scientists work on should not be tackled in isolation. Unfortunately, they still too often are.

Sam Genway

Sam is a data scientist with an academic research background in theoretical physics. He works with ...

© Copyright 2019 Tessella
All rights reserved