Why General Relativity is relevant, 100 years on, to today’s data scientists

“A bold claim, Sir!” I hear you cry. How can you relate General Relativity (GR)†, the mathematically complex realisation of an abstract picture of the universe, to modern everyday data science? Is he really saying that forecasting customer churn or the predictive maintenance of a turbine can be connected to removing the need for a gravitational force by allowing massive objects to curve space-time? Yes, I am, after a fashion. Come with me on this one; we’ll get to data science soon enough.

A quiet voice in a deafening background – the hunt for gravitational waves mirrors modern data science

You see, it’s all about the challenge of measuring important effects that are subtle and hidden inside broader levels of complexity. Some of the predictions of GR are pretty easy to validate with a good telescope, by making really precise measurements of our solar system. Others are much harder – so much so that we are still waiting 100 years later to see them.

The prediction most relevant to my story here is that of gravitational waves. These are nothing more or less than ripples running through the already curved fabric of space-time. The ripples result from the journey of massive objects through space. But you can ignore the details and don’t worry if, like me, you struggle to picture this. The point is that these ripples are extremely hard to detect. But it’s important to find them since, if they exist, they help reassure us that the rest of GR is right. And GR is needed to predict the fate of the universe.

The challenge facing physics is that these ripples are predicted to have an extremely small effect on the things we can directly measure. The natural world is full of the vibrations of thermal energy and many other masking factors, which makes picking out a gravitational ripple amongst all this noise very hard indeed. The solution is to build a vast structure that splits a laser beam down two 3km tunnels at right angles to each other. The hope is that an instrument on this scale will be sensitive enough to detect the minute differences in the lengths of the tunnels caused by a ripple. The key to success is to remove all the other sources of signals, from seismic activity to traffic noise, from the experiment.

And this is my first point. Physics has always been the science that pushes the boundaries of what is possible to measure. Ghost-like particles called neutrinos, for example, require a 1,000-tonne tank of heavy water to give any hope of detecting an interaction. The tank is more than 2km underground to minimise the noise from cosmic rays. These guys never give up. And we’re not talking direct measurement here, as you can’t “see” either a gravitational wave or a neutrino. You have to see them indirectly, via how they affect the things you can see. That’s an acquired skill.

My second point is that deploying these skills via massive machines is the exception in physics, not the rule. Extracting fuzzy, indirectly measured effects that lurk inside massive datasets has been a common part of physics for decades. Observational astrophysics is a great example, where the signals collected by radio and x-ray telescopes undergo sophisticated post-collection analysis in order to test our current theories of galaxy and star formation. Likewise, unlocking the inner structure of new advanced materials from the scattering patterns of beams of neutrons requires careful analysis of terabytes of experimental data.
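
To make that a little more concrete, here is a toy sketch in Python of pulling a faint signal out of noisy data. It is not how any of the experiments above actually analyse their measurements; it simply correlates noisy samples against a sine-wave template of known frequency, which is one textbook approach. The function name `estimate_amplitude`, the sampling set-up and all of the numbers are my own assumptions, chosen purely for illustration.

```python
import math
import random


def estimate_amplitude(samples, freq, dt):
    """Correlate the data with a sine template of known frequency.

    For data of the form A*sin(2*pi*freq*t) + noise, this returns an
    estimate of A; the noise contribution averages towards zero as the
    number of samples grows.
    """
    n = len(samples)
    total = sum(
        x * math.sin(2 * math.pi * freq * i * dt)
        for i, x in enumerate(samples)
    )
    return 2.0 * total / n


random.seed(0)                      # fixed seed so the run is repeatable
n, dt, freq = 20_000, 0.001, 50.0   # hypothetical sampling set-up
true_amplitude = 0.1                # signal is ten times smaller than the noise
noise_sigma = 1.0

# The same noise is used for both datasets so they differ only by the signal.
noise_only = [random.gauss(0.0, noise_sigma) for _ in range(n)]
with_signal = [
    true_amplitude * math.sin(2 * math.pi * freq * i * dt) + e
    for i, e in enumerate(noise_only)
]

print("noise only :", round(estimate_amplitude(noise_only, freq, dt), 3))
print("with signal:", round(estimate_amplitude(with_signal, freq, dt), 3))
```

With these made-up numbers the noise swamps the signal in any single sample, yet averaging over 20,000 samples shrinks the uncertainty on the amplitude to roughly 0.01, which is enough to tell the two cases apart.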

And since we are talking about measuring the messy, fuzzy real world here, physics has led the development of new techniques in statistics and probability theory needed to quantify the confidence with which you can interpret data. The recent “discovery” of the Higgs boson was only announced once CERN had determined that the probability of collecting the measurement dataset, if their predictions were wrong, was less than 1 in 3 million.
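
For the curious, the arithmetic behind that figure can be sketched in a few lines of Python. I am assuming here that the “1 in 3 million” corresponds to particle physics’ conventional five-sigma threshold, read as the one-sided tail of a normal distribution; the text above doesn’t spell that out, and the function name `one_sided_p_value` is mine.

```python
import math


def one_sided_p_value(n_sigma: float) -> float:
    """Probability that a standard normal variable exceeds n_sigma."""
    return 0.5 * math.erfc(n_sigma / math.sqrt(2))


p = one_sided_p_value(5.0)  # assumed five-sigma discovery threshold
print(f"p = {p:.2e}, or about 1 in {1 / p:,.0f}")
```

Under that assumption the chance comes out at roughly 1 in 3.5 million, comfortably below the 1 in 3 million quoted.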

The underlying mindset and practical skills developed in this kind of work are increasingly transferable into the modern world of data-driven analytics. So it’s little wonder to me when I find in my line of work that physicists often make really top-drawer data scientists. And no – please don’t be Einstein. I wouldn’t want the staff management overhead.

† see http://www.newyorker.com/tech/elements/the-space-doctors-big-idea-einstein-general-relativity for an utterly charming and highly accessible overview of what relativity is about

© Copyright 2018 Tessella
All rights reserved