This is a question I have faced repeatedly while making sales pitches: what is analytics, anyway? Everyone has their own perspective on what analytics actually is; this is how I see it. Problem statements fall into two fundamental categories (along the Y-axis, as shown in the figure above):

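  • Problems where the physical relations between the data sets to be analysed are known: the relations can be captured by mathematical formulae, rules, logic or structures.
  • Problems where the physical relations are unknown or too complex to be modeled.

Along the X-axis, problems are further split by whether the business question itself is known (right-hand quadrants) or not (left-hand quadrants).
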
Q1: Known Problem Statement + Known Physical Relations between data sets

Suppose a refiner is analyzing its working capital and wants to know how much of it is tied up in crude oil inventory, intermediates and finished product inventory. It needs the analysis sliced and diced by refinery, region, material group, etc. The analysis also needs to show how much of the inventory is stuck in dead stock, the opportunity cost of that dead stock, and its historical trend over the last five years.

This is an analysis where all the data relations are known. It can be performed by combining logic (SQL), mathematical formulae and structures (hierarchies). In a scenario like this one, both the business question (the problem statement) and the model required to answer it are known. This is what I call “Known-Known” modelling: we know both the business question and the physical relations in the data.
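To show how mechanical a “Known-Known” analysis is, here is a minimal Python sketch over a handful of hypothetical inventory records. The record layout, the figures, the 365-day dead-stock threshold and the 10% annual cost of capital are all illustrative assumptions, not a real refiner’s model; in practice this logic would more likely live in SQL `GROUP BY` queries against a warehouse with refinery, region and material hierarchies.

```python
from collections import defaultdict
from typing import NamedTuple


class InventoryItem(NamedTuple):
    """One inventory position (hypothetical schema)."""
    refinery: str
    region: str
    material_group: str
    value_usd: float
    days_idle: int  # days since the stock last moved


# Hypothetical sample data for illustration only.
ITEMS = [
    InventoryItem("Refinery A", "Asia", "Crude", 120e6, 10),
    InventoryItem("Refinery A", "Asia", "Intermediates", 30e6, 45),
    InventoryItem("Refinery A", "Asia", "Finished", 50e6, 400),
    InventoryItem("Refinery B", "Europe", "Crude", 90e6, 5),
    InventoryItem("Refinery B", "Europe", "Finished", 40e6, 200),
]


def total_by(items, dimension):
    """Sum inventory value along one dimension, e.g. 'region' or 'material_group'."""
    totals = defaultdict(float)
    for item in items:
        totals[getattr(item, dimension)] += item.value_usd
    return dict(totals)


def dead_stock(items, idle_threshold_days=365, annual_rate=0.10):
    """Return (dead-stock value, annual opportunity cost) under the assumed threshold and rate."""
    value = sum(i.value_usd for i in items if i.days_idle >= idle_threshold_days)
    return value, value * annual_rate


if __name__ == "__main__":
    print(total_by(ITEMS, "region"))
    print(total_by(ITEMS, "material_group"))
    print(dead_stock(ITEMS))
```

Because every relation (sum, threshold, rate) is spelled out up front, the answer is fully determined by the data; a five-year trend would simply apply the same calculation to each historical snapshot.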

Q4: Known Problem Statement + Unknown Physical Relations between data sets

There are, however, areas where we do not know the physical relations between the data, or where those relations are too complex to be captured in any hard-coded model. Suppose the same refiner now wants to know, well in advance, when a particular piece of critical equipment (e.g. the feed pump to the crude distillation unit) might fail. That is difficult to capture in any physical model. Of course, there are always sensor-based alarms and alerts that warn against above-limit vibration, winding temperature, etc. There are also sophisticated models to estimate the thermal, mechanical and electrical stresses on large rotating machines such as this feed pump.

Today, however, commercially viable techniques are available that complement these traditional methods and extend our foresight far beyond what was hitherto possible. These techniques do not insist on a predefined relation between the data (in this example: bearing vibration, winding temperature, differential pressure, lubrication oil test results, and the history of both preventive and breakdown maintenance work orders). Rather, they discover the probabilistic relation between an event (e.g. a potential failure) and particular constellations, or sequences of constellations, of the data sets. These techniques go by names such as machine learning, artificial intelligence, data mining (or statistical data mining) and neural networks. There are finer differences between them, but for a broad understanding they can be grouped together under what I call “Unknown-Known” analytics: the business issue is known, but the specific data relations are not.
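To contrast this with a hard-coded physical model, here is a deliberately simplified Python sketch of learning a probabilistic relation from history. It is a swapped-in frequency-count estimator, not one of the machine-learning methods named above: each sensor snapshot is mapped to a discrete “constellation”, and P(failure | constellation) is estimated from labelled maintenance history with Laplace smoothing. The sensor limits, state names and sample history are hypothetical; a real system would use many more signals, sequences of states and a proper ML model.

```python
from collections import Counter

# Hypothetical limits, used only to discretize readings into states.
VIBRATION_LIMIT_MM_S = 7.0
WINDING_TEMP_LIMIT_C = 120.0


def constellation(reading):
    """Map one (vibration mm/s, winding temp degC) snapshot to a discrete state."""
    vibration, winding_temp = reading
    return (
        "vib_high" if vibration > VIBRATION_LIMIT_MM_S else "vib_ok",
        "temp_high" if winding_temp > WINDING_TEMP_LIMIT_C else "temp_ok",
    )


def failure_probabilities(history, alpha=1.0):
    """Estimate P(failure | constellation) from (reading, failed) pairs.

    alpha is a Laplace smoothing constant, so rarely seen states are not
    assigned a hard 0 or 1 probability.
    """
    seen, failed = Counter(), Counter()
    for reading, did_fail in history:
        state = constellation(reading)
        seen[state] += 1
        failed[state] += int(did_fail)
    return {s: (failed[s] + alpha) / (seen[s] + 2 * alpha) for s in seen}


# Hypothetical labelled history: did the pump fail within the warning horizon?
HISTORY = [
    ((8.2, 125.0), True),
    ((8.0, 118.0), True),
    ((7.5, 126.0), True),
    ((3.1, 95.0), False),
    ((2.9, 101.0), False),
    ((8.4, 99.0), False),
    ((3.0, 124.0), False),
]

if __name__ == "__main__":
    probs = failure_probabilities(HISTORY)
    for state, p in sorted(probs.items(), key=lambda kv: -kv[1]):
        print(state, round(p, 2))
```

The point is the direction of inference: nobody wrote down how vibration and winding temperature combine into failure risk; the estimate comes from the maintenance history itself.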

We have now covered the right-hand side of our problem space, i.e. where the business issues are known. But what about when the business is not aware of an issue, or simply has not raised it?

Q2+Q3: The Left Quadrants

A few years back, while implementing a large analytics-based decision support project for a hydrocarbon major, we stumbled upon a bad actor that nobody was looking for. The analysis was a simple ranking of bad actors (equipment) by failures (number, shutdown duration, impact, etc.). The data sets came from the customer’s maintenance work order management module, its material management module and a few others. The customer has multiple plants in Asia (refineries, petrochemical plants, lube plants, etc.). The analysis ranked all equipment and its sub-components (bearings, solenoid coils, etc.) at different levels of aggregation along various roll-up hierarchies or paradigms. It surfaced a particular make of electrical solenoid coil that was responsible for many of the breakdowns across multiple plants and was costing millions of dollars. This was an unintentional discovery that belongs in the top left-hand corner of the problem space: “Known-Unknown” (Q2).
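A ranking like this needs nothing more exotic than aggregation at a chosen level. The Python sketch below assumes that each breakdown work order counts as one failure; the `WorkOrder` fields, plant names, equipment tags and makes are hypothetical, and a real implementation would also weigh impact (e.g. lost production) and roll up along the customer’s own hierarchies.

```python
from collections import defaultdict
from typing import NamedTuple


class WorkOrder(NamedTuple):
    """One breakdown work order (hypothetical schema)."""
    plant: str
    equipment: str
    component: str
    make: str
    downtime_h: float


def rank_bad_actors(orders, levels):
    """Rank keys built from the given fields by failure count, then total downtime.

    levels: tuple of WorkOrder field names defining the aggregation level,
    e.g. ("equipment",) or ("component", "make").
    Returns (key, failures, downtime_hours) rows, worst first.
    """
    stats = defaultdict(lambda: [0, 0.0])  # key -> [failure count, downtime hours]
    for wo in orders:
        key = tuple(getattr(wo, field) for field in levels)
        stats[key][0] += 1
        stats[key][1] += wo.downtime_h
    rows = [(key, n, hours) for key, (n, hours) in stats.items()]
    return sorted(rows, key=lambda row: (row[1], row[2]), reverse=True)


# Hypothetical work orders drawn from several plants.
ORDERS = [
    WorkOrder("Refinery 1", "P-101", "solenoid coil", "Make X", 12.0),
    WorkOrder("Petchem 2", "V-205", "solenoid coil", "Make X", 30.0),
    WorkOrder("Lube 3", "C-310", "solenoid coil", "Make X", 8.0),
    WorkOrder("Refinery 1", "P-101", "bearing", "Make Y", 20.0),
    WorkOrder("Petchem 2", "P-220", "solenoid coil", "Make Z", 4.0),
]

if __name__ == "__main__":
    print(rank_bad_actors(ORDERS, ("equipment",))[0])
    print(rank_bad_actors(ORDERS, ("component", "make"))[0])
```

In this made-up data, pump P-101 looks like the worst actor at the equipment level (two failures), but re-aggregating by component and make shows one make of solenoid coil failing across three plants: the kind of cross-plant pattern a single roll-up level can hide.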

Now for the most intriguing quadrant of all: “Unknown-Unknown” (Q3) problems, where nobody is looking for a solution to any particular business problem. Naturally, the data set is also not part of any specific data model. It is still possible to analyse such a data dump and discover correlations that were not known earlier. This is the classic data mining approach: one finds correlations between two sets of data and may then look for causation. Stories (true?) such as “fathers who go to buy diapers also buy beer” are classic examples of unknown-unknown data mining outcomes. It is investigative work, and it may yield hitherto unknown facts, such as a correlation between a quality issue in a process and a particular setup in a unit far upstream with no apparent direct connection.
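A first pass over such a dump can be as blunt as correlating every pair of columns and surfacing the strongest pairs for a human to investigate. The Python sketch below uses plain Pearson correlation, which only catches linear relationships; the column names, values and the 0.7 cut-off are hypothetical, and market-basket findings like the diaper example usually come from association-rule mining instead. A strong correlation is only a lead: the search for causation still has to follow.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant column carries no correlation signal
    return cov / (sd_x * sd_y)


def strongest_correlations(columns, min_abs=0.7, top=5):
    """Correlate every pair of columns; return the strongest pairs above min_abs."""
    hits = []
    for a, b in combinations(sorted(columns), 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= min_abs:
            hits.append((a, b, r))
    hits.sort(key=lambda hit: abs(hit[2]), reverse=True)
    return hits[:top]


# Hypothetical dump: columns aligned by production batch.
DUMP = {
    "upstream_setpoint": [1.0, 2.0, 3.0, 4.0, 5.0],
    "product_defect_rate": [0.9, 2.1, 2.9, 4.2, 5.0],
    "ambient_temp_c": [20, 25, 19, 24, 21],
}

if __name__ == "__main__":
    for a, b, r in strongest_correlations(DUMP):
        print(f"{a} ~ {b}: r = {r:.3f}")
```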

In any large industry, there is an immense amount of value in the first quadrant (Q1) alone. Knowing what is going wrong in business operations and improving it by just 1% may add millions of dollars to the bottom line. One of my customers took the trouble of evaluating the tangible business impact and engaged a third-party consultant to do so. The evaluation arrived at a very similar figure and further noted that the intangible benefits were likely to be much higher. Decision quality, the collaboration culture, the quality of reporting, etc. changed radically, and that definitely had a much bigger bottom-line impact than the tangible benefits. It is important to mention, however, that the known-known quadrant (Q1) also depends heavily on the domain expertise of the analytics team. In many cases, the customer brings in that domain expertise, as they know their data, business processes and business issues best.

Most of the analytics projects I have dealt with lie in the Q1 problem space (and in Q2, as a by-product of Q1). However, there is increasing interest in the Q4 problem space, the unknown-known quadrant. The few solutions I have come across are based on PaaS (cloud “platform as a service”) infrastructure such as AWS, Predix, MindSphere, etc., and are for “product as a service” offerings of major equipment such as large rotating machines.

Q4 projects, however, are probably still in their very early days. There is definitely a lot of interest and curiosity around this topic, which is commonly referred to as “predictive analytics”. While these solutions or services employ predictive analytics in a major way, it is not the only thing they do. They employ everything from first-principles models (Q1) to deep learning algorithms (Q4) on the data feeds from the machines/assets to fulfil the SLA (service level agreement) conditions.

This was my take on “analytics” based on my experience with industrial data. What is your take?
