Explainable AI in Manufacturing: A Predictive Maintenance Case Study

Part of the book series: IFIP Advances in Information and Communication Technology ((IFIPAICT,volume 592))

Included in the following conference series:

IFIP International Conference on Advances in Production Management Systems

6624 Accesses
35 Citations

Abstract

This paper describes an example of an explainable AI (Artificial Intelligence) (XAI) in a form of Predictive Maintenance (PdM) scenario for manufacturing. Predictive maintenance has the potential of saving a lot of money by reducing and predicting machine breakdown. In this case study we work with generalized data to show how this scenario could look like with real production data. For this purpose, we created and evaluated a machine learning model based on a highly efficient gradient boosting decision tree in order to predict machine errors or tool failures. Although the case study is strictly experimental, we can conclude that explainable AI in form of focused analytic and reliable prediction model can reasonably contribute to prediction of maintenance tasks.

You have full access to this open access chapter, Download conference paper PDF

Towards big industrial data mining through explainable automated machine learning

Article 10 February 2022

Using an Explainable Machine Learning Approach to Minimize Opportunistic Maintenance Interventions

A New Data Analytics Framework Emphasising Pre-processing in Learning AI Models for Complex Manufacturing Systems

Keywords

1 Introduction

Predictive Maintenance (PdM) anticipates maintenance needs to avoid costs associated with unscheduled downtime. By connecting to devices and monitoring the data the devices produce, we can identify patterns that lead to potential problems or failures. Those insights can be used to address issues before they happen. This ability to predict when equipment or assets need maintenance allows us to optimize equipment lifetime and minimize downtime [1].

The fundamental litmus test for explainable AI (XAI) – that is, machine learning algorithms and other Artificial Intelligence systems that produce outcomes that humans can readily understand and track backwards to the origins [2].

In this case study we will consider the field of maintenance in manufacturing. More precisely we will deal with PdM by involving explainable AI outputs as base for our decisions and predictions.

2 Explainable AI - XAI

Recent success of Machine Learning (ML) led to series of application scenarios for Artificial Intelligence (AI) applications. Continued advances promise to produce autonomous systems that will perceive, learn, decide, and act on their own. However, the effectiveness of these systems is limited by the machine’s current inability to explain their decisions and actions to human users.

The Explainable AI (XAI) program introduced by DARPA^{Footnote 1} aims to create a suite of ML techniques that:

Produce more explainable models, while maintaining a high level of learning performance (prediction accuracy); and
Enable human users to understand, appropriately trust, and effectively manage the emerging generation of artificially intelligent partners.

For decision makers who rely upon Data Analytics and Data Science, explainability is a real issue. If the computational system relies on a simple decision model such as logistic regression, they can understand it and convince executives who have to sign off on a system because it seems reasonable and fair. They can justify the analytical results to shareholders, regulators, and other involved stakeholders. But for “Deep Nets” and ML systems, this is no longer possible.

There is a need to find ways to explain the system to the decision maker so that they know that their decisions are going to be reasonable. The goals of explanation involves reaching a persuasion, but that comes only as a consequence of understanding the how the AI works, the mistakes the system can make, and the safety measures surrounding it.

Meanwhile, AI is increasingly allowed to make and take more autonomous decisions and actions. Justifying these decisions will only become more crucial, and there is little doubt that this field will continue to rise in prominence and produce exciting and much needed work in the future [3].

The importance of explanation, and especially explanation in AI, has been emphasized in numerous popular press outlets over the past decades, with considerable discussion of the explainability of “Deep Nets” and ML systems in both the technical literature and the recent popular press [2, 4,5,6,7,8,9,10,11,12,13].

3 Predictive Maintenance

PdM extracts insights from the data produced by the equipment on the shop floor and acts on these insights. The idea of PdM goes back to the early 1990’s and augments regularly scheduled, preventive maintenance. PdM requires the equipment to provide data from sensors monitoring the equipment as well as other operational data. Humans act based on the analysis. Simply speaking, it is a technique to determine (predict) the failure of the machine component in the near future so that the component can be replaced based on the maintenance plan before it fails and stops the production process. The PdM can improve the production process and increase the productivity. By successfully handling with PdM we are able to achieve the following goals:

Reduce the operational risk of mission-critical equipment.
Control cost of maintenance by enabling just-in-time maintenance operations.
Discover patterns connected to various maintenance problems.
Provide Key Performance Indicators.

Usually PdM uses descriptive, statistical or probabilistic approach to drive analysis and prediction. There are also several approaches which used Machine Learning (ML) [1, 14]. Through the literature [15] there can be found the following types of PrM in the production: reactive, periodic, proactive and predictive (Fig. 1).

4 Case Study

In order to handle and use this technique we need a various data from the machines in production. In this case study we used the freely available data from a data source generated as test data set for PdM containing information about: telemetry, errors, failures and machine properties.

The data can be found at Azure blob storage. The data is maintained by Azure Gallery Article^{Footnote 2}. Once the data is downloaded from the blob storage, local copies will be used for further observations in this contribution.

4.1 Methodology

Usually, every PdM technique should proceed by the following three main steps:

Collect Data – collect all possible descriptions, historical and real-time data, usually by using IoT (Internet of Things) devices, various loggers, technical documentation, etc.
Predict Failures – collected data can be used and transformed into ML ready data sets, and build a ML model to predict the failures of the components in the set of machines in the production.
React – by obtaining the information which components will fail in the near future, we can activate the process of replacement so the component will be replaced before it fails, and the production process will not.

4.2 Data Preparation

In order to predict failures in the production process, a set of data transformations, cleaning, feature engineering, and selection must be performed to prepare the data for building a ML model. The data preparation part plays a crucial role in the model building process because quality of the data and its preparation will directly influences the model accuracy and reliability. The data used for this PdM use case can be classified to:

Telemetry – which collects historical data about machine behavior (voltage, vibration, etc.).
Errors – the data about warnings and errors in the machines.
Maint – data about replacement and maintenance for the machines.
Machines – descriptive information about the machines.
Failures – data when a certain machine is stopped, due to component failure.

Errors data represents the most important information in every PdM system. The errors are non-breaking recorded events while the machine is still operational. In the experimental data set the error date and times are rounded to the closest hour since the telemetry data is collected at an hourly rate. What we get to insight is shown in the left chart of Fig. 2.

Failures data represents the replacements of the components due to the failure of the machines. Once the failure is happened the machine is stopped. This is a crucial difference between errors and failures. Failure distribution produced by certain component across machines is shown in the right chart of Fig. 2.

Maintenance data tells us about scheduled and unscheduled maintenance. The data set contains the records which correspond to both, regular inspection of components as well as failures. To add the record into the maintenance table a component must be replaced during the scheduled inspection or replaced due to a breakdown. In case the records are created due to breakdowns are called failures. Maintenance contains the data from 2014 and 2015 years.

Machine data includes information about 100 machines which are subject of the PdM analysis. The information includes: model type, and machine age. Distribution of the machine age categorized by the models across production process is shown in the following Fig. 3.

4.3 Feature Engineering

First, several lagged telemetry data was created, since telemetry data are classic time series data. In the following, the rolling mean and standard deviation of the telemetry data over the last 3-h lag window is calculated for every 3 h. For capturing a longer term effect 24 h lag features rolling average and standard deviation were calculated. Once we have rolling lag features calculated, we can merge them into one data frame. Now that we have basic data frame, we merged previously calculated data frames with this one. At the end of the merging process, the relevant columns are selected.

Unlike telemetry that had numerical values, errors have categorical values denoting the type of error that occurred at a time-stamp. This was used to aggregate categories of the error with different types of errors that occurred in the lag window. The main task here was to create a relevant feature in order to create a quality data set for the machine learning part.

One of the good features that has been chosen was the number of replacements of each component in the last 3 months to incorporate the frequency of replacements. Furthermore, we calculated how long it has been since a component is last replaced as that would be expected to correlate better with component failures since the longer a component is used, the more degradation should be expected. The machine data set contains descriptive information about machines like the type of machines and their ages which is the years in service.

As the last step in feature engineering, we are performing merging all features into one data set. The label in PdM should be the probability that a machine will fail in the near future due to a failure certain component. If we take 24 h to be the period (task) for this problem, the label construction consists of a new column in the feature data set which indicate if certain machine will fail or not in the next 24 h due to failure one of several components.

In this way, we are defining the label as a categorical variable containing: – none – if the machine will not fail in the next 24 h, – comp1 to comp4 if the machine will fail in the next 24 h due to the failure of certain components. Since we can experiment with the label construction by applying different conditions, we can implement methods that take several arguments in order to define the general problem.

4.4 Preliminary Results

We analyzed 5 data sets with information about telemetry, data, errors and maintenance as well as failure for 100 machines. The data were transformed and analyzed in order to create the final data set for building a machine learning model for PdM.

Once we created all features from the data sets, as a final step is to create the label column so that it describes if a certain machine will fail in the next 24 h due to failure a comp1, comp2, comp3, comp4 or it will continue to work. In this part, we performed a part of the ML task and start training a ML model for predicting if a certain machine will fail in the next 24 h due to failure, or it will be in functioning normal in that time period.

The model which we built was multi-class classification model since it has 5 values to predict: comp1, comp2, comp3, comp4 or none – means it will continue to work. We used the DART Booster hyper-parameter tuning along with Light- GBM^{Footnote 3} [16] which is a gradient boosting framework that uses tree based learning algorithm. It is especially efficient on small data sets. We evaluated the trained model first with training data set (see Table 1).

Table 1. Results on training data set.

Full size table

As can be seen the model predicts the values from the training data set correctly in most cases. In order to see how the model predicts unknown data entries we used the test data. The result is shown in Table 2. We can see, that the model has overall accuracy 99%, and 95% average per class accuracy which is very promising for experimental case.

Table 2. Results on test data set.

Full size table

5 Conclusions, Limitations and Outlook

In this paper we conducted a case study in the field of Predictive Maintenance (PdM) with sample machine data to demonstrate how explainable AI can be reached in the field of manufacturing. Although the study is strictly experimental, we can conclude that explainable AI in form of reliable prediction model and visualizations can reasonably contribute to avoiding unnecessary costs associated with unscheduled downtime caused through machine errors or tool failures.

The basic limitation of this contribution is that this experiment was conducted with generic data set, however the presented concept shows high maturity with promising results.

The next step in the future would be to engage the trained model with some data collected directly in real world manufacturing settings and involving data from different manufacturer. In this way the reliability of presented results could be approved and through the comparison of results from different data sources and adjustment of the prediction model.

Notes

References

Lee, W.J., Wu, H., Yun, H., Kim, H., Jun, M.B., Sutherland, J.W.: Predictive maintenance of machine tool systems using artificial intelligence techniques applied to machine condition data. Procedia CIRP 80, 506–511 (2019). 26th CIRP Conference on Life Cycle Engineering (LCE) Purdue University, West Lafayette, IN, USA May 7–9, 2019
Google Scholar
Kim, T.W.: Explainable artificial intelligence (XAI), the goodness criteria and the grasp-ability test. arXiv preprint arXiv:1810.09598 (2018)
Biran, O., Cotton, C.V.: Explanation and justification in machine learning: a survey or (2017)
Google Scholar
Bornstein, A.M.: Is artificial intelligence permanently inscrutable?
Google Scholar
Clancey, W.J.: Intelligent tutoring systems: a tutorial survey. Technical report, Stanford Univ CA Dept of Computer Science (1986)
Google Scholar
Core, M.G., Lane, H.C., Van Lent, M., Gomboc, D., Solomon, S., Rosenberg, M.: Building explainable artificial intelligence systems. In: AAAI, pp. 1766–1773 (2006)
Google Scholar
Hawkins, J.: Special report: can we copy the brain?-what intelligent machines need to learn from the neocortex. IEEE Spectr. 54(6), 34–71 (2017)
Article Google Scholar
Marcus, G.: Deep learning: A critical appraisal. arXiv preprint arXiv:1801.00631 (2018)
Monroe, D.: AI, explain yourself. Commun. ACM 61(11), 11–13 (2018)
Article Google Scholar
Sheh, R.K.M.: “Why did you do that?” explainable intelligent robots. In: Workshops at the Thirty-First AAAI Conference on Artificial Intelligence (2017)
Google Scholar
Voosen, P.: How AI detectives are craking open the black box of deep learning. Science (2017)
Google Scholar
Wang, D., Yang, Q., Abdul, A., Lim, B.Y.: Designing theory-driven user-centric explainable AI. In: Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, pp. 1–15 (2019)
Google Scholar
Weinberger, D.: Our machines now have knowledge we’ll never understand. Backchannel (2017). https://www.wired.com/story/our-machines-now-have-knowledge-well-never-understand
Susto, G.A., Schirru, A., Pampuri, S., Mcloone, S., Beghi, A.: Machine learning for predictive maintenance: a multiple classifier approach. IEEE Trans. Ind. Inform. 11, 812–820 (2015)
Google Scholar
Selcuk, S.: Predictive maintenance, its implementation and latest trends. Proc. Inst. Mech. Eng. Part B: J. Eng. Manuf. 231(9), 1670–1679 (2017)
Article Google Scholar
Ke, G., et al.: LightGBM: a highly efficient gradient boosting decision tree. In: Guyon, I., et al. (eds.) Advances in Neural Information Processing Systems 30, pp. 3146–3154. Curran Associates, Inc. (2017)
Google Scholar

Download references

Author information

Authors and Affiliations

University of Bihać, dr. Irfana Ljubijankića bb, 77000, Bihać, Bosnia and Herzegovina
Bahrudin Hrnjica
IT and Business Informatics, CAMPUS 02 University of Applied Sciences, Körblergasse 126, 8010, Graz, Austria
Selver Softic

Authors

Bahrudin Hrnjica
View author publications
You can also search for this author in PubMed Google Scholar
Selver Softic
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Selver Softic .

Editor information

Editors and Affiliations

University of Novi Sad, Novi Sad, Serbia
Bojan Lalic
University of Belgrade, Belgrade, Serbia
Vidosav Majstorovic
University of Novi Sad, Novi Sad, Serbia
Ugljesa Marjanovic
ZF Hungária Kft., Eger, Hungary
Gregor von Cieminski
Tecnológico de Monterrey, Mexico City, Mexico
David Romero

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Hrnjica, B., Softic, S. (2020). Explainable AI in Manufacturing: A Predictive Maintenance Case Study. In: Lalic, B., Majstorovic, V., Marjanovic, U., von Cieminski, G., Romero, D. (eds) Advances in Production Management Systems. Towards Smart and Digital Manufacturing. APMS 2020. IFIP Advances in Information and Communication Technology, vol 592. Springer, Cham. https://doi.org/10.1007/978-3-030-57997-5_8

Download citation

DOI: https://doi.org/10.1007/978-3-030-57997-5_8
Published: 25 August 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-57996-8
Online ISBN: 978-3-030-57997-5
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Societies and partnerships

The International Federation for Information Processing (opens in a new tab)