Evaluating Tree Explanation Methods for Anomaly Reasoning: A Case Study of SHAP TreeExplainer and TreeInterpreter

  • Conference paper
Advances in Conceptual Modeling (ER 2020)

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 12584)

Abstract

Understanding predictions made by Machine Learning models is critical in many applications. In this work, we investigate the performance of two methods for explaining tree-based models: ‘Tree Interpreter (TI)’ and ‘SHapley Additive exPlanations TreeExplainer (SHAP-TE)’. Using a case study on detecting anomalies in job runtimes of applications that utilize cloud-computing platforms, we compare these approaches using a variety of metrics, including computation time, significance of attribution values, and explanation accuracy. We find that, although SHAP-TE offers consistency guarantees over TI at the cost of increased computation, this consistency does not necessarily improve the explanation performance in our case study.
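
To make the comparison concrete, the following minimal Python sketch (not the authors' experimental code; the synthetic data, model settings, and variable names are illustrative assumptions) computes per-feature attributions for the same random-forest model with both the shap TreeExplainer and the treeinterpreter package, and checks that each decomposition reconstructs the model's predictions.

    # Minimal sketch, not the paper's setup: a toy random forest on synthetic data,
    # explained by both SHAP-TE and TI.
    import numpy as np
    import shap                                          # SHAP TreeExplainer (SHAP-TE)
    from sklearn.ensemble import RandomForestRegressor
    from treeinterpreter import treeinterpreter as ti    # Tree Interpreter (TI)

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 4))                        # synthetic covariates
    y = X[:, 0] + 2.0 * X[:, 1] + rng.normal(scale=0.1, size=500)
    model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

    # SHAP-TE: per-feature attributions that sum to (prediction - expected value).
    explainer = shap.TreeExplainer(model)
    shap_values = explainer.shap_values(X)               # shape (n_samples, n_features)

    # TI: prediction = bias (root-node value) + sum of per-feature contributions.
    prediction, bias, contributions = ti.predict(model, X)

    # Both decompositions reconstruct the model output for every sample.
    print(np.allclose(model.predict(X), shap_values.sum(axis=1) + explainer.expected_value))
    print(np.allclose(model.predict(X), contributions.sum(axis=1) + bias))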

Notes

  1. Feature Attribution (FA) is defined as the contribution each independent variable, or “feature”, makes to the final prediction of a model.

  2. See Sect. 2 for the definition of consistency.

  3. A Feature Attribution Method (FAM) is an explanation method that calculates FAs to interpret each prediction generated by a model.

  4. Some of the covariates in the PostgreSQL dataset are continuous, which, when grouped, reduces the number of data points per cluster.

  5. RBO implementation: https://github.com/changyaochen/rbo; a minimal sketch of the measure is given after these notes.

  6. The dataset can be found at https://groups.cs.umass.edu/kdl/causal-eval-data.

  7. This data was collected in the work by [5].

  8. For example, consider two lists of attribution values \(S_1=[1, 1.1, 1.3]\) and \(S_2=[1, 3, 5]\). The ranking obtained from the values in \(S_2\) is more reliable than that from \(S_1\).

  9. https://github.com/sharmapulkit/TreeInterpretability_AnomalyExplanation.
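
Note 5 links to a third-party RBO implementation; the sketch below illustrates the measure itself (rank-biased overlap, Webber et al. [19]) rather than that repository's API. The function name, the persistence default p = 0.9, and the example feature names are illustrative assumptions.

    # Minimal sketch of truncated Rank-Biased Overlap (RBO) [19]; the function name,
    # the persistence default p=0.9, and the example feature names are illustrative.
    def rbo_score(ranking_a, ranking_b, p=0.9):
        """Truncated RBO between two ranked lists; values near 1 mean similar rankings."""
        depth = min(len(ranking_a), len(ranking_b))
        weighted_overlap = 0.0
        for d in range(1, depth + 1):
            agreement = len(set(ranking_a[:d]) & set(ranking_b[:d])) / d
            weighted_overlap += (p ** (d - 1)) * agreement
        return (1 - p) * weighted_overlap

    # Example: feature rankings (by attribution magnitude) from TI and SHAP-TE.
    ti_rank = ["cpu_time", "io_wait", "mem_peak", "queue_len"]
    shap_rank = ["cpu_time", "mem_peak", "io_wait", "queue_len"]
    print(rbo_score(ti_rank, shap_rank))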

References

  1. Caruana, R., Karampatziakis, N., Yessenalina, A.: An empirical evaluation of supervised learning in high dimensions. In: Proceedings of the 25th International Conference on Machine Learning, ICML 2008, pp. 96–103 (2008)

  2. Chandola, V., Banerjee, A., Kumar, V.: Anomaly detection: a survey. ACM Comput. Surv. 41, 1–58 (2009)

  3. Cuzzocrea, A., Mumolo, E., Cecolin, R.: Runtime anomaly detection in embedded systems by binary tracing and hidden Markov models. In: 2015 IEEE 39th Annual Computer Software and Applications Conference, vol. 2, pp. 15–22 (2015)

  4. Duque Anton, S., Sinha, S., Schotten, H.: Anomaly-based intrusion detection in industrial data with SVM and random forests, pp. 1–6 (2019)

  5. Gentzel, A., Garant, D., Jensen, D.: The case for evaluating causal models using interventional measures and empirical data. In: Wallach, H., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 32, pp. 11722–11732. Curran Associates Inc. (2019)

  6. Kuhn, H.W., Tucker, A.W.: Contributions to the Theory of Games, vol. 2. Princeton University Press, Princeton (1953)

  7. Lipovetsky, S., Conklin, M.: Analysis of regression in game theory approach. Appl. Stochast. Models Bus. Ind. 17, 319–330 (2001)

  8. Lundberg, S.M., et al.: From local explanations to global understanding with explainable AI for trees. Nat. Mach. Intell. 2(1), 56–67 (2020)

  9. Lundberg, S.M., Lee, S.-I.: A unified approach to interpreting model predictions. In: Advances in Neural Information Processing Systems, vol. 30 (2017)

  10. Peiris, M., Hill, J.H., Thelin, J., Bykov, S., Kliot, G., Konig, C.: PAD: performance anomaly detection in multi-server distributed systems. In: 2014 IEEE 7th International Conference on Cloud Computing, pp. 769–776 (2014)

  11. Primartha, R., Tama, B.A.: Anomaly detection using random forest: a performance revisited. In: 2017 International Conference on Data and Software Engineering (ICoDSE), pp. 1–6 (2017)

  12. Ribeiro, M.T., Singh, S., Guestrin, C.: “why should I trust you?”: explaining the predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016, pp. 1135–1144 (2016)

  13. Saabas, A.: Treeinterpreter. https://github.com/andosa/treeinterpreter

  14. Shao, L., et al.: Griffon: reasoning about job anomalies with unlabeled data in cloud-based platforms. In: Proceedings of the ACM Symposium on Cloud Computing (SoCC 2019) (2019)

  15. Shrikumar, A., Greenside, P., Kundaje, A.: Learning important features through propagating activation differences. CoRR abs/1704.02685 (2017)

  16. Shrikumar, A., Greenside, P., Shcherbina, A., Kundaje, A.: Not just a black box: learning important features through propagating activation differences. CoRR abs/1605.01713 (2016)

  17. Štrumbelj, E., Kononenko, I.: Explaining prediction models and individual predictions with feature contributions. Knowl. Inf. Syst. 41, 647–665 (2013)

  18. Sultani, W., Chen, C., Shah, M.: Real-world anomaly detection in surveillance videos. In: The IEEE Conference on Computer Vision and Pattern Recognition (2018)

  19. Webber, W., Moffat, A., Zobel, J.: A similarity measure for indefinite rankings. ACM Trans. Inf. Syst. 28, 4 (2010)

  20. Wulsin, D., Blanco, J., Mani, R., Litt, B.: Semi-supervised anomaly detection for EEG waveforms using deep belief nets. In: 2010 Ninth International Conference on Machine Learning and Applications, pp. 436–441 (2010)

Acknowledgements

We thank our mentors, Javier Burroni and Prof. Andrew McCallum, for their guidance. We also thank Minsoo Thigpen for organizational support, as well as Scott Lundberg for providing insightful suggestions on an earlier draft. Finally, we thank the anonymous reviewers for their feedback.

Author information

Corresponding authors

Correspondence to Pulkit Sharma or Liqun Shao.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Sharma, P., et al. (2020). Evaluating Tree Explanation Methods for Anomaly Reasoning: A Case Study of SHAP TreeExplainer and TreeInterpreter. In: Grossmann, G., Ram, S. (eds) Advances in Conceptual Modeling. ER 2020. Lecture Notes in Computer Science, vol 12584. Springer, Cham. https://doi.org/10.1007/978-3-030-65847-2_4

  • DOI: https://doi.org/10.1007/978-3-030-65847-2_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-65846-5

  • Online ISBN: 978-3-030-65847-2

  • eBook Packages: Computer Science, Computer Science (R0)
