SASH: Safe Autonomous Self-Healing

  • Conference paper
Service-Oriented Computing – ICSOC 2022 Workshops (ICSOC 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13821)

Abstract

The scale of modern cloud systems, and the user demands placed on them, create a need for autonomous approaches to self-healing. When there is no operator in the loop for self-healing actions, it is crucial to ensure that the actions taken are safe and effective. In this paper we propose SASH: Safe Autonomous Self-Healing, which uses surrogate models to estimate the safety and effectiveness of self-healing actions. SASH uses system metrics, configuration parameters, domain information and the available actions to decide on the best fault remediation action or combination of actions. The performance of the chosen action(s) is then verified by a validation block, which updates the knowledge base with how the actions performed for that fault. This data is then used to update the safety and effectiveness estimation algorithm. The results show that the framework remediates faults with a low number of actions while protecting against unsafe actions.
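The select, validate and update loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the surrogate models are replaced here by simple smoothed success counts, and every name (`ActionStats`, `KnowledgeBase`, `choose_action`, `min_safety`, the fault and action labels) is a hypothetical stand-in chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ActionStats:
    """Running outcome counts for one (fault, action) pair (hypothetical)."""
    tries: int = 0
    fixed: int = 0  # times the action remediated the fault
    safe: int = 0   # times the action caused no harm

    def effectiveness(self) -> float:
        # Laplace-smoothed estimate; an untried action starts at 0.5.
        return (self.fixed + 1) / (self.tries + 2)

    def safety(self) -> float:
        return (self.safe + 1) / (self.tries + 2)


@dataclass
class KnowledgeBase:
    """Stand-in for the knowledge base updated by the validation step."""
    stats: dict = field(default_factory=dict)

    def get(self, fault: str, action: str) -> ActionStats:
        return self.stats.setdefault((fault, action), ActionStats())

    def record(self, fault: str, action: str, fixed: bool, safe: bool) -> None:
        # Validation outcome feeds back into the estimates.
        s = self.get(fault, action)
        s.tries += 1
        s.fixed += int(fixed)
        s.safe += int(safe)


def choose_action(kb, fault, actions, min_safety=0.5):
    """Drop actions below the safety threshold, then pick the most effective."""
    candidates = [a for a in actions if kb.get(fault, a).safety() >= min_safety]
    if not candidates:
        return None  # no action is considered safe enough
    return max(candidates, key=lambda a: kb.get(fault, a).effectiveness())


kb = KnowledgeBase()
for _ in range(3):
    kb.record("oom", "restart_pod", fixed=True, safe=True)
    kb.record("oom", "reboot_node", fixed=True, safe=False)

print(choose_action(kb, "oom", ["reboot_node", "restart_pod"]))
```

In this sketch the safety filter is applied before effectiveness is compared, so an action that usually fixes the fault but has been observed to cause harm is never selected. A real system would estimate both quantities from system metrics and configuration parameters rather than from raw counts.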

Corresponding author

Correspondence to Gary White.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

White, G., Custode, L.L., O’Brien, O. (2023). SASH: Safe Autonomous Self-Healing. In: Troya, J., et al. Service-Oriented Computing – ICSOC 2022 Workshops. ICSOC 2022. Lecture Notes in Computer Science, vol 13821. Springer, Cham. https://doi.org/10.1007/978-3-031-26507-5_12

  • DOI: https://doi.org/10.1007/978-3-031-26507-5_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-26506-8

  • Online ISBN: 978-3-031-26507-5

  • eBook Packages: Computer Science, Computer Science (R0)
