Abstract
Atomic Commit Problem (ACP) is a single-shot agreement problem similar to consensus, meant to model the properties of transaction commit protocols in fault-prone distributed systems. We argue that ACP is too restrictive to capture the complexities of modern transactional data stores, where commit protocols are integrated with concurrency control, and their executions for different transactions are interdependent. As an alternative, we introduce Transaction Certification Service (TCS), a new formal problem that captures safety guarantees of multi-shot transaction commit protocols with integrated concurrency control. TCS is parameterized by a certification function that can be instantiated to support common isolation levels, such as serializability and snapshot isolation. We then derive a provably correct crash-resilient protocol for implementing TCS through successive refinement. Our protocol achieves a better time complexity than mainstream approaches that layer two-phase commit on top of Paxos-style replication.
Similar content being viewed by others
Notes
In practice, the client only needs to send the data relevant to the corresponding shard. We omit this optimisation to simplify notation.
References
Berenson, H., Bernstein, P., Gray, J., Melton, J., O’Neil, E., O’Neil, P.: A critique of ANSI SQL isolation levels. In: Conference on Management of Data (SIGMOD) (1995)
Bernstein, P.A., Hadzilacos, V., Goodman, N.: Concurrency Control and Recovery in Database Systems. Addison-Wesley Longman Publishing Co., Inc., Boston (1986)
Borr, A.J.: Transaction monitoring in ENCOMPASS: reliable distributed transaction processing. In: International Conference on Very Large Data Bases (VLDB) (1981)
Bravo, M., Gotsman, A.: Reconfigurable atomic transaction commit. In: Symposium on Principles of Distributed Computing (PODC) (2019)
Chandra, T.D., Hadzilacos, V., Toueg, S.: The weakest failure detector for solving consensus. J. ACM 43(4), 685–722 (1996)
Charron-Bost, B., Schiper, A.: Uniform consensus is harder than consensus. J. Algorithms 51(1), 15–37 (2004)
Coelho, P.R., Schiper, N., Pedone, F.: Fast atomic multicast. In: Conference on Dependable Systems and Networks (DSN) (2017)
Corbett, J.C., Dean, J., Epstein, M., Fikes, A., Frost, C., Furman, J.J., Ghemawat, S., Gubarev, A., Heiser, C., Hochschild, P., Hsieh, W.C., Kanthak, S., Kogan, E., Li, H., Lloyd, A., Melnik, S., Mwaura, D., Nagle, D., Quinlan, S., Rao, R., Rolig, L., Saito, Y., Szymaniak, M., Taylor, C., Wang, R., Woodford, D.: Spanner: Google’s globally-distributed database. In: Symposium on Operating Systems Design and Implementation (OSDI) (2012)
Défago, X., Schiper, A., Urbán, P.: Total order broadcast and multicast algorithms: taxonomy and survey. ACM Comput. Surv. 36(4), 372–421 (2004)
Ding, B., Kot, L., Demers, A., Gehrke, J.: Centiman: elastic, high performance optimistic concurrency control by watermarking. In: Symposium on Cloud Computing (SoCC) (2015)
Dragojević, A., Narayanan, D., Nightingale, E.B., Renzelmann, M., Shamis, A., Badam, A., Castro, M.: No compromises: distributed transactions with consistency, availability, and performance. In: Symposium on Operating Systems Principles (SOSP) (2015)
Dwork, C., Lynch, N., Stockmeyer, L.: Consensus in the presence of partial synchrony. J. ACM 35(2), 288–323 (1988)
Dwork, C., Skeen, D.: The inherent cost of nonblocking commitment. In: Symposium on Principles of Distributed Computing (PODC) (1983)
Filipovic, I., O’Hearn, P.W., Rinetzky, N., Yang, H.: Abstraction for concurrent objects. Theor. Comput. Sci. 411(51–52), 4379–4398 (2010)
Glendenning, L., Beschastnikh, I., Krishnamurthy, A., Anderson, T.: Scalable consistency in Scatter. In: Symposium on Operating Systems Principles (SOSP) (2011)
Gotsman, A., Lefort, A., Chockler, G.V.: White-box atomic multicast. In: Conference on Dependable Systems and Networks (DSN) (2019)
Gray, J.: Notes on data base operating systems. In: International Conference on Very Large Databases (1978)
Gray, J.: The transaction concept: virtues and limitations (invited paper). In: International Conference on Very Large Data Bases (VLDB) (1981)
Gray, J., Lamport, L.: Consensus on transaction commit. ACM Trans. Database Syst. 31(1), 133–160 (2006)
Guerraoui, R.: Revisiting the relationship between non-blocking atomic commitment and consensus. In: Workshop on Distributed Algorithms (WDAG) (1995)
Guerraoui, R., Larrea, M., Schiper, A.: Reducing the cost for non-blocking in atomic commitment. In: International Conference on Distributed Computing Systems (ICDCS) (1996)
Guerraoui, R., Wang, J.: How fast can a distributed transaction commit? In: Symposium on Principles of Database Systems (PODS) (2017)
Hadzilacos, V.: On the relationship between the atomic commitment and consensus problems. In: Asilomar Workshop on Fault-Tolerant Distributed Computing (1990)
Herlihy, M.P., Wing, J.M.: Linearizability: a correctness condition for concurrent objects. ACM Trans. Program. Lang. Syst. 12(3), 463–492 (1990)
Junqueira, F.P., Reed, B.C., Serafini, M.: Zab: high-performance broadcast for primary-backup systems. In: Conference on Dependable Systems and Networks (DSN) (2011)
Keidar, I., Dolev, D.: Increasing the resilience of atomic commit at no additional cost. In: Symposium on Principles of Database Systems (PODS) (1995)
Keidar, I., Rajsbaum, S.: A simple proof of the uniform consensus synchronous lower bound. Inf. Process. Lett. 85(1), 47–52 (2003)
Kokocinski, M., Kobus, T., Wojciechowski, P.T.: Make the leader work: executive deferred update replication. In: Symposium on Reliable Distributed Systems (SRDS) (2014)
Kraska, T., Pang, G., Franklin, M.J., Madden, S., Fekete, A.: MDCC: multi-data center consistency. In: European Conference on Computer Systems (EuroSys) (2013)
Lamport, L.: The part-time parliament. ACM Trans. Comput. Syst. 16(2), 133–169 (1998)
Mahmoud, H., Nawab, F., Pucher, A., Agrawal, D., El Abbadi, A.: Low-latency multi-datacenter databases using replicated commit. Proc. VLDB Endow. 6(9), 661–672 (2013)
Mahmoud, H.A., Arora, V., Nawab, F., Agrawal, D., El Abbadi, A.: Maat: effective and scalable coordination of distributed transactions in the cloud. Proc. VLDB Endow. 7(5), 329–340 (2014)
Maiyya, S., Nawab, F., Agrawal, D., El Abbadi, A.: Unifying consensus and atomic commitment for effective cloud data management. Proc. VLDB Endow. 12(5), 611–623 (2019)
Maiyya, S., Nawab, F., Agrawal, D., El Abbadi, A.: Private communication. 2020–2021
Maiyya, S., Nawab, F., Agrawal, D., El Abbadi, A.: Errata for “Unifying consensus and atomic commitment for effective cloud data management”. Submitted to VLDB’21 (2021)
Mu, S., Nelson, L., Lloyd, W., Li, J.: Consolidating concurrency control and consensus for commits under conflicts. In: Symposium on Operating Systems Design and Implementation (OSDI) (2016)
Oki, B.M., Liskov, B.H.: Viewstamped replication: a new primary copy method to support highly-available distributed systems. In: Symposium on Principles of Distributed Computing (PODC) (1988)
Pedone, F., Guerraoui, R., Schiper, A.: The database state machine approach. Distrib. Parallel Databases 14(1), 71–98 (2003)
Peluso, S., Romano, P., Quaglia, F.: Score: a scalable one-copy serializable partial replication protocol. In: International Middleware Conference (Middleware) (2012)
Peluso, S., Ruivo, P., Romano, P., Quaglia, F., Rodrigues, L.E.T.: GMU: genuine multiversion update-serializable partial data replication. IEEE Trans. Parallel Distrib. Syst. 27(10), 71–98 (2016)
Ramarao, K.V.S.: Complexity of distributed commit protocols. Acta Inform. 26(6), 577–595 (1989)
Saeida Ardekani, M., Sutra, P., Shapiro, M.: G-DUR: a middleware for assembling, analyzing, and improving transactional protocols. In: International Middleware Conference (Middleware) (2014)
Schiper, N., Sutra, P., Pedone, F.: P-store: genuine partial replication in wide area networks. In: Symposium on Reliable Distributed Systems (SRDS) (2010)
Schneider, F.B.: Implementing fault-tolerant services using the state machine approach: a tutorial. ACM Comput. Surv. 22(4), 299–319 (1990)
Sciascia, D., Pedone, F., Junqueira, F.: Scalable deferred update replication. In: Conference on Dependable Systems and Networks (DSN) (2012)
Skeen, D.: Nonblocking commit protocols. In: Conference on Management of Data (SIGMOD) (1981)
Sovran, Y., Power, R., Aguilera, M.K., Li, J.: Transactional storage for geo-replicated systems. In: Symposium on Operating Systems Principles (SOSP) (2011)
Thomson, A., Diamond, T., Weng, S., Ren, K., Shao, P., Abadi, D.J.: Calvin: fast distributed transactions for partitioned database systems. In: Conference on Management of Data (SIGMOD) (2012)
Weikum, G., Vossen, G.: Transactional Information Systems: Theory, Algorithms, and the Practice of Concurrency Control and Recovery. Morgan Kaufmann Publishers Inc., Burlington (2001)
Zhang, I., Sharma, N.K., Szekeres, A., Krishnamurthy, A., Ports, D.R.K.: Building consistent transactions with inconsistent replication. In: Symposium on Operating Systems Principles (SOSP) (2015)
Zhang, I., Sharma, N.K., Szekeres, A., Krishnamurthy, A., Ports, D.R.K.: When is operation ordering required in replicated transactional storage? IEEE Data Eng. Bull. 39(1), 27–38 (2016)
Acknowledgements
Alexey Gotsman was supported by a Starting Grant RACCOON from the European Research Council.
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This article is a revised and expanded version of a paper that received a Best Paper Award at the 32nd International Symposium on Distributed Computing (DISC)
Rights and permissions
About this article
Cite this article
Chockler, G., Gotsman, A. Multi-shot distributed transaction commit. Distrib. Comput. 34, 301–318 (2021). https://doi.org/10.1007/s00446-021-00389-4
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00446-021-00389-4