×

Multi-shot distributed transaction commit. (English) Zbl 1522.68062

Summary: Atomic Commit Problem (ACP) is a single-shot agreement problem similar to consensus, meant to model the properties of transaction commit protocols in fault-prone distributed systems. We argue that ACP is too restrictive to capture the complexities of modern transactional data stores, where commit protocols are integrated with concurrency control, and their executions for different transactions are interdependent. As an alternative, we introduce Transaction Certification Service (TCS), a new formal problem that captures safety guarantees of multi-shot transaction commit protocols with integrated concurrency control. TCS is parameterized by a certification function that can be instantiated to support common isolation levels, such as serializability and snapshot isolation. We then derive a provably correct crash-resilient protocol for implementing TCS through successive refinement. Our protocol achieves a better time complexity than mainstream approaches that layer two-phase commit on top of Paxos-style replication.

MSC:

68M14 Distributed systems

Software:

Paxos

References:

[1] Berenson, H., Bernstein, P., Gray, J., Melton, J., O’Neil, E., O’Neil, P.: A critique of ANSI SQL isolation levels. In: Conference on Management of Data (SIGMOD) (1995)
[2] Bernstein, PA; Hadzilacos, V.; Goodman, N., Concurrency Control and Recovery in Database Systems (1986), Boston: Addison-Wesley Longman Publishing Co., Inc., Boston
[3] Borr, A.J.: Transaction monitoring in ENCOMPASS: reliable distributed transaction processing. In: International Conference on Very Large Data Bases (VLDB) (1981)
[4] Bravo, M., Gotsman, A.: Reconfigurable atomic transaction commit. In: Symposium on Principles of Distributed Computing (PODC) (2019) · Zbl 1542.68012
[5] Chandra, TD; Hadzilacos, V.; Toueg, S., The weakest failure detector for solving consensus, J. ACM, 43, 4, 685-722 (1996) · Zbl 0885.68022 · doi:10.1145/234533.234549
[6] Charron-Bost, B.; Schiper, A., Uniform consensus is harder than consensus, J. Algorithms, 51, 1, 15-37 (2004) · Zbl 1078.68157 · doi:10.1016/j.jalgor.2003.11.001
[7] Coelho, P.R., Schiper, N., Pedone, F.: Fast atomic multicast. In: Conference on Dependable Systems and Networks (DSN) (2017)
[8] Corbett, J.C., Dean, J., Epstein, M., Fikes, A., Frost, C., Furman, J.J., Ghemawat, S., Gubarev, A., Heiser, C., Hochschild, P., Hsieh, W.C., Kanthak, S., Kogan, E., Li, H., Lloyd, A., Melnik, S., Mwaura, D., Nagle, D., Quinlan, S., Rao, R., Rolig, L., Saito, Y., Szymaniak, M., Taylor, C., Wang, R., Woodford, D.: Spanner: Google’s globally-distributed database. In: Symposium on Operating Systems Design and Implementation (OSDI) (2012)
[9] Défago, X.; Schiper, A.; Urbán, P., Total order broadcast and multicast algorithms: taxonomy and survey, ACM Comput. Surv., 36, 4, 372-421 (2004) · doi:10.1145/1041680.1041682
[10] Ding, B., Kot, L., Demers, A., Gehrke, J.: Centiman: elastic, high performance optimistic concurrency control by watermarking. In: Symposium on Cloud Computing (SoCC) (2015)
[11] Dragojević, A., Narayanan, D., Nightingale, E.B., Renzelmann, M., Shamis, A., Badam, A., Castro, M.: No compromises: distributed transactions with consistency, availability, and performance. In: Symposium on Operating Systems Principles (SOSP) (2015)
[12] Dwork, C.; Lynch, N.; Stockmeyer, L., Consensus in the presence of partial synchrony, J. ACM, 35, 2, 288-323 (1988) · doi:10.1145/42282.42283
[13] Dwork, C., Skeen, D.: The inherent cost of nonblocking commitment. In: Symposium on Principles of Distributed Computing (PODC) (1983)
[14] Filipovic, I.; O’Hearn, PW; Rinetzky, N.; Yang, H., Abstraction for concurrent objects, Theor. Comput. Sci., 411, 51-52, 4379-4398 (2010) · Zbl 1209.68156 · doi:10.1016/j.tcs.2010.09.021
[15] Glendenning, L., Beschastnikh, I., Krishnamurthy, A., Anderson, T.: Scalable consistency in Scatter. In: Symposium on Operating Systems Principles (SOSP) (2011)
[16] Gotsman, A., Lefort, A., Chockler, G.V.: White-box atomic multicast. In: Conference on Dependable Systems and Networks (DSN) (2019)
[17] Gray, J.: Notes on data base operating systems. In: International Conference on Very Large Databases (1978)
[18] Gray, J.: The transaction concept: virtues and limitations (invited paper). In: International Conference on Very Large Data Bases (VLDB) (1981)
[19] Gray, J.; Lamport, L., Consensus on transaction commit, ACM Trans. Database Syst., 31, 1, 133-160 (2006) · doi:10.1145/1132863.1132867
[20] Guerraoui, R.: Revisiting the relationship between non-blocking atomic commitment and consensus. In: Workshop on Distributed Algorithms (WDAG) (1995)
[21] Guerraoui, R., Larrea, M., Schiper, A.: Reducing the cost for non-blocking in atomic commitment. In: International Conference on Distributed Computing Systems (ICDCS) (1996)
[22] Guerraoui, R., Wang, J.: How fast can a distributed transaction commit? In: Symposium on Principles of Database Systems (PODS) (2017)
[23] Hadzilacos, V.: On the relationship between the atomic commitment and consensus problems. In: Asilomar Workshop on Fault-Tolerant Distributed Computing (1990)
[24] Herlihy, MP; Wing, JM, Linearizability: a correctness condition for concurrent objects, ACM Trans. Program. Lang. Syst., 12, 3, 463-492 (1990) · doi:10.1145/78969.78972
[25] Junqueira, F.P., Reed, B.C., Serafini, M.: Zab: high-performance broadcast for primary-backup systems. In: Conference on Dependable Systems and Networks (DSN) (2011)
[26] Keidar, I., Dolev, D.: Increasing the resilience of atomic commit at no additional cost. In: Symposium on Principles of Database Systems (PODS) (1995)
[27] Keidar, I.; Rajsbaum, S., A simple proof of the uniform consensus synchronous lower bound, Inf. Process. Lett., 85, 1, 47-52 (2003) · Zbl 1042.68010 · doi:10.1016/S0020-0190(02)00333-2
[28] Kokocinski, M., Kobus, T., Wojciechowski, P.T.: Make the leader work: executive deferred update replication. In: Symposium on Reliable Distributed Systems (SRDS) (2014)
[29] Kraska, T., Pang, G., Franklin, M.J., Madden, S., Fekete, A.: MDCC: multi-data center consistency. In: European Conference on Computer Systems (EuroSys) (2013)
[30] Lamport, L., The part-time parliament, ACM Trans. Comput. Syst., 16, 2, 133-169 (1998) · doi:10.1145/279227.279229
[31] Mahmoud, H.; Nawab, F.; Pucher, A.; Agrawal, D.; El Abbadi, A., Low-latency multi-datacenter databases using replicated commit, Proc. VLDB Endow., 6, 9, 661-672 (2013) · doi:10.14778/2536360.2536366
[32] Mahmoud, HA; Arora, V.; Nawab, F.; Agrawal, D.; El Abbadi, A., Maat: effective and scalable coordination of distributed transactions in the cloud, Proc. VLDB Endow., 7, 5, 329-340 (2014) · doi:10.14778/2732269.2732270
[33] Maiyya, S.; Nawab, F.; Agrawal, D.; El Abbadi, A., Unifying consensus and atomic commitment for effective cloud data management, Proc. VLDB Endow., 12, 5, 611-623 (2019) · doi:10.14778/3303753.3303765
[34] Maiyya, S., Nawab, F., Agrawal, D., El Abbadi, A.: Private communication. 2020-2021
[35] Maiyya, S., Nawab, F., Agrawal, D., El Abbadi, A.: Errata for “Unifying consensus and atomic commitment for effective cloud data management”. Submitted to VLDB’21 (2021)
[36] Mu, S., Nelson, L., Lloyd, W., Li, J.: Consolidating concurrency control and consensus for commits under conflicts. In: Symposium on Operating Systems Design and Implementation (OSDI) (2016)
[37] Oki, B.M., Liskov, B.H.: Viewstamped replication: a new primary copy method to support highly-available distributed systems. In: Symposium on Principles of Distributed Computing (PODC) (1988)
[38] Pedone, F., Guerraoui, R., Schiper, A.: The database state machine approach. Distrib. Parallel Databases 14(1), 71-98 (2003)
[39] Peluso, S., Romano, P., Quaglia, F.: Score: a scalable one-copy serializable partial replication protocol. In: International Middleware Conference (Middleware) (2012)
[40] Peluso, S.; Ruivo, P.; Romano, P.; Quaglia, F.; Rodrigues, LET, GMU: genuine multiversion update-serializable partial data replication, IEEE Trans. Parallel Distrib. Syst., 27, 10, 71-98 (2016) · doi:10.1109/TPDS.2015.2510998
[41] Ramarao, KVS, Complexity of distributed commit protocols, Acta Inform., 26, 6, 577-595 (1989) · Zbl 0679.68092 · doi:10.1007/BF00263581
[42] Saeida Ardekani, M., Sutra, P., Shapiro, M.: G-DUR: a middleware for assembling, analyzing, and improving transactional protocols. In: International Middleware Conference (Middleware) (2014)
[43] Schiper, N., Sutra, P., Pedone, F.: P-store: genuine partial replication in wide area networks. In: Symposium on Reliable Distributed Systems (SRDS) (2010)
[44] Schneider, FB, Implementing fault-tolerant services using the state machine approach: a tutorial, ACM Comput. Surv., 22, 4, 299-319 (1990) · doi:10.1145/98163.98167
[45] Sciascia, D., Pedone, F., Junqueira, F.: Scalable deferred update replication. In: Conference on Dependable Systems and Networks (DSN) (2012)
[46] Skeen, D.: Nonblocking commit protocols. In: Conference on Management of Data (SIGMOD) (1981)
[47] Sovran, Y., Power, R., Aguilera, M.K., Li, J.: Transactional storage for geo-replicated systems. In: Symposium on Operating Systems Principles (SOSP) (2011)
[48] Thomson, A., Diamond, T., Weng, S., Ren, K., Shao, P., Abadi, D.J.: Calvin: fast distributed transactions for partitioned database systems. In: Conference on Management of Data (SIGMOD) (2012)
[49] Weikum, G.; Vossen, G., Transactional Information Systems: Theory, Algorithms, and the Practice of Concurrency Control and Recovery (2001), Burlington: Morgan Kaufmann Publishers Inc., Burlington
[50] Zhang, I., Sharma, N.K., Szekeres, A., Krishnamurthy, A., Ports, D.R.K.: Building consistent transactions with inconsistent replication. In: Symposium on Operating Systems Principles (SOSP) (2015)
[51] Zhang, I.; Sharma, NK; Szekeres, A.; Krishnamurthy, A.; Ports, DRK, When is operation ordering required in replicated transactional storage?, IEEE Data Eng. Bull., 39, 1, 27-38 (2016)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.