
Extensible block-level storage virtualization in cluster-based systems. (English) Zbl 1233.68054

Summary: High-performance storage systems are evolving towards decentralized commodity clusters that can scale in capacity, processing power, and network throughput. Building such systems requires: (a) sharing physical resources among applications; (b) sharing data among applications; and (c) allowing customized data views. Current solutions typically satisfy the first two requirements through a cluster file system, resulting in monolithic, hard-to-manage systems. In this paper we present a storage system that addresses all three requirements by extending the block layer below the file system. First, we discuss how our system provides customized (virtualized) storage views within a single node. Then, we discuss how it scales in clustered setups. To achieve efficient resource and data sharing, we support block-level allocation and locking as in-band mechanisms. We implement a prototype under Linux and use it to build a shared cluster file system. Our evaluation in a 24-node cluster setup shows that our approach offers flexibility, scalability, and reduced effort to implement new functionality.
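The layered, extensible block-level design the summary describes can be illustrated with a small sketch. This is not the paper's actual code or API: all class and method names here are hypothetical, and the example only shows the general idea of stackable block-layer modules (here a RAID-0-style striping layer and a copy-on-write snapshot layer) that share one read/write interface and therefore compose freely below a file system.

```python
# Illustrative sketch (hypothetical names, not the paper's code):
# stackable block-level virtualization layers with a common interface.

class RamDisk:
    """Bottom layer: a fixed-size in-memory block store."""
    def __init__(self, nblocks, block_size=512):
        self.blocks = [bytes(block_size) for _ in range(nblocks)]

    def read(self, lba):
        return self.blocks[lba]

    def write(self, lba, data):
        self.blocks[lba] = data


class StripeLayer:
    """RAID-0-style layer: stripes logical blocks across lower devices."""
    def __init__(self, lowers):
        self.lowers = lowers

    def _route(self, lba):
        # Round-robin placement: pick a device, then the block on it.
        return self.lowers[lba % len(self.lowers)], lba // len(self.lowers)

    def read(self, lba):
        dev, plba = self._route(lba)
        return dev.read(plba)

    def write(self, lba, data):
        dev, plba = self._route(lba)
        dev.write(plba, data)


class SnapshotLayer:
    """Copy-on-write layer: after a snapshot, new writes land in an
    overlay map, so the frozen image below stays readable."""
    def __init__(self, lower):
        self.lower = lower
        self.overlay = None           # None means no snapshot taken yet

    def take_snapshot(self):
        self.overlay = {}             # redirect all subsequent writes

    def write(self, lba, data):
        if self.overlay is None:
            self.lower.write(lba, data)
        else:
            self.overlay[lba] = data  # copy-on-write: lower stays frozen

    def read(self, lba):
        if self.overlay is not None and lba in self.overlay:
            return self.overlay[lba]
        return self.lower.read(lba)

    def read_snapshot(self, lba):
        return self.lower.read(lba)   # bypass overlay: the frozen view


# Compose a stack: snapshotting on top of striping over two RAM disks.
dev = SnapshotLayer(StripeLayer([RamDisk(8), RamDisk(8)]))
dev.write(3, b"v1")
dev.take_snapshot()
dev.write(3, b"v2")
print(dev.read(3))           # current view sees the new data
print(dev.read_snapshot(3))  # the snapshot still sees the old data
```

Because every layer speaks the same block interface, reordering or adding layers (encryption, versioning, allocation) needs no changes to the layers above or below, which is the flexibility argument the summary makes for doing virtualization at the block level rather than inside a monolithic cluster file system.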

MSC:

68M14 Distributed systems

Software:

IOzone; FiST; Horus
Full Text: DOI

References:

[1] M. Abd-El-Malek, et al. Ursa Minor: versatile cluster-based storage, in: Proceedings of the 4th USENIX FAST Conference, 2005.
[2] Aguilera, M. K.; Merchant, A.; Shah, M.; Veitch, A.; Karamanolis, C.: Sinfonia: a new paradigm for building scalable distributed systems, (2007)
[3] K.A. Amiri, et al. Highly concurrent shared storage, in: Proceedings of ICDCS’00 Conference, 2000.
[4] P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, Xen and the art of virtualization, in: Proceedings of the 19th ACM SOSP, Oct. 2003, pp. 164–177.
[5] Bohossian, Vasken: Computing in the RAIN: a reliable array of independent nodes, IEEE transactions on parallel and distributed systems 12, No. 2, 99-114 (2001)
[6] P.H. Carns, W.B. Ligon III, R.B. Ross, R. Thakur, PVFS: a parallel file system for linux clusters, in: Proceedings of the 4th Annual Linux Showcase and Conference, 2000.
[7] Chen, P. M.; Noble, B. D.: When virtual is better than real, (2001)
[8] Compellent. Storage center data sheet. www.compellent.com.
[9] W.V. Courtright II, G.A. Gibson, M. Holland, J. Zelenka, RAIDframe: rapid prototyping for disk arrays, in: Proceedings of the 1996 Conference on Measurement and Modeling of Computer Systems, SIGMETRICS, vol. 24 (1), May 1996, pp. 268–269.
[10] T. Denehy, A. Arpaci-Dusseau, R. Arpaci-Dusseau, Bridging the information gap in storage protocol stacks, in: Proceedings of the USENIX Annual Technical Conference, USENIX’02, June 2002, pp. 177–190.
[11] EMC. Enginuity(TM): The Storage Platform Operating Environment (White Paper). http://www.emc.com/pdf/techlib/c1033.pdf.
[12] EMC. Introducing RAID 5 on Symmetrix DMX. http://www.emc.com/products/systems/enginuity/pdf/H1114_Intro_raid5_DMX_ldv.pdf.
[13] R. English, S. Alexander, Loge: A self-organizing disk controller, in: Proceedings of the Winter 1992 USENIX Conference, 1992.
[14] Enterprise Volume Management System. evms.sourceforge.net.
[15] M.D. Flouris, et al. Using lightweight transactions and snapshots for fault-tolerant services based on shared storage bricks, in: Proceedings of the International Workshop on High Performance I/O Techniques and Deployment of Very Large Scale I/O Systems, HiperIO, 2006.
[16] M.D. Flouris, A. Bilas, Clotho: transparent data versioning at the block I/O level, in: 12th NASA Goddard & 21st IEEE Conference on Mass Storage Systems and Technologies, MSST2004, Apr. 2004.
[17] M.D. Flouris, A. Bilas, Violin: a framework for extensible block-level storage, in: Proceedings of 13th IEEE/NASA Goddard (MSST2005) Conference on Mass Storage Systems and Technologies, Apr. 11–14 2005.
[18] Flouris, M. D.; Lachaize, R.; Bilas, A.: Violin: a framework for extensible block-level storage, Coregrid series 3, 83-98 (2007)
[19] M.D. Flouris, R. Lachaize, A. Bilas, Orchestra: extensible block-level support for resource and data sharing in networked storage systems, in: Proceedings of the 14th IEEE International Conference on Parallel and Distributed Systems, ICPADS’08, 2008.
[20] FreeBSD: GEOM Modular Disk I/O Request Transformation Framework. http://kerneltrap.org/node/view/454.
[21] G.A. Gibson, et al. A cost-effective, high-bandwidth storage architecture, in: Proceedings of the 8th ASPLOS Conference, 1998.
[22] J. Gray, Storage bricks have arrived, Invited Talk at the 1st USENIX Conference on File and Storage Technologies (FAST), 2002.
[23] Hagmann, R.: Reimplementing the Cedar file system using logging and group commit, in: Proceedings of the 11th ACM SOSP, pp. 155-162 (1987)
[24] Heidemann, J.; Popek, G.: File system development with stackable layers, ACM transactions on computer systems 12, No. 1, 58-89 (1994)
[25] HP. OpenView Storage Area Manager. http://h18006.www1.hp.com/products/storage/software/sam/index.html.
[26] M. de Icaza, I. Molnar, G. Oxman, The Linux RAID-1, -4, -5 code, in: LinuxExpo, Apr. 1997.
[27] I/O Performance Inc. XDD v.6.3. www.ioperformance.com.
[28] Iometer team. Iometer: The I/O performance analysis tool. http://www.iometer.org.
[29] W. de Jonge, M.F. Kaashoek, W.C. Hsieh, The logical disk: a new approach to improving file systems, in: Proceedings of 14th SOSP, 1993, pp. 15–28.
[30] M.F. Kaashoek, D.R. Engler, G.R. Ganger, H. Briceno, R. Hunt, D. Mazieres, T. Pinckney, R. Grimm, J. Janotti, K. Mackenzie, Application performance and flexibility on exokernel systems, in: Symposium on Operating Systems Principles, 1997, pp. 52–65.
[31] J. Katcher, PostMark: a new file system benchmark. http://www.netapp.com/tech_library/3022.html.
[32] K. Keeton, J. Wilkes, Automatic design of dependable data storage systems, in: Proceedings of Workshop on Algorithms and Architectures for Self-managing Systems, San Diego, CA, June 2003, pp. 7–12.
[33] D.R. Kenchammana-Hosekotea, R.A. Golding, C. Fleiner, O.A. Zaki, The design and evaluation of network RAID protocols. Report RJ 10316, IBM Almaden Research Center, 2004.
[34] K.W. Preslan, et al. A 64-bit, shared disk file system for Linux, in: Proceedings of the Conference on Mass Storage Systems and Technologies, MSST’99, 1999.
[35] Kohler, E.; Morris, R.; Chen, B.; Jannotti, J.; Kaashoek, M. F.: The click modular router, ACM transactions on computer systems 18, No. 3, 263-297 (2000)
[36] Kronenberg, N. P.: VAXclusters: a closely-coupled distributed system, ACM transactions on computer systems 4, No. 2 (1986)
[37] E.K. Lee, et al. Petal: distributed virtual disks, in: Proceedings of the 7th ACM ASPLOS Conference, 1996.
[38] Lehey, G.: The Vinum volume manager, in: Proceedings of the FREENIX Track, USENIX Annual Technical Conference, pp. 57-68 (1999)
[39] C.R. Lumb, et al. D-SPTF: decentralized request distribution in brick-based storage systems, in: Proceedings of the 11th ACM ASPLOS Conference, 2004.
[40] J. MacCormick, et al. Boxwood: abstractions as the foundation for storage infrastructure, in: Proceedings of the 6th Symposium on Operating Systems Design and Impl., OSDI, 2004.
[41] Menon, J.; Pease, D. A.; Rees, R.; Duyanovich, L.; Hillsberg, B.: IBM storage tank – a heterogeneous scalable SAN file system, IBM systems journal 42, No. 2, 250-267 (2003)
[42] D. Mosberger, L.L. Peterson, Making paths explicit in the scout operating system, in: Proceedings of the 2nd USENIX Symposium on Operating Systems Design and Impl., OSDI96, Oct. 28–31 1996.
[43] W.D. Norcott, D. Capps, IOzone Filesystem Benchmark. Oct. 2006, http://www.iozone.org.
[44] O’Malley, S. W.; Peterson, L. L.: A dynamic network architecture, ACM transactions on computer systems 10, No. 2, 110-143 (1992)
[45] W. Oney, Programming the Microsoft Windows Driver Model, Second Ed., ISBN-13: 978-0735618039, ISBN-10: 0-7356-1803-8.
[46] Phillips, B.: Industry trends: have storage area networks come of age?, Computer 31, No. 7, 10-12 (1998)
[47] K.W. Preslan, et al. Scalability and recovery in a linux cluster file system, in: Proceedings of the 4th Annual Linux Showcase and Conference, 2000.
[48] PVFS2 project. PVFS2 home page. http://www.pvfs.org.
[49] Sean Quinlan, Sean Dorward, Venti: a new approach to archival data storage, in: Proceedings of the FAST’02 Conference, USENIX, Jan. 28–30 2002, pp. 89–102.
[50] Y. Saito, et al. FAB: building distributed enterprise arrays from commodity components, in: Proceedings of the 11th ACM ASPLOS Conference, 2004.
[51] P.W. Schermerhorn, et al. User-level extensibility in the mona file system, in: Proceedings of Freenix 2001.
[52] F. Schmuck, et al. GPFS: a shared-disk file system for large computing centers, in: USENIX Conference on File and Storage Technologies, FAST, 2002.
[53] R.A. Shillner, et al. Simplifying distributed file systems using a shared logical disk. Technical Report TR-524-96, Princeton University, 1996.
[54] M. Sivathanu, V. Prabhakaran, F. Popovici, T.E. Denehy, A.C. Arpaci-Dusseau, R.H. Arpaci-Dusseau, Semantically-smart disk systems, in: Proceedings of the FAST’03 Conference on File and Storage Technologies, FAST-03. USENIX Association, Apr. 2003.
[55] G.C. Skinner, et al. Stacking/vnodes: a progress report, in: Proceedings of the USENIX Summer Technical Conference, 1993.
[56] L. Stein, Stupid file systems are better, in: Proceedings of the 10th Workshop on Hot Topics on Operating Systems, HotOS X, June 2005.
[57] Stodolsky, D.; Holland, M.; Courtright II, W. V.; Gibson, G. A.: Parity-logging disk arrays, ACM transactions on computer systems 12, No. 3, 206-235 (1994)
[58] D. Teigland, H. Mauelshagen, Volume managers in linux, in: Proceedings of USENIX 2001 Technical Conference, June 2001.
[59] C.A. Thekkath, et al. Frangipani: a scalable distributed file system, in: Proceedings of the 16th Symp. on Operating Systems Principles, SOSP, 1997.
[60] R. van Renesse, K.P. Birman, R. Friedman, M. Hayden, D.A. Karr, A framework for protocol composition in horus, in: Symposium on Principles of Distributed Computing, 1995, pp. 80–89. · Zbl 1373.68064
[61] Veritas. Storage Foundation(TM). http://www.veritas.com/Products/www?c=product&refId=203.
[62] Veritas. Volume Manager(TM). http://www.veritas.com.
[63] R.Y. Wang, T.E. Anderson, D.A. Patterson, Virtual log based file systems for a programmable disk, in: Proceedings of Operating Systems Design and Implementation, OSDI, 1999, pp. 29–43.
[64] A. Warfield, R. Ross, K. Fraser, C. Limpach, S. Hand, Parallax: managing storage for a million machines, in: Proceedings of the 10th USENIX Workshop on Hot Topics in Operating Systems, June 2005.
[65] A. Whitaker, M. Shaw, S.D. Gribble, Scale and performance in the Denali isolation kernel, in: Proceedings of OSDI 2002. USENIX, 2002.
[66] Wilkes, J.: Traveling to Rome: QoS specifications for automated storage system management, (2001) · Zbl 0980.68868
[67] Wilkes, J.; Golding, R. A.; Staelin, C.; Sullivan, T.: The HP AutoRAID hierarchical storage system, ACM transactions on computer systems 14, No. 1, 108-136 (1996)
[68] XIV Ltd. Delivering the thin provisioning advantage with XIV’s Nextra architecture (white paper). http://www.xivstorage.com/materials/white_papers/nextra_thin_provisioning_white_paper.pdf.
[69] XIV Ltd. Nextra Snapshot implementation white paper. www.xivstorage.com.
[70] E. Zadok, et al. FiST: a language for stackable file systems, in: Proceedings of the USENIX Annual Technical Conference, 2000.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.