Abstract
With the growing popularity of IoT nowadays, tremendous amount of time series data at high resolution is being generated, transmitted, stored, and processed by modern sensor networks in different application domains, which naturally incurs extensive storage and computation cost in practice. Data compression is the key to resolve such challenge, and various compression techniques, either lossless or lossy, have been proposed and widely adopted in industry and academia. Although existing approaches are generally successful, we observe a unique characteristic in certain time series data, i.e., significant periodicity and strong randomness, which leads to poor compression performance using existing methods and hence calls for a specifically designed compression mechanism that can utilise the periodic and stochastic patterns at the same time. To this end, we propose a decomposition-based compression algorithm which divides the original time series into several components reflecting periodicity and randomness respectively, and then approximates each component accordingly to guarantee overall compression ratio and maximum error. We conduct extensive evaluation on a real world dataset, and the experimental results verify the superiority of our proposals compared with current state-of-the-art methods.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Ratanamahatana, C.A., Lin, J., Gunopulos, D., Keogh, E., Vlachos, M., Das, G.: Mining time series data. In: Maimon, O., Rokach, L. (eds.) Data Mining and Knowledge Discovery Handbook, pp. 1049–1077. Springer, Boston (2009)
Eichinger, F., Efros, P., Karnouskos, S., Böhm, K.: A time-series compression technique and its application to the smart grid. VLDB J. 24(2), 193–218 (2015)
Tak-chung, F.: A review on time series data mining. Eng. Appl. Artif. Intell. 24(1), 164–181 (2011)
Esling, P., Agon, C.: Time-series data mining. ACM Comput. Surveys (CSUR) 45(1), 12 (2012)
Hung, N.Q.V., Jeung, H., Aberer, K.: An evaluation of model-based approaches to sensor data compression. IEEE Trans. Knowl. Data Eng. 25(11), 2434–2447 (2013)
Keogh, E., Kasetty, S.: On the need for time series data mining benchmarks: a survey and empirical demonstration. Data Mining Knowl. Discov. 7(4), 349–371 (2003)
Faloutsos, C., Ranganathan, M., Manolopoulos, Y.: Fast Subsequence Matching in Time-Series Databases, vol. 23. ACM, New York (1994)
Chan, K.P., Fu, A.W.C.: Efficient time series matching by wavelets. In: 1999 Proceedings of 15th International Conference on Data Engineering, pp. 126–133. IEEE (1999)
Shahabi, C., Tian, X., Zhao, W.: Tsa-tree: A wavelet-based approach to improve the efficiency of multi-level surprise and trend queries on time-series data. In: Proceedings of 12th International Conference on Scientific and Statistical Database Management, pp. 55–68. IEEE (2000)
Lin, J., Keogh, E., Li, W., Lonardi, S.: Experiencing SAX: a novel symbolic representation of time series. Data Mining Knowl. Discov. 15(2), 107 (2007)
Shieh, J., Keogh, E.: i sax: indexing and mining terabyte sized time series. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 623–631. ACM (2008)
Yi, B.K., Faloutsos, C.: Fast time sequence indexing for arbitrary lp norms. In: VLDB (2000)
Keogh, E., Chakrabarti, K., Pazzani, M., Mehrotra, S.: Dimensionality reduction for fast similarity search in large time series databases. Knowl. Inf. Syst. 3(3), 263–286 (2001)
Lazaridis, I., Mehrotra, S.: Capturing sensor-generated time series with quality guarantees. In: Proceedings 19th International Conference on Data Engineering (Cat. No.03CH37405), pp. 429–440 (2003)
Keogh, E., Chakrabarti, K., Pazzani, M., Mehrotra, S.: Locally adaptive dimensionality reduction for indexing large time series databases. In: Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data, SIGMOD 2001, pp. 151–162. ACM, New York, NY, USA (2001)
Buragohain, C., Shrivastava, N., Suri, S.: Space efficient streaming algorithms for the maximum error histogram. In: 2007 IEEE 23rd International Conference on Data Engineering, pp. 1026–1035 (2007)
Chen, Q., Chen, L., Lian, X., Liu, Y., Yu., J.X.: Indexable pla for efficient similarity search. In: Proceedings of the 33rd International Conference on Very Large Data Bases, VLDB 2007, pp. 435–446. VLDB Endowment (2007)
Keogh, E., Chu, S., Hart, D., Pazzani, M.: An online algorithm for segmenting time series. In: Proceedings 2001 IEEE International Conference on Data Mining, pp. 289–296 (2001)
Elmeleegy, H., Elmagarmid, A.K., Cecchet, E., Aref, W.G., Zwaenepoel, W.: Online piece-wise linear approximation of numerical streams with precision guarantees. Proc. VLDB Endow. 2(1), 145–156 (2009)
Ni, J., Ravishankar, C.V.: Indexing spatio-temporal trajectories with efficient polynomial approximations. IEEE Trans. Knowl. Data Eng. 19(5), 663–678 (2007)
Cai, Y., Ng, R.: Indexing spatio-temporal trajectories with chebyshev polynomials. In: Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data, SIGMOD 2004, pp. 599–610. ACM New York, USA (2004)
Acknowledgment
This research is partially supported by the Australian Research Council (Grant No.DP170101172) and the Queensland Government (Grant No.AQRF12516).
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Ruan, B., Hua, W., Zhang, R., Zhou, X. (2017). Decomposition-Based Approximation of Time Series Data with Max-Error Guarantees. In: Huang, Z., Xiao, X., Cao, X. (eds) Databases Theory and Applications. ADC 2017. Lecture Notes in Computer Science(), vol 10538. Springer, Cham. https://doi.org/10.1007/978-3-319-68155-9_6
Download citation
DOI: https://doi.org/10.1007/978-3-319-68155-9_6
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-68154-2
Online ISBN: 978-3-319-68155-9
eBook Packages: Computer ScienceComputer Science (R0)