Abstract.
Sensor devices and embedded processors are becoming widespread, especially in measurement/monitoring applications. Their limited resources (CPU, memory and/or communication bandwidth, and power) pose some interesting challenges. We need concise, expressive models to represent the important features of the data and that lend themselves to efficient estimation. In particular, under these severe constraints, we want models and estimation methods that (a) require little memory and a single pass over the data, (b) can adapt and handle arbitrary periodic components, and (c) can deal with various types of noise. We propose \ensuremath{\mathrm{AWSOM}} (Arbitrary Window Stream mOdeling Method), which allows sensors in remote or hostile environments to efficiently and effectively discover interesting patterns and trends. This can be done automatically, i.e., with no prior inspection of the data or any user intervention and expert tuning before or during data gathering. Our algorithms require limited resources and can thus be incorporated into sensors - possibly alongside a distributed query processing engine [10,6,27]. Updates are performed in constant time with respect to stream size using logarithmic space. Existing forecasting methods (SARIMA, GARCH, etc.) and “traditional” Fourier and wavelet analysis fall short on one or more of these requirements. To the best of our knowledge, \ensuremath{\mathrm{AWSOM}} is the first framework that combines all of the above characteristics. Experiments on real and synthetic datasets demonstrate that \ensuremath{\mathrm{AWSOM}} discovers meaningful patterns over long time periods. Thus, the patterns can also be used to make long-range forecasts, which are notoriously difficult to perform. In fact, \ensuremath{\mathrm{AWSOM}} outperforms manually set up autoregressive models, both in terms of long-term pattern detection and modeling and by at least 10 x in resource consumption.
Similar content being viewed by others
References
Akay M (1997) Time frequency and wavelets in biomedical signal processing. Wiley, New York
Arasu A, Babcock B, Babu S, McAlister J, Widom J (2002) Characterizing memory requirements for queries over continuous data streams. In: Proc PODS
Babcock B, Olston C (2003) Distributed top-k monitoring. In: Proc SIGMOD
Beran J (1994) Statistics for long-memory processes. Chapman & Hall, London
Bollerslev T (1986) Generalized autoregressive conditional heteroskedasticity. J Econometr 31:307-327
Bonnet P, Gehrke JE, Seshadri P (2001) Towards sensor database systems. In: Proc MDM
Brockwell PJ, Davis RA (1991) Time series: theory and methods. Springer series in statistics, 2nd edn. Springer, Berlin Heidelberg New York
Bulut A, Singh AK (2003) SWAT: Hierarchical stream summarization in large networks. In: Proc 19th ICDE
Carley LR, Ganger GR, Nagle D (2000) Mems-based integrated-circuit mass-storage systems. Commun ACM 43(11):72-80
Carney D, Cetintemel U, Cherniack M, Convey C, Lee S, Seidman G, Stonebraker M, Tatbul N, Zdonik SB (2002) Monitoring streams - a new class of data management applications. In: Proc VLDB
Chen Y, Dong G, Han J, Wah BW, Wang J (2002) Multi-dimensional regression analysis of time-series data streams. In: Proc VLDB
Considine J, Li F, Kollios G, Byers JW (2004) Approximate aggregation techniques for sensor databases. In: Proc ICDE
Das G, Lin L-I, Mannila H, Renganathan G, Smyth P (1998) Rule discovery from time series. In: Proc KDD
Datar M, Gionis A, Indyk P, Motwani R (2002) Maintaining stream statistics over sliding windows. In: Proc SODA
DeGroot MH, Schervish MJ (2002) Probability and statistics, 3rd edn. Addison-Wesley, Reading, MA
Dobra A, Garofalakis MN, Gehrke J, Rastogi R (2002) Processing complex aggregate queries over data streams. In: Proc SIGMOD
Ergün F, Muthukrishnan S, Sahinalp SC (2004) Sublinear methods for detecting periodic trends in datastreams. In: Proc LATIN
C. Faloutsos (1996) Searching multimedia databases by content. Kluwer, Dordrecht
Garofalakis MN, Gibbons PB (2002) Wavelet synopses with error guarantees. In: Proc SIGMOD
Gehrke J, Korn F, Srivastava D (2001) On computing correlated aggregates over continual data streams. In: Proc SIGMOD
Gencay R, Selcuk F, Whitcher B (2001) An introduction to wavelets and other filtering methods in finance and economics. Academic, New York
Gilbert AC, Kotidis Y, Muthukrishnan S, Strauss M (2001) Surfing wavelets on streams: one-pass summaries for approximate aggregate queries. In: Proc VLDB
Guha S, Koudas N (2002) Approximating a data stream for querying and estimation: algorithms and performance evaluation. In: Proc ICDE
Hill J, Szewczyk R, Woo A, Hollar S, Culler D, Pister K (2000) System architecture directions for networked sensors. In: Proc ASPLOS-IX
Indyk P, Koudas N, Muthukrishnan S (2000) Identifying representative trends in massive time series data sets using sketches. In: Proc VLDB
Leland W, Taqqu M, Willinger W, Wilson D (1994) On the self-similar nature of ethernet traffic. IEEE Trans Network 2(1):1-15
Madden SR, Shah MA, Hellerstein JM, Raman V (2002) Continuously adaptive continuous queries over streams. In: Proc SIGMOD
Olston C, Jiang J, Widom J (2003) Adaptive filters for continuous queries over distributed data streams. In: Proc SIGMOD
Palpanas T, Vlachos M, Keogh EJ, Gunopulos D, Truppel W (2004) Online amnesic approximation of streaming time series. In: Proc ICDE
Percival DB, Walden AT (2000) Wavelet methods for time series analysis. Cambridge University Press, Cambridge, UK
Riedel E, Faloutsos C, Ganger GR, Nagle D (2000) Data mining on an OLTP system (nearly) for free. In Proc SIGMOD
Tao Y, Faloutsos C, Papadias D, Liu B (2004) Prediction and indexing of moving objects with unknown motion patterns. In: Proc SIGMOD
Weigend AS, Gerschenfeld NA (1994) Time series prediction: forecasting the future and understanding the past. Addison-Wesley, Reading, MA
Yi B-K, Sidiropoulos N, Johnson T, Jagadish H, Faloutsos C, Biliris A (2000) Online data mining for co-evolving time sequences. Proc ICDE
Young P (1984) Recursive estimation and time-series analysis: an introduction. Springer, Berlin Heidelberg New York
Zhang D, Gunopulos D, Tsotras VJ, Seeger B (2002) Temporal aggregation over data streams using multiple granularities. In: Proc EDBT
Zhu Y, Shasha D (2002) Statstream: statistical monitoring of thousands of data streams in real time. In: Proc VLDB
Zhu Y, Shasha D (2003) Efficient elastic burst detection in data streams. In: Proc KDD
Zuidwijk R, de Zeeuw P (1998) Fast algorithm for directional time-scale analysis using wavelets. In: Proc SPIE, Wavelet Applications in Signal and Image Processing VI, vol 3458
Author information
Authors and Affiliations
Corresponding author
Additional information
Received: 2 January 2004, Accepted: 23 March 2004, Published online: 12 August 2004
Edited by: S. Abitebou
Anthony Brockwell: This material is based upon work supported by the National Science Foundation under Grants Nos. DMS-9819950 and IIS-0083148.
Christos Faloutsos: This material is based upon work supported by the National Science Foundation under Grants Nos. IIS-9817496, IIS-9988876, IIS-0083148, IIS-0113089, IIS-0209107, IIS-0205224, INT-0318547, SE NSOR-0329549, EF-0331657, and IIS-0326322, by the Pennsylvania Infrastructure Technology Alliance (PITA) Grant No. 22-901-0001, and by the Defense Advanced Research Projects Agency under Contract No. N66001-00-1-8936. Additional funding was provided by donations from Intel and by a gift from Northrop-Grumman Corporation. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation or other funding parties.
Rights and permissions
About this article
Cite this article
Papadimitriou, S., Brockwell, A. & Faloutsos, C. Adaptive, unsupervised stream mining. VLDB 13, 222–239 (2004). https://doi.org/10.1007/s00778-004-0130-8
Issue Date:
DOI: https://doi.org/10.1007/s00778-004-0130-8