skip to main content
research-article

Fast Euclidean OPTICS with Bounded Precision in Low Dimensional Space

Published: 27 May 2018 Publication History

Abstract

OPTICS is a popular method for visualizing multidimensional clusters. All the existing implementations of this method have a time complexity of $O(n^2)$ --- where n is the size of the input dataset --- and thus, may not be suitable for datasets of large volumes. This paper alleviates the problem by resorting to approximation with guarantees. The main result is a new algorithm that runs in $O(n łog n)$ time under any fixed dimensionality, and computes a visualization that has provably small discrepancies from that of OPTICS. As a side product, our algorithm gives an index structure that occupies linear space, and supports the cluster group-by query with near-optimal cost. The quality of the cluster visualizations produced by our techniques and the efficiency of the proposed algorithms are demonstrated with an empirical evaluation on real data.

References

[1]
Elke Achtert, Christian Böhm, and Peer Kröger . 2006. DeLi-Clu: Boosting Robustness, Completeness, Usability, and Efficiency of Hierarchical Clustering by a Closest Pair Ranking Proceedings of Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD). 119--128.
[2]
Arne Andersson, Torben Hagerup, Stefan Nilsson, and Rajeev Raman . 1998. Sorting in Linear Time? Journal of Computer and System Sciences (JCSS), Vol. 57, 1 (1998), 74--93.
[3]
Mihael Ankerst, Markus M. Breunig, Hans-Peter Kriegel, and Jörg Sander . 1999. OPTICS: Ordering Points To Identify the Clustering Structure Proceedings of ACM Management of Data (SIGMOD). 49--60.
[4]
Sunil Arya and Timothy M. Chan . 2014. Better ε-Dependencies for Offline Approximate Nearest Neighbor Search, Euclidean Minimum Spanning Trees, and ε-Kernels Proceedings of Symposium on Computational Geometry (SoCG). 416.
[5]
Markus M. Breunig, Hans-Peter Kriegel, and Jörg Sander . 2000. Fast Hierarchical Clustering Based on Compressed Data and OPTICS Proceedings of European Conference on Principles of Data Mining and Knowledge Discovery (PKDD). 232--242.
[6]
Paul B. Callahan and S. Rao Kosaraju . 1993. Faster Algorithms for Some Geometric Graph Problems in Higher Dimensions Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms (SODA). 291--300.
[7]
Paul B. Callahan and S. Rao Kosaraju . 1995. A Decomposition of Multidimensional Point Sets with Applications to k-Nearest-Neighbors and n-Body Potential Fields. Journal of the ACM (JACM) Vol. 42, 1 (1995), 67--90.
[8]
Ricardo J. G. B. Campello, Davoud Moulavi, Arthur Zimek, and Jörg Sander . 2015. Hierarchical Density Estimates for Data Clustering, Visualization, and Outlier Detection. ACM Transactions on Knowledge Discovery from Data (TKDD), Vol. 10, 1 (2015), 5:1--5:51.
[9]
Mark de Berg, Ade Gunawan, and Marcel Roeloffzen . 2017. Faster DB-scan and HDB-scan in Low-Dimensional Euclidean Spaces. CoRR Vol. abs/1702.08607 (2017).
[10]
Martin Ester, Hans-Peter Kriegel, Jorg Sander, and Xiaowei Xu . 1996. A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise Proceedings of ACM Knowledge Discovery and Data Mining (SIGKDD). 226--231.
[11]
Johannes Fischer and Volker Heun . 2006. Theoretical and Practical Improvements on the RMQ-Problem, with Applications to LCA and LCE. In Proceedings of Annual Symposium on Combinatorial Pattern Matching (CPM). 36--48.
[12]
Jordi Fonollosa, Sadique Sheik, Ramon Huerta, and Santiago Marco . 2015. Reservoir computing compensates slow response of chemosensor arrays exposed to fast varying gas concentrations in continuous monitoring. Sensors and Actuators B: Chemical Vol. 215 (2015), 618--629.
[13]
Junhao Gan and Yufei Tao . 2017 a. Dynamic Density Based Clustering. In Proceedings of ACM Management of Data (SIGMOD). 1493--1507.
[14]
Junhao Gan and Yufei Tao . 2017 b. On the Hardness and Approximation of Euclidean DBSCAN. ACM Transactions on Database Systems (TODS), Vol. 42, 3 (2017), 14:1--14:45.
[15]
Ramón Huerta, Thiago Mosqueiro, Jordi Fonollosa, Nikolai F. Rulkov, and Irene Rodr'ıguez-Luján . 2016. Online Decorrelation of Humidity and Temperature in Chemical Sensors for Continuous Monitoring. CoRR Vol. abs/1608.01719 (2016).
[16]
David R. Karger, Philip N. Klein, and Robert Endre Tarjan . 1995. A Randomized Linear-Time Algorithm to Find Minimum Spanning Trees. Journal of the ACM (JACM) Vol. 42, 2 (1995), 321--328.
[17]
Valerie King . 1997. A Simpler Minimum Spanning Tree Verification Algorithm. Algorithmica, Vol. 18, 2 (1997), 263--270.
[18]
David G. Kirkpatrick and Stefan Reisch . 1984. Upper Bounds for Sorting Integers on Random Access Machines. Theoretical Computer Science Vol. 28 (1984), 263--276.
[19]
Md. Mostofa Ali Patwary, Diana Palsetia, Ankit Agrawal, Wei-keng Liao, Fredrik Manne, and Alok N. Choudhary . 2013. Scalable parallel OPTICS data clustering using graph algorithmic techniques Proceedings of International Conference on High Performance Computing, Networking, Storage and Analysis. 49:1--49:12.
[20]
Johannes Schneider and Michail Vlachos . 2017. Scalable density-based clustering with quality guarantees using random projections. Data Min. Knowl. Discov. Vol. 31, 4 (2017), 972--1005.
[21]
Aniko Vagner . 2016. The GridOPTICS clustering algorithm. Intell. Data Anal., Vol. 20, 5 (2016), 1061--1084.
[22]
Pravin M. Vaidya . 1989. An O(n log n) Algorithm for the All-nearest.Neighbors Problem. Discrete & Computational Geometry Vol. 4 (1989), 101--115.

Cited By

View all
  • (2023)FINEX: A Fast Index for Exact & Flexible Density-Based ClusteringProceedings of the ACM on Management of Data10.1145/35889251:1(1-25)Online publication date: 30-May-2023
  • (2022)Scalable and Accurate Density-Peaks Clustering on Fully Dynamic Data2022 IEEE International Conference on Big Data (Big Data)10.1109/BigData55660.2022.10020690(445-454)Online publication date: 17-Dec-2022
  • (2021)Dynamic Structural Clustering on GraphsProceedings of the 2021 International Conference on Management of Data10.1145/3448016.3452828(1491-1503)Online publication date: 9-Jun-2021
  • Show More Cited By

Index Terms

  1. Fast Euclidean OPTICS with Bounded Precision in Low Dimensional Space

    Recommendations

    Comments

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    SIGMOD '18: Proceedings of the 2018 International Conference on Management of Data
    May 2018
    1874 pages
    ISBN:9781450347037
    DOI:10.1145/3183713
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 27 May 2018

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. density-based clustering
    2. optics
    3. visualizations

    Qualifiers

    • Research-article

    Funding Sources

    Conference

    SIGMOD/PODS '18
    Sponsor:

    Acceptance Rates

    SIGMOD '18 Paper Acceptance Rate 90 of 461 submissions, 20%;
    Overall Acceptance Rate 785 of 4,003 submissions, 20%

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)14
    • Downloads (Last 6 weeks)4
    Reflects downloads up to 23 Oct 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2023)FINEX: A Fast Index for Exact & Flexible Density-Based ClusteringProceedings of the ACM on Management of Data10.1145/35889251:1(1-25)Online publication date: 30-May-2023
    • (2022)Scalable and Accurate Density-Peaks Clustering on Fully Dynamic Data2022 IEEE International Conference on Big Data (Big Data)10.1109/BigData55660.2022.10020690(445-454)Online publication date: 17-Dec-2022
    • (2021)Dynamic Structural Clustering on GraphsProceedings of the 2021 International Conference on Management of Data10.1145/3448016.3452828(1491-1503)Online publication date: 9-Jun-2021
    • (2020)Adaptive Density-Based Spatial Clustering for Massive Data AnalysisIEEE Access10.1109/ACCESS.2020.29694408(23346-23358)Online publication date: 2020
    • (2020)“Show Me the Crowds!” Revealing Cluster Structures Through AMTICSData Science and Engineering10.1007/s41019-020-00137-x5:4(360-374)Online publication date: 11-Aug-2020

    View Options

    Get Access

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media