Skip to main content

Showing 1–16 of 16 results for author: Miettinen, P

  1. arXiv:2406.04148  [pdf, other

    cs.LG cs.DB

    Fast Redescription Mining Using Locality-Sensitive Hashing

    Authors: Maiju Karjalainen, Esther Galbrun, Pauli Miettinen

    Abstract: Redescription mining is a data analysis technique that has found applications in diverse fields. The most used redescription mining approaches involve two phases: finding matching pairs among data attributes and extending the pairs. This process is relatively efficient when the number of attributes remains limited and when the attributes are Boolean, but becomes almost intractable when the data co… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: 20 pages, 4 figures, to appear at ECML-PKDD 2024

  2. arXiv:2307.07396  [pdf, other

    cs.LG

    Visualizing Overlapping Biclusterings and Boolean Matrix Factorizations

    Authors: Thibault Marette, Pauli Miettinen, Stefan Neumann

    Abstract: Finding (bi-)clusters in bipartite graphs is a popular data analysis approach. Analysts typically want to visualize the clusters, which is simple as long as the clusters are disjoint. However, many modern algorithms find overlapping clusters, making visualization more complicated. In this paper, we study the problem of visualizing \emph{a given clustering} of overlapping clusters in bipartite grap… ▽ More

    Submitted 14 July, 2023; originally announced July 2023.

    Comments: 17 pages, 7 figures, to be published in ECML PKDD 2023

  3. arXiv:2212.06630  [pdf, other

    cs.DB cs.CR

    Differentially Private Tree-Based Redescription Mining

    Authors: Matej Mihelčić, Pauli Miettinen

    Abstract: Differential privacy provides a strong form of privacy and allows preserving most of the original characteristics of the dataset. Utilizing these benefits requires one to design specific differentially private data analysis algorithms. In this work, we present three tree-based algorithms for mining redescriptions while preserving differential privacy. Redescription mining is an exploratory data an… ▽ More

    Submitted 28 February, 2023; v1 submitted 13 December, 2022; originally announced December 2022.

    Comments: 50 pages, 15 figures

  4. arXiv:2206.01483  [pdf, other

    cs.LG

    Finding Rule-Interpretable Non-Negative Data Representation

    Authors: Matej Mihelčić, Pauli Miettinen

    Abstract: Non-negative Matrix Factorization (NMF) is an intensively used technique for obtaining parts-based, lower dimensional and non-negative representation of non-negative data. It is a popular method in different research fields. Scientists performing research in the fields of biology, medicine and pharmacy often prefer NMF over other dimensionality reduction approaches (such as PCA) because the non-ne… ▽ More

    Submitted 3 June, 2022; originally announced June 2022.

    Comments: 13 pages, 3 figures. Journal manuscript

  5. arXiv:2012.03138  [pdf, other

    cs.LG cs.AI cs.DS

    Biclustering and Boolean Matrix Factorization in Data Streams

    Authors: Stefan Neumann, Pauli Miettinen

    Abstract: We study the clustering of bipartite graphs and Boolean matrix factorization in data streams. We consider a streaming setting in which the vertices from the left side of the graph arrive one by one together with all of their incident edges. We provide an algorithm that, after one pass over the stream, recovers the set of clusters on the right side of the graph using sublinear space; to the best of… ▽ More

    Submitted 5 December, 2020; originally announced December 2020.

    Comments: This technical report is the slightly extended version of a paper [34] which appeared at VLDB'20

  6. arXiv:2012.03127  [pdf, ps, other

    cs.LG cs.AI

    Recent Developments in Boolean Matrix Factorization

    Authors: Pauli Miettinen, Stefan Neumann

    Abstract: The goal of Boolean Matrix Factorization (BMF) is to approximate a given binary matrix as the product of two low-rank binary factor matrices, where the product of the factor matrices is computed under the Boolean algebra. While the problem is computationally hard, it is also attractive because the binary nature of the factor matrices makes them highly interpretable. In the last decade, BMF has rec… ▽ More

    Submitted 5 December, 2020; originally announced December 2020.

    Comments: This technical report is the slightly extended version of a survey which appeared at IJCAI'20

  7. Boolean matrix factorization meets consecutive ones property

    Authors: Nikolaj Tatti, Pauli Miettinen

    Abstract: Boolean matrix factorization is a natural and a popular technique for summarizing binary matrices. In this paper, we study a problem of Boolean matrix factorization where we additionally require that the factor matrices have consecutive ones property (OBMF). A major application of this optimization problem comes from graph visualization: standard techniques for visualizing graphs are circular or l… ▽ More

    Submitted 17 January, 2019; originally announced January 2019.

    Comments: 13 pages, 6 figures. To appear in 2019 SIAM International Conference on Data Mining (SDM19). For the associated source code, see https://cs.uef.fi/~pauli/bmf/ordered_bmf/

    Journal ref: Proc. 2019 SIAM International Conference on Data Mining (SDM19), 2019, 729-737

  8. Hybrid ASP-based Approach to Pattern Mining

    Authors: Sergey Paramonov, Daria Stepanova, Pauli Miettinen

    Abstract: Detecting small sets of relevant patterns from a given dataset is a central challenge in data mining. The relevance of a pattern is based on user-provided criteria; typically, all patterns that satisfy certain criteria are considered relevant. Rule-based languages like Answer Set Programming (ASP) seem well-suited for specifying such criteria in a form of constraints. Although progress has been ma… ▽ More

    Submitted 22 August, 2018; originally announced August 2018.

    Comments: 29 pages, 7 figures, 5 tables

    Journal ref: Theory and Practice of Logic Programming 19 (2019) 505-535

  9. arXiv:1801.06136  [pdf, other

    cs.LG

    Latitude: A Model for Mixed Linear-Tropical Matrix Factorization

    Authors: Sanjar Karaev, James Hook, Pauli Miettinen

    Abstract: Nonnegative matrix factorization (NMF) is one of the most frequently-used matrix factorization models in data analysis. A significant reason to the popularity of NMF is its interpretability and the `parts of whole' interpretation of its components. Recently, max-times, or subtropical, matrix factorization (SMF) has been introduced as an alternative model with equally interpretable `winner takes it… ▽ More

    Submitted 18 January, 2018; originally announced January 2018.

    Comments: 14 pages, 6 figures. To appear in 2018 SIAM International Conference on Data Mining (SDM '18). For the source code, see https://people.mpi-inf.mpg.de/~pmiettin/linear-tropical/

  10. arXiv:1709.00900  [pdf, ps, other

    cs.CC

    Reductions for Frequency-Based Data Mining Problems

    Authors: Stefan Neumann, Pauli Miettinen

    Abstract: Studying the computational complexity of problems is one of the - if not the - fundamental questions in computer science. Yet, surprisingly little is known about the computational complexity of many central problems in data mining. In this paper we study frequency-based problems and propose a new type of reduction that allows us to compare the complexities of the maximal frequent pattern mining pr… ▽ More

    Submitted 4 September, 2017; originally announced September 2017.

    Comments: This is an extended version of a paper of the same title to appear in the Proceedings of the 17th IEEE International Conference on Data Mining (ICDM'17)

  11. arXiv:1707.08872  [pdf, other

    cs.LG

    Algorithms for Approximate Subtropical Matrix Factorization

    Authors: Sanjar Karaev, Pauli Miettinen

    Abstract: Matrix factorization methods are important tools in data mining and analysis. They can be used for many tasks, ranging from dimensionality reduction to visualization. In this paper we concentrate on the use of matrix factorizations for finding patterns from the data. Rather than using the standard algebra -- and the summation of the rank-1 components to build the approximation of the original matr… ▽ More

    Submitted 19 July, 2017; originally announced July 2017.

    Comments: 40 pages, 9 figures. For the associated source code, see http://people.mpi-inf.mpg.de/~pmiettin/tropical/

  12. arXiv:1609.05034  [pdf, other

    cs.DM

    What You Will Gain By Rounding: Theory and Algorithms for Rounding Rank

    Authors: Stefan Neumann, Rainer Gemulla, Pauli Miettinen

    Abstract: When factorizing binary matrices, we often have to make a choice between using expensive combinatorial methods that retain the discrete nature of the data and using continuous methods that can be more efficient but destroy the discrete structure. Alternatively, we can first compute a continuous factorization and subsequently apply a rounding procedure to obtain a discrete representation. But what… ▽ More

    Submitted 6 October, 2016; v1 submitted 16 September, 2016; originally announced September 2016.

    Comments: 14 pages, 7 figures. For associated source code, see http://dws.informatik.uni-mannheim.de/en/resources/software/rounding-rank/ . This is an extended version of a paper accepted for publication in the proceedings of the 2016 IEEE International Conference on Data Mining (ICDM)

  13. arXiv:1602.04650  [pdf, other

    cs.SI physics.soc-ph

    Hyperbolae Are No Hyperbole: Modelling Communities That Are Not Cliques

    Authors: Saskia Metzler, Stephan Günnemann, Pauli Miettinen

    Abstract: Cliques are frequently used to model communities: a community is a set of nodes where each pair is equally likely to be connected. But studying real-world communities reveals that they have more structure than that. In particular, the nodes can be ordered in such a way that (almost) all edges in the community lie below a hyperbola. In this paper we present three new models for communities that cap… ▽ More

    Submitted 4 October, 2016; v1 submitted 15 February, 2016; originally announced February 2016.

    Comments: 31 pages, 18 figures. This is an extended version of a paper of the same title accepted for publication in the proceedings of the 2016 IEEE International Conference on Data Mining (ICDM). For source code, see http://people.mpi-inf.mpg.de/~pmiettin/hybobo/

  14. On Defining SPARQL with Boolean Tensor Algebra

    Authors: Saskia Metzler, Pauli Miettinen

    Abstract: The Resource Description Framework (RDF) represents information as subject-predicate-object triples. These triples are commonly interpreted as a directed labelled graph. We propose an alternative approach, interpreting the data as a 3-way Boolean tensor. We show how SPARQL queries - the standard queries for RDF - can be expressed as elementary operations in Boolean algebra, giving us a complete re… ▽ More

    Submitted 1 March, 2015; originally announced March 2015.

  15. Clustering Boolean Tensors

    Authors: Saskia Metzler, Pauli Miettinen

    Abstract: Tensor factorizations are computationally hard problems, and in particular, are often significantly harder than their matrix counterparts. In case of Boolean tensor factorizations -- where the input tensor and all the factors are required to be binary and we use Boolean algebra -- much of that hardness comes from the possibility of overlapping components. Yet, in many applications we are perfectly… ▽ More

    Submitted 4 January, 2015; originally announced January 2015.

  16. arXiv:1310.4843  [pdf, other

    cs.DS

    Scalable Boolean Tensor Factorizations using Random Walks

    Authors: Dóra Erdős, Pauli Miettinen

    Abstract: Tensors are becoming increasingly common in data mining, and consequently, tensor factorizations are becoming more and more important tools for data miners. When the data is binary, it is natural to ask if we can factorize it into binary factors while simultaneously making sure that the reconstructed tensor is still binary. Such factorizations, called Boolean tensor factorizations, can provide imp… ▽ More

    Submitted 17 October, 2013; originally announced October 2013.