research-article
Open access

Minimizing Congestion for Balanced Dominators

Published: 14 August 2022

Abstract

A primary challenge in metagenomics is reconstructing individual microbial genomes from the mixture of short fragments created by sequencing. Recent work leverages the sparsity of the assembly graph to find r-dominating sets which enable rapid approximate queries through a dominator-centric graph partition. In this paper, we consider two problems related to reducing uncertainty and improving scalability in this setting.
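The r-domination condition can be illustrated with a short sketch. This is not the authors' code. The adjacency-dict representation and the function names `distance_to_dominators` and `is_r_dominating` are hypothetical. The sketch runs one multi-source breadth-first search from all dominators at once and checks that every vertex lies within distance r of some dominator.

```python
from collections import deque


def distance_to_dominators(adj, dominators):
    """Multi-source BFS: distance from each reachable vertex to its nearest dominator."""
    dist = {d: 0 for d in dominators}
    queue = deque(dominators)
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def is_r_dominating(adj, dominators, r):
    """True if every vertex is within distance r of at least one dominator."""
    dist = distance_to_dominators(adj, dominators)
    return all(v in dist and dist[v] <= r for v in adj)


# Example: the path 0-1-2-3-4 as an adjacency dict.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(is_r_dominating(path, {2}, 2), is_r_dominating(path, {2}, 1))
```

On the path 0-1-2-3-4, the set {2} is 2-dominating but not 1-dominating, because vertices 0 and 4 are at distance 2 from vertex 2.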
First, we observe that nodes with multiple closest dominators necessitate arbitrary tie-breaking in the existing pipeline. To address this, we propose finding sparse dominating sets, which minimize this effect via a new congestion parameter. We prove minimizing congestion is NP-hard, and give an O(√(Δ^r)) approximation algorithm, where Δ is the max degree.
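The tie-breaking issue can be made concrete with a small sketch. For illustration only, it assumes that the congestion of a vertex is the number of dominators within distance r of it; the paper's formal definition may differ. The helper names `bfs_within` and `dominator_profile` are hypothetical. The sketch runs one depth-limited BFS per dominator to record every dominator within distance r. A vertex whose set of closest dominators has more than one element is one the existing pipeline would have to resolve arbitrarily.

```python
from collections import deque


def bfs_within(adj, source, r):
    """Distances from source to every vertex at distance at most r."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == r:
            continue  # do not expand beyond radius r
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def dominator_profile(adj, dominators, r):
    """Map each vertex to (congestion, set of closest dominators).

    Assumption for illustration: congestion = number of dominators within
    distance r of the vertex.
    """
    reach = {v: {} for v in adj}  # vertex -> {dominator: distance}
    for d in dominators:
        for v, k in bfs_within(adj, d, r).items():
            reach[v][d] = k
    profile = {}
    for v, found in reach.items():
        if not found:
            raise ValueError(f"vertex {v!r} has no dominator within distance {r}")
        best = min(found.values())
        closest = {d for d, k in found.items() if k == best}
        profile[v] = (len(found), closest)
    return profile


path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
profile = dominator_profile(path, {0, 4}, r=2)
ties = [v for v, (_, closest) in profile.items() if len(closest) > 1]
print(profile[2], ties)
```

On the path 0-1-2-3-4 with dominators {0, 4} and r = 2, vertex 2 is at distance 2 from both dominators. It therefore has congestion 2 and two closest dominators. Every other vertex has a unique closest dominator.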
To improve scalability, the graph should be partitioned into uniformly sized pieces, subject to placing each vertex in the same piece as a closest dominator. This leads to balanced neighborhood partitioning: given an r-dominating set, find a partition into connected subgraphs whose sizes are as uniform as possible, such that each vertex is co-assigned with some closest dominator. Using the variance of piece sizes to measure uniformity, we show this problem is NP-hard if and only if r > 1. We design and analyze several algorithms, including a polynomial-time approach which is exact when r = 1 (and heuristic otherwise).
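The partitioning constraint and the variance objective can be made concrete with a hypothetical greedy heuristic. It is not the polynomial-time algorithm from the paper, and the function names are illustrative. Vertices are processed in BFS layers by distance to the nearest dominator. A vertex may only join the piece of a neighbor that is one layer closer. This keeps every piece connected and guarantees that the assigned dominator is a closest one. Ties go to the currently smallest piece. Uniformity is then scored as the population variance of piece sizes, (1/|D|) Σ (|P_d| − n/|D|)², using Python's `statistics.pvariance`. Whether the paper uses population or sample variance is an assumption here.

```python
from collections import deque
from statistics import pvariance


def greedy_neighborhood_partition(adj, dominators):
    """Hypothetical greedy heuristic for a neighborhood partition.

    Every vertex is assigned to a closest dominator and every piece stays
    connected. Ties go to the dominator whose piece is currently smallest
    (labels must be comparable for the final tie-break). Vertices not
    reachable from any dominator are left unassigned.
    """
    # Multi-source BFS: dist[v] = distance to the nearest dominator; `order`
    # lists vertices in non-decreasing distance.
    dist = {d: 0 for d in dominators}
    order = []
    queue = deque(dominators)
    while queue:
        u = queue.popleft()
        order.append(u)
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)

    owner = {d: d for d in dominators}  # each dominator anchors its own piece
    size = {d: 1 for d in dominators}
    for v in order:
        if v in owner:
            continue
        # Neighbors one layer closer were processed earlier in BFS order;
        # joining one of their pieces keeps the piece connected and puts v
        # at distance dist[v] from its dominator, i.e. a closest one.
        candidates = {owner[u] for u in adj[v] if dist[u] == dist[v] - 1}
        best = min(candidates, key=lambda d: (size[d], d))
        owner[v] = best
        size[best] += 1
    return owner, size


def piece_size_variance(size):
    """Population variance of piece sizes (lower means more balanced)."""
    return pvariance(list(size.values()))


# Example: dominators 0 and 1; vertex 4 is adjacent to both.
adj = {0: [2, 3, 4], 1: [4], 2: [0], 3: [0], 4: [0, 1]}
owner, size = greedy_neighborhood_partition(adj, [0, 1])
print(size, piece_size_variance(size))
```

In the example graph, vertex 4 is adjacent to both dominators. Sending it to the smaller piece yields sizes 3 and 2 (variance 0.25). Always breaking ties toward dominator 0 would instead yield sizes 4 and 1 (variance 2.25).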
We complement our theoretical results with computational experiments on a corpus of real-world networks showing that sparse dominating sets lead to more balanced neighborhood partitionings. Further, on the metagenome fHuSB1, our approach maintains high query containment and similarity while reducing piece size variance.

Supplemental Material

MP4 File
A primary challenge in metagenomics is reconstructing microbial genomes from the mixture of short fragments. Recent work uses r-dominating sets of the assembly graph to enable rapid queries through a dominator-centric graph partition. In this talk, we consider two problems related to reducing uncertainty and improving scalability in this setting. First, we observe that nodes with multiple closest dominators necessitate arbitrary tie-breaking. As such, we consider finding "sparse" dominating sets which minimize this effect via a new "congestion" parameter. To improve scalability, the graph should be partitioned evenly. This leads to "balanced neighborhood partitioning": given an r-dominating set, find a partition into pieces so that each vertex is co-assigned with some closest dominator and piece sizes are as uniform as possible. We complement our theoretical results with experiments on real-world networks showing sparse dominating sets lead to more balanced neighborhood partitionings.

References

[1]
Noga Alon, Dana Moshkovitz, and Shmuel Safra. 2006. Algorithmic construction of sets for k-restrictions. ACM Transactions on Algorithms (TALG), Vol. 2, 2 (2006), 153--177.
[2]
David A. Bader, Henning Meyerhenke, Peter Sanders, and Dorothea Wagner. 2013. Graph partitioning and graph clustering. Vol. 588. American Mathematical Society, Providence, RI.
[3]
C. Titus Brown, Dominik Moritz, Michael P. O'Brien, Felix Reidl, Taylor Reiter, and Blair D. Sullivan. 2020. Exploring neighborhoods in large metagenome assembly graphs using spacegraphcats reveals hidden sequence diversity. Genome Biology, Vol. 21, 1 (06 Jul 2020), 164.
[4]
Rayan Chikhi, Antoine Limasset, and Paul Medvedev. 2016. Compacting de Bruijn graphs from sequencing data quickly and in low memory. Bioinformatics, Vol. 32, 12 (June 2016), i201--i208.
[5]
Zdeněk Dvořák. 2013. Constant-factor approximation of the domination number in sparse graphs. European Journal of Combinatorics, Vol. 34, 5 (2013), 833--840.
[6]
Carl Einarson and Felix Reidl. 2020. A General Kernelization Technique for Domination and Independence Problems in Sparse Classes. In 15th International Symposium on Parameterized and Exact Computation (IPEC 2020) (Leibniz International Proceedings in Informatics (LIPIcs), Vol. 180). 11:1--11:15.
[7]
Lester Randolph Ford and Delbert R. Fulkerson. 1956. Maximal flow through a network. Canadian Journal of Mathematics, Vol. 8 (1956), 399--404.
[8]
Michael R. Garey and David S. Johnson. 1979. Computers and Intractability. Vol. 174. Freeman, San Francisco.
[9]
Ping Hu, Lauren Tom, Andrea Singh, Brian C. Thomas, Brett J. Baker, Yvette M. Piceno, Gary L. Andersen, Jillian F. Banfield, and Nicole Dubilier. 2016. Genome-Resolved Metagenomic Analysis Reveals Roles for Candidate Phyla and Other Microbial Community Members in Biogeochemical Transformations in Oil Reservoirs. mBio, Vol. 7, 1 (2016), e01669-15.
[10]
Lars Jaffke, O-joung Kwon, Torstein J.F. Strømme, and Jan Arne Telle. 2019. Mim-width III. Graph powers and generalized distance domination problems. Theoretical Computer Science, Vol. 796 (2019), 216--236.
[11]
Jan Kratochvíl. 1994. Regular codes in regular graphs are difficult. Discrete Mathematics, Vol. 133, 1 (1994), 191--205.
[12]
Marilynn Livingston and Q. Stout. 1997. Perfect Dominating Sets. Congressus Numerantium, Vol. 79 (Aug. 1997).
[13]
Yosuke Mizutani, Annie Staker, and Blair D. Sullivan. 2022. Accompanying source code. https://github.com/TheoryInPractice/sparsedomsets.
[14]
Minh Nguyen, Minh Hà, Diep Nguyen, and The Tran. 2020. Solving the k-dominating set problem on very large-scale networks. Computational Social Networks, Vol. 7 (July 2020).
[15]
Christopher Quince, Alan W. Walker, Jared T. Simpson, Nicholas J. Loman, and Nicola Segata. 2017. Shotgun metagenomics, from sampling to analysis. Nature Biotechnology, Vol. 35, 9 (2017), 833--844.
[16]
Ran Raz and Shmuel Safra. 1997. A sub-constant error-probability low-degree test, and a sub-constant error-probability PCP characterization of NP. In Proceedings of the Twenty-Ninth Annual ACM Symposium on Theory of Computing. 475--484.
[17]
Ryan A. Rossi and Nesreen K. Ahmed. 2015. The Network Data Repository with Interactive Graph Analytics and Visualization. In AAAI.
[18]
Peter J. Slater. 1976. R-Domination in Graphs. J. ACM, Vol. 23, 3 (July 1976), 446--450.
[19]
Zhong Wang, Wei Wang, Joon-Mo Kim, Bhavani Thuraisingham, and Weili Wu. 2012. PTAS for the minimum weighted dominating set in growth bounded graphs. Journal of Global Optimization, Vol. 54, 3 (2012), 641--648.


    Published In

    ACM Conferences
    KDD '22: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
    August 2022
    5033 pages
    ISBN:9781450393850
    DOI:10.1145/3534678
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. congestion
    2. dominating sets
    3. graph partitioning
    4. metagenomics

    Qualifiers

    • Research-article

    Conference

    KDD '22

    Acceptance Rates

    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

    Article Metrics

    • Total Citations: 0
    • Total Downloads: 249
    • Downloads (Last 12 months): 75
    • Downloads (Last 6 weeks): 13
    Reflects downloads up to 21 Oct 2024
