subscribe to arXiv mailings

A Law of Large Numbers for SIR on the Stochastic Block Model: A Proof via Herd Immunity

Authors: Christian Borgs, Karissa Huang, Christian Ikeokwu

Abstract: In this paper, we study the dynamics of the susceptible-infected-recovered (SIR) model on a network with community structure, namely the stochastic block model (SBM). As usual, the SIR model is a stochastic model for an epidemic where infected vertices infect susceptible neighbors at some rate $η$ and recover at rate $γ$, and the SBM is a random graph model where vertices are partitioned into a fi… ▽ More In this paper, we study the dynamics of the susceptible-infected-recovered (SIR) model on a network with community structure, namely the stochastic block model (SBM). As usual, the SIR model is a stochastic model for an epidemic where infected vertices infect susceptible neighbors at some rate $η$ and recover at rate $γ$, and the SBM is a random graph model where vertices are partitioned into a finite number of communities. The connection probability between two vertices depends on their community affiliation, here scaled so that the average degrees have a finite limit as the network grows. We prove laws of large numbers (LLN) for the epidemic's trajectory to a system of ordinary differential equations over any time horizon (finite or infinite), including in particular a LLN for the final size of the infection. Our proofs rely on two main ingredients: (i) a new coupling of the SIR epidemic and the randomness of the SBM, revealing a vector-valued random variable that drives the epidemic (related to what is usually called the ``force of the infection'' via a linear transformation), and (ii) a novel technique for analyzing the limiting behavior of the infinite time horizon for the infection, using the fact that once the infection passes the herd immunity threshold it dies out quickly and has a negligible impact on the overall size of the infection. △ Less

Submitted 9 October, 2024; originally announced October 2024.

arXiv:2409.10584 [pdf, other]

Manifold-Constrained Nucleus-Level Denoising Diffusion Model for Structure-Based Drug Design

Authors: Shengchao Liu, Divin Yan, Weitao Du, Weiyang Liu, Zhuoxinran Li, Hongyu Guo, Christian Borgs, Jennifer Chayes, Anima Anandkumar

Abstract: Artificial intelligence models have shown great potential in structure-based drug design, generating ligands with high binding affinities. However, existing models have often overlooked a crucial physical constraint: atoms must maintain a minimum pairwise distance to avoid separation violation, a phenomenon governed by the balance of attractive and repulsive forces. To mitigate such separation vio… ▽ More Artificial intelligence models have shown great potential in structure-based drug design, generating ligands with high binding affinities. However, existing models have often overlooked a crucial physical constraint: atoms must maintain a minimum pairwise distance to avoid separation violation, a phenomenon governed by the balance of attractive and repulsive forces. To mitigate such separation violations, we propose NucleusDiff. It models the interactions between atomic nuclei and their surrounding electron clouds by enforcing the distance constraint between the nuclei and manifolds. We quantitatively evaluate NucleusDiff using the CrossDocked2020 dataset and a COVID-19 therapeutic target, demonstrating that NucleusDiff reduces violation rate by up to 100.00% and enhances binding affinity by up to 22.16%, surpassing state-of-the-art models for structure-based drug design. We also provide qualitative analysis through manifold sampling, visually confirming the effectiveness of NucleusDiff in reducing separation violations and improving binding affinities. △ Less

Submitted 30 September, 2024; v1 submitted 16 September, 2024; originally announced September 2024.

arXiv:2405.02128 [pdf]

Single and Multi-Hop Question-Answering Datasets for Reticular Chemistry with GPT-4-Turbo

Authors: Nakul Rampal, Kaiyu Wang, Matthew Burigana, Lingxiang Hou, Juri Al-Johani, Anna Sackmann, Hanan S. Murayshid, Walaa Abdullah Al-Sumari, Arwa M. Al-Abdulkarim, Nahla Eid Al-Hazmi, Majed O. Al-Awad, Christian Borgs, Jennifer T. Chayes, Omar M. Yaghi

Abstract: The rapid advancement in artificial intelligence and natural language processing has led to the development of large-scale datasets aimed at benchmarking the performance of machine learning models. Herein, we introduce 'RetChemQA,' a comprehensive benchmark dataset designed to evaluate the capabilities of such models in the domain of reticular chemistry. This dataset includes both single-hop and m… ▽ More The rapid advancement in artificial intelligence and natural language processing has led to the development of large-scale datasets aimed at benchmarking the performance of machine learning models. Herein, we introduce 'RetChemQA,' a comprehensive benchmark dataset designed to evaluate the capabilities of such models in the domain of reticular chemistry. This dataset includes both single-hop and multi-hop question-answer pairs, encompassing approximately 45,000 Q&As for each type. The questions have been extracted from an extensive corpus of literature containing about 2,530 research papers from publishers including NAS, ACS, RSC, Elsevier, and Nature Publishing Group, among others. The dataset has been generated using OpenAI's GPT-4 Turbo, a cutting-edge model known for its exceptional language understanding and generation capabilities. In addition to the Q&A dataset, we also release a dataset of synthesis conditions extracted from the corpus of literature used in this study. The aim of RetChemQA is to provide a robust platform for the development and evaluation of advanced machine learning algorithms, particularly for the reticular chemistry community. The dataset is structured to reflect the complexities and nuances of real-world scientific discourse, thereby enabling nuanced performance assessments across a variety of tasks. The dataset is available at the following link: https://github.com/nakulrampal/RetChemQA △ Less

Submitted 3 May, 2024; originally announced May 2024.

arXiv:2401.15122 [pdf, other]

A Multi-Grained Symmetric Differential Equation Model for Learning Protein-Ligand Binding Dynamics

Authors: Shengchao Liu, Weitao Du, Yanjing Li, Zhuoxinran Li, Vignesh Bhethanabotla, Nakul Rampal, Omar Yaghi, Christian Borgs, Anima Anandkumar, Hongyu Guo, Jennifer Chayes

Abstract: In drug discovery, molecular dynamics (MD) simulation for protein-ligand binding provides a powerful tool for predicting binding affinities, estimating transport properties, and exploring pocket sites. There has been a long history of improving the efficiency of MD simulations through better numerical methods and, more recently, by utilizing machine learning (ML) methods. Yet, challenges remain, s… ▽ More In drug discovery, molecular dynamics (MD) simulation for protein-ligand binding provides a powerful tool for predicting binding affinities, estimating transport properties, and exploring pocket sites. There has been a long history of improving the efficiency of MD simulations through better numerical methods and, more recently, by utilizing machine learning (ML) methods. Yet, challenges remain, such as accurate modeling of extended-timescale simulations. To address this issue, we propose NeuralMD, the first ML surrogate that can facilitate numerical MD and provide accurate simulations in protein-ligand binding. We propose a principled approach that incorporates a novel physics-informed multi-grained group symmetric framework. Specifically, we propose (1) a BindingNet model that satisfies group symmetry using vector frames and captures the multi-level protein-ligand interactions, and (2) an augmented neural differential equation solver that learns the trajectory under Newtonian mechanics. For the experiment, we design ten single-trajectory and three multi-trajectory binding simulation tasks. We show the efficiency and effectiveness of NeuralMD, with a 2000$\times$ speedup over standard numerical MD simulation and outperforming all other ML approaches by up to 80% under the stability metric. We further qualitatively show that NeuralMD reaches more stable binding predictions compared to other machine learning methods. △ Less

Submitted 1 February, 2024; v1 submitted 26 January, 2024; originally announced January 2024.

arXiv:2312.05468 [pdf]

Image and Data Mining in Reticular Chemistry Using GPT-4V

Authors: Zhiling Zheng, Zhiguo He, Omar Khattab, Nakul Rampal, Matei A. Zaharia, Christian Borgs, Jennifer T. Chayes, Omar M. Yaghi

Abstract: The integration of artificial intelligence into scientific research has reached a new pinnacle with GPT-4V, a large language model featuring enhanced vision capabilities, accessible through ChatGPT or an API. This study demonstrates the remarkable ability of GPT-4V to navigate and obtain complex data for metal-organic frameworks, especially from graphical sources. Our approach involved an automate… ▽ More The integration of artificial intelligence into scientific research has reached a new pinnacle with GPT-4V, a large language model featuring enhanced vision capabilities, accessible through ChatGPT or an API. This study demonstrates the remarkable ability of GPT-4V to navigate and obtain complex data for metal-organic frameworks, especially from graphical sources. Our approach involved an automated process of converting 346 scholarly articles into 6240 images, which represents a benchmark dataset in this task, followed by deploying GPT-4V to categorize and analyze these images using natural language prompts. This methodology enabled GPT-4V to accurately identify and interpret key plots integral to MOF characterization, such as nitrogen isotherms, PXRD patterns, and TGA curves, among others, with accuracy and recall above 93%. The model's proficiency in extracting critical information from these plots not only underscores its capability in data mining but also highlights its potential in aiding the creation of comprehensive digital databases for reticular chemistry. In addition, the extracted nitrogen isotherm data from the selected literature allowed for a comparison between theoretical and experimental porosity values for over 200 compounds, highlighting certain discrepancies and underscoring the importance of integrating computational and experimental data. This work highlights the potential of AI in accelerating scientific discovery and innovation, bridging the gap between computational tools and experimental research, and paving the way for more efficient, inclusive, and comprehensive scientific inquiry. △ Less

Submitted 9 December, 2023; originally announced December 2023.

Comments: 36 pages, 24 figures

arXiv:2306.14915 [pdf]

doi 10.1002/anie.202311983

A GPT-4 Reticular Chemist for Guiding MOF Discovery

Authors: Zhiling Zheng, Zichao Rong, Nakul Rampal, Christian Borgs, Jennifer T. Chayes, Omar M. Yaghi

Abstract: We present a new framework integrating the AI model GPT-4 into the iterative process of reticular chemistry experimentation, leveraging a cooperative workflow of interaction between AI and a human researcher. This GPT-4 Reticular Chemist is an integrated system composed of three phases. Each of these utilizes GPT-4 in various capacities, wherein GPT-4 provides detailed instructions for chemical ex… ▽ More We present a new framework integrating the AI model GPT-4 into the iterative process of reticular chemistry experimentation, leveraging a cooperative workflow of interaction between AI and a human researcher. This GPT-4 Reticular Chemist is an integrated system composed of three phases. Each of these utilizes GPT-4 in various capacities, wherein GPT-4 provides detailed instructions for chemical experimentation and the human provides feedback on the experimental outcomes, including both success and failures, for the in-context learning of AI in the next iteration. This iterative human-AI interaction enabled GPT-4 to learn from the outcomes, much like an experienced chemist, by a prompt-learning strategy. Importantly, the system is based on natural language for both development and operation, eliminating the need for coding skills, and thus, make it accessible to all chemists. Our collaboration with GPT-4 Reticular Chemist guided the discovery of an isoreticular series of MOFs, with each synthesis fine-tuned through iterative feedback and expert suggestions. This workflow presents a potential for broader applications in scientific research by harnessing the capability of large language models like GPT-4 to enhance the feasibility and efficiency of research activities. △ Less

Submitted 3 October, 2023; v1 submitted 20 June, 2023; originally announced June 2023.

Comments: 173 pages (9-page manuscript and 164 pages of supporting information) Submitted to Angewandte Chemie International Edition

Journal ref: Angew. Chem. Int. Ed. 2023, e202311983

arXiv:2306.11296 [pdf]

doi 10.1021/jacs.3c05819

ChatGPT Chemistry Assistant for Text Mining and Prediction of MOF Synthesis

Authors: Zhiling Zheng, Oufan Zhang, Christian Borgs, Jennifer T. Chayes, Omar M. Yaghi

Abstract: We use prompt engineering to guide ChatGPT in the automation of text mining of metal-organic frameworks (MOFs) synthesis conditions from diverse formats and styles of the scientific literature. This effectively mitigates ChatGPT's tendency to hallucinate information -- an issue that previously made the use of Large Language Models (LLMs) in scientific fields challenging. Our approach involves the… ▽ More We use prompt engineering to guide ChatGPT in the automation of text mining of metal-organic frameworks (MOFs) synthesis conditions from diverse formats and styles of the scientific literature. This effectively mitigates ChatGPT's tendency to hallucinate information -- an issue that previously made the use of Large Language Models (LLMs) in scientific fields challenging. Our approach involves the development of a workflow implementing three different processes for text mining, programmed by ChatGPT itself. All of them enable parsing, searching, filtering, classification, summarization, and data unification with different tradeoffs between labor, speed, and accuracy. We deploy this system to extract 26,257 distinct synthesis parameters pertaining to approximately 800 MOFs sourced from peer-reviewed research articles. This process incorporates our ChemPrompt Engineering strategy to instruct ChatGPT in text mining, resulting in impressive precision, recall, and F1 scores of 90-99%. Furthermore, with the dataset built by text mining, we constructed a machine-learning model with over 86% accuracy in predicting MOF experimental crystallization outcomes and preliminarily identifying important factors in MOF crystallization. We also developed a reliable data-grounded MOF chatbot to answer questions on chemical reactions and synthesis procedures. Given that the process of using ChatGPT reliably mines and tabulates diverse MOF synthesis information in a unified format, while using only narrative language requiring no coding expertise, we anticipate that our ChatGPT Chemistry Assistant will be very useful across various other chemistry sub-disciplines. △ Less

Submitted 19 July, 2023; v1 submitted 20 June, 2023; originally announced June 2023.

Comments: Published on Journal of the American Chemical Society (2023); 102 pages (18-page manuscript, 84 pages of supporting information)

Journal ref: J. Am. Chem. Soc. 2023, 145, 32, 18048-18062

arXiv:2306.09375 [pdf, other]

Symmetry-Informed Geometric Representation for Molecules, Proteins, and Crystalline Materials

Authors: Shengchao Liu, Weitao Du, Yanjing Li, Zhuoxinran Li, Zhiling Zheng, Chenru Duan, Zhiming Ma, Omar Yaghi, Anima Anandkumar, Christian Borgs, Jennifer Chayes, Hongyu Guo, Jian Tang

Abstract: Artificial intelligence for scientific discovery has recently generated significant interest within the machine learning and scientific communities, particularly in the domains of chemistry, biology, and material discovery. For these scientific problems, molecules serve as the fundamental building blocks, and machine learning has emerged as a highly effective and powerful tool for modeling their g… ▽ More Artificial intelligence for scientific discovery has recently generated significant interest within the machine learning and scientific communities, particularly in the domains of chemistry, biology, and material discovery. For these scientific problems, molecules serve as the fundamental building blocks, and machine learning has emerged as a highly effective and powerful tool for modeling their geometric structures. Nevertheless, due to the rapidly evolving process of the field and the knowledge gap between science (e.g., physics, chemistry, & biology) and machine learning communities, a benchmarking study on geometrical representation for such data has not been conducted. To address such an issue, in this paper, we first provide a unified view of the current symmetry-informed geometric methods, classifying them into three main categories: invariance, equivariance with spherical frame basis, and equivariance with vector frame basis. Then we propose a platform, coined Geom3D, which enables benchmarking the effectiveness of geometric strategies. Geom3D contains 16 advanced symmetry-informed geometric representation models and 14 geometric pretraining methods over 46 diverse datasets, including small molecules, proteins, and crystalline materials. We hope that Geom3D can, on the one hand, eliminate barriers for machine learning researchers interested in exploring scientific problems; and, on the other hand, provide valuable guidance for researchers in computational chemistry, structural biology, and materials science, aiding in the informed selection of representation techniques for specific applications. △ Less

Submitted 15 June, 2023; originally announced June 2023.

arXiv:2305.14537 [pdf, other]

Disincentivizing Polarization in Social Networks

Authors: Christian Borgs, Jennifer Chayes, Christian Ikeokwu, Ellen Vitercik

Abstract: On social networks, algorithmic personalization drives users into filter bubbles where they rarely see content that deviates from their interests. We present a model for content curation and personalization that avoids filter bubbles, along with algorithmic guarantees and nearly matching lower bounds. In our model, the platform interacts with $n$ users over $T$ timesteps, choosing content for each… ▽ More On social networks, algorithmic personalization drives users into filter bubbles where they rarely see content that deviates from their interests. We present a model for content curation and personalization that avoids filter bubbles, along with algorithmic guarantees and nearly matching lower bounds. In our model, the platform interacts with $n$ users over $T$ timesteps, choosing content for each user from $k$ categories. The platform receives stochastic rewards as in a multi-arm bandit. To avoid filter bubbles, we draw on the intuition that if some users are shown some category of content, then all users should see at least a small amount of that content. We first analyze a naive formalization of this intuition and show it has unintended consequences: it leads to ``tyranny of the majority'' with the burden of diversification borne disproportionately by those with minority interests. This leads us to our model which distributes this burden more equitably. We require that the probability any user is shown a particular type of content is at least $γ$ times the average probability all users are shown that type of content. Full personalization corresponds to $γ= 0$ and complete homogenization corresponds to $γ= 1$; hence, $γ$ encodes a hard cap on the level of personalization. We also analyze additional formulations where the platform can exceed its cap but pays a penalty proportional to its constraint violation. We provide algorithmic guarantees for optimizing recommendations subject to these constraints. These include nearly matching upper and lower bounds for the entire range of $γ\in [0,1]$ showing that the reward of a multi-agent variant of UCB is nearly optimal. Using real-world preference data, we empirically verify that under our model, users share the burden of diversification with only minor utility loss under our constraints. △ Less

Submitted 23 May, 2023; originally announced May 2023.

arXiv:2304.06170 [pdf, ps, other]

Locality via Global Ties: Stability of the 2-Core Against Misspecification

Authors: Christian Borgs, Geng Zhao

Abstract: For many random graph models, the analysis of a related birth process suggests local sampling algorithms for the size of, e.g., the giant connected component, the $k$-core, the size and probability of an epidemic outbreak, etc. In this paper, we study the question of when these algorithms are robust against misspecification of the graph model, for the special case of the 2-core. We show that, for… ▽ More For many random graph models, the analysis of a related birth process suggests local sampling algorithms for the size of, e.g., the giant connected component, the $k$-core, the size and probability of an epidemic outbreak, etc. In this paper, we study the question of when these algorithms are robust against misspecification of the graph model, for the special case of the 2-core. We show that, for locally converging graphs with bounded average degrees, under a weak notion of expansion, a local sampling algorithm provides robust estimates for the size of both the 2-core and its largest component. Our weak notion of expansion generalizes the classical definition of expansion, while holding for many well-studied random graph models. Our method involves a two-step sprinkling argument. In the first step, we use sprinkling to establish the existence of a non-empty $2$-core inside the giant, while in the second, we use this non-empty $2$-core as seed for a second sprinkling argument to establish that the giant contains a linear sized $2$-core. The second step is based on a novel coloring scheme for the vertices in the tree-part. Our algorithmic results follow from the structural properties for the $2$-core established in the course of our sprinkling arguments. The run-time of our local algorithm is constant independent of the graph size, with the value of the constant depending on the desired asymptotic accuracy $ε$. But given the existential nature of local limits, our arguments do not give any bound on the functional dependence of this constant on $ε$, nor do they give a bound on how large the graph has to be for the asymptotic additive error bound $ε$ to hold. △ Less

Submitted 12 April, 2023; originally announced April 2023.

arXiv:2205.12803 [pdf, other]

doi 10.1073/pnas.2208975119

Estimating Total Treatment Effect in Randomized Experiments with Unknown Network Structure

Authors: Christina Lee Yu, Edoardo M Airoldi, Christian Borgs, Jennifer T Chayes

Abstract: Randomized experiments are widely used to estimate the causal effects of a proposed treatment in many areas of science, from medicine and healthcare to the physical and biological sciences, from the social sciences to engineering, to public policy and to the technology industry at large. Here, we consider situations where classical methods for estimating the total treatment effect on a target popu… ▽ More Randomized experiments are widely used to estimate the causal effects of a proposed treatment in many areas of science, from medicine and healthcare to the physical and biological sciences, from the social sciences to engineering, to public policy and to the technology industry at large. Here, we consider situations where classical methods for estimating the total treatment effect on a target population are considerably biased due to confounding network effects, i.e., the fact that the treatment of an individual may impact their neighbors' outcomes, an issue referred to as network interference or as non-individualized treatment response. A key challenge in these situations, is that the network is often unknown, and difficult, or costly, to measure. In this paper, we characterize the limitations in estimating the total treatment effect without knowledge of the network that drives interference, assuming a potential outcomes model with heterogeneous additive network effects. This model encompasses a broad class of network interference sources, including spillover, peer effects, and contagion. Within this framework, we show that, surprisingly, given access to average historical baseline measurements prior to the experiment, we can develop a simple estimator and efficient randomized design that outputs an unbiased estimate with low variance. Our solution does not require knowledge of the underlying network structure, and it comes with statistical guarantees for a broad class of models. We believe our results are poised to impact current randomized experimentation strategies due to its ease of interpretation and implementation, alongside its provable theoretical insights under heterogeneous network effects. △ Less

Submitted 24 September, 2022; v1 submitted 25 May, 2022; originally announced May 2022.

arXiv:2110.08961 [pdf, ps, other]

Algorithms Using Local Graph Features to Predict Epidemics

Authors: Yeganeh Alimohammadi, Christian Borgs, Amin Saberi

Abstract: We study a simple model of epidemics where an infected node transmits the infection to its neighbors independently with probability $p$. This is also known as the independent cascade or Susceptible-Infected-Recovered (SIR) model with fixed recovery time. The size of an outbreak in this model is closely related to that of the giant connected component in ``edge percolation'', where each edge of the… ▽ More We study a simple model of epidemics where an infected node transmits the infection to its neighbors independently with probability $p$. This is also known as the independent cascade or Susceptible-Infected-Recovered (SIR) model with fixed recovery time. The size of an outbreak in this model is closely related to that of the giant connected component in ``edge percolation'', where each edge of the graph is kept independently with probability $p$, studied for a large class of networks including configuration model \cite{molloy2011critical} and preferential attachment \cite{bollobas2003,Riordan2005}. Even though these models capture the effects of degree inhomogeneity and the role of super-spreaders in the spread of an epidemic, they only consider graphs that are locally tree like i.e. have a few or no short cycles. Some generalizations of the configuration model were suggested to capture local communities, known as household models \cite{ball2009threshold}, or hierarchical configuration model \cite{Hofstad2015hierarchical}. Here, we ask a different question: what information is needed for general networks to predict the size of an outbreak? Is it possible to make predictions by accessing the distribution of small subgraphs (or motifs)? We answer the question in the affirmative for large-set expanders with local weak limits (also known as Benjamini-Schramm limits). In particular, we show that there is an algorithm which gives a $(1-ε)$ approximation of the probability and the final size of an outbreak by accessing a constant-size neighborhood of a constant number of nodes chosen uniformly at random. We also present corollaries of the theorem for the preferential attachment model, and study generalizations with household (or motif) structure. The latter was only known for the configuration model. △ Less

Submitted 17 October, 2021; originally announced October 2021.

arXiv:2109.08240 [pdf, other]

Strategic Ranking

Authors: Lydia T. Liu, Nikhil Garg, Christian Borgs

Abstract: Strategic classification studies the design of a classifier robust to the manipulation of input by strategic individuals. However, the existing literature does not consider the effect of competition among individuals as induced by the algorithm design. Motivated by constrained allocation settings such as college admissions, we introduce strategic ranking, in which the (designed) individual reward… ▽ More Strategic classification studies the design of a classifier robust to the manipulation of input by strategic individuals. However, the existing literature does not consider the effect of competition among individuals as induced by the algorithm design. Motivated by constrained allocation settings such as college admissions, we introduce strategic ranking, in which the (designed) individual reward depends on an applicant's post-effort rank in a measurement of interest. Our results illustrate how competition among applicants affects the resulting equilibria and model insights. We analyze how various ranking reward designs, belonging to a family of step functions, trade off applicant, school, and societal utility, as well as how ranking design counters inequities arising from disparate access to resources. In particular, we find that randomization in the reward design can mitigate two measures of disparate impact, welfare gap and access. △ Less

Submitted 21 February, 2022; v1 submitted 16 September, 2021; originally announced September 2021.

Comments: 30 pages. To appear in the conference proceedings of AISTATS 2022

arXiv:2103.09952 [pdf, ps, other]

Locality of Random Digraphs on Expanders

Authors: Yeganeh Alimohammadi, Christian Borgs, Amin Saberi

Abstract: We study random digraphs on sequences of expanders with bounded average degree {which converge locally in probability}. We prove that the threshold for the existence of a giant strongly connected component, as well as the asymptotic fraction of nodes with giant fan-in or nodes with giant fan-out are local, in the sense that they are the same for two sequences with the same local limit. The digraph… ▽ More We study random digraphs on sequences of expanders with bounded average degree {which converge locally in probability}. We prove that the threshold for the existence of a giant strongly connected component, as well as the asymptotic fraction of nodes with giant fan-in or nodes with giant fan-out are local, in the sense that they are the same for two sequences with the same local limit. The digraph has a bow-tie structure, with all but a vanishing fraction of nodes lying either in the unique strongly connected giant and its fan-in and fan-out, or in sets with small fan-in and small fan-out. All local quantities are expressed in terms of percolation on the limiting rooted graph, without any structural assumptions on the limit, allowing, in particular, for non tree-like graphs. {In the course of establishing these results, we generalize previous results on the locality of the size of the giant to expanders of bounded average degree with possibly non-tree like limits. We also show that regardless of the local convergence of a sequence, the uniqueness of the giant and convergence of its relative size for unoriented percolation imply the bow-tie structure for directed percolation.} An application of our methods shows that the critical threshold for bond percolation and random digraphs on preferential attachment graphs is $p_c=0$, with an infinite order phase transition at $p_c$. △ Less

Submitted 30 August, 2022; v1 submitted 17 March, 2021; originally announced March 2021.

Comments: Added the proof on the continuity of percolation function on the limit. Generalized the bow-tie structure theorem

MSC Class: 60K37; 05C48; 05C80; 05C20 (Primary) 90B15 (Secondary)

arXiv:2007.14508 [pdf, ps, other]

A large deviation principle for block models

Authors: Christian Borgs, Jennifer Chayes, Julia Gaudio, Samantha Petti, Subhabrata Sen

Abstract: We initiate a study of large deviations for block model random graphs in the dense regime. Following Chatterjee-Varadhan(2011), we establish an LDP for dense block models, viewed as random graphons. As an application of our result, we study upper tail large deviations for homomorphism densities of regular graphs. We identify the existence of a "symmetric" phase, where the graph, conditioned on the… ▽ More We initiate a study of large deviations for block model random graphs in the dense regime. Following Chatterjee-Varadhan(2011), we establish an LDP for dense block models, viewed as random graphons. As an application of our result, we study upper tail large deviations for homomorphism densities of regular graphs. We identify the existence of a "symmetric" phase, where the graph, conditioned on the rare event, looks like a block model with the same block sizes as the generating graphon. In specific examples, we also identify the existence of a "symmetry breaking" regime, where the conditional structure is not a block model with compatible dimensions. This identifies a "reentrant phase transition" phenomenon for this problem---analogous to one established for Erdos-Renyi random graphs (Chatterjee-Dey(2010), Chatterjee-Varadhan(2011)). Finally, extending the analysis of Lubetzky-Zhao(2015), we identify the precise boundary between the symmetry and symmetry breaking regime for homomorphism densities of regular graphs and the operator norm on Erdos-Renyi bipartite graphs. △ Less

Submitted 28 July, 2020; originally announced July 2020.

Comments: 60 pages, 8 figs

MSC Class: 60F10; 05C80; 60C05

arXiv:1910.04123 [pdf, other]

The Disparate Equilibria of Algorithmic Decision Making when Individuals Invest Rationally

Authors: Lydia T. Liu, Ashia Wilson, Nika Haghtalab, Adam Tauman Kalai, Christian Borgs, Jennifer Chayes

Abstract: The long-term impact of algorithmic decision making is shaped by the dynamics between the deployed decision rule and individuals' response. Focusing on settings where each individual desires a positive classification---including many important applications such as hiring and school admissions, we study a dynamic learning setting where individuals invest in a positive outcome based on their group's… ▽ More The long-term impact of algorithmic decision making is shaped by the dynamics between the deployed decision rule and individuals' response. Focusing on settings where each individual desires a positive classification---including many important applications such as hiring and school admissions, we study a dynamic learning setting where individuals invest in a positive outcome based on their group's expected gain and the decision rule is updated to maximize institutional benefit. By characterizing the equilibria of these dynamics, we show that natural challenges to desirable long-term outcomes arise due to heterogeneity across groups and the lack of realizability. We consider two interventions, decoupling the decision rule by group and subsidizing the cost of investment. We show that decoupling achieves optimal outcomes in the realizable case but has discrepant effects that may depend on the initial conditions otherwise. In contrast, subsidizing the cost of investment is shown to create better equilibria for the disadvantaged group even in the absence of realizability. △ Less

Submitted 4 October, 2019; originally announced October 2019.

Comments: 30 pages, 7 figures

arXiv:1909.09298 [pdf, ps, other]

Efficient sampling and counting algorithms for the Potts model on $\mathbb Z^d$ at all temperatures

Authors: Christian Borgs, Jennifer Chayes, Tyler Helmuth, Will Perkins, Prasad Tetali

Abstract: For $d \ge 2$ and all $q\geq q_{0}(d)$ we give an efficient algorithm to approximately sample from the $q$-state ferromagnetic Potts and random cluster models on finite tori $(\mathbb Z / n \mathbb Z )^d$ for any inverse temperature $β\geq 0$. This shows that the physical phase transition of the Potts model presents no algorithmic barrier to efficient sampling, and stands in contrast to Markov cha… ▽ More For $d \ge 2$ and all $q\geq q_{0}(d)$ we give an efficient algorithm to approximately sample from the $q$-state ferromagnetic Potts and random cluster models on finite tori $(\mathbb Z / n \mathbb Z )^d$ for any inverse temperature $β\geq 0$. This shows that the physical phase transition of the Potts model presents no algorithmic barrier to efficient sampling, and stands in contrast to Markov chain mixing time results: the Glauber dynamics mix slowly at and below the critical temperature, and the Swendsen--Wang dynamics mix slowly at the critical temperature. We also provide an efficient algorithm (an FPRAS) for approximating the partition functions of these models at all temperatures. Our algorithms are based on representing the random cluster model as a contour model using Pirogov--Sinai theory, and then computing an accurate approximation of the logarithm of the partition function by inductively truncating the resulting cluster expansion. The main innovation of our approach is an algorithmic treatment of unstable ground states, which is essential for our algorithms to apply to all inverse temperatures $β$. By treating unstable ground states our work gives a general template for converting probabilistic applications of Pirogov-Sinai theory to efficient algorithms. △ Less

Submitted 8 August, 2022; v1 submitted 19 September, 2019; originally announced September 2019.

arXiv:1907.01613 [pdf, ps, other]

A correction to Kallenberg's theorem for jointly exchangeable random measures

Authors: Christian Borgs, Jennifer T. Chayes, Souvik Dhara, Subhabrata Sen

Abstract: Kallenberg (2005) provided a necessary and sufficient condition for the local finiteness of a jointly exchangeable random measure on $\R_+^2$. Here we note an additional condition that was missing in Kallenberg's theorem, but was implicitly used in the proof. We also provide a counter-example when the additional condition does not hold. Kallenberg (2005) provided a necessary and sufficient condition for the local finiteness of a jointly exchangeable random measure on $\R_+^2$. Here we note an additional condition that was missing in Kallenberg's theorem, but was implicitly used in the proof. We also provide a counter-example when the additional condition does not hold. △ Less

Submitted 2 July, 2019; originally announced July 2019.

Comments: 9 pages

arXiv:1907.01605 [pdf, other]

Limits of Sparse Configuration Models and Beyond: Graphexes and Multi-Graphexes

Authors: Christian Borgs, Jennifer T. Chayes, Souvik Dhara, Subhabrata Sen

Abstract: We investigate structural properties of large, sparse random graphs through the lens of "sampling convergence" (Borgs et. al. (2017)). Sampling convergence generalizes left convergence to sparse graphs, and describes the limit in terms of a "graphex". We introduce a notion of sampling convergence for sequences of multigraphs, and establish the graphex limit for the configuration model, a preferent… ▽ More We investigate structural properties of large, sparse random graphs through the lens of "sampling convergence" (Borgs et. al. (2017)). Sampling convergence generalizes left convergence to sparse graphs, and describes the limit in terms of a "graphex". We introduce a notion of sampling convergence for sequences of multigraphs, and establish the graphex limit for the configuration model, a preferential attachment model, the generalized random graph, and a bipartite variant of the configuration model. The results for the configuration model, preferential attachment model and bipartite configuration model provide necessary and sufficient conditions for these random graph models to converge. The limit for the configuration model and the preferential attachment model is an augmented version of an exchangeable random graph model introduced by Caron and Fox (2017). △ Less

Submitted 2 July, 2019; originally announced July 2019.

Comments: 60 pages, 1 figure

arXiv:1904.05233 [pdf, other]

What's in a Name? Reducing Bias in Bios without Access to Protected Attributes

Authors: Alexey Romanov, Maria De-Arteaga, Hanna Wallach, Jennifer Chayes, Christian Borgs, Alexandra Chouldechova, Sahin Geyik, Krishnaram Kenthapadi, Anna Rumshisky, Adam Tauman Kalai

Abstract: There is a growing body of work that proposes methods for mitigating bias in machine learning systems. These methods typically rely on access to protected attributes such as race, gender, or age. However, this raises two significant challenges: (1) protected attributes may not be available or it may not be legal to use them, and (2) it is often desirable to simultaneously consider multiple protect… ▽ More There is a growing body of work that proposes methods for mitigating bias in machine learning systems. These methods typically rely on access to protected attributes such as race, gender, or age. However, this raises two significant challenges: (1) protected attributes may not be available or it may not be legal to use them, and (2) it is often desirable to simultaneously consider multiple protected attributes, as well as their intersections. In the context of mitigating bias in occupation classification, we propose a method for discouraging correlation between the predicted probability of an individual's true occupation and a word embedding of their name. This method leverages the societal biases that are encoded in word embeddings, eliminating the need for access to protected attributes. Crucially, it only requires access to individuals' names at training time and not at deployment time. We evaluate two variations of our proposed method using a large-scale dataset of online biographies. We find that both variations simultaneously reduce race and gender biases, with almost no reduction in the classifier's overall true positive rate. △ Less

Submitted 10 April, 2019; originally announced April 2019.

Comments: Accepted at NAACL 2019; Best Thematic Paper

arXiv:1901.09451 [pdf, other]

doi 10.1145/3287560.3287572

Bias in Bios: A Case Study of Semantic Representation Bias in a High-Stakes Setting

Authors: Maria De-Arteaga, Alexey Romanov, Hanna Wallach, Jennifer Chayes, Christian Borgs, Alexandra Chouldechova, Sahin Geyik, Krishnaram Kenthapadi, Adam Tauman Kalai

Abstract: We present a large-scale study of gender bias in occupation classification, a task where the use of machine learning may lead to negative outcomes on peoples' lives. We analyze the potential allocation harms that can result from semantic representation bias. To do so, we study the impact on occupation classification of including explicit gender indicators---such as first names and pronouns---in di… ▽ More We present a large-scale study of gender bias in occupation classification, a task where the use of machine learning may lead to negative outcomes on peoples' lives. We analyze the potential allocation harms that can result from semantic representation bias. To do so, we study the impact on occupation classification of including explicit gender indicators---such as first names and pronouns---in different semantic representations of online biographies. Additionally, we quantify the bias that remains when these indicators are "scrubbed," and describe proxy behavior that occurs in the absence of explicit gender indicators. As we demonstrate, differences in true positive rates between genders are correlated with existing gender imbalances in occupations, which may compound these imbalances. △ Less

Submitted 27 January, 2019; originally announced January 2019.

Comments: Accepted at ACM Conference on Fairness, Accountability, and Transparency (ACM FAT*), 2019

arXiv:1810.12518 [pdf, ps, other]

Private Algorithms Can Always Be Extended

Authors: Christian Borgs, Jennifer Chayes, Adam Smith, Ilias Zadik

Abstract: We consider the following fundamental question on $ε$-differential privacy. Consider an arbitrary $ε$-differentially private algorithm defined on a subset of the input space. Is it possible to extend it to an $ε'$-differentially private algorithm on the whole input space for some $ε'$ comparable with $ε$? In this note we answer affirmatively this question for $ε'=2ε$. Our result applies to every i… ▽ More We consider the following fundamental question on $ε$-differential privacy. Consider an arbitrary $ε$-differentially private algorithm defined on a subset of the input space. Is it possible to extend it to an $ε'$-differentially private algorithm on the whole input space for some $ε'$ comparable with $ε$? In this note we answer affirmatively this question for $ε'=2ε$. Our result applies to every input metric space and space of possible outputs. This result originally appeared in a recent paper by the authors [BCSZ18]. We present a self-contained version in this note, in the hopes that it will be broadly useful. △ Less

Submitted 31 October, 2018; v1 submitted 30 October, 2018; originally announced October 2018.

arXiv:1810.02183 [pdf, ps, other]

Revealing Network Structure, Confidentially: Improved Rates for Node-Private Graphon Estimation

Authors: Christian Borgs, Jennifer Chayes, Adam Smith, Ilias Zadik

Abstract: Motivated by growing concerns over ensuring privacy on social networks, we develop new algorithms and impossibility results for fitting complex statistical models to network data subject to rigorous privacy guarantees. We consider the so-called node-differentially private algorithms, which compute information about a graph or network while provably revealing almost no information about the presenc… ▽ More Motivated by growing concerns over ensuring privacy on social networks, we develop new algorithms and impossibility results for fitting complex statistical models to network data subject to rigorous privacy guarantees. We consider the so-called node-differentially private algorithms, which compute information about a graph or network while provably revealing almost no information about the presence or absence of a particular node in the graph. We provide new algorithms for node-differentially private estimation for a popular and expressive family of network models: stochastic block models and their generalization, graphons. Our algorithms improve on prior work, reducing their error quadratically and matching, in many regimes, the optimal nonprivate algorithm. We also show that for the simplest random graph models ($G(n,p)$ and $G(n,m)$), node-private algorithms can be qualitatively more accurate than for more complex models---converging at a rate of $\frac{1}{ε^2 n^{3}}$ instead of $\frac{1}{ε^2 n^2}$. This result uses a new extension lemma for differentially private algorithms that we hope will be broadly useful. △ Less

Submitted 4 October, 2018; originally announced October 2018.

arXiv:1804.03277 [pdf, other]

Identifiability for graphexes and the weak kernel metric

Authors: Christian Borgs, Jennifer T. Chayes, Henry Cohn, László Miklós Lovász

Abstract: In two recent papers by Veitch and Roy and by Borgs, Chayes, Cohn, and Holden, a new class of sparse random graph processes based on the concept of graphexes over $σ$-finite measure spaces has been introduced. In this paper, we introduce a metric for graphexes that generalizes the cut metric for the graphons of the dense theory of graph convergence. We show that a sequence of graphexes converges i… ▽ More In two recent papers by Veitch and Roy and by Borgs, Chayes, Cohn, and Holden, a new class of sparse random graph processes based on the concept of graphexes over $σ$-finite measure spaces has been introduced. In this paper, we introduce a metric for graphexes that generalizes the cut metric for the graphons of the dense theory of graph convergence. We show that a sequence of graphexes converges in this metric if and only if the sequence of graph processes generated by the graphexes converges in distribution. In the course of the proof, we establish a regularity lemma and determine which sets of graphexes are precompact under our metric. Finally, we establish an identifiability theorem, characterizing when two graphexes are equivalent in the sense that they lead to the same process of random graphs. △ Less

Submitted 9 April, 2018; originally announced April 2018.

Comments: 109 pages

arXiv:1712.00710 [pdf, ps, other]

Iterative Collaborative Filtering for Sparse Matrix Estimation

Authors: Christian Borgs, Jennifer Chayes, Devavrat Shah, Christina Lee Yu

Abstract: We consider sparse matrix estimation where the goal is to estimate an $n\times n$ matrix from noisy observations of a small subset of its entries. We analyze the estimation error of the popularly utilized collaborative filtering algorithm for the sparse regime. Specifically, we propose a novel iterative variant of the algorithm, adapted to handle the setting of sparse observations. We establish th… ▽ More We consider sparse matrix estimation where the goal is to estimate an $n\times n$ matrix from noisy observations of a small subset of its entries. We analyze the estimation error of the popularly utilized collaborative filtering algorithm for the sparse regime. Specifically, we propose a novel iterative variant of the algorithm, adapted to handle the setting of sparse observations. We establish that as long as the fraction of entries observed at random scale as $\frac{\log^{1+κ}(n)}{n}$ for any fixed $κ> 0$, the estimation error with respect to the $\max$-norm decays to $0$ as $n\to\infty$ assuming the underlying matrix of interest has constant rank $r$. Our result is robust to model mis-specification in that if the underlying matrix is approximately rank $r$, then the estimation error decays to the approximate error with respect to the $\max$-norm. In the process, we establish algorithm's ability to handle arbitrary bounded noise in the observations. △ Less

Submitted 10 September, 2021; v1 submitted 3 December, 2017; originally announced December 2017.

arXiv:1708.03237 [pdf, other]

doi 10.1214/18-AOP1320

Sampling perspectives on sparse exchangeable graphs

Authors: Christian Borgs, Jennifer T. Chayes, Henry Cohn, Victor Veitch

Abstract: Recent work has introduced sparse exchangeable graphs and the associated graphex framework, as a generalization of dense exchangeable graphs and the associated graphon framework. The development of this subject involves the interplay between the statistical modeling of network data, the theory of large graph limits, exchangeability, and network sampling. The purpose of the present paper is to clar… ▽ More Recent work has introduced sparse exchangeable graphs and the associated graphex framework, as a generalization of dense exchangeable graphs and the associated graphon framework. The development of this subject involves the interplay between the statistical modeling of network data, the theory of large graph limits, exchangeability, and network sampling. The purpose of the present paper is to clarify the relationships between these subjects by explaining each in terms of a certain natural sampling scheme associated with the graphex model. The first main technical contribution is the introduction of sampling convergence, a new notion of graph limit that generalizes left convergence so that it becomes meaningful for the sparse graph regime. The second main technical contribution is the demonstration that the (somewhat cryptic) notion of exchangeability underpinning the graphex framework is equivalent to a more natural probabilistic invariance expressed in terms of the sampling scheme. △ Less

Submitted 10 February, 2020; v1 submitted 10 August, 2017; originally announced August 2017.

Comments: 45 pages, 1 figure

Journal ref: Annals of Probability 47 (2019), no. 5, 2754-2800

arXiv:1706.01143 [pdf, other]

Graphons: A Nonparametric Method to Model, Estimate, and Design Algorithms for Massive Networks

Authors: Christian Borgs, Jennifer T. Chayes

Abstract: Many social and economic systems are naturally represented as networks, from off-line and on-line social networks, to bipartite networks, like Netflix and Amazon, between consumers and products. Graphons, developed as limits of graphs, form a natural, nonparametric method to describe and estimate large networks like Facebook and LinkedIn. Here we describe the development of the theory of graphons,… ▽ More Many social and economic systems are naturally represented as networks, from off-line and on-line social networks, to bipartite networks, like Netflix and Amazon, between consumers and products. Graphons, developed as limits of graphs, form a natural, nonparametric method to describe and estimate large networks like Facebook and LinkedIn. Here we describe the development of the theory of graphons, for both dense and sparse networks, over the last decade. We also review theorems showing that we can consistently estimate graphons from massive networks in a wide variety of models. Finally, we show how to use graphons to estimate missing links in a sparse network, which has applications from estimating social and information networks in development economics, to rigorously and efficiently doing collaborative filtering with applications to movie recommendations in Netflix and product suggestions in Amazon. △ Less

Submitted 4 June, 2017; originally announced June 2017.

Comments: 7 pages, 2 figures, invited keynote talk delivered by one of the authors (JTC) at the 18th ACM Conference on Economics and Computation (EC 17)

arXiv:1611.01838 [pdf, other]

Entropy-SGD: Biasing Gradient Descent Into Wide Valleys

Authors: Pratik Chaudhari, Anna Choromanska, Stefano Soatto, Yann LeCun, Carlo Baldassi, Christian Borgs, Jennifer Chayes, Levent Sagun, Riccardo Zecchina

Abstract: This paper proposes a new optimization algorithm called Entropy-SGD for training deep neural networks that is motivated by the local geometry of the energy landscape. Local extrema with low generalization error have a large proportion of almost-zero eigenvalues in the Hessian with very few positive or negative eigenvalues. We leverage upon this observation to construct a local-entropy-based object… ▽ More This paper proposes a new optimization algorithm called Entropy-SGD for training deep neural networks that is motivated by the local geometry of the energy landscape. Local extrema with low generalization error have a large proportion of almost-zero eigenvalues in the Hessian with very few positive or negative eigenvalues. We leverage upon this observation to construct a local-entropy-based objective function that favors well-generalizable solutions lying in large flat regions of the energy landscape, while avoiding poorly-generalizable solutions located in the sharp valleys. Conceptually, our algorithm resembles two nested loops of SGD where we use Langevin dynamics in the inner loop to compute the gradient of the local entropy before each update of the weights. We show that the new objective has a smoother energy landscape and show improved generalization over SGD using uniform stability, under certain assumptions. Our experiments on convolutional and recurrent networks demonstrate that Entropy-SGD compares favorably to state-of-the-art techniques in terms of generalization error and training time. △ Less

Submitted 21 April, 2017; v1 submitted 6 November, 2016; originally announced November 2016.

Comments: ICLR '17

arXiv:1605.06444 [pdf, other]

doi 10.1073/pnas.1608103113

Unreasonable Effectiveness of Learning Neural Networks: From Accessible States and Robust Ensembles to Basic Algorithmic Schemes

Authors: Carlo Baldassi, Christian Borgs, Jennifer Chayes, Alessandro Ingrosso, Carlo Lucibello, Luca Saglietti, Riccardo Zecchina

Abstract: In artificial neural networks, learning from data is a computationally demanding task in which a large number of connection weights are iteratively tuned through stochastic-gradient-based heuristic processes over a cost-function. It is not well understood how learning occurs in these systems, in particular how they avoid getting trapped in configurations with poor computational performance. Here w… ▽ More In artificial neural networks, learning from data is a computationally demanding task in which a large number of connection weights are iteratively tuned through stochastic-gradient-based heuristic processes over a cost-function. It is not well understood how learning occurs in these systems, in particular how they avoid getting trapped in configurations with poor computational performance. Here we study the difficult case of networks with discrete weights, where the optimization landscape is very rough even for simple architectures, and provide theoretical and numerical evidence of the existence of rare - but extremely dense and accessible - regions of configurations in the network weight space. We define a novel measure, which we call the "robust ensemble" (RE), which suppresses trapping by isolated configurations and amplifies the role of these dense regions. We analytically compute the RE in some exactly solvable models, and also provide a general algorithmic scheme which is straightforward to implement: define a cost-function given by a sum of a finite number of replicas of the original cost-function, with a constraint centering the replicas around a driving assignment. To illustrate this, we derive several powerful new algorithms, ranging from Markov Chains to message passing to gradient descent processes, where the algorithms target the robust dense states, resulting in substantial improvements in performance. The weak dependence on the number of precision bits of the weights leads us to conjecture that very similar reasoning applies to more conventional neural networks. Analogous algorithmic schemes can also be applied to other optimization problems. △ Less

Submitted 6 October, 2016; v1 submitted 20 May, 2016; originally announced May 2016.

Comments: 31 pages (14 main text, 18 appendix), 12 figures (6 main text, 6 appendix)

Journal ref: Proc. Natl. Acad. Sci. U.S.A. 113(48):E7655-E7662, 2016

arXiv:1602.01889 [pdf, ps, other]

Discovering Neuronal Cell Types and Their Gene Expression Profiles Using a Spatial Point Process Mixture Model

Authors: Furong Huang, Animashree Anandkumar, Christian Borgs, Jennifer Chayes, Ernest Fraenkel, Michael Hawrylycz, Ed Lein, Alessandro Ingrosso, Srinivas Turaga

Abstract: Cataloging the neuronal cell types that comprise circuitry of individual brain regions is a major goal of modern neuroscience and the BRAIN initiative. Single-cell RNA sequencing can now be used to measure the gene expression profiles of individual neurons and to categorize neurons based on their gene expression profiles. While the single-cell techniques are extremely powerful and hold great promi… ▽ More Cataloging the neuronal cell types that comprise circuitry of individual brain regions is a major goal of modern neuroscience and the BRAIN initiative. Single-cell RNA sequencing can now be used to measure the gene expression profiles of individual neurons and to categorize neurons based on their gene expression profiles. While the single-cell techniques are extremely powerful and hold great promise, they are currently still labor intensive, have a high cost per cell, and, most importantly, do not provide information on spatial distribution of cell types in specific regions of the brain. We propose a complementary approach that uses computational methods to infer the cell types and their gene expression profiles through analysis of brain-wide single-cell resolution in situ hybridization (ISH) imagery contained in the Allen Brain Atlas (ABA). We measure the spatial distribution of neurons labeled in the ISH image for each gene and model it as a spatial point process mixture, whose mixture weights are given by the cell types which express that gene. By fitting a point process mixture model jointly to the ISH images, we infer both the spatial point process distribution for each cell type and their gene expression profile. We validate our predictions of cell type-specific gene expression profiles using single cell RNA sequencing data, recently published for the mouse somatosensory cortex. Jointly with the gene expression profiles, cell features such as cell size, orientation, intensity and local density level are inferred per cell type. △ Less

Submitted 10 June, 2016; v1 submitted 4 February, 2016; originally announced February 2016.

arXiv:1601.07134 [pdf, other]

Sparse exchangeable graphs and their limits via graphon processes

Authors: Christian Borgs, Jennifer T. Chayes, Henry Cohn, Nina Holden

Abstract: In a recent paper, Caron and Fox suggest a probabilistic model for sparse graphs which are exchangeable when associating each vertex with a time parameter in $\mathbb{R}_+$. Here we show that by generalizing the classical definition of graphons as functions over probability spaces to functions over $σ$-finite measure spaces, we can model a large family of exchangeable graphs, including the Caron-F… ▽ More In a recent paper, Caron and Fox suggest a probabilistic model for sparse graphs which are exchangeable when associating each vertex with a time parameter in $\mathbb{R}_+$. Here we show that by generalizing the classical definition of graphons as functions over probability spaces to functions over $σ$-finite measure spaces, we can model a large family of exchangeable graphs, including the Caron-Fox graphs and the traditional exchangeable dense graphs as special cases. Explicitly, modelling the underlying space of features by a $σ$-finite measure space $(S,\mathcal{S},μ)$ and the connection probabilities by an integrable function $W\colon S\times S\to [0,1]$, we construct a random family $(G_t)_{t\geq 0}$ of growing graphs such that the vertices of $G_t$ are given by a Poisson point process on $S$ with intensity $tμ$, with two points $x,y$ of the point process connected with probability $W(x,y)$. We call such a random family a graphon process. We prove that a graphon process has convergent subgraph frequencies (with possibly infinite limits) and that, in the natural extension of the cut metric to our setting, the sequence converges to the generating graphon. We also show that the underlying graphon is identifiable only as an equivalence class over graphons with cut distance zero. More generally, we study metric convergence for arbitrary (not necessarily random) sequences of graphs, and show that a sequence of graphs has a convergent subsequence if and only if it has a subsequence satisfying a property we call uniform regularity of tails. Finally, we prove that every graphon is equivalent to a graphon on $\mathbb{R}_+$ equipped with Lebesgue measure. △ Less

Submitted 20 June, 2018; v1 submitted 26 January, 2016; originally announced January 2016.

Comments: 71 pages, 3 figures

Journal ref: Journal of Machine Learning Research 18(210):1-71, 2018

arXiv:1508.06675 [pdf, other]

Consistent nonparametric estimation for heavy-tailed sparse graphs

Authors: Christian Borgs, Jennifer T. Chayes, Henry Cohn, Shirshendu Ganguly

Abstract: We study graphons as a non-parametric generalization of stochastic block models, and show how to obtain compactly represented estimators for sparse networks in this framework. Our algorithms and analysis go beyond previous work in several ways. First, we relax the usual boundedness assumption for the generating graphon and instead treat arbitrary integrable graphons, so that we can handle networks… ▽ More We study graphons as a non-parametric generalization of stochastic block models, and show how to obtain compactly represented estimators for sparse networks in this framework. Our algorithms and analysis go beyond previous work in several ways. First, we relax the usual boundedness assumption for the generating graphon and instead treat arbitrary integrable graphons, so that we can handle networks with long tails in their degree distributions. Second, again motivated by real-world applications, we relax the usual assumption that the graphon is defined on the unit interval, to allow latent position graphs where the latent positions live in a more general space, and we characterize identifiability for these graphons and their underlying position spaces. We analyze three algorithms. The first is a least squares algorithm, which gives an approximation we prove to be consistent for all square-integrable graphons, with errors expressed in terms of the best possible stochastic block model approximation to the generating graphon. Next, we analyze a generalization based on the cut norm, which works for any integrable graphon (not necessarily square-integrable). Finally, we show that clustering based on degrees works whenever the underlying degree distribution is atomless. Unlike the previous two algorithms, this third one runs in polynomial time. △ Less

Submitted 24 February, 2016; v1 submitted 26 August, 2015; originally announced August 2015.

Comments: 48 pages

arXiv:1506.06162 [pdf, other]

Private Graphon Estimation for Sparse Graphs

Authors: Christian Borgs, Jennifer T. Chayes, Adam Smith

Abstract: We design algorithms for fitting a high-dimensional statistical model to a large, sparse network without revealing sensitive information of individual members. Given a sparse input graph $G$, our algorithms output a node-differentially-private nonparametric block model approximation. By node-differentially-private, we mean that our output hides the insertion or removal of a vertex and all its adja… ▽ More We design algorithms for fitting a high-dimensional statistical model to a large, sparse network without revealing sensitive information of individual members. Given a sparse input graph $G$, our algorithms output a node-differentially-private nonparametric block model approximation. By node-differentially-private, we mean that our output hides the insertion or removal of a vertex and all its adjacent edges. If $G$ is an instance of the network obtained from a generative nonparametric model defined in terms of a graphon $W$, our model guarantees consistency, in the sense that as the number of vertices tends to infinity, the output of our algorithm converges to $W$ in an appropriate version of the $L_2$ norm. In particular, this means we can estimate the sizes of all multi-way cuts in $G$. Our results hold as long as $W$ is bounded, the average degree of $G$ grows at least like the log of the number of vertices, and the number of blocks goes to infinity at an appropriate rate. We give explicit error bounds in terms of the parameters of the model; in several settings, our bounds improve on or match known nonprivate results. △ Less

Submitted 19 June, 2015; originally announced June 2015.

Comments: 36 pages

arXiv:1410.5152 [pdf, ps, other]

Fixed-Points of Social Choice: An Axiomatic Approach to Network Communities

Authors: Christian Borgs, Jennifer Chayes, Adrian Marple, Shang-Hua Teng

Abstract: We provide the first social choice theory approach to the question of what constitutes a community in a social network. Inspired by the classic preferences models in social choice theory, we start from an abstract social network framework, called preference networks; these consist of a finite set of members where each member has a total-ranking preference of all members in the set. Within this f… ▽ More We provide the first social choice theory approach to the question of what constitutes a community in a social network. Inspired by the classic preferences models in social choice theory, we start from an abstract social network framework, called preference networks; these consist of a finite set of members where each member has a total-ranking preference of all members in the set. Within this framework, we develop two complementary approaches to axiomatically study the formation and structures of communities. (1) We apply social choice theory and define communities indirectly by postulating that they are fixed points of a preference aggregation function obeying certain desirable axioms. (2) We directly postulate desirable axioms for communities without reference to preference aggregation, leading to eight natural community axioms. These approaches allow us to formulate and analyze community rules. We prove a taxonomy theorem that provides a structural characterization of the family of community rules that satisfies all eight axioms. The structure is actually quite beautiful: these community rules form a bounded lattice under the natural intersection and union operations. Our structural theorem is complemented with a complexity result: while identifying a community by the most selective rule of the lattice is in P, deciding if a subset is a community by the most comprehensive rule of the lattice is coNP-complete. Our studies also shed light on the limitations of defining community rules solely based on preference aggregation: any aggregation function satisfying Arrow's IIA axiom, or based on commonly used aggregation schemes like the Borda count or generalizations thereof, lead to communities which violate at least one of our community axioms. Finally, we give a polynomial-time rule consistent with seven axioms and weakly satisfying the eighth axiom. △ Less

Submitted 20 October, 2014; originally announced October 2014.

Comments: 39 pages

arXiv:1408.0744 [pdf, other]

doi 10.1214/17-AOP1187

An $L^p$ theory of sparse graph convergence II: LD convergence, quotients, and right convergence

Authors: Christian Borgs, Jennifer T. Chayes, Henry Cohn, Yufei Zhao

Abstract: We extend the $L^p$ theory of sparse graph limits, which was introduced in a companion paper, by analyzing different notions of convergence. Under suitable restrictions on node weights, we prove the equivalence of metric convergence, quotient convergence, microcanonical ground state energy convergence, microcanonical free energy convergence, and large deviation convergence. Our theorems extend the… ▽ More We extend the $L^p$ theory of sparse graph limits, which was introduced in a companion paper, by analyzing different notions of convergence. Under suitable restrictions on node weights, we prove the equivalence of metric convergence, quotient convergence, microcanonical ground state energy convergence, microcanonical free energy convergence, and large deviation convergence. Our theorems extend the broad applicability of dense graph convergence to all sparse graphs with unbounded average degree, while the proofs require new techniques based on uniform upper regularity. Examples to which our theory applies include stochastic block models, power law graphs, and sparse versions of $W$-random graphs. △ Less

Submitted 4 August, 2014; originally announced August 2014.

Comments: 48 pages

Journal ref: Annals of Probability 46 (2018), 337--396

arXiv:1401.2906 [pdf, other]

doi 10.1090/tran/7543

An $L^p$ theory of sparse graph convergence I: limits, sparse random graph models, and power law distributions

Authors: Christian Borgs, Jennifer T. Chayes, Henry Cohn, Yufei Zhao

Abstract: We introduce and develop a theory of limits for sequences of sparse graphs based on $L^p$ graphons, which generalizes both the existing $L^\infty$ theory of dense graph limits and its extension by Bollobás and Riordan to sparse graphs without dense spots. In doing so, we replace the no dense spots hypothesis with weaker assumptions, which allow us to analyze graphs with power law degree distributi… ▽ More We introduce and develop a theory of limits for sequences of sparse graphs based on $L^p$ graphons, which generalizes both the existing $L^\infty$ theory of dense graph limits and its extension by Bollobás and Riordan to sparse graphs without dense spots. In doing so, we replace the no dense spots hypothesis with weaker assumptions, which allow us to analyze graphs with power law degree distributions. This gives the first broadly applicable limit theory for sparse graphs with unbounded average degrees. In this paper, we lay the foundations of the $L^p$ theory of graphons, characterize convergence, and develop corresponding random graph models, while we prove the equivalence of several alternative metrics in a companion paper. △ Less

Submitted 29 December, 2014; v1 submitted 13 January, 2014; originally announced January 2014.

Comments: 44 pages

Journal ref: Trans. Amer. Math. Soc. 372 (2019), 3019--3062

arXiv:1401.2792 [pdf, ps, other]

doi 10.1214/12-AOP755

Asymptotic behavior and distributional limits of preferential attachment graphs

Authors: Noam Berger, Christian Borgs, Jennifer T. Chayes, Amin Saberi

Abstract: We give an explicit construction of the weak local limit of a class of preferential attachment graphs. This limit contains all local information and allows several computations that are otherwise hard, for example, joint degree distributions and, more generally, the limiting distribution of subgraphs in balls of any given radius $k$ around a random vertex in the preferential attachment graph. We a… ▽ More We give an explicit construction of the weak local limit of a class of preferential attachment graphs. This limit contains all local information and allows several computations that are otherwise hard, for example, joint degree distributions and, more generally, the limiting distribution of subgraphs in balls of any given radius $k$ around a random vertex in the preferential attachment graph. We also establish the finite-volume corrections which give the approach to the limit. △ Less

Submitted 13 January, 2014; originally announced January 2014.

Comments: Published in at http://dx.doi.org/10.1214/12-AOP755 the Annals of Probability (http://www.imstat.org/aop/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOP-AOP755

Journal ref: Annals of Probability 2014, Vol. 42, No. 1, 1-40

arXiv:1302.4615 [pdf, other]

Convergent sequences of sparse graphs: A large deviations approach

Authors: Christian Borgs, Jennifer Chayes, David Gamarnik

Abstract: In this paper we introduce a new notion of convergence of sparse graphs which we call Large Deviations or LD-convergence and which is based on the theory of large deviations. The notion is introduced by "decorating" the nodes of the graph with random uniform i.i.d. weights and constructing random measures on $[0,1]$ and $[0,1]^2$ based on the decoration of nodes and edges. A graph sequence is defi… ▽ More In this paper we introduce a new notion of convergence of sparse graphs which we call Large Deviations or LD-convergence and which is based on the theory of large deviations. The notion is introduced by "decorating" the nodes of the graph with random uniform i.i.d. weights and constructing random measures on $[0,1]$ and $[0,1]^2$ based on the decoration of nodes and edges. A graph sequence is defined to be converging if the corresponding sequence of random measures satisfies the Large Deviations Principle with respect to the topology of weak convergence on bounded measures on $[0,1]^d, d=1,2$. We then establish that LD-convergence implies several previous notions of convergence, namely so-called right-convergence, left-convergence, and partition-convergence. The corresponding large deviation rate function can be interpreted as the limit object of the sparse graph sequence. In particular, we can express the limiting free energies in terms of this limit object. △ Less

Submitted 19 February, 2013; originally announced February 2013.

Comments: 37 pages, 4 figures

arXiv:1212.0884 [pdf, ps, other]

Maximizing Social Influence in Nearly Optimal Time

Authors: Christian Borgs, Michael Brautbar, Jennifer Chayes, Brendan Lucier

Abstract: Diffusion is a fundamental graph process, underpinning such phenomena as epidemic disease contagion and the spread of innovation by word-of-mouth. We address the algorithmic problem of finding a set of k initial seed nodes in a network so that the expected size of the resulting cascade is maximized, under the standard independent cascade model of network diffusion. Runtime is a primary considerati… ▽ More Diffusion is a fundamental graph process, underpinning such phenomena as epidemic disease contagion and the spread of innovation by word-of-mouth. We address the algorithmic problem of finding a set of k initial seed nodes in a network so that the expected size of the resulting cascade is maximized, under the standard independent cascade model of network diffusion. Runtime is a primary consideration for this problem due to the massive size of the relevant input networks. We provide a fast algorithm for the influence maximization problem, obtaining the near-optimal approximation factor of (1 - 1/e - epsilon), for any epsilon > 0, in time O((m+n)k log(n) / epsilon^2). Our algorithm is runtime-optimal (up to a logarithmic factor) and substantially improves upon the previously best-known algorithms which run in time Omega(mnk POLY(1/epsilon)). Furthermore, our algorithm can be modified to allow early termination: if it is terminated after O(beta(m+n)k log(n)) steps for some beta < 1 (which can depend on n), then it returns a solution with approximation factor O(beta). Finally, we show that this runtime is optimal (up to logarithmic factors) for any beta and fixed seed size k. △ Less

Submitted 21 June, 2016; v1 submitted 4 December, 2012; originally announced December 2012.

Comments: Previous title was "Influence Maximization in Social Networks: Towards an Optimal Algorithmic Solution". An extended abstract of this work appeared in SODA 2014

ACM Class: F.2.2; J.4

arXiv:1202.6033 [pdf, ps, other]

The Power of Local Information in Social Networks

Authors: Christian Borgs, Michael Brautbar, Jennifer Chayes, Sanjeev Khanna, Brendan Lucier

Abstract: We study the power of \textit{local information algorithms} for optimization problems on social networks. We focus on sequential algorithms for which the network topology is initially unknown and is revealed only within a local neighborhood of vertices that have been irrevocably added to the output set. The distinguishing feature of this setting is that locality is necessitated by constraints on t… ▽ More We study the power of \textit{local information algorithms} for optimization problems on social networks. We focus on sequential algorithms for which the network topology is initially unknown and is revealed only within a local neighborhood of vertices that have been irrevocably added to the output set. The distinguishing feature of this setting is that locality is necessitated by constraints on the network information visible to the algorithm, rather than being desirable for reasons of efficiency or parallelizability. In this sense, changes to the level of network visibility can have a significant impact on algorithm design. We study a range of problems under this model of algorithms with local information. We first consider the case in which the underlying graph is a preferential attachment network. We show that one can find the node of maximum degree in the network in a polylogarithmic number of steps, using an opportunistic algorithm that repeatedly queries the visible node of maximum degree. This addresses an open question of Bollob{á}s and Riordan. In contrast, local information algorithms require a linear number of queries to solve the problem on arbitrary networks. Motivated by problems faced by recruiters in online networks, we also consider network coverage problems such as finding a minimum dominating set. For this optimization problem we show that, if each node added to the output set reveals sufficient information about the set's neighborhood, then it is possible to design randomized algorithms for general networks that nearly match the best approximations possible even with full access to the graph structure. We show that this level of visibility is necessary. We conclude that a network provider's decision of how much structure to make visible to its users can have a significant effect on a user's ability to interact strategically with the network. △ Less

Submitted 13 October, 2013; v1 submitted 27 February, 2012; originally announced February 2012.

Comments: An extended abstract of this work appeared in the 8th Workshop on Internet & Network Economics (WINE 2012)

arXiv:1202.2771 [pdf, ps, other]

Multi-Scale Matrix Sampling and Sublinear-Time PageRank Computation

Authors: Christian Borgs, Michael Brautbar, Jennifer Chayes, Shang-Hua Teng

Abstract: A fundamental problem arising in many applications in Web science and social network analysis is, given an arbitrary approximation factor $c>1$, to output a set $S$ of nodes that with high probability contains all nodes of PageRank at least $Δ$, and no node of PageRank smaller than $Δ/c$. We call this problem {\sc SignificantPageRanks}. We develop a nearly optimal, local algorithm for the problem… ▽ More A fundamental problem arising in many applications in Web science and social network analysis is, given an arbitrary approximation factor $c>1$, to output a set $S$ of nodes that with high probability contains all nodes of PageRank at least $Δ$, and no node of PageRank smaller than $Δ/c$. We call this problem {\sc SignificantPageRanks}. We develop a nearly optimal, local algorithm for the problem with runtime complexity $\tilde{O}(n/Δ)$ on networks with $n$ nodes. We show that any algorithm for solving this problem must have runtime of $Ω(n/Δ)$, rendering our algorithm optimal up to logarithmic factors. Our algorithm comes with two main technical contributions. The first is a multi-scale sampling scheme for a basic matrix problem that could be of interest on its own. In the abstract matrix problem it is assumed that one can access an unknown {\em right-stochastic matrix} by querying its rows, where the cost of a query and the accuracy of the answers depend on a precision parameter $ε$. At a cost propositional to $1/ε$, the query will return a list of $O(1/ε)$ entries and their indices that provide an $ε$-precision approximation of the row. Our task is to find a set that contains all columns whose sum is at least $Δ$, and omits any column whose sum is less than $Δ/c$. Our multi-scale sampling scheme solves this problem with cost $\tilde{O}(n/Δ)$, while traditional sampling algorithms would take time $Θ((n/Δ)^2)$. Our second main technical contribution is a new local algorithm for approximating personalized PageRank, which is more robust than the earlier ones developed in \cite{JehW03,AndersenCL06} and is highly efficient particularly for networks with large in-degrees or out-degrees. Together with our multiscale sampling scheme we are able to optimally solve the {\sc SignificantPageRanks} problem. △ Less

Submitted 28 May, 2013; v1 submitted 13 February, 2012; originally announced February 2012.

Comments: Accepted to Internet Mathematics journal for publication. An extended abstract of this paper appeared in WAW 2012 under the title "A Sublinear Time Algorithm for PageRank Computations"

arXiv:1201.4899 [pdf, ps, other]

Finding Endogenously Formed Communities

Authors: Maria-Florina Balcan, Christian Borgs, Mark Braverman, Jennifer Chayes, Shang-Hua Teng

Abstract: A central problem in e-commerce is determining overlapping communities among individuals or objects in the absence of external identification or tagging. We address this problem by introducing a framework that captures the notion of communities or clusters determined by the relative affinities among their members. To this end we define what we call an affinity system, which is a set of elements, e… ▽ More A central problem in e-commerce is determining overlapping communities among individuals or objects in the absence of external identification or tagging. We address this problem by introducing a framework that captures the notion of communities or clusters determined by the relative affinities among their members. To this end we define what we call an affinity system, which is a set of elements, each with a vector characterizing its preference for all other elements in the set. We define a natural notion of (potentially overlapping) communities in an affinity system, in which the members of a given community collectively prefer each other to anyone else outside the community. Thus these communities are endogenously formed in the affinity system and are "self-determined" or "self-certified" by its members. We provide a tight polynomial bound on the number of self-determined communities as a function of the robustness of the community. We present a polynomial-time algorithm for enumerating these communities. Moreover, we obtain a local algorithm with a strong stochastic performance guarantee that can find a community in time nearly linear in the of size the community. Social networks fit particularly naturally within the affinity system framework -- if we can appropriately extract the affinities from the relatively sparse yet rich information from social networks, our analysis then yields a set of efficient algorithms for enumerating self-determined communities in social networks. In the context of social networks we also connect our analysis with results about $(α,β)$-clusters introduced by Mishra, Schreiber, Stanton, and Tarjan \cite{msst}. In contrast with the polynomial bound we prove on the number of communities in the affinity system model, we show that there exists a family of networks with superpolynomial number of $(α,β)$-clusters. △ Less

Submitted 29 February, 2012; v1 submitted 23 January, 2012; originally announced January 2012.

Comments: 22 pages

ACM Class: F.2.0; J.4

arXiv:1101.4573 [pdf, other]

doi 10.1073/pnas.1004751108

Finding undetected protein associations in cell signaling by belief propagation

Authors: M. Bailly-Bechet, C. Borgs, A. Braunstein, J. Chayes, A. Dagkessamanskaia, J. -M. François, R. Zecchina

Abstract: External information propagates in the cell mainly through signaling cascades and transcriptional activation, allowing it to react to a wide spectrum of environmental changes. High throughput experiments identify numerous molecular components of such cascades that may, however, interact through unknown partners. Some of them may be detected using data coming from the integration of a protein-prote… ▽ More External information propagates in the cell mainly through signaling cascades and transcriptional activation, allowing it to react to a wide spectrum of environmental changes. High throughput experiments identify numerous molecular components of such cascades that may, however, interact through unknown partners. Some of them may be detected using data coming from the integration of a protein-protein interaction network and mRNA expression profiles. This inference problem can be mapped onto the problem of finding appropriate optimal connected subgraphs of a network defined by these datasets. The optimization procedure turns out to be computationally intractable in general. Here we present a new distributed algorithm for this task, inspired from statistical physics, and apply this scheme to alpha factor and drug perturbations data in yeast. We identify the role of the COS8 protein, a member of a gene family of previously unknown function, and validate the results by genetic experiments. The algorithm we present is specially suited for very large datasets, can run in parallel, and can be adapted to other problems in systems biology. On renowned benchmarks it outperforms other algorithms in the field. △ Less

Submitted 24 January, 2011; originally announced January 2011.

Comments: 6 pages, 3 figures, 1 table, Supporting Information

Journal ref: Published online before print December 27, 2010, doi: 10.1073/pnas.1004751108 PNAS January 11, 2011 vol. 108 no. 2 882-887

arXiv:1011.3058 [pdf, ps, other]

Tight Bounds for Mixing of the Swendsen-Wang Algorithm at the Potts Transition Point

Authors: Christian Borgs, Jennifer T. Chayes, Prasad Tetali

Abstract: We study two widely used algorithms for the Potts model on rectangular subsets of the hypercubic lattice Z^d - heat bath dynamics and the Swendsen-Wang algorithm - and prove that, under certain circumstances, the mixing in these algorithms is torpid or slow. In particular, we show that for heat bath dynamics throughout the region of phase coexistence, and for the Swendsen-Wang algorithm at the tra… ▽ More We study two widely used algorithms for the Potts model on rectangular subsets of the hypercubic lattice Z^d - heat bath dynamics and the Swendsen-Wang algorithm - and prove that, under certain circumstances, the mixing in these algorithms is torpid or slow. In particular, we show that for heat bath dynamics throughout the region of phase coexistence, and for the Swendsen-Wang algorithm at the transition point, the mixing time in a box of side length L with periodic boundary conditions has upper and lower bounds which are exponential in L^{d-1}. This work provides the first upper bound of this form for the Swendsen-Wang algorithm, and gives lower bounds for both algorithms which significantly improve the previous lower bounds that were exponential in L/(log L)^2. △ Less

Submitted 12 November, 2010; originally announced November 2010.

Comments: 45 pages

arXiv:1008.1516 [pdf, other]

The Hitchhiker's Guide to Affiliation Networks: A Game-Theoretic Approach

Authors: Christian Borgs, Jennifer Chayes, Jian Ding, Brendan Lucier

Abstract: We propose a new class of game-theoretic models for network formation in which strategies are not directly related to edge choices, but instead correspond more generally to the exertion of social effort. The observed social network is thus a byproduct of an expressive strategic interaction, which can more naturally explain the emergence of complex social structures. Within this framework, we prese… ▽ More We propose a new class of game-theoretic models for network formation in which strategies are not directly related to edge choices, but instead correspond more generally to the exertion of social effort. The observed social network is thus a byproduct of an expressive strategic interaction, which can more naturally explain the emergence of complex social structures. Within this framework, we present a natural network formation game in which agent utilities are locally defined and that, despite its simplicity, produces a rich class of equilibria that exhibit structural properties commonly observed in social networks - such as triadic closure - that have proved elusive in most existing models. Specifically, we consider a game in which players organize networking events at a cost that grows with the number of attendees. An event's cost is assumed by the organizer but the benefit accrues equally to all attendees: a link is formed between any two players who see each other at more than a certain number r of events per time period. The graph of connections so obtained is the social network of the model. We analyze the Nash equilibria of this game when each player derives a benefit a>0 from all her neighbors in the network and when the costs are linear, i.e., when the cost of an event with L invitees is b+cL, with b>0 and c>0. For a/cr > 1 and b sufficiently small, all Nash equilibria have the complete graph as their social network; for a/cr < 1 the Nash equilibria correspond to a rich class of social networks, all of which have substantial clustering in the sense that the clustering coefficient is bounded below by the inverse of the average degree. Additionally, for any degree sequence with finite mean, and not too many vertices of degree one or two, we can construct a Nash equilibrium producing a social network with the given degree sequence. △ Less

Submitted 9 August, 2010; originally announced August 2010.

Comments: 15 pages; 3 figures

arXiv:1004.2079 [pdf, ps, other]

Bargaining dynamics in exchange networks

Authors: Mohsen Bayati, Christian Borgs, Jennifer Chayes, Yashodhan Kanoria, Andrea Montanari

Abstract: We consider a one-sided assignment market or exchange network with transferable utility and propose a model for the dynamics of bargaining in such a market. Our dynamical model is local, involving iterative updates of 'offers' based on estimated best alternative matches, in the spirit of pairwise Nash bargaining. We establish that when a balanced outcome (a generalization of the pairwise Nash barg… ▽ More We consider a one-sided assignment market or exchange network with transferable utility and propose a model for the dynamics of bargaining in such a market. Our dynamical model is local, involving iterative updates of 'offers' based on estimated best alternative matches, in the spirit of pairwise Nash bargaining. We establish that when a balanced outcome (a generalization of the pairwise Nash bargaining solution to networks) exists, our dynamics converges rapidly to such an outcome. We extend our results to the cases of (i) general agent 'capacity constraints', i.e., an agent may be allowed to participate in multiple matches, and (ii) 'unequal bargaining powers' (where we also find a surprising change in rate of convergence). △ Less

Submitted 6 December, 2011; v1 submitted 12 April, 2010; originally announced April 2010.

Comments: 47 pages, SODA 2011, invited to Journal of Economic Theory

Journal ref: Proc. ACM-SIAM Symp. on Discrete Algorithms (2011) 1518-1537

arXiv:1002.0115 [pdf, ps, other]

Left and right convergence of graphs with bounded degree

Authors: Christian Borgs, Jennifer Chayes, Jeff Kahn, László Lovász

Abstract: The theory of convergent graph sequences has been worked out in two extreme cases, dense graphs and bounded degree graphs. One can define convergence in terms of counting homomorphisms from fixed graphs into members of the sequence (left-convergence), or counting homomorphisms into fixed graphs (right-convergence). Under appropriate conditions, these two ways of defining convergence was proved t… ▽ More The theory of convergent graph sequences has been worked out in two extreme cases, dense graphs and bounded degree graphs. One can define convergence in terms of counting homomorphisms from fixed graphs into members of the sequence (left-convergence), or counting homomorphisms into fixed graphs (right-convergence). Under appropriate conditions, these two ways of defining convergence was proved to be equivalent in the dense case by Borgs, Chayes, Lovász, Sós and Vesztergombi. In this paper a similar equivalence is established in the bounded degree case. In terms of statistical physics, the implication that left convergence implies right convergence means that for a left-convergent sequence, partition functions of a large class of statistical physics models converge. The proof relies on techniques from statistical physics, like cluster expansion and Dobrushin Uniqueness. △ Less

Submitted 31 January, 2010; originally announced February 2010.

Comments: 26 pages

MSC Class: 05C60; 60F99

arXiv:0911.1767 [pdf, ps, other]

A Natural Dynamics for Bargaining on Exchange Networks

Authors: Yashodhan Kanoria, Mohsen Bayati, Christian Borgs, Jennifer Chayes, Andrea Montanari

Abstract: Bargaining networks model the behavior of a set of players that need to reach pairwise agreements for making profits. Nash bargaining solutions are special outcomes of such games that are both stable and balanced. Kleinberg and Tardos proved a sharp algorithmic characterization of such outcomes, but left open the problem of how the actual bargaining process converges to them. A partial answer was… ▽ More Bargaining networks model the behavior of a set of players that need to reach pairwise agreements for making profits. Nash bargaining solutions are special outcomes of such games that are both stable and balanced. Kleinberg and Tardos proved a sharp algorithmic characterization of such outcomes, but left open the problem of how the actual bargaining process converges to them. A partial answer was provided by Azar et al. who proposed a distributed algorithm for constructing Nash bargaining solutions, but without polynomial bounds on its convergence rate. In this paper, we introduce a simple and natural model for this process, and study its convergence rate to Nash bargaining solutions. At each time step, each player proposes a deal to each of her neighbors. The proposal consists of a share of the potential profit in case of agreement. The share is chosen to be balanced in Nash's sense as far as this is feasible (with respect to the current best alternatives for both players). We prove that, whenever the Nash bargaining solution is unique (and satisfies a positive gap condition) this dynamics converges to it in polynomial time. Our analysis is based on an approximate decoupling phenomenon between the dynamics on different substructures of the network. This approach may be of general interest for the analysis of local algorithms on networks. △ Less

Submitted 7 May, 2010; v1 submitted 9 November, 2009; originally announced November 2009.

Comments: 28 pages, 1 figure, second part of this work with different analysis and stronger results available at arXiv:1004.2079v1 [cs.GT]. This paper unchanged in update.

arXiv:0905.3806 [pdf, ps, other]

Limits of randomly grown graph sequences

Authors: C. Borgs, J. Chayes, L. Lovász, V. T. Sós, K. Vesztergombi

Abstract: Motivated in part by various sequences of graphs growing under random rules (like internet models), convergent sequences of dense graphs and their limits were introduced by Borgs, Chayes, Lovász, Sós and Vesztergombi and by Lovász and Szegedy. In this paper we use this framework to study one of the motivating class of examples, namely randomly growing graphs. We prove the (almost sure) convergen… ▽ More Motivated in part by various sequences of graphs growing under random rules (like internet models), convergent sequences of dense graphs and their limits were introduced by Borgs, Chayes, Lovász, Sós and Vesztergombi and by Lovász and Szegedy. In this paper we use this framework to study one of the motivating class of examples, namely randomly growing graphs. We prove the (almost sure) convergence of several such randomly growing graph sequences, and determine their limit. The analysis is not always straightforward: in some cases the cut distance from a limit object can be directly estimated, in other case densities of subgraphs can be shown to converge. △ Less

Submitted 23 May, 2009; originally announced May 2009.

Comments: 17 pages, 8 figures

MSC Class: 05C80

arXiv:0807.3373 [pdf, ps, other]

doi 10.1103/PhysRevLett.101.037208

Statistical Mechanics of Steiner trees

Authors: M. Bayati, C. Borgs, A. Braunstein, J. Chayes, A. Ramezanpour, R. Zecchina

Abstract: The Minimum Weight Steiner Tree (MST) is an important combinatorial optimization problem over networks that has applications in a wide range of fields. Here we discuss a general technique to translate the imposed global connectivity constrain into many local ones that can be analyzed with cavity equation techniques. This approach leads to a new optimization algorithm for MST and allows to analyz… ▽ More The Minimum Weight Steiner Tree (MST) is an important combinatorial optimization problem over networks that has applications in a wide range of fields. Here we discuss a general technique to translate the imposed global connectivity constrain into many local ones that can be analyzed with cavity equation techniques. This approach leads to a new optimization algorithm for MST and allows to analyze the statistical mechanics properties of MST on random graphs of various types. △ Less

Submitted 21 July, 2008; originally announced July 2008.

Journal ref: Phys. Rev. Lett. 101, 037208 (2008)

Showing 1–50 of 74 results for author: Borgs, C