Amin Coja-Oghlan, Dan Vilenchik, The Chromatic Number of Random Graphs for Most Average Degrees, International Mathematics Research Notices, Volume 2016, Issue 19, 2016, Pages 5801–5859, https://doi.org/10.1093/imrn/rnv333
Abstract
For a fixed number |$d>0$| and |$n$| large, let |$G(n,d/n)$| be the random graph on |$n$| vertices in which any two vertices are connected with probability |$d/n$| independently. The problem of determining the chromatic number of |$G(n,d/n)$| goes back to the famous 1960 article of Erdős and Rényi that started the theory of random graphs [Magyar Tud. Akad. Mat. Kutató Int. Közl. 5 (1960) 17–61]. Progress culminated in the landmark paper of Achlioptas and Naor [Ann. Math. 162 (2005) 1333–1349], in which they calculate the chromatic number precisely for all |$d$| in a set |$S\subset (0,\infty )$| of asymptotic density |$\lim _{z\rightarrow \infty }\frac 1z\int _0^z\textbf {1}_S=\frac {1}{2}$|, and up to an additive error of one for the remaining |$d$|. Here we obtain a near-complete answer by determining the chromatic number of |$G(n,d/n)$| for all |$d$| in a set of asymptotic density 1.
1 Introduction
Let |$G(n,p)$| denote the random graph on the vertex set |$V=\left \{{1,\ldots ,n}\right \}$| in which any two vertices are connected with probability |$p\in [0,1]$| independently, known as the Erdős–Rényi model. (Actually this model was introduced by Gilbert [24]. In their seminal paper Erdős and Rényi consider a random graph |$G(n,m)$| in which the number of edges is a fixed integer |$m$| [21]. However, with |$m\sim \binom n2p$| the two models are essentially interchangeable, and it has become customary to attribute |$G(n,p)$| to Erdős and Rényi as well.)
1.1 Background and main results
The theory of random graphs was born with the famous 1960 article by Erdős and Rényi [21], and has grown since into a substantial area of research with hundreds, perhaps thousands of contributions dealing with the |$G(n,p)$| model alone. In their paper, Erdős and Rényi showed that the random graph |$G(n,p)$| undergoes a percolation phase transition at |$p=1/n$|, and phase transitions have been the guiding theme of the theory ever since. In addition, Erdős and Rényi set the agenda for future research by posing a number of intriguing questions, all of which have been answered over the years except for one: for a given |$d>0$|, what is the typical chromatic number of |$G(n,d/n)$|?
It is widely conjectured that for any number |$k\geq 3$| of colors there occurs a phase transition for |$k$|-colorability. That is, there exists a number |$d_{k-{\rm col}}$| such that |$G(n,d/n)$| is |$k$|-colorable w.h.p. if |$d<d_{k-{\rm col}}$|, whereas the random graph fails to be |$k$|-colorable w.h.p. if |$d>d_{k-{\rm col}}$|. If true, this would imply that the likely value of the chromatic number, viewed as a function of |$d$|, is a step function that takes the value |$k$| on the interval |$d_{(k-1)-{\rm col}}<d<d_{k-{\rm col}}$|.
Toward this conjecture, Achlioptas and Friedgut [2] proved that for any fixed |$k\geq 3$| there exists a sharp threshold sequence |$d_{k-{\rm col}}(n)$|. This sequence is such that for any |$\varepsilon >0$|,
if |$p<(1-\varepsilon )d_{k-{\rm col}}(n)/n$|, then |$G(n,p)$| is |$k$|-colorable with probability tending to 1 as |$n\rightarrow \infty $|.
if |$p>(1+\varepsilon )d_{k-{\rm col}}(n)/n$|, then |$G(n,p)$| fails to be |$k$|-colorable with probability tending to 1 as |$n\rightarrow \infty $|.
Whether the sequence |$d_{k-{\rm col}}(n)$| converges to an actual “uniform” threshold |$d_{k-{\rm col}}$| is a well-known open problem.
The bounds (1.1) and (1.3) yield an estimate of the chromatic number of |$G(n,d/n)$|. Namely, (1.1) implies that for |$d<d_{k,{\rm AN}}$|, the random graph |$G(n,d/n)$| is |$k$|-colorable w.h.p. Moreover, (1.3) shows that for |$d>d_{k-1,{\rm first}}'$|, |$G(n,d/n)$| fails to be |$(k-1)$|-colorable w.h.p. Consequently, for all |$d$| in the interval |$(d_{k-1,{\rm first}}',d_{k,{\rm AN}})$| of length about |$\ln k$|, the chromatic number of |$G(n,d/n)$| is precisely |$k$| w.h.p. However, for all |$d$| in the subsequent interval |$(d_{k,{\rm AN}},d_{k,{\rm first}}')$| of length about |$\ln k$|, (1.1) and (1.3) only imply that the chromatic number is either |$k$| or |$k+1$| w.h.p. Thus, (1.1) and (1.3) yield the typical value of |$\chi (G(n,d/n))$| precisely for “about half” of all |$d$|. Formally, let us say that a (measurable) set |$A\subset \mathbb {R}_{\geq 0}$| has asymptotic density |$\alpha $| if |$\lim _{z\rightarrow \infty }\frac 1z\int _{0}^z\textbf {1}_A=\alpha $|, where |$\textbf {1}_A$| is the indicator of |$A$|. Then the set on which (1.1), (1.3) determine |$\chi (G(n,d/n))$| has asymptotic density |$1/2$| [6, Theorem 2].
Theorem 1.1 enables us to pin the chromatic number down precisely on a set of asymptotic density 1, thereby obtaining a near-complete answer to the question of Erdős and Rényi. More precisely, (1.2) and (1.4) imply the following.
Of course, the constants |$0.99$| and |$1.38$| in the definition of |$S_k$| can be replaced by any numbers less than one and |$2\ln 2$|, respectively. Theorem 1.2 also answers a question of Alon and Krivelevich [8] as to whether the chromatic number of |$G(n,d/n)$| is concentrated on a single integer for most |$d$| “in an appropriately defined sense”. (A proof that the threshold sequence |$d_{k-{\rm col}}(n)$| converges would imply a one-point concentration result for the chromatic number outside a countable set of average degrees. However, the known result [2] does not. Alon and Krivelevich [8] were concerned also with the case that the average degree |$d$| is a growing function of |$n$|. In this paper, we deal with |$d$| fixed as |$n\rightarrow \infty $|, the original setting considered by Erdős and Rényi.)
Additionally, the cavity method yields predictions on the combinatorial nature of the problem, particularly on the geometry of the set of |$k$|-colorings of the random graph. The proof of Theorem 1.1 is based on a “physics-enhanced” second moment argument that exploits this geometrical intuition. In fact, the physics intuition is one of two key ingredients that enable us to improve over the approach of Achlioptas and Naor [6]. The second one is a novel approach, based on a local variations argument, to the analytical challenge of optimizing a certain (non-convex) function over the Birkhoff polytope. Neither of these ideas seems to depend on particular features of the graph coloring problem, and thus we expect that they will prove vital in tackling a variety of further related problems.
An outline of our physics-enhanced second moment argument follows in Section 2. In addition, in Section 2.5, we will see that the density |$d_{k,{\rm cond}}$| in (1.4) matches the condensation or Kauzmann phase transition predicted by physicists. This implies that the bound obtained in Theorem 1.1 is the best possible one that can be obtained via a second moment-type argument over a certain class of natural random variables (see Section 2.5 for details).
1.2 Related work
As witnessed by the notorious “four color problem” first posed by Guthrie in 1852, solved controversially by Appel and Haken in 1976 [9], and re-solved by Robertson et al. [40], the graph coloring problem has been a central subject in (discrete) mathematics for well over a century. Thus, it is unsurprising that the chromatic number problem on |$G(n,p)$| has received a great deal of attention since it was posed by Erdős and Rényi. Indeed, the problem has inspired the development of techniques that are by now widely used in various areas of mathematics, computer science, physics, and other disciplines.
For instance, pioneering the use of martingale tail bounds, Shamir and Spencer [41] proved concentration bounds for the chromatic number of |$G(n,p)$|. Their result was enhanced first by Łuczak [33] and then by Alon and Krivelevich [8], who used the Lovász local lemma to prove that the chromatic number of |$G(n,p)$| is concentrated on two consecutive integers if |$p\ll n^{-1/2}$|. In a breakthrough contribution, Bollobás [11] determined the asymptotics of the chromatic number of dense random graphs (i.e., |$G(n,p)$| with |$p>n^{-1/3}$|). This result improved prior work by Matula [34], whose “expose-and-merge” technique Łuczak built upon to obtain a similar result for sparser random graphs [32]. However, in the case that |$p=d/n$| for a fixed real |$d>0$|, the setting originally studied by Erdős and Rényi, Łuczak's formula is far less precise than (1.1)–(1.2). For a comprehensive literature overview, see [12, 26].
The work of Achlioptas and Naor [6], which gave the best prior result on the chromatic number of |$G(n,d/n)$|, is based on the second moment method. Its use in the context of phase transitions in random discrete structures was pioneered by Achlioptas and Moore [5] and Frieze and Wormald [23]. The techniques of [6] have been used to prove several further important results. For instance, Achlioptas and Moore [4] identified three (and for some |$d$| just two) consecutive integers on which the chromatic number of the random |$d$|-regular graph is concentrated. This was reduced to two integers for all fixed |$d$| (and one for about half of all |$d$|) by adding in the small subgraph conditioning technique [27]. Recently, the methods developed in this work have been harnessed to improve this result further still [15]. Moreover, Dyer et al. [20] extended the second moment argument from [6] to the problem of |$k$|-coloring |$h$|-uniform random hypergraphs. We expect that our approach can be used to obtain improved results in the hypergraph case. Similarly, it should be possible to improve results of Dani et al. [19] on a “decorated” coloring problem.
In several problems, sophisticated applications of the second moment method gave bounds very close to the predictions made by the physicists' cavity method [35]. Examples where the physics predictions have (largely) been verified rigorously in this way include the hypergraph two-coloring problem [16, 18] and the random |$k$|-SAT problem [17]. But thus far, a general limitation of the rigorous proof techniques has been that they apply only to binary problems, in which just two values are available for each variable. By contrast, in random graph coloring, each variable (vertex) has |$k$| values (colors) to choose from, where |$k$| can be arbitrarily large. As we will see in Section 2, the large number of available values complicates the problem dramatically. In effect, random graph coloring was the last of the intensely studied benchmark problems in which a very substantial gap remained between the physics predictions and the rigorous results, a situation rectified by this paper. Thus, we view this paper as an important step toward the long-term goal of providing a mathematical foundation for the cavity method.
In computer science, the algorithmic problem of finding a |$k$|-coloring of |$G(n,p)$| in polynomial time is a long-standing challenge, mentioned prominently in several influential survey articles [22, 28]. Simple greedy algorithms find a |$k$|-coloring for |$d\leq k\ln k\sim \frac 12d_{k-{\rm col}}$| w.h.p. [3, 25, 29], about half the |$k$|-colorability threshold. However, no efficient algorithm is known to beat this factor of two, which Shamir and Spencer [41] called “most vexing”. In fact, it has been suggested that changes in the geometry of the set of |$k$|-colorings that occur at |$d\sim \frac 12d_{k-{\rm col}}$| cause the demise of local-search-based algorithms [1, 37]. Interestingly, some of the very phenomena that seem to make the algorithmic problem of coloring |$G(n,p)$| difficult will turn out to be extremely helpful in the construction of our random variable and thus in the proof of Theorem 1.1.
1.3 Notation and preliminaries
In addition to |$G(n,p)$|, we consider the |$G(n,m)$| model, which is a random graph with vertex set |$V=\left \{{1,\ldots ,n}\right \}$| and exactly |$m$| edges, chosen uniformly at random among all such graphs. Working with |$G(n,m)$| facilitates the second moment argument, because the total number of edges is a deterministic quantity. Nonetheless, Lemma 2.1 shows that any results for |$G(n,m)$| with |$m=\lceil dn/2\rceil $| extend to |$G(n,d/n)$|. Thus, throughout the paper, we always set |$m=\lceil dn/2\rceil $|.
Since our goal is to establish a statement that holds with probability tending to 1 as |$n\rightarrow \infty $|, we are always going to assume tacitly that the number |$n$| of vertices is sufficiently large for the various estimates to hold. Similarly, at the expense of the error term |$o_k(1)$| in Theorem 1.1, we will tacitly assume that |$k\geq k_0$| for a large enough constant |$k_0$|.
We use the standard |$O$|-notation to refer to the limit |$n\rightarrow \infty $|. Thus, |$f(n)=O(g(n))$| means that there exist |$C>0$|, |$n_0>0$| such that for all |$n>n_0$| we have |$|f(n)|\leq C\cdot |g(n)|$|. In addition, we use the standard symbols |$o(\cdot ),\Omega (\cdot ),\Theta (\cdot )$|. In particular, |$o(1)$| stands for a term that tends to 0 as |$n\rightarrow \infty $|. Furthermore, we write |$f(n)\sim g(n)$| if |$\lim _{n\rightarrow \infty }f(n)/g(n)=1$|.
Additionally, we use asymptotic notation in the limit of large |$k$|. To make this explicit, we insert |$k$| as an index. Thus, |$f(k)=O_k(g(k))$| means that there exist |$C>0$|, |$k_0>0$| such that for all |$k>k_0$| we have |$|f(k)|\leq C\cdot |g(k)|$|. Further, we write |$f(k)=\tilde O_k(g(k))$| to indicate that there exist |$C>0$|, |$k_0>0$| such that for all |$k>k_0$| we have |$|f(k)|\leq (\ln k)^C\cdot |g(k)|$|.
If |$G$| is a graph and |$v$| is a vertex of |$G$|, then we denote by |$N_G(v)$| the neighborhood of |$v$| in |$G$|, that is, the set of all vertices |$w$| that are connected to |$v$| by an edge of |$G$|. Where the graph |$G$| is apparent from the context, we just write |$N(v)$|. If |$s\geq 1$| is an integer, we write |$[s]$| for the set |$\left \{{1,2,\ldots ,s}\right \}$|. Moreover, throughout the paper, we use the conventions that |$0\ln 0=0$| and (consistently) that |$0\ln \frac 00=0$|.
2 Outline
In this section, we first discuss the second moment method in general and the argument pursued in [6] specifically and investigate why it breaks down beyond the density |$d_{k,{\rm AN}}$| from (1.1). Then, we see how the physics intuition can be harnessed to overcome this barrier. Finally, we comment on the condensation phase transition.
2.1 The second moment method
If |$d>0$| is such that |$\liminf _{n\rightarrow \infty }{\rm P}\left [{G(n,m)\hbox { is }k\hbox {-colorable}}\right ]>0$|, then |$\liminf _{n\rightarrow \infty }d_{k-{\rm col}}(n)\geq d.$|
Thus, in order to obtain a lower bound on |$d_{k-{\rm col}}$|, we need to define an appropriate random variable |$Z$| and verify (2.1). Both these steps turn out to be non-trivial.
2.2 Balanced colorings and the Birkhoff polytope
The most obvious choice of random variable seems to be the total number |$Z_{k}$| of |$k$|-colorings of |$G(n,m)$|. But to simplify the calculations, we confine ourselves to a particular type of colorings. Namely, a map |$\sigma :[n]\rightarrow [k]$| is balanced if |$||\sigma ^{-1}(i)|-\frac nk|\leq \sqrt n$| for |$i=1,\ldots ,k$|. Let |$\mathcal {B}=\mathcal {B}_{n,k}$| denote the set of all balanced maps. Moreover, let |$Z_{k,{\rm bal}}$| be the number of balanced |$k$|-colorings of |$G(n,m)$|. This is the random variable that Achlioptas and Naor [6] use. As it happens, (2.1) does not hold for either |$Z_{k}$| or |$Z_{k,{\rm bal}}$| in the entire range |$0<d<d_{k,{\rm cond}}$|. We need to understand why.
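The balancedness condition just defined is easy to make concrete. A minimal sketch (the function names are ours, not the paper's):

```python
import math

def is_balanced(sigma, n, k):
    """Check | |sigma^{-1}(i)| - n/k | <= sqrt(n) for every color i."""
    counts = [0] * k
    for c in sigma:
        counts[c] += 1
    return all(abs(counts[i] - n / k) <= math.sqrt(n) for i in range(k))

def is_proper(sigma, edges):
    """A map is a proper coloring iff every edge is bichromatic."""
    return all(sigma[u] != sigma[v] for u, v in edges)

# Example: a perfectly balanced map on n = 12 vertices with k = 3 colors.
n, k = 12, 3
sigma = [v % k for v in range(n)]
print(is_balanced(sigma, n, k))  # True: all color classes have size n/k
```

Counting the proper colorings among the balanced maps on a given graph then yields (an inefficient brute-force version of) |$Z_{k,{\rm bal}}$|.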
While in binary problems the relevant overlap parameter is just one-dimensional (e.g., in random |$k$|-SAT, the Hamming distance of two truth assignments), what makes the |$k$|-colorability problem so difficult is the need to work with the high-dimensional overlap matrix.
In summary, following [6], we have transformed the calculation of the second moment into the problem of optimizing |$f$| over the Birkhoff polytope |$\mathcal D$|. Let |$\bar \rho $| be the matrix with all entries equal to |$\frac 1k$|, the barycenter of |$\mathcal D$|. A glimpse at (2.3) reveals that |$f(\bar \rho )\sim \frac 2n\ln {\rm E}\left [{Z_{k,{\rm bal}}}\right ]$| corresponds to the square of the first moment. Therefore, a necessary condition for the success of the second moment method is that the maximum (2.11) is attained at |$\bar \rho $|. Indeed, if |$f(\rho )>f(\bar \rho )$| for some |$\rho \in \mathcal D$|, then |${\rm E}[Z_{k,{\rm bal}}^2]$| exceeds |${\rm E}[Z_{k,{\rm bal}}]^2$| by an exponential factor |$\exp (\Omega (n))$|. It is not difficult to show that this necessary condition is also sufficient. Combinatorially, the condition that |$\bar \rho $| is the maximizer of |$f$| indicates that pairs |$\sigma ,\tau $| that, judging by their overlap, look completely uncorrelated make up the lion's share of |${\rm E}[Z_{k,{\rm bal}}^2]$|.
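The overlap matrix can be illustrated directly. A small sketch, assuming the normalization |$\rho _{ij}(\sigma ,\tau )=\frac kn\left |{\sigma ^{-1}(i)\cap \tau ^{-1}(j)}\right |$| (which is consistent with |$\bar \rho $| having all entries |$\frac 1k$|); the function name is ours:

```python
from fractions import Fraction

def overlap(sigma, tau, k):
    """Overlap matrix rho with rho[i][j] = (k/n) * |sigma^{-1}(i) cap tau^{-1}(j)|."""
    n = len(sigma)
    rho = [[Fraction(0)] * k for _ in range(k)]
    for v in range(n):
        rho[sigma[v]][tau[v]] += Fraction(k, n)
    return rho

# Two "uncorrelated-looking" balanced maps on n = 9 vertices, k = 3:
sigma = [v % 3 for v in range(9)]          # 0,1,2,0,1,2,...
tau   = [(v // 3) % 3 for v in range(9)]   # 0,0,0,1,1,1,2,2,2
rho = overlap(sigma, tau, 3)
print(rho[0][0])  # 1/3: every entry equals 1/k, i.e. rho = rho_bar
```

For balanced |$\sigma ,\tau $| all row and column sums of |$\rho (\sigma ,\tau )$| are (up to |$O(n^{-1/2})$|) equal to 1, which is why the optimization takes place over the Birkhoff polytope.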
2.3 The singly stochastic bound
However, for |$d> d_{k,{\rm AN}}$|, the maximum (2.12) is attained elsewhere. For instance, the matrix |$\rho _{{\rm half}}$| whose first |$k/2$| rows coincide with those of the identity matrix |${\rm id}$| (with ones on the diagonal and zeros elsewhere) and whose last |$k/2$| rows have all entries equal to |$1/k$| yields a larger function value than |$\bar \rho $| for |$d>d_{k,{\rm AN}}+o_k(1)$|. Of course, this matrix fails to be doubly stochastic.
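The matrix |$\rho _{{\rm half}}$| is easy to write down explicitly, and one can check mechanically that it is singly but not doubly stochastic. A sketch (helper name ours), for even |$k$|:

```python
def rho_half(k):
    """First k/2 rows of the identity; last k/2 rows constant 1/k (k even)."""
    assert k % 2 == 0
    rho = [[1.0 if j == i else 0.0 for j in range(k)] for i in range(k // 2)]
    rho += [[1.0 / k] * k for _ in range(k // 2)]
    return rho

k = 6
rho = rho_half(k)
row_sums = [sum(row) for row in rho]
col_sums = [sum(rho[i][j] for i in range(k)) for j in range(k)]
print(all(abs(s - 1) < 1e-9 for s in row_sums))  # True: singly stochastic
print(all(abs(s - 1) < 1e-9 for s in col_sums))  # False: first k/2 columns sum to 1.5
```

Indeed, each of the first |$k/2$| columns sums to |$1+\frac 12=1.5$|, so |$\rho _{{\rm half}}$| lies outside the Birkhoff polytope.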
2.4 A physics-enhanced random variable
Therefore, to prove Theorem 1.1, we need to work with a different random variable. The key observation behind its definition is that the second moment (2.11) is driven up by certain “wild” |$k$|-colorings |$\sigma $|. Their number behaves like a lottery: while the random graph typically has no wild coloring, a tiny fraction of graphs have an abundance, boosting the second moment. To avoid this heavy-tailed random variable, we define a notion of “tame” colorings. This induces a decomposition |$Z_{k,{\rm bal}}=Z_{k,{\rm tame}}+Z_{k,{\rm wild}}$| such that |${\rm E}\left [{Z_{k,{\rm tame}}}\right ]\sim {\rm E}\left [{Z_{k,{\rm bal}}}\right ]$|. The second moment bound (2.1) turns out to hold for |$Z_{k,{\rm tame}}$| if |$d\leq d_{k,{\rm cond}}-o_k(1)$|.
Let |$G$| be a graph with |$n$| vertices and |$m$| edges. A |$k$|-coloring |$\sigma $| of |$G$| is tame if
T1: |$\sigma $| is balanced,
T2: |$\sigma $| is separable, and
T3: |$\left |{\mathcal {C}(\sigma )}\right |\leq {\rm E}\left [{Z_{k,{\rm bal}}(G(n,m))}\right ]$|.
The choice of the numbers |$0.51$| and |$\kappa =\ln ^{20}k/k$| is not entirely arbitrary. The |$0.51$| is motivated by the fact that the function |$f$| may have a stationary point if an entry of |$\rho $| is very close to |$1/2$| (cf. the proof of Lemma 4.13). Moreover, two colorings in a typical cluster coincide on about a |$1-O_k(1/k)$| fraction of the vertices, and sacrificing a few log factors allows for a quick and easy bound on the cluster size (cf. Lemma 3.2).
In Section 3, we show that a typical |$k$|-coloring of |$G(n,m)$| is indeed tame, which implies that the expected number of tame |$k$|-colorings satisfies the following.
Thus, going from balanced to tame colorings has no discernible effect on the first moment, which remains exponentially large in |$n$| up to at least |$d= d_{k,{\rm cond}}-\varepsilon _k$|.
Thus, in a sense the proof strategy that we pursue is the opposite of the one from [6]. While Achlioptas and Naor relax the optimization problem (by working with a significantly larger domain: singly rather than doubly stochastic matrices), here we restrict the domain by imposing further physics-inspired constraints. This approach, carried out in Section 4, yields
Let |$\varepsilon _k$| be the sequence from Proposition 2.5. For any sufficiently large |$k$|, there is a number |$C(k)>0$| such that for |$d=d_{k,{\rm cond}}-\varepsilon _k$| we have |$0<{\rm E}[Z_{k,{\rm tame}}^2]\leq C(k)\cdot {\rm E}[Z_{k,{\rm tame}}]^2$|.
The proof of Proposition 2.6 essentially comes down to showing that the maximum (2.16) is attained at |$\bar \rho $|. Even though we work with the reduced domain |$\mathcal D_{{\rm tame}}$|, this is anything but straightforward. Indeed, to solve this analytical problem, we develop a novel local variations argument based on properties of the entropy function (among other things). We expect that this argument will prove useful to tackle many related optimization problems that come up in second moment arguments.
Finally, Theorem 1.1 is an immediate consequence of Propositions 2.5 and 2.6 combined with Lemma 2.1.
2.5 The condensation phase transition
Finally, what would it take to close the (small) remaining gap between the new lower bound (1.4) on |$d_{k-{\rm col}}$| and the upper bound (1.3)? According to the physicists' cavity method, this gap is due to a further phase transition, the so-called condensation or Kauzmann transition, that occurs at |$d_{k,{\rm cond}}+o_k(1)$|, that is, the lower bound established in Theorem 1.1. In fact, the existence and precise location of this phase transition (including the term hidden in the |$o_k(1)$|) can be established rigorously via the methods developed in this paper. The details have been carried out in full in a follow-up article [10].
According to the cavity method [30], the geometry of the set of |$k$|-colorings changes significantly at |$d_{k,{\rm cond}}$|. More precisely, for |$d<d_{k,{\rm cond}}-o_k(1),$| the set of |$k$|-colorings decomposes into clusters that each contain only an exponentially small fraction of all |$k$|-colorings of |$G(n,d/n)$| w.h.p. By contrast, for |$d>d_{k,{\rm cond}}+o_k(1)$|, the largest cluster is conjectured to contain a constant fraction of all |$k$|-colorings. As a result, two random |$k$|-colorings are heavily correlated, as there is a non-vanishing probability that they belong to the same cluster. This explains intuitively why the condensation threshold poses an obstacle to the second moment method, as we saw that a necessary condition for the success of the second moment method is that random pairs of |$k$|-colorings decorrelate.
A refined version of the cavity method, the so-called 1-step replica symmetry breaking (“1RSB”) ansatz [30, 31, 39, 42], yields a precise prediction as to the value of |$d_{k-{\rm col}}=\lim _{n\rightarrow \infty }d_{k-{\rm col}}(n)$| (of course, the existence of the limit is taken for granted in the physics work). However, this prediction is not explicit; for instance, it involves the solution to a rather complicated fixed point problem on the set of probability distributions on the |$(k+1)$|-simplex. Yet it is possible to obtain an expansion in the limit of large |$k$|, according to which |$d_{k-{\rm col}}=2k\ln k-\ln k-1+o_k(1)$|. Proving the 1RSB prediction for |$d_{k-{\rm col}}$| remains an open problem. In a few binary problems, asymptotic versions of the 1RSB prediction have been proved rigorously [16]. However, it seems anything but straightforward to extend these arguments to the random graph coloring problem. That said, we expect that any attempt at determining |$d_{k-{\rm col}}$| precisely would have to build upon the insights gained in this paper and very possibly its techniques.
3 The First Moment
Throughout this section, we keep the assumptions of Proposition 2.5 and the notation introduced in Section 2.
Lemma 3.1 is the key step toward proving Proposition 2.5.
In fact, once we have Lemma 3.1, Proposition 2.5 readily follows from the linearity of expectation, Bayes' formula, and the formula (2.3) for |${\rm E}[Z_{k,{\rm bal}}]$|.
To establish Lemma 3.1, we denote by |$G(n,m,\sigma )$| the random graph |$G(n,m)$| conditional on the event that |$\sigma \in \mathcal {B}$| is a |$k$|-coloring. Thus, |$G(n,m,\sigma )$| consists of |$m$| edges drawn uniformly at random without replacement out of those edges that are bichromatic under |$\sigma $|. This probability distribution is also known as the “planted model”.
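The planted model can be sampled directly from this description. A sketch (function names ours), not the paper's construction:

```python
import itertools
import math
import random

def planted_graph(n, m, sigma, rng=random):
    """Sample G(n,m,sigma): m edges drawn uniformly at random, without
    replacement, from the vertex pairs that are bichromatic under sigma."""
    bichromatic = [(u, v) for u, v in itertools.combinations(range(n), 2)
                   if sigma[u] != sigma[v]]
    return rng.sample(bichromatic, m)

# d fixed, m = ceil(dn/2) as throughout the paper; sigma stays a proper coloring.
n, k, d = 30, 3, 5.0
m = math.ceil(d * n / 2)
sigma = [v % k for v in range(n)]
edges = planted_graph(n, m, sigma)
assert all(sigma[u] != sigma[v] for u, v in edges)
```

By construction, |$\sigma $| is a (balanced) |$k$|-coloring of every graph this procedure produces.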
As we will see, due to expansion properties, no vertex in the core of |$G(n,m,\sigma )$| can be recolored without leaving the cluster |$\mathcal {C}(\sigma )$| w.h.p. The basic reason is that recoloring any vertex |$v$| in the core sets off an avalanche of recolorings: to give |$v$| another color, we will have to recolor at least 100 vertices that also belong to the core, and so on.
In addition, if a vertex |$v$| outside the core is such that for each color other than its own, |$v$| has a neighbor in the core of that color, then it should also be impossible to recolor |$v$| without leaving |$\mathcal {C}(\sigma )$|. Indeed, to assign |$v$| some color |$i\neq \sigma (v)$|, we will have to recolor at least one vertex in the core. Guided by this observation, we call a vertex |$v$| |$\sigma $|-complete if for each color |$i\neq \sigma (v)$|, |$v$| has a neighbor |$w$| in the core with |$\sigma (w)=i$|.
Since |$\sigma $|-complete vertices do not contribute to |$|\mathcal {C}(\sigma )|$|, the cluster size stems from recoloring vertices |$v$| that fail to have a neighbor in the core of some color |$i\neq \sigma (v)$|. As we shall see, most of these vertices miss out on exactly one color |$i\neq \sigma (v)$| and hence have precisely two colors to choose from. Formally, we call a vertex |$v$| |$a$|-free in |$(G,\sigma )$| if, with |$V'$| denoting the core, we have |$\left |{\left \{{i\in [k]:N(v)\cap V'\cap \sigma ^{-1}(i)=\emptyset }\right \}}\right |\geq a+1$|.
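These definitions translate into code directly. A hypothetical helper sketch (all names are ours); note that for a proper coloring, |$\sigma (v)$| itself always counts among the missing colors:

```python
def missing_colors(v, sigma, core, adj, k):
    """Colors i in [k] with no core neighbor of v colored i; for a proper
    coloring this always includes sigma(v) itself."""
    seen = {sigma[w] for w in adj[v] if w in core}
    return [i for i in range(k) if i not in seen]

def is_a_free(v, sigma, core, adj, k, a):
    """v is a-free if at least a+1 colors are missing on its core neighborhood."""
    return len(missing_colors(v, sigma, core, adj, k)) >= a + 1

def is_complete(v, sigma, core, adj, k):
    """sigma-complete: every color i != sigma(v) occurs on a core neighbor of v."""
    return missing_colors(v, sigma, core, adj, k) == [sigma[v]]

# Tiny example (k = 3): vertex 0 sees core vertices of both other colors,
# while vertex 3 sees only color 1 in the core and is therefore 1-free.
sigma = [0, 1, 2, 0]
adj = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
core = {1, 2}
assert is_complete(0, sigma, core, adj, 3)
assert is_a_free(3, sigma, core, adj, 3, 1)
```

A 1-free vertex with exactly two missing colors has precisely two admissible colors, matching the discussion above.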
Lemma 3.2 summarizes the expansion properties of |$G(n,m,\sigma )$| that the proof of Lemma 3.1 builds upon.
Let |$\sigma \in \mathcal {B}$| and assume that |$2k\ln k-\ln k-2\leq d\leq 2k\ln k$|. Let |$V_i=\sigma ^{-1}(i)$| for |$i=1,\ldots ,k$|. Then w.h.p. the random graph |$G(n,m,\sigma )$| has the following four properties.
P1: let |$i\in [k]$|. For any subset |$S\subset V_i$| of size |$0.509\cdot \frac nk\leq |S|\leq (1-k^{-0.499})\frac nk$|, the number of vertices |$v\in V\setminus V_i$| that do not have a neighbor in |$S$| is less than |$\frac nk-|S|-n^{2/3}$|.
P2: let |$i\in [k]$|. No more than |$\frac {\kappa n}{3k}$| vertices |$v\not \in V_i$| have fewer than 15 neighbors in |$V_i$|, where |$\kappa =\ln ^{20}k/k$|.
P3: there is no set |$S\subset V$| of size |$|S|\leq k^{-4/3}n$| that contains more than |$5|S|$| edges.
P4: at most |$\frac nk(1+\tilde O_k(1/k))$| vertices are 1-free, and at most |$\tilde O_k(k^{-2})n$| vertices are 2-free.
The proof of Lemma 3.2 is based on arguments that are, by now, fairly standard; in particular, the “core” has, tweaked in various ways, become a standard tool [1, 7, 13, 37]. For the sake of completeness, we give a full proof of Lemma 3.2 in the Appendix. Here we proceed to show how Lemma 3.2 implies Lemma 3.1.
Assume that |$2k\ln k-\ln k-2\leq d\leq 2k\ln k$| and let |$\sigma \in \mathcal {B}$|. Then |$\sigma $| is separable in |$G(n,m,\sigma )$| w.h.p.
By Lemma 3.2, we may assume that the random graph |$G(n,m,\sigma )$| has the properties P1–P3. Suppose that |$\tau \in \mathcal {B}$| is another |$k$|-coloring of this random graph and that |$i,j\in [k]$| are such that |$\rho _{ij}(\sigma ,\tau )\geq 0.51$|. Our aim is to show that |$\rho _{ij}(\sigma ,\tau )>1-\kappa $|. Without loss of generality, we may assume that |$i=j=1$|.
As a next step, we are going to verify that the |$\sigma $|-complete vertices take the same color in all the colorings in |$\mathcal {C}(\sigma )$| w.h.p.; a similar argument was used in [1].
Assume that |$2k\ln k-\ln k-2\leq d\leq 2k\ln k$| and let |$\sigma \in \mathcal {B}$|. W.h.p. the random graph |$G(n,m,\sigma )$| has the following property.
If |$\tau \in \mathcal {C}(\sigma )$|, then for all |$\sigma $|-complete vertices |$v$| we have |$\sigma (v)=\tau (v)$| w.h.p.
To establish (3.9), let |$S_i=\Delta _i^+\cup \Delta _i^-$| for |$i=1,\ldots ,k$|. Because |$\Delta _i^+$| is contained in the core, each |$v\in \Delta _i^+$| has at least 100 neighbors in |$\sigma ^{-1}(i)$|. Since |$\tau $| is a |$k$|-coloring, all of these neighbors lie in the set |$\Delta _i^-$|. Hence, the number |$e(S_i)$| of edges spanned by |$S_i$| is at least |$100|\Delta _i^+|$|. On the other hand, (3.8) implies that |$|S_i|\leq k^{-4/3}n$| for all |$i$|. Therefore, P3 entails that |$e(S_i)\leq 5|S_i|$| for all |$i$|. Thus, we obtain |$100|\Delta _i^+|\leq e(S_i)\leq 5|S_i|\leq 5(\left |{\Delta _i^+}\right |+\left |{\Delta _i^-}\right |)$|. Consequently, |$|\Delta _i^-|\geq 2|\Delta _i^+|$| for all |$i$|. Thus, (3.7) shows that |$\Delta _i^+=\Delta _i^-=\emptyset $| for all |$i$|, whence (3.9) follows.
Let |$\sigma \in \mathcal {B}$|. We need to show that |$G(n,m,\sigma )$| enjoys the properties T2–T3 from Definition 2.3 w.h.p. The fact that T2 holds w.h.p. follows directly from Lemma 3.3.
Finally, upon direct inspection, we find |$f(\bar \rho )=2\ln k+d\ln (1-1/k)$|. Thus, (3.13) shows that for |$d=d_{k,{\rm cond}}-\varepsilon _k=2k\ln k-\ln k-2\ln 2-\varepsilon _k$| we have |$k\cdot f(\bar \rho )=2\ln 2+o_k(1)>0$|, as claimed.
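This computation is easy to check numerically. The sketch below evaluates |$k\cdot f(\bar \rho )=k(2\ln k+d\ln (1-1/k))$| at |$d=2k\ln k-\ln k-2\ln 2$| (dropping the |$\varepsilon _k$| term) for growing |$k$|:

```python
import math

def f_bar(k):
    """f(rho_bar) = 2 ln k + d ln(1 - 1/k) at d = 2k ln k - ln k - 2 ln 2."""
    d = 2 * k * math.log(k) - math.log(k) - 2 * math.log(2)
    return 2 * math.log(k) + d * math.log(1 - 1 / k)

for k in (10**2, 10**3, 10**4):
    print(k, k * f_bar(k))  # approaches 2 ln 2 = 1.386... as k grows
```

Expanding |$\ln (1-1/k)=-1/k-1/(2k^2)-\cdots $| confirms analytically that |$k\cdot f(\bar \rho )=2\ln 2+O_k(\ln k/k)$|.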
4 The Second Moment
Throughout this section, we assume that |$d=d_{k,{\rm cond}}-\varepsilon _k$|, with |$\varepsilon _k$| from Proposition 2.5. We continue to use the notation introduced in Section 2.
4.1 Overview
The goal is to prove Proposition 2.6. As we already hinted at in Section 2, this boils down to maximizing |$f(\rho )$| over |$\rho \in \mathcal D_{{\rm tame}}$|. Formally, we have the following.
If |$f(\rho )<f(\bar \rho )$| for any |$\rho \in \mathcal D_{{\rm tame}}\setminus \left \{{\bar \rho }\right \}$|, then |${\rm E}[Z_{k,{\rm tame}}^2]\leq O({\rm E}[Z_{k,{\rm tame}}]^2)$|.
The proof of Proposition 4.1, based on the Laplace method, is a mere technical exercise, which we defer to Section 5.
Proposition 4.1 reduces the second moment argument to a problem in analysis. Indeed, neither the function |$f$| nor the domain |$\mathcal D_{{\rm tame}}$| over which we need to maximize are dependent on |$n$| (though both involve the parameters |$d$| and |$k$|). In what follows, we aim to establish the following.
If |$\rho \in \mathcal D_{{\rm tame}}\setminus \left \{{\bar \rho }\right \}$|, then |$f(\rho )<f(\bar \rho )$|.
Thus, Proposition 2.6 is immediate from Propositions 4.1 and 4.2.
It is clear that the entropy is maximized at the barycenter |$\bar \rho $| of the Birkhoff polytope, because |$k^{-1}\bar \rho $| is the uniform distribution on |$[k]\times [k]$|. Furthermore, among all the matrices |$\rho $| with non-negative entries that sum to |$k$|, |$\bar \rho $| is the one that minimizes the Frobenius norm and hence |$E(\rho )$|. This shows that |$\bar \rho $| is a stationary point of |$f(\rho )$|. But how do we prove that |$\bar \rho $| is the global maximizer of |$f$|?
The domain |$\mathcal D_{{\rm tame}}$| admits a natural decomposition into several subsets. Let us call |$\rho \in \mathcal D$| |$s$|-stable if the matrix has precisely |$s$| entries that are greater than |$0.51$|. Let |$\mathcal D_{s,{\rm tame}}$| denote the set of all |$s$|-stable |$\rho \in \mathcal D_{{\rm tame}}$|. Geometrically, any |$\rho \in \mathcal D_{s,{\rm tame}}$| is close to a |$(k-s)$|-dimensional face of the Birkhoff polytope. For if |$\rho $| has |$s$| entries greater than |$0.51$|, then by separability these entries are in fact at least |$1-\kappa $| (with |$\kappa =\ln ^{20}k/k$| as in (2.15)). Hence, |$\rho $| is close to the face where these |$s$| entries are equal to 1. Indeed, as all other entries of |$\rho $| are smaller than |$0.51$|, |$\rho $| is near a point “deep inside” that face. Consequently, for any |$1\leq s<k$| the set |$\mathcal D_{s,{\rm tame}}$| is disconnected: it consists of many tiny “splinters” near the |$(k-s)$|-dimensional faces of |$\mathcal D$|. Each of these splinters can be mapped to the component where |$\rho _{11},\ldots ,\rho _{ss}>0.51$| by permuting the rows and columns suitably, which does not affect the function |$f$|.
In the following, we are going to optimize |$f$| separately over |$\mathcal D_{s,{\rm tame}}$| for each |$0\leq s<k$|. We are going to argue that for each |$s$|, the point |$\bar \rho _{s{\rm -stable}}$| whose first |$s$| diagonal entries are 1 and whose |$(i,j)$|-entries are equal to |$(k-s)^{-1}$| for |$i,j>s$| comes close to maximizing |$f$| over |$\mathcal D_{s,{\rm tame}}$| (up to a negligible error term in each case). Geometrically, |$\bar \rho _{s{\rm -stable}}$| is the center of the face defined by |$\rho _{11}=\cdots =\rho _{ss}=1$|. Furthermore, in the case |$s=0,$| we have |$\bar \rho _{s{\rm -stable}}=\bar \rho $|, and we will see that the maximum over |$\mathcal D_{0,{\rm tame}}$| is attained at this very point.
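The matrix |$\bar \rho _{s{\rm -stable}}$| can be written out and its properties checked directly. A small sketch (helper name ours):

```python
def rho_s_stable(k, s):
    """First s diagonal entries 1; entries (i,j) with i,j > s equal to 1/(k-s)."""
    rho = [[0.0] * k for _ in range(k)]
    for i in range(s):
        rho[i][i] = 1.0
    for i in range(s, k):
        for j in range(s, k):
            rho[i][j] = 1.0 / (k - s)
    return rho

k, s = 7, 3
rho = rho_s_stable(k, s)
# Doubly stochastic: every row and every column sums to 1 ...
assert all(abs(sum(row) - 1) < 1e-9 for row in rho)
assert all(abs(sum(rho[i][j] for i in range(k)) - 1) < 1e-9 for j in range(k))
# ... and exactly s entries exceed 0.51, i.e. rho is s-stable (here 1/(k-s) < 0.51).
assert sum(1 for row in rho for x in row if x > 0.51) == s
```

For |$s=0$| this construction returns |$\bar \rho $| itself, in line with the remark above.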
We start by showing that we may confine ourselves to matrices without an entry in the interval |$(0.15,1-\kappa )$|. Recall that |$\mathcal {S}$| is the set of all singly stochastic |$k\times k$|-matrices.
For all |$\rho \in \mathcal {S}$| such that |$\rho _{ij}\in \left [{0.15,0.51}\right ]$| for some |$(i,j)\in [k]\times [k],$| we have |$f(\rho )<0$|.
We will see shortly how Proposition 4.3 implies that |$\bar \rho $| is the maximizer of |$f$| over |$\mathcal D_{0,{\rm tame}}$|. In addition, there are three different ranges of |$1\leq s<k$| that we deal with separately.
Suppose that |$1\leq s\leq k^{0.999}$|. Then for all |$\rho \in \mathcal D_{s,{\rm tame}}$| we have |$f(\rho )<f(\bar \rho )$|.
Suppose that |$k^{0.999}<s<k-k^{0.49}$|. Then for all |$\rho \in \mathcal D_{s,{\rm tame}}$| we have |$f(\rho )<f(\bar \rho )$|.
Suppose that |$k-k^{0.49}\leq s<k$|. Then for all |$\rho \in \mathcal D_{s,{\rm tame}}$| we have |$f(\rho )\,{<}\,f(\bar \rho )$|.
The proofs of Propositions 4.3–4.5 are based on a local variations argument. Roughly speaking, we are going to argue that if |$\rho \in \mathcal D_{s,{\rm tame}}$| is “far” from |$\bar \rho _{s{\rm -stable}}$|, then a higher function value can be attained by moving slightly in the direction of |$\bar \rho _{s{\rm -stable}}$|. We expect that this argument can be adapted to perform second moment arguments in other problems in probabilistic combinatorics. Indeed, in such arguments, the function that needs to be optimized is typically similar in nature to our |$f$|: an entropy term maximized at |$\bar \rho $| plus a probability term minimized at |$\bar \rho $|.
Suppose that |$\rho \in \mathcal {S}$|. Let |$i\in [k]$| and |$J\subset [k]$| be such that for some number |$3\ln \ln k/\ln k\leq \lambda \leq 1$| we have |$|J|\geq k^\lambda $|. Moreover, assume that |$\max _{j\in J}\rho _{ij}<\lambda /2-\ln \ln k/\ln k$|. Then the matrix |$\hat \rho $| from (4.1) satisfies |$f(\hat \rho )\geq f(\rho )$|. In fact, if |$\rho \neq \hat \rho $|, then |$ f(\hat \rho )>f(\rho )$|.
Let us illustrate the use of Proposition 4.7 by proving the following.
If |$\rho \in \mathcal D_{0,{\rm tame}}\setminus \left \{{\bar \rho }\right \}$|, then |$f(\rho )<f(\bar \rho )$|.
Proposition 4.2 is immediate from Propositions 4.4–4.6 and Corollary 4.8. Thus, we are left to prove Propositions 4.3–4.7. In Section 4.3, we prove Proposition 4.7. Building upon that estimate, we then proceed to prove Propositions 4.3–4.6. But before we start, we introduce a few pieces of notation and some basic facts.
4.2 Preliminaries
We need the following well-known fact about the entropy.
Let |$p\in \left [{0,1}\right ]^k$| be such that |$\sum _{i=1}^kp_i=1$|. Then |$H(p)\geq 0$| and the following two statements hold.
- H1: If |$p$| is supported on a set of size |$s$|, then |$H(p)\leq \ln s$|.
- H2: Let |$\mathcal {I}\subset [k]$| and suppose that |$q=\sum _{i\in \mathcal {I}}p_{i}\in (0,1)$|. Let |$p^\mathcal {I}$| be the vector with entries\[ p^\mathcal{I}_{i}=p_{i}\cdot\textbf{1}_{i\in \mathcal{I}}\quad\hbox{for } i\in[k]. \]Then |$H(p)=h(q)+qH(q^{-1}p^\mathcal {I})+(1-q)H((1-q)^{-1}(p-p^\mathcal {I}))$|.
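The decomposition H2 is the chain rule for entropy; a quick numerical check (with entropies in nats and a hypothetical distribution |$p$| and index set |$\mathcal {I}$|) may be helpful:

```python
import math

# Numerical check of H2 (entropies in nats); p and I are hypothetical.
def H(p):
    # Shannon entropy, with the convention 0*ln 0 = 0
    return -sum(x * math.log(x) for x in p if x > 0)

def h(q):
    # binary entropy
    return H([q, 1 - q])

p = [0.1, 0.2, 0.3, 0.4]
I = {0, 1}
q = sum(p[i] for i in I)                          # mass of the block, q = 0.3
inside = [p[i] / q for i in I]                    # the distribution q^{-1} p^I
outside = [p[i] / (1 - q) for i in range(len(p)) if i not in I]
assert abs(H(p) - (h(q) + q * H(inside) + (1 - q) * H(outside))) < 1e-12
```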
As an immediate consequence of Fact 4.9, we have the following.
Let |$p\in \left [{0,1}\right ]^k$| be such that |$\sum _{i=1}^kp_i=1$|.
- Let |$\mathcal {I}\subset [k]$| and set |$q=\sum _{i\in \mathcal {I}}p_{i}$|. Then |$H(p)\leq h(q)+q\ln |\mathcal {I}|+(1-q)\ln (k-|\mathcal {I}|)$|.
- Let |$\mathcal {I}\subset \left \{{2,\ldots ,k}\right \}$| be a set of size |$0<|\mathcal {I}|<k-1$|. Set |$q=\sum _{i\in \mathcal {I}}p_{i}$|. If |$p_1<1$|, then\[ H(p)\leq h(p_1)+\left(1-p_1\right)h\left(q/\left(1-p_1\right)\right) +q\ln\left(|\mathcal{I}|\right)+\left(1-q-p_1\right)\ln\left(k-|\mathcal{I}|-1\right). \]
The first claim follows simply by first using H2 and then applying H1 to |$q^{-1}p^\mathcal {I}$| and |$(1-q)^{-1}(p-p^\mathcal {I})$|. To obtain the second assertion, use H2 with |$\mathcal {I}=\left \{{1}\right \}$| and then apply (i) to the probability distribution |$q^{-1}p^{\mathcal {I}}$|.
4.3 Proof of Proposition 4.7
We pursue the following strategy. Suppose that |$a,b\in J$| are such that |$\rho _{ia}=\min _{j\in J}\rho _{ij}$| and |$\rho _{ib}=\max _{j\in J}\rho _{ij}$|. If |$\rho _{ia}=\rho _{ib}$|, then |$\rho =\hat \rho $| and there is nothing to prove. Otherwise, we are going to argue that increasing |$\rho _{ia}$| slightly at the expense of |$\rho _{ib}$| yields a matrix |$\rho '$| with |$f(\rho ')>f(\rho )$|. We start by calculating the partial derivatives of |$f$|.
As a next step, we take a closer look at the right-hand side of (4.11).
Let |$\rho \in \mathcal {S}$|, let |$i,j\in [k]$| and assume that |$\rho _{ij}>0$|.
- If(4.12)\begin{equation} \frac 1{\rho_{ij}}>\frac{d}{k-2+\frac1{k}\left\|{\rho}\right\|_F^2}, \end{equation}then there exists a unique |$\delta ^*>0$| such that\[ 1+\frac{\delta^*}{\rho_{ij}}= \exp\left[{\frac{d\cdot\delta^*}{k-2+\frac1{k}\left\|{\rho}\right\|_F^2}}\right]. \]Furthermore, for all |$0<\delta <\delta ^*,$| we have |$1+\frac {\delta }{\rho _{ij}}- \exp \left [{\frac {d}{k-2+\frac 1{k}\left \|{\rho }\right \|_F^2}\cdot \delta }\right ]>0$|.
If (4.12) does not hold, then for all |$\delta >0$| we have |$1+\frac {\delta }{\rho _{ij}}< \exp \left [\!{\frac {d}{k-2+\frac 1{k}\left \|{\rho }\right \|_F^2}\cdot \delta }\!\right ]$|.
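The lemma is an elementary fact about the function |$\delta \mapsto 1+\delta /\rho _{ij}-\exp (c\delta )$| with |$c=d/(k-2+\frac 1k\|\rho \|_F^2)$|: it vanishes at 0, is initially increasing precisely when (4.12) holds, and tends to |$-\infty $|. A numerical illustration with hypothetical values of |$c$| and |$\rho _{ij}$| (not derived from the paper's parameters):

```python
import math

# Hypothetical parameters: c stands in for d/(k-2+||rho||_F^2/k).
c, rho_ij = 2.0, 0.2            # 1/rho_ij = 5 > c, so condition (4.12) holds
g = lambda delta: 1 + delta / rho_ij - math.exp(c * delta)

# g(0) = 0 and g'(0) = 1/rho_ij - c > 0, while g -> -infinity;
# hence g has a unique root delta* > 0, found here by bisection.
lo, hi = 1e-9, 10.0
while hi - lo > 1e-12:
    mid = (lo + hi) / 2
    if g(mid) > 0:
        lo = mid
    else:
        hi = mid
delta_star = lo
assert g(0.5 * delta_star) > 0       # g positive on (0, delta*)
assert g(delta_star + 0.1) < 0       # and negative beyond delta*
```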
If |$\rho _{ij}=0$| for all |$j\in J$|, then |$\hat \rho =\rho $| and there is nothing to show. Thus, assume that |$\sum _{j\in J}\rho _{ij}>0$|. Suppose that |$\tilde \rho \in \mathcal {S}$| maximizes |$f(\tilde \rho )$| subject to the conditions
|$\tilde \rho _{ab}=\rho _{ab}$| for all |$(a,b)\not \in \{i\}\times J$| and
|$\max _{j\in J}\tilde \rho _{ij}\leq \max _{j\in J}\rho _{ij}$|.
We claim that |$\tilde \rho _{ij}>0$| for all |$j\in J$|. Indeed, assume that |$\tilde \rho _{ij}=0$| for |$j\in J$| but |$\tilde \rho _{il}>0$| for some other |$l\in J$|. We recall that |$f(\rho )=H(k^{-1}\rho )+E(\rho )$|. As (4.5) and (4.6) show, |$\partial H(k^{-1}\rho )/\partial \rho _{ij}$| tends to infinity as |$\rho _{ij}$| approaches 0, while |$|\partial E(\rho )/\partial \rho _{ij}|$| remains bounded. Hence, there is |$\xi >0$| such that the matrix |$\rho '$| obtained from |$\tilde \rho $| by replacing |$\tilde \rho _{ij}$| by |$\xi $| and |$\tilde \rho _{il}$| by |$\tilde \rho _{il}-\xi $| satisfies |$f(\rho ')>f(\tilde \rho )$|, in contradiction to the maximality of |$f(\tilde \rho )$|.
4.4 Proof of Proposition 4.3
The proof is based on two key lemmas. The first one rules out that |$f(\rho )$| takes its maximum over |$\rho \in \mathcal {S}$| at a matrix with an entry close to |$1/2$|.
If |$\rho \in \mathcal {S}$| has an entry |$\rho _{ij}\in \left [{0.49,0.51}\right ]$|, then there is |$\rho ' \in \mathcal {S}$| such that |$f(\rho ')\geq f(\rho )+\frac {\ln k}{5k}$|.
Without loss of generality, we may assume that |$(i,j)=(1,1)$| and that |$\rho \in \mathcal {S}$| maximizes |$f$| subject to the condition that |$\rho _{11}\in \left [{0.49,0.51}\right ]$|. There are two cases.
We have |$\max _{\rho \in \mathcal {S}} f(\rho )\leq \frac {\ln k}{8k}+O_k(1/k)$|.
The proof of Lemma 4.14 requires two intermediate steps. We start with the following exercise in calculus.
The first derivative vanishes at the two points |$b=\frac k2(1\pm \sqrt {1-2/\ln k})$| only. Moreover, an elementary calculation shows that |$\mu =\frac k2(1-\sqrt {1-2/\ln k})$| is a local minimum, while |$\frac k2(1+\sqrt {1-2/\ln k})>k/2$| is a local maximum. Hence, |$\xi $| is decreasing on the interval |$(0,\mu )$| and increasing on |$(\mu ,k/2)$|. The last assertion follows by direct inspection of the above expression for |$\xi '$|.
Let |$\rho \in \mathcal {S}$|. Suppose that |$i\in [k]$| is such that |$\rho _{ij}\not \in [0.49,0.51]$| for all |$j\in [k]$|.
- Suppose that |$\rho _{ij}\leq 0.49$| for all |$j\in [k]$|. Let |$\rho '$| be the stochastic matrix with entries\[ \rho'_{hj}=\rho_{hj}\quad\hbox{and}\quad \rho_{ij}'=1/k\quad\hbox{for all}\ j\in[k],\quad h\in[k]\setminus\{i\}. \]Then |$f(\rho )\leq f(\rho ')$|.
- Suppose that |$\rho _{ij}\geq 0.51$| for some |$j\in [k]$|. Then there is a number |$\alpha =1/k+\tilde O_k(1/k^2)$| such that for the stochastic matrix |$\rho ''$| with entries\[ \rho''_{hj}=\rho_{hj}\hbox{ and }\rho_{ii}''=1-\alpha,\ \rho''_{ih}=\frac{\alpha}{k-1}\quad\hbox{for all}\ j\in[k],\quad h\in[k]\setminus\{i\} \]we have |$f(\rho )\leq f(\rho '')$|.
To obtain the first assertion, we simply apply Proposition 4.7 to row |$i$| and |$J=[k]$| (with |$\lambda =1$|). With respect to the second claim, we may assume without loss that |$i=j=1$| and |$\rho _{11}\geq 0.51$|. Let |$\hat \rho \in \mathcal {S}$| be the matrix that maximizes |$f$| subject to the conditions
|$\hat \rho _{11}\geq 0.51$|.
|$\hat \rho _a=\rho _a$| for all |$a\in \left \{{2,\ldots ,k}\right \}$|. (In words, the last |$k-1$| rows of |$\hat \rho $| and |$\rho $| coincide.)
First, we observe that |$\beta >0$|. Indeed, (4.5) shows that the derivative |$\partial H(\rho _1)/\partial \rho _{11}$| of the entropy of row |$\rho _1$| tends to |$-\infty $| as |$\rho _{11}$| approaches 1, while (4.6) implies that the derivative |$\partial E(\rho )/\partial \rho _{11}$| remains bounded in absolute value. Hence, the maximality of |$f(\hat \rho )$| implies that |$\beta >0$|.
Since we already know that |$\beta >0$|, (4.25), (4.27) and (4.28) imply |$\beta \in [1-\gamma ,1+\gamma ]$|. Thus, |$\beta =1+\tilde O_k(1/k)$| and consequently |$\hat \rho _{11}=1-\beta /k=1-1/k+\tilde O_k(k^{-2})$|, as desired.
Now, suppose that |$\rho \in \mathcal {S}$| has a row |$i$| such that |$\max _{j\in [k]}\rho _{ij}\in \left [{0.15,0.49}\right ]$|. Without loss of generality, we may assume |$i=1$| and |$\rho _{11}=\max _{j\in [k]}\rho _{ij}$|. In fact, we may assume that |$\rho $| is the maximizer of |$f$| subject to the condition |$\rho _{11}=\max _j\rho _{1j}\in \left [{0.15,0.49}\right ]$|. Again, we show that |$f(\rho )<0$|.
4.5 Proof of Proposition 4.4
Let |$1\leq s\leq k^{0.999}$| and let |$\rho \in \mathcal D_{s,{\rm tame}}$| be the maximizer of |$f$|. Without loss of generality, we may assume that |$\rho _{ii}\geq 0.51$| for |$i=1,\ldots ,s$| and |$\rho _{ij}<0.51$| for all |$(i,j)\not \in \left \{{(1,1),\ldots ,(s,s)}\right \}$|. Because |$\rho $| is separable, this implies that in fact |$\rho _{ii}\geq 1-\kappa $| for |$i=1,\ldots ,s$|, with |$\kappa =\ln ^{20}k/k$| as in (2.15). Furthermore, if there is a pair |$(i,j)\not \in \left \{{(1,1),\ldots ,(s,s)}\right \}$| such that |$\rho _{ij}\geq 0.15$|, then Proposition 4.3 implies that |$f(\rho )<0$|. In this case, we are done, because |$f(\bar \rho )>0$| by Proposition 2.5. Thus, assume from now on that |$\rho _{ij}<0.15$| for all |$(i,j)\not \in \left \{{(1,1),\ldots ,(s,s)}\right \}$|.
We have |$H(k^{-1}\hat \rho )\leq H(k^{-1}\bar \rho _{s{\rm -stable}})+o_k(1/k)$|.
We have |$E(\hat \rho )\leq E(\bar \rho _{s{\rm -stable}})+o_k(1/k)$|.
By comparison, we have |$\|\bar \rho _{s{\rm -stable}}\|_2^2=s+1$|. Hence, the bound (4.6) on the derivative of |$E$| and (4.47) yield |$E(\hat \rho )\leq E(\bar \rho _{s{\rm -stable}})+o_k(1/k)$|, as claimed.
4.6 Proof of Proposition 4.5
Suppose that |$k^{0.999}<s<k-k^{0.49}$| and let |$\rho \in \mathcal D_{s,{\rm tame}}$| be the maximizer of |$f$| over |$\mathcal D_{s,{\rm tame}}$|. We may assume without loss that |$\rho _{ii}\geq 0.51$| for |$i=1,\ldots ,s$| and |$\rho _{ij}<0.51$| for |$(i,j)\not \in \left \{{(1,1),\ldots ,(s,s)}\right \}$|. Due to separability, we thus have |$\rho _{ii}\geq 1-\kappa $| for |$i=1,\ldots ,s$|. Further, we may assume that |$\rho _{ij}\leq 0.15$| for all |$(i,j)\not \in \left \{{(1,1),\ldots ,(s,s)}\right \}$| as otherwise Proposition 4.3 yields |$f(\rho )<0<f(\bar \rho )$|.
We have |$H(\hat \rho )\leq 2\ln k+\frac {3q(2+\ln k)}k+\left ({1-s/k}\right )\ln (1-s/k)-\frac {s\ln k}k+\frac {2\ln k}k\sum _{i=1}^st_i+O_k(1/k)$|.
We have |$E(\hat \rho ) =-2\ln k+\frac {s\ln k}{k}\left ({1+\frac {3}{2k}-\frac {s}{2k^2}}\right )-\frac {2\ln k}k\sum _{i=1}^st_i+\tilde O_k(1/k)$|.
4.7 Proof of Proposition 4.6
Suppose that |$k-\sqrt k\leq s\leq k-1$| and that |$\rho \in \mathcal D_{s,{\rm tame}}$| maximizes |$f$| over |$\mathcal D_{s,{\rm tame}}$|. As before, we assume without loss that |$\rho _{ii}\geq 0.51$| for |$i=1,\ldots ,s$| and |$\rho _{ij}<0.51$| for |$(i,j)\not \in \left \{{(1,1),\ldots ,(s,s)}\right \}$|. Thus, |$\rho _{ii}\geq 1-\kappa $| for |$i=1,\ldots ,s$| as |$\rho $| is separable. Further, if |$\rho _{ij}>0.15$| for some |$(i,j)\not \in \left \{{(1,1),\ldots ,(s,s)}\right \}$|, then |$f(\rho )<0<f(\bar \rho )$| by Proposition 4.3. Hence, we assume |$\rho _{ij}\leq 0.15$| for all |$(i,j)\not \in \left \{{(1,1),\ldots ,(s,s)}\right \}$|.
With |$\mathcal {H}=\frac 1k\sum _{i=1}^s h(\rho _{ii})$| we have |$H(k^{-1}\rho )\leq \ln k+\mathcal {H}+\frac {q}k\ln k+0.51(k-s)\frac {\ln k}k$|.
We have |$E(\rho )\leq E({\rm id})+(1+\tilde O_k(1/k))\frac {\ln k}{k}\left ({-0.85(k-s)+\sum _{i=1}^s(\rho _{ii}^2-1)}\right )$|.
5 The Laplace Method
In this section, we assume that |$d=d_{k,{\rm cond}}-\varepsilon _k$|, with |$\varepsilon _k$| from Proposition 2.5. We continue to use the notation introduced in Section 2. In particular, we recall that an overlap matrix |$\rho $| is |$k$|-stable if for any |$i\in [k]$| there is |$j\in [k]$| such that |$\rho _{ij}>0.51$|, that |$\rho $| is separable if |$\rho _{ij}\not \in \left ({0.51,1-\kappa }\right )$| for all |$i,j\in [k]$| (with |$\kappa =\ln ^{20}k/k$|) and that |$\mathcal D_{{\rm tame}}$| contains all doubly stochastic |$\rho $| that are separable but not |$k$|-stable.
There is a number |$C=C(k)>0$| such that |$\sum _{\rho \in {\mathcal R}_2}{\rm E}\left [{Z_{\rho ,{\rm tame}}}\right ]\leq C\cdot {\rm E}[Z_{k,{\rm tame}}]^2$|.
To bound the contribution of |$\rho \in {\mathcal R}_3$|, we need the following observation. Recall that |$\mathcal D$| stands for the Birkhoff polytope, that is, the set of all doubly stochastic |$k\times k$| matrices and that |${\mathcal R}$| is the set of all possible overlap matrices of balanced |$\sigma ,\tau $|.
There is a number |$C=C(k)>0$| such that for any |$\rho \in {\mathcal R}$| there is |$\rho '\in \mathcal D$| with |$\left \|{\rho -\rho '}\right \|_F< C/\sqrt n$|.
Let |$\rho \in {\mathcal R}$|. By construction, we have |$\sum _{i,j}\rho _{ij}=k$|. Hence, while there is |$i\in [k]$| such that the row sum is |$\sum _{j}\rho _{ij}=1+\alpha >1$|, there must be another row |$l$| such that |$\sum _{j}\rho _{lj}=1-\alpha '<1$|. Thus, recalling that |$\rho _i=(\rho _{ij})_{j\in [k]}$| denotes the |$i$|th row of |$\rho $|, by replacing row |$i$| by |$(1-\alpha '')\rho _i$| and row |$l$| by |$\rho _l+\alpha ''\rho _i$| for some suitable |$\alpha ''\leq 2k/\sqrt n$|, we can ensure that at least one of the row sums is one. After at most |$k-1$| steps, we thus obtain a stochastic matrix |$\rho ''$| such that |$\|\rho -\rho ''\|_2\leq 2k^3/\sqrt n$|. Repeating the same operation for the columns yields the desired doubly stochastic |$\rho '$|.
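The row-repair step in this proof can be sketched in code; the |$3\times 3$| matrix below is a hypothetical example with total mass |$k=3$| and slightly unbalanced rows, not one arising from the paper's random model:

```python
import numpy as np

# Hypothetical 3x3 overlap matrix: entries sum to k = 3, rows slightly off.
rho = np.array([[1.02, 0.00, 0.00],
                [0.00, 0.49, 0.49],
                [0.00, 0.50, 0.50]])

# Repeatedly shrink the heaviest row so its sum becomes exactly 1 and
# move the excess mass onto the lightest row, as in the proof sketch.
for _ in range(rho.shape[0] - 1):
    sums = rho.sum(axis=1)
    heavy, light = np.argmax(sums), np.argmin(sums)
    if np.isclose(sums[heavy], 1.0):
        break
    moved = (1.0 - 1.0 / sums[heavy]) * rho[heavy].copy()
    rho[heavy] -= moved
    rho[light] += moved

assert np.allclose(rho.sum(axis=1), 1.0)   # rho is now (row-)stochastic
assert np.isclose(rho.sum(), 3.0)          # total mass k is preserved
```

Repeating the analogous step on the columns would complete the construction of a doubly stochastic |$\rho '$|.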
If |$f(\rho )<f(\bar \rho )$| for any |$\rho \in \mathcal D_{{\rm tame}}\setminus \left \{{\bar \rho }\right \}$|, then |$\sum _{\rho \in {\mathcal R}_3}{\rm E}\left [{Z_{\rho ,{\rm tame}}}\right ]\leq {\rm E}[Z_{k,{\rm tame}}]^2$|.
We claim that |${\mathcal R}_3\subset \mathcal {D}''$|. Indeed, any |$\rho \in {\mathcal R}_3$| satisfies |$\left \|{\rho -\bar \rho }\right \|_F\geq \eta $| (as otherwise |$\rho \in {\mathcal R}_0$|), is separable (as otherwise |$\rho \in {\mathcal R}_1$|), and fails to be |$k$|-stable (as otherwise |$\rho \in {\mathcal R}_2$|). Moreover, by Lemma 5.3 there is a doubly stochastic |$\rho '$| such that |$\left \|{\rho -\rho '}\right \|_F<C/\sqrt n$|. However, this matrix |$\rho '$| may or may not be separable and/or |$k$|-stable. To rectify this, we form a convex combination between |$\rho '$| and a suitable doubly stochastic matrix. More precisely, suppose that the matrix |$\rho $| has precisely |$l\leq k-1$| entries that are greater than |$0.51$|. Each row and each column contain at most one such entry (as |$\rho \in \mathcal {B}$|). Thus, we may assume without loss of generality that |$\rho _{11},\ldots ,\rho _{ll}>0.51$|. Let us consider two cases.
Case 1: |$l<k-1$|: let |$\rho ''$| be the doubly stochastic matrix with |$\rho ''_{11}=\cdots =\rho _{ll}''=1$| and |$\rho _{ij}''=(k-l)^{-1}$| for |$i,j>l$|.
Case 2: |$l=k-1$|: let |$\rho ''$| be the doubly stochastic matrix with |$\rho ''_{11}=\cdots =\rho _{ll}''=1-1/(k-1)$|, |$\rho ''_{ik}=\rho ''_{ki}=1/(k-1)$| for |$i<k$| and |$\rho ''_{kk}=0$|.
Let |$\beta =n^{-1/3}$| and suppose that |$n$| is large. We claim that in either case |$\rho '''=(1-\beta )\rho '+\beta \rho ''\in \mathcal {D}'$|. Indeed, since |$\rho \in {\mathcal R}_3$| and |$\left \|{\rho -\rho '}\right \|_F<C/\sqrt n$|, the construction of |$\rho ''$| ensures that |$\rho '''_{ii}\geq 1-\kappa $| for all |$i\leq l$|. Similarly, |$\rho '''_{ij}<0.51$| unless |$i=j\leq l$|. Hence, |$\rho '''\in \mathcal D_{{\rm tame}}$|. Further, since |$\left \|{\rho -\rho '}\right \|_F<C/\sqrt n$| the triangle inequality implies that |$\left \|{\rho -\rho '''}\right \|_F<\delta $|. As |$\left \|{\rho -\bar \rho }\right \|\geq \eta $|, we conclude that |$\rho '''\in \mathcal {D}'$|, whence |$\rho \in \mathcal {D}''$|.
Finally, Proposition 4.1 follows from (5.8) and Lemmas 5.1, 5.2 and 5.4.
Funding
The research leading to these results has received funding from the European Research Council under the European Union's Seventh Framework Programme (FP 7, 2007–2013) / ERC Grant Agreement 278857–PTCC. Funding to pay the Open Access publication charges for this article was provided by the European Research Council under the European Union's Seventh Framework Programme (FP7/2007-2013) / ERC Grant Agreement n. 278857–PTCC.
Appendix. Proof of Lemma 3.2
Throughout this section, we assume that |$2k\ln k-\ln k-2\leq d\leq 2k\ln k$|. In addition, we fix some |$\sigma \in \mathcal {B}$| and we let |$V_i=\sigma ^{-1}(i)$| for |$i=1,\ldots ,k$|.
To simplify the calculations, we consider the following variant of the planted model. Given |$\sigma $|, |$n$| and |$p\in (0,1)$|, we let |$\mathcal {G}(n,p,\sigma )$| be the random graph in which any two vertices |$v,w$| with |$\sigma (v)\neq \sigma (w)$| are adjacent with probability |$p$| independently. The following observation relates this model to the planted model |$G(n,m,\sigma )$| from Lemma 3.2.
The following installment of the Chernoff bound will prove useful.
A.1 Proof of P1
A.2 Proof of P2
In |$\mathcal {G}(n,p,\sigma )$|, for each vertex |$v\in V\setminus V_i$| the number of neighbors of |$v$| in |$V_i$| has distribution |${\rm Bin}(|V_i|,p)$|. Due to (A.2) and because |$\sigma $| is balanced, the mean is |$\lambda =|V_i|p\sim \frac nk p>2\ln k$|. Hence, by Stirling's formula, the probability that |$v$| has fewer than 15 neighbors in |$V_i$| is |$q\leq 2\lambda ^{14}\exp (-\lambda )\leq 2k^{-2}\ln ^{14}k$|. Further, because the event of having fewer than 15 neighbors in |$V_i$| occurs independently for all |$v\in V\setminus V_i$|, the total number |$Y_i$| of such vertices has a binomial distribution |${\rm Bin}(|V\setminus V_i|,q)$|. As |$\sigma $| is balanced, the mean is |$|V\setminus V_i|q\leq (1-1/k+o(1))n\cdot q\leq 3k^{-2}\ln ^{14}k.$| Since we chose |$\kappa =k^{-1}\ln ^{20}k$|, a straightforward application of Lemma A.2 (the Chernoff bound) implies that |${\rm P}\left [{Y_i>\frac {\kappa n}{3k}}\right ]\leq \exp (-\Omega (n)),$| as desired.
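As a numerical illustration of the tail bound |$q\leq 2\lambda ^{14}\exp (-\lambda )$| used above, one can compare it with the exact binomial tail for one hypothetical choice of parameters (not the paper's asymptotic regime):

```python
import math

# One hypothetical parameter choice: n = 6000 potential neighbours,
# p = 0.01, so lam = n*p = 60 (the text only needs lam > 2 ln k).
n, p = 6000, 0.01
lam = n * p
# exact probability that Bin(n, p) takes a value below 15
tail = sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(15))
bound = 2 * lam**14 * math.exp(-lam)
assert tail < bound < 1
```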
A.3 Proof of P3
A.4 Proof of P4
We start by estimating the size of the core; the proof of the following proposition draws on arguments developed in [1, 7].
With probability |$1-\exp \left ({-\Omega (n)}\right )$|, the core of |$\mathcal {G}(n,p,\sigma )$| contains |$(1\,{-}\,\tilde {O}_k(k^{-1}))n$| vertices.
The proof of Proposition A.3 is constructive: basically, we iteratively remove vertices that have too few neighbors of some color other than their own among the remaining vertices. More precisely, we consider the following process. For a vertex |$v$| and a set |$S$| of vertices let |$e(v,S)$| denote the number of neighbors of |$v$| in |$S$| in |$\mathcal {G}(n,p,\sigma )$|.
CR1: For |$i,j\in [k]$|, |$i\neq j$|, let |$W_{ij}=\{v \in V_i:e(v,V_j) < 300\}$|, |$W_{ii}=\emptyset $|, |$W_{i}=\cup _{j\neq i} W_{ij}$|, and |$W=\cup _{i=1}^k W_i$|.
CR2: For |$i \ne j$|, let |$U_{ij}=\{v \in V_i: e(v,W_j) > 100\}$| and |$U = \cup _{i \ne j} U_{ij}$|.
CR3: Set |$Z^{(0)} = U$| and repeat the following for |$i\geq 0$|:
if there is |$v \in V \setminus Z^{(i)}$| such that |$e(v,Z^{(i)})\geq 100$|, pick one such |$v$| and let |$Z^{(i+1)}=Z^{(i)} \cup \{v\}$|;
otherwise, let |$Z^{(i+1)}=Z^{(i)}$|.
Let |$Z=\cup _{i\geq 0}Z^{(i)}$| be the final set resulting from CR3. By construction, the set |$V\setminus (W\cup Z)$| is contained in the core. Indeed, if |$v\in V_i\setminus W$|, then |$v$| has at least 300 neighbors in |$V_j$| for every |$j\neq i$|. Further, if |$v\not \in U\subset Z$|, then at least 200 of these belong to |$V_j\setminus W$|. Hence, if |$v\not \in Z$|, then |$v$| has at least 100 neighbors in |$V_j\setminus (W\cup Z)$|. To complete the proof of Proposition A.3, we bound the sizes of |$W$|, |$U$| and |$Z$| (Lemmas A.4–A.6).
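The process CR1–CR3 can be sketched as follows; the thresholds 300 and 100 are replaced by toy parameters `t1`, `t2`, and the graph, coloring, and function names are our own illustration, not part of the proof:

```python
# Toy sketch of CR1-CR3 (our illustration): thresholds t1, t2 replace
# the constants 300 and 100; adj maps each vertex to its neighbour set,
# sigma maps each vertex to its colour class in {0, ..., k-1}.
def core_removal(adj, sigma, k, t1=2, t2=1):
    V = list(adj)
    e = lambda v, S: sum(1 for u in adj[v] if u in S)
    Vc = {i: {v for v in V if sigma[v] == i} for i in range(k)}
    # CR1: vertices with fewer than t1 neighbours in some other class
    W = {v for v in V
         for j in range(k) if j != sigma[v] and e(v, Vc[j]) < t1}
    Wc = {i: {v for v in W if sigma[v] == i} for i in range(k)}
    # CR2: vertices with more than t2 neighbours inside W of another class
    U = {v for v in V
         for j in range(k) if j != sigma[v] and e(v, Wc[j]) > t2}
    # CR3: greedily absorb vertices with at least t2 neighbours in Z
    Z = set(U)
    while True:
        cand = [v for v in V if v not in Z and e(v, Z) >= t2]
        if not cand:
            break
        Z.add(cand[0])
    return W, U, Z

# Complete bipartite toy graph on two colour classes of size 3:
# every vertex has 3 cross-neighbours, so nothing is ever removed.
adj = {0: {3, 4, 5}, 1: {3, 4, 5}, 2: {3, 4, 5},
       3: {0, 1, 2}, 4: {0, 1, 2}, 5: {0, 1, 2}}
sigma = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
W, U, Z = core_removal(adj, sigma, k=2)
assert W == set() and U == set() and Z == set()   # the core is all of V
```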
With probability at least |$1-\exp \left ({-\Omega (n)}\right ),$| we have |$|W_{ij}|\leq \tilde O_k(k^{-3})n$| for any |$i,j$|.
Fix |$i,j$|, |$i \ne j$|. Due to the independence of the edges in |$\mathcal {G}(n,p,\sigma )$|, for any |$v \in V_i$| the number |$e(v,V_j)$| of neighbors in |$V_j$| has distribution |${\rm Bin}(|V_j|,p)$|. As |$\sigma $| is balanced, (A.2) shows that the mean is |$\mu =|V_j|p\geq 2\ln k$|. Using the Chernoff bound (Lemma A.2), we obtain |$P[e(v,V_j) \le 300] \le \exp (-2\ln k +O_k(\ln \ln k)) = \tilde O_k(k^{-2})$|. Hence, by the linearity of expectation and because |$\sigma $| is balanced, |$\mathbb E[|W_{ij}|] \le \tilde O_k(k^{-2}) \cdot |V_i|= n\cdot \tilde O_k(k^{-3})$|. Further, once more due to the independence of the edges in |$\mathcal {G}(n,p,\sigma )$|, |$|W_{ij}|$| is a binomial random variable. Thus, using the Chernoff bound once more (with, say, |$t= k^{-4}n$|), we see that |$P[|W_{ij}|\leq \tilde O_k(k^{-3})n]\geq 1-\exp (-\Omega (n))$|, as required.
With probability at least |$1-\exp \left ({-\Omega (n)}\right )$| we have |$|U| \le n/k^{30}$|.
With respect to |$U''_{ij}$|, we observe the following. Given that |$w\in W_{ji}$|, we know that |$w$| has fewer than 300 neighbors in |$V_i$|. But the fact that |$w\in W_{ji}$| has no implications as to which vertices |$v\in V_i$| the vertex |$w$| is adjacent to. Thus, given that |$w\in W_{ji}$| and given |$e(w,V_i)$|, the actual set of neighbors of |$w$| in |$V_i$| is a random subset of |$V_i$| of size |$e(w,V_i)\leq 300$|. In fact, these sets are mutually independent for all |$w\in W_{ji}$|. Thus, we can bound |$|U_{ij}''|$| by means of the following balls and bins experiment: let us think of the vertices in |$V_i$| as bins. Then each vertex |$w\in W_{ji}$| tosses 300 balls randomly into the bins |$V_i$|, independently of all other vertices in |$W_{ji}$|. In this experiment, let |$\mathcal {X}$| be the set of |$v\in V_i$| that receive at least 50 balls. Then |$|U_{ij}''|$| is stochastically dominated by |$|\mathcal {X}|$|.
With probability at least |$1-\exp \left ({-\Omega (n)}\right )$| we have |$|Z| \le n/k^{29}$|.
Lemma A.5 entails that with probability at least |$1-\exp \left ({-\Omega (n)}\right )$|, |$|U| \le n/k^{30}$|. Assume that this is indeed the case. Further, suppose that |$|Z\setminus U|\geq i^*=n/k^{30}$|. Let us stop the process CR3 at this point, and let |$Z^*=Z^{(i^*)}$|. By construction, the graph induced on |$S=U \cup Z^*$| spans at least |$100i^*\geq 50|S|$| edges, while |$|S|\leq 2k^{-30}n$|. Thus, the set |$S$| violates condition P3. But since we saw in Section A.3 that P3 is satisfied with probability |$1\,{-}\,\exp (-\Omega (n))$|, the assertion follows.
Now, Proposition A.3 is immediate from Lemmas A.4–A.6. For a set |$Y\subset V$| let us denote by |$N(Y)$| the set of all vertices |$v\in V$| that have a neighbor in |$Y$| in |$\mathcal {G}(n,p,\sigma )$|. As a further step toward the proof of P4, we establish
With probability |$1-\exp (-\Omega (n)),$| we have |$|N(Z)|\leq n k^{-20}$|.
This is immediate from Lemmas A.6 and A.7.
We define two sets of vertices, which capture the 1-free and 2-free vertices. In what follows, we always let |$i,j\in [k]$|, |$i\neq j$|. Let |$S_0$| be the set of vertices that have zero neighbors in some color class other than their own. Moreover, |$S_1 = \{v\in V \setminus S_0 : \exists i,j\ \text {s.t.}\ v \in V_i\ \text {and}\ N(v)\cap V_j \subseteq W_j\}$|. By the construction of the core, we have
If |$v$| is 1-free, then |$v\in S_0\cup S_1\cup Z\cup N(Z)$|.
We proceed by estimating the sizes of |$S_0$|, |$S_1$|.
With probability |$1-\exp (-\Omega (n)),$| we have |$|S_0|\leq \frac nk$|.
With probability |$1-\exp (-\Omega (n)),$| we have |$|S_1|\leq \tilde O_k(k^{-2})n$|.
Together with Lemma A.6, Corollary A.8, Lemma A.10 and Lemma A.11, Fact A.9 implies the desired bound on the number of 1-free vertices. To bound the number of 2-free vertices, we need
Let |$i,j,l\in [k]$| be distinct. With probability at least |$1-\exp (-\Omega (n)),$| there are no more than |$n\tilde O_k(k^{-5})$| vertices |$v\in V_i$| such that |$e(v,V_j)\leq 100$| and |$e(v,V_l)\leq 100$|.
For any |$v$|, |$e(v,V_j)$|, |$e(v,V_l)$| are independent binomial variables. Because |$\sigma $| is balanced, their means are |$(1+o(1))\frac nkp$|. Hence, (A.2) shows that |${\rm P}\left [{e(v,V_j),e(v,V_l)\leq 100}\right ]\leq \tilde O_k(k^{-4})$|. Consequently, the expected number of |$v\in V_i$| with |$e(v,V_j),e(v,V_l)\leq 100$| is |$n\tilde O_k(k^{-5})$|. In fact, this is a binomial random variable due to the independence of the edges in |$\mathcal {G}(n,p,\sigma )$|. Thus, the assertion follows from the Chernoff bound.□
Now, let |$S_2$| be the set of all |$v\in V_i$| such that there exist distinct |$j,l\in [k]\setminus \{i\}$| such that |$e(v,V_j)\leq 100$| and |$e(v,V_l)\leq 100$|. By construction, if |$v$| is 2-free, then |$v\in S_2\cup Z\cup N(Z)$| (note that |$U\subset Z$|). Thus, the desired bound on the number of 2-free vertices follows from Lemma A.6, Corollary A.8 and Lemma A.12.
Author notes
Communicated by Prof. Assaf Naor
An extended abstract version of this work appeared in the Proceedings of the 54th IEEE Symposium on Foundations of Computer Science (“FOCS”), 2013.