Abstract

For a fixed number |$d>0$| and |$n$| large, let |$G(n,d/n)$| be the random graph on |$n$| vertices in which any two vertices are connected with probability |$d/n$| independently. The problem of determining the chromatic number of |$G(n,d/n)$| goes back to the famous 1960 article of Erdös and Rényi that started the theory of random graphs [Magyar Tud. Akad. Mat. Kutato Int. Kozl. 5 (1960) 17–61]. Progress culminated in the landmark paper of Achlioptas and Naor [Ann. Math. 162 (2005) 1333–1349], in which they calculate the chromatic number precisely for all |$d$| in a set |$S\subset (0,\infty )$| of asymptotic density |$\lim _{z\rightarrow \infty }\frac 1z\int _0^z\textbf {1}_S=\frac {1}{2}$|, and up to an additive error of one for the remaining |$d$|. Here we obtain a near-complete answer by determining the chromatic number of |$G(n,d/n)$| for all |$d$| in a set of asymptotic density 1.

1 Introduction

Let |$G(n,p)$| denote the random graph on the vertex set |$V=\left \{{1,\ldots ,n}\right \}$| in which any two vertices are connected with probability |$p\in [0,1]$| independently, known as the Erdös–Rényi model. (Actually this model was introduced by Gilbert [24]. In their seminal paper, Erdös and Rényi consider a random graph |$G(n,m)$| in which the number of edges is a fixed integer |$m$| [21]. However, with |$p=m/\binom{n}{2}$| both models are essentially equivalent [26].) We write |$p=d/n$| and refer to |$d$| as the average degree. As per common practice, we say that |$G(n,d/n)$| has a property with high probability (“w.h.p.”) if the probability that the property holds converges to 1 as |$n\rightarrow \infty $|. We recall that a graph |$G$| is |$k$|-colorable if it is possible to assign each vertex one of the colors |$\left \{{1,\ldots ,k}\right \}$| such that no edge connects two vertices of the same color. Moreover, the chromatic number |$\chi (G)$| of a graph |$G$| is the least integer |$k$| such that |$G$| is |$k$|-colorable. Unless specified otherwise, we always consider |$d,k$| fixed as |$n\rightarrow \infty $|.
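To make these definitions concrete, here is a minimal brute-force Python sketch (the helper names are ours and purely illustrative) that samples |$G(n,p)$| and determines |$\chi $| by exhaustive search, which is feasible only for very small |$n$|:

```python
import itertools, random

def sample_gnp(n, p, seed=0):
    """Sample the random graph G(n, p) as an edge list."""
    rng = random.Random(seed)
    return [(u, v) for u in range(n) for v in range(u + 1, n) if rng.random() < p]

def is_k_colorable(n, edges, k):
    """Exhaustively test k-colorability (only feasible for tiny n)."""
    return any(all(sigma[u] != sigma[v] for u, v in edges)
               for sigma in itertools.product(range(k), repeat=n))

def chromatic_number(n, edges):
    """The least k such that the graph is k-colorable."""
    return next(k for k in range(1, n + 1) if is_k_colorable(n, edges, k))

n, d = 12, 4.0
print(chromatic_number(n, sample_gnp(n, d / n)))
```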

1.1 Background and main results

The theory of random graphs was born with the famous 1960 article by Erdös and Rényi [21], and has grown since into a substantial area of research with hundreds, perhaps thousands of contributions dealing with the |$G(n,p)$| model alone. In their paper, Erdös and Rényi showed that the random graph |$G(n,p)$| undergoes a percolation phase transition at |$p=1/n$|⁠, and phase transitions have been the guiding theme of the theory ever since. In addition, Erdös and Rényi set the agenda for future research by posing a number of intriguing questions, all of which have been answered over the years except for one: for a given |$d>0$|⁠, what is the typical chromatic number of |$G(n,d/n)$|?

It is widely conjectured that for any number |$k\geq 3$| of colors there occurs a phase transition for |$k$|-colorability. That is, there exists a number |$d_{k-{\rm col}}$| such that |$G(n,d/n)$| is |$k$|-colorable w.h.p. if |$d<d_{k-{\rm col}}$|⁠, whereas the random graph fails to be |$k$|-colorable w.h.p. if |$d>d_{k-{\rm col}}$|⁠. If true, this would imply that the likely value of the chromatic number, viewed as a function of |$d$|⁠, is a step function that takes the value |$k$| on the interval |$d_{(k-1)-{\rm col}}<d<d_{k-{\rm col}}$|⁠.

Toward this conjecture, Achlioptas and Friedgut [2] proved that for any fixed |$k\geq 3$| there exists a sharp threshold sequence |$d_{k-{\rm col}}(n)$|. This sequence is such that for any |$\varepsilon >0$|,

  • if |$p<(1-\varepsilon )d_{k-{\rm col}}(n)/n$|⁠, then |$G(n,p)$| is |$k$|-colorable with probability tending to 1 as |$n\rightarrow \infty $|⁠.

  • if |$p>(1+\varepsilon )d_{k-{\rm col}}(n)/n$|⁠, then |$G(n,p)$| fails to be |$k$|-colorable with probability tending to 1 as |$n\rightarrow \infty $|⁠.

Whether the sequence |$d_{k-{\rm col}}(n)$| converges to an actual “uniform” threshold |$d_{k-{\rm col}}$| is a well-known open problem.

Yet [2] is a pure existence result that does not provide any clue as to the location of |$d_{k-{\rm col}}$|⁠. In a landmark paper, Achlioptas and Naor [6] proved via the “second moment method” that
(1.1)
$d_{k-{\rm col}}(n)\geq d_{k,{\rm AN}}=2(k-1)\ln (k-1)=2k\ln k-2\ln k-2+o_k(1).$
Here and throughout, |$o_k(1)$| denotes a term that tends to zero in the limit of large |$k$|⁠. By comparison, a naive application of the union bound shows that
(1.2)
$d_{k-{\rm col}}(n)\leq d_{k,{\rm first}}=2k\ln k-\ln k+o_k(1).$
Recently [14], a more sophisticated union bound argument was used to prove
(1.3)
$d_{k-{\rm col}}(n)\leq d_{k,{\rm first}}'=2k\ln k-\ln k-1+o_k(1).$
Thus, the gap between the lower bound (1.1) and the upper bound (1.3) on |$d_{k-{\rm col}}(n)$| is about |$\ln k+1$|⁠, an expression that diverges as |$k$| gets large. By improving the lower bound, Theorem 1.1 reduces this gap to a small absolute constant of |$2\ln 2-1+o_k(1)\approx 0.39$|⁠.  
Theorem 1.1
The |$k$|-colorability threshold satisfies
(1.4)
$d_{k-{\rm col}}(n)\geq d_{k,{\rm cond}}=2k\ln k-\ln k-2\ln 2+o_k(1).$
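To get a feel for the magnitudes involved, the following Python snippet (illustrative only; it drops the |$o_k(1)$| error terms in (1.1), (1.3), and (1.4)) evaluates the bounds and the two gaps for a few values of |$k$|:

```python
from math import log

def d_AN(k):           # lower bound (1.1), o_k(1) term dropped
    return 2 * (k - 1) * log(k - 1)

def d_first_prime(k):  # upper bound (1.3), o_k(1) term dropped
    return 2 * k * log(k) - log(k) - 1

def d_cond(k):         # lower bound (1.4), o_k(1) term dropped
    return 2 * k * log(k) - log(k) - 2 * log(2)

for k in (5, 10, 100):
    # old gap (about ln k + 1) versus new gap (2 ln 2 - 1, about 0.39)
    print(k, round(d_first_prime(k) - d_AN(k), 2), round(d_first_prime(k) - d_cond(k), 2))
```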

The bounds (1.1) and (1.3) yield an estimate of the chromatic number of |$G(n,d/n)$|. Namely, (1.1) implies that for |$d<d_{k,{\rm AN}}$|, the random graph |$G(n,d/n)$| is |$k$|-colorable w.h.p. Moreover, (1.3) shows that for |$d>d_{k-1,{\rm first}}'$|, |$G(n,d/n)$| fails to be |$(k-1)$|-colorable w.h.p. Consequently, for all |$d$| in the interval |$(d_{k-1,{\rm first}}',d_{k,{\rm AN}})$| of length about |$\ln k$|, the chromatic number of |$G(n,d/n)$| is precisely |$k$| w.h.p. However, for all |$d$| in the subsequent interval |$(d_{k,{\rm AN}},d_{k,{\rm first}}')$| of length about |$\ln k$|, (1.1) and (1.3) only imply that the chromatic number is either |$k$| or |$k+1$| w.h.p. Thus, (1.1) and (1.3) yield the typical value of |$\chi (G(n,d/n))$| precisely for “about half” of all |$d$|. Formally, let us say that a (measurable) set |$A\subset \mathbb {R}_{\geq 0}$| has asymptotic density |$\alpha $| if |$\lim _{z\rightarrow \infty }\frac 1z\int _{0}^z\textbf {1}_A=\alpha $|, where |$\textbf {1}_A$| is the indicator of |$A$|. Then the set on which (1.1) and (1.3) determine |$\chi (G(n,d/n))$| has asymptotic density |$1/2$| [6, Theorem 2].

Theorem 1.1 enables us to pin the chromatic number down precisely on a set of asymptotic density 1, thereby obtaining a near-complete answer to the question of Erdös and Rényi. More precisely, (1.2) and (1.4) imply the following.  

Theorem 1.2
There exists a constant |$k_0$| such that the following is true. Let
Set |$F(d)=k$| for all |$d\in S_k$| and let |$S=\bigcup _{k\geq k_0}S_k$|. Then |$S$| has asymptotic density 1 and
$\lim _{n\rightarrow \infty }{\rm P}\left [{\chi (G(n,d/n))=F(d)}\right ]=1\quad \mbox {for all }d\in S.$

Of course, the constants |$0.99$| and |$1.38$| in the definition of |$S_k$| can be replaced by any numbers less than one and |$2\ln 2$|, respectively. Theorem 1.2 also answers a question of Alon and Krivelevich [8] as to whether the chromatic number of |$G(n,d/n)$| is concentrated on a single integer for most |$d$| “in an appropriately defined sense”. (A proof that the threshold sequence |$d_{k-{\rm col}}(n)$| converges would imply a one-point concentration result for the chromatic number outside a countable set of average degrees. However, the known result [2] does not imply this. Alon and Krivelevich [8] were also concerned with the case that the average degree |$d$| is a growing function of |$n$|. In this paper, we deal with |$d$| fixed as |$n\rightarrow \infty $|, the original setting considered by Erdös and Rényi.)

Independently of the mathematics literature, the random graph coloring problem has been studied in statistical physics, where it is known as the “diluted mean-field Potts antiferromagnet at zero temperature”. In fact, physicists have developed a generic, ingenious but highly non-rigorous formalism called the “cavity method” for locating phase transitions in random graphs and other discrete structures [35, 36]. The so-called “replica symmetric” variant of the cavity method predicts upper and lower bounds on |$d_{k-{\rm col}}$| [30, 38], namely
(1.5)
Theorem 1.1 establishes the lower bound rigorously.

Additionally, the cavity method yields predictions on the combinatorial nature of the problem, particularly on the geometry of the set of |$k$|-colorings of the random graph. The proof of Theorem 1.1 is based on a “physics-enhanced” second moment argument that exploits this geometrical intuition. In fact, the physics intuition is one of two key ingredients that enable us to improve over the approach of Achlioptas and Naor [6]. The second one is a novel approach, based on a local variations argument, to the analytical challenge of optimizing a certain (non-convex) function over the Birkhoff polytope. Neither of these ideas seems to depend on particular features of the graph coloring problem, and thus we expect that they will prove vital in tackling a variety of further related problems.

An outline of our physics-enhanced second moment argument follows in Section 2. In addition, in Section 2.5, we will see that the density |$d_{k,{\rm cond}}$| in (1.4) matches the condensation or Kauzmann phase transition predicted by physicists. This implies that the bound obtained in Theorem 1.1 is the best that can be achieved via a second moment-type argument over a certain class of natural random variables (see Section 2.5 for details).

1.2 Related work

As witnessed by the notorious “four color problem” first posed by Guthrie in 1852, solved controversially by Appel and Haken in 1976 [9], and re-solved by Robertson et al. [40], the graph coloring problem has been a central subject in (discrete) mathematics for well over a century. Thus, it is unsurprising that the chromatic number problem on |$G(n,p)$| has received a great deal of attention since it was posed by Erdös and Rényi. Indeed, the problem has inspired the development of techniques that are by now widely used in various areas of mathematics, computer science, physics, and other disciplines.

For instance, pioneering the use of martingale tail bounds, Shamir and Spencer [41] proved concentration bounds for the chromatic number of |$G(n,p)$|. Their result was enhanced first by Łuczak [33] and then by Alon and Krivelevich [8], who used the Lovász local lemma to prove that the chromatic number of |$G(n,p)$| is concentrated on two consecutive integers if |$p\ll n^{-1/2}$|. In a breakthrough contribution, Bollobás [11] determined the asymptotics of the chromatic number of dense random graphs (i.e., |$G(n,p)$| with |$p>n^{-1/3}$|). This result improved prior work by Matula [34], whose “expose-and-merge” technique Łuczak built upon to obtain a similar result for sparser random graphs [32]. However, in the case that |$p=d/n$| for a fixed real |$d>0$|, the setting originally studied by Erdös and Rényi, Łuczak's formula is far less precise than (1.1)–(1.2). For a comprehensive literature overview, see [12, 26].

The work of Achlioptas and Naor [6], which gave the best prior result on the chromatic number of |$G(n,d/n)$|, is based on the second moment method. Its use in the context of phase transitions in random discrete structures was pioneered by Achlioptas and Moore [5] and Frieze and Wormald [23]. The techniques of [6] have been used to prove several further important results. For instance, Achlioptas and Moore [4] identified three (and for some |$d$| just two) consecutive integers on which the chromatic number of the random |$d$|-regular graph is concentrated. This was reduced to two integers for all fixed |$d$| (and to one for about half of all |$d$|) by adding the small subgraph conditioning technique [27]. Recently, the methods developed in this work have been harnessed to improve this result further still [15]. Moreover, Dyer et al. [20] extended the second moment argument from [6] to the problem of |$k$|-coloring |$h$|-uniform random hypergraphs. We expect that our approach can be used to obtain improved results in the hypergraph case. Similarly, it should be possible to improve results of Dani et al. [19] on a “decorated” coloring problem.

In several problems, sophisticated applications of the second moment method gave bounds very close to the predictions made by the physicists' cavity method [35]. Examples where the physics predictions have (largely) been verified rigorously in this way include the hypergraph two-coloring problem [16, 18] and the random |$k$|-SAT problem [17]. But thus far, a general limitation of the rigorous proof techniques has been that they only apply to binary problems where there are only two values available for each variable. By contrast, in random graph coloring, each variable (vertex) has |$k$| values (colors) to choose from, where |$k$| can be arbitrarily large. As we will see in Section 2, the large number of available values complicates the problem dramatically. In effect, random graph coloring was the last among the intensely studied benchmark problems in which a very substantial gap between the physics predictions and the rigorous results persisted, a situation rectified by this paper. Thus, we view this paper as an important step toward the long-term goal of providing a mathematical foundation for the cavity method.

In computer science, the algorithmic problem of finding a |$k$|-coloring of |$G(n,p)$| in polynomial time is a long-standing challenge, mentioned prominently in several influential survey articles [22, 28]. Simple greedy algorithms find a |$k$|-coloring for |$d\leq k\ln k\sim \frac 12d_{k-{\rm col}}$| w.h.p. [3, 25, 29], about half the |$k$|-colorability threshold. However, no efficient algorithm is known to beat the, in the words of Shamir and Spencer [41], “most vexing” factor of two. In fact, it has been suggested that changes in the geometry of the set of |$k$|-colorings that occur at |$d\sim \frac 12d_{k-{\rm col}}$| cause the demise of local-search-based algorithms [1, 37]. Interestingly, some of the very phenomena that seem to make the algorithmic problem of coloring |$G(n,p)$| difficult will turn out to be extremely helpful in the construction of our random variable and thus in the proof of Theorem 1.1.

1.3 Notation and preliminaries

In addition to |$G(n,p)$|⁠, we consider the |$G(n,m)$| model, which is a random graph with vertex set |$V=\left \{{1,\ldots ,n}\right \}$| and exactly |$m$| edges, chosen uniformly at random among all such graphs. Working with |$G(n,m)$| facilitates the second moment argument, because the total number of edges is a deterministic quantity. Nonetheless, Lemma 2.1 shows that any results for |$G(n,m)$| with |$m=\lceil dn/2\rceil $| extend to |$G(n,d/n)$|⁠. Thus, throughout the paper, we always set |$m=\lceil dn/2\rceil $|⁠.

Since our goal is to establish a statement that holds with probability tending to 1 as |$n\rightarrow \infty $|⁠, we are always going to assume tacitly that the number |$n$| of vertices is sufficiently large for the various estimates to hold. Similarly, at the expense of the error term |$o_k(1)$| in Theorem 1.1, we will tacitly assume that |$k\geq k_0$| for a large enough constant |$k_0$|⁠.

We use the standard |$O$|-notation to refer to the limit |$n\rightarrow \infty $|⁠. Thus, |$f(n)=O(g(n))$| means that there exist |$C>0$|⁠, |$n_0>0$| such that for all |$n>n_0$| we have |$|f(n)|\leq C\cdot |g(n)|$|⁠. In addition, we use the standard symbols |$o(\cdot ),\Omega (\cdot ),\Theta (\cdot )$|⁠. In particular, |$o(1)$| stands for a term that tends to 0 as |$n\rightarrow \infty $|⁠. Furthermore, we write |$f(n)\sim g(n)$| if |$\lim _{n\rightarrow \infty }f(n)/g(n)=1$|⁠.

Additionally, we use asymptotic notation in the limit of large |$k$|⁠. To make this explicit, we insert |$k$| as an index. Thus, |$f(k)=O_k(g(k))$| means that there exist |$C>0$|⁠, |$k_0>0$| such that for all |$k>k_0$| we have |$|f(k)|\leq C\cdot |g(k)|$|⁠. Further, we write |$f(k)=\tilde O_k(g(k))$| to indicate that there exist |$C>0$|⁠, |$k_0>0$| such that for all |$k>k_0$| we have |$|f(k)|\leq (\ln k)^C\cdot |g(k)|$|⁠.

If |$G$| is a graph and |$v$| is a vertex of |$G$|, then we denote by |$N_G(v)$| the neighborhood of |$v$| in |$G$|, that is, the set of all vertices |$w$| that are connected to |$v$| by an edge of |$G$|. Where the graph |$G$| is apparent from the context, we just write |$N(v)$|. If |$s\geq 1$| is an integer, we write |$[s]$| for the set |$\left \{{1,2,\ldots ,s}\right \}$|. Moreover, throughout the paper, we use the conventions that |$0\ln 0=0$| and (consistently) that |$0\ln \frac 00=0$|.

2 Outline

In this section, we first discuss the second moment method in general and the argument pursued in [6] specifically and investigate why it breaks down beyond the density |$d_{k,{\rm AN}}$| from (1.1). Then, we see how the physics intuition can be harnessed to overcome this barrier. Finally, we comment on the condensation phase transition.

2.1 The second moment method

Suppose that |$Z=Z(G(n,m))\geq 0$| is a random variable such that |$Z(G)>0$| implies that |$G$| is |$k$|-colorable. Moreover, suppose that there is a number |$C=C(d,k)>0$| that may depend on the average degree |$d$| and the number of colors |$k$| but not on |$n$| such that
(2.1)
${\rm E}\left [{Z^2}\right ]\leq C\cdot {\rm E}\left [{Z}\right ]^2.$
Then the Paley–Zygmund inequality
(2.2)
${\rm P}\left [{Z>0}\right ]\geq {\rm E}\left [{Z}\right ]^2/{\rm E}\left [{Z^2}\right ]$
implies that
${\rm P}\left [{G(n,m)\mbox { is }k\mbox {-colorable}}\right ]\geq {\rm P}\left [{Z>0}\right ]\geq 1/C>0.$
This inequality yields a lower bound on the |$k$|-colorability threshold.  
Lemma 2.1 ([2])

If |$d>0$| is such that |$\liminf _{n\rightarrow \infty }{\rm P}\left [{G(n,m)\hbox { is }k\hbox {-colorable}}\right ]>0$|, then |$\liminf _{n\rightarrow \infty }d_{k-{\rm col}}(n)\geq d.$|

Thus, in order to obtain a lower bound on |$d_{k-{\rm col}}$|⁠, we need to define an appropriate random variable |$Z$| and verify (2.1). Both these steps turn out to be non-trivial.

2.2 Balanced colorings and the Birkhoff polytope

The most obvious choice of random variable seems to be the total number |$Z_{k}$| of |$k$|-colorings of |$G(n,m)$|⁠. But to simplify the calculations, we confine ourselves to a particular type of colorings. Namely, a map |$\sigma :[n]\rightarrow [k]$| is balanced if |$||\sigma ^{-1}(i)|-\frac nk|\leq \sqrt n$| for |$i=1,\ldots ,k$|⁠. Let |$\mathcal {B}=\mathcal {B}_{n,k}$| denote the set of all balanced maps. Moreover, let |$Z_{k,{\rm bal}}$| be the number of balanced |$k$|-colorings of |$G(n,m)$|⁠. This is the random variable that Achlioptas and Naor [6] use. As it happens, (2.1) does not hold for either |$Z_{k}$| or |$Z_{k,{\rm bal}}$| in the entire range |$0<d<d_{k,{\rm cond}}$|⁠. We need to understand why.

To get started, we compute the first moment. By Stirling's formula the number of balanced maps is |$|\mathcal {B}|=\Theta (k^{n})$|. Furthermore, for |$\sigma $| to be a |$k$|-coloring, the random graph |$G(n,m)$| must not contain any of the “forbidden” edges that join two vertices with the same color under |$\sigma $|. If |$\sigma $| is balanced, we easily check that the number |$\mathcal {F}(\sigma )$| of vertex pairs that are not forbidden satisfies |$\mathcal {F}(\sigma )=(1-1/k)\binom{n}{2}+O(n)$|. Thus, letting |$N=\binom{n}{2}$| and using Stirling's formula, we find that the probability that |$\sigma $| is a |$k$|-coloring of |$G(n,m)$| comes to
$\binom{\mathcal {F}(\sigma )}{m}\binom{N}{m}^{-1}=\Theta \left ({(1-1/k)^{m}}\right ).$
Hence, by the linearity of expectation,
(2.3)
${\rm E}\left [{Z_{k,{\rm bal}}}\right ]=\Theta \left ({k^{n}(1-1/k)^{m}}\right ).$
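As a quick numerical cross-check of (2.3) against (1.2), the exponential order |$\ln k+\frac d2\ln (1-1/k)$| of |${\rm E}[Z_{k,{\rm bal}}]$| changes sign close to |$2k\ln k-\ln k$|; a minimal Python sketch (illustrative only):

```python
from math import log

def first_moment_exponent(k, d):
    """(1/n) ln E[Z_{k,bal}] up to o(1), read off from (2.3): ln k + (d/2) ln(1 - 1/k)."""
    return log(k) + (d / 2) * log(1 - 1 / k)

# bisection for the d at which the exponent changes sign (here k = 10)
k, lo, hi = 10, 1.0, 4 * 10 * log(10)
for _ in range(60):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if first_moment_exponent(k, mid) > 0 else (lo, mid)
print(round(lo, 3), round(2 * k * log(k) - log(k), 3))  # 43.709 vs 43.749
```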
Working out the second moment is not quite so easy. Since |${\rm E}[Z_{k,{\rm bal}}^2]$| is the expected number of pairs of balanced |$k$|-colorings, we need to compute the probability that |$\sigma ,\tau \in \mathcal {B}$| simultaneously happen to be |$k$|-colorings of |$G(n,m)$|. Of course, this probability depends on how “similar” |$\sigma ,\tau $| are. To quantify this, we define the |$k\times k$| overlap matrix |$\rho (\sigma ,\tau )$| whose entries
(2.4)
$\rho _{ij}(\sigma ,\tau )=\frac kn\left |{\sigma ^{-1}(i)\cap \tau ^{-1}(j)}\right |$
represent the proportion of vertices with color |$i$| under |$\sigma $| and color |$j$| under |$\tau $|.
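For concreteness, here is a small Python helper (illustrative only) that computes |$\rho (\sigma ,\tau )$| for two colorings given as arrays with values in |$\left \{{0,\ldots ,k-1}\right \}$|:

```python
import numpy as np

def overlap(sigma, tau, k):
    """Overlap matrix rho(sigma, tau) of (2.4): rho_ij = (k/n) |sigma^{-1}(i) cap tau^{-1}(j)|."""
    n = len(sigma)
    rho = np.zeros((k, k))
    for i, j in zip(sigma, tau):
        rho[i, j] += k / n
    return rho

# two colorings of 6 vertices with 3 colors
print(overlap([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], 3))
```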

While in binary problems the relevant overlap parameter is just one-dimensional (e.g., in random |$k$|-SAT, the Hamming distance of two truth assignments), what makes the |$k$|-colorability problem so difficult is the need to work with the high-dimensional overlap matrix.

The upshot is that |$\rho (\sigma ,\tau )$| contains all the information necessary to determine the probability that both |$\sigma ,\tau $| are |$k$|-colorings. In fact, let |$Z_{\rho ,{\rm bal}}$| be the number of pairs of balanced |$k$|-colorings with overlap |$\rho $|, and let |${\mathcal R}$| denote the set of all possible overlap matrices of maps |$\sigma ,\tau \in \mathcal {B}$|. For a |$k\times k$| matrix |$\rho ,$| we denote the Frobenius norm by
$\left \|{\rho }\right \|_F=\big ({\textstyle \sum _{i,j=1}^k\rho _{ij}^2}\big )^{1/2}.$
Fact 2.2 ([6])
Uniformly, for |$\rho \in {\mathcal R},$| we have
(2.5)
${\rm E}\left [{Z_{\rho ,{\rm bal}}}\right ]\leq O\left ({\exp \left ({n\cdot f(\rho )}\right )}\right ),\qquad \mbox {where }f(\rho )=-\sum _{i,j=1}^k\frac {\rho _{ij}}k\ln \frac {\rho _{ij}}k+\frac d2\ln \left ({1-\frac 2k+\frac {\left \|{\rho }\right \|_F^2}{k^2}}\right ).$
 
Proof
Since the function |$f$| turns out to be the key object in this paper, we include the simple proof to explain where it comes from combinatorially. By Stirling's formula, the total number of |$\sigma ,\tau \in \mathcal {B}$| with overlap |$\rho $| equals
(2.6)
Now, suppose that |$\sigma ,\tau $| have overlap |$\rho $|. By inclusion/exclusion, the number of “forbidden” edges joining two vertices with the same color under either |$\sigma $| or |$\tau $| equals
$\left ({\frac 2k-\frac {\left \|{\rho }\right \|_F^2}{k^2}}\right )\binom{n}{2}+O(n).$
Let |$N=\binom{n}{2}$|. Then Stirling's formula yields
(2.7)
${\rm P}\left [{\sigma ,\tau \mbox { are both }k\mbox {-colorings of }G(n,m)}\right ]=O\left ({\left ({1-\frac 2k+\frac {\left \|{\rho }\right \|_F^2}{k^2}}\right )^{m}}\right ).$
The assertion follows from (2.6), (2.7) and the linearity of expectation.□
The bound (2.5) is essentially tight as similar calculations show that
(2.8)
Moreover, by the linearity of expectation, we can express the second moment as
(2.9)
${\rm E}[Z_{k,{\rm bal}}^2]=\sum _{\rho \in {\mathcal R}}{\rm E}\left [{Z_{\rho ,{\rm bal}}}\right ].$
As the total number of summands is |$\left |{\mathcal R}\right |\leq n^{k^2}$|⁠, we obtain from (2.8) and (2.9) that
(2.10)
Further, because we work with balanced colorings, the row and column sums of any |$\rho \in {\mathcal R}$| are |$1+O(n^{-\frac 12})$|⁠. Thus, let |$\mathcal D$| be the set of all doubly stochastic |$k\times k$| matrices, the Birkhoff polytope. Together with the continuity of |$f$| and the observation that |${\mathcal R}\cap \mathcal {D}$| becomes a dense subset of |$\mathcal {D}$| as |$n\rightarrow \infty $|⁠, (2.10) implies that
(2.11)

In summary, following [6], we have transformed the calculation of the second moment into the problem of optimizing |$f$| over the Birkhoff polytope |$\mathcal D$|⁠. Let |$\bar \rho $| be the matrix with all entries equal to |$\frac 1k$|⁠, the barycenter of |$\mathcal D$|⁠. A glimpse at (2.3) reveals that |$f(\bar \rho )\sim \frac 2n\ln {\rm E}\left [{Z_{k,{\rm bal}}}\right ]$| corresponds to the square of the first moment. Therefore, a necessary condition for the success of the second moment method is that the maximum (2.11) is attained at |$\bar \rho $|⁠. Indeed, if |$f(\rho )>f(\bar \rho )$| for some |$\rho \in \mathcal D$|⁠, then |${\rm E}[Z_{k,{\rm bal}}^2]$| exceeds |${\rm E}[Z_{k,{\rm bal}}]^2$| by an exponential factor |$\exp (\Omega (n))$|⁠. It is not difficult to show that this necessary condition is also sufficient. Combinatorially, the condition that |$\bar \rho $| is the maximizer of |$f$| indicates that pairs |$\sigma ,\tau $| that, judging by their overlap, look completely uncorrelated make up the lion's share of |${\rm E}[Z_{k,{\rm bal}}^2]$|⁠.
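As a sanity check on the identity |$f(\bar \rho )=2\ln k+d\ln (1-1/k)$|, which corresponds to the square of the first moment and is used again in Section 3, one can evaluate |$f$| as given in (2.5) numerically; a minimal sketch (helper names ours):

```python
import numpy as np

def f(rho, d):
    """f(rho) from (2.5): entropy of rho/k plus (d/2) ln(1 - 2/k + ||rho||_F^2 / k^2)."""
    k = rho.shape[0]
    p = (rho / k)[rho > 0]
    return float(-(p * np.log(p)).sum()
                 + (d / 2) * np.log(1 - 2 / k + (rho ** 2).sum() / k ** 2))

k = 20
d = 2 * k * np.log(k) - np.log(k) - 2 * np.log(2)      # roughly d_{k,cond}
rho_bar = np.full((k, k), 1.0 / k)                     # barycenter of the Birkhoff polytope
print(f(rho_bar, d), 2 * np.log(k) + d * np.log(1 - 1 / k))  # the two values agree
```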

2.3 The singly stochastic bound

Yet solving the optimization problem (2.11) proves seriously difficult. Achlioptas and Naor resort to a relaxation: with |$\mathcal {S}\supset \mathcal {D},$| the set of all |$k\times k$| singly stochastic matrices, they study
(2.12)
$\max _{\rho \in \mathcal {S}}f(\rho ).$
Because |$\mathcal {S}$| is just a product of simplices, (2.12) turns out to be much more amenable than (2.11). Achlioptas and Naor solve (2.12) completely. More precisely, they optimize |$f$| over the sets |$\left \{{\rho \in \mathcal {S}:\|\rho \|_F=s}\right \}$| for each |$s$|⁠, that is, over the intersection of |$\mathcal {S}$| with a sphere. Their argument relies on the product structure of |$\mathcal {S}$| and a sophisticated global analysis (going to the sixth derivative). The result is that the maximum of (2.12) and therefore also of (2.11) is attained at the doubly stochastic |$\bar \rho $| for |$d\leq d_{k,{\rm AN}}$|⁠.

However, for |$d> d_{k,{\rm AN}}$|⁠, the maximum (2.12) is attained elsewhere. For instance, the matrix |$\rho _{{\rm half}}$| whose first |$k/2$| rows coincide with those of the identity matrix |${\rm id}$| (with ones on the diagonal and zeros elsewhere) and whose last |$k/2$| rows have all entries equal to |$1/k$| yields a larger function value than |$\bar \rho $| for |$d>d_{k,{\rm AN}}+o_k(1)$|⁠. Of course, this matrix fails to be doubly stochastic.

Hence, one might hope that |$\bar \rho $| remains the maximizer of (2.11) for |$d$| up to |$d_{k,{\rm cond}}$|⁠. That is, however, not the case. Indeed, consider the doubly stochastic
(2.13)
where |$\textbf {1}$| denotes the matrix with all entries equal to one. A simple calculation reveals that |$f(\rho _{{\rm stable}})>f(\bar \rho )$|, and thus that the second moment argument for |$Z_{k,{\rm bal}}$| definitely fails, for |$d>d_{k,{\rm stable}}=(2k-1)\ln k-(1+\ln 2)+o_k(1)$|, that is, strictly below |$d_{k,{\rm cond}}$|. Thus, for all we know, it might be possible that |${\rm E}[Z_{k,{\rm bal}}^2]\leq O({\rm E}[Z_{k,{\rm bal}}]^2)$| for |$d<d_{k,{\rm stable}}$|, but definitely not beyond.

2.4 A physics-enhanced random variable

Therefore, to prove Theorem 1.1, we need to work with a different random variable. The key observation behind its definition is that the second moment (2.11) is driven up by certain “wild” |$k$|-colorings |$\sigma $|⁠. Their number behaves like a lottery: while the random graph typically has no wild coloring, a tiny fraction of graphs have an abundance, boosting the second moment. To avoid this heavy-tailed random variable, we define a notion of “tame” colorings. This induces a decomposition |$Z_{k,{\rm bal}}=Z_{k,{\rm tame}}+Z_{k,{\rm wild}}$| such that |${\rm E}\left [{Z_{k,{\rm tame}}}\right ]\sim {\rm E}\left [{Z_{k,{\rm bal}}}\right ]$|⁠. The second moment bound (2.1) turns out to hold for |$Z_{k,{\rm tame}}$| if |$d\leq d_{k,{\rm cond}}-o_k(1)$|⁠.

The notion of “tame” is inspired by statistical physics predictions on the geometry of the set of |$k$|-colorings. More precisely, according to the physicists' cavity method [30, 42], for |$(1+o_k(1))k\ln k<d<d_{k,{\rm cond}},$| the set of all |$k$|-colorings, viewed as a subset of |$[k]^n$|, decomposes into “tiny clusters” that are “well-separated” from each other. Formally, we define the cluster of a balanced |$k$|-coloring |$\sigma $| of |$G(n,m)$| as the set
(2.14)
$\mathcal {C}(\sigma )=\left \{{\tau \in \mathcal {B}:\tau \mbox { is a }k\mbox {-coloring of }G(n,m)\mbox { and }\rho _{ii}(\sigma ,\tau )>0.51\mbox { for all }i\in [k]}\right \}.$
In words, |$\mathcal {C}(\sigma )$| contains all balanced |$k$|-colorings |$\tau $| where more than |$51\%$| of the vertices in each color class of |$\sigma $| retain their color. According to the cavity method, for |$d<d_{k,{\rm cond}},$| each cluster contains only an exponentially small fraction of all |$k$|-colorings of |$G(n,m)$| w.h.p. But for our purposes, it suffices to formalize “tiny” by just requiring that |$|\mathcal {C}(\sigma )|\leq {\rm E}[Z_{k,{\rm bal}}]$|.
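In code, the overlap condition that defines |$\mathcal {C}(\sigma )$| is a one-line check (this sketch tests only the diagonal condition of (2.14); whether |$\tau $| is itself a balanced |$k$|-coloring has to be verified separately):

```python
def in_cluster(sigma, tau, k):
    """Diagonal overlap condition of (2.14): rho_ii(sigma, tau) > 0.51 for every color i,
    i.e., more than 51% of each color class of sigma keeps its color under tau."""
    n = len(sigma)
    keep = [0] * k
    for s, t in zip(sigma, tau):
        if s == t:
            keep[s] += 1
    return all(keep[i] * k > 0.51 * n for i in range(k))
```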
Furthermore, to formalize the notion that the clusters are “well-separated”, we call a balanced |$k$|-coloring |$\sigma $|separable if
(2.15)
$\rho _{ij}(\sigma ,\tau )\not \in \left ({0.51,1-\kappa }\right )\quad \mbox {for all }i,j\in [k]\mbox { and all balanced }k\mbox {-colorings }\tau \mbox { of }G(n,m),\mbox { where }\kappa =\ln ^{20}k/k.$
In other words, the overlap matrix |$\rho (\sigma ,\tau )$| does not have entries in the interval |$(0.51,1\,{-}\,\kappa )$|⁠. Hence, if two color classes have an overlap of more than |$51\%$|⁠, then they must, in fact, be nearly identical. This definition ensures that the clusters of two separable colorings |$\sigma ,\tau $| are either disjoint or identical. We thus arrive at Definition 2.3.  
Definition 2.3

Let |$G$| be a graph with |$n$| vertices and |$m$| edges. A |$k$|-coloring |$\sigma $| of |$G$| is tame if

  • T1: |$\sigma $| is balanced,

  • T2: |$\sigma $| is separable, and

  • T3: |$\left |{\mathcal {C}(\sigma )}\right |\leq {\rm E}\left [{Z_{k,{\rm bal}}(G(n,m))}\right ]$|⁠.

 
Remark 2.4

The choice of the numbers |$0.51$| and |$\kappa =\ln ^{20}k/k$| is not entirely arbitrary. The |$0.51$| is motivated by the fact that the function |$f$| may have a stationary point if an entry of |$\rho $| is very close to |$1/2$| (cf. the proof of Lemma 4.13). Moreover, two colorings in a typical cluster coincide on about a |$1-O_k(1/k)$| fraction of the vertices, and sacrificing a few log factors allows for a quick and easy bound on the cluster size (cf. Lemma 3.2).

In Section 3, we show that a typical |$k$|-coloring of |$G(n,m)$| is indeed tame, which implies that the expected number of tame |$k$|-colorings satisfies the following.  

Proposition 2.5
There exists a sequence |$\varepsilon _k\rightarrow 0$| such that for |$d=d_{k,{\rm cond}}-\varepsilon _k,$| we have

Thus, going from balanced to tame colorings has no discernible effect on the first moment, which remains exponentially large in |$n$| up to at least |$d= d_{k,{\rm cond}}-\varepsilon _k$|⁠.

Working with tame colorings has a substantial impact on the second moment. As before, computing the second moment boils down to a continuous optimization problem. But in comparison to (2.11), this problem is over a significantly reduced domain |$\mathcal D_{{\rm tame}}\subset \mathcal D$|. Indeed, let us call a |$k\times k$| matrix |$\rho $| separable if |$\rho _{ij}\not \in \left ({0.51,1-\kappa }\right )$| for all |$i,j\in [k]$|. Furthermore, call |$\rho $| |$k$|-stable if for any |$i,$| there is |$j$| such that |$\rho _{ij}>0.51$|. Let |$\mathcal D_{{\rm tame}}$| be the set of all |$\rho \in \mathcal D$| that are separable but not |$k$|-stable. In particular, the matrix |$\rho _{{\rm stable}}$| from (2.13) does not belong to |$\mathcal D_{{\rm tame}}$|. Geometrically, one can think of |$\mathcal D_{{\rm tame}}$| as being obtained by cutting out (huge) cylinders from the Birkhoff polytope. In Section 4, we will see that the second moment calculation for |$Z_{k,{\rm tame}}$| boils down to showing that
(2.16)
$\max _{\rho \in \mathcal D_{{\rm tame}}}f(\rho )$
is attained at |$\bar \rho $|⁠. Indeed, that (2.16) mirrors the second moment calculation seems reasonable: for any two tame colorings |$\sigma ,\tau $| the overlap matrix |$\rho (\sigma ,\tau )$| is separable by T2. Moreover, if |$\rho (\sigma ,\tau )$| is |$k$|-stable, then |$\tau \in \mathcal {C}(\sigma )$| by the very definition of |$\mathcal {C}(\sigma )$|⁠, and T3 provides an a priori bound on the number of such |$\tau $|⁠. (We admit that writing |$\mathcal D_{{\rm tame}}$| is a slight abuse of notation, but it is an intuitive one as (2.16) shows that controlling |$Z_{k,{\rm tame}}$| requires optimizing over |$\mathcal D_{{\rm tame}}$|⁠.)
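In concrete terms, membership in |$\mathcal D_{{\rm tame}}$| amounts to two elementary matrix checks; a short Python sketch for a doubly stochastic input |$\rho $| (helper names ours):

```python
import numpy as np

def is_separable(rho, kappa):
    """No entry of rho lies in the open interval (0.51, 1 - kappa)."""
    return not bool(np.any((rho > 0.51) & (rho < 1 - kappa)))

def is_k_stable(rho):
    """Every row of rho has an entry exceeding 0.51."""
    return bool(np.all(rho.max(axis=1) > 0.51))

def in_D_tame(rho, kappa):
    """rho is separable but not k-stable."""
    return is_separable(rho, kappa) and not is_k_stable(rho)
```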

Thus, in a sense the proof strategy that we pursue is the opposite of the one from [6]. While Achlioptas and Naor relax the optimization problem (by working with a rather significantly larger domain: singly rather than doubly stochastic matrices), here we restrict the domain by imposing further physics-inspired constraints. This approach, carried out in Section 4, yields  

Proposition 2.6

Let |$\varepsilon _k$| be the sequence from Proposition 2.5. For any sufficiently large |$k,$| there is a number |$C(k)>0$| such that for |$d=d_{k,{\rm cond}}-\varepsilon _k$| we have |$0<{\rm E}[Z_{k,{\rm tame}}^2]\leq C(k)\cdot {\rm E}[Z_{k,{\rm tame}}]^2$|.

The proof of Proposition 2.6 essentially comes down to showing that the maximum (2.16) is attained at |$\bar \rho $|⁠. Even though we work with the reduced domain |$\mathcal D_{{\rm tame}}$|⁠, this is anything but straightforward. Indeed, to solve this analytical problem, we develop a novel local variations argument based on properties of the entropy function (among other things). We expect that this argument will prove useful to tackle many related optimization problems that come up in second moment arguments.

Finally, Theorem 1.1 is an immediate consequence of Propositions 2.5 and 2.6 combined with Lemma 2.1.

2.5 The condensation phase transition

Finally, what would it take to close the (small) remaining gap between the new lower bound (1.4) on |$d_{k-{\rm col}}$| and the upper bound (1.3)? According to the physicists' cavity method, this gap is due to a further phase transition, the so-called condensation or Kauzmann transition, that occurs at |$d_{k,{\rm cond}}+o_k(1)$|⁠, that is, the lower bound established in Theorem 1.1. In fact, the existence and precise location of this phase transition (including the term hidden in the |$o_k(1)$|⁠) can be established rigorously via the methods developed in this paper. The details have been carried out in full in a follow-up article [10].

According to the cavity method [30], the geometry of the set of |$k$|-colorings changes significantly at |$d_{k,{\rm cond}}$|. More precisely, for |$d<d_{k,{\rm cond}}-o_k(1),$| the set of |$k$|-colorings decomposes into clusters that each contain only an exponentially small fraction of all |$k$|-colorings of |$G(n,d/n)$| w.h.p. By contrast, for |$d>d_{k,{\rm cond}}+o_k(1)$|, the largest cluster is conjectured to contain a constant fraction of all |$k$|-colorings. As a result, two random |$k$|-colorings are heavily correlated, as there is a non-vanishing probability that they belong to the same cluster. This explains intuitively why the condensation threshold poses an obstacle to the second moment method, as we saw that a necessary condition for the success of the second moment method is that random pairs of |$k$|-colorings decorrelate.

More formally, we prove in [10] that for |$d>d_{k,{\rm cond}}+o_k(1)$| there does not exist a random variable |$Z=Z(G(n,m))$| with the following properties. First, |$Z(G)>0$| only if |$G$| is |$k$|-colorable. Second,
By contrast, Propositions 2.5 and 2.6 show that |$Z_{k,{\rm tame}}$| has these two properties if |$d<d_{k,{\rm cond}}-o_k(1)$|. Hence, in this sense, the approach (and random variable) put forward in this paper is best possible.

A refined version of the cavity method, the so-called 1-step replica symmetry breaking (“1RSB”) ansatz [30, 31, 39, 42], yields a precise prediction as to the value of |$d_{k-{\rm col}}=\lim _{n\rightarrow \infty }d_{k-{\rm col}}(n)$| (of course, the existence of the limit is taken for granted in the physics work). However, this prediction is not explicit; for instance, it involves the solution to a seriously complicated fixed point problem on the set of probability distributions on the |$k+1$|-simplex. Yet it is possible to obtain an expansion in the limit of large |$k$|⁠, according to which |$d_{k-{\rm col}}=2k\ln k-\ln k-1+o_k(1)$|⁠. Proving the 1RSB prediction for |$d_{k-{\rm col}}$| remains an open problem. In a very few binary problems, asymptotic versions of the 1RSB prediction have been proved rigorously [16]. However, it seems anything but straightforward to extend these arguments to the random graph coloring problem. That said, we expect that any attempt at determining |$d_{k-{\rm col}}$| precisely would have to build upon the insights gained in this paper and very possibly its techniques.

3 The First Moment

Throughout this section, we keep the assumptions of Proposition 2.5 and the notation introduced in Section 2.

Lemma 3.1 is the key step toward proving Proposition 2.5.  

Lemma 3.1
There exists a sequence |$\varepsilon _k\rightarrow 0$| such that for |$d=d_{k,{\rm cond}}-\varepsilon _k$| we have

In fact, once we have Lemma 3.1, Proposition 2.5 readily follows from the linearity of expectation, Bayes' formula and the formula (2.3) for |${\rm E}[Z_{k,{\rm bal}}]$|⁠.

To establish Lemma 3.1, we denote by |$G(n,m,\sigma )$| the random graph |$G(n,m)$| conditional on the event that |$\sigma \in \mathcal {B}$| is a |$k$|-coloring. Thus, |$G(n,m,\sigma )$| consists of |$m$| edges drawn uniformly at random without replacement out of those edges that are bichromatic under |$\sigma $|⁠. This probability distribution is also known as the “planted model”.

To establish the bound T3 on the cluster size, we show that w.h.p. |$G(n,m,\sigma )$| contains a vast “core” consisting of vertices that, for each color other than their own, have several neighbors of that color that also belong to the core. Formally, if |$G=(V,E)$| is a graph on the vertex set |$V=\left \{{1,\ldots ,n}\right \}$| and |$\sigma \in \mathcal {B}$|, we define the core of |$(G,\sigma )$| as the largest subset |$V'\subset V$| such that
(3.1)
$\left |{N_G(v)\cap V'\cap \sigma ^{-1}(i)}\right |\geq 100\quad \mbox {for all }v\in V'\mbox { and all }i\in [k]\setminus \left \{{\sigma (v)}\right \}.$
The core is well defined: if |$V',V''$| satisfy (3.1), then so does |$V'\cup V''$|⁠. (Of course, the constant 100 is a bit arbitrary.)
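Operationally, the core can be computed by the standard peeling procedure: repeatedly delete vertices that violate (3.1) until none remain. A minimal Python sketch (illustrative only; the constant 100 appears as the parameter |$t$|):

```python
from collections import defaultdict

def core(n, edges, sigma, k, t=100):
    """Peel vertices violating (3.1): a core vertex needs at least t remaining
    neighbors of every color other than its own."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    V = set(range(n))
    changed = True
    while changed:
        changed = False
        for v in list(V):
            counts = [0] * k
            for w in adj[v]:
                if w in V:
                    counts[sigma[w]] += 1
            if any(counts[i] < t for i in range(k) if i != sigma[v]):
                V.discard(v)
                changed = True
    return V
```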

As we will see, due to expansion properties, no vertex in the core of |$G(n,m,\sigma )$| can be recolored without leaving the cluster |$\mathcal {C}(\sigma )$| w.h.p. The basic reason is that recoloring any vertex |$v$| in the core sets off an avalanche of recolorings: to give |$v$| another color, we will have to recolor at least 100 vertices that also belong to the core, and so on.

In addition, if a vertex |$v$| outside the core is such that for each color other than its own, |$v$| has a neighbor in the core of that color, then it should also be impossible to recolor |$v$| without leaving |$\mathcal {C}(\sigma )$|. For to assign |$v$| some color |$i\neq \sigma (v),$| we will have to recolor at least one vertex in the core. Guided by this observation, we call a vertex |$v$| |$\sigma $|-complete if for each color |$i\neq \sigma (v)$|, |$v$| has a neighbor |$w$| in the core with |$\sigma (w)=i$|.

If |$\sigma $|-complete vertices do not contribute to |$|\mathcal {C}(\sigma )|$|, then the cluster size stems from recoloring vertices |$v$| that fail to have a neighbor in the core of some color |$i\neq \sigma (v)$|. As we shall see, most of these vertices miss out on exactly one color |$i\neq \sigma (v)$| and hence have precisely two colors to choose from. Formally, we call a vertex |$v$| |$a$|-free in |$(G,\sigma )$| if, with |$V'$| denoting the core, we have |$\left |{\left \{{i\in [k]:N(v)\cap V'\cap \sigma ^{-1}(i)=\emptyset }\right \}}\right |\geq a+1$|.

Lemma 3.2 summarizes the expansion properties of |$G(n,m,\sigma )$| that the proof of Lemma 3.1 builds upon.  

Lemma 3.2

Let |$\sigma \in \mathcal {B}$| and assume that |$2k\ln k-\ln k-2\leq d\leq 2k\ln k$|⁠. Let |$V_i=\sigma ^{-1}(i)$| for |$i=1,\ldots ,k$|⁠. Then w.h.p. the random graph |$G(n,m,\sigma )$| has the following four properties.

  • P1: let |$i\in [k]$|⁠. For any subset |$S\subset V_i$| of size |$0.509\cdot \frac nk\leq |S|\leq (1-k^{-0.499})\frac nk$|⁠, the number of vertices |$v\in V\setminus V_i$| that do not have a neighbor in |$S$| is less than |$\frac nk-|S|-n^{2/3}$|⁠.

  • P2: let |$i\in [k]$|⁠. No more than |$\frac {\kappa n}{3k}$| vertices |$v\not \in V_i$| have less than 15 neighbors in |$V_i$|⁠, where |$\kappa =\ln ^{20}k/k$|⁠.

  • P3: there is no set |$S\subset V$| of size |$|S|\leq k^{-4/3}n$| that contains more than |$5|S|$| edges.

  • P4: at most |$\frac nk(1+\tilde O_k(1/k))$| vertices are 1-free, and at most |$\tilde O_k(k^{-2})n$| vertices are 2-free.

The proof of Lemma 3.2 is based on arguments that are, by now, fairly standard; in particular, the “core” has, tweaked in various ways, become a standard tool [1, 7, 13, 37]. For the sake of completeness, we give a full proof of Lemma 3.2 in the Appendix. Here we proceed to show how Lemma 3.2 implies Lemma 3.1.

Lemma 3.3

Assume that |$2k\ln k-\ln k-2\leq d\leq 2k\ln k$| and let |$\sigma \in \mathcal {B}$|⁠. Then |$\sigma $| is separable in |$G(n,m,\sigma )$| w.h.p.

 
Proof

By Lemma 3.2, we may assume that the random graph |$G(n,m,\sigma )$| has the properties P1–P3. Suppose that |$\tau \in \mathcal {B}$| is another |$k$|-coloring of this random graph and that |$i,j\in [k]$| are such that |$\rho _{ij}(\sigma ,\tau )\geq 0.51$|⁠. Our aim is to show that |$\rho _{ij}(\sigma ,\tau )>1-\kappa $|⁠. Without loss of generality, we may assume that |$i=j=1$|⁠.

Let |$R=\sigma ^{-1}(1)\setminus \tau ^{-1}(1)$|⁠, |$S=\tau ^{-1}(1)\cap \sigma ^{-1}(1)$| and |$T=\tau ^{-1}(1)\setminus \sigma ^{-1}(1)$|⁠. Because |$\tau $| is a |$k$|-coloring, none of the vertices in |$T$| has a neighbor in |$S$|⁠. Furthermore, because |$\tau $| is balanced we have |$|S\cup T|\geq \frac nk-\sqrt n$|⁠, and thus |$|T|\geq \frac nk-|S|-\sqrt n$|⁠. Since |$|S|=\frac nk\rho _{11}(\sigma ,\tau )>0.509\frac nk$|⁠, P1 implies that
(3.2)
Now, let |$U$| be the set of all |$v\in T$| that have at least 15 neighbors in |$\sigma ^{-1}(1)$|⁠. Then all of these neighbors lie in |$R$|⁠, because |$\tau $| is a |$k$|-coloring. Further, as |$\sigma ,\tau $| are asymptotically balanced we obtain from (3.2)
Hence, P3 applies to |$R\cup U$|⁠. By the definition of |$U$| and P3, the number |$e(R\cup U)$| of edges spanned by |$R\cup U$| satisfies
(3.3)
Let |$W=T\setminus U$|⁠. Because |$W$| consists of vertices with fewer than 15 neighbors in |$\sigma ^{-1}(1)$|⁠, P2 yields
(3.4)
Since |$\sigma ,\tau $| are balanced, we have
(3.5)
Hence, by (3.3) and (3.4)
(3.6)
Finally, (3.5) and (3.6) imply that |$\rho _{11}(\sigma ,\tau )=\frac kn\cdot |S|= 1+o(1)-\frac kn\cdot |R|>1-\kappa $|⁠, as desired.

As a next step, we are going to verify that the |$\sigma $|-complete vertices take the same color in all the colorings in |$\mathcal {C}(\sigma )$| w.h.p.; a similar argument was used in [1].  

Lemma 3.4

Assume that |$2k\ln k-\ln k-2\leq d\leq 2k\ln k$| and let |$\sigma \in \mathcal {B}$|⁠. W.h.p. the random graph |$G(n,m,\sigma )$| has the following property.

If |$\tau \in \mathcal {C}(\sigma )$|, then for all |$\sigma $|-complete vertices |$v$| we have |$\sigma (v)=\tau (v)$|.

 
Proof
By Lemmas 3.2 and 3.3, we may assume that P3 holds and that |$\sigma $| is separable in |$G(n,m,\sigma )$|. Let |$V'$| be the core of this random graph. Moreover, set
(3.7)
The assumptions that |$\sigma $| is separable and that both |$\sigma ,\tau $| are asymptotically balanced imply that
(3.8)
We are going to show that
(3.9)
By construction, this implies that |$\sigma (v)=\tau (v)$| for all |$\sigma $|-complete vertices.

To establish (3.9), let |$S_i=\Delta _i^+\cup \Delta _i^-$| for |$i=1,\ldots ,k$|⁠. Because |$\Delta _i^+$| is contained in the core, each |$v\in \Delta _i^+$| has at least 100 neighbors in |$\sigma ^{-1}(i)$|⁠. Since |$\tau $| is a |$k$|-coloring, all of these neighbors lie in the set |$\Delta _i^-$|⁠. Hence, the number |$e(S_i)$| of edges spanned by |$S_i$| is at least |$100|\Delta _i^+|$|⁠. On the other hand, (3.8) implies that |$|S_i|\leq k^{-4/3}n$| for all |$i$|⁠. Therefore, P3 entails that |$e(S_i)\leq 5|S_i|$| for all |$i$|⁠. Thus, we obtain |$100|\Delta _i^+|\leq e(S_i)\leq 5|S_i|\leq 5(\left |{\Delta _i^+}\right |+\left |{\Delta _i^-}\right |)$|⁠. Consequently, |$|\Delta _i^-|\geq 2|\Delta _i^+|$| for all |$i$|⁠. Thus, (3.7) shows that |$\Delta _i^+=\Delta _i^-=\emptyset $| for all |$i$|⁠, whence (3.9) follows.

 
Proof of Lemma 3.1

Let |$\sigma \in \mathcal {B}$|⁠. We need to show that |$G(n,m,\sigma )$| enjoys the properties T2–T3 from Definition 2.3 w.h.p. The fact that T2 holds w.h.p. follows directly from Lemma 3.3.

With respect to T3, by Lemma 3.4 we may assume that for all |$\sigma $|-complete |$v$| and all |$\tau \in \mathcal {C}(\sigma )$| we have |$\tau (v)=\sigma (v)$|⁠. Let |$F_j$| be the set of |$j$|-free vertices for |$j=1,2$|⁠. By Lemma 3.2 we may assume that
(3.10)
By construction, for any vertex |$v\in F_1\setminus F_2,$| there is a set |$C_v\subset [k]$| of at most two colors such that |$\tau (v)\in C_v$| for all |$\tau \in \mathcal {C}(\sigma )$|⁠. Hence,
(3.11)
Combining (3.10) and (3.11), we see that w.h.p. in |$G(n,m,\sigma )$|⁠,
(3.12)
We need to compare the r.h.s. of (3.12) with |$\frac 1n\ln {\rm E}[Z_{k,{\rm bal}}]$|. By (2.3) and Taylor expansion
$\frac 1n\ln {\rm E}[Z_{k,{\rm bal}}]=\ln k+\frac d2\ln \left ({1-\frac 1k}\right )+o(1)=\ln k-\frac {d}{2k}-\frac {d}{4k^{2}}+O_k\left ({\frac {d}{k^{3}}}\right )+o(1).$
Writing |$d=d_{k,{\rm cond}}-\varepsilon _k=2k\ln k-\ln k-2\ln 2-\varepsilon _k$|⁠, we obtain
(3.13)
Letting, say, |$\varepsilon _k=\Theta _k(k^{-1/2})$|⁠, we obtain from (3.12) and (3.13) that |$|\mathcal {C}(\sigma )|\leq {\rm E}[Z_{k,{\rm bal}}]$| w.h.p. Hence, T3 holds in |$G(n,m,\sigma )$| w.h.p.

Finally, upon direct inspection, we find |$f(\bar \rho )=2\ln k+d\ln (1-1/k)$|⁠. Thus, (3.13) shows that for |$d=d_{k,{\rm cond}}-\varepsilon _k=2k\ln k-\ln k-2\ln 2-\varepsilon _k$| we have |$k\cdot f(\bar \rho )=2\ln 2+o_k(1)>0$|⁠, as claimed.
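For the record, the arithmetic behind the last step is just the expansion |$\ln (1-1/k)=-1/k-1/(2k^2)+O(k^{-3})$|:
$k\cdot f(\bar \rho )=2k\ln k+kd\ln (1-1/k)=2k\ln k-d-\frac {d}{2k}+O_k\left ({\frac {\ln k}{k}}\right )=2k\ln k-d-\ln k+O_k\left ({\frac {\ln k}{k}}\right ),$
and plugging in |$d=2k\ln k-\ln k-2\ln 2-\varepsilon _k$| yields |$k\cdot f(\bar \rho )=2\ln 2+\varepsilon _k+O_k(\ln k/k)=2\ln 2+o_k(1)$|.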

4 The Second Moment

Throughout this section, we assume that |$d=d_{k,{\rm cond}}-\varepsilon _k$|⁠, with |$\varepsilon _k$| from Proposition 2.5. We continue to use the notation introduced in Section 2.

4.1 Overview

The goal is to prove Proposition 2.6. As we already hinted at in Section 2, this boils down to maximizing |$f(\rho )$| over |$\rho \in \mathcal D_{{\rm tame}}$|⁠. Formally, we have the following.  

Proposition 4.1

If |$f(\rho )<f(\bar \rho )$| for all |$\rho \in \mathcal D_{{\rm tame}}\setminus \left \{{\bar \rho }\right \}$|, then |${\rm E}[Z_{k,{\rm tame}}^2]\leq O({\rm E}[Z_{k,{\rm tame}}]^2)$|.

The proof of Proposition 4.1, based on the Laplace method, is a mere technical exercise, which we put off to Section 5.

Proposition 4.1 reduces the second moment argument to a problem in analysis. Indeed, neither the function |$f$| nor the domain |$\mathcal D_{{\rm tame}}$| over which we need to maximize depends on |$n$| (though both involve the parameters |$d$| and |$k$|). In what follows, we aim to establish the following.

Proposition 4.2

If |$\rho \in \mathcal D_{{\rm tame}}\setminus \left \{{\bar \rho }\right \}$|⁠, then |$f(\rho )<f(\bar \rho )$|⁠.

Thus, Proposition 2.6 is immediate from Propositions 4.1 and 4.2.

The proof of Proposition 4.2 is the heart of the second moment argument. Of course, we need to take a closer look at the function |$f$|. As we will see, it consists of two ingredients: an entropy term and a probability term. More specifically, suppose that |$p:\Omega \rightarrow \left [{0,1}\right ]$| is a probability distribution on a finite set |$\Omega $| (i.e., |$\sum _{x\in \Omega }p(x)=1$|). Recalling our convention that |$0\ln 0=0$|, we denote by
$H(p)=-\sum _{x\in \Omega }p(x)\ln p(x)$
the entropy of |$p$|. Since any |$\rho \in \mathcal {D}$| satisfies |$\sum _{i,j}\rho _{ij}=k$|, we can view |$k^{-1}\rho $| as a probability distribution on |$[k]\times [k]$|. Hence, we can write
$f(\rho )=H(k^{-1}\rho )+E(\rho ),\qquad \mbox {where }E(\rho )=\frac d2\ln \left ({1-\frac 2k+\frac {\left \|{\rho }\right \|_F^2}{k^2}}\right ).$
Combinatorially, |$E(\rho )$| corresponds to the (logarithm of the) probability that |$\sigma ,\tau \in \mathcal {B}$| with overlap |$\rho $| simultaneously happen to be |$k$|-colorings, cf. the proof of Fact 2.2.

It is clear that the entropy is maximized at the barycenter |$\bar \rho $| of the Birkhoff polytope, because |$k^{-1}\bar \rho $| is the uniform distribution on |$[k]\times [k]$|⁠. Furthermore, among all the matrices |$\rho $| with non-negative entries that sum to |$k$|⁠, |$\bar \rho $| is the one that minimizes the Frobenius norm and hence |$E(\rho )$|⁠. This shows that |$\bar \rho $| is a stationary point of |$f(\rho )$|⁠. But how do we prove that |$\bar \rho $| is the global maximizer of |$f$|?

The domain |$\mathcal D_{{\rm tame}}$| admits a natural decomposition into several subsets. Let us call |$\rho \in \mathcal D$||$s$|-stable if the matrix has precisely |$s$| entries that are greater than |$0.51$|⁠. Let |$\mathcal D_{s,{\rm tame}}$| denote the set of all |$s$|-stable |$\rho \in \mathcal D_{{\rm tame}}$|⁠. Geometrically, any |$\rho \in \mathcal D_{s,{\rm tame}}$| is close to a |$k-s$|-dimensional face of the Birkhoff polytope. For if |$\rho $| has |$s$| entries greater than |$0.51$|⁠, then by separability these entries are in fact at least |$1-\kappa $| (with |$\kappa =\ln ^{20}k/k$| as in (2.15)). Hence, |$\rho $| is close to the face where these |$s$| entries are equal to 1. Indeed, as all other entries of |$\rho $| are smaller than |$0.51$|⁠, |$\rho $| is near a point “deep inside” that face. Consequently, for any |$1\leq s<k$| the set |$\mathcal D_{s,{\rm tame}}$| is disconnected: it consists of many tiny “splinters” near the |$k-s$|-dimensional faces of |$\mathcal D$|⁠. Each of these splinters can be mapped to the component where |$\rho _{11},\ldots ,\rho _{ss}>0.51$| by permuting the rows and columns suitably, which does not affect the function |$f$|⁠.

In the following, we are going to optimize |$f$| separately over |$\mathcal D_{s,{\rm tame}}$| for each |$0\leq s<k$|⁠. We are going to argue that for each |$s$|⁠, the point |$\bar \rho _{s{\rm -stable}}$| whose first |$s$| diagonal entries are 1 and whose |$(i,j)$|-entries are equal to |$(k-s)^{-1}$| for |$i,j>s$| comes close to maximizing |$f$| over |$\mathcal D_{s,{\rm tame}}$| (up to a negligible error term in each case). Geometrically, |$\bar \rho _{s{\rm -stable}}$| is the center of the face defined by |$\rho _{11}=\cdots =\rho _{ss}=1$|⁠. Furthermore, in the case |$s=0,$| we have |$\bar \rho _{s{\rm -stable}}=\bar \rho $|⁠, and we will see that the maximum over |$\mathcal D_{0,{\rm tame}}$| is attained at this very point.

We start by showing that we may confine ourselves to matrices without an entry in the interval |$(0.15,1-\kappa )$|⁠. Recall that |$\mathcal {S}$| is the set of all singly stochastic |$k\times k$|-matrices.  

Proposition 4.3

For all |$\rho \in \mathcal {S}$| such that |$\rho _{ij}\in \left [{0.15,0.51}\right ]$| for some |$(i,j)\in [k]\times [k],$| we have |$f(\rho )<0$|⁠.

We will see shortly how Proposition 4.3 implies that |$\bar \rho $| is the maximizer of |$f$| over |$D_{0,{\rm tame}}$|⁠. In addition, there are three different ranges of |$1\leq s<k$| that we deal with separately.  

Proposition 4.4

Suppose that |$1\leq s\leq k^{0.999}$|⁠. Then for all |$\rho \in \mathcal D_{s,{\rm tame}}$| we have |$f(\rho )<f(\bar \rho )$|⁠.

 
Proposition 4.5

Suppose that |$k^{0.999}<s<k-k^{0.49}$|⁠. Then for all |$\rho \in \mathcal D_{s,{\rm tame}}$| we have |$f(\rho )<f(\bar \rho )$|⁠.

 
Proposition 4.6

Suppose that |$k-k^{0.49}\leq s<k$|⁠. Then for all |$\rho \in \mathcal D_{s,{\rm tame}}$| we have |$f(\rho )\,{<}\,f(\bar \rho )$|⁠.

The proofs of Propositions 4.3–4.5 are based on a local variations argument. Roughly speaking, we are going to argue that if |$\rho \in \mathcal D_{s,{\rm tame}}$| is “far” from |$\bar \rho _{s{\rm -stable}}$|, then a higher function value can be attained by moving slightly in the direction of |$\bar \rho _{s{\rm -stable}}$|. We expect that this argument can be adapted to perform second moment arguments in other problems in probabilistic combinatorics. Indeed, in such arguments, the function that needs to be optimized is typically similar in nature to our |$f$|: an entropy term maximized at |$\bar \rho $| plus a probability term minimized at |$\bar \rho $|.

More precisely, the following fact is the cornerstone of the local variations argument. Let |$\rho \in \mathcal {S}$|⁠, let |$i\in [k]$| be a row index, and let |$\emptyset \neq J\subset [k]$| be a set of column indices. Obtain |$\hat \rho \in \mathcal {S}$| from |$\rho $| by letting
(4.1)
$\hat \rho _{ab}=\rho _{ab}\mbox { for all }(a,b)\not \in \left \{{i}\right \}\times J,\qquad \hat \rho _{ij}=\frac 1{|J|}\sum _{l\in J}\rho _{il}\mbox { for all }j\in J.$
That is, |$\hat \rho $| is obtained by redistributing in row |$i$| the total mass of the columns in |$J$| equally over these columns. Clearly, the entropy satisfies |$H(k^{-1}\hat \rho )\geq H(k^{-1}\rho )$|⁠. In fact, this inequality is strict unless |$\hat \rho =\rho $|⁠. However, it may well be that for the probability term we have |$E(\hat \rho )<E(\rho )$|⁠. Proposition 4.7 trades the increase in entropy against the drop in the probability term and shows that |$f(\hat \rho )\geq f(\rho )$| if |$J$| is “not too small” and |$\max _{j\in J}\rho _{ij}$| is “not too big”.  
Proposition 4.7

Suppose that |$\rho \in \mathcal {S}$|⁠. Let |$i\in [k]$| and |$J\subset [k]$| be such that for some number |$3\ln \ln k/\ln k\leq \lambda \leq 1$| we have |$|J|\geq k^\lambda $|⁠. Moreover, assume that |$\max _{j\in J}\rho _{ij}<\lambda /2-\ln \ln k/\ln k$|⁠. Then the matrix |$\hat \rho $| from (4.1) satisfies |$f(\hat \rho )\geq f(\rho )$|⁠. In fact, if |$\rho \neq \hat \rho $|⁠, then |$ f(\hat \rho )>f(\rho )$|⁠.
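In matrix terms, the operation (4.1) underlying Proposition 4.7 is a single row-averaging step; a minimal Python sketch (illustrative only, with |$J$| given as a list of column indices):

```python
import numpy as np

def redistribute(rho, i, J):
    """The operation (4.1): replace each entry rho[i, j], j in J, by the average
    of row i over the columns in J; all other entries are left unchanged."""
    hat = rho.copy()
    hat[i, J] = rho[i, J].mean()
    return hat

# example: average the last three entries of row 0
print(redistribute(np.eye(4), 0, [1, 2, 3]))
```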

Let us illustrate the use of Proposition 4.7 by proving the following.  

Corollary 4.8

If |$\rho \in \mathcal D_{0,{\rm tame}}\setminus \left \{{\bar \rho }\right \}$|⁠, then |$f(\rho )<f(\bar \rho )$|⁠.

 
Proof
Let |$\rho \in \mathcal D_{0,{\rm tame}}$|⁠. Then |$\rho _{ij}\leq 0.51$| for all |$i,j$| (as |$\rho $| is 0-stable). In fact, if there are |$i,j$| such that |$\rho _{ij}>0.15$|⁠, then Proposition 4.3 implies that |$f(\rho )<0$|⁠, while |$f(\bar \rho )>0$| by Proposition 2.5. Hence, we may assume that |$\rho _{ij}\leq 0.15$| for all |$i,j$|⁠. Let |$\rho [l]$| be the matrix whose first |$l$| rows are identical to those of |$\bar \rho $|⁠, and whose last |$k-l$| rows are identical to those of |$\rho $|⁠. Thus, |$\rho [0]=\rho $| and |$\rho [k]=\bar \rho $|⁠. We claim that
(4.2)
$f(\rho )=f(\rho [0])\leq f(\rho [1])\leq \cdots \leq f(\rho [k])=f(\bar \rho ).$
To obtain (4.2), we apply Proposition 4.7 to the |$i$|th row of |$\rho [i-1]$| with |$J=[k]$| and |$\lambda =1$|⁠. This is possible because |$\max _{j}\rho _{ij}[i-1]=\max _{j}\rho _{ij}\leq 0.15$|⁠. The resulting matrix |$\hat \rho $| is precisely |$\rho [i]$|⁠. Thus, (4.2) follows from Proposition 4.7. Indeed, Proposition 4.7 shows that one of the inequalities (4.2) is strict (as |$\rho \neq \bar \rho $|⁠). Hence, |$f(\rho )<f(\bar \rho )$|⁠.

Proposition 4.2 is immediate from Propositions 4.4–4.6 and Corollary 4.8. Thus, we are left to prove Propositions 4.3–4.7. In Section 4.3, we prove Proposition 4.7. Building upon that estimate, we then proceed to prove Propositions 4.3–4.6. But before we start, we introduce a few pieces of notation and some basic facts.

4.2 Preliminaries

For |$x\in \mathbb {R},$| we denote by |${\rm sign}(x)\in \left \{{-1,0,1}\right \}$| the sign of |$x$|. Moreover, if |$\rho $| is a matrix, then |$\rho _i$| denotes the |$i$|th row of |$\rho $| and |$\rho _{ij}$| the |$j$|th entry of |$\rho _i$|. We let |$\left \|{\rho }\right \|_\infty =\max _{i,j}|\rho _{ij}|$|. Further,
$h:[0,1]\rightarrow \mathbb {R},\qquad z\mapsto -z\ln z-(1-z)\ln (1-z)$
denotes the entropy function. We recall the elementary inequality |$h(z)\leq z(1-\ln z)$|. In addition, we note that
(4.3)
$h(z)-z\ln k\leq \frac 1k\qquad \mbox {for all }z\in [0,1].$
Indeed, we have |$h(z)-z\ln k\leq z(1-\ln z-\ln k)$| and differentiating twice, we see that |$z\mapsto z(1-\ln z-\ln k)$| takes its global maximum |$1/k$| at |$z=1/k$|⁠.

We need the following well-known fact about the entropy.  

Fact 4.9

Let |$p\in \left [{0,1}\right ]^k$| be such that |$\sum _{i=1}^kp_i=1$|⁠. Then |$H(p)\geq 0$| and the following two statements hold.

  • H1: If |$p$| is supported on a set of size |$s$|⁠, then |$H(p)\leq \ln s$|⁠.

  • H2: Let |$\mathcal {I}\subset [k]$| and suppose that |$q=\sum _{i\in \mathcal {I}}p_{i}\in (0,1)$|. Let |$p^\mathcal {I}$| be the vector with entries |$p^{\mathcal {I}}_i=p_i\cdot \textbf {1}_{i\in \mathcal {I}}$| (|$i\in [k]$|).
    Then |$H(p)=h(q)+qH(q^{-1}p^\mathcal {I})+(1-q)H((1-q)^{-1}(p-p^\mathcal {I}))$|⁠.

As an immediate consequence of Fact 4.9, we have the following.  

Corollary 4.10

Let |$p\in \left [{0,1}\right ]^k$| be such that |$\sum _{i=1}^kp_i=1$|⁠.

  • Let |$\mathcal {I}\subset [k]$| and set |$q=\sum _{i\in \mathcal {I}}p_{i}$|⁠. Then |$H(p)\leq h(q)+q\ln |\mathcal {I}|+(1-q)\ln (k-|\mathcal {I}|)$|⁠.

  • Let |$\mathcal {I}\subset \left \{{2,\ldots ,k}\right \}$| be a set of size |$0<|\mathcal {I}|<k-1$|⁠. Set |$q=\sum _{i\in \mathcal {I}}p_{i}$|⁠. If |$p_1<1$|⁠, then

 
Proof

The first claim follows simply by first using H2 and then applying H1 to |$q^{-1}p^\mathcal {I}$| and |$(1-q)^{-1}(p-p^\mathcal {I})$|⁠. To obtain the second assertion, use H2 with |$\mathcal {I}=\left \{{1}\right \}$| and then apply (i) to the probability distribution |$q^{-1}p^{\mathcal {I}}$|⁠.

Let |$\rho \in \mathcal {S}$| be a singly stochastic matrix. We can view each row |$\rho _i$| as a probability distribution on |$[k]$|⁠. With this interpretation, we see that
(4.4)
$H(k^{-1}\rho )=\ln k+\frac 1k\sum _{i=1}^kH(\rho _i).$
To facilitate the following calculations, we note that
(4.5)
Moreover, differentiating |$E(\rho )$| by |$y=\|\rho \|_F^2$| and recalling that |$d=2k\ln k+O_k(\ln k)$|⁠, we obtain
(4.6)
Further, using the expansion |$\ln (1+z)=z-z^2/2+O(z^3)$|⁠, we obtain the approximation
(4.7)
Finally, we calculate the function values |$f(\bar \rho _{s{\rm -stable}})$| explicitly; recall that |$\bar \rho _{s{\rm -stable}}$| is the barycenter of the face of |$\mathcal D$| defined by the equations |$\rho _{11}=\cdots =\rho _{ss}=1$|⁠. Let |$1\leq s\leq k-1$|⁠. The first |$s$| rows of |$\bar \rho _{s{\rm -stable}}$| have entropy 0, while the last |$k-s$| rows have entropy |$\ln (k-s)$|⁠. Hence, (4.4) yields
(4.8)
$H(k^{-1}\bar \rho _{s{\rm -stable}})=\ln k+\frac {k-s}k\ln (k-s).$
Moreover, |$\left \|{\bar \rho _{s{\rm -stable}}}\right \|_F^2=s+1$|⁠. Thus, using (4.7) and plugging in |$d=2k\ln k-\ln k-2\ln 2-o_k(1)$|⁠, we get
(4.9)
Since |$f(\rho )=H(k^{-1}\rho )+E(\rho )$|⁠, (4.8) and (4.9) yield
(4.10)

4.3 Proof of Proposition 4.7

We pursue the following strategy. Suppose that |$a,b\in J$| are such that |$\rho _{ia}=\min _{j\in J}\rho _{ij}$| and |$\rho _{ib}=\max _{j\in J}\rho _{ij}$|⁠. If |$\rho _{ia}=\rho _{ib}$|⁠, then |$\rho =\hat \rho $| and there is nothing to prove. Otherwise, we are going to argue that increasing |$\rho _{ia}$| slightly at the expense of |$\rho _{ib}$| yields a matrix |$\rho '$| with |$f(\rho ')>f(\rho )$|⁠. We start by calculating the partial derivatives of |$f$|⁠.  

Lemma 4.11
Let |$\rho \in \mathcal {S}$|⁠. Let |$i,j,l\in [k]$| and set |$\delta =\rho _{il}-\rho _{ij}$|⁠. Suppose that |$\rho _{ij},\rho _{il}>0$|⁠. Then
(4.11)
$\exp \left [{k\left ({\frac {\partial f(\rho )}{\partial \rho _{ij}}-\frac {\partial f(\rho )}{\partial \rho _{il}}}\right )}\right ]=\left ({1+\frac {\delta }{\rho _{ij}}}\right )\exp \left [{-\frac {d}{k-2+\frac 1{k}\left \|{\rho }\right \|_F^2}\cdot \delta }\right ].$
 
Proof
Using (4.5) and (4.6) and the chain rule, we obtain
Substituting |$\delta =\rho _{il}-\rho _{ij}$|⁠, we find
Taking exponentials completes the proof.

As a next step, we take a closer look at the right-hand side of (4.11).  

Lemma 4.12

Let |$\rho \in \mathcal {S}$|⁠, let |$i,j\in [k]$| and assume that |$\rho _{ij}>0$|⁠.

  1. If
    (4.12)
    then there exists a unique |$\delta ^*>0$| such that
    Furthermore, for all |$0<\delta <\delta ^*,$| we have |$1+\frac {\delta }{\rho _{ij}}- \exp \left [{\frac {d}{k-2+\frac 1{k}\left \|{\rho }\right \|_F^2}\cdot \delta }\right ]>0$|⁠.
  2. If (4.12) does not hold, then for all |$\delta >0$| we have |$1+\frac {\delta }{\rho _{ij}}< \exp \left [\!{\frac {d}{k-2+\frac 1{k}\left \|{\rho }\right \|_F^2}\cdot \delta }\!\right ]$|⁠.

 
Proof
There is at most one |$\delta ^*>0$| where the straight line |$\delta \mapsto 1+\frac {\delta }{\rho _{ij}}$| intersects the strictly convex function
In fact, there is exactly one such |$\delta ^*$| iff the differential of the linear function is greater than that of the exponential function at |$\delta =0$|⁠, which occurs iff (4.12) holds.
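The following numerical toy (illustration only) locates |$\delta ^*$| for sample parameter values by bisection; the slope comparison in the first assertion of the code is the condition that, according to the proof, is equivalent to (4.12).

```python
import numpy as np

# Illustration of Lemma 4.12: the line 1 + delta/rho_ij versus exp(c*delta),
# with c = d / (k - 2 + ||rho||_F^2 / k), for sample parameter values.
k = 10
d = 2 * k * np.log(k)
rho_ij, frob2 = 0.02, 1.5
c = d / (k - 2 + frob2 / k)
g = lambda delta: 1 + delta / rho_ij - np.exp(c * delta)
assert 1 / rho_ij > c              # the line is steeper than the exponential at delta = 0

lo, hi = 1e-9, 10.0                # g(lo) > 0 and g(hi) < 0: bisect for the unique root delta*
for _ in range(80):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if g(mid) > 0 else (lo, mid)
print("delta* ~", lo, "   g(delta*/2) =", g(lo / 2), "> 0")
```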
 
Proof of Proposition 4.7

If |$\rho _{ij}=0$| for all |$j\in J$|⁠, then |$\hat \rho =\rho $| and there is nothing to show. Thus, assume that |$\sum _{j\in J}\rho _{ij}>0$|⁠. Suppose that |$\tilde \rho \in \mathcal {S}$| maximizes |$f(\tilde \rho )$| subject to the conditions

  • (i) |$\tilde \rho _{ab}=\rho _{ab}$| for all |$(a,b)\not \in \{i\}\times J$| and

  • (ii) |$\max _{j\in J}\tilde \rho _{ij}\leq \max _{j\in J}\rho _{ij}$|⁠.

Such a maximizer |$\tilde \rho $| exists because (i)–(ii) define a compact domain. Because |$\tilde \rho \in \mathcal {S},$| we have
(4.13)

We claim that |$\tilde \rho _{ij}>0$| for all |$j\in J$|⁠. Indeed, assume that |$\tilde \rho _{ij}=0$| for some |$j\in J$| while |$\tilde \rho _{il}>0$| for another |$l\in J$|⁠. We recall that |$f(\rho )=H(k^{-1}\rho )+E(\rho )$|⁠. As (4.5) and (4.6) show, |$\partial H(k^{-1}\rho )/\partial \rho _{ij}$| tends to infinity as |$\rho _{ij}$| approaches 0, while |$|\partial E(\rho )/\partial \rho _{ij}|$| remains bounded. Hence, there is |$\xi >0$| such that the matrix |$\rho '$| obtained from |$\tilde \rho $| by replacing |$\tilde \rho _{ij}$| by |$\xi $| and |$\tilde \rho _{il}$| by |$\tilde \rho _{il}-\xi $| satisfies |$f(\rho ')>f(\tilde \rho )$|⁠, in contradiction to the maximality of |$f(\tilde \rho )$|⁠.

Thus, let |$a$| be such that |$\tilde \rho _{ia}=\min _{j\in J}\tilde \rho _{ij}>0$|⁠. Because |$\tilde \rho $| is stochastic, we have |$\left \|{\tilde \rho }\right \|_F^2\in \left [{1,k}\right ]$| and |$|J|\tilde \rho _{ia}\leq \sum _{j\in J}\tilde \rho _{ij}\leq 1$|⁠. Therefore, our assumptions |$\lambda \geq 3\ln \ln k/\ln k$| and |$d\leq 2k\ln k$| imply that
(4.14)
Thus, (4.12) is satisfied. Further, setting |$\hat \delta =\lambda /2-\ln \ln k/\ln k$|⁠, we find
(4.15)
Now, let |$b\in J$| be such that |$\tilde \rho _{ib}=\max _{j\in J}\tilde \rho _{ij}$| and assume that |$\delta =\tilde \rho _{ib}-\tilde \rho _{ia}>0$|⁠. Moreover, recall that we are assuming that |$\tilde \rho _{ib}\leq \max _{j\in J}\rho _{ij}\leq \hat \delta $|⁠. Since |$\delta \leq \tilde \rho _{ib}\leq \hat \delta $|⁠, (4.14) and (4.15) yield in combination with Lemmas 4.11 and 4.12 that
Hence, there is |$\xi >0$| such that the matrix |$\rho '$| obtained from |$\tilde \rho $| by increasing |$\tilde \rho _{ia}$| by |$\xi $| and decreasing |$\tilde \rho _{ib}$| by |$\xi $| satisfies |$f(\rho ')>f(\tilde \rho )$|⁠. But this contradicts the maximality of |$f(\tilde \rho )$| subject to (i)–(ii). Thus, we conclude that |$\min _{j\in J}\tilde \rho _{ij}=\tilde \rho _{ia}=\tilde \rho _{ib}=\max _{j\in J}\tilde \rho _{ij}$|⁠. Therefore, (4.13) implies that |$\tilde \rho =\hat \rho $| is the unique maximizer of |$f$| subject to (i)–(ii).

4.4 Proof of Proposition 4.3

The proof is based on two key lemmas. The first one rules out that |$f(\rho )$| takes its maximum over |$\rho \in \mathcal {S}$| at a matrix with an entry close to |$1/2$|⁠.  

Lemma 4.13

If |$\rho \in \mathcal {S}$| has an entry |$\rho _{ij}\in \left [{0.49,0.51}\right ]$|⁠, then there is |$\rho ' \in \mathcal {S}$| such that |$f(\rho ')\geq f(\rho )+\frac {\ln k}{5k}$|⁠.

 
Proof

Without loss of generality, we may assume that |$(i,j)=(1,1)$| and that |$\rho \in \mathcal {S}$| maximizes |$f$| subject to the condition that |$\rho _{11}\in \left [{0.49,0.51}\right ]$|⁠. There are two cases.

Case 1: |$\rho _{1j}<0.49$| for all |$j\geq 2$|⁠: Applying Proposition 4.7 to the set |$J=\left \{{2,\ldots ,k}\right \}$| (with |$\lambda =\frac {\ln (k-1)}{\ln k}$|⁠), we see that |$\rho _{1j}=\frac {1-\rho _{11}}{k-1}$| for all |$j\geq 2$|⁠, due to the maximality of |$f(\rho )$|⁠. Hence, Corollary 4.10 yields
(4.16)
Moreover, because |$\rho _{11}\leq 0.51$| we have
(4.17)
Let |$\rho '$| be the matrix obtained from |$\rho $| by replacing the first row by |$(1,0,\ldots ,0)$|⁠. Since |$H(1,0,\ldots ,0)=0$|⁠, (4.4) and (4.16) yield
(4.18)
Furthermore, (4.17) entails |$\left \|{\rho }\right \|_F^2-\left \|{\rho '}\right \|_F^2\leq \left \|{\rho _1}\right \|_2^2-1\leq -0.739$|⁠. Hence, (4.6) yields
(4.19)
Combining (4.18) and (4.19), we obtain |$f(\rho )-f(\rho ')\leq \frac 1k\left [{\ln 2-0.22\ln k}\right ]\leq -\frac {\ln k}{5k}$|⁠.
Case 2: there is |$j\geq 2$| such that |$\rho _{1j}>0.49$|⁠: we may assume that |$j=2$|⁠. Because |$\sum _{j}\rho _{1j}=1$|⁠, we see that |$\max _{j\geq 3}\rho _{1j}\leq 0.02$|⁠. Hence, we can apply Proposition 4.7 to |$J=\left \{{3,\ldots ,k}\right \}$| (with, say, |$\lambda =1/2$|⁠). Due to the maximality of |$f(\rho )$|⁠, we obtain |$\rho _{1j}=(1-\rho _{11}-\rho _{12})/(k-2)$| for all |$j\geq 3$|⁠. Hence, Corollary 4.10 yields
(4.20)
Further, because |$\rho _{11}^2+\rho _{12}^2\leq 0.51^2+0.49^2$| as |$\rho _{11},\rho _{12}\in \left [{0.49,0.51}\right ]$| and |$\rho _{11}+\rho _{12}\leq 1$|⁠, we see that
(4.21)
As in the first case, obtain |$\rho '$| from |$\rho $| by replacing the first row by |$(1,0,\ldots ,0)$|⁠. From (4.21), we obtain |$\left \|{\rho }\right \|_F^2-\left \|{\rho '}\right \|_F^2\leq 0.501-1=-0.499$|⁠. Hence, (4.6) yields
(4.22)
Combining (4.20) and (4.22), we find
Hence, in either case, we obtain the desired bound.

The second key ingredient is  

Lemma 4.14

We have |$\max _{\rho \in \mathcal {S}} f(\rho )\leq \frac {\ln k}{8k}+O_k(1/k)$|⁠.

The proof of Lemma 4.14 requires two intermediate steps. We start with the following exercise in calculus.  

Lemma 4.15
Let |$\xi :b\in (0,k/2)\mapsto k^{2b/k}(b^{-1}-k^{-1})$|⁠. Let |$\mu =\frac k2(1-\sqrt {1-2/\ln k})$|⁠. Then |$\xi $| is decreasing on the interval |$(0,\mu )$| and increasing on |$(\mu ,k/2)$|⁠. Furthermore, we have
(4.23)
 
Proof
The derivatives of |$\xi $| are

The first derivative vanishes at the two points |$b=\frac k2(1\pm \sqrt {1-2/\ln k})$| only. Moreover, an elementary calculation shows that |$\mu =\frac k2(1-\sqrt {1-2/\ln k})$| is a local minimum, while |$\frac k2(1+\sqrt {1-2/\ln k})>k/2$| is a local maximum. Hence, |$\xi $| is decreasing on the interval |$(0,\mu )$| and increasing on |$(\mu ,k/2)$|⁠. The last assertion follows by direct inspection of the above expression for |$\xi '$|⁠.
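The monotonicity claim is easy to confirm numerically; the following sketch (illustration only) does so for a sample value of |$k$|⁠.

```python
import numpy as np

# Numerical check (illustration only) of the shape of xi in Lemma 4.15: xi decreases
# on (0, mu) and increases on (mu, k/2), where mu = (k/2)*(1 - sqrt(1 - 2/ln k)).
k = 50
xi = lambda b: k ** (2 * b / k) * (1 / b - 1 / k)
mu = (k / 2) * (1 - np.sqrt(1 - 2 / np.log(k)))

b_left = np.linspace(1e-3, mu, 10_000)
b_right = np.linspace(mu, k / 2, 10_000)
assert np.all(np.diff(xi(b_left)) <= 1e-12)     # decreasing up to mu
assert np.all(np.diff(xi(b_right)) >= -1e-12)   # increasing from mu to k/2
print(mu, xi(mu))
```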

 
Lemma 4.16

Let |$\rho \in \mathcal {S}$|⁠. Suppose that |$i\in [k]$| is such that |$\rho _{ij}\not \in [0.49,0.51]$| for all |$j\in [k]$|⁠.

  1. Suppose that |$\rho _{ij}\leq 0.49$| for all |$j\in [k]$|⁠. Let |$\rho '$| be the stochastic matrix with entries
    Then |$f(\rho )\leq f(\rho ')$|⁠.
  2. Suppose that |$\rho _{ij}\geq 0.51$| for some |$j\in [k]$|⁠. Then there is a number |$\alpha =1/k+\tilde O_k(1/k^2)$| such that for the stochastic matrix |$\rho ''$| with entries
    we have |$f(\rho )\leq f(\rho '')$|⁠.

 
Proof

To obtain the first assertion, we simply apply Proposition 4.7 to row |$i$| and |$J=[k]$| (with |$\lambda =1$|⁠). With respect to the second claim, we may assume without loss that |$i=j=1$| and |$\rho _{11}\geq 0.51$|⁠. Let |$\hat \rho \in \mathcal {S}$| be the matrix that maximizes |$f$| subject to the conditions

  • (i) |$\hat \rho _{11}\geq 0.51$|⁠.

  • (ii) |$\hat \rho _a=\rho _a$| for all |$a\in \left \{{2,\ldots ,k}\right \}$|⁠. (In words, the last |$k-1$| rows of |$\hat \rho $| and |$\rho $| coincide.)

Since |$\hat \rho _{1j}\leq 1-\hat \rho _{11}\leq 0.49$| for all |$j\geq 2$|⁠, Proposition 4.7 applies to |$J=\left \{{2,\ldots ,k}\right \}$| (with |$\lambda =\frac {\ln (k-1)}{\ln k}$|⁠) and yields
(4.24)
Let |$\delta =\hat \rho _{11}-\hat \rho _{12}$|⁠, let |$0\leq \beta \leq 0.49k$| be such that |$\hat \rho _{11}=1-\beta /k$| and let |$Q=1-1/k+\|\hat \rho \|_2^2/k^2$|⁠.
Because |$\hat \rho $| is the maximizer of |$f$| subject to (i) and (ii), Lemma 4.11 implies that
(4.25)
We are going to argue that (4.25) entails that |$\beta =1+\tilde O_k(1/k)$|⁠.

First, we observe that |$\beta >0$|⁠. Indeed, (4.5) shows that the derivative |$\partial H(\hat \rho _1)/\partial \hat \rho _{11}$| of the entropy of row |$\hat \rho _1$| tends to |$-\infty $| as |$\hat \rho _{11}$| approaches 1, while (4.6) implies that the derivative |$\partial E(\hat \rho )/\partial \hat \rho _{11}$| remains bounded in absolute value. Hence, the maximality of |$f(\hat \rho )$| implies that |$\beta >0$|⁠.

Further, since |$\|\hat \rho \|_2^2\in \left [{1,k}\right ]$|⁠, we have |$Q\geq (1-1/k)^2$|⁠. Moreover, (4.24) implies that |$\delta =\hat \rho _{11}-O_k(1/k)$|⁠. Therefore, recalling that |$d=2k\ln k+O_k(\ln k)$|⁠, we obtain
Thus, with |$\xi (b)=k^{2b/k}(b^{-1}-k^{-1})$| the function from Lemma 4.15, we see that for a certain |$\eta =O_k(\ln k/k)$|⁠,
(4.26)
Let |$\mu =\frac k2(1-\sqrt {1-2/\ln k})=(1+o_k(1))\frac {k}{2\ln k}$|⁠. By Lemma 4.15, |$\xi $| is decreasing on |$(0,\mu )$|⁠. Moreover, |$\xi '(b)$| is negative and bounded away from 0 for |$b$| close to 1. Hence, setting |$\gamma =\ln ^2 k/k$|⁠, we find
In addition, |$\xi $| is increasing on |$(\mu ,k/2)$|⁠. Thus,
Plugging these two bounds into (4.26), we get
(4.27)
Similarly, because |$\mu $| is the unique local minimum of |$\xi $|⁠, we have
Hence, (4.26) yields
(4.28)

Since we already know that |$\beta >0$|⁠, (4.25), (4.27) and (4.28) imply |$\beta \in [1-\gamma ,1+\gamma ]$|⁠. Thus, |$\beta =1+\tilde O_k(1/k)$| and consequently |$\hat \rho _{11}=1-\beta /k=1-1/k+\tilde O_k(k^{-2})$|⁠, as desired.

 
Proof of Lemma 4.14
Lemma 4.13 implies that |$\max _{\rho \in \mathcal {S}}f(\rho )$| is attained at a matrix |$\rho $| without entries in |$\left [{0.49,0.51}\right ]$|⁠. Therefore, Lemma 4.16 shows that the maximizer |$\rho $| has the following form for some integer |$0\leq s\leq k$| and certain |$\alpha _i=1/k+\tilde O_k(1/k^2)$|⁠:
(4.29)
Thus, for |$i\in [s]$| we have
(4.30)
(4.31)
Let |$\rho '$| be the matrix obtained from |$\rho $| by replacing the first |$s$| rows by |$(1,0,\ldots ,0)$|⁠. This matrix satisfies
(4.32)
Set |$\alpha =\frac 1s\sum _{i=1}^s\alpha _i=\frac 1k+\tilde O_k(k^{-2})$|⁠. Then (4.4), (4.30)–(4.32) and the concavity of |$h$| imply that
(4.33)
(4.34)
Plugging (4.34) into (4.6), we obtain
(4.35)
Combining (4.33) and (4.35) and recalling that |$\alpha =1/k+\tilde O_k(1/k^2)$|⁠, we see that
(4.36)
To complete the proof, we calculate |$f(\rho ')$|⁠. Recall that |$d = 2k \ln k - \ln k - c$| with |$c = 2 \ln 2 + o_k(1)$|⁠. Moreover, (4.32) shows that |$\left \|{\rho '_i}\right \|_2^2=1$| for |$i=1,\ldots ,s$|⁠. In addition, since |$\rho _{ij}'=1/k$| for all |$i>s$|⁠, |$j\in [k]$|⁠, we get |$\left \|{\rho _i'}\right \|_2^2=1/k$| for |$i>s$|⁠. Hence, |$\left \|{\rho '}\right \|_F^2=1+(1-1/k)s$|⁠. Thus, using (4.7) and performing an elementary calculation, we get
Further, |$H(\rho _i')=0$| for |$i\leq s$|⁠, while |$H(\rho _i')=\ln k$| for |$i>s$|⁠. Hence, (4.4) yields |$H(k^{-1}\rho ')=\ln k+(1-s/k)\ln k=2\ln k-\frac sk\ln k$|⁠. Thus,
(4.37)
Finally, combining (4.36) and (4.37), we see that |$f(\rho )\leq \frac sk(1-s/k)\cdot \frac {\ln k}{2k}+O_k(1/k)\leq \frac {\ln k}{8k}+O_k(1/k)$|⁠, as claimed.
 
Proof of Proposition 4.3
Suppose that |$\rho \in \mathcal {S}$| has an entry |$\rho _{ij}\in \left [{0.49,0.51}\right ]$|⁠. We claim that |$f(\rho )<0$|⁠. Indeed, by Lemmas 4.13 and 4.14

Now, suppose that |$\rho \in \mathcal {S}$| has a row |$i$| such that |$\max _{j\in [k]}\rho _{ij}\in \left [{0.15,0.49}\right ]$|⁠. Without loss of generality, we may assume |$i=1$| and |$\rho _{11}=\max _{j\in [k]}\rho _{1j}$|⁠. In fact, we may assume that |$\rho $| is the maximizer of |$f$| subject to the condition |$\rho _{11}=\max _j\rho _{1j}\in \left [{0.15,0.49}\right ]$|⁠. Again, we show that |$f(\rho )<0$|⁠.

What can we say about this maximizer |$\rho $|? We apply Proposition 4.7 to |$i=1$| and |$J=\left \{{2,\ldots ,k}\right \}$|⁠: if we let |$\lambda =\ln (k-1)/\ln k$|⁠, then |$|J|=k-1\geq k^{\lambda }$|⁠. Moreover, |$\rho _{1j}\leq 0.49<\lambda /2-10/\ln k$| for all |$j\in J$|⁠. Hence, Proposition 4.7 implies that
(4.38)
Thus, Corollary 4.10 shows that the entropy of |$\rho _1$| is
By comparison, let |$\hat \rho $| be the matrix obtained from |$\rho $| by replacing the first row by |$\frac 1k\textbf {1}$|⁠. Then |$H(\hat \rho _1)=\ln k$|⁠. Therefore, (4.4) yields
(4.39)
Moreover, (4.38) yields |$\left \|{\rho _{1}}\right \|_2^2=\rho _{11}^2+(1-\rho _{11})^2/(k-1)$| and |$\left \|{\hat \rho _{1}}\right \|_2^2=1/k$|⁠, whence
Hence, (4.6) implies |$E(\rho )-E(\hat \rho )\leq \rho _{11}^2\frac {\ln k}{k}+\tilde O_k(1/k^2)$|⁠. Combining this estimate with (4.39), we get
(4.40)
Since |$f(\hat \rho )\leq \frac {\ln k}{8k}+O_k(1/k)$| by Lemma 4.14, we obtain from (4.40)
The assertion follows because |$\rho _{11}(1-\rho _{11})>1/8$| for |$\rho _{11}\in \left [{0.15,0.49}\right ]$|⁠.

4.5 Proof of Proposition 4.4

Let |$1\leq s\leq k^{0.999}$| and let |$\rho \in \mathcal D_{s,{\rm tame}}$| be the maximizer of |$f$|⁠. Without loss of generality, we may assume that |$\rho _{ii}\geq 0.51$| for |$i=1,\ldots ,s$| and |$\rho _{ij}<0.51$| for all |$(i,j)\not \in \left \{{(1,1),\ldots ,(s,s)}\right \}$|⁠. Because |$\rho $| is separable, this implies that in fact |$\rho _{ii}\geq 1-\kappa $| for |$i=1,\ldots ,s$|⁠, with |$\kappa =\ln ^{20}k/k$| as in (2.15). Furthermore, if there is a pair |$(i,j)\not \in \left \{{(1,1),\ldots ,(s,s)}\right \}$| such that |$\rho _{ij}\geq 0.15$|⁠, then Proposition 4.3 implies that |$f(\rho )<0$|⁠. In this case, we are done, because |$f(\bar \rho )>0$| by Proposition 2.5. Thus, assume from now on that |$\rho _{ij}<0.15$| for all |$(i,j)\not \in \left \{{(1,1),\ldots ,(s,s)}\right \}$|⁠.

Let |$\hat \rho $| be the singly stochastic matrix with entries
Since |$k-s=(1-o_k(1))k$| and |$\max _{j>s}\rho _{ij}<0.15$|⁠, we can apply Proposition 4.7 to |$J=[k]\setminus [s]$| for any |$i\in [k]$| (with, say, |$\lambda =1/2$|⁠). Hence,
(4.41)
We are going to compare |$f(\hat \rho )$| with |$f(\bar \rho _{s{\rm -stable}})$|⁠, the barycenter of the face of |$\mathcal D$| where the first |$s$| diagonal entries are equal to one. To this end, we need to estimate |$f(\hat \rho )=H(k^{-1}\hat \rho )+E(\hat \rho )$|⁠.
As |$\hat \rho $| is stochastic and |$\hat \rho _{ii}=\rho _{ii}\geq 1-\kappa $| for |$i\leq s$|⁠, we find that
(4.42)
Further, let |$q_i=\sum _{j=1}^s\hat \rho _{ij}$| for |$i>s$|⁠. Because |$\rho $| is doubly stochastic and |$\rho _{ii}\geq 1-\kappa $| for |$i\leq s$|⁠, we see that
(4.43)
Based on (4.42)–(4.43), we obtain the following estimate of the entropy.  
Claim 4.17

We have |$H(k^{-1}\hat \rho )\leq H(k^{-1}\bar \rho _{s{\rm -stable}})+o_k(1/k)$|⁠.

 
Proof
By Corollary 4.10 and (4.42),
(4.44)
Once more by Corollary 4.10,
(4.45)
Since |$h$| is concave, (4.43) and (4.45) yield
(4.46)
Plugging the bounds (4.44) and (4.46) into (4.4), we arrive at
thereby proving the claim.
 
Claim 4.18

We have |$E(\hat \rho )\leq E(\bar \rho _{s{\rm -stable}})+o_k(1/k)$|⁠.

 
Proof
As a first step, we show that there is a constant |$\gamma >0$| such that
(4.47)
Indeed, as |$\hat \rho $| is a stochastic matrix, we have
(4.48)
Furthermore, since |$\sum _{j>s}\rho _{ij}\leq 1$| for each |$i\in [k]\setminus [s]$|⁠, we have
(4.49)
Moreover, (4.43) shows that |$\sum _{i>s}q_i=\sum _{i>s}\sum _{j\leq s}\hat \rho _{ij}\leq \kappa s$|⁠. Hence,
(4.50)
As |$s\leq k^{0.999}$| and because |$\kappa =\ln ^{20}k/k$|⁠, there is a constant |$\gamma >0$| such that |$\kappa s\leq k^{-0.001}\ln ^{20}k\leq k^{-\gamma /2}$| (provided that |$k$| is sufficiently large). Thus, combining (4.48)–(4.50), we obtain (4.47).

By comparison, we have |$\|\bar \rho _{s{\rm -stable}}\|_2^2=s+1$|⁠. Hence, the bound (4.6) on the derivative of |$E$| and (4.47) yield |$E(\hat \rho )\leq E(\bar \rho _{s{\rm -stable}})+o_k(1/k)$|⁠, as claimed.

Combining Claims 4.17 and 4.18, we see that |$f(\hat \rho )\leq f(\bar \rho _{s{\rm -stable}})+o_k(1/k)$|⁠. Hence, (4.41) yields
The last expression is decreasing in |$s$| (for |$1\leq s\leq k^{0.999}$|⁠). Thus, |$f(\rho )< f(\bar \rho )-1/k+o_k(1/k)$|⁠. This implies the assertion because we chose |$\rho $| to be the maximizer of |$f$| over |$\mathcal D_{s,{\rm tame}}$|⁠.

4.6 Proof of Proposition 4.5

Suppose that |$k^{0.999}<s<k-k^{0.49}$| and let |$\rho \in \mathcal D_{s,{\rm tame}}$| be the maximizer of |$f$| over |$\mathcal D_{s,{\rm tame}}$|⁠. We may assume without loss that |$\rho _{ii}\geq 0.51$| for |$i=1,\ldots ,s$| and |$\rho _{ij}<0.51$| for |$(i,j)\not \in \left \{{(1,1),\ldots ,(s,s)}\right \}$|⁠. Due to separability, we thus have |$\rho _{ii}\geq 1-\kappa $| for |$i=1,\ldots ,s$|⁠. Further, we may assume that |$\rho _{ij}\leq 0.15$| for all |$(i,j)\not \in \left \{{(1,1),\ldots ,(s,s)}\right \}$| as otherwise Proposition 4.3 yields |$f(\rho )<0<f(\bar \rho )$|⁠.

Let |$\hat \rho $| be the stochastic matrix with entries
Since |$\max _{i\neq j}\rho _{ij}\leq 0.15$| and |$s,k-s>k^{0.49}$|⁠, we can apply Proposition 4.7 to |$J_i=\left [{k}\right ]\setminus [s]$| and to |$J_i'=[s]\setminus \{i\}$| for all |$i\in [k]$| (with, say, |$\lambda =0.4$|⁠). We thus obtain
(4.51)
To estimate |$f(\hat \rho )$|⁠, let
Since |$\rho $| is doubly stochastic and |$\rho _{ii}\geq 1-\kappa $| for |$i\leq s$|⁠, we see that
(4.52)
In addition, let
(4.53)
 
Claim 4.19

We have |$H(k^{-1}\hat \rho )\leq 2\ln k+\frac {3q(2+\ln k)}k+\left ({1-s/k}\right )\ln (1-s/k)-\frac {s\ln k}k+\frac {2\ln k}k\sum _{i=1}^st_i+O_k(1/k)$|⁠.

 
Proof
Applying Corollary 4.10, we obtain
(4.54)
Set
Summing (4.54) up, recalling from (4.52) that |$q=\sum _{i\leq s}q_i$|⁠, and using the concavity of |$h$|⁠, we get
(4.55)
Furthermore, again by Corollary 4.10, for |$i>s,$| we have
Once more due to the concavity of |$h$| and as |$q=\sum _{i>s}q_i$|⁠, we see that
(4.56)
Combining (4.55) and (4.56), we get
Using the elementary inequality |$h(z)\leq z(1-\ln z)$| to simplify the above, we get
(4.57)
Since |$s\leq k$|⁠, we obtain
(4.58)
Finally, the assertion follows by combining (4.57) and (4.58).
 
Claim 4.20

We have |$E(\hat \rho ) =-2\ln k+\frac {s\ln k}{k}\left ({1+\frac {3}{2k}-\frac {s}{2k^2}}\right )-\frac {2\ln k}k\sum _{i=1}^st_i+\tilde O_k(1/k)$|⁠.

 
Proof
As a first step, we show that
(4.59)
Indeed, together with the definition of |$\hat \rho $|⁠, Equation (4.53) shows that for |$i\in [s]$|⁠,
(4.60)
(4.61)
Moreover, since |$\hat \rho $| is stochastic and |$\hat \rho _{ii}\geq 1-\kappa $| if |$i\leq s$|⁠, we have
(4.62)
Combining (4.60)–(4.62) and recalling that |$\kappa =\tilde O_k(k^{-1})$|⁠, we obtain
(4.63)
Further, since |$\rho _{jj}\geq 1-\kappa $| for |$j\leq s$| and because |$\rho $| is doubly stochastic, we have |$\rho _{ij}\leq \kappa $| for all |$j\leq s<i$|⁠. By the construction of |$\hat \rho $|⁠, this implies that |$\hat \rho _{ij}\leq \kappa $| for all |$j\leq s<i$|⁠. Furthermore, |$q=\sum _{i>s}\sum _{j\in [s]}\hat \rho _{ij}\leq \kappa s$| by (4.52). As a sum of squares is maximized if the summands are as unequal as possible, we obtain
(4.64)
In addition, once more by the construction of |$\hat \rho $|⁠,
(4.65)
Combining (4.63)–(4.65), we obtain (4.59).
By comparison, we have |$\|\bar \rho _{s{\rm -stable}}\|_2^2=s+1$|⁠. Hence, (4.6) implies together with (4.59) that
Plugging in the expression (4.9) for |$E(\bar \rho _{s{\rm -stable}})$| yields the assertion.
Finally, combining Claims 4.19 and 4.20, we see that
(4.66)
Our assumption |$k^{0.999}<s<k-k^{0.49}$| ensures that |$-\frac sk(1-s/k)+\tilde O_k(1/k)<0$|⁠. Thus, (4.66) and Proposition 2.5 show that |$f(\rho )<0<f(\bar \rho )$|⁠. This completes the proof as |$\rho $| was chosen to be the maximizer of |$f$| over |$\mathcal D_{s,{\rm tame}}$|⁠.

4.7 Proof of Proposition 4.6

Suppose that |$k-\sqrt k\leq s\leq k-1$| and that |$\rho \in \mathcal D_{s,{\rm tame}}$| maximizes |$f$| over |$\mathcal D_{s,{\rm tame}}$|⁠. As before, we assume without loss that |$\rho _{ii}\geq 0.51$| for |$i=1,\ldots ,s$| and |$\rho _{ij}<0.51$| for |$(i,j)\not \in \left \{{(1,1),\ldots ,(s,s)}\right \}$|⁠. Thus, |$\rho _{ii}\geq 1-\kappa $| for |$i=1,\ldots ,s$| as |$\rho $| is separable. Further, if |$\rho _{ij}>0.15$| for some |$(i,j)\not \in \left \{{(1,1),\ldots ,(s,s)}\right \}$|⁠, then |$f(\rho )<0<f(\bar \rho )$| by Proposition 4.3. Hence, we assume |$\rho _{ij}\leq 0.15$| for all |$(i,j)\not \in \left \{{(1,1),\ldots ,(s,s)}\right \}$|⁠.

Let |$q_i=\sum _{j\neq i}\rho _{ij}$| for |$i\in [s]$|⁠. Because |$\rho $| is doubly stochastic and |$\rho _{ii}\geq 1-\kappa $| for |$i\leq s$|⁠, we see that
(4.67)
In addition, let
Since |$\rho $| is doubly stochastic, we have
(4.68)
We are going to compare |$f(\rho )$| with |$f({\rm id})$|⁠, where |${\rm id}$| is the identity matrix (with ones on the diagonal and zeros elsewhere).  
Claim 4.21

With |$\mathcal {H}=\frac 1k\sum _{i=1}^s h(\rho _{ii})$| we have |$H(k^{-1}\rho )\leq \ln k+\mathcal {H}+\frac {q}k\ln k+0.51(k-s)\frac {\ln k}k$|⁠.

 
Proof
Corollary 4.10 implies together with the concavity of |$h$| that
(4.69)
Because |$-z\ln z\leq 1$| for all |$z>0$|⁠, we have |$-\frac tk\ln t\leq 1/k$|⁠. Moreover, as |$\rho $| is doubly stochastic (4.68) implies that |$t\leq k-s$|⁠. Additionally, (4.67) shows that |$q\leq \kappa s\leq \kappa k=\tilde O_k(1)$|⁠, because |$\kappa =\ln ^{20}k/k$|⁠. Thus,
Plugging this last estimate into (4.69), we obtain
(4.70)
Furthermore, using Corollary 4.10, (4.68) and the concavity of |$h$|⁠, we see that
(4.71)
Plugging (4.70) and (4.71) into (4.4), we find
(4.72)
as claimed.
 
Claim 4.22

We have |$E(\rho )\leq E({\rm id})+(1+\tilde O_k(1/k))\frac {\ln k}{k}\left ({-0.85(k-s)+\sum _{i=1}^s(\rho _{ii}^2-1)}\right )$|⁠.

 
Proof
The Frobenius norm of |$\rho $| can be estimated as follows. Since |$\rho _{ii}\geq 1-\kappa $| for all |$i\leq s$| and |$\rho $| is stochastic, we have |$\rho _{ij}\leq \kappa $| for all |$i\leq s$|⁠, |$j\neq i$|⁠. Hence, (4.67), together with the fact that a sum of squares is maximized by making the summands as unequal as possible, implies that
(4.73)
A similar argument applies to the remaining rows. More precisely, if |$i>s$| then |$\rho _{ij}\leq 0.15$| for all |$j$| by our initial assumption on |$\rho $|⁠. Therefore,
(4.74)
Combining (4.73) and (4.74), we arrive at
(4.75)
By comparison, |$\left \|{\rm id}\right \|_F^2=k$|⁠. Thus, (4.75) yields |$\|\rho \|_F^2-\|{\rm id}\|_F^2\leq -0.85(k-s)+\sum _{i=1}^s(\rho _{ii}^2-1)+\tilde O_k(1/k)$|⁠. Combining this estimate with (4.6) completes the proof.
Observing that |$H(k^{-1}{\rm id})=\ln k$| and using Claims 4.21 and 4.22, we obtain
(4.76)
To complete the proof, let |$r_i=1-\rho _{ii}$| for |$i=1,\ldots ,s$|⁠. Then (4.67) shows that |$q=\sum _{i=1}^sr_i$|⁠. Moreover, |$\mathcal {H}=\frac 1k\sum _{i=1}^sh(r_i)$|⁠, as |$h(1-z)=h(z)$| for all |$z$|⁠. Since |$r_i\leq \kappa =\tilde O_k(1/k)$|⁠, we have
Plugging this bound into (4.76) and recalling that |$s\leq k-1$|⁠, we get
(4.77)
Finally, we calculate |$f({\rm id})=\ln k+\frac d2\ln (1-1/k)=\frac 12f(\bar \rho )$|⁠. Since |$f(\bar \rho )>0$| (by Proposition 2.5), we conclude that |$f({\rm id})<f(\bar \rho )$|⁠. Thus, the assertion follows from (4.77).

5 The Laplace Method

In this section, we assume that |$d=d_{k,{\rm cond}}-\varepsilon _k$|⁠, with |$\varepsilon _k$| from Proposition 2.5. We continue to use the notation introduced in Section 2. In particular, we recall that an overlap matrix |$\rho $| is |$k$|-stable if for any |$i\in [k]$| there is |$j\in [k]$| such that |$\rho _{ij}>0.51$|⁠, that |$\rho $| is separable if |$\rho _{ij}\not \in \left ({0.51,1-\kappa }\right )$| for all |$i,j\in [k]$| (with |$\kappa =\ln ^{20}k/k$|⁠) and that |$\mathcal D_{{\rm tame}}$| contains all doubly stochastic |$\rho $| that are separable but not |$k$|-stable.

In this section, we prove Proposition 4.1. Recalling that |${\mathcal R}={\mathcal R}_{n,k}$| is the (discrete) set of overlap matrices, let
Then we can cast the second moment as a sum over the discrete set |${\mathcal R}$|⁠, namely
(5.1)
Because any tame |$k$|-coloring is balanced, Fact 2.2 yields
(5.2)
By Taylor-expanding |$f$| around |$\bar \rho $|⁠, we can estimate the contribution to the sum (5.1) resulting from |$\rho $| near |$\bar \rho $|⁠.  
Lemma 5.1
There exist |$C=C(k)>0$| and |$\eta =\eta (k)>0$| such that with |${\mathcal R}_0=\left \{{\rho \in {\mathcal R}:\left \|{\rho -\bar \rho }\right \|_F<\eta }\right \},$| we have
 
Proof
By construction, we have |$\sum _{i,j=1}^k\rho _{ij}=k$| for all |$\rho \in {\mathcal R}$|⁠. Therefore, we can parameterize |${\mathcal R}$| as follows. Let
Moreover, let |$\hat {\mathcal R}=\mathcal {L}^{-1}({\mathcal R})\ \hbox {and}\ \tilde \rho =\mathcal {L}^{-1}(\bar \rho )$|⁠.
We compute the Hessian of |$f\circ \mathcal {L}=H\circ \mathcal {L}+E\circ \mathcal {L}$| at |$\tilde \rho $|⁠. A direct calculation yields for |$(a,b)\neq (i,j)$|
(5.3)
Furthermore,
Thus, by the chain rule
(5.4)
Combining (5.3) and (5.4), we see that the first derivative of |$f\circ \mathcal {L}$| at the point |$\tilde \rho $| vanishes, and that the Hessian is
(5.5)
where |$\textbf {1}$| denotes the matrix with all entries equal to one and |${\rm id}$| is the identity matrix.
As |${\rm id}$| is positive definite, |$\textbf {1}$| is positive semidefinite and |$d/(k^2(1-1/k)^2)=O_k(\ln k/k)<\frac 12$|⁠, (5.5) shows that the Hessian is negative definite at |$\tilde \rho $|⁠. In fact, by continuity there exist numbers |$\tilde \eta ,\tilde \xi >0$| independent of |$n$| such that the largest eigenvalue of |$D^2f\circ \mathcal {L}$| is smaller than |$-\tilde \xi $| at all points |$\hat \rho $| such that |$\|\hat \rho -\tilde \rho \|_2<\tilde \eta $|⁠. Further, because |$\mathcal {L}$| is linear there is an |$n$|-independent |$\eta >0$| such that for all |$\rho \in {\mathcal R}_0=\left \{{\rho \in {\mathcal R}:\left \|{\rho -\bar \rho }\right \|_F<\eta }\right \},$| we have |$\|\mathcal {L}^{-1}(\rho )-\tilde \rho \|_2<\tilde \eta $|⁠. Hence, by Taylor's formula there is a number |$\xi >0$| that does not depend on |$n$| such that
(5.6)
Combining (5.2) and (5.6), we obtain
(5.7)
Finally, a direct calculation shows that |$f(\bar \rho )=2(\ln k+\frac d2\ln (1-1/k))$|⁠, whence |$\exp \left ({f(\bar \rho )n}\right )=O(k^n(1-1/k)^m)^2$| (as |$m=\lceil dn/2\rceil $|⁠). Thus, the assertion follows from Proposition 2.5 and (5.7).
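The crux of the argument above is that the Hessian of |$f\circ \mathcal {L}$| is negative definite at |$\tilde \rho $|⁠. The following sketch (illustration only) verifies this numerically for a small |$k$|⁠. It assumes the specific form |$E(\rho )=\frac d2\ln (1-2/k+\left \|{\rho }\right \|_F^2/k^2)$|⁠, which is not stated in this section but is consistent with the dependence of |$E$| on |$\left \|{\rho }\right \|_F^2$| used in (4.6) and with the value |$f(\bar \rho )=2(\ln k+\frac d2\ln (1-1/k))$| computed above; the parameterization below simply fixes |$\sum _{i,j}\rho _{ij}=k$| by solving for the last entry, a stand-in for |$\mathcal {L}$|⁠.

```python
import numpy as np

k = 5
d = 2 * k * np.log(k) - np.log(k) - 2 * np.log(2)    # the regime considered in this section

def f(rho):
    p = rho / k                                      # k^{-1} rho, a distribution on [k]^2
    H = -np.sum(p[p > 0] * np.log(p[p > 0]))         # entropy H(k^{-1} rho)
    E = (d / 2) * np.log(1 - 2 / k + np.sum(rho ** 2) / k ** 2)   # assumed form of E(rho)
    return H + E

def lift(x):
    # Parameterize {rho : sum_ij rho_ij = k} by k^2 - 1 free entries (a stand-in for L).
    rho = np.empty(k * k)
    rho[:-1] = x
    rho[-1] = k - x.sum()
    return rho.reshape(k, k)

x0 = np.full(k * k - 1, 1.0 / k)                     # the flat matrix bar_rho
eps, m = 1e-5, k * k - 1
hess = np.zeros((m, m))
for i in range(m):
    for j in range(m):
        def fx(si, sj):
            x = x0.copy()
            x[i] += si * eps
            x[j] += sj * eps
            return f(lift(x))
        hess[i, j] = (fx(1, 1) - fx(1, -1) - fx(-1, 1) + fx(-1, -1)) / (4 * eps ** 2)

print("largest eigenvalue of the Hessian:", np.linalg.eigvalsh(hess).max())  # comes out negative
```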
To estimate the contribution of |$\rho \not \in {\mathcal R}_0$|⁠, we decompose |${\mathcal R}\setminus {\mathcal R}_0$| into three subsets:
Condition T2 from Definition 2.3 directly implies that
(5.8)
With respect to |${\mathcal R}_2$|⁠, we have  
Lemma 5.2

There is a number |$C=C(k)>0$| such that |$\sum _{\rho \in {\mathcal R}_2}{\rm E}\left [{Z_{\rho ,{\rm tame}}}\right ]\leq C\cdot {\rm E}[Z_{k,{\rm tame}}]^2$|⁠.

 
Proof
Let |${\mathcal R}_2'$| be the set of all |$k$|-stable |$\rho '\in {\mathcal R}$| (i.e., |$\rho _{ii}'>0.51$| for all |$i\in [k]$|⁠). Because we restrict ourselves to balanced |$k$|-colorings, the row and column sums of each matrix |$\rho \in {\mathcal R}$| are |$1+O(n^{-1/2})$|⁠. Hence, for any matrix |$\rho \in {\mathcal R},$| there is at most one entry greater than |$0.51$| in each row or column. Thus, suppose that |$\sigma ,\tau $| are tame |$k$|-colorings of |$G(n,m)$| such that |$\rho (\sigma ,\tau )\in {\mathcal R}_2$|⁠. Then each row and each column of |$\rho (\sigma ,\tau )$| have exactly one entry that is greater than |$0.51$|⁠. Therefore, there exists a permutation |$\pi :[k]\rightarrow [k]$| such that |$\sigma ,\pi \circ \tau $| are two colorings such that |$\rho (\sigma ,\pi \circ \tau )\in {\mathcal R}_2'$|⁠. Consequently,
(5.9)
Further, if |$\sigma ,\tau $| are |$k$|-colorings such that |$\rho (\sigma ,\tau )\in {\mathcal R}_2'$|⁠, then |$\tau \in \mathcal {C}(\sigma )$| by the very definition of the cluster |$\mathcal {C}(\sigma )$|⁠. Therefore, by the linearity of expectation and Bayes' formula, we have
(5.10)
Now, if |$\sigma $| is a tame |$k$|-coloring, then by T3 we know that |$|\mathcal {C}(\sigma )|\leq {\rm E}[Z_{k,{\rm bal}}]$| with certainty. Thus, (5.10) yields
(5.11)
Combining (5.9) and (5.11), we get |$\sum _{\rho \in {\mathcal R}_2}{\rm E}\left [{Z_{\rho ,{\rm tame}}}\right ]\leq (1+o(1))k!\cdot {\rm E}\left [{Z_{k,{\rm tame}}}\right ]^2$|⁠, as claimed.

To bound the contribution of |$\rho \in {\mathcal R}_3$|⁠, we need the following observation. Recall that |$\mathcal D$| stands for the Birkhoff polytope, that is, the set of all doubly stochastic |$k\times k$| matrices and that |${\mathcal R}$| is the set of all possible overlap matrices of balanced |$\sigma ,\tau $|⁠.  

Lemma 5.3

There is a number |$C=C(k)>0$| such that for any |$\rho \in {\mathcal R}$| there is |$\rho '\in \mathcal D$| with |$\left \|{\rho -\rho '}\right \|_F< C/\sqrt n$|⁠.

 
Proof

Let |$\rho \in {\mathcal R}$|⁠. By construction, we have |$\sum _{i,j}\rho _{ij}=k$|⁠. Hence, while there is |$i\in [k]$| such that the row sum is |$\sum _{j}\rho _{ij}=1+\alpha >1$|⁠, there must be another row |$l$| such that |$\sum _{j}\rho _{lj}=1-\alpha '<1$|⁠. Thus, recalling that |$\rho _i=(\rho _{ij})_{j\in [k]}$| denotes the |$i$|th row of |$\rho $|⁠, by replacing row |$i$| by |$(1-\alpha '')\rho _i$| and row |$l$| by |$\rho _l+\alpha ''\rho _i$| for some suitable |$\alpha ''\leq 2k/\sqrt n$|⁠, we can ensure that at least one of the row sums is one. After at most |$k-1$| steps, we thus obtain a stochastic matrix |$\rho ''$| such that |$\|\rho -\rho ''\|_2\leq 2k^3/\sqrt n$|⁠. Repeating the same operation for the columns yields the desired doubly stochastic |$\rho '$|⁠.
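The repair procedure described in the proof is easy to implement; the following sketch (illustration only) carries it out on a randomly perturbed overlap-type matrix, with an explicit choice of the rescaling parameter. The column analogue of the row operation leaves the row sums unchanged, which is why the second pass does not undo the first.

```python
import numpy as np

def fix_sums(rho, axis):
    # axis=1: make every row sum equal to 1; axis=0: the same for the columns.
    rho = rho.copy()
    while True:
        sums = rho.sum(axis=axis)
        if sums.max() <= 1 + 1e-12:
            return rho
        i, l = int(sums.argmax()), int(sums.argmin())    # sums[i] > 1 > sums[l]
        a = min(sums[i] - 1, 1 - sums[l]) / sums[i]      # one of the two lines gets sum exactly 1
        if axis == 1:
            rho[l] += a * rho[i]
            rho[i] *= 1 - a
        else:
            rho[:, l] += a * rho[:, i]
            rho[:, i] *= 1 - a

rng = np.random.default_rng(0)
k, n = 4, 10 ** 6
rho = np.full((k, k), 1.0 / k) + rng.uniform(-1, 1, (k, k)) / np.sqrt(n)
rho *= k / rho.sum()                          # overlap matrices satisfy sum_ij rho_ij = k
rho = fix_sums(fix_sums(rho, axis=1), axis=0)
print(rho.sum(axis=1), rho.sum(axis=0))       # both are (numerically) the all-ones vector
```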

 
Lemma 5.4

If |$f(\rho )<f(\bar \rho )$| for any |$\rho \in \mathcal D_{{\rm tame}}\setminus \left \{{\bar \rho }\right \}$|⁠, then |$\sum _{\rho \in {\mathcal R}_3}{\rm E}\left [{Z_{\rho ,{\rm tame}}}\right ]\leq {\rm E}[Z_{k,{\rm tame}}]^2$|⁠.

 
Proof
Let |$\eta >0$| be the number from Lemma 5.1 and let |$\mathcal {D}'$| be the set of all |$\rho \in \mathcal D_{{\rm tame}}$| such that |$\left \|{\rho -\bar \rho }\right \|_F\geq \eta /2$|⁠. The set |$\mathcal {D}'$| is compact. Hence, our assumption that |$f(\rho )<f(\bar \rho )$| for any |$\rho \in \mathcal D_{{\rm tame}}\setminus \left \{{\bar \rho }\right \}$| implies that there exists a number |$\gamma >0$| (independent of |$n$|⁠) such that
(5.12)
In fact, because the function |$f$| is uniformly continuous on |$[0,1]^{k^2}$|⁠, there is |$0<\delta <\eta /3$| such that
(5.13)

We claim that |${\mathcal R}_3\subset \mathcal {D}''$|⁠. Indeed, any |$\rho \in {\mathcal R}_3$| satisfies |$\left \|{\rho -\bar \rho }\right \|_F\geq \eta $| (as otherwise |$\rho \in {\mathcal R}_0$|⁠), is separable (as otherwise |$\rho \in {\mathcal R}_1$|⁠), and fails to be |$k$|-stable (as otherwise |$\rho \in {\mathcal R}_2$|⁠). Moreover, by Lemma 5.3 there is a doubly stochastic |$\rho '$| such that |$\left \|{\rho -\rho '}\right \|_F<C/\sqrt n$|⁠. However, this matrix |$\rho '$| may or may not be separable and/or |$k$|-stable. To rectify this, we form a convex combination between |$\rho '$| and a suitable doubly stochastic matrix. More precisely, suppose that the matrix |$\rho $| has precisely |$l\leq k-1$| entries that are greater than |$0.51$|⁠. Each row and each column contain at most one such entry (as |$\rho \in \mathcal {B}$|⁠). Thus, we may assume without loss of generality that |$\rho _{11},\ldots ,\rho _{ll}>0.51$|⁠. Let us consider two cases.

Case 1: |$l<k-1$|⁠: let |$\rho ''$| be the doubly stochastic matrix with |$\rho ''_{11}=\cdots =\rho _{ll}''=1$| and |$\rho _{ij}''=(k-l)^{-1}$| for |$i,j>l$|⁠.

Case 2: |$l=k-1$|⁠: let |$\rho ''$| be the doubly stochastic matrix with |$\rho ''_{11}=\cdots =\rho _{ll}''=1-1/(k-1)$| and |$\rho ''_{ik}=1/(k-1)$| for |$i<k$| (the remaining entries are then determined by double stochasticity).

Let |$\beta =n^{-1/3}$| and suppose that |$n$| is large. We claim that in either case |$\rho '''=(1-\beta )\rho '+\beta \rho ''\in \mathcal {D}'$|⁠. Indeed, since |$\rho \in {\mathcal R}_3$| and |$\left \|{\rho -\rho '}\right \|_F<C/\sqrt n$|⁠, the construction of |$\rho ''$| ensures that |$\rho '''_{ii}\geq 1-\kappa $| for all |$i\leq l$|⁠. Similarly, |$\rho '''_{ij}<0.51$| unless |$i=j\leq l$|⁠. Hence, |$\rho '''\in \mathcal D_{{\rm tame}}$|⁠. Further, since |$\left \|{\rho -\rho '}\right \|_F<C/\sqrt n$|⁠, the triangle inequality implies that |$\left \|{\rho -\rho '''}\right \|_F<\delta $|⁠. As |$\left \|{\rho -\bar \rho }\right \|_F\geq \eta $|⁠, we conclude that |$\rho '''\in \mathcal {D}'$|⁠, whence |$\rho \in \mathcal {D}''$|⁠.

As |${\mathcal R}_3\subset \mathcal {D}''$|⁠, (5.13) yields
(5.14)
Thus, (5.2) implies
(5.15)
Upon direct inspection, we find |$f(\bar \rho )=2(\ln k+\frac d2\ln (1-1/k))$|⁠. Recalling that |$m=\lceil dn/2\rceil $|⁠, we thus obtain from Proposition 2.5
(5.16)
Combining (5.15) and (5.16), we obtain
thereby completing the proof.

Finally, Proposition 4.1 follows from (5.8) and Lemmas 5.1, 5.2 and 5.4.

Funding

The research leading to these results has received funding from the European Research Council under the European Union's Seventh Framework Programme (FP 7, 2007–2013) / ERC Grant Agreement 278857–PTCC. Funding to pay the Open Access publication charges for this article was provided by the European Research Council under the European Union's Seventh Framework Programme (FP7/2007-2013) / ERC Grant Agreement n. 278857–PTCC.

Appendix. Proof of Lemma 3.2

Throughout this section, we assume that |$2k\ln k-\ln k-2\leq d\leq 2k\ln k$|⁠. In addition, we fix some |$\sigma \in \mathcal {B}$| and we let |$V_i=\sigma ^{-1}(i)$| for |$i=1,\ldots ,k$|⁠.

To simplify the calculations, we consider the following variant of the planted model. Given |$\sigma $|⁠, |$n$| and |$p\in (0,1)$|⁠, we let |$\mathcal {G}(n,p,\sigma )$| be the random graph in which any two vertices |$v,w$| with |$\sigma (v)\neq \sigma (w)$| are adjacent with probability |$p$| independently. The following observation relates this model to the planted model |$G(n,m,\sigma )$| from Lemma 3.2.  

Fact A.1
Given |$\sigma \in \mathcal {B}$|⁠, let |$p$| be such that the expected number of edges in |$\mathcal {G}(n,p,\sigma )$| is equal to |$m=\lceil dn/2\rceil $|⁠. There is a number |$C=C(k)>0$| such that
 
Proof
By the choice of |$p$|⁠, the number |$e(\mathcal {G}(n,p,\sigma ))$| of edges of the random graph |$G(n,p,\sigma )$| has a binomial distribution with mean
(A.1)
Hence, Stirling's formula shows that for some number |$C=C(k)>0$| we have |${\rm P}\left [{e(\mathcal {G}(n,p,\sigma ))=m}\right ]\geq (C\sqrt n)^{-1}$|⁠. Further, given that |$e(\mathcal {G}(n,p,\sigma ))=m$|⁠, the distribution of the random graph |$\mathcal {G}(n,p,\sigma )$| is identical to that of |$G(n,m,\sigma )$|⁠. Thus, for any event |$\mathcal {A}$|
as claimed.

From here on out, we fix |$\sigma \in \mathcal {B}$| and choose |$p\in (0,1)$| such that the expected number of edges in |$\mathcal {G}(n,p,\sigma )$| is equal to |$m$|⁠; because |$\sigma $| is balanced, (A.1) implies that
(A.2)
In the following, we are going to show that the properties P1–P4 are satisfied in |$\mathcal {G}(n,p,\sigma )$| with probability |$1-O(1/n)$|⁠. Then Fact A.1 readily implies that they hold in |$G(n,m,\sigma )$| w.h.p.

The following version of the Chernoff bound will prove useful.  

Lemma A.2 ([26])
Let |$\varphi (x)=(1+x)\ln (1+x)-x$|⁠. Let |$X$| be a binomial random variable with mean |$\mu >0$|⁠. Then for any |$t>0$|⁠,
In particular, for any |$t>1,$| we have |${\rm P}\left [{X>t\mu }\right ]\leq \exp \left [{-t\mu \ln (t/{\rm e})}\right ]$|⁠.
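To make the bound concrete, the following sketch (illustration only) compares it with a simulated binomial tail. The displayed inequality of the lemma is quoted here in the standard form |${\rm P}\left [{X\geq \mu +t}\right ]\leq \exp \left [{-\mu \varphi (t/\mu )}\right ]$| from [26]; this is our reading of the statement, and the “in particular” bound above follows from it.

```python
import numpy as np

# Illustration only: compare the Chernoff bound P[X >= mu + t] <= exp(-mu*phi(t/mu))
# (the standard form from [26], our reading of the lemma) with an empirical tail.
rng = np.random.default_rng(0)
n, p, t = 100_000, 1e-3, 30.0
mu = n * p                                          # mean 100
phi = lambda x: (1 + x) * np.log(1 + x) - x
samples = rng.binomial(n, p, size=200_000)
empirical = np.mean(samples >= mu + t)
bound = np.exp(-mu * phi(t / mu))
print(f"empirical {empirical:.4f} <= bound {bound:.4f}")
```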

A.1 Proof of P1
We may assume |$i=1$| without loss of generality. Let |$0.509\leq \alpha \leq 1-k^{-0.499}$| and let |$S\subset V_1$| be a set of size |$|S|=\alpha n/k$|⁠. Because in |$\mathcal {G}(n,p,\sigma )$| edges occur independently, for any |$v\in V\setminus V_1,$| the number of neighbors of |$v$| in |$S$| has distribution |${\rm Bin}(\alpha n/k,p)$|⁠. Hence, as |$\sigma $| is balanced the number |$X_S$| of |$v\in V\setminus V_1$| with no neighbor in |$S$| has a binomial distribution with mean |$n(1-1/k+o(1))(1-p)^{\alpha n/k}$|⁠. Our assumption on |$d$| and (A.2) imply that |$(1\,{-}\,p)^{\alpha n/k}\leq \exp \left [{-\alpha np/k}\right ]\leq 2k^{-2\alpha }$|⁠. Thus,
(A.3)
Consequently, by Lemma A.2
(A.4)
By comparison, because |$\sigma $| is balanced, for a given |$\alpha $| the number of ways to choose |$S$| is
(A.5)
Let us call |$S$| |$\alpha $|-bad if |$X_S\geq (1-\alpha )\frac nk-n^{2/3}$|⁠. Combining (A.3), (A.4) and (A.5) and taking the union bound over |$S\subset V_1$| with |$|S|=\alpha n/k$|⁠, we obtain
To complete the proof of P1, we are going to show that the right-hand side is |$\exp (-\Omega (n))$|⁠.
Thus, we need to estimate
This is negative iff
(A.6)
By convexity, the exponential function on the l.h.s. and the linear function on the r.h.s. intersect at most twice, and between these two intersections the linear function is greater. Further, an explicit calculation verifies that the r.h.s. of (A.6) is larger than the l.h.s. at both |$\alpha =0.509$| and |$\alpha =1-k^{-0.499}$|⁠. Thus, (A.6) is true in the entire range |$0.509<\alpha <1-k^{-0.499}$|⁠.
A.2 Proof of P2

In |$\mathcal {G}(n,p,\sigma )$|⁠, for each vertex |$v\in V\setminus V_i$| the number of neighbors of |$v$| in |$V_i$| has distribution |${\rm Bin}(|V_i|,p)$|⁠. Due to (A.2) and because |$\sigma $| is balanced, the mean is |$\lambda =|V_i|p\sim \frac nk p>2\ln k$|⁠. Hence, by Stirling's formula, the probability that |$v$| has fewer than 15 neighbors in |$V_i$| is |$q\leq 2\lambda ^{14}\exp (-\lambda )\leq 2k^{-2}\ln ^{14}k$|⁠. Further, because the event of having fewer than 15 neighbors in |$V_i$| occurs independently for all |$v\in V\setminus V_i$|⁠, the total number |$Y_i$| of such vertices has a binomial distribution |${\rm Bin}(|V\setminus V_i|,q)$|⁠. As |$\sigma $| is balanced, the mean is |$|V\setminus V_i|q\leq (1-1/k+o(1))n\cdot q\leq 3k^{-2}\ln ^{14}k.$| Since we chose |$\kappa =k^{-1}\ln ^{20}k$|⁠, a straightforward application of Lemma A.2 (the Chernoff bound) implies that |${\rm P}\left [{Y_i>\frac {\kappa n}{3k}}\right ]\leq \exp (-\Omega (n)),$| as desired.

A.3 Proof of P3
Let |$0<\alpha <k^{-4/3}$| and let |$S\subset V$| be a set of size |$|S|=\alpha n$|⁠. The number |$e(S)$| of edges spanned by |$S$| in |$\mathcal {G}(n,p,\sigma )$| is stochastically dominated by a random variable with distribution
${\rm Bin}(\left (\begin {smallmatrix}\alpha n\\ 2\end {smallmatrix}\right ),p)$
. Indeed, any two vertices |$v,w\in S$| are connected with probability at most |$p$| in |$\mathcal {G}(n,p,\sigma )$| (the probability is exactly |$p$| if |$\sigma (v)\neq \sigma (w)$| and 0 otherwise). Thus,
Now, let |$X_\alpha $| be the number of sets |$S$| of size |$|S|=\alpha n$| such that |$e(S)\geq 5|S|$|⁠. Let |$d'=pn\sim \frac {dk}{k-1}$|⁠. By the union bound,
(A.7)
Further, let |$X=\sum _\alpha X_\alpha $|⁠, where the sum ranges over |$0<\alpha <k^{-4/3}$| such that |$\alpha n$| is an integer. Then (A.7), together with the assumption that |$\alpha <k^{-4/3}$|⁠, implies that
Thus, the probability that there is a set violating P3 is |$O(1/n)$|⁠.
A.4 Proof of P4

We start by estimating the size of the core; the proof of the following proposition draws on arguments developed in [1, 7].  

Proposition A.3

With probability |$1-\exp \left ({-\Omega (n)}\right )$|⁠, the core of |$\mathcal {G}(n,p,\sigma )$| contains |$(1\,{-}\,\tilde {O}_k(k^{-1}))n$| vertices.

The proof of Proposition A.3 is constructive: basically, we iteratively remove vertices that have too few neighbors of some color other than their own among the remaining vertices. More precisely, we consider the following process. For a vertex |$v$| and a set |$S$| of vertices let |$e(v,S)$| denote the number of neighbors of |$v$| in |$S$| in |$\mathcal {G}(n,p,\sigma )$|⁠.

  • CR1: For |$i,j\in [k]$|⁠, |$i\neq j$|⁠, let |$W_{ij}=\{v \in V_i:e(v,V_j) < 300\}$|⁠, |$W_{ii}=\emptyset $|⁠, |$W_{i}=\cup _{j\neq i} W_{ij}$|⁠, and |$W=\cup _{i=1}^k W_i$|⁠.

  • CR2: For |$i \ne j$|⁠, let |$U_{ij}=\{v \in V_i: e(v,W_j) > 100\}$| and |$U = \cup _{i \ne j} U_{ij}$|⁠.

  • CR3: Set |$Z^{(0)} = U$| and repeat the following for |$i\geq 0$|⁠:

    • if there is |$v \in V \setminus Z^{(i)}$| such that |$e(v,Z^{(i)})\geq 100$|⁠, pick one such |$v$| and let |$Z^{(i+1)}=Z^{(i)} \cup \{v\}$|⁠;

    • otherwise, let |$Z^{(i+1)}=Z^{(i)}$|⁠.

Let |$Z=\cup _{i\geq 0}Z^{(i)}$| be the final set resulting from CR3. By construction, the set |$V\setminus (W\cup Z)$| is contained in the core. Indeed, if |$v\in V_i\setminus W$|⁠, then |$v$| has at least 300 neighbors in |$V_j$| for every |$j\neq i$|⁠. Further, if |$v\not \in U\subset Z$|⁠, then at least 200 of these belong to |$V_j\setminus W$|⁠. Hence, if |$v\not \in Z$|⁠, then |$v$| has at least 100 neighbors in |$V_j\setminus (W\cup Z)$|⁠. To complete the proof of Proposition A.3, we bound the sizes of |$W$|⁠, |$U$| and |$Z$| (Lemmas A.4–A.6).  

Lemma A.4

With probability at least |$1-\exp \left ({-\Omega (n)}\right ),$| we have |$|W_{ij}|\leq \tilde O_k(k^{-3})n$| for any |$i,j$|⁠.

 
Proof

Fix |$i,j$|⁠, |$i \ne j$|⁠. Due to the independence of the edges in |$\mathcal {G}(n,p,\sigma )$|⁠, for any |$v \in V_i$| the number |$e(v,V_j)$| of neighbors in |$V_j$| has distribution |${\rm Bin}(|V_j|,p)$|⁠. As |$\sigma $| is balanced, (A.2) shows that the mean is |$\mu =|V_j|p\geq 2\ln k$|⁠. Using the Chernoff bound (Lemma A.2), we obtain |${\rm P}\left [{e(v,V_j) \le 300}\right ] \le \exp (-2\ln k +O_k(\ln \ln k)) = \tilde O_k(k^{-2})$|⁠. Hence, by the linearity of expectation and because |$\sigma $| is balanced, |${\rm E}\left [{|W_{ij}|}\right ] \le \tilde O_k(k^{-2}) \cdot |V_i|= n\cdot \tilde O_k(k^{-3})$|⁠. Further, once more due to the independence of the edges in |$\mathcal {G}(n,p,\sigma )$|⁠, |$|W_{ij}|$| is a binomial random variable. Thus, using the Chernoff bound once more (with, say, |$t= k^{-4}n$|⁠), we see that |${\rm P}\left [{|W_{ij}|\leq \tilde O_k(k^{-3})n}\right ]\geq 1-\exp (-\Omega (n))$|⁠, as required.

 
Lemma A.5

With probability at least |$1-\exp \left ({-\Omega (n)}\right )$| we have |$|U| \le n/k^{30}$|⁠.

 
Proof
We define two sets whose union contains |$U_{ij}$|⁠:
Thus, it suffices to bound the sizes of |$U'_{ij}$|⁠, |$U''_{ij}$| separately.
Let us start with |$U'_{ij}$|⁠. By construction, which vertices belong to |$W_j\setminus W_{ji}$| is independent of the edges between the color classes |$V_i,V_j$|⁠. Hence, for any |$v\in V_i$| the number |$e(v,W_j\setminus W_{ji})$| has distribution |${\rm Bin}(|W_j\setminus W_{ji}|,p)$|⁠. Thus,
Therefore, the Chernoff bound (Lemma A.2) applied with, say, |$t=45$| yields
(A.8)
Once more due to the independence of the edges in |$\mathcal {G}(n,p,\sigma )$|⁠, the events |$v\in U_{ij}'$| are mutually independent for |$v\in V_i$|⁠. In effect, given |$|W_j\setminus W_{ji}| \le n\cdot \tilde O_k(k^{-2})$|⁠, |$|U_{ij}'|$| has a binomial distribution. Thus, (A.8) implies together with the Chernoff bound (applied with, say, |$t=k^{-100}n$|⁠) that
(A.9)
Further, Lemma A.4 implies that |${\rm P}[|W_j\setminus W_{ji}| \le n\cdot \tilde O_k(k^{-2})]\geq 1-\exp (-\Omega (n))$|⁠. Combining this bound with (A.9), we obtain
(A.10)

With respect to |$U''_{ij}$|⁠, we observe the following. Given that |$w\in W_{ji}$|⁠, we know that |$w$| has fewer than 300 neighbors in |$V_i$|⁠. But the fact that |$w\in W_{ji}$| has no implications as to which vertices |$v\in V_i$| the vertex |$w$| is adjacent to. Thus, given that |$w\in W_{ji}$| and given |$e(w,V_i)$|⁠, the actual set of neighbors of |$w$| in |$V_i$| is a random subset of |$V_i$| of size |$e(w,V_i)\leq 300$|⁠. In fact, these sets are mutually independent for all |$w\in W_{ji}$|⁠. Thus, we can bound |$|U_{ij}''|$| by means of the following balls and bins experiment: let us think of the vertices in |$V_i$| as bins. Then each vertex |$w\in W_{ji}$| tosses 300 balls randomly into the bins |$V_i$|⁠, independently of all other vertices in |$W_{ji}$|⁠. In this experiment, let |$\mathcal {X}$| be the set of |$v\in V_i$| that receive at least 50 balls. Then |$|U_{ij}''|$| is stochastically dominated by |$|\mathcal {X}|$|⁠.

Now, consider one |$v\in V_i$|⁠. Given |$|W_{ji}|$|⁠, the number of balls that land in |$v$| has distribution |${\rm Bin}(300|W_{ji}|,|V_i|^{-1})$|⁠. Therefore, the Chernoff bound yields
Hence, by the linearity of expectation |${\rm E}|\mathcal {X}|\leq nk^{-45}$|⁠. Hence, Azuma's inequality yields
Thus, Lemma A.4 implies
(A.11)
Finally, the assertion follows from (A.10) and (A.11), with room to spare.
 
Lemma A.6

With probability at least |$1-\exp \left ({-\Omega (n)}\right )$| we have |$|Z| \le n/k^{29}$|⁠.

 
Proof

Lemma A.5 entails that with probability at least |$1-\exp \left ({-\Omega (n)}\right )$|⁠, |$|U| \le n/k^{30}$|⁠. Assume that this is indeed the case. Further, suppose that |$|Z\setminus U|\geq i^*=n/k^{30}$|⁠. Let us stop the process CR3 at this point, and let |$Z^*=Z^{(i^*)}$|⁠. By construction, the graph induced on |$S=U \cup Z^*$| spans at least |$100i^*\geq 50|S|$| edges, while |$|S|\leq 2k^{-30}n$|⁠. Thus, the set |$S$| violates condition P3. But since we saw in Section  A.3 that P3 is satisfied with probability |$1\,{-}\,\exp (-\Omega (n))$|⁠, the assertion follows.

Now, Proposition A.3 is immediate from Lemmas A.4–A.6. For a set |$Y\subset V$| let us denote by |$N(Y)$| the set of all vertices |$v\in V$| that have a neighbor in |$Y$| in |$\mathcal {G}(n,p,\sigma )$|⁠. As a further step toward the proof of P4, we establish  

Lemma A.7
With probability |$1-\exp (-\Omega (n))$| the random graph |$\mathcal {G}(n,p,\sigma )$| has the following property.
(A.12)
 
Proof
Let |$\alpha <k^{-29}$| be the largest number such that |$\alpha n$| is an integer and let |$q=1-(1\,{-}\,p)^{\alpha n}$|⁠. For a set |$Y\subset V$| with |$|Y|=\alpha n$| the number of vertices |$v\in V\setminus Y$| that have a neighbor in |$Y$| in |$\mathcal {G}(n,p,\sigma )$| is stochastically dominated by |${\rm Bin}(n,q)$|⁠. This is because for any vertex |$y\in Y$| the probability that |$v,y$| are adjacent is either |$p$| (if |$\sigma (v)\neq \sigma (y)$|⁠) or 0 (if |$\sigma (v)=\sigma (y)$|⁠). Hence, observing that |$q\leq \alpha np$| and using the Chernoff bound, we get
(A.13)
Now, let |$X$| be the number of sets |$Y$| with |$|Y|=\alpha n$| such that |$|N(Y)\setminus Y|\geq nk^{-21}$|⁠. Together with the union bound, (A.13) shows
(A.14)
the last inequality follows because |$\alpha (1-\ln \alpha )\leq 32k^{-29}\ln k$| for |$0<\alpha <k^{-29}$|⁠. Thus, we obtain from (A.14) that |$X=0$| with probability |$1-\exp (-\Omega (n))$|⁠. If so, we see that any set |$Y$| of size |$|Y|\leq nk^{-29}$| satisfies |$|N(Y)|\leq |Y|+|N(Y)\setminus Y|\leq n(k^{-29}+k^{-21})\leq nk^{-20}$|⁠, as claimed.
 
Corollary A.8

With probability |$1-\exp (-\Omega (n)),$| we have |$|N(Z)|\leq n k^{-20}$|⁠.

 
Proof

This is immediate from Lemmas A.6 and A.7.

We define two sets of vertices, which capture the 1-free and 2-free vertices. In what follows, we always let |$i,j\in [k]$|⁠, |$i\neq j$|⁠. Let |$S_0$| be the set of vertices that have zero neighbors in some color class other than their own. Moreover, |$S_1 = \{v\in V \setminus S_0 : \exists i,j\ \text {s.t.}\ v \in V_i\ \text {and}\ N(v)\cap V_j \subseteq W_j\}$|⁠. By the construction of the core, we have  

Fact A.9

If |$v$| is 1-free, then |$v\in S_0\cup S_1\cup Z\cup N(Z)$|⁠.

We proceed by estimating the sizes of |$S_0$|⁠, |$S_1$|⁠.  

Lemma A.10

With probability |$1-\exp (-\Omega (n)),$| we have |$|S_0|\leq \frac nk$|⁠.

 
Proof
Consider a vertex |$v\in V_i$|⁠. The number |$e(v,V_j)$| of neighbors of |$v$| in |$V_j$| has distribution |${\rm Bin}(|V_j|,p)$|⁠. Since |$\sigma $| is balanced, (A.2) yields |${\rm P}\left [{e(v,V_j)=0}\right ]\leq (1-p)^{|V_j|}\leq k^{-2}$|⁠. Thus, by the union bound,
(A.15)
Because the events |$\{v\in S_0\}$| are mutually independent for all |$v\in V_i$|⁠, the Chernoff bound and (A.15) yield |$P\left [{|S_0\cap V_i|>n/k^2}\right ]\leq \exp (-\Omega (n))$|⁠. Taking the union bound over |$i$| completes the proof.
 
Lemma A.11

With probability |$1-\exp (-\Omega (n)),$| we have |$|S_1|\leq \tilde O_k(k^{-2})n$|⁠.

 
Proof
Fix |$i\neq j$|⁠. The total number |$e(V_i,V_j)$| of edges joining |$V_i$| and |$V_j$| in |$\mathcal {G}(n,p,\sigma )$| has distribution |${\rm Bin}(|V_i\times V_j|,p)$|⁠. Because |$\sigma $| is balanced, the Chernoff bound yields
(A.16)
In addition, we claim that the number |$e(V_i,W_j)$| of |$V_i$|-|$W_j$|-edges satisfies
(A.17)
Indeed, by Lemma A.4, we may assume that |$|W_j\setminus W_{ji}|\leq \tilde O_k(k^{-2})n$|⁠. By construction, the set |$W_j\setminus W_{ji}$| is independent of the random bipartite subgraph of |$\mathcal {G}(n,p,\sigma )$| consisting of the |$V_i$|-|$V_j$|-edges. Hence, the number |$e(V_i,W_j\setminus W_{ji})$| of edges between |$V_i$| and |$W_j\setminus W_{ji}$| has distribution |${\rm Bin}(|V_i\times (W_j\setminus W_{ji})|,p)$|⁠. Given the upper bound on |$|W_j\setminus W_{ji}|$|⁠, the Chernoff bound thus implies that
(A.18)
Further, by construction the number of |$V_i$|-|$W_{ji}$|-edges is bounded by |$300|W_{ji}|$|⁠. Since by Lemma A.4 we may assume that |$|W_{ji}|\leq n\tilde O_k(k^{-3})$|⁠, (A.18) implies (A.17).
Let us condition on the event |$\mathcal {A}$| that |$b=e(V_i,V_j\setminus W_j)\geq \frac 13k^{-2}n^2p$| and |$r=e(V_i,W_j)\leq \tilde O_k(k^{-3})n^2p$|⁠. Let us think of the vertices in |$V_i$| as bins, and of the |$V_i$|-|$V_j$| edges as balls that are tossed independently and uniformly into the bins. More precisely, we think of the |$V_i$|-|$V_j\setminus W_j$| edges as blue balls, and of the |$V_i$|-|$W_j$|-edges as red balls. Let |$\mathcal {X}_{ij}$| be the set of bins |$v\in V_i$| that receive at least one ball but that do not receive a blue ball. Now, given that |$v$| receives |$l$| balls in total, the probability that all the balls it receives are red is equal to the probability that a hypergeometric random variable with parameters |$l,b,r$| takes the value |$l$|⁠. Therefore, summing over all |$l\geq 1$| and using our conditions on |$b,r$|⁠, we see that |${\rm P}\left [{v\in \mathcal {X}_{ij}}\right ]\leq \tilde O_k(k^{-3})$|⁠. Because |$\sigma $| is balanced, we thus obtain
(A.19)
In fact, because the balls are tossed into the bins independently of each other, Azuma's inequality implies together with (A.19) that
(A.20)
Since |${\rm P}[\mathcal {A}]\geq 1-\exp (-\Omega (n))$| by (A.16) and (A.17), (A.20) yields that |${\rm P}[|\mathcal {X}_{ij}|\leq \tilde O_k(k^{-4})n]\geq 1-\exp (-\Omega (n))$|⁠. Taking the union bound over |$i,j$| completes the proof because |$S_1\,{\subset }\,{\cup }_{i,j}\mathcal {X}_{ij}$|⁠.
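The conditioning step in the argument above — given that a bin receives |$l$| balls in total, the number of red balls among them is hypergeometric — can be checked by simulation; the following sketch (illustration only) does so for small parameter values.

```python
import numpy as np
from math import comb

# Illustration only: b blue and r red balls are tossed independently and uniformly
# into nbins bins; given that a fixed bin receives exactly l balls, the number of
# red balls among them is hypergeometric, so P[all l balls red] = C(r,l)/C(b+r,l).
rng = np.random.default_rng(2)
nbins, b, r, l, trials = 10, 20, 10, 3, 200_000

hits = all_red = 0
for _ in range(trials):
    blue_in_bin = np.sum(rng.integers(nbins, size=b) == 0)
    red_in_bin = np.sum(rng.integers(nbins, size=r) == 0)
    if blue_in_bin + red_in_bin == l:
        hits += 1
        all_red += (red_in_bin == l)

print(all_red / hits, comb(r, l) / comb(b + r, l))   # the two numbers agree closely
```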

Fact A.9, together with Lemma A.6, Corollary A.8 and Lemmas A.10 and A.11, implies the desired bound on the number of 1-free vertices. To bound the number of 2-free vertices, we need  

Lemma A.12

Let |$i,j,l\in [k]$| be distinct. With probability at least |$1-\exp (-\Omega (n)),$| there are no more than |$n\tilde O_k(k^{-5})$| vertices |$v\in V_i$| such that |$e(v,V_j)\leq 100$| and |$e(v,V_l)\leq 100$|⁠.

 
Proof

For any |$v$|⁠, |$e(v,V_j)$|⁠, |$e(v,V_l)$| are independent binomial variables. Because |$\sigma $| is balanced, their means are |$(1+o(1))\frac nkp$|⁠. Hence, (A.2) shows that |${\rm P}\left [{e(v,V_j),e(v,V_l)\leq 100}\right ]\leq \tilde O_k(k^{-4})$|⁠. Consequently, the expected number of |$v\in V_i$| with |$e(v,V_j),e(v,V_l)\leq 100$| is |$n\tilde O_k(k^{-5})$|⁠. In fact, this is a binomial random variable due to the independence of the edges in |$\mathcal {G}(n,p,\sigma )$|⁠. Thus, the assertion follows from the Chernoff bound.□

Now, let |$S_2$| be the set of all |$v\in V_i$| such that there exist distinct |$j,l\in [k]\setminus \{i\}$| such that |$e(v,V_j)\leq 100$| and |$e(v,V_l)\leq 100$|⁠. By construction, if |$v$| is 2-free, then |$v\in S_2\cup Z\cup N(Z)$| (note that |$U\subset Z$|⁠). Thus, the desired bound on the number of 2-free vertices follows from Lemma A.6, Corollary A.8 and Lemma A.12.

References

1. Achlioptas, D. and Coja-Oghlan, A. “Algorithmic Barriers from Phase Transitions.” Proc. 49th Annual IEEE Symposium on Foundations of Computer Science, 793–802. 2008.
2. Achlioptas, D. and Friedgut, E. “A sharp threshold for |$k$|-colorability.” Random Structures & Algorithms 14, no. 1 (1999): 63–70.
3. Achlioptas, D. and Molloy, M. “The Analysis of a List-Coloring Algorithm on a Random Graph.” Proc. 38th Annual IEEE Symposium on Foundations of Computer Science, 204–12. 1997.
4. Achlioptas, D. and Moore, C. “The Chromatic Number of Random Regular Graphs.” Proc. 8th International Workshop on Randomization and Computation, 219–28. 2004.
5. Achlioptas, D. and Moore, C. “Random |$k$|-SAT: two moments suffice to cross a sharp threshold.” SIAM Journal on Computing 36, no. 3 (2006): 740–62.
6. Achlioptas, D. and Naor, A. “The two possible values of the chromatic number of a random graph.” Annals of Mathematics 162, no. 3 (2005): 1333–49.
7. Alon, N. and Kahale, N. A. “A spectral technique for coloring random 3-colorable graphs.” SIAM Journal on Computing 26, no. 6 (1997): 1733–48.
8. Alon, N. and Krivelevich, M. “The concentration of the chromatic number of random graphs.” Combinatorica 17, no. 3 (1997): 303–13.
9. Appel, K. and Haken, W. “Every planar map is four colorable.” Illinois Journal of Mathematics 21, no. 3 (1977): 429–567.
10. Bapst, V., Coja-Oghlan, A., Hetterich, S., Raßmann, F., and Vilenchik, D. “The condensation transition in random graph coloring.” (2014): preprint arXiv:1404.5513.
11. Bollobás, B. “The chromatic number of random graphs.” Combinatorica 8, no. 1 (1988): 49–55.
12. Bollobás, B. Random Graphs, 2nd edn. Cambridge: Cambridge University Press, 2001.
13. Bollobás, B., Borgs, C., Chayes, J., Kim, J.-H., and Wilson, D. “The scaling window of the 2-SAT transition.” Random Structures & Algorithms 18, no. 3 (2001): 201–56.
14. Coja-Oghlan, A. “Upper-bounding the |$k$|-colorability threshold by counting covers.” Electronic Journal of Combinatorics 20, no. 3 (2013): P32.
15. Coja-Oghlan, A., Hetterich, S., and Efthymiou, C. “On the chromatic number of random regular graphs.” (2013): preprint arXiv:1308.4287.
16. Coja-Oghlan, A. and Panagiotou, K. “Catching the |$k$|-NAESAT Threshold.” Proc. 44th ACM Symposium on Theory of Computing, 899–908. 2012.
17. Coja-Oghlan, A. “Going After the |$k$|-SAT Threshold.” Proc. 45th ACM Symposium on Theory of Computing, 2013, to appear.
18. Coja-Oghlan, A. and Zdeborová, L. “The Condensation Transition in Random Hypergraph 2-Coloring.” Proc. 23rd ACM-SIAM Symposium on Discrete Algorithms, 241–50. 2012.
19. Dani, V., Moore, C., and Olson, A. “Tight Bounds on the Threshold for Permuted |$k$|-Colorability.” Proc. 16th International Workshop on Randomization and Computation, 505–16. 2012.
20. Dyer, M., Frieze, A., and Greenhill, C. “On the chromatic number of a random hypergraph.” (2012): preprint arXiv:1208.0812.
21. Erdös, P. and Rényi, A. “On the evolution of random graphs.” Magayar Tud. Akad. Mat. Kutató Int. Közl. 5 (1960): 17–61.
22. Frieze, A. and McDiarmid, C. “Algorithmic theory of random graphs.” Random Structures & Algorithms 10, no. 1–2 (1997): 5–42.
23. Frieze, A. and Wormald, N. “Random |$k$|-Sat: a tight threshold for moderately growing |$k$|.” Combinatorica 25, no. 3 (2005): 297–305.
24. Gilbert, E. “Random graphs.” Annals of Mathematical Statistics 30, no. 4 (1959): 1141–4.
25. Grimmett, G. and McDiarmid, C. “On colouring random graphs.” Mathematical Proceedings of the Cambridge Philosophical Society 77, no. 2 (1975): 313–24.
26. Janson, S., Łuczak, T., and Ruciński, A. Random Graphs. New Jersey: Wiley, 2000.
27. Kemkes, G., Pérez-Giménez, X., and Wormald, N. “On the chromatic number of random |$d$|-regular graphs.” Advances in Mathematics 223, no. 1 (2010): 300–28.
28. Krivelevich, M. “Coloring Random Graphs – An Algorithmic Perspective.” Proc. 2nd Colloquium on Mathematics and Computer Science, 175–95. 2002.
29. Krivelevich, M. and Sudakov, B. “Coloring random graphs.” Information Processing Letters 67, no. 2 (1998): 71–4.
30. Krzakala, F., Montanari, A., Ricci-Tersenghi, F., Semerjian, G., and Zdeborova, L. “Gibbs states and the set of solutions of random constraint satisfaction problems.” Proceedings of National Academy of Sciences USA 104, no. 25 (2007): 10318–23.
31. Krzakala, F., Pagnani, A., and Weigt, M. “Threshold values, stability analysis and high-|$q$| asymptotics for the coloring problem on random graphs.” Physical Review E 70 (2004): 046705.
32. Łuczak, T. “The chromatic number of random graphs.” Combinatorica 11, no. 1 (1991): 45–54.
33. Łuczak, T. “A note on the sharp concentration of the chromatic number of random graphs.” Combinatorica 11, no. 3 (1991): 295–7.
34. Matula, D. “Expose-and-merge exploration and the chromatic number of a random graph.” Combinatorica 7, no. 3 (1987): 275–84.
35. Mézard, M. and Montanari, A. Information, Physics and Computation. Oxford, UK: Oxford University Press, 2009.
36. Mézard, M., Parisi, G., and Zecchina, R. “Analytic and algorithmic solution of random satisfiability problems.” Science 297, no. 5582 (2002): 812–5.
37. Molloy, M. “The Freezing Threshold for |$k$|-Colourings of a Random Graph.” Proc. 43rd ACM Symposium on Theory of Computing, 921–30. 2012.
38. van Mourik, J. and Saad, D. “Random graph coloring: a statistical physics approach.” Physical Review E 66 (2002): 056120.
39. Mulet, R., Pagnani, A., Weigt, M., and Zecchina, R. “Coloring random graphs.” Physical Review Letters 89 (2002): 268701.
40. Robertson, N., Sanders, D., Seymour, P., and Thomas, R. “The four-colour theorem.” Journal of Combinatorial Theory Series B 70, no. 1 (1997): 2–44.
41. Shamir, E. and Spencer, J. “Sharp concentration of the chromatic number of random graphs |$G(n,p)$|.” Combinatorica 7, no. 1 (1987): 121–9.
42. Zdeborová, L. and Krzakala, F. “Phase transition in the coloring of random graphs.” Physical Review E 76 (2007): 031131.

Author notes

Communicated by Prof. Assaf Naor

An extended abstract version of this work appeared in the Proceedings of the 54th IEEE Symposium on Foundations of Computer Science (“FOCS”), 2013.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.