
I am trying to determine whether a set of (slightly) dependent, negatively correlated Bernoulli random variables satisfies a central limit theorem (CLT).

Let $\mathcal{G}_{reg}$ denote the uniform distribution over all $d$-regular graphs on the vertex set $V = \{1, \ldots, n\}$. Consider the following experiment: draw a graph $G \sim \mathcal{G}_{reg}$, and construct a multiset $S$ of $n/d$ nodes of $V$ drawn uniformly and independently at random with replacement.

For each $j \in [n]$, define the Bernoulli random variable $X_j$ as $$ X_j = \begin{cases} 1 &\text{if $j$ has at least one edge to a node in $S$}, \\ 0 &\text{otherwise.} \end{cases} $$ It is straightforward to show that $X_j \sim \mathrm{Bern}(p_n)$ where $p_n = 1 - (1-\frac{d}{n})^{n/d}$. However, the $X_j$'s are not independent; they appear to be negatively correlated, since conditioning on one node being connected to $S$ makes every other node slightly less likely to be connected to $S$. Note that we randomize over both the choice of the graph $G$ and the multiset $S$.
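For concreteness, here is a minimal simulation sketch of the experiment, just as a sanity check of the marginal $p_n$. It uses `networkx.random_regular_graph`, which samples a $d$-regular graph only approximately uniformly, and `sample_sum` is a helper name of my own:

```python
import random
import networkx as nx

def sample_sum(n, d, rng=random):
    """One run of the experiment: random d-regular graph G, multiset S of
    n // d uniform draws with replacement, then sum of X_j = 1{N(j) hits S}."""
    G = nx.random_regular_graph(d, n)              # approximately uniform d-regular graph
    S = {rng.randrange(n) for _ in range(n // d)}  # only membership in S matters for the X_j
    return sum(1 for j in G.nodes if any(v in S for v in G.neighbors(j)))

n, d, trials = 1000, 4, 100
p_n = 1 - (1 - d / n) ** (n // d)                  # marginal P(X_j = 1)
sums = [sample_sum(n, d) for _ in range(trials)]
print("empirical E[sum X_j]:", sum(sums) / trials)
print("n * p_n             :", n * p_n)
```

If the marginal computation above is right, the empirical mean should track $n p_n$.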

I would like to show that $X_1, \ldots, X_n$ satisfy a central limit theorem (CLT), i.e. that the standardized sum $\left(\sum_{i=1}^n X_i - np_n\right)/\sqrt{np_n(1-p_n)}$ converges in distribution to a standard normal. Intuitively, since the $X_i$'s are negatively correlated Bernoullis, $\sum X_i$ should be at least as concentrated as $\text{Bin}(n, p_n)$ and thus converge to something at least as tight as a Gaussian. However, this intuition breaks down, as there are counterexamples where negative correlation alone is not enough, e.g. this post. I looked into the existing literature and found a paper by Pruss and Szynal which gives the CLT if and only if the following condition holds: $$ \lim_{n \to \infty} \frac{1}{\sqrt{np_n(1-p_n)}} \sum_{j=1}^n \mathbb{E}\left[(X_j-p_n)e^{it(\sum_{k\neq j} (X_k-p_n))/\sqrt{np_n(1-p_n)}}\right] = 0 \qquad \forall t > 0. $$ I haven't been able to figure out whether this condition is satisfied, as I'm not sure how to deal with the expectation.
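Not a proof, but the expectation can at least be estimated by simulation: for a fixed realization of $(X_1, \ldots, X_n)$ with $T = \sum_k X_k$, the sum over $j$ collapses to two terms, so a Monte Carlo average over runs of the experiment estimates the whole quantity. A rough sketch under the same assumptions as above (`sample_T` and `ps_criterion` are again hypothetical helper names of mine):

```python
import cmath
import random
import networkx as nx

def sample_T(n, d, rng=random):
    """One run of the experiment; returns T = sum_j X_j."""
    G = nx.random_regular_graph(d, n)              # approximately uniform d-regular graph
    S = {rng.randrange(n) for _ in range(n // d)}  # multiset S; only membership matters
    return sum(1 for j in G.nodes if any(v in S for v in G.neighbors(j)))

def ps_criterion(n, d, t, trials=200, rng=random):
    """Monte Carlo estimate of
       (1 / sigma) * sum_j E[(X_j - p) exp(i t sum_{k != j}(X_k - p) / sigma)]."""
    p = 1 - (1 - d / n) ** (n // d)
    sigma = (n * p * (1 - p)) ** 0.5
    acc = 0j
    for _ in range(trials):
        T = sample_T(n, d, rng)
        # for a fixed realization, the sum over j has T identical "X_j = 1" terms
        # and n - T identical "X_j = 0" terms:
        acc += T * (1 - p) * cmath.exp(1j * t * (T - 1 - (n - 1) * p) / sigma)
        acc += (n - T) * (-p) * cmath.exp(1j * t * (T - (n - 1) * p) / sigma)
    return acc / trials / sigma

for n in (250, 500, 1000):
    print(n, abs(ps_criterion(n, d=4, t=1.0)))
```

This only gives numerical evidence for particular $n$ and $t$, of course, not an analytic answer.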

Does the CLT hold for $X_1, \ldots, X_n$? Any help would be greatly appreciated.

Edit: I initially set the size of $S$ to be a constant, but it is actually $n/d$ for the purpose of my application. I fixed this in the question above.

  • Note that $p$ depends on $n$ so it is probably clearer to write $p_n$ throughout to make this explicit. Commented Jul 18 at 10:46

1 Answer


Assuming the size $k := |S|$ of the multiset is a fixed constant (as in the original version of the question), the sum converges to a point mass.

As $n \to \infty$, the probability that $S$ contains any vertex twice and the probability that any two vertices in $S$ share a neighbor (or are adjacent) both go to $0$. This means that with probability approaching $1$ the neighborhoods of the $k$ vertices of $S$ are pairwise disjoint, so you obtain $\sum X_i = dk$ exactly, i.e. the limiting distribution is just a point mass at $dk$.
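A quick numerical illustration of this claim (a sketch only; `networkx.random_regular_graph` is just approximately uniform, and `hits_dk` is a made-up helper name): for fixed $d$ and $k$, the fraction of runs with $\sum X_i = dk$ should approach $1$ as $n$ grows.

```python
import random
import networkx as nx

def hits_dk(n, d, k, rng=random):
    """One run with |S| = k fixed: is sum_j X_j exactly d*k?"""
    G = nx.random_regular_graph(d, n)
    S = {rng.randrange(n) for _ in range(k)}   # repeats collapse, which correctly yields a miss
    covered = sum(1 for j in G.nodes if any(v in S for v in G.neighbors(j)))
    return covered == d * k

d, k, trials = 4, 3, 200
for n in (100, 400, 1600):
    frac = sum(hits_dk(n, d, k) for _ in range(trials)) / trials
    print(f"n = {n:5d}   P(sum X_i = dk) ~ {frac:.2f}")
```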

I am unsure whether the preconditions of the theorem of Pruss & Szynal hold; in particular, it requires that the inverse of the variance does not grow too quickly, which it might do here. But if the theorem applies, we can evaluate the criterion you mentioned. Using the reasoning from above, as $n \to \infty$ there will, with probability approaching $1$, be $dk$ nodes that contribute $1$ and $n - dk$ nodes that contribute $0$; in this case we have

$$ \sum_{j=1}^n\mathbb{E}[(X_j-p_n)e^{it(\sum_{l\neq j} (X_l-p_n))/\sqrt{np_n(1-p_n)}}] \\= dk(1-p_n)e^{it(dk - 1 - (n-1)p_n)/ \sqrt{np_n(1-p_n)} } + (n - dk)(-p_n)e^{it(dk - (n-1)p_n)/ \sqrt{np_n(1-p_n)} } $$

Now we note that $p_n = 1 - (1-\frac{d}{n})^k \to 0$ and that

$$ \lim_{n \to \infty} n p_n = \lim_{n \to \infty} n\left[1 - \left( 1- \frac{d}{n}\right)^k\right] = dk, $$

so $\sqrt{np_n(1-p_n)} \to \sqrt{dk}$ and $(n-1)p_n \to dk$. Since $|e^{i\theta}| = 1$ for all real $\theta$, the moduli of the two terms above are just $dk(1-p_n) \to dk$ and $(n-dk)p_n \to dk$.

So the first term converges to $dk\,e^{-it/\sqrt{dk}}$ (its phase $t(dk - 1 - (n-1)p_n)/\sqrt{np_n(1-p_n)} \to -t/\sqrt{dk}$), while the second term converges to $-dk$ (its phase goes to $0$).

This gives us $$ \lim_{n \to \infty} \frac{1}{\sqrt{np_n(1-p_n)}} \sum_{j=1}^n \mathbb{E}[(X_j-p_n)e^{it(\sum_{l\neq j} (X_l-p_n))/\sqrt{np_n(1-p_n)}}] \\= \frac{dk\left(e^{-it/\sqrt{dk}} - 1\right)}{\sqrt{dk}} = \sqrt{dk}\left(e^{-it/\sqrt{dk}} - 1\right), $$

which has absolute value $2\sqrt{dk}\,\bigl|\sin\bigl(\tfrac{t}{2\sqrt{dk}}\bigr)\bigr|$. This is nonzero for every $t > 0$ apart from multiples of $2\pi\sqrt{dk}$, so the criterion is not satisfied and no CLT holds, consistent with the point-mass limit above.
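As a sanity check on this limit, here is a sketch that just plugs the typical value $\sum X_i = dk$ into the two-term expression above for increasing $n$ and compares with the closed form; the helper name `criterion_at_point_mass` and the values $d = 4$, $k = 3$, $t = 1$ are purely illustrative choices of mine.

```python
import cmath
import math

def criterion_at_point_mass(n, d, k, t):
    """The two-term expression above, evaluated at sum X_i = d*k, divided by sigma."""
    p = 1 - (1 - d / n) ** k
    sigma = math.sqrt(n * p * (1 - p))
    s = d * k * (1 - p) * cmath.exp(1j * t * (d * k - 1 - (n - 1) * p) / sigma) \
        + (n - d * k) * (-p) * cmath.exp(1j * t * (d * k - (n - 1) * p) / sigma)
    return abs(s / sigma)

d, k, t = 4, 3, 1.0
for n in (10**3, 10**5, 10**7):
    print(n, criterion_at_point_mass(n, d, k, t))
print("limit:", 2 * math.sqrt(d * k) * abs(math.sin(t / (2 * math.sqrt(d * k)))))
```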

  • Thanks! I think your answer makes sense in the case that $k$ is fixed. What if $k$ is not fixed, but, say, $n/d$? This is actually what I had in mind, but didn't include it in the question as I didn't think it made a difference... oops! In this case, $p_n = 1 - (1-\frac{d}{n})^{n/d}$ converges to $1 - e^{-1} > 0$. Commented Jul 18 at 16:17
  • Sorry, I don't have time to delve into it too deeply now. But I think the main condition (Eq 3 in the paper) should be easy to evaluate, realizing that $e^{ix}$ has absolute value 1 for all real $x$ and that $\sum X_i < dk = n$. I am less sure how to obtain that the prerequisites (Eq 1 and 2) hold for your case. Commented Jul 18 at 20:16
