
Unbiased Variance Estimator

Let $x_1, \ldots, x_N$ be i.i.d. samples from a random variable $X$. Let $Y(N)$ denote the $N$-mean estimator, given by

$$ Y(N) = \frac{1}{N} \sum_{i=1}^N x_i $$

Let $v(N)$ denote the unbiased $N$-variance estimator

$$ v(N) = \frac{1}{N-1} \sum_{i=1}^N (x_i - Y(N))^2 $$

Unbiased Variance Estimator with reused samples

I draw $d$ additional samples $x_{N+1}, \ldots, x_{N+d}$, also i.i.d. from $X$. Similarly, let $Y(N+d)$ denote the $(N+d)$-mean estimator (which reuses the first $N$ samples):

$$ Y(N+d) = \frac{1}{N+d} \sum_{i=1}^{(N+d)} x_i$$

Let $v(N+d)$ denote the unbiased $(N+d)$-variance estimator with reused samples,

$$v(N+d) = \frac{1}{N+d-1} \sum_{i=1}^{(N+d)} (x_i - Y(N+d))^2$$
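For concreteness, here is a minimal numerical sketch of the two overlapping estimators (Python/NumPy; the normal example and variable names are my own illustration, not part of the setup):

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 100, 25

# One pool of N + d i.i.d. draws; the first N are shared by both estimators.
x = rng.normal(loc=1.0, scale=2.0, size=N + d)

Y_N  = x[:N].mean()       # N-mean estimator Y(N)
v_N  = x[:N].var(ddof=1)  # unbiased N-variance estimator v(N)
Y_Nd = x.mean()           # (N+d)-mean estimator Y(N+d), reusing the first N samples
v_Nd = x.var(ddof=1)      # unbiased (N+d)-variance estimator v(N+d)
```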

Covariance of the Variance Estimators with shared samples

My question is: what is the covariance of these two estimators? $$\mathrm{Cov}[ v(N), v(N+d) ] = ??? $$

From E. Benhamou, "A few properties of sample variance" (equation 10), I know that the variance of the estimator is given by $$\mathrm{Var}[ v(N) ] = \frac{1}{N}\left( m_4 - \frac{N-3}{N-1} m_2^2 \right)$$ where $m_2$ and $m_4$ are the second and fourth central moments of $X$. I tried to use the fact that $v$ is a U-statistic, but I ended up with a quadruple summation that I wasn't sure how to simplify. Is this a standard result? Can I pull a citation from somewhere, or is there an existing result I can draw from?
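Lacking a closed form, the quantities involved can at least be checked numerically. The sketch below is my own, not from Benhamou; it assumes an Exponential(1) population, for which $m_2 = 1$ and $m_4 = 9$, and verifies the variance formula by simulation:

```python
import numpy as np

rng = np.random.default_rng(1)
N, trials = 50, 200_000

# Exponential(1): variance m2 = 1, fourth central moment m4 = 9.
x = rng.exponential(size=(trials, N))
v_N = x.var(ddof=1, axis=1)          # one realization of v(N) per row

m4, m2 = 9.0, 1.0
theory = (m4 - (N - 3) / (N - 1) * m2**2) / N

print(v_N.var(), theory)             # should agree to Monte Carlo accuracy
```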

  • It's not difficult, but it is very calculational. Commented Nov 28 at 17:46
  • @NN2 Is there a standard reference I can use for this particular result? I would rather not spend a massive amount of time deriving something that is well known. Commented Nov 28 at 20:19
  • I don't have one. I calculated something like this myself in my work several years ago. You need to break down $Y(N)$ and rewrite $v(N)$ in terms of the $(x_i)_i$ only. Commented Nov 28 at 21:08
  • I think the approach with U-statistics can work: we can decompose the centered versions of $v(N)$ and $v(N+d)$ as sums of pairwise orthogonal random variables. I will try to find time to write it down in the coming days. Commented Nov 29 at 9:56
  • @NN2 Thank you for the response. I took the time and did the full breakdown. Commented Nov 29 at 22:33

1 Answer


We have the following decomposition:
\begin{equation} \begin{aligned} \left(N+d-1\right)v(N+d) = & \left(N-1\right)v(N) +\sum\limits_{i=N+1}^{N+d} \left(x_i - \frac{1}{d}\sum\limits_{j=N+1}^{N+d} x_{j} \right)^2 \\ &+\dfrac{d N}{N+d}\left(Y(N)-\frac{1}{d}\sum\limits_{j=N+1}^{N+d} x_{j}\right)^2. \end{aligned} \end{equation}
The second term on the right-hand side depends only on the new samples, so it is independent of the data used in $v(N)$ and its covariance with $v(N)$ is zero. The third term on the right-hand side is a quadratic that may be expanded as
\begin{equation} \left(Y(N)-\frac{1}{d}\sum\limits_{j=N+1}^{N+d} x_{j}\right)^2 = Y^2(N) - 2\left(\frac{1}{d}\sum\limits_{j=N+1}^{N+d} x_{j}\right)Y(N) + \left(\frac{1}{d}\sum\limits_{j=N+1}^{N+d} x_{j}\right)^2. \end{equation}
The last term of this quadratic is likewise independent of the data used in $v(N)$, so its covariance with $v(N)$ is zero. In the cross term, independence lets us replace the factor $\frac{1}{d}\sum_{j=N+1}^{N+d} x_{j}$ by its expectation when taking the covariance with $v(N)$; as this factor is an unbiased estimator of the mean, its expectation equals $\mu$. Consequently, taking the covariance of the decomposition with $v(N)$ and applying bilinearity yields
\begin{equation} \begin{aligned} \left(N+d-1\right){\mathrm{Cov}}[v(N+d),v(N)] = & \left(N-1\right){\mathrm{Var}}[v(N)]+\dfrac{d N}{N+d}{\mathrm{Cov}}[Y^2(N),v(N)] \\ &- \dfrac{2\mu d N}{N+d}{\mathrm{Cov}}[Y(N),v(N)]. \end{aligned} \end{equation}
From the definition of covariance and the unbiasedness of the estimators $Y$ and $v$, we deduce that
\begin{equation} {\mathrm{Cov}}[Y^2(N),v(N)] = \mathbb{E}[Y^2(N)v(N)] - \mathbb{E}[Y^2(N)]\sigma^2, \qquad {\mathrm{Cov}}[Y(N),v(N)] = \mathbb{E}[Y(N)v(N)] - \mu \sigma^2. \end{equation}
Substituting these expressions into the above, we have
\begin{equation} \begin{aligned} \left(N+d-1\right){\mathrm{Cov}}[v(N+d),v(N)] = & \left(N-1\right){\mathrm{Var}}[v(N)] +\dfrac{d N}{N+d}\mathbb{E}[(Y^2(N)-2\mu Y(N))v(N)] \\ &- \dfrac{d N}{N+d}\mathbb{E}[Y^2(N)-2\mu^2]\sigma^2. \end{aligned} \end{equation}
These expressions simplify further by linearity and unbiasedness:
\begin{equation} \begin{aligned} \mathbb{E}[(Y^2(N)-2\mu Y(N))v(N)] &= \mathbb{E}[(Y(N)-\mu)^2 v(N)] - \mu^2 \sigma^2, \\ \mathbb{E}[Y^2(N)-2\mu^2]\sigma^2 &= \mathbb{E}[Y^2(N)-2\mu Y(N)]\sigma^2 = \mathbb{E}[(Y(N)-\mu)^2]\sigma^2 - \mu^2 \sigma^2. \end{aligned} \end{equation}
Substituting these into our formula for the covariance implies
\begin{equation} \left(N+d-1\right){\mathrm{Cov}}[v(N+d),v(N)] = \left(N-1\right){\mathrm{Var}}[v(N)] +\dfrac{d N}{N+d}\mathbb{E}[(Y(N)-\mu)^2 (v(N)-\sigma^2)]. \end{equation}

To evaluate the remaining expectation, we decompose both estimators into sums over the $x_i$. The variance estimator may be written as
\begin{equation} \begin{split} (N-1)v(N) & = \sum\limits_{i=1}^N ((x_i-\mu)-(Y(N)-\mu))^2= \sum\limits_{i=1}^N (x_i-\mu)^2- N(Y(N)-\mu)^2, \\ (Y(N)-\mu)^2 v(N) &=\dfrac{1}{N-1} (Y(N)-\mu)^2\sum\limits_{i=1}^N (x_i-\mu)^2- \dfrac{N}{N-1}(Y(N)-\mu)^4. \end{split} \end{equation}
Similarly, the mean estimator satisfies
\begin{equation} \begin{split} (Y(N)-\mu)^2 =& \dfrac{1}{N^2}\sum\limits_{i=1}^N (x_i-\mu)^2 + \dfrac{2}{N^2}\sum\limits_{i=1}^N \sum\limits_{j<i} (x_i-\mu)(x_j-\mu), \\ (Y(N)-\mu)^2\sum\limits_{k=1}^N (x_k-\mu)^2 =& \dfrac{1}{N^2}\left(\sum\limits_{i=1}^N (x_i-\mu)^2\right)^2 + \dfrac{2}{N^2}\sum\limits_{i=1}^N \sum\limits_{j<i} \sum\limits_{k=1}^N (x_i-\mu)(x_j-\mu)(x_k-\mu)^2 \\ =& \dfrac{1}{N^2}\sum\limits_{i=1}^N (x_i-\mu)^4 +\dfrac{2}{N^2}\sum\limits_{i=1}^N \sum\limits_{k<i} (x_i-\mu)^2(x_k-\mu)^2 \\ &+ \dfrac{2}{N^2}\sum\limits_{i=1}^N \sum\limits_{j<i} (x_i-\mu)^3(x_j-\mu)+ \dfrac{2}{N^2}\sum\limits_{i=1}^N \sum\limits_{j<i} (x_i-\mu)(x_j-\mu)^3 \\ &+ \dfrac{2}{N^2}\sum\limits_{i=1}^N \sum\limits_{j<i}\sum\limits_{\substack{ k\neq i \\ k\neq j}} (x_i-\mu)(x_j-\mu)(x_k-\mu)^2. \end{split} \end{equation}
By independence, linearity of expectation, and $\mathbb{E}[x_i-\mu]=0$, every term containing an isolated factor $(x_i-\mu)$ has zero expectation, so
\begin{equation} \mathbb{E}\left[(Y(N)-\mu)^2\sum\limits_{k=1}^N (x_k-\mu)^2\right] = \dfrac{1}{N^2}\sum\limits_{i=1}^N \mathbb{E}[(x_i-\mu)^4] +\dfrac{2}{N^2}\sum\limits_{i=1}^N \sum\limits_{k<i} \mathbb{E}[(x_i-\mu)^2]\,\mathbb{E}[(x_k-\mu)^2] = \dfrac{\mu_4}{N}+\dfrac{N-1}{N}\sigma^4, \end{equation}
where $\mu_4$ denotes the fourth central moment of $X$. Additionally, the multinomial expansion of the fourth power gives
\begin{equation} \begin{split} N^4(Y(N)-\mu)^4 =& \sum\limits_{i=1}^N (x_i-\mu)^4 + 4\sum\limits_{i\neq j} (x_i-\mu)^3(x_j-\mu) + 6 \sum\limits_{i < j} (x_i-\mu)^2(x_j-\mu)^2 \\ & + 12 \sum\limits_{i=1}^N \sum\limits_{\substack{j<k \\ j,k\neq i}} (x_i-\mu)^2(x_j-\mu)(x_k-\mu)+ 24 \sum\limits_{i<j<k<l} (x_i-\mu)(x_j-\mu)(x_k-\mu)(x_l-\mu), \end{split} \end{equation}
and the last two sums, together with the $(x_i-\mu)^3(x_j-\mu)$ sum, have zero expectation. Thus, by the same reasoning as before,
\begin{equation} \mathbb{E}[(Y(N)-\mu)^4] = \dfrac{\mu_4}{N^3}+ \dfrac{3(N-1)}{N^3} \sigma^4. \end{equation}
Consequently,
\begin{equation} \begin{split} \mathbb{E}[(Y(N)-\mu)^2 v(N)] =& \dfrac{1}{N-1} \left(\dfrac{\mu_4}{N}+\dfrac{N-1}{N}\sigma^4\right)- \dfrac{N}{N-1}\left(\dfrac{\mu_4}{N^3}+ \dfrac{3(N-1)}{N^3} \sigma^4\right) \\ =& \dfrac{\mu_4}{N^2}+\dfrac{N-3}{N^2}\sigma^4. \end{split} \end{equation}
To complete the calculation, we recall the expansion of $(Y(N)-\mu)^2$ and apply the same reasoning to obtain
\begin{equation} \mathbb{E}[(Y(N)-\mu)^2]\,\sigma^2 = \dfrac{1}{N^2}\sum\limits_{i=1}^N \mathbb{E}[(x_i-\mu)^2]\,\sigma^2 = \dfrac{\sigma^4}{N}. \end{equation}
Hence
\begin{equation} \begin{split} \mathbb{E}[(Y(N)-\mu)^2 (v(N)-\sigma^2)] =& \dfrac{1}{N^2}\left(\mu_4-3\sigma^4\right), \\ \left(N+d-1\right){\mathrm{Cov}}[v(N+d),v(N)] = & \left(N-1\right){\mathrm{Var}}[v(N)] +\dfrac{d}{N(N+d)}\left(\mu_4-3\sigma^4\right). \end{split} \end{equation}
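As a numerical sanity check on the decomposition at the start of this argument, the identity can be verified on a single sample (a sketch of my own; the identity holds exactly for any data, not just in expectation):

```python
import numpy as np

rng = np.random.default_rng(2)
N, d = 30, 7
x = rng.exponential(size=N + d)

Y_N   = x[:N].mean()
bar_d = x[N:].mean()                            # (1/d) * sum of the d new samples

lhs = (N + d - 1) * x.var(ddof=1)               # (N+d-1) v(N+d)
rhs = ((N - 1) * x[:N].var(ddof=1)              # (N-1) v(N)
       + ((x[N:] - bar_d) ** 2).sum()           # within-group sum for the new samples
       + d * N / (N + d) * (Y_N - bar_d) ** 2)  # between-group term

assert np.isclose(lhs, rhs)
```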

Dividing by $N+d-1$ and substituting ${\mathrm{Var}}[v(N)] = \frac{1}{N}\left(\mu_4 - \frac{N-3}{N-1}\sigma^4\right)$, the covariance between the two variance estimators is \begin{equation} {\mathrm{Cov}}[v(N+d),v(N)] = \frac{N-1}{N+d-1}\cdot\frac{1}{N}\left(\mu_4 - \dfrac{N-3}{N-1}\sigma^4\right) +\dfrac{d}{N(N+d)(N+d-1)}\left(\mu_4-3\sigma^4\right). \end{equation}
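This closed form can also be checked by simulation. A sketch of mine, using a standard normal population (so $\mu_4 = 3\sigma^4$, the correction term vanishes, and the covariance reduces to $\frac{N-1}{N+d-1}\mathrm{Var}[v(N)] = \frac{2}{N+d-1}$):

```python
import numpy as np

rng = np.random.default_rng(3)
N, d, trials = 40, 10, 200_000
mu4, s2 = 3.0, 1.0                      # standard normal: mu4 = 3, sigma^2 = 1

x = rng.standard_normal((trials, N + d))
v_N  = x[:, :N].var(ddof=1, axis=1)     # v(N) per replication
v_Nd = x.var(ddof=1, axis=1)            # v(N+d) per replication

var_vN = (mu4 - (N - 3) / (N - 1) * s2**2) / N
theory = ((N - 1) / (N + d - 1) * var_vN
          + d / (N * (N + d) * (N + d - 1)) * (mu4 - 3 * s2**2))

print(np.cov(v_N, v_Nd)[0, 1], theory)  # should agree to Monte Carlo accuracy
```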
