Given a probability space $(\Omega, \mathcal{F}, \mathbb{P})$, I can understand the definition of conditional expectation $\mathbb{E}[X\mid \mathcal{G}]$, where $\mathcal{G}$ is a sub-$\sigma$-algebra of $\mathcal{F}$.
If we define $\mathbb{P}(A\mid\mathcal{G}) := \mathbb{E}[\mathbb{I}_A\mid\mathcal{G}]$, we get $\mathbb{P}(A\mid\mathcal{G})$ as a conditional probability.
So, why do we need to define regular conditional probability?
Can I explain the reason like this?
The conditional probability $\mathbb{P}(A\mid\mathcal{G})$ satisfies:
- $\mathbb{P}(\Omega\mid\mathcal{G})=1$ a.s.;
- $\mathbb{P}(A\mid\mathcal{G})\ge 0$ a.s.;
- $\mathbb{P}(\sum_n A_n\mid\mathcal{G}) = \sum_n \mathbb{P}(A_n\mid\mathcal{G})$ a.s. for pairwise disjoint sets $\{A_n: n\ge 1\}$.
Taking $\mathbb{P}(\omega, A)$ as a representative of $\mathbb{P}(A\mid\mathcal{G})$ (since the latter is an equivalence class of random variables), we hope that $\mathbb{P}(\omega, \cdot):\mathcal{F}\to\mathbb{R}$ is almost surely a probability measure. It easily satisfies $\mathbb{P}(\omega, \Omega)=1$ and $\mathbb{P}(\omega, A)\ge 0$ a.s., but countable additivity is a problem.
To satisfy $\mathbb{P}(\omega, \sum_n A_n) = \sum_n\mathbb{P}(\omega, A_n)$ a.s., we must exclude a sequence of null sets $\{N_n: n\ge 1\}$; and for another sequence $\{B_n: n\ge 1\}$, to satisfy $\mathbb{P}(\omega, \sum_n B_n) = \sum_n\mathbb{P}(\omega,B_n)$ a.s., we must exclude another sequence of null sets $\{M_n: n\ge 1\}$... In total, we then need to exclude a big set $$ \left(\bigcup_{n=1}^\infty N_n\right) \cup \left(\bigcup_{n=1}^\infty M_n\right) \cup \cdots $$ which may no longer be a null set.
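To make the obstacle explicit, here is how I would write the exceptional set, as a sketch only. The notation $N_{(A_n)}$ and $N$ is my own and indexes each null set by the disjoint sequence it belongs to:

$$ N_{(A_n)} := \Big\{\omega\in\Omega : \mathbb{P}\big(\omega, \textstyle\sum_n A_n\big) \ne \sum_n \mathbb{P}(\omega, A_n)\Big\}, \qquad \mathbb{P}\big(N_{(A_n)}\big) = 0, $$

$$ N := \bigcup_{(A_n)\,\subset\,\mathcal{F}\ \text{pairwise disjoint}} N_{(A_n)}. $$

Each $N_{(A_n)}$ is null, but the union runs over all disjoint sequences in $\mathcal{F}$, and there can be uncountably many of them. Countable subadditivity of $\mathbb{P}$ therefore gives no bound on $N$, and $N$ need not even be measurable.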
To deal with this difficulty, we introduce the concept of regular conditional probability $\mathbb{P}(\cdot, \cdot): \Omega\times\mathcal{G}\to[0,1]$, such that
- $\mathbb{P}(\omega,\cdot)$ is a probability measure on $\mathcal{G}$ for every $\omega\in\Omega$.
- $\mathbb{P}(\cdot,A)$ is a measurable function on $(\Omega,\mathcal{G})$ for every $A\in\mathcal{G}$, and $\mathbb{P}(\omega,A)=\mathbb{P}(A\mid\mathcal{G})$ a.s.
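To check my understanding of the two conditions, here is a minimal Python sketch on a toy example. Everything in it is my own assumption and not part of any library: a six-point $\Omega$ with the uniform measure, $\mathcal{G}$ generated by a three-block partition, and a candidate kernel `kernel(omega, event)` whose second argument ranges over all subsets of $\Omega$. On a finite space there are no null-set issues, so this only illustrates what the two conditions ask for, not the general existence problem.

```python
from fractions import Fraction
from itertools import chain, combinations

# Hypothetical toy setup: six equally likely outcomes.
OMEGA = frozenset(range(6))
PROB = {w: Fraction(1, 6) for w in OMEGA}
# G is the sigma-algebra generated by this partition of OMEGA.
PARTITION = [frozenset({0, 1, 2}), frozenset({3, 4}), frozenset({5})]


def measure(event):
    """P(event) for a subset of OMEGA."""
    return sum((PROB[w] for w in event), Fraction(0))


def block_of(omega):
    """The partition block (atom of G) containing omega."""
    return next(b for b in PARTITION if omega in b)


def kernel(omega, event):
    """Candidate P(omega, A): P(A intersect B) / P(B), B the block of omega."""
    block = block_of(omega)
    return measure(frozenset(event) & block) / measure(block)


def all_events():
    """Every subset of OMEGA."""
    points = sorted(OMEGA)
    subsets = chain.from_iterable(
        combinations(points, r) for r in range(len(points) + 1)
    )
    return [frozenset(s) for s in subsets]


def is_probability_measure(omega):
    """Check that kernel(omega, .) has total mass 1, is nonnegative, and additive."""
    if kernel(omega, OMEGA) != 1:
        return False
    for event in all_events():
        value = kernel(omega, event)
        # On a finite space, additivity over single points is full additivity.
        pointwise = sum((kernel(omega, {w}) for w in event), Fraction(0))
        if value < 0 or value != pointwise:
            return False
    return True


def satisfies_defining_property(event):
    """Check E[1_G * kernel(., A)] == P(A intersect G) on each generating block G."""
    return all(
        sum((PROB[w] * kernel(w, event) for w in g), Fraction(0))
        == measure(frozenset(event) & g)
        for g in PARTITION
    )


print(kernel(0, {0, 3}))  # 1/3
```

In this sketch, `is_probability_measure(w)` holds for every `w`, with no exceptional set at all, and `satisfies_defining_property(a)` holds for every subset `a`, which is the finite analogue of $\mathbb{P}(\omega,A)=\mathbb{P}(A\mid\mathcal{G})$ a.s.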
Since we can tolerate a difference between $\mathbb{P}(\omega,A)$ and $\mathbb{P}(A\mid\mathcal{G})$ on null sets, we can simply drop the "almost surely" qualifier from $\mathbb{P}(\omega,\Omega)=1$ and $\mathbb{P}(\omega,A)\ge 0$.
Please point out my mistakes.