Chapter 1 Measure Space
Sample space, $\sigma$-algebra, set function.
Probability Space: $(\Omega, \mathcal{F}, \mathbb{P})$
- $\Omega$: Sample Space
- $\mathcal{F}$: “Information”
- $\mathbb{P}$: “Probability”
We start by defining the basic information structures of measure theory: algebras and $\sigma$-algebras.
Definition 1 (Algebra)
A family $\mathcal{F}_0$ of subsets of $\Omega$ is called an algebra on $\Omega$ if it satisfies the following properties:
- $\Omega \in \mathcal{F}_0$
- $A \in \mathcal{F}_0 \implies A^c \in \mathcal{F}_0$
- $A, B \in \mathcal{F}_0 \implies A \cup B \in \mathcal{F}_0$
Additionally, the second and third properties imply $A, B \in \mathcal{F}_0 \implies A \cap B \in \mathcal{F}_0$, by De Morgan’s laws, since $A \cap B = (A^c \cup B^c)^c$.
Examples of Algebra
- “Power Set”: $2^\Omega$
- “Trivial algebra”: $\lbrace\Omega, \emptyset\rbrace$
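On a finite $\Omega$, the three properties of Definition 1 can be checked by brute force. The following is a minimal sketch under that finiteness assumption; the helper name `is_algebra` and the sample families are illustrative and not part of the notes.

```python
from itertools import combinations

def is_algebra(omega, family):
    """Check the three properties of Definition 1 for a family of subsets of a finite omega."""
    omega = frozenset(omega)
    family = {frozenset(a) for a in family}
    if omega not in family:                              # omega belongs to the family
        return False
    if any(omega - a not in family for a in family):     # closed under complement
        return False
    # closed under pairwise union (finite unions follow by induction)
    return all(a | b in family for a, b in combinations(family, 2))

omega = {1, 2, 3}
trivial = [set(), omega]
power_set = [set(c) for r in range(len(omega) + 1) for c in combinations(omega, r)]
not_algebra = [set(), {1}, omega]    # the complement {2, 3} is missing

print(is_algebra(omega, trivial))
print(is_algebra(omega, power_set))
print(is_algebra(omega, not_algebra))
```

The power set and the trivial algebra pass all three checks, while the third family fails because it is not closed under complements.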
A $\sigma$-algebra additionally allows us to “take the limit of a sequence of events”. The definition is as follows.
Definition 2 ($\sigma$-algebra)
A family $\mathcal{F}$ of subsets of $\Omega$ is called a $\sigma$-algebra on $\Omega$ if it satisfies the following properties:
- $\Omega \in \mathcal{F}$
- $A \in \mathcal{F} \implies A^c \in \mathcal{F}$
- If $(A_n)_{n \in \mathbb{N}}$ is a sequence of events, then
\[\left(\forall n \in \mathbb{N} : A_n \in \mathcal{F}\right) \implies \bigcup_{n\in \mathbb{N}} {A_n} \in \mathcal{F}\]
As before, the second and third properties imply $\bigcap_{n \in \mathbb{N}} A_n \in \mathcal{F}$, by De Morgan’s laws. Thus, a $\sigma$-algebra on $\Omega$ is a family of subsets of $\Omega$ “stable under any countable collection of set operations”.
Definition 3 (Set Functions on Algebras)
Let $\Omega$ be a set, let $\mathcal{F}_0$ be an algebra on $\Omega$, and let $\mu_0$ be a non-negative set function on $\mathcal{F}_0$, that is $\mu_0: \mathcal{F}_0 \rightarrow [0, \infty]$. Then $\mu_0$ is called finitely additive if it satisfies the following properties:
- $\mu_0(\emptyset) = 0$
- “Finite Additivity”: For any finite sequence of mutually disjoint events $(A_n)_{n=1}^m$ with $A_n \in \mathcal{F}_0$ for all $n=1,\dots,m$,
\[\mu_0\left(\bigcup_{n=1}^{m} A_n\right) = \sum_{n=1}^{m} \mu_0(A_n)\]
Furthermore, a countably additive set function on a $\sigma$-algebra is called a measure.
Definition 4 (Measure on a $\sigma$-algebra)
Let $\Omega$ be a set, let $\mathcal{F}$ be a $\sigma$-algebra on $\Omega$, and let $\mu$ be a non-negative set function on $\mathcal{F}$, that is $\mu: \mathcal{F} \rightarrow [0, \infty]$. Then $\mu$ is called a measure on $(\Omega, \mathcal{F})$ if it satisfies the following properties:
- $\mu(\emptyset) = 0$
- “Countable Additivity”: For any sequence of mutually disjoint events $(A_n)_{n \in \mathbb{N}}$ with $A_n \in \mathcal{F}$ for all $n \in \mathbb{N}$,
\[\mu\left(\bigcup_{n=1}^{\infty} A_n\right) = \sum_{n=1}^{\infty} \mu(A_n)\]
Definition 5 (Concerning Measures)
- Finite measure: $\mu(\Omega) < \infty$
- $\sigma$-finite measure:
\[\exists (A_n)_{n \in \mathbb{N}} \subseteq \mathcal{F} : \Omega = \bigcup_{n=1}^{\infty} A_n \text{ and } \mu(A_n) < \infty \ (\forall n \in \mathbb{N})\]
  - E.g., let $\Omega = \mathbb{N}$, let $\mathcal{F} = 2^{\Omega} = 2^{\mathbb{N}}$, and let $\mu(A) = \lvert A\rvert$ be the counting measure. Then $\mu(\Omega) = \infty$, but for $A_n = \lbrace n\rbrace$ we have $\mu(A_n) = 1 < \infty$ and $\bigcup_n A_n = \Omega$, so $\mu$ is a $\sigma$-finite measure.
- Probability measure: $\mu(\Omega) = 1$. Any measure with $0 < \mu(\Omega) < \infty$ becomes a probability measure after dividing by $\mu(\Omega)$, so the difference between the two is only one of scaling. We denote a probability measure by $\mathbb{P}$.
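To make the scaling remark concrete, here is a small sketch of a finite measure on a three-point set, built from point masses and then normalised into a probability measure. The weights, the set labels, and the helper `make_measure` are assumptions chosen for illustration only.

```python
from fractions import Fraction

def make_measure(weights):
    """Return the set function mu(A) = sum of the point masses of the elements of A."""
    def mu(a):
        return sum((weights[x] for x in a), Fraction(0))
    return mu

# Hypothetical point masses on omega = {"a", "b", "c"}
weights = {"a": Fraction(1), "b": Fraction(2), "c": Fraction(3)}
omega = set(weights)
mu = make_measure(weights)

# mu(omega) is finite and non-zero, so dividing by it gives a probability measure
total = mu(omega)
prob = make_measure({x: w / total for x, w in weights.items()})

A, B = {"a"}, {"b", "c"}             # two disjoint events
print(mu(set()))                     # measure of the empty set
print(mu(A | B) == mu(A) + mu(B))    # additivity on disjoint events
print(prob(omega))                   # total mass after normalisation
```

Exact fractions are used instead of floats so that the additivity check is not affected by rounding.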
Definition 6 (Concerning Spaces)
- Measurable space: $(\Omega, \mathcal{F})$
- Measure space: $(\Omega, \mathcal{F}, \mu)$
- Probability space (probability triple): $(\Omega, \mathcal{F}, \mathbb{P})$
So far, we have defined an important information structure on $\Omega$, the $\sigma$-algebra, usually denoted by $\mathcal{F}$. To measure this information, we defined a non-negative, countably additive set function $\mu$ on $\mathcal{F}$, called a measure on $\mathcal{F}$; in particular, it is called a probability measure $\mathbb{P}$ when $\mu(\Omega)=1$.
However, a $\sigma$-algebra is usually so complicated that it is impossible to write down a typical element of it. Hence, we need a simpler “information” structure that can be used to generate a $\sigma$-algebra.
$\mathcal{A}:=$ a collection of subsets of $\Omega$
We are looking at the $\sigma$-algebra generated by collection $\mathcal{A}$.
Definition 7 (Generation of $\sigma$-algebra)
Let $\mathcal{A}$ be a collection of subsets of $\Omega$. Then $\sigma(\mathcal{A})=\mathcal{F}$, the $\sigma$-algebra generated by $\mathcal{A}$, is the smallest $\sigma$-algebra $\mathcal{F}$ on $\Omega$ such that $\mathcal{A}\subseteq\mathcal{F}$, that is:
- $\mathcal{F}\supseteq\mathcal{A}$
- For every $\sigma$-algebra $\mathcal{F}^{\prime}$ on $\Omega$, $\mathcal{F}^{\prime}\supseteq\mathcal{A}\implies\mathcal{F}^{\prime}\supseteq\mathcal{F}$
The same construction can be used to generate other information structures, such as $\lambda$-systems.
In particular, $\sigma(\mathcal{A})$ is the intersection of all $\sigma$-algebras on $\Omega$ that contain $\mathcal{A}$, that is
\[\sigma(\mathcal{A})=\bigcap_{\mathcal{F}\supseteq\mathcal{A}}\mathcal{F},\]
where the intersection runs over all $\sigma$-algebras $\mathcal{F}$ on $\Omega$ with $\mathcal{A}\subseteq\mathcal{F}$. This family is non-empty, since it contains $2^\Omega$, and the resulting intersection is again a $\sigma$-algebra. (Try to prove it!!)
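When $\Omega$ is finite, $\sigma(\mathcal{A})$ can be computed directly by repeatedly adding complements and unions until nothing new appears. The sketch below assumes a finite $\Omega$ (where closure under pairwise unions already gives closure under countable unions); `generate_sigma_algebra` is a hypothetical helper name.

```python
def generate_sigma_algebra(omega, collection):
    """Smallest sigma-algebra on a finite omega that contains every set in `collection`."""
    omega = frozenset(omega)
    family = {omega, frozenset()} | {frozenset(a) for a in collection}
    while True:
        # every set added here must belong to any sigma-algebra containing the collection
        new = {omega - a for a in family}
        new |= {a | b for a in family for b in family}
        if new <= family:            # closed under complement and union: done
            return family
        family |= new

omega = {1, 2, 3, 4}
small = generate_sigma_algebra(omega, [{1}])
larger = generate_sigma_algebra(omega, [{1}, {2}])

print(sorted(sorted(a) for a in small))
print(len(larger))
```

Generating from $\lbrace\lbrace 1\rbrace\rbrace$ gives the four sets $\emptyset, \lbrace 1\rbrace, \lbrace 2,3,4\rbrace, \Omega$; generating from $\lbrace\lbrace 1\rbrace, \lbrace 2\rbrace\rbrace$ gives all unions of the atoms $\lbrace 1\rbrace, \lbrace 2\rbrace, \lbrace 3,4\rbrace$, eight sets in total.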
Example of the generation of $\sigma$-algebra (Borel Algebra)
Let $\Omega$ be a topological space, and consider its open sets. Define $\mathcal{B}(\Omega)=\sigma(\text{all open sets})$; $\mathcal{B}(\Omega)$ is called the Borel $\sigma$-algebra on $\Omega$.
The moral is that $\sigma$-algebras are “difficult”, but $\pi$-systems and $\lambda$-systems are “easy”, so we aim to work with the latter.
Definition 8 ($\pi$-system)
Let $\Omega$ be a set. A $\pi$-system on $\Omega$ is a collection $\mathcal{C}$ of subsets of $\Omega$ that is stable under finite intersections:
\[A_1,A_2\in\mathcal{C}\implies A_1\cap A_2\in\mathcal{C}\]
Example of $\pi$-system
Let $\Omega=(0,1]$. Then $\mathcal{C}=\{(x,1]:x\in(0,1]\}$ is a $\pi$-system, since $(x,1]\cap(y,1]=(\max(x,y),1]$, but it is not a $\sigma$-algebra: for instance, $\Omega=(0,1]\notin\mathcal{C}$.
Definition 9 ($\lambda$-system)
Let $\Omega$ be a set. A $\lambda$-system on $\Omega$ is a collection $\mathcal{L}$ of subsets of $\Omega$ with the following properties:
- $\Omega\in\mathcal{L}$
- $A,B\in\mathcal{L},A\subset B\implies B\setminus A\in\mathcal{L}$
- If $(A_n)_{n\in \mathbb{N}}$ is a sequence with $A_n\in\mathcal{L}$ and $A_n\subset A_{n+1}$ for all $n\in\mathbb{N}$, then
\[\bigcup_{n=1}^{\infty}A_n\in\mathcal{L}\]
Theorem 1 ($\pi-\lambda$ theorem) / Dynkin’s Lemma
If $\mathcal{C}$ is a $\pi$-system, then
\[\sigma(\mathcal{C})=\lambda(\mathcal{C}),\]
where $\lambda(\mathcal{C})$ is the smallest $\lambda$-system that contains $\mathcal{C}$.
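The $\pi$-$\lambda$ theorem can be checked by brute force on a small finite $\Omega$. The sketch below is only an illustration under that finiteness assumption: it computes the generated $\sigma$-algebra and the generated $\lambda$-system as closures and compares them, once for a $\pi$-system and once for a collection that is not one. The helper names `close`, `sigma_step`, `lambda_step`, `is_pi_system` and the collections `C`, `D` are hypothetical.

```python
def close(omega, collection, step):
    """Smallest family containing omega and `collection` that is closed under `step`."""
    omega = frozenset(omega)
    family = {omega} | {frozenset(a) for a in collection}
    while True:
        new = step(omega, family)
        if new <= family:            # nothing new: the family is closed
            return family
        family |= new

def sigma_step(omega, family):
    # Complements and pairwise unions; on a finite set this suffices for a sigma-algebra.
    return {omega - a for a in family} | {a | b for a in family for b in family}

def lambda_step(omega, family):
    # Differences B \ A with A a subset of B. Increasing unions add nothing on a finite
    # set, because an increasing sequence of subsets is eventually constant.
    return {b - a for a in family for b in family if a <= b}

def is_pi_system(collection):
    sets = {frozenset(a) for a in collection}
    return all(a & b in sets for a in sets for b in sets)

omega = {1, 2, 3, 4}
C = [{1, 2}, {2, 3}, {2}]            # closed under intersection
D = [{1, 2}, {2, 3}]                 # the intersection {2} is missing

for name, coll in [("C", C), ("D", D)]:
    sigma = close(omega, coll, sigma_step)
    lam = close(omega, coll, lambda_step)
    print(name, is_pi_system(coll), len(sigma), len(lam), sigma == lam)
```

For the $\pi$-system `C` both closures are the full power set (16 sets). For `D` the generated $\lambda$-system has only 6 sets while the generated $\sigma$-algebra still has 16, which shows that the $\pi$-system hypothesis cannot simply be dropped.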
Theorem 2 (Corollary)
Let $\mathcal{C}$ be a $\pi$-system on $\Omega$. Let $\mu_1$ and $\mu_2$ be two measures on $(\Omega,\sigma(\mathcal{C}))$ such that $\mu_1(\Omega)=\mu_2(\Omega)<\infty$ and $\mu_1=\mu_2$ on $\mathcal{C}$. Then, $\mu_1=\mu_2$ on $\sigma(\mathcal{C})$.
Proof of theorem 2
Let $\mathcal{L}:=\lbrace A\in\sigma(\mathcal{C}): \mu_1(A)=\mu_2(A) \rbrace$. We claim that $\mathcal{L}$ is a $\lambda$-system (Prove it!!). Since $\mathcal{L}$ is a $\lambda$-system that contains $\mathcal{C}$, it must contain the smallest $\lambda$-system containing $\mathcal{C}$, that is $\mathcal{L}\supseteq\lambda(\mathcal{C})=\sigma(\mathcal{C})$ by Theorem 1. Therefore, every set in the $\sigma$-algebra generated by $\mathcal{C}$ belongs to $\mathcal{L}$.
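The claim that the collection of sets on which $\mu_1$ and $\mu_2$ agree is a $\lambda$-system can be checked against Definition 9 as follows. This is a sketch that assumes $\mu_1(\Omega)=\mu_2(\Omega)<\infty$, which the argument needs in the first two steps; membership of each new set in $\sigma(\mathcal{C})$ is automatic because $\sigma(\mathcal{C})$ is a $\sigma$-algebra.
\[\begin{aligned}
&\text{(i)}\quad \mu_1(\Omega)=\mu_2(\Omega) \implies \Omega\in\mathcal{L},\\
&\text{(ii)}\quad A,B\in\mathcal{L},\ A\subset B \implies \mu_1(B\setminus A)=\mu_1(B)-\mu_1(A)=\mu_2(B)-\mu_2(A)=\mu_2(B\setminus A),\\
&\text{(iii)}\quad A_n\in\mathcal{L},\ A_n\uparrow A \implies \mu_1(A)=\lim_n\mu_1(A_n)=\lim_n\mu_2(A_n)=\mu_2(A).
\end{aligned}\]
Step (ii) uses finiteness so that the subtraction is meaningful, and step (iii) uses the increasing case of Theorem 4.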
In particular, Theorem 2 implies that if two probability measures agree on a $\pi$-system, then they agree on the $\sigma$-algebra generated by that $\pi$-system.
Theorem 3 (Caratheodory’s Extension Theorem)
Let $\Omega$ be a set, let $\mathcal{F}_0$ be an algebra on $\Omega$, and let $\mathcal{F}=\sigma(\mathcal{F}_0)$. If $\mu_0: \mathcal{F}_0\rightarrow[0,\infty]$ is a countably additive set function, then there exists a measure $\mu$ on $(\Omega, \mathcal{F})$ such that $\mu=\mu_0$ on $\mathcal{F}_0$. If moreover $\mu_0(\Omega)<\infty$, then this extension is unique, by Theorem 2, because an algebra is a $\pi$-system.
Example of Caratheodory’s Extension Theorem (Lebesgue measure)
Let $\Omega=(0,1]$. Let
\[\mathcal{F}_0=\left\lbrace E:E=\bigcup_{i=1}^n(a_i,b_i],\ n\in\mathbb{N},\ 0\leq a_1\leq b_1\leq\cdots\leq a_n\leq b_n\leq1\right\rbrace.\]
Then, $\mathcal{F}_0$ is an algebra on $(0,1]$, and $\mathcal{F}:=\sigma(\mathcal{F}_0)=\mathcal{B}(0,1]$. Let
\[\mu_0(E)=\sum_{i=1}^n{(b_i-a_i)}.\]
Then, $\mu_0$ is well-defined and additive on $\mathcal{F}_0$. Moreover, $\mu_0$ is countably additive on $\mathcal{F}_0$ (Prove it!!). Hence, by Theorem 3, there exists a measure $\mu$ on $((0,1],\mathcal{B}(0,1])$ extending $\mu_0$ on $\mathcal{F}_0$. It is unique by Theorem 2, because $\mathcal{F}_0$ is a $\pi$-system and $\mu_0(\Omega)=1<\infty$. This measure $\mu$ is called Lebesgue measure on $((0,1],\mathcal{B}(0,1])$.
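The set function $\mu_0$ on finite unions of half-open intervals is easy to compute once the intervals are merged into a disjoint representation. The sketch below is an illustration only: intervals $(a,b]$ are encoded as pairs `(a, b)`, and the helpers `normalize` and `mu0` are hypothetical names. It checks on examples that the value does not depend on the representation and that $\mu_0$ is additive on disjoint sets.

```python
from fractions import Fraction as F

def normalize(intervals):
    """Merge a finite list of half-open intervals (a, b] into disjoint, sorted ones."""
    merged = []
    for a, b in sorted((a, b) for a, b in intervals if a < b):   # drop empty intervals
        if merged and a <= merged[-1][1]:    # overlaps or touches the previous interval
            merged[-1] = (merged[-1][0], max(merged[-1][1], b))
        else:
            merged.append((a, b))
    return merged

def mu0(intervals):
    """Total length of the union of the given half-open intervals."""
    return sum((b - a for a, b in normalize(intervals)), F(0))

# Two representations of the same set (0, 1] give the same value
print(mu0([(F(0), F(1, 2)), (F(1, 2), F(1))]) == mu0([(F(0), F(1))]))

# Additivity on disjoint sets E1 = (0, 1/4] and E2 = (1/2, 3/4]
E1, E2 = [(F(0), F(1, 4))], [(F(1, 2), F(3, 4))]
print(mu0(E1 + E2) == mu0(E1) + mu0(E2))

# Overlapping pieces are not double counted: (0, 1/2] and (1/4, 3/4] cover (0, 3/4]
print(mu0([(F(0), F(1, 2)), (F(1, 4), F(3, 4))]))
```

Touching intervals such as $(0,\tfrac12]$ and $(\tfrac12,1]$ are merged because their union is again a half-open interval.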
Theorem 4 (Monotone-Convergence of Measure)
- If $A_n\in\mathcal{F}\ (n\in\mathbb{N})$ and $A_n\uparrow A$, then $\mu(A_n)\uparrow\mu(A)$. Here, $A_n\uparrow A$ means: $A_n\subseteq A_{n+1}\ (\forall n\in\mathbb{N})$ and $\bigcup_n A_n=A$.
- If $B_n\in\mathcal{F}\ (n\in\mathbb{N})$, $B_n\downarrow B$ and $\mu(B_k)<\infty$ for some $k$, then $\mu(B_n)\downarrow\mu(B)$. Here, $B_n\downarrow B$ means: $B_n\supseteq B_{n+1}\ (\forall n\in\mathbb{N})$ and $\bigcap_n B_n=B$.
Example of Monotone-convergence of measure
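Let $\Omega=\mathbb{R}$, let $\mu=\mathrm{Leb}$, and let $A_n=(n,\infty)$. Then $A_n\downarrow\emptyset\ (n\to\infty)$ and $\mu(\emptyset)=0$, but $\mu(A_n)=\infty$ for every $n$, so $\mu(A_n)$ does not converge to $\mu(\emptyset)$. This shows that the finiteness condition in the second part of Theorem 4 cannot be dropped.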
Definition 10 (limsup & liminf)
We define
\[\begin{aligned} (E_n,\text{ i.o.})&:=(E_n\text{ infinitely often})\\ &:=\limsup_n E_n:=\bigcap_{n=1}^\infty\bigcup_{m=n}^\infty{E_m}\\ &=\{\omega: \omega\in E_n \text{ for infinitely many } n\}, \end{aligned}\]
\[\begin{aligned} (E_n,\text{ eventually})&:=\liminf_n E_n:=\bigcup_{n=1}^\infty\bigcap_{m=n}^\infty{E_m}\\ &=\{\omega:\omega\in E_n\text{ for all large } n\}. \end{aligned}\]
We always have $\liminf_n E_n\subseteq\limsup_n E_n$. We say that $\lim_n E_n$ exists if $\limsup_n E_n=\liminf_n E_n$, in which case $\lim_n E_n$ is defined as this common value.
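As a worked illustration of these definitions, consider an alternating sequence built from two fixed sets $A$ and $B$ (an example chosen here, not taken from the notes): $E_n=A$ for even $n$ and $E_n=B$ for odd $n$. Every tail of the sequence contains both $A$ and $B$, so
\[\begin{aligned}
\bigcup_{m=n}^\infty E_m &= A\cup B \quad(\forall n) &&\implies \limsup_n E_n = A\cup B,\\
\bigcap_{m=n}^\infty E_m &= A\cap B \quad(\forall n) &&\implies \liminf_n E_n = A\cap B.
\end{aligned}\]
In this example the limit of the sequence exists if and only if $A\cup B=A\cap B$, that is, if and only if $A=B$.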
Theorem 5 (Reverse Fatou Lemma)
Let $\mathbb{P}$ denote a probability measure, and let $(E_n)_{n\in\mathbb{N}}$ be a sequence of events. We have
\[\mathbb{P}(\limsup{E_n})\geq\limsup\mathbb{P}(E_n).\]
Proof of reverse Fatou lemma
Define $G_m:=\bigcup_{n\geq m}{E_n}$. The sequence $(G_m)$ is decreasing, so $G_m\downarrow G$, where $G:=\bigcap_{m\geq 1}{G_m}=\limsup{E_n}$. Since $\mathbb{P}$ is a finite measure, Theorem 4 gives $\mathbb{P}(G_m)\downarrow\mathbb{P}(G)$. Moreover, $G_m\supseteq E_n$ for every $n\geq m$, so $\mathbb{P}(G_m)\geq\sup_{n\geq m}{\mathbb{P}(E_n)}$. Taking the limit as $m\to\infty$ on both sides, we have
\[\lim_m{\mathbb{P}(G_m)}\geq\lim_m\sup_{n\geq m}\mathbb{P}(E_n)\implies\mathbb{P}(G)\geq\limsup\mathbb{P}(E_n).\]
Theorem 6 (First Borel-Cantelli Lemma, BC1)
Let $(E_n)_{n\in\mathbb{N}}$ be a sequence of events. Then, we have
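\[\sum_{n\geq 1}\mathbb{P}(E_n)<\infty\implies \mathbb{P}(E_n,\text{ i.o.})=0.\]
Proof of BC1
Since the series converges, its tails vanish: $\sum_{n\geq m}\mathbb{P}(E_n)\to0\quad(m\to\infty)$.
With $G_m:=\bigcup_{n\geq m}{E_n}$ and $G:=\bigcap_{m\geq 1}G_m=\limsup E_n$ as in the proof of Theorem 5, countable subadditivity gives, for every $m$, $\mathbb{P}(G)\leq\mathbb{P}(G_m)=\mathbb{P}(\bigcup_{n\geq m}{E_n})\leq\sum_{n\geq m}\mathbb{P}(E_n)\to 0\quad(m\to\infty)$. Hence $\mathbb{P}(E_n,\text{ i.o.})=\mathbb{P}(G)=0$.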
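As a concrete instance of this bound, suppose the events satisfy $\mathbb{P}(E_n)=2^{-n}$ (an illustrative assumption, not from the notes). Then the tail sums are geometric and can be written explicitly:
\[\mathbb{P}(E_n,\text{ i.o.})\leq\mathbb{P}\Big(\bigcup_{n\geq m}E_n\Big)\leq\sum_{n\geq m}2^{-n}=2^{1-m}\to 0\quad(m\to\infty).\]
No independence between the events is needed here; only the summability of the probabilities is used.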
Theorem 7 (Fatou Lemma)
(Unlike the reverse Fatou lemma, this does not require $\mu$ to be a probability measure $\mathbb{P}$.)
Let $\mu$ be a measure on $(\Omega,\mathcal{F})$, and let $E_n\in\mathcal{F}\ (n\in\mathbb{N})$. Then $\mu(\liminf{E_n})\leq\liminf\mu(E_n)$.
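The Fatou lemma can be derived in the same way as the reverse Fatou lemma, with intersections in place of unions. The following is a sketch; the sets $H_m$ are notation introduced here, and the events $E_n$ are assumed to lie in the $\sigma$-algebra on which $\mu$ is defined.
\[\begin{aligned}
H_m&:=\bigcap_{n\geq m}E_n,\qquad H_m\uparrow\bigcup_{m\geq 1}H_m=\liminf E_n,\\
\mu(H_m)&\leq\inf_{n\geq m}\mu(E_n)\qquad\text{since } H_m\subseteq E_n \text{ for all } n\geq m,\\
\mu(\liminf E_n)&=\lim_m\mu(H_m)\leq\lim_m\inf_{n\geq m}\mu(E_n)=\liminf\mu(E_n).
\end{aligned}\]
The equality in the last line is the increasing case of Theorem 4, which holds without any finiteness condition. This is why the lemma is valid for an arbitrary measure.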