Topic 4: Statistical Independence

## Definition

People generally have an idea of the concept of independence, but we will formalize it for probability theory here.

Definition $\qquad$ Given (S,F,P) and A, B ∈ F, events A and B are statistically indendent iff

$P(A\cap B) = P(A)P(B)$

It may seem more intuitive if we consider that, from the definition above, we can say that A and B are statistically independent iff P(A|B)P(B) = P(A)P(B), which means P(A|B) = P(A)

Notation $\qquad$ A, B are independent ⇔

$A \perp\!\!\!\perp B$

Note $\qquad$ We will usually refer to statistical independence as simply independence in this course.

Note $\qquad$ If A and B are independent, then A' and B are also independent.

Proof:

\begin{align} B&=(A\cap B)\cup(A'\cap B) \\ \Rightarrow P(B)&=P(A\cap B)+P(A'\cap B) \\ &=P(A)P(B) + P(A'\cap B) \end{align}
\begin{align} \Rightarrow P(A'\cap B) &= P(B) - P(A)P(B) \\ &= P(B)[1-P(A)] \\ &= P(B)P(A') \end{align}

Definition $\qquad$ Events $A_1,A_2,...,A_n$ are statistically independent iff for any $k=1,2,...,n$, and any 1 ≤ $j_1,j_2,...,j_k$ ≤ n, for any finite n, we have that

$P(A_{j_1}\cap...\cap A_{j_k}) = \prod_{i=1}^k P(A_{j_i})$

Note that to show that a collection of sets is disjoint, we only need to consider pairwise intersections of the sets. But to show that a collection of sets is independent, we need to consider every possible combination of sets from the collection. Then we say that the events are mutually independent. Mutual independence implies pairwise independence but the converse is not true.
Note that for a given n, there are $2^n-(n+1)$ combinations. To see this, use the Binomial Theorem:

$\sum_{k=0}^n {n\choose k}a^kb^{n-k} = (a+b)^n, \;\;\mbox{with}\; a=b=1$

Then

\begin{align} \sum_{k=0}^n {n\choose k}a^kb^{n-k} &= \sum_{k=0}^n {n\choose k} \\ &=2^n \end{align}

So,

\begin{align} \sum_{k=2}^n {n\choose k} &= {n\choose 2} + {n\choose 3} +...+{n\choose n} \\ &=2^n-{n\choose 0} - {n\choose 1} \\ &=2^n-(n+1) \end{align}

So there are $2^n-(n+1)$ combinations of sets that need to be checked for independence.
In practice, generally we do not check $2^n-(n+1)$ combinations for every n. Often, we just assume independence based on intuition and see if results are reasonable. We cannot however, use this approach in this class.

## Combined Experiments

Consider two experiments with probability spaces ($S_1,F_1,$P$_1$) and ($S_2,F_2,$P$_2$). Consider a combined experiment with probability space (S,F,P), where

\begin{align} \mathcal S &= \mathcal S_1 \times \mathcal S_2 \\ &= \{(\omega_1,\omega_2):\;\omega_1\in\mathcal S_1,\;\omega_2\in\mathcal S_2 \} \end{align}

where S is the cartesian product of $S_1$ and $S_2$.
What should F be?
Let

$\mathcal F = \{A\times B:\;A\subset\mathcal S_1,\;B\subset\mathcal S_2\}$

Or, let

$\mathcal F = \{A\times B:\;A\in\mathcal F_1,\;B\in\mathcal F_2\}$

Note that for discrete S, these are typically the same. But ∃ subsets of S = $S_1$×$S_2$ that are not in F for both of these definitions, some of which might be events we care about.

Example
Experiment 1: Toss a coin
Experiment 2: Roll a die

\begin{align} \mathcal S_1 &= \{H,T\} \\ \mathcal S_2 &= \{1,2,3,4,5,6\} \\ \Rightarrow\mathcal S &= \mathcal S_1\times\mathcal S_2 \\ &=\{(i,j):\;i=H,T; \; j=1,2,...,6\} \end{align}
\begin{align} \mathcal F_1 &= \{ \{H\},\; \{T\},\;\mathcal S_1=\{H,T\},\;\varnothing \} \\ \mathcal F_2 &= \mathcal{P(S_2)} \end{align}

If we let

$\mathcal F= \{A\times B:\;A\in \mathcal F_1;\;\;B\in \mathcal F_2\}$

then, the event $\{(H,2),(T,6)\}$ is not in F, since $\{(H,2),(T,6)\} $$AxB for any A ∈ F_1 , B ∈ F_2 . Instead, we let \mathcal F = \sigma(\{ A\times B:\;A\in\mathcal F_1;B\in\mathcal F_2 \}) This F contains the events we want. For example, consider \{(H,2),(T,6)\} . Let A = {H} ∈ F_1 , B = {2} ∈ F_2 , C = {T} ∈ F_1 , D = {6} ∈ F_2 Then, AxB ={(H,2)} ∈ F and CxD ={(T,6)} ∈ F. Since F is a \sigma -field, {(H,2)} ∪ {(T,6)} = {(H,2),(T,6)} ∈ F. Now, what about P? We need P(C) ∀C ∈ F. Start with P(AxS_2)=P_1(A) ∀A ∈ F_1 since P(AxS_2) is the event that A occurs in Experiment 1 and a "don't care" in Experiment 2. Also, let P(S_1xB)=P_2(B) ∀B ∈ F_2 . What about other Cs? We will assume that the two experiments are independent in the sense that A$$ F_1$, $B$$F_2$

$A\times\mathcal S_2 \perp\!\!\!\perp \mathcal S_1\times B$

Then, we have

\begin{align} P(A\times B) &= P((A\times \mathcal S_2)\cap(\mathcal S_2\times B)) \\ &= P(A\times \mathcal S_2)P(\mathcal S_1\times B) \\ &= P_1(A)P_2(B) \end{align}

So now we have that P(C) if

• C = Ax$S_2$ or C = $S_1$xB for some A ∈ $F_1$ or B ∈ $F_2$
• C = AxB for A ∈ $F_1$, B ∈ $F_2$.

Any $C$ that cannot be written as $AxB$ for some A ∈ $F_1$ and B ∈ $F_2$ can be written in terms of unions, intersections and complements of such $AxB$ sets, so the probability axioms can be used to define $P(C)$ in that case.
Note that the above discussion can be easily extended for n combined experiments.

## Bernoulli Trials

Bernoulli trials are an important combined experiment that models repeated independent trials of an experiment with one particular event of interest. Consider an experiment with probability space ($S_0,F_0,$P$_0$), and an event A ∈ $F_0$ with P(A) = p. For this combined experiment, let S = $S_0$ × ... × $S_0$ (n terms) ⇒ S = {n-tuples with each component an element of $S_0$}, where n is the number of trials. We are interested in the probability that A occurs k times in the n trials for k = 0,1,...,n.
Let $p_n(k)$ ≡ P(A occurs k times in n trials).
Then, it can be shown that

$p_n(k) = {n\choose k}p^k(1-p)^{n-k}$

Overview of proof: From combinatorics, There are $^nC_k$ (n choose k) ways to have k successes (A occurs) in n trials. Due to the independence of the trials, the probability that a give n combination occurs is $p^k(1-p)^{n-k}$. Since the collection of all combinations is disjoint,

\begin{align} p_n(k) &= \sum_{j=1}^{n\choose k}p^k(1-p)^{n-k} \\ &= {n\choose k} p^k(1-p)^{n-k} \end{align}

Example:
Consider a binary communication system.
Let A = {bit error} = {send 0, decode 1} ∪ {send 1, decode 0}
Let p = P(A) = $10^{-3}$
Transmit each bit 3 times and use the majority vote to decode.
Then, P(wrong decision) = P(2 or more bit errors) = $p_3(2)+p_3(3)$, which is approximately $3 x 10^{-6}$