Topic 2: Probability Spaces

## Definitions and Notation

Definition $\qquad$ An outcome is a possible result of a random experiment. Different outcomes are mutually exclusive; only one outcome can occur on each trial. The collection of all possible outcomes forms the sample space. e.g. When you roll a six-sided fair die and record which surface faces up, the sample space is S = {1,2,3,4,5,6} e.g. when you flip two coins, the sample space is S = {HH,HT,TH,TT}

Definition $\qquad$ An event is a set of outcomes of an experiment to which a probability is assigned. A single outcome can be an element in several different outcomes. E.g. in the die rolling experiment, the set {1,2} describes the event that a number less than three is rolled. The set {2,4,6} describes the event that an even number is rolled. The number 2 is a possible outcome in both events Typically, when S is finite, we can use to the power set of S, denoted P(S) as the set of all events. Then, an event is a subset of S. If S contains n outcomes, then the power set contains $2^n$ events including the empty set and S itself. This is not true however for all S for example when S is uncountable. We may want an event space that is smaller than the power set of such an S. We will examine this in more detail in our discussion of event spaces. An event may contain only one outcome. This event is then called an elementary event. It is important to be able to differentiate between an event and an outcome. The event {1} is not the same thing as the outcome 1.

The probability space is a mathematical construct built to model a random experiment. It consists of three parts:

1. The set of all possible outcomes, the sample space, S.
2. The set of events F or F(S), which is the collection of events to which we will assign probabilities
3. P the function mapping probabilities to the events.

## The Sample Space S

The sample space is a non empty set of elements called outcomes. We will call the sample space S. Note that S is not required to have any structure. This allows us to model any experiment as long as it has at least one outcome. The sample space may be discrete. In that case, S may be finite or countable. For example Experiment: Toss a coin ⇒ S={H,T} Experiment: count cars passing a certain point ⇒ S={0,1,2,...} In either case, we can list the elements of S as $\{\omega_1,\omega_2,...\}$

The sample space may be continuous such that S is uncountable. For example, in the experiment measuring the temperature in a room, S = [$T_{min},T_{max}$], where S is a closed interval on the real number line.

Note that our examples cover each type of set, finite, countable and uncountable respectively.

## The Event Space

The event space F or script F is the collection of subsets of S to which we will assign probabilities. Not all collections of subsets of S are valid event spaces. We require the event space to be a nonempty collection of subsets of S satisfying the following properties:
1. If $A$F, then $A^c$F(S).
2. If for finite n, $A_i$F, for all i = 1,2,...n, then

$\bigcup_{i=1}^n A_i \in \mathcal{F}$

3. If $A_i$F, for all i = 1,2,..., then

$\bigcup_{i=1}^{\infty} A_i \in \mathcal{F}$

Property 1 ensures that if we assign a probability to the occurrence of event $A$, then there will also be a probability of event $A$ not occurring. Properties 2 and 3 ensure that we can find the probability that at least one of a set of events occurs if each event has a probability associated with it.

In general, a collection of sets satisfying the properties above is referred to as a $\sigma$-field. A nonempty collection of sets satisfying properties 1 and 2 only is called a field. Another term used for $\sigma$-field in math is $\sigma$-algebra.

Note: it can be shown that property 3 implies property 2, so it is sufficient to show only properties 1 and 3 (in addition to the nonempty requirement) to show that a collection is a $\sigma$-field.

Equivalently, a collection of sets F forms a $\sigma$-field if:
1. F is nonempty ⇔ ∃ A⊂S such that A ∈ F.
2. If $A$F, then $A^c$ F(S); i.e. F is closed under complementation.
3. If $A_i$F, for all i = 1,2,..., then

$\bigcup_{i=1}^{\infty} A_i \in \mathcal{F}$

i.e. F is closed under countable unions, including finite unions.

Some properties of a $\sigma$-field (and hence our event space):
1. for any $\sigma$-field F, Ø ∈ F and SF
Proof: let $A$F (we know such an $A$ exists because F is nonempty). Then $A^c$ ∈ F by property 1 of the former definition of $\sigma$-fields and $A $$A^c F. So SF, and by property 1, S^c = Ø ∈ F. 2. If A,B F, then A$$ B$F.
Proof: by De Morgan's law,

\begin{align} A\cap B &= ((A\cap B)^c)^c \\ &= (A^c \cup B^c)^c \end{align}

but if $A,B$F, then $A^c, B^c$F so ($A^c $$B^c ) ∈ F and (A^c$$ B^c)^c$F. Thus $A $$B F. One can also show that A_i\in\mathcal{F}\;\forall i=1,2,...\;\; \Rightarrow \bigcap_{i=1}^{\infty}A_i \in \mathcal{F} (proof) Examples of event spaces: 1. F = {Ø,S} is a valid (though not very interesting) event space. It is the smallest possible space since F must be non empty. 2. For any space S, the collection of all subsets of S is a valid \sigma -field. This set is called the power set denoted \mathcal{P(S)} \; or \;2^{\mathcal{S}} For a finite or countable S, the power set is a valid event space, although it might not always be the best choice. However, for an uncountable S, the power set can be problematic. ## Event Space for an Uncountable Sample Space There are some rules we will impose on our probability mappings (to be discussed shortly in the section on probability mapping). It can be shown that if S is uncountable, we cannot satisfy those rules if the event space is the power set of S. We will construct an event space for S in the reals, to address these problems. We will not address the general case of uncountable sample spaces, but the construction here will be easily adaptable if S is an interval on the real number line or if S=R ^n for finite n. Definition \qquad given S, consider a family of subsets of S, G, where G = \{A_i, \; A_i \subset \mathcal{S}\;\forall i \in I\} Note that G is not necessarily a \sigma -field. The \sigma -field generated by G, denoted \sigma(G) , is the smallest \sigma -field containing every element of G. In other words, for any \sigma -field \mathcal{G} containing G, \sigma(G) \subset \mathcal{G} We can also write this as \sigma(G) = \bigcap_{i\in I_G} \mathcal{F}_i where { F_i, i$$ I_G$} is the set of all $\sigma$-fields containing G.
When S is the set of reals, we will let G = {open intervals in the set of reals}; ie

$G = \{(a,b): \; a\in\mathbb{R},\; b\in\mathbb{R},\; a<b\}$

then let

$\mathcal{F} = \sigma(G)$

The $\sigma$-field $\sigma(G)$, where G is the collection of all open intervals in the set of reals, is called the Borel field. We will always use the Borel field as our event space if S is the set of all real and denote it as

$B(\mathbb{R})$

So,

$\mathcal{S} = \mathbb{R} \; \Rightarrow \;\mathcal{F}=B(\mathbb{R})$

Note: We will also refer to

$B(\mathbb{R}^n),\; B((a,b)), \; B([a,b])$

where [a,b] and (a,b)are real intervals, to be respective event spaces when the sample space is

$\mathbb{R}^n,\; (a,b), \;and\; [a,b]$

Note that the collection of all open intervals in the set of reals, which we are calling G, is not a $\sigma$-field. So what does $\sigma(G)$ contain besides open intervals? Some others include:
\begin{align} \bullet\;\;&\mathcal{S} = \bigcup_{n=1}^{\infty}(-n,n)\;\in\;B(\mathbb{R}) \\ \bullet\;\;&\varnothing = \mathcal{S}^c\in B(\mathbb{R}) \\ \bullet\;\;&\forall \; a,b\in\mathbb{R}, \; a<b \\ &\circ\;\;(-\infty,b)=\bigcup_{n=1}^{\infty}(-n,b)\in B(\mathbb{R}) \\ &\circ\;\;(a,\infty) = \bigcup_{n=1}^{\infty}(a,n) \in B(\mathbb{R}) \\ &\circ\;\;\{a\} = \bigcap_{n=1}^{\infty}(a-\frac{1}{n},a+\frac{1}{n}) \in B(\mathbb{R}) \\ &\circ\;\;(-\infty,b] = (-\infty,b)\cup \{b\}\in B(\mathbb{R}) \\ &\circ\;\;[a,b] = (a,b)\cup \{a\}\cup\{b\}\in B(\mathbb{R}) \\ &\circ\;\;\mathbb{Q} = \bigcup_{n=1}^{\infty} \{q\};\;q_n\;\mbox{is the n}^{th}\;\mbox{rational}\;\Rightarrow \mathbb{Q}\in B(\mathbb{R}) \\ \end{align}

i.e. if we can write it as a union or complement or intersection of sets/elements in the Borel field, then it is in the Borel field.

Note that

\begin{align} b\leq -n \; &\Rightarrow \;(-n,b) = \varnothing \\ a\geq n \; &\Rightarrow \;(a,n) = \varnothing \\ \end{align}

Note that all of these sets are in the Borel field because the Borel field is a $\sigma$-field.

## Probability Mapping

Every event in the event space will have a probability assigned to it. More formally, the probability measure P, assigns a real number P(A) to every event A ∈ F. The mapping

$P : F \; \rightarrow \; \mathbb{R}$

must satisfy the following axioms of probability:
1. P(A)≥ 0 for all A ∈F
2. P(S) = 1
3. If $A_1,A_2$F are disjoint then P($A_1$$A_2$) = P($A_1$) + P($A_2$). This can be extended by induction (proof) to show that

$P(\bigcup_{i=1}^n A_i) = \sum_{i=1}^n P(A_i)$

if $A_1,...A_n$ are disjoint for any finite n.
4. If $A_1,A_2,...$is a (countably) infinite sequence of disjoint events, then

$P(\bigcup_{i=1}^{\infty} A_i) = \sum_{i=1}^{\infty} P(A_i)$

Property three can be derived from four (proof: use A$_1$ = ø for all but finitely many i in property 4). Properties 1, 2 and 4 together are often called the Kolmogorov axioms. These are the properties that we will require a probability mapping to have. This framework for probability theory is sometimes referred to as the axiomatic approach.

Note that the probability of event A occurring, given P(A), is supposed to represent the "likelihood" that one of the outcomes in A occurs. However, the axioms do not seem to address this. Some approaches to probability that do:
the counting approach: let |A| be the "size" of set A, e.g. the number of elements in A. Then

$\forall A \in \mathcal{F}, \; P(A) = \frac{|A|}{|\mathcal{S}|}$

• This approach works well for finite sample spaces with equally likely outcomes.
• But what if S is the entire real number line? what would then be the "size" of S? if A = (-∞,0], then P(A) is undefined since according to the counting approach,
$P(A) = \frac{\infty}{\infty}$

So uncountability is problematic for this approach to handle

the relative frequency approach: consider running n trials of a random experiment. Let $N_A$ be the number of times event A occurs in n trials. Then let $P(A)=N_A/n$. Note that $N_A$ depends on n. In general,

• Is well defined for any sample space
• Problems with estimation of low probability events
• P(A) depends on n. Can let
$P(A) = \lim_{n\rightarrow\infty}\frac{N_A}{n}$

But does this converge and if so, then how do we find the limit?

Both the counting approach and the relative frequency approach make sense intuitively, probably more so than the axiomatic approach, but they are limited and do not generalize well.
Fortunately, the counting and relative frequency approaches both fit into the axiomatic framework (both satisfy the axioms). But the axioms are much more general and provide a very rich framework.
So, in general, how do we assign probabilities that satisfy the axioms?
For example, count the number of cars passing a given point over a given period of time. Then S = {0,1,2,...}, and

$\mathcal{F(S) = P(S)}$

Some events are:

• A = {20} ie 20 cars pass
• A = {11,12,...,18,19} more than 10 but fewer than 20 cars pass
• A={0,2,4,6,...}, an even number of cars pass

It is difficult to assign probabilities to satisfy the axioms directly. Instead, we can use the following general approach:

Let S be discrete (is a discrete subset of Euclidean metric space R countable?) and F be the power set of S. Let p(.) be a function

$p:\mathcal{S}\rightarrow\mathbb{R}$

such that
$1.\;\; p(\omega)\geq 0\;\forall\omega\in\mathcal{S}$
$2.\;\; \sum_{\omega\in\mathcal{S}}p(\omega)=1$
Let

$P(A) = \sum_{\omega\in A}p(\omega)\;\forall A\in\mathcal{F}$

Then we can show that P is a valid probability measure (i.e. it satisfies the axioms) (proof)
A function satisfying the two properties above is called a probability mass function (pmf).

We will often specify probability mappings using a pmf. For example, for many counting problems,

$p(k) = \frac{\lambda^ke^{-k}}{k!},\;\;k = 0,1,2,...$

where $\lambda$ is a parameter, is a good pmf (more on this later when we discuss discrete random variables).

If S is uncountable, in particular if S is the set of reals, then

$\mathcal F = B(\mathbb{R})$

Let $f$ be a function such that

$f:\mathcal S \rightarrow \mathbb{R} \;\Rightarrow\; f:\mathbb R \rightarrow \mathbb{R}$

satisfying two properties:
$1.\;\; f(x)\geq 0\;\forall x\in\mathbb{R}$
$2.\;\; \int_{\mathbb{R}}f(x)dx=1$
Let

$P(A) = \int_A f(x)dx\;\forall A\in B(\mathbb{R})$

Then we can show that P(A) satisfies the axioms (proof) for example the additive one uses the linearity of integration)
A function satisfying the two properties above is called a probability density function (pdf). We will use this kind of function when we discuss continuous random variables.

The Riemann integral cannot be used to evaluate ∫$_A$f(r)dr because the Riemann integral (RI) does not exist for some A∈F. For example, let

\begin{align} &A \in \mathcal{F}=B(\mathcal{R}) \\ &f(r) = \begin{cases} 1 & \mbox{if }r\in[0,1] \\ 0 & \mbox{else} \end{cases} \end{align}

This f is a valid pdf. Now consider that A is the set of all rationals. Then, using RI gives

\begin{align} P(A) &= \int_0^1 1.I_{\mathbb Q}(x)dx \\ &= \int_0^1 I_{\mathbb Q}(x)dx \end{align}

Note that for any set B in the set of reals,

$I_B(x)= \begin{cases} 1 & \mbox{if }x\in B \\ 0 & \mbox{else} \end{cases}$

So,

\begin{align} P(B) &= \int_B f(x)I_B(x)dx \\ P(A) &= \lim_{N\rightarrow\infty}\sum_{k=1}^N \Delta_k I_{\mathbb Q}(\hat{x}_k)dx,\quad \mbox{where } \hat{x}\in[x_k,x_k+\Delta+k] \end{align}

let

$\underline{sum_N}$

be the lower bound of

$\sum_{k=1}^N \Delta_kI_{\mathbb Q}(\hat{x}_k)$

Then,

\begin{align} \underline{sum_N} &= \sum_{k=1}^N \Delta_k . min \;I_{\mathbb Q}(\hat{x}_k) \quad \mbox{where } x\in[x_k,x_k+\Delta+k] \\ &=0 \end{align}

Similarly, the upperbound is

\begin{align} \overline{sum_N} &= \sum_{k=1}^N \Delta_k . max \;I_{\mathbb Q}(\hat{x}_k) \quad \mbox{where } x\in[x_k,x_k+\Delta+k] \\ &=1 \end{align}

Since

$\lim_{N\rightarrow\infty}\underline{sum_N}=0\neq 1 = \lim_{N\rightarrow\infty}\overline{sum_N}$

The RI does not exist. Alternatively, you may be familiar with the idea that a function with uncountably many discontinuities such as I$_Q$ (which has discontinuities between each irrational) is not Riemann integrable. This type of problem led to the development of the Lebesgue integral (LI). Integration in probability theory is assumed to be LI, because it can be shown that the LI exists for all A in the Borel field.

If the RI of a function exists, then the LI exists, and the two are equal. So in this class, we will be able to perform integration using the tools we already know, since all integrals we will encounter in this course will be Reimann integrable.

## Some Properties of P

These properties (and numerous others) follow from the axioms:
1. P(Ø) = 0
Proof:
P(S) = 1 from the axioms
S ∩ Ø = Ø ⇒ P(S ∪ Ø) = P(S) + P(Ø)
But, S ∪ Ø = S ⇒ P(S) = P(S) + P(Ø)
Thus, 1 = 1 + P(Ø) ⇒ P(Ø) = 0

2. P(A') = 1 - P(A) for any event A
Proof:
A ∩ A' = Ø
So, P(A ∪ A') = P(A) +P(A') But, A ∪ A' = S
So, P(A) +P(A') = P(S) = 1 ⇒ P(A') = 1 - P(A)

3. If A ⊂ B, then P(A) ≤ P(B) for any event A,B in F
Proof:
A ⊂ B ⇒ P(A ∩ B) = P(A)
⇒ P(B) = P(B\A) + P(B ∩ A) = P(B\A) + P(A) ≥ P(A)
since by the 1st Kolmogorov axiom, P(B\A)≥ 0
⇒ P(A) ≥ P(B)