**Things to Know Before We Start**

Before we go into the proofs, here are a few definitions and concepts to know so you don't get confused when we talk about the proof of this formula. I am giving fairly simple explanations here, just enough for you to understand each term. If you want to learn more, you can find resources on each topic on the "More Sources" page.

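**Probability density function:** a function that describes how likely a continuous random variable is to take values near a given point, usually denoted $ f(x) $. The probability that the variable falls in an interval is the integral of the density over that interval, so the integral of any probability density function over its whole domain is equal to 1. For a discrete random variable, the analogous function gives the probability of each value directly, written $ P(X=A) $, where X is the random variable and A is the outcome we are looking for.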
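As a quick numerical check of the "integrates to 1" property, here is a minimal sketch that uses the standard normal density as an example. The function names `normal_pdf` and `integrate` are made up for this illustration, and the trapezoidal rule is just one simple way to approximate an integral.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def integrate(f, a, b, n=10_000):
    """Approximate the integral of f over [a, b] with the trapezoidal rule."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

# Almost all of the area lies within 10 standard deviations of the mean.
total_area = integrate(normal_pdf, -10.0, 10.0)            # close to 1
# Probability of landing within one standard deviation of the mean.
prob_within_one_sigma = integrate(normal_pdf, -1.0, 1.0)   # close to 0.68
print(total_area, prob_within_one_sigma)
```

The second number shows how a probability is read off a density: it is the area under the curve over an interval, not the height of the curve at a point.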

**Standard Deviation:** a measure of how spread out the values in a sample are around their mean. Usually denoted by σ. The standard deviation is the square root of the variance.

**Variance:** the average of the squared differences from the mean. Usually denoted by $ σ^{2} $. Variance is the square of the standard deviation. It is used in calculations more often because it is easier to manipulate algebraically; for example, the variances of independent random variables add, while their standard deviations do not. Because the differences are squared, values far from the mean contribute disproportionately, which matters in applications such as investing and stock trading where outliers are important.
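To make the relationship between the two concrete, here is a small sketch that computes both by hand. The data set is made up for illustration, and the code divides by the number of values (the population variance); a sample variance would divide by one less.

```python
import math

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical values

mean = sum(data) / len(data)
# Average of the squared differences from the mean.
variance = sum((x - mean) ** 2 for x in data) / len(data)
# The standard deviation is the square root of the variance.
std_dev = math.sqrt(variance)

print(mean, variance, std_dev)  # 5.0 4.0 2.0
```

Notice that the single value 9.0 contributes 16 of the 32 total squared units, which is the sense in which squaring gives extra weight to values far from the mean.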

**Theta (θ):** used in statistics to represent any unknown parameter of interest. For example, θ can stand for the probability that a particular outcome occurs:

$ P(X=A)=θ $

Here θ denotes the probability that outcome A occurs under the probability function $ P(X) $.

**Expected Value:** the probability-weighted average of all the values a random variable can take. Denoted as $ E(X) $. In simple terms, it is the long-run average over many repetitions, which is not necessarily the most likely single outcome.
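As a small worked example, the sketch below computes the expected value of a fair six-sided die as a weighted average. The die is an assumption chosen for illustration, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

# Fair six-sided die: each face has probability 1/6.
outcomes = [1, 2, 3, 4, 5, 6]
probabilities = [Fraction(1, 6)] * 6

# Weighted average: each outcome multiplied by its probability, then summed.
expected_value = sum(x * p for x, p in zip(outcomes, probabilities))
print(expected_value)  # 7/2
```

The result, 3.5, is not a face of the die at all; it is the average you would approach over many rolls.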

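**Likelihood Function:** measures how well a statistical model with parameter θ explains the observed data. It is the probability (or probability density) of the observed data, viewed as a function of θ. Denoted as $ L(θ) $.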
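Here is a minimal sketch of a likelihood function, assuming a coin whose unknown probability of heads is θ and a made-up record of ten independent flips. The helper name `likelihood` and the candidate values are hypothetical.

```python
def likelihood(theta, flips):
    """L(theta) for independent coin flips, where 1 = heads and 0 = tails."""
    result = 1.0
    for flip in flips:
        result *= theta if flip == 1 else (1.0 - theta)
    return result

flips = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1]  # hypothetical data: 7 heads, 3 tails

# Compare how well a few candidate values of theta explain the same data.
candidates = [0.3, 0.5, 0.7, 0.9]
for t in candidates:
    print(t, likelihood(t, flips))

best_theta = max(candidates, key=lambda t: likelihood(t, flips))
print("best candidate:", best_theta)  # 0.7
```

The data stay fixed while θ varies, and the candidate that matches the observed share of heads gives the largest likelihood.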

**Score:** the gradient, or vector of partial derivatives with respect to θ, of the natural log of $ L(θ) $, where $ L(θ) $ is the likelihood function of some parameter θ.
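With a single parameter the score is just an ordinary derivative. The sketch below assumes a coin-flip model with 7 heads and 3 tails (made-up data), writes the score in closed form, and checks it against a finite-difference approximation of the log-likelihood. The function names are hypothetical.

```python
import math

def log_likelihood(theta, heads, tails):
    """Natural log of L(theta) = theta**heads * (1 - theta)**tails."""
    return heads * math.log(theta) + tails * math.log(1.0 - theta)

def score(theta, heads, tails):
    """Derivative of the log-likelihood with respect to theta."""
    return heads / theta - tails / (1.0 - theta)

heads, tails = 7, 3  # hypothetical data

# Check the closed form against a central finite difference at theta = 0.5.
theta, eps = 0.5, 1e-6
numeric = (log_likelihood(theta + eps, heads, tails)
           - log_likelihood(theta - eps, heads, tails)) / (2 * eps)
print(score(theta, heads, tails), numeric)  # both close to 8

# The score is (numerically) zero where the likelihood peaks, theta = 7/10.
score_at_peak = score(0.7, heads, tails)
print(score_at_peak)
```

A positive score means the likelihood is still increasing in θ at that point, and a score of zero marks a stationary point of the log-likelihood.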