Lecture 02: Probability as a Model

2/20/23

📋 Lecture Outline

  • Why statistics?
  • 🧪 A simple experiment
  • Some terminology
  • 🎰 Random variables
  • 🎲 Probability
  • 📊 Probability Distribution
  • Probability Distribution as a Model
  • Probability Distribution as a Function
  • Probability Mass Functions
  • Probability Density Functions
  • 🚗 Cars Model
  • Brief Review
  • A Simple Formula
  • A Note About Complexity

Setup

Why statistics?


We want to understand something about a population.

We can never observe the entire population, so we draw a sample.

We then use a model to describe the sample.

By comparing that model to a null model, we can infer something about the population.

Here, we’re going to focus on statistical description, aka models.

🧪 A simple experiment

We take ten cars, send each down a track, have them brake at the same point, and measure the distance it takes them to stop.

Question: how far do you think it will take the next car to stop?

Question: what distance is the most probable?

But, how do we determine this?

Some terminology

🎰 Random Variables

Two types of random variable:

  1. Discrete random variables take countable values, typically integers (non-decimal).
    • Examples: number of heads in 10 tosses of a fair coin, number of victims of the Thanos snap, number of projectile points in a stratigraphic level, number of archaeological sites in a watershed.
  2. Continuous random variables take real (decimal) values.
    • Examples: cost in property damage of a superhero fight, kilocalories per kilogram, kilocalories per hour, ratio of isotopes
    • Note: for continuous random variables, the sample space is uncountably infinite!

Probability Distributions

🎲 Probability

Let \(X\) be the number of heads in two tosses of a fair coin. What is the probability that \(X=1\)?
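Enumerating the four equally likely outcomes answers this directly:

\[\Omega = \{HH, HT, TH, TT\}, \qquad P(X=1) = P(\{HT, TH\}) = \frac{2}{4} = \frac{1}{2}\]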

Probability Distribution as a Model

Has two components:

Central tendency or “first moment”

  • Population mean (\(\mu\)). Gives the expected value of an experiment, \(E[X] = \mu\).
  • Sample mean (\(\bar{x}\)). Estimate of \(\mu\) based on a sample from \(X\) of size \(n\).

Dispersion or “second moment”

  • Population variance (\(\sigma^2\)). The expected value of the squared difference from the mean.
  • Sample variance (\(s^2\)). Estimate of \(\sigma^2\) based on a sample from \(X\) of size \(n\).
  • Standard deviation (\(\sigma\)) or \(s\) is the square root of the variance.
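
These estimates are easy to compute directly. Here is a minimal sketch in Python (the data values are made up for illustration; `ddof=1` requests the sample versions of variance and standard deviation):

```python
import numpy as np

# hypothetical sample of n = 10 observations
x = np.array([5.2, 8.1, 9.7, 10.3, 10.9, 11.4, 12.0, 13.5, 14.2, 18.6])

n = x.size
x_bar = x.sum() / n                        # sample mean, estimates mu
s2 = ((x - x_bar) ** 2).sum() / (n - 1)    # sample variance, estimates sigma^2
s = np.sqrt(s2)                            # sample standard deviation

# equivalently: x.mean(), x.var(ddof=1), x.std(ddof=1)
print(x_bar, s2, s)
```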

Probability Distribution as a Function

These can be defined using precise mathematical functions:

A probability mass function (PMF) for discrete random variables.

  • Examples: Bernoulli, Binomial, Negative Binomial, Poisson
  • Straightforward probability interpretation.

A probability density function (PDF) for continuous random variables.

  • Examples: Normal, Chi-squared, Student’s t, and F
  • Harder to interpret probability:
    • What is the probability that a car takes 10.317 m to stop? What about 10.31742 m?
    • Better to consider probability across an interval.
  • Requires that the function integrate to one (probability is the area under the curve).

Probability Mass Functions (PMF)

Bernoulli

Df. distribution of a binary random variable (“Bernoulli trial”) with two possible values, 1 (success) and 0 (failure), with \(p\) being the probability of success. E.g., a single coin flip.

\[f(x,p) = p^{x}(1-p)^{1-x}\]

Mean: \(p\)
Variance: \(p(1-p)\)
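
A quick numerical check of this formula with SciPy (a sketch; \(p = 0.5\), a fair coin, is just an illustrative value):

```python
from scipy.stats import bernoulli

p = 0.5                                      # assumed probability of success
print(bernoulli.pmf(1, p))                   # P(X = 1) = p -> 0.5
print(bernoulli.pmf(0, p))                   # P(X = 0) = 1 - p -> 0.5
print(bernoulli.mean(p), bernoulli.var(p))   # p and p(1 - p)
```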

Binomial

Df. distribution of a random variable whose value is equal to the number of successes in \(n\) independent Bernoulli trials. E.g., number of heads in ten coin flips.

\[f(x,p,n) = \binom{n}{x}p^{x}(1-p)^{n-x}\]

Mean: \(np\)
Variance: \(np(1-p)\)
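
Continuing the coin-flip example (ten flips of a fair coin), a sketch with SciPy:

```python
from scipy.stats import binom

n, p = 10, 0.5                   # ten independent Bernoulli trials
print(binom.pmf(5, n, p))        # P(X = 5) = C(10,5) / 2^10 ≈ 0.246
print(binom.mean(n, p))          # np = 5.0
print(binom.var(n, p))           # np(1 - p) = 2.5
```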

Poisson

Df. distribution of a random variable whose value is equal to the number of events occurring in a fixed interval of time or space. E.g., number of orcs passing through the Black Gates in an hour.

\[f(x,\lambda) = \frac{\lambda^{x}e^{-\lambda}}{x!}\]

Mean: \(\lambda\)
Variance: \(\lambda\)
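
A sketch with SciPy (the rate \(\lambda = 3\) events per interval is an assumed value for illustration):

```python
from scipy.stats import poisson

lam = 3                                      # assumed average rate per interval
print(poisson.pmf(0, lam))                   # P(X = 0) = e^{-3} ≈ 0.050
print(poisson.pmf(3, lam))                   # P(X = 3) ≈ 0.224
print(poisson.mean(lam), poisson.var(lam))   # both equal lambda
```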

Probability Density Functions (PDF)

Normal (Gaussian)

Df. distribution of a continuous random variable that is symmetric about its mean and defined over the entire real line. E.g., the height of actors who auditioned for the role of Aragorn.

\[f(x,\mu,\sigma) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right]\]

Mean: \(\mu\)
Variance: \(\sigma^2\)
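
Because the Normal is a density, individual points carry zero probability; probabilities come from integrating over intervals. A sketch with SciPy, using the standard Normal (\(\mu = 0\), \(\sigma = 1\)) for illustration:

```python
from scipy.stats import norm

mu, sigma = 0, 1
print(norm.pdf(0, mu, sigma))    # density at the mean ≈ 0.399 (not a probability!)

# probability over an interval = area under the curve
print(norm.cdf(1, mu, sigma) - norm.cdf(-1, mu, sigma))   # P(-1 <= X <= 1) ≈ 0.683
```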

Bringing it all together

🚗 Cars Model

Let’s use the Normal distribution to describe the cars data.

  • \(Y\) is stopping distance for population
  • \(Y\) is normally distributed, \(Y \sim N(\mu, \sigma)\)
  • Experiment is a random sample of size \(n\) from \(Y\) with \(y_1, y_2, ..., y_n\) observations.
  • Sample statistics (\(\bar{y}, s\)) approximate population parameters (\(\mu, \sigma\)).

Sample statistics:

  • Mean (\(\bar{y}\)) = 10.54 m
  • S.D. (\(s\)) = 5.353 m

The mean is our approximate expectation:

  • \(E[Y] = \mu \approx \bar{y}\)

But there’s error, \(\epsilon\), in this estimate:

  • \(\epsilon_i = y_i - \bar{y}\)

The average squared error is the variance:

  • \(s^2 = \frac{1}{n-1}\sum \epsilon_{i}^{2}\)

Its square root, the standard deviation, is our uncertainty: how big we think any given error will be.

So, here is our probability model:

\[Y \sim N(\bar{y}, s)\]

This is only an estimate of \(N(\mu, \sigma)\)!

With it, we can say, for example, that the probability that a random draw from this distribution falls within one standard deviation of the mean is 68.3%.
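
That 68.3% figure can be verified directly from the fitted model, as in this sketch:

```python
from scipy.stats import norm

y_bar, s = 10.54, 5.353                      # sample statistics from above
lo, hi = y_bar - s, y_bar + s                # one standard deviation either side
print(norm.cdf(hi, y_bar, s) - norm.cdf(lo, y_bar, s))   # ≈ 0.6827
```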

A Simple Formula

This gives us a simple formula:

\[y_i = \bar{y} + \epsilon_i\]

where

  • \(y_i\): stopping distance for car \(i\), the data
  • \(\bar{y} \approx E[Y]\): the expectation, predictable
  • \(\epsilon_i\): the error, unpredictable

If we subtract the mean, we have a model of the errors centered on zero:

\[\epsilon_i = 0 + (y_i - \bar{y})\]

This means we can construct a probability model of the errors centered on zero.

Probability Model of Errors

Note that, in shifting from \(N(\bar{y}, s)\) to \(N(0, s)\), the mean changes, but the variance stays the same.
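
A quick numerical check that centering shifts the mean to zero without touching the variance (a sketch with made-up data):

```python
import numpy as np

y = np.array([5.2, 8.1, 9.7, 10.3, 10.9, 11.4, 12.0, 13.5, 14.2, 18.6])
eps = y - y.mean()                        # centered errors

print(eps.mean())                         # ~0 (up to floating-point error)
print(y.var(ddof=1), eps.var(ddof=1))     # identical variances
```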

Summary

Now our simple formula is this:

\[y_i = \bar{y} + \epsilon_i\]
\[\epsilon_i \sim N(0, s)\]

  • Again, \(\bar{y} \approx E[Y] = \mu\).
  • For any future outcome:
    • The expected value is deterministic
    • The error is stochastic
  • Must assume that the errors are iid!
    • independent = they do not affect each other
    • identically distributed = they are from the same probability distribution
  • The distribution is now a model of the errors!
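
To see the deterministic/stochastic split in action, here is a minimal simulation of the model (using the sample statistics from above; the seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)
y_bar, s = 10.54, 5.353

# each future outcome = fixed expectation + an iid Normal error
eps = rng.normal(loc=0.0, scale=s, size=5)   # stochastic part, eps ~ N(0, s)
y_new = y_bar + eps                          # deterministic part + error
print(y_new)
```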