PMF and CDF

The Probability Mass Function (PMF) and Cumulative Distribution Function (CDF) are two ways to describe the probabilities of outcomes under a discrete probability distribution. The PMF gives the probability of one particular outcome, while the CDF gives the probability of that outcome together with all of the outcomes that come before it. How this looks depends on the problem. For example, a fair die has six sides, each with an equal probability of being rolled. The PMF for rolling any given value is 1/6 (think uniform distribution), while the CDF at 3, meaning the probability of rolling a 1, 2, or 3, is: \[CDF(3) = PMF(1) + PMF(2) + PMF(3)\]

\[= 1/6 + 1/6 + 1/6 = 3/6 = 1/2\]

In more mathematical terms, this can be represented as: \[\sum_{i=1}^{3} PMF(i)\]
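
The die example can also be sketched in R. This is a minimal illustration, not part of the original notes; the variable names `faces`, `pmf`, and `cdf` are chosen here for convenience.

```r
# Fair six-sided die: every face has probability 1/6
faces <- 1:6
pmf <- rep(1 / 6, length(faces))

# A running total of the PMF gives the CDF at each face
cdf <- cumsum(pmf)

pmf[3]  # probability of rolling exactly a 3
cdf[3]  # probability of rolling a 1, 2, or 3
```

Here `cumsum()` plays the role of the summation sign: each entry of `cdf` is the sum of all PMF values up to and including that face.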

For the Binomial Distribution, the same logic applies. Say, for example, we have a fair coin with heads and tails. Taking ‘heads’ as a success, we want to calculate the PMF for 1, 2, and 3 successes out of 10 trials. To do so we can use the binomial PMF:

\[P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}\]

Here \(n\) is the number of trials, \(k\) is the number of successes, and \(p\) is the probability of success on each trial. Let’s calculate the PMF for our three cases:

\[P(X = 1) = \binom{10}{1} 0.5^1 (1 - 0.5)^{10 - 1} \approx 0.010\]

\[P(X = 2) = \binom{10}{2} 0.5^2 (1 - 0.5)^{10 - 2} \approx 0.044\]

\[P(X = 3) = \binom{10}{3} 0.5^3 (1 - 0.5)^{10 - 3} \approx 0.117\]

As an exercise, try generating these values using the dbinom() function in R. (The values above are rounded to three decimal places.)
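
One way to check the exercise is to write the formula out by hand and compare it with dbinom(). The sketch below assumes a helper named `binom_pmf`, which is hypothetical and introduced only for illustration; `choose()` and `dbinom()` are base R functions.

```r
# Hypothetical helper: the binomial PMF written out term by term
binom_pmf <- function(k, n, p) {
  choose(n, k) * p^k * (1 - p)^(n - k)
}

k <- 1:3
by_hand <- binom_pmf(k, n = 10, p = 0.5)
built_in <- dbinom(k, size = 10, prob = 0.5)

round(by_hand, 3)             # the three PMF values, rounded
all.equal(by_hand, built_in)  # TRUE when the two approaches agree
```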

With these values in mind, the CDF gives us the combined probability of getting up to 3 heads over the course of 10 flips. Summing the three PMF values above gives the probability of getting 1, 2, or 3 heads, which is approximately 0.171. ‘Up to 3 heads’ also includes the outcome of 0 heads, with \(P(X = 0) \approx 0.001\), so the CDF at 3 is \(CDF \approx 0.172\). This is exactly how we arrive at the formula for the CDF that you see on our September 3rd slides, under the binomial probability distribution: \[CDF = \sum_{i=0}^{y} \binom{n}{i} p^i (1 - p)^{n - i}\] Notice how the function after the summation term is the same as our PMF equation, and that the sum starts at \(i = 0\). This is why we call it the Cumulative Distribution Function: it adds the probability of our chosen outcome to the probabilities of all of the previous outcomes. It is also why the CDF never decreases as you go down the line of possible outcomes. For example:

pbinom(1,10,.50)
## [1] 0.01074219
pbinom(2,10,.50)
## [1] 0.0546875
pbinom(3,10,.50)
## [1] 0.171875
pbinom(4,10,.50)
## [1] 0.3769531
pbinom(5,10,.50)
## [1] 0.6230469
pbinom(6,10,.50)
## [1] 0.828125
pbinom(7,10,.50)
## [1] 0.9453125
pbinom(8,10,.50)
## [1] 0.9892578
pbinom(9,10,.50)
## [1] 0.9990234
pbinom(10,10,.50)
## [1] 1
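
The link between the two functions can be checked directly: a running sum of dbinom() over 0 to 10 heads should reproduce pbinom(). The following is a small sketch under that assumption, with variable names (`cdf_from_pmf`, `cdf_built_in`) chosen here for illustration.

```r
n <- 10
p <- 0.5

# PMF for every possible number of heads, starting at 0
pmf <- dbinom(0:n, size = n, prob = p)

# Running total of the PMF versus the built-in CDF
cdf_from_pmf <- cumsum(pmf)
cdf_built_in <- pbinom(0:n, size = n, prob = p)

all.equal(cdf_from_pmf, cdf_built_in)  # TRUE when they match
all(diff(cdf_built_in) >= 0)           # TRUE when the CDF never decreases
```

Starting the sequence at 0 matters: the outcome of 0 heads is part of every cumulative total, just as the summation formula starts at \(i = 0\).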

The CDF can also be used to give the probability that a range of outcomes will occur. For example, if we wanted to find the probability that there will be 3 to 5 heads out of 10 coin flips, we can subtract the CDF at 2 from the CDF at 5: \[\begin{aligned} P(3 \le X \le 5) &= CDF(5) - CDF(2) \\ &= (PMF(0) + PMF(1) + PMF(2) + PMF(3) + PMF(4) + PMF(5)) - (PMF(0) + PMF(1) + PMF(2)) \\ &= PMF(3) + PMF(4) + PMF(5) \end{aligned}\]

We can again use pbinom(), the binomial CDF in R, to accomplish this:

pbinom(5,10,.5) - pbinom(2,10,.5)
## [1] 0.5683594
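
As a cross-check, the same range probability can be computed by summing the PMF over 3, 4, and 5 heads. This is a minimal sketch; the names `from_cdf` and `from_pmf` are assumptions made for this example.

```r
# Range probability from a difference of two CDF values
from_cdf <- pbinom(5, size = 10, prob = 0.5) - pbinom(2, size = 10, prob = 0.5)

# Same probability from adding the PMF over 3, 4, and 5 heads
from_pmf <- sum(dbinom(3:5, size = 10, prob = 0.5))

all.equal(from_cdf, from_pmf)  # TRUE when the two approaches agree
```

Note that the lower bound in the subtraction is 2, not 3: subtracting the CDF at 3 would remove the probability of exactly 3 heads from the range.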