Chapter 5: Conditional Probability
5.1 What Changes When You Have More Information
Probability depends on what you know.
Example: rolling a die
A die has been rolled, but we have not yet seen the outcome.
Q: What is the probability that the result is 6?
Answer: $\dfrac{1}{6}$
Now suppose someone says, “It is even.”
Q: What is the probability that the result is 6?
The even faces are $\{2, 4, 6\}$ — three outcomes. The face 6 is one of them.
Answer: $\dfrac{1}{3}$
The probability changed because we received additional information.
Note: in this example the probability went up, but information can also decrease the probability or leave it unchanged. For instance, learning “the result is odd” lowers the probability of getting 6 from $\dfrac{1}{6}$ to $0$.
5.2 Definition of Conditional Probability
The probability that A occurs given that B has occurred is called the conditional probability of A given B, written $P(A|B)$. For $P(B) > 0$ it is defined by
$$P(A|B) = \dfrac{P(A \cap B)}{P(B)}.$$
A way to remember it
“Once we know B has occurred,
the denominator is the probability of B (the new whole),
and the numerator is the probability that A also occurs within B.”
A concrete example: the die in a Venn diagram
Restating the previous example as a Venn diagram gives Figure 5.2. After we learn that B has occurred, only the green disc (B) matters; the conditional probability is the proportion of B in which A also occurs.
How to read the diagram
- Blue disc (A): rolling a 6, i.e., $\{6\}$
- Green disc (B): even faces, i.e., $\{2, 4, 6\}$
- Overlap (A∩B): rolling a 6 (which is even), i.e., $\{6\}$
- P(A|B) = 1/3: among even outcomes (B), the probability of getting 6 (A)
Important: once we condition on B, the green disc (B) becomes the new sample space. The conditional probability is the relative size of the overlap inside B.
Verification with formulas
- $A$: rolling 6 → $P(A) = \dfrac{1}{6}$
- $B$: rolling an even number → $P(B) = \dfrac{3}{6} = \dfrac{1}{2}$
- $A \cap B$: rolling 6 (which is even) → $P(A \cap B) = \dfrac{1}{6}$
Hence
$$P(A|B) = \dfrac{P(A \cap B)}{P(B)} = \dfrac{1/6}{1/2} = \dfrac{1}{3}.$$
The information “the result is even” raised the probability of rolling a 6 from $\dfrac{1}{6}$ to $\dfrac{1}{3}$.
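For readers who want to verify this by computer, here is a minimal Python sketch (our illustration, not part of the original example) that enumerates the sample space and recovers $P(A|B)$ exactly:

```python
from fractions import Fraction

# A fair die: six equally likely faces.
faces = [1, 2, 3, 4, 5, 6]

A = {6}                                  # event A: the result is 6
B = {f for f in faces if f % 2 == 0}     # event B: the result is even

p_B = Fraction(len(B), len(faces))       # P(B) = 1/2
p_AB = Fraction(len(A & B), len(faces))  # P(A ∩ B) = 1/6
print(p_AB / p_B)                        # P(A|B) = 1/3
```

Using `Fraction` keeps every probability exact, so the output is literally `1/3` rather than a rounded decimal.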
5.3 Worked Examples
Example 1: two dice
Two dice are rolled. We are told the sum is at least 8. What is the probability that the sum is exactly 10?
Solution
- $B$: sum is at least 8
- $A$: sum is 10
Count the favourable outcomes for each event.
Outcomes with sum at least 8
- Sum 8: $(2,6),(3,5),(4,4),(5,3),(6,2)$ → 5 outcomes
- Sum 9: $(3,6),(4,5),(5,4),(6,3)$ → 4 outcomes
- Sum 10: $(4,6),(5,5),(6,4)$ → 3 outcomes
- Sum 11: $(5,6),(6,5)$ → 2 outcomes
- Sum 12: $(6,6)$ → 1 outcome
Total: $5+4+3+2+1 = 15$ outcomes.
Outcomes with sum 10: 3 outcomes.
$$P(A|B) = \dfrac{3}{15} = \dfrac{1}{5}$$
Example 2: a deck of cards
Draw one card from a standard 52-card deck. Given that the card is a face card (J, Q, or K), what is the probability that it is a heart?
Solution
- $B$: face card → 12 cards
- $A$: heart → 13 cards
- $A \cap B$: heart face card → 3 cards
Hence
$$P(A|B) = \dfrac{P(A \cap B)}{P(B)} = \dfrac{3/52}{12/52} = \dfrac{3}{12} = \dfrac{1}{4}.$$
Answer: $\dfrac{1}{4}$.
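Both worked examples can be verified by enumerating the equally likely outcomes; a minimal Python sketch (the event encodings are ours):

```python
from fractions import Fraction
from itertools import product

# Example 1: two dice, P(sum = 10 | sum >= 8).
rolls = list(product(range(1, 7), repeat=2))          # 36 equally likely pairs
at_least_8 = [r for r in rolls if sum(r) >= 8]        # 15 outcomes
exactly_10 = [r for r in at_least_8 if sum(r) == 10]  # 3 of them
print(Fraction(len(exactly_10), len(at_least_8)))     # 1/5

# Example 2: one card, P(heart | face card).
suits = ["hearts", "spades", "diamonds", "clubs"]
ranks = list(range(2, 11)) + ["J", "Q", "K", "A"]
deck = list(product(suits, ranks))                    # 52 cards
face_cards = [c for c in deck if c[1] in ("J", "Q", "K")]  # 12 cards
heart_faces = [c for c in face_cards if c[0] == "hearts"]  # 3 cards
print(Fraction(len(heart_faces), len(face_cards)))    # 1/4
```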
5.4 The Multiplication Rule
Rearranging the definition of conditional probability gives:
When $P(B) > 0$,
$$P(A \cap B) = P(B) \times P(A|B).$$
Likewise, when $P(A) > 0$,
$$P(A \cap B) = P(A) \times P(B|A).$$
By definition,
$$P(A|B) = \dfrac{P(A \cap B)}{P(B)}.$$
Multiplying both sides by $P(B)$,
$$P(B) \times P(A|B) = P(A \cap B).$$
The second identity follows analogously from $P(B|A) = \dfrac{P(A \cap B)}{P(A)}$.
In words: “the probability that A and B both occur” equals “the probability of B” times “the probability of A given B.”
Example: drawing lottery tickets
A box has 10 tickets, of which 3 are winners. Two tickets are drawn in succession (without replacement). What is the probability that both are winners?
Solution
- Probability the 1st ticket wins: $\dfrac{3}{10}$
- Given the 1st was a winner, probability the 2nd also wins: $\dfrac{2}{9}$
By the multiplication rule,
$$P(\text{both win}) = \dfrac{3}{10} \times \dfrac{2}{9} = \dfrac{6}{90} = \dfrac{1}{15}.$$
Answer: $\dfrac{1}{15}$.
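As a sanity check, the multiplication rule can be confirmed by simulating the two draws; a sketch (the helper name `both_winners` is ours):

```python
import random

def both_winners(trials=100_000):
    """Estimate P(both tickets win) when drawing 2 of 10 without replacement."""
    tickets = ["win"] * 3 + ["lose"] * 7
    hits = 0
    for _ in range(trials):
        first, second = random.sample(tickets, 2)  # draw without replacement
        hits += first == "win" and second == "win"
    return hits / trials

print(both_winners())  # ≈ 1/15 ≈ 0.067
```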
5.5 The Law of Total Probability
Suppose the sample space splits into mutually exclusive cases. Formally, let $B_1, B_2, \ldots, B_n$ be a partition of the sample space $S$ (mutually exclusive events whose union is $S$) such that $P(B_i) > 0$ for every $i$. Then
$$P(A) = \sum_{i=1}^{n} P(B_i)P(A|B_i) = P(B_1)P(A|B_1) + P(B_2)P(A|B_2) + \cdots + P(B_n)P(A|B_n).$$
Since $B_1, B_2, \ldots, B_n$ partition $S$,
$$S = B_1 \cup B_2 \cup \cdots \cup B_n, \quad B_i \cap B_j = \emptyset \; (i \neq j).$$
Decomposing $A$ along the partition,
$$A = A \cap S = A \cap (B_1 \cup B_2 \cup \cdots \cup B_n) = (A \cap B_1) \cup (A \cap B_2) \cup \cdots \cup (A \cap B_n).$$
Because the $B_i$ are mutually exclusive, so are the $A \cap B_i$.
By additivity,
$$P(A) = P(A \cap B_1) + P(A \cap B_2) + \cdots + P(A \cap B_n).$$
Applying the multiplication rule $P(A \cap B_i) = P(B_i)P(A|B_i)$ to each term,
$$P(A) = P(B_1)P(A|B_1) + P(B_2)P(A|B_2) + \cdots + P(B_n)P(A|B_n).$$
Example: two bags
Bag 1 contains 3 red and 2 white balls. Bag 2 contains 4 red and 6 white balls.
A coin is tossed: heads selects bag 1, tails selects bag 2. One ball is then drawn from the chosen bag.
What is the probability that the drawn ball is red?
Solution
- $B_1$: bag 1 chosen → $P(B_1) = \dfrac{1}{2}$
- $B_2$: bag 2 chosen → $P(B_2) = \dfrac{1}{2}$
- $P(\text{red}|B_1) = \dfrac{3}{5}$
- $P(\text{red}|B_2) = \dfrac{4}{10} = \dfrac{2}{5}$
By the law of total probability,
$$P(\text{red}) = \dfrac{1}{2} \times \dfrac{3}{5} + \dfrac{1}{2} \times \dfrac{2}{5} = \dfrac{3}{10} + \dfrac{2}{10} = \dfrac{1}{2}.$$
Answer: $\dfrac{1}{2}$.
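The same answer falls out of simulating the coin-then-draw experiment directly; a sketch under the stated setup:

```python
import random

def draw_red(trials=100_000):
    """Estimate P(red) for the coin-then-bag experiment."""
    bag1 = ["red"] * 3 + ["white"] * 2   # bag 1: 3 red, 2 white
    bag2 = ["red"] * 4 + ["white"] * 6   # bag 2: 4 red, 6 white
    reds = 0
    for _ in range(trials):
        bag = bag1 if random.random() < 0.5 else bag2  # fair coin picks the bag
        reds += random.choice(bag) == "red"
    return reds / trials

print(draw_red())  # ≈ 0.5, matching the law of total probability
```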
5.6 Bayes' Theorem
Given that A occurred, Bayes' theorem gives the probability that the cause was $B_i$.
Let $B_1, B_2, \ldots, B_n$ be a partition of the sample space $S$ with $P(B_i) > 0$ for every $i$, and let $P(A) > 0$. Then
$$P(B_i|A) = \dfrac{P(B_i)P(A|B_i)}{P(A)} = \dfrac{P(B_i)P(A|B_i)}{\sum_{j=1}^{n} P(B_j)P(A|B_j)}.$$
How to read the formula: the numerator is $P(B_i)$ (prior) × $P(A|B_i)$ (likelihood). The denominator $P(A)$ is exactly the law of total probability of §5.5, summing the same product $P(B_j)P(A|B_j)$ over $B_1, \ldots, B_n$. In other words, Bayes' theorem is just the multiplication rule combined with the law of total probability.
By the definition of conditional probability,
$$P(B_i|A) = \dfrac{P(A \cap B_i)}{P(A)}.$$
The multiplication rule gives $P(A \cap B_i) = P(B_i)P(A|B_i)$, so
$$P(B_i|A) = \dfrac{P(B_i)P(A|B_i)}{P(A)}.$$
Substituting the law of total probability for $P(A)$,
$$P(B_i|A) = \dfrac{P(B_i)P(A|B_i)}{\sum_{j=1}^{n} P(B_j)P(A|B_j)}.$$
- $P(B_i)$: prior probability (before observing $A$)
- $P(B_i|A)$: posterior probability (after observing $A$)
- $P(A|B_i)$: likelihood (probability of $A$ assuming $B_i$)
Bayes' theorem updates beliefs about a cause $B_i$ in light of the observed result $A$.
Mental picture of Bayes' theorem
“See the result, update the cause.”
Example: a positive test result → how likely is the disease really?
- Prior: prevalence of the disease before the test
- Posterior: probability after taking the test result into account
Example: defective products
Two factories produce a part:
- Factory A produces 60% of the parts and has a 2% defect rate.
- Factory B produces 40% of the parts and has a 5% defect rate.
A given part is defective. What is the probability it was made at factory A?
Solution
- $P(A) = 0.6$ (made at factory A)
- $P(B) = 0.4$ (made at factory B)
- $P(\text{defective}|A) = 0.02$
- $P(\text{defective}|B) = 0.05$
First, compute the probability of a defective part using the law of total probability:
$$P(\text{defective}) = 0.6 \times 0.02 + 0.4 \times 0.05 = 0.012 + 0.02 = 0.032.$$
By Bayes' theorem,
$$P(A|\text{defective}) = \dfrac{0.6 \times 0.02}{0.032} = \dfrac{0.012}{0.032} = \dfrac{12}{32} = \dfrac{3}{8} = 0.375.$$
The probability the part came from factory A is 37.5%.
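The two-step pattern — law of total probability in the denominator, multiplication rule in the numerator — translates directly into code. A small Python helper (the name `posterior` is ours) that reproduces the 37.5%:

```python
def posterior(priors, likelihoods):
    """Bayes' theorem over a finite partition.

    priors[i]      = P(B_i)
    likelihoods[i] = P(A | B_i)
    Returns the posteriors P(B_i | A).
    """
    joint = [p * l for p, l in zip(priors, likelihoods)]  # P(B_i) P(A|B_i)
    total = sum(joint)                                    # P(A) by total probability
    return [j / total for j in joint]

# Factory A: 60% of parts, 2% defective; factory B: 40%, 5% defective.
print(posterior([0.6, 0.4], [0.02, 0.05]))  # [0.375, 0.625]
```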
Example: DNA evidence and miscarriages of justice
A real-world misuse
In past trials it has been argued that “the DNA matched at 99.9%, so the suspect is guilty.” But applied correctly, Bayes' theorem can show that the actual probability of guilt is far smaller. Knowing this could have prevented wrongful convictions.
Suppose a crime has occurred. The suspect's DNA is compared with evidence from the scene. The DNA test has the following accuracy:
- If the person is the perpetrator, the test matches with probability 99.9% (sensitivity).
- For an unrelated person, the test matches by chance with probability 0.1% (false-positive rate).
The region has 1,000,000 residents, exactly one of whom is the perpetrator. Given that the test matched, what is the probability that the suspect is the actual perpetrator?
Common intuitive answer: “A 99.9% match means a 99.9% probability of guilt!”
However, the correct calculation says otherwise.
Bayesian computation
- $B_1$: the suspect is the perpetrator → $P(B_1) = \dfrac{1}{1{,}000{,}000}$ (prior)
- $B_2$: the suspect is unrelated → $P(B_2) = \dfrac{999{,}999}{1{,}000{,}000}$
- $A$: DNA test matches
- $P(A|B_1) = 0.999$ (the perpetrator's DNA matches)
- $P(A|B_2) = 0.001$ (an unrelated person's DNA matches by chance)
By Bayes' theorem, the probability that the suspect is truly the perpetrator given that the DNA matched is
$$P(B_1|A) = \dfrac{P(B_1) \cdot P(A|B_1)}{P(B_1) \cdot P(A|B_1) + P(B_2) \cdot P(A|B_2)}.$$
Plugging in the numbers,
$$P(B_1|A) = \dfrac{\dfrac{1}{1{,}000{,}000} \times 0.999}{\dfrac{1}{1{,}000{,}000} \times 0.999 + \dfrac{999{,}999}{1{,}000{,}000} \times 0.001}.$$
Multiplying numerator and denominator by $1{,}000{,}000$,
$$= \dfrac{0.999}{0.999 + 999.999} = \dfrac{0.999}{1000.998} \approx 0.000998 \approx 0.1\%.$$
Answer: approximately 0.1% (about 1 in 1000).
Why isn't “DNA matches” the same as “99.9% guilty”?
Out of 1,000,000 residents:
- The 1 perpetrator: matches with probability 99.9% → about 1 person.
- The 999,999 unrelated residents: each matches with probability 0.1% → about 1,000 people.
So about 1,001 people would match. Of those, only one is the actual perpetrator.
A DNA match alone therefore identifies the true perpetrator with probability of only about 1 in 1001 — roughly 0.1%.
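This frequency argument takes only a few lines to reproduce; a sketch with the chapter's numbers:

```python
population = 1_000_000
sensitivity = 0.999      # P(match | perpetrator)
false_positive = 0.001   # P(match | unrelated person)

true_matches = 1 * sensitivity                      # ≈ 1 person
false_matches = (population - 1) * false_positive   # ≈ 1000 people

# Among everyone expected to match, the fraction who are the real perpetrator:
print(true_matches / (true_matches + false_matches))  # ≈ 0.000998, about 0.1%
```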
Take-aways
- Test accuracy alone is not enough: “99.9% match” describes the test, not the probability of guilt.
- The prior (base rate) matters: a prior of 1 in 1,000,000 cannot be ignored.
- Count false positives in absolute terms: even at high accuracy, false positives accumulate when the population is large.
- Combine with other evidence: a DNA match should be weighed alongside alibi, motive, and other corroborating evidence.
Implications for trials
If additional evidence (eyewitness testimony, motive, presence at the scene) raises the prior far above 1 in 1,000,000, a DNA match increases the probability of guilt substantially. Without other evidence, however, a DNA match alone is insufficient to support a conviction. Understanding Bayes' theorem allows evidence to be evaluated correctly.
Confusing the conditional probability of a match with the probability of guilt — ignoring the prior (base rate) — is a classic statistical mistake at the intersection of probability and law, known as the prosecutor's fallacy (a special case of the base rate fallacy).
5.7 Independent Events
Two events $A$ and $B$ are independent if the occurrence of one does not change the probability of the other.
Definition: independent events
Two events $A$ and $B$ are independent if
$$P(A \cap B) = P(A) \times P(B).$$
When $P(A) > 0$ and $P(B) > 0$, the following are equivalent:
- $P(A \cap B) = P(A) \times P(B)$
- $P(B|A) = P(B)$
- $P(A|B) = P(A)$
(1) ⇔ (2):
By the definition of conditional probability,
$$P(B|A) = \dfrac{P(A \cap B)}{P(A)}.$$
Hence $P(B|A) = P(B)$ is equivalent to
$$\dfrac{P(A \cap B)}{P(A)} = P(B) \iff P(A \cap B) = P(A) \times P(B).$$
(1) ⇔ (3) follows analogously.
Example: tossing a coin twice
The outcome of the first toss and the outcome of the second toss are independent.
$$P(\text{both heads}) = \dfrac{1}{2} \times \dfrac{1}{2} = \dfrac{1}{4}.$$
A dependent example: drawing tickets without replacement
A box has 10 tickets, of which 3 are winners. Two tickets are drawn without replacement.
- $A$: the 1st ticket is a winner → $P(A) = \dfrac{3}{10}$
- $B$: the 2nd ticket is a winner
The outcome of the first draw changes the probability of the second:
- If the 1st was a winner: 9 tickets remain, 2 of them winners → $P(B|A) = \dfrac{2}{9}$.
- If the 1st was a loser: 9 tickets remain, 3 of them winners → $P(B|\overline{A}) = \dfrac{3}{9} = \dfrac{1}{3}$.
By the law of total probability, $P(B) = \dfrac{3}{10} \times \dfrac{2}{9} + \dfrac{7}{10} \times \dfrac{1}{3} = \dfrac{6}{90} + \dfrac{21}{90} = \dfrac{3}{10}$. Since $P(B|A) = \dfrac{2}{9} \neq \dfrac{3}{10} = P(B)$, $A$ and $B$ are not independent.
Compare: with replacement, the probability that the 2nd ticket wins is always $\dfrac{3}{10}$ regardless of the 1st draw, so the two draws are independent.
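Both claims can be verified exactly by enumerating ordered pairs of distinct tickets; a Python sketch (the helper `prob` is ours):

```python
from fractions import Fraction
from itertools import permutations

tickets = ["W"] * 3 + ["L"] * 7            # 3 winners among 10 tickets
pairs = list(permutations(range(10), 2))   # 90 equally likely ordered draws

def prob(event):
    return Fraction(sum(event(i, j) for i, j in pairs), len(pairs))

p_A  = prob(lambda i, j: tickets[i] == "W")                        # 3/10
p_B  = prob(lambda i, j: tickets[j] == "W")                        # 3/10
p_AB = prob(lambda i, j: tickets[i] == "W" and tickets[j] == "W")  # 1/15

print(p_B, p_AB / p_A)  # P(B) = 3/10 but P(B|A) = 2/9: not independent
```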
Does buying more lottery tickets change the per-ticket expected value?
Lottery tickets are drawn without replacement, so the events “ticket 1 wins” and “ticket 2 wins” are not independent. Surprisingly, however, the expected value per ticket is the same regardless of how many you buy.
Concrete example: a typical large lottery
Suppose a lottery sells tickets at 300 yen each with a prize-fund payout of about 50% of total sales. Then the expected value per ticket is about 150 yen.
Buying 1 ticket
- Expected value: $E[X_1] = 150$ yen.
- Expected profit: $150 - 300 = -150$ yen.
The 2nd ticket when 2 tickets are bought
After buying ticket 1, you buy ticket 2. The probability that ticket 2 wins depends on ticket 1 (they are not independent), but what about its expected value?
By symmetry, the expected value of the 2nd ticket is also 150 yen:
- All issued tickets are symmetric (no ticket is privileged).
- The first ticket is not special.
- Whichever ticket you buy, the expected value per ticket is the same.
Conclusion: no matter how many tickets you buy, the expected value per ticket is 150 yen.
Why doesn't the expected value change?
The reason is the linearity of expectation, which holds even when events are not independent:
$$E[X_1 + X_2] = E[X_1] + E[X_2].$$
The total prize fund is fixed and is distributed symmetrically among all tickets, so the expected value per ticket is the same.
Does buying many tickets help?
It is tempting to think “more tickets = better odds = better deal.” That intuition is misleading.
- 1 ticket: expected profit = 150 - 300 = -150 yen.
- 10 tickets: expected profit = 1500 - 3000 = -1500 yen.
- 100 tickets: expected profit = 15,000 - 30,000 = -15,000 yen.
Buying more tickets does increase the chance of winning, but the expected loss grows in proportion to the number of tickets.
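The symmetry argument is easy to check by simulation. The sketch below uses a deliberately tiny toy lottery — 10 tickets, one 1500-yen prize, numbers of our own choosing so that the fair per-ticket value is 150 yen:

```python
import random

def per_ticket_value(n_buy, trials=100_000):
    """Average payout per ticket when buying n_buy of the 10 tickets."""
    total = 0
    for _ in range(trials):
        winner = random.randrange(10)           # index of the winning ticket
        mine = random.sample(range(10), n_buy)  # my tickets, without replacement
        total += 1500 if winner in mine else 0
    return total / (trials * n_buy)

for n in (1, 2, 5):
    print(n, round(per_ticket_value(n)))  # ≈ 150 yen per ticket, for any n
```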
Caution: independence is not the same as mutual exclusivity
“Independent” and “mutually exclusive” are often confused, but they are completely different concepts. Side-by-side Venn diagrams make the contrast obvious (Figure 5.12).
| Concept | Meaning | Venn diagram | Formula |
|---|---|---|---|
| Independent | Occurrence of one does not affect the other | Usually overlap | $P(A \cap B) = P(A) \times P(B)$ |
| Mutually exclusive | Cannot occur together | No overlap | $P(A \cap B) = 0$ |
Important caveat
Mutually exclusive events with $P(A) > 0$ and $P(B) > 0$ are never independent.
If $A$ occurs we know $B$ cannot occur, so the probability of $B$ has changed.
5.8 Chapter Summary
Formula sheet
| Name | Formula | Meaning |
|---|---|---|
| Conditional probability | $$P(A|B) = \dfrac{P(A \cap B)}{P(B)}$$ | Probability of A given B |
| Multiplication rule | $$P(A \cap B) = P(B) \times P(A|B)$$ | Joint probability via two stages |
| Law of total probability | $$P(A) = \sum_{i} P(B_i)P(A|B_i)$$ | Sum the cases of a partition |
| Bayes' theorem | $$P(B_i|A) = \dfrac{P(B_i)P(A|B_i)}{P(A)}$$ | Recover the cause from the result |
| Independence | $$P(A \cap B) = P(A) \times P(B)$$ | Events do not influence each other |
Keyword recap
- Conditional probability $P(A|B)$: probability of A given B
- Multiplication rule: joint probabilities computed in two stages
- Law of total probability: sum probabilities over a partition
- Bayes' theorem: recover the cause from the result
- Prior and posterior: probabilities before and after observing
- Independence: events do not change each other's probabilities
Glossary
- Conditional probability $P(A|B)$
- Probability that A occurs given that B has occurred. Treat B as the new sample space.
- Multiplication rule
- $P(A \cap B) = P(B) \times P(A|B)$. Computes the joint probability in two stages.
- Law of total probability
- $P(A) = \sum_i P(B_i)P(A|B_i)$. Sums the probabilities over a partition.
- Bayes' theorem
- $P(B_i|A) = \dfrac{P(B_i)P(A|B_i)}{P(A)}$. Recovers the cause from the result.
- Prior probability
- Probability before observing evidence. Denoted $P(B_i)$ in Bayes' theorem.
- Posterior probability
- Probability after observing the result A. Denoted $P(B_i|A)$ in Bayes' theorem.
- Likelihood
- Probability $P(A|B_i)$ that A occurs assuming the cause is $B_i$.
- Independent events
- Events satisfying $P(A \cap B) = P(A) \times P(B)$ — one's occurrence does not change the other's probability.
- Mutually exclusive events
- $P(A \cap B) = 0$. Events that cannot occur together. Different from independence.
Exercises
Problem 1
A die is rolled. Given that the result is at least 3, what is the probability that it is even?
Problem 2
Two cards are drawn from a deck (without replacement). Given that the first is a spade, what is the probability that the second is also a spade?
Problem 3
From a group of 6 men and 4 women, 2 people are selected. Given that the first is a man, what is the probability that the second is also a man?
Problem 4
A medical test has the following properties:
- 1% of the population has the disease.
- If a person has the disease, the test is positive 99% of the time.
- If a person does not have the disease, the test is positive 2% of the time (false positive).
Given a positive test result, what is the probability that the person actually has the disease?
Problem 5 (independence check)
Two dice are rolled. Consider the events:
- $A$: the first die is even.
- $B$: the second die is at least 3.
- $C$: the sum is 7.
(1) Are $A$ and $B$ independent?
(2) Are $A$ and $C$ independent?
Solution to Problem 1
Idea: condition on $B$ = “the result is at least 3” and find the probability of $A$ = “the result is even.”
- $B$: at least 3 → $\{3, 4, 5, 6\}$, four outcomes.
- $A$: even → $\{2, 4, 6\}$.
- $A \cap B$: at least 3 and even → $\{4, 6\}$, two outcomes.
By the definition of conditional probability,
$$P(A|B) = \dfrac{P(A \cap B)}{P(B)} = \dfrac{2/6}{4/6} = \dfrac{2}{4} = \dfrac{1}{2}.$$
Answer: $\dfrac{1}{2}$.
Solution to Problem 2
Idea: after the first card (a spade) is removed, 51 cards remain, of which 12 are spades.
- Total: 52 cards (13 per suit).
- 1st is a spade → 51 cards remain.
- Spades remaining: $13 - 1 = 12$.
Probability the 2nd is a spade: $\dfrac{12}{51} = \dfrac{4}{17}$.
Answer: $\dfrac{4}{17}$.
Solution to Problem 3
Idea: if the first selected person is a man, 9 people remain and 5 of them are men.
- Initial group: 6 men and 4 women (10 people).
- 1st is a man → 9 people remain.
- Men remaining: $6 - 1 = 5$.
Answer: $\dfrac{5}{9}$.
Solution to Problem 4
Idea: use Bayes' theorem — recover the cause “disease” from the result “positive.”
- $P(\text{disease}) = 0.01$ (prior).
- $P(\text{positive}|\text{disease}) = 0.99$ (sensitivity).
- $P(\text{positive}|\text{healthy}) = 0.02$ (false-positive rate).
By the law of total probability,
$$P(\text{positive}) = 0.01 \times 0.99 + 0.99 \times 0.02 = 0.0099 + 0.0198 = 0.0297.$$
Bayes' theorem then gives
$$P(\text{disease}|\text{positive}) = \dfrac{0.01 \times 0.99}{0.0297} = \dfrac{0.0099}{0.0297} \approx 0.333.$$
Answer: about 33% (roughly 1 in 3).
Important: even at 99% sensitivity, the rarity of the disease (1%) means only about a third of positive tests correspond to actual disease — a classic illustration of the importance of the prior.
Intuitive check: imagine 1000 people. About 10 have the disease and 990 are healthy.
- Diseased 10 × 99% ≈ 9.9 test positive (true positives).
- Healthy 990 × 2% ≈ 19.8 test positive (false positives).
- Total positives: 9.9 + 19.8 ≈ 29.7 people.
Of the positives, the fraction who actually have the disease is $\dfrac{9.9}{29.7} \approx 0.333$, about 33%, matching the formula above.
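Both the formula and the 1000-person head count reduce to a few lines of arithmetic; a sketch:

```python
prior_disease = 0.01    # P(disease)
sensitivity = 0.99      # P(positive | disease)
false_positive = 0.02   # P(positive | healthy)

# Law of total probability, then Bayes' theorem.
p_positive = prior_disease * sensitivity + (1 - prior_disease) * false_positive
print(p_positive)                                # 0.0297
print(prior_disease * sensitivity / p_positive)  # ≈ 0.333
```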
Solution to Problem 5
(1) Are $A$ and $B$ independent?
Idea: the two rolls are independent, so we expect $A$ and $B$ to be independent. Let's verify.
- $P(A) = \dfrac{3}{6} = \dfrac{1}{2}$ (first die even).
- $P(B) = \dfrac{4}{6} = \dfrac{2}{3}$ (second die at least 3).
- $P(A \cap B) = \dfrac{3}{6} \times \dfrac{4}{6} = \dfrac{12}{36} = \dfrac{1}{3}$.
- $P(A) \times P(B) = \dfrac{1}{2} \times \dfrac{2}{3} = \dfrac{1}{3}$.
Since $P(A \cap B) = P(A) \times P(B)$, the events are independent.
(2) Are $A$ and $C$ independent?
Idea: count outcomes carefully and check whether the parity of the first roll changes the probability of the sum being 7.
- $P(C) = \dfrac{6}{36} = \dfrac{1}{6}$ (sum 7: $(1,6),(2,5),(3,4),(4,3),(5,2),(6,1)$, six outcomes).
- $P(A \cap C)$: first die even and sum 7 → $(2,5),(4,3),(6,1)$, three outcomes → $\dfrac{3}{36} = \dfrac{1}{12}$.
- $P(A) \times P(C) = \dfrac{1}{2} \times \dfrac{1}{6} = \dfrac{1}{12}$.
Since $P(A \cap C) = P(A) \times P(C)$, the events are independent.
Why are they independent? No matter what the first die shows, there is exactly one second-die value that makes the sum equal to 7: if the first is 2 the second must be 5; if 4 then 3; if 6 then 1. Each of those joint outcomes has probability $\dfrac{1}{6} \times \dfrac{1}{6} = \dfrac{1}{36}$. So the joint count is $\dfrac{3}{36} = \dfrac{1}{12}$.
The same logic works when the first die is odd (1, 3, or 5): the second die only needs to take the matching values (6, 4, 2). The probability is again $\dfrac{3}{36} = \dfrac{1}{12}$. By symmetry the parity of the first roll does not change the probability of the sum being 7.
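The whole problem reduces to counting the 36 outcomes; a Python sketch (the helper `prob` is ours) confirming both checks:

```python
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))  # 36 equally likely outcomes

def prob(event):
    return Fraction(sum(event(a, b) for a, b in rolls), len(rolls))

A = lambda a, b: a % 2 == 0   # first die even
B = lambda a, b: b >= 3       # second die at least 3
C = lambda a, b: a + b == 7   # sum is 7

for X, Y, name in ((A, B, "A and B"), (A, C, "A and C")):
    joint = prob(lambda a, b: X(a, b) and Y(a, b))
    print(name, "independent:", joint == prob(X) * prob(Y))  # True both times
```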
References
- Conditional probability — Wikipedia
- Bayes' theorem — Wikipedia
- Law of total probability — Wikipedia
- Independence (probability theory) — Wikipedia
- Base rate fallacy — Wikipedia (the misuse of priors highlighted in the DNA example, also known as the prosecutor's fallacy)