Stat Trek

Teach yourself statistics

Hypergeometric Distribution

The probability distribution of a hypergeometric random variable is called a hypergeometric distribution. This lesson describes how hypergeometric random variables, hypergeometric experiments, hypergeometric probability, and the hypergeometric distribution are all related.

Notation

The following notation is helpful when we talk about hypergeometric distributions and hypergeometric probability.

  • N: The number of items in the population.
  • k: The number of items in the population that are classified as successes.
  • n: The number of items in the sample.
  • x: The number of items in the sample that are classified as successes.
  • kCx: The number of combinations of k things, taken x at a time.
  • h(x; N, n, k): hypergeometric probability - the probability that an n-trial hypergeometric experiment results in exactly x successes, when the population consists of N items, k of which are classified as successes.

Hypergeometric Experiments

A hypergeometric experiment is a statistical experiment that has the following properties:

  • A sample of size n is randomly selected without replacement from a population of N items.
  • In the population, k items can be classified as successes, and N - k items can be classified as failures.

Consider the following statistical experiment. You have an urn of 10 marbles - 5 red and 5 green. You randomly select 2 marbles without replacement and count the number of red marbles you have selected. This would be a hypergeometric experiment.

Note that it would not be a binomial experiment. A binomial experiment requires that the probability of success be constant on every trial. With the above experiment, the probability of a success on a given trial depends on the outcomes of the earlier trials. In the beginning, the probability of selecting a red marble is 5/10. If you select a red marble on the first trial, the probability of selecting a red marble on the second trial is 4/9. And if you select a green marble on the first trial, the probability of selecting a red marble on the second trial is 5/9.

Note further that if you selected the marbles with replacement, the probability of success would not change. It would be 5/10 on every trial. Then, this would be a binomial experiment.
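The trial-by-trial probabilities in the marble example can be checked with exact fractions. This is only an illustrative sketch; the helper name prob_red is an assumption made for this example and is not part of the lesson.

```python
from fractions import Fraction

def prob_red(red_left, green_left):
    """Probability that the next marble drawn from the urn is red."""
    return Fraction(red_left, red_left + green_left)

# Without replacement: the contents of the urn change after each draw.
first = prob_red(5, 5)                # 5/10 on the first trial
second_after_red = prob_red(4, 5)     # 4/9 if the first marble was red
second_after_green = prob_red(5, 4)   # 5/9 if the first marble was green

# With replacement: the urn is restored, so the second trial matches the first.
second_with_replacement = prob_red(5, 5)

print(first, second_after_red, second_after_green, second_with_replacement)
```

Fraction reduces 5/10 to 1/2 when it prints. The point is that the second-trial probability differs between the two branches only when the marble is not put back.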

Hypergeometric Distribution

A hypergeometric random variable is the number of successes that result from a hypergeometric experiment. The probability distribution of a hypergeometric random variable is called a hypergeometric distribution.

Given x, N, n, and k, we can compute the hypergeometric probability based on the following formula:

Hypergeometric Formula. Suppose a population consists of N items, k of which are successes. And a random sample drawn from that population consists of n items, x of which are successes. Then the hypergeometric probability is:

h(x; N, n, k) = [ kCx ] [ (N-k)C(n-x) ] / [ NCn ]
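As a rough sketch, the hypergeometric formula can be written as a short Python function. The function name hypergeom_pmf is an assumption chosen for this example. It relies on math.comb from the Python standard library (Python 3.8 or later), and it assumes the inputs satisfy 0 <= k <= N and 0 <= n <= N.

```python
from math import comb

def hypergeom_pmf(x, N, n, k):
    """h(x; N, n, k): probability of exactly x successes in a sample of n items."""
    # A sample of n items cannot contain fewer than 0 or more than n successes.
    if x < 0 or x > n:
        return 0.0
    # comb(a, b) returns 0 when b > a, so impossible counts get probability 0.
    return comb(k, x) * comb(N - k, n - x) / comb(N, n)

# Urn example from this lesson: 10 marbles, 5 red, 2 drawn without replacement.
for x in range(3):
    print(x, hypergeom_pmf(x, 10, 2, 5))
```

For the urn, the three probabilities are 10/45, 25/45, and 10/45, which sum to 1.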

The hypergeometric distribution has the following properties:

  • The mean of the distribution is equal to n * k / N .
  • The variance is n * k * ( N - k ) * ( N - n ) / [ N^2 * ( N - 1 ) ] .
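The two properties above can be checked against a direct summation over the distribution. The sketch below uses the marble urn from this lesson (N = 10, n = 2, k = 5) and exact fractions, so the comparison does not depend on rounding. The helper name pmf is an assumption for this example.

```python
from fractions import Fraction
from math import comb

def pmf(x, N, n, k):
    """Exact hypergeometric probability h(x; N, n, k) as a fraction."""
    return Fraction(comb(k, x) * comb(N - k, n - x), comb(N, n))

N, n, k = 10, 2, 5          # marble urn: 10 marbles, 2 drawn, 5 red
support = range(0, n + 1)   # possible numbers of red marbles in the sample

# Mean and variance computed from the definition of expected value.
mean = sum(x * pmf(x, N, n, k) for x in support)
variance = sum((x - mean) ** 2 * pmf(x, N, n, k) for x in support)

# Mean and variance computed from the closed-form properties.
mean_formula = Fraction(n * k, N)
variance_formula = Fraction(n * k * (N - k) * (N - n), N**2 * (N - 1))

print(mean, mean_formula)
print(variance, variance_formula)
```

For this urn, both routes give a mean of 1 and a variance of 4/9.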

Example 1

Suppose we randomly select 5 cards without replacement from an ordinary deck of playing cards. What is the probability of getting exactly 2 red cards (i.e., hearts or diamonds)?

Solution: This is a hypergeometric experiment in which we know the following:

  • N = 52; since there are 52 cards in a deck.
  • k = 26; since there are 26 red cards in a deck.
  • n = 5; since we randomly select 5 cards from the deck.
  • x = 2; since 2 of the cards we select are red.

We plug these values into the hypergeometric formula as follows:

h(x; N, n, k) = [ kCx ] [ (N-k)C(n-x) ] / [ NCn ]

h(2; 52, 5, 26) = [ 26C2 ] [ 26C3 ] / [ 52C5 ]

h(2; 52, 5, 26) = [ 325 ] [ 2600 ] / [ 2,598,960 ]

h(2; 52, 5, 26) = 0.32513

Thus, the probability of randomly selecting exactly 2 red cards is 0.32513.
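One way to sanity-check this answer is to simulate the experiment. The sketch below is not part of the original solution; it deals many 5-card hands with random.sample (which samples without replacement) and counts how often exactly 2 cards are red. The function name, the number of trials, and the seed are all arbitrary assumptions.

```python
import random

def simulate_two_red(trials=50_000, seed=0):
    """Estimate the probability of exactly 2 red cards in a 5-card hand."""
    rng = random.Random(seed)                 # fixed seed for repeatable runs
    deck = ["red"] * 26 + ["black"] * 26      # only the color matters here
    hits = 0
    for _ in range(trials):
        hand = rng.sample(deck, 5)            # 5 cards, without replacement
        if hand.count("red") == 2:
            hits += 1
    return hits / trials

estimate = simulate_two_red()
print(round(estimate, 3))
```

The estimate should land close to 0.325, though it will not match the exact value to every decimal place, since it is based on a finite number of simulated hands.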

Cumulative Hypergeometric Probability

A cumulative hypergeometric probability refers to the probability that the hypergeometric random variable is greater than or equal to some specified lower limit and less than or equal to some specified upper limit.

For example, suppose we randomly select five cards from an ordinary deck of playing cards. We might be interested in the cumulative hypergeometric probability of obtaining 2 or fewer hearts. This would be the probability of obtaining 0 hearts plus the probability of obtaining 1 heart plus the probability of obtaining 2 hearts, as shown in the example below.

Example 2

Suppose we randomly select 5 cards without replacement from an ordinary deck of playing cards. What is the probability of obtaining 2 or fewer hearts?

Solution: This is a hypergeometric experiment in which we know the following:

  • N = 52; since there are 52 cards in a deck.
  • k = 13; since there are 13 hearts in a deck.
  • n = 5; since we randomly select 5 cards from the deck.
  • x = 0 to 2; since our selection includes 0, 1, or 2 hearts.

We plug these values into the hypergeometric formula as follows:

h(x ≤ 2; N, n, k) = h(x ≤ 2; 52, 5, 13)

h(x ≤ 2; 52, 5, 13) = h(x = 0; 52, 5, 13) + h(x = 1; 52, 5, 13) + h(x = 2; 52, 5, 13)

h(x ≤ 2; 52, 5, 13) = [ (13C0) (39C5) / (52C5) ] + [ (13C1) (39C4) / (52C5) ] + [ (13C2) (39C3) / (52C5) ]

h(x ≤ 2; 52, 5, 13) = [ (1)(575,757)/(2,598,960) ] + [ (13)(82,251)/(2,598,960) ] + [ (78)(9,139)/(2,598,960) ]

h(x ≤ 2; 52, 5, 13) = [ 0.2215 ] + [ 0.4114 ] + [ 0.2743 ]

h(x ≤ 2; 52, 5, 13) = 0.9072

Thus, the probability of randomly selecting at most 2 hearts is 0.9072.
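The same sum can be sketched in a few lines of Python. The function name hypergeom_cdf is an assumption for this example. It adds up the individual hypergeometric probabilities from 0 through the upper limit, and it assumes the upper limit is no larger than the sample size n.

```python
from math import comb

def hypergeom_cdf(x_max, N, n, k):
    """Probability of x_max or fewer successes in a sample of n items."""
    total = 0.0
    for x in range(0, x_max + 1):
        # Each term is h(x; N, n, k) from the hypergeometric formula.
        total += comb(k, x) * comb(N - k, n - x) / comb(N, n)
    return total

# Example 2: 5 cards from a 52-card deck, 13 hearts, at most 2 hearts.
p = hypergeom_cdf(2, 52, 5, 13)
print(round(p, 4))
```

Rounded to four decimal places, the result is 0.9072, matching the hand calculation.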

Hypergeometric Calculator

As you surely noticed, the hypergeometric formula requires many time-consuming computations. The Stat Trek Hypergeometric Calculator can do this work for you quickly and easily. Use the Hypergeometric Calculator to compute hypergeometric probabilities and cumulative hypergeometric probabilities. The calculator is free. It can be found in the Stat Trek main menu under the Stat Tools tab.