Normal Approximation to the Binomial

The normal approximation to the binomial distribution is a method for estimating binomial probabilities when the sample size is large and the probability of success (P) is not too close to 0 or 1.

Logic for Normal Approximation

This approximation relies on the fact that, under certain conditions, the binomial distribution can be closely approximated by a normal distribution. Here's the logic.

Binomial Distribution Characteristics

The binomial distribution models the number of successes (x) in a fixed number of independent trials (n), each with the same probability of success (P). The mean and standard deviation of the binomial distribution are given by:

Mean = μ = n * P
Standard deviation = σ = sqrt [ n * P * (1 - P) ]

The probability of having exactly x successes (out of n trials) is given by the binomial formula:

\[ P(X = x) = \binom{n}{x} P^x (1 - P)^{n - x} \]

And the proportion of successes in the sample of n trials is p = x/n.
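As a sketch (not part of the original lesson), the binomial formula above translates directly into Python using the standard library's `math.comb`. The helper names `binomial_pmf` and `binomial_cdf` are hypothetical, chosen for illustration; the cumulative version simply adds the formula for 0, 1, ..., x successes.

```python
from math import comb


def binomial_pmf(x: int, n: int, P: float) -> float:
    """Exact P(X = x): probability of exactly x successes in n trials."""
    return comb(n, x) * P**x * (1 - P) ** (n - x)


def binomial_cdf(x: int, n: int, P: float) -> float:
    """Exact P(X <= x), obtained by adding the pmf for 0, 1, ..., x successes."""
    return sum(binomial_pmf(k, n, P) for k in range(x + 1))


# 120 trials with P = 0.5 (the newborn example used later in this lesson)
n, P = 120, 0.5
p = 48 / n                        # sample proportion p = x/n for x = 48
print(p)                          # 0.4
print(binomial_cdf(48, n, P))     # exact probability of 48 or fewer successes
```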

What Is a Combination?

A combination represents the number of ways to choose x items from a set of n items without regard to order. The notation for a combination is typically written as:

\[ \binom{n}{x} \quad \text{or} \quad {}_nC_x \]

This is read as "n choose x." And the formula for a combination is:

\[ \binom{n}{x} = {}_nC_x = \frac{n(n - 1)(n - 2) \cdots (n - x + 1)}{x!} = \frac{n!}{x!\,(n - x)!} \]

where x! is the factorial of x, and (n−x)! is the factorial of (n−x).
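The two formulas for a combination always agree. The short Python check below (an illustration, not from the original lesson) computes "n choose x" both ways and compares the results with `math.comb` from the standard library; `choose_falling` and `choose_factorial` are hypothetical names.

```python
from math import comb, factorial, prod


def choose_falling(n: int, x: int) -> int:
    """n(n - 1)(n - 2)...(n - x + 1) / x!"""
    return prod(range(n - x + 1, n + 1)) // factorial(x)


def choose_factorial(n: int, x: int) -> int:
    """n! / (x! (n - x)!)"""
    return factorial(n) // (factorial(x) * factorial(n - x))


# Both formulas match the standard library for several (n, x) pairs
for n, x in [(5, 2), (10, 0), (10, 3), (120, 48)]:
    assert choose_falling(n, x) == choose_factorial(n, x) == comb(n, x)

print(choose_factorial(5, 2))  # 10 ways to choose 2 items from 5
```

Integer division (`//`) is exact here because the product of x consecutive integers is always divisible by x!.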

The Central Limit Theorem

The Central Limit Theorem states that the sampling distribution of the sample mean (or sum) of a large number of independent and identically distributed random variables will approximate a normal distribution, regardless of the shape of the original population distribution.

For a binomial distribution:

Each trial results in one of two possible outcomes: success (coded 1) or failure (coded 0).
The number of successes in n independent trials (each with probability P) is the sum of these 0/1 outcomes, which is a binomial random variable x.
For large n, the distribution of x (or the sample proportion p = x/n) will approximate a normal distribution.

Because the binomial distribution approaches normality as sample size increases, we can use the normal distribution to approximate the binomial distribution in certain situations.
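A quick simulation can make this concrete. The sketch below is an illustration only: the helper name `simulate_binomial`, the seed, and the number of repetitions are arbitrary choices. It builds each binomial value as a sum of n 0/1 trials, as described above, then compares the simulated mean and standard deviation with nP and sqrt[nP(1 - P)], and reports the share of values within one and two standard deviations of the mean (roughly 68% and 95% for a normal shape).

```python
import random
from math import sqrt
from statistics import mean, pstdev


def simulate_binomial(n: int, P: float, reps: int, seed: int = 1) -> list[int]:
    """Draw `reps` binomial values, each the sum of n independent 0/1 trials."""
    rng = random.Random(seed)
    return [sum(1 if rng.random() < P else 0 for _ in range(n)) for _ in range(reps)]


n, P = 120, 0.5
draws = simulate_binomial(n, P, reps=10_000)

mu, sigma = n * P, sqrt(n * P * (1 - P))
print(f"simulated mean {mean(draws):.2f} vs nP = {mu}")
print(f"simulated SD   {pstdev(draws):.3f} vs sqrt[nP(1 - P)] = {sigma:.3f}")

# Share of draws within 1 and 2 SDs of the mean (normal curve: about 68% and 95%)
within_1 = sum(abs(d - mu) <= sigma for d in draws) / len(draws)
within_2 = sum(abs(d - mu) <= 2 * sigma for d in draws) / len(draws)
print(f"within 1 SD: {within_1:.3f}, within 2 SD: {within_2:.3f}")
```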

When to Use the Normal Approximation

It is safe to use the normal approximation to the binomial when the following conditions are true:

Population size (N) is at least 20 times the sample size (n). (Some sources accept a population only 10 times the sample size.)
The sampling method is simple random sampling.
n * p ≥ 10, where p is the sample proportion.
n * (1 - p) ≥ 10.
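The conditions above are easy to check mechanically. Here is a minimal sketch; `normal_approx_ok` is a hypothetical helper that uses the 20-times rule by default and lets you relax it to 10. Whether the sample was drawn by simple random sampling cannot be verified from the numbers, so that condition is left to the analyst.

```python
def normal_approx_ok(N: float, n: int, p: float,
                     min_ratio: float = 20, min_count: float = 10) -> bool:
    """Rule-of-thumb check for the normal approximation to the binomial.

    N: population size (use float("inf") for an essentially infinite population)
    n: sample size
    p: sample proportion, as in the conditions listed above
    min_ratio: required ratio N / n (20 by default; some sources accept 10)
    Simple random sampling is assumed, since it cannot be checked from these numbers.
    """
    large_population = N >= min_ratio * n
    enough_successes = n * p >= min_count
    enough_failures = n * (1 - p) >= min_count
    return large_population and enough_successes and enough_failures


print(normal_approx_ok(float("inf"), 120, 0.4))        # True
print(normal_approx_ok(2000, 120, 0.4))                # False: 2000 < 20 * 120
print(normal_approx_ok(2000, 120, 0.4, min_ratio=10))  # True: 2000 >= 10 * 120
print(normal_approx_ok(float("inf"), 20, 0.1))         # False: 20 * 0.1 = 2 < 10
```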

By applying the normal approximation, you gain computational efficiency without sacrificing much accuracy, provided the necessary conditions are met. If you need highly precise results, use the exact binomial distribution instead (as described in the previous lesson).

How to Conduct the Analysis

There are two ways to conduct a normal approximation of the binomial distribution: (1) with a continuity correction and (2) without a continuity correction.

Normal Approximation With Continuity Correction

To conduct an analysis using the normal approximation with a continuity correction, follow these four steps: (1) compute the mean and standard deviation, (2) apply a continuity correction, (3) calculate the z-score, and (4) find the probability.

Step 1. Compute the mean and standard deviation. For a binomial random variable (X), the mean (μ) and the standard deviation (σ) are:
μ = nP

σ = sqrt [ n * P * (1 - P) ]

where P is the probability of success on any given trial (i.e., the proportion of successes in the population), and n is the number of trials.
Step 2. Apply a continuity correction. The binomial distribution is a discrete probability distribution, while the normal distribution is a continuous probability distribution. To define a normal probability that corresponds to a specified binomial probability, we adjust the number of successes in the sample (x) by 0.5, as follows (X is the binomial variable; Y is the approximating normal variable):
- Instead of P(X ≤ x), use P(Y ≤ x + 0.5).
- Instead of P(X ≥ x), use P(Y ≥ x - 0.5).
- Instead of P(X = x), use P(x - 0.5 ≤ Y ≤ x + 0.5).
Step 3. Calculate the z-score. Enter the continuity-corrected value for the number of successes (x ± 0.5) into the z-score formula:
z = (x - μ) / σ

For example, if you want P(X ≥ x), use the continuity-corrected value (x - 0.5) in the z-score formula:

z = [ (x - 0.5) - μ ] / σ
Step 4. Find the probability. Use a standard normal distribution table, statistical software, or an online calculator (like Stat Trek's Normal Distribution Calculator) to find the probability that corresponds to the calculated z-score.
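The four steps above can be sketched in Python with the standard library's `statistics.NormalDist`, which plays the role of the normal table or calculator in the last step. The function name `normal_approx_cc` and its `kind` argument are hypothetical; the function handles the three cases listed under the continuity-correction step.

```python
from math import sqrt
from statistics import NormalDist


def normal_approx_cc(x: int, n: int, P: float, kind: str = "<=") -> float:
    """Normal approximation to a binomial probability, with a continuity correction.

    kind "<=" approximates P(X <= x), ">=" approximates P(X >= x),
    and "==" approximates P(X = x).
    """
    # Step 1: mean and standard deviation of the binomial
    mu = n * P
    sigma = sqrt(n * P * (1 - P))
    std_normal = NormalDist()  # mean 0, standard deviation 1

    def z(value: float) -> float:
        # Step 3: z-score of a continuity-corrected value
        return (value - mu) / sigma

    # Steps 2 and 4: shift x by 0.5, then look up the standard normal probability
    if kind == "<=":
        return std_normal.cdf(z(x + 0.5))
    if kind == ">=":
        return 1 - std_normal.cdf(z(x - 0.5))
    if kind == "==":
        return std_normal.cdf(z(x + 0.5)) - std_normal.cdf(z(x - 0.5))
    raise ValueError('kind must be "<=", ">=", or "=="')


print(round(normal_approx_cc(48, 120, 0.5, "<="), 4))  # P(X <= 48), as in Problem 1
print(round(normal_approx_cc(60, 120, 0.5, "=="), 4))  # P(X = 60)
```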

For a step-by-step example that shows how to conduct an analysis using the normal approximation with a continuity correction, see Problem 1.

Normal Approximation Without Continuity Correction

The normal approximation to the binomial without a continuity correction is identical to the version with a continuity correction, except that we skip the continuity-correction step. To conduct the analysis without a continuity correction, follow these three steps: (1) compute the mean and standard deviation, (2) calculate the z-score, and (3) find the probability.

Step 1. Compute the mean and standard deviation. For a binomial random variable (X), the mean (μ) and the standard deviation (σ) are:
μ = nP

σ = sqrt [ n * P * (1 - P) ]

where P is the probability of success on any given trial (i.e., the proportion of successes in the population), and n is the number of trials.
Step 2. Calculate the z-score. Enter the number of successes (x) into the z-score formula:
z = (x - μ) / σ
Step 3. Find the probability. Use a standard normal distribution table, statistical software, or an online calculator (like Stat Trek's Normal Distribution Calculator) to find the probability that corresponds to the calculated z-score.
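For comparison, here is a sketch of the three-step version, again using `statistics.NormalDist` in place of a table or calculator. The name `normal_approx_plain` is hypothetical. Without a continuity correction, the normal curve assigns zero probability to any single value, so this sketch handles only P(X ≤ x) and P(X ≥ x).

```python
from math import sqrt
from statistics import NormalDist


def normal_approx_plain(x: int, n: int, P: float, kind: str = "<=") -> float:
    """Normal approximation to a binomial probability, without a continuity correction."""
    mu = n * P                        # Step 1: mean
    sigma = sqrt(n * P * (1 - P))     # Step 1: standard deviation
    z = (x - mu) / sigma              # Step 2: z-score of x itself (no +/- 0.5)
    std_normal = NormalDist()         # Step 3: standard normal probabilities
    if kind == "<=":
        return std_normal.cdf(z)
    if kind == ">=":
        return 1 - std_normal.cdf(z)
    raise ValueError('kind must be "<=" or ">="')


print(round(normal_approx_plain(48, 120, 0.5), 4))  # P(X <= 48), as in Problem 2
```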

For a step-by-step example that shows how to conduct an analysis using the normal approximation without a continuity correction, see Problem 2.

Continuity Correction: Pros and Cons

The main appeal of the normal approximation with a continuity correction is accuracy. It generally produces a better approximation of the true binomial probabilities, especially when sample size is small or when the true population proportion is close to zero or one.

The main appeal of the normal approximation without a continuity correction is simplicity: the calculations are less complicated, and skipping the correction saves time.

In practice, the continuity correction is most beneficial when sample sizes are small and/or when the probability of success is very small (smaller than 0.1) or very big (bigger than 0.9). When the probability of success is not too extreme (neither very big nor very small) and the sample size is large, omitting the continuity correction does little harm and simplifies the process.
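One way to see this trade-off is to compare both approximations with the exact binomial probabilities P(X ≤ x) over every possible x and report the largest error. The sketch below does that for a few (n, P) pairs; the helper names and the particular pairs are arbitrary choices for illustration, and the printed errors are whatever the code computes rather than figures from this lesson.

```python
from math import comb, sqrt
from statistics import NormalDist


def exact_cdf(x: int, n: int, P: float) -> float:
    """Exact binomial P(X <= x)."""
    return sum(comb(n, k) * P**k * (1 - P) ** (n - k) for k in range(x + 1))


def approx_cdf(x: int, n: int, P: float, correction: bool) -> float:
    """Normal approximation to P(X <= x), with or without the +0.5 correction."""
    mu, sigma = n * P, sqrt(n * P * (1 - P))
    value = x + 0.5 if correction else x
    return NormalDist(mu, sigma).cdf(value)


def max_error(n: int, P: float, correction: bool) -> float:
    """Largest absolute error of the approximation over x = 0, 1, ..., n."""
    return max(abs(exact_cdf(x, n, P) - approx_cdf(x, n, P, correction))
               for x in range(n + 1))


for n, P in [(20, 0.1), (20, 0.5), (120, 0.5)]:
    with_cc = max_error(n, P, correction=True)
    without_cc = max_error(n, P, correction=False)
    print(f"n={n:>3}, P={P}: max error with correction {with_cc:.4f}, without {without_cc:.4f}")
```

The (20, 0.1) case deliberately violates the nP ≥ 10 rule of thumb, to show how both approximations behave when P is extreme and the sample is small.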

Test Your Understanding

In this section, we work through two problems to illustrate how to conduct an analysis using the normal approximation to the binomial. Problem 1 conducts the analysis with a continuity correction; Problem 2, without a continuity correction. For both problems, we use the Normal Distribution Calculator to compute probability.

Normal Distribution Calculator

The Normal Distribution Calculator solves common statistical problems based on the normal distribution. It computes cumulative probabilities from three simple inputs. Simple instructions guide you to an accurate solution, quickly and easily. If anything is unclear, frequently asked questions and sample problems provide straightforward explanations. The calculator is free, and it can be found in the Stat Trek main menu under the Stat Tools tab.

Problem 1: Normal Approximation With Continuity Correction

Suppose it were possible to take a simple random sample of 120 newborns. Find the probability that no more than 48 of the newborns (40%) will be boys. Assume equal probabilities for the births of boys and girls. Assume also that the number of births in the population (N) is very large, essentially infinite.

To solve this problem, use the normal approximation to the binomial with a continuity correction.

Solution:

This problem satisfies the conditions that allow us to use the normal approximation to the binomial.

Population size (N = ∞) is at least 20 times sample size (n = 120).
The sampling method is simple random sampling.
n * p = 120 * 0.4 = 48 ≥ 10, where p is the sample proportion (48/120 = 0.4).
n * (1 - p) = 120 * 0.6 = 72 ≥ 10.

Therefore, we can use the normal approximation to the binomial to find probability. We'll use the version with a continuity correction.

Step 1. Compute the mean and standard deviation. In the population, 50% of births are boys, so the mean is:
μ = nP = 120 * 0.5 = 60

And the standard deviation is:

σ = sqrt [ n * P * (1-P) ]

σ = sqrt ( 120 * 0.5 * 0.5 ) = 5.477
Step 2. Apply a continuity correction to the number of successes (x) in the sample. Instead of P(X ≤ 48), use P(Y ≤ 48.5), where Y is the approximating normal random variable.
Step 3. Calculate the z-score. In this problem, x (with a continuity correction) is 48.5, and the z-score formula is:
z = (x - μ) / σ = (48.5 - 60) / 5.477 = -2.1
Step 4. Find the probability. To find this probability, we use Stat Trek's Normal Distribution Calculator. Specifically, we enter the following inputs: -2.1, for the z-score; 0, for the mean; and 1, for the standard deviation. (It is not necessary to compute the mean or standard deviation of the z-score, because every z-score has a mean of 0 and a standard deviation of 1.)

The Calculator tells us that the probability that the number of male births in our sample will be no greater than 48 is 0.01786. Not very likely.

Note: In the previous lesson, we explained how to solve this problem using the binomial formula. The binomial approach is actually the more exact analysis. When this problem is treated as a binomial experiment, we find a probability of 0.01766 (versus a probability of 0.01786 that we found using the normal distribution). Though not an exact match, the normal approximation to the binomial with a continuity correction is very close.
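The four steps of Problem 1 can be reproduced in a few lines of Python as a check. This is a sketch using only the standard library: `NormalDist` stands in for Stat Trek's calculator, and the exact binomial sum is included for comparison with the 0.01766 quoted above. It also confirms the parenthetical remark in Step 4: looking up z on a normal curve with mean 0 and standard deviation 1 gives the same probability as looking up 48.5 on a normal curve with mean μ and standard deviation σ.

```python
from math import comb, isclose, sqrt
from statistics import NormalDist

n, P, x = 120, 0.5, 48

mu = n * P                      # Step 1: mean
sigma = sqrt(n * P * (1 - P))   # Step 1: standard deviation
x_cc = x + 0.5                  # Step 2: P(X <= 48) becomes P(Y <= 48.5)
z = (x_cc - mu) / sigma         # Step 3: z-score
approx = NormalDist().cdf(z)    # Step 4: standard normal probability

# Same probability without standardizing: normal curve with mean mu and SD sigma
assert isclose(approx, NormalDist(mu, sigma).cdf(x_cc))

exact = sum(comb(n, k) * P**k * (1 - P) ** (n - k) for k in range(x + 1))
print(f"mu = {mu}, sigma = {sigma:.3f}, z = {z:.2f}")
print(f"normal approximation = {approx:.5f}, exact binomial = {exact:.5f}")
```

Because the code does not round z to -2.1 before looking up the probability, its normal-approximation result may differ slightly from 0.01786 in the last digit or so.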

Problem 2: Normal Approximation Without Continuity Correction

Suppose it were possible to take a simple random sample of 120 newborns. Find the probability that no more than 48 of the newborns (40%) will be boys. Assume equal probabilities for the births of boys and girls. Assume also that the number of births in the population (N) is very large, essentially infinite.

To solve this problem, use the normal approximation to the binomial without a continuity correction.

Solution:

This problem satisfies the conditions that allow us to use the normal approximation to the binomial.

Population size (N = ∞) is at least 20 times sample size (n = 120).
The sampling method is simple random sampling.
n * p = 120 * 0.4 = 48 ≥ 10, where p is the sample proportion (48/120 = 0.4).
n * (1 - p) = 120 * 0.6 = 72 ≥ 10.

Therefore, we can use the normal approximation to the binomial to find probability. We'll use the version without a continuity correction.

Step 1. Compute the mean and standard deviation. In the population, 50% of births are boys, so the mean is:
μ = nP = 120 * 0.5 = 60

And the standard deviation is:

σ = sqrt [ n * P * (1-P) ]

σ = sqrt ( 120 * 0.5 * 0.5 ) = 5.477
Step 2. Calculate the z-score. In this problem, x is 48, and the z-score formula is:
z = (x - μ) / σ = (48 - 60) / 5.477 = -2.19
Step 3. Find the probability. To find this probability, we use Stat Trek's Normal Distribution Calculator. Specifically, we enter the following inputs: -2.19, for the z-score; 0, for the mean; and 1, for the standard deviation. (It is not necessary to compute the mean or standard deviation of the z-score, because every z-score has a mean of 0 and a standard deviation of 1.)

The Calculator tells us that the probability that the number of male births in our sample will be no greater than 48 is 0.01426.

Note: In the previous lesson, we explained how to solve this problem using the binomial formula. The binomial approach is actually the more exact analysis. When this problem is treated as a binomial experiment, we find a probability of 0.01766. The analysis with the continuity correction (from Problem 1) found a probability of 0.01786, which was very close to the exact probability computed using the binomial formula. The present analysis without a continuity correction found a probability of 0.01426. This illustrates the improvement in accuracy that can result when the continuity correction is used.
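To reproduce the comparison in this note, the sketch below computes all three probabilities side by side with the Python standard library and prints each value's error relative to the exact binomial probability. The variable names are illustrative; because z is not rounded before the lookup, the last digits can differ slightly from the calculator values quoted above.

```python
from math import comb, sqrt
from statistics import NormalDist

n, P, x = 120, 0.5, 48
mu, sigma = n * P, sqrt(n * P * (1 - P))
std_normal = NormalDist()

exact = sum(comb(n, k) * P**k * (1 - P) ** (n - k) for k in range(x + 1))
with_cc = std_normal.cdf((x + 0.5 - mu) / sigma)    # Problem 1 (continuity correction)
without_cc = std_normal.cdf((x - mu) / sigma)       # Problem 2 (no correction)

for label, value in [("exact binomial", exact),
                     ("normal, with correction", with_cc),
                     ("normal, without correction", without_cc)]:
    print(f"{label:<28} {value:.5f}  error {value - exact:+.5f}")
```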
