Teach yourself statistics

Confidence Interval of a Proportion

This lesson describes how to construct a confidence interval for a population proportion, based on the sample proportion p, when the sample size is large.

Key Considerations

The approach described in this lesson is valid whenever the following conditions are met:

The sampling method is simple random sampling.
The population size is at least 20 times larger than the sample size.
n * p ≥ 10, where p is the sample proportion and n is the sample size.
n * (1 - p) ≥ 10.

The third and fourth conditions will be satisfied whenever the sample includes at least 10 successes and 10 failures. The minimum sample size required to produce at least 10 successes and at least 10 failures would be 20. But if the population proportion were extreme (i.e., close to 0 or 1), it is likely that a much larger random sample would be needed to produce at least 10 successes and 10 failures.

For example, imagine that the probability of success were 0.1, and the sample were selected using simple random sampling. In this situation, a sample size close to 100 might be needed to get 10 successes.
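The conditions above are simple arithmetic checks, so they can be sketched in a few lines of Python. This is a minimal sketch, not Stat Trek's method: the function names `large_sample_conditions` and `rough_sample_size` are hypothetical, and `rough_sample_size` assumes the rule of thumb that the *expected* counts n * P and n * (1 - P) should each reach 10. As noted above, a particular random sample can still fall short of that.

```python
import math


def large_sample_conditions(p_hat: float, n: int, population_size: int) -> dict:
    """Check the conditions listed above for a sample proportion p_hat (hypothetical helper)."""
    return {
        "successes_at_least_10": n * p_hat >= 10,
        "failures_at_least_10": n * (1 - p_hat) >= 10,
        "population_at_least_20n": population_size >= 20 * n,
    }


def rough_sample_size(p: float, min_count: int = 10) -> int:
    """Smallest n whose EXPECTED successes and failures both reach min_count (rule of thumb).

    The tiny tolerance guards against floating-point round-off (1 - 0.9 is not exactly 0.1).
    """
    return math.ceil(min_count / min(p, 1 - p) - 1e-9)


print(large_sample_conditions(0.40, 1600, 100_000))  # all three checks are True
print(rough_sample_size(0.5))  # 20, the minimum mentioned above
print(rough_sample_size(0.1))  # 100, matching the example with a success probability of 0.1
```

The extreme-proportion case is symmetric: a success probability of 0.9 needs about 100 observations to expect 10 failures.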

The Variability of the Sample Proportion

To construct a confidence interval for a proportion, we need to know the variability of the sample proportion. This means we need to know how to compute the standard deviation or the standard error of the sampling distribution.

Standard deviation: Suppose k possible samples of size n can be selected from the population. The standard deviation of the sampling distribution is the "average" deviation between the k sample proportions and the true population proportion, P. The standard deviation of the sample proportion (SD) is:
SD = sqrt[ P * ( 1 - P ) / n ] * sqrt[ ( N - n ) / ( N - 1 ) ]
where P is the population proportion, n is the sample size, and N is the population size. When the population size is much larger (at least 20 times larger) than the sample size, the standard deviation can be approximated by:
SD = sqrt[ P * ( 1 - P ) / n ]
Standard error: When the true population proportion P is not known, the standard deviation of the sampling distribution cannot be calculated. Under these circumstances, use the standard error. The standard error (SE) can be calculated from the equation below.
SE = sqrt[ p * ( 1 - p ) / n ] * sqrt[ ( N - n ) / ( N - 1 ) ]
where p is the sample proportion, n is the sample size, and N is the population size. When the population size is at least 20 times larger than the sample size, the standard error can be approximated by:
SE = sqrt[ p * ( 1 - p ) / n ]
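Because the SD and SE formulas differ only in whether the population proportion P or the sample proportion p goes in, one function can sketch both. This is an illustrative sketch; the name `proportion_spread` is hypothetical. It compares the full formula (with the finite population correction sqrt[ ( N - n ) / ( N - 1 ) ]) against the approximate one, using the numbers from Problem 1 later in this lesson.

```python
import math
from typing import Optional


def proportion_spread(prop: float, n: int, N: Optional[int] = None) -> float:
    """sqrt[prop * (1 - prop) / n], times the finite population correction when N is given.

    Pass the population proportion P to get the standard deviation (SD),
    or the sample proportion p to get the standard error (SE).
    """
    spread = math.sqrt(prop * (1 - prop) / n)
    if N is not None:
        # Finite population correction: sqrt[(N - n) / (N - 1)]
        spread *= math.sqrt((N - n) / (N - 1))
    return spread


# Sample proportion 0.40, sample of 1,600, population of 100,000.
se_exact = proportion_spread(0.40, 1600, N=100_000)
se_approx = proportion_spread(0.40, 1600)
print(round(se_exact, 5), round(se_approx, 5))  # 0.01215 0.01225
```

Here the population is 62.5 times the sample, so the correction factor is about 0.992 and the two results agree to roughly three decimal places. That is why the approximate formula is acceptable once the population is at least 20 times the sample size.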

Alert

The Advanced Placement Statistics Examination only covers the "approximate" formulas for the standard deviation and standard error.

SD = sqrt[ P * ( 1 - P ) / n ]

SE = sqrt[ p * ( 1 - p ) / n ]

However, students are expected to be aware of the limitations of these formulas; namely, the approximate formulas should only be used when the population size is at least 20 times larger than the sample size and when the sampling method is simple random sampling.

How to Find the Confidence Interval for a Proportion

Previously, we described how to construct confidence intervals. For convenience, we repeat the five steps below.

Choose the confidence level. The confidence level describes the uncertainty of a sampling plan. Often, researchers choose 90%, 95%, or 99% confidence levels; but any percentage can be used.
Compute the standard deviation or standard error. The standard deviation (SD) of the sampling distribution for a proportion and the standard error (SE) can be computed from the following formulas:
SD = sqrt[ P * ( 1 - P ) / n ]

SE = sqrt[ p * ( 1 - p ) / n ]
Find the critical value. The critical value for a sample proportion will be a z-score. The value of the z-score depends on the confidence level. Common z-score critical values are 1.645 for a 90% confidence level, 1.96 for a 95% confidence level, and 2.576 for a 99% confidence level. Instructions for finding other z-score critical values are provided in the lesson on margin of error.
Find the margin of error. You can compute the margin of error (ME) using one of the following equations.
ME = CV * SD

ME = CV * SE

where CV is the z-score critical value, SD is the standard deviation of the sampling distribution for the proportion, and SE is the standard error.
Specify the confidence interval. The uncertainty is denoted by the confidence level. And the range of the confidence interval (CI) is defined by the following equation.
CI = p ± ME

where p is the sample proportion and ME is the margin of error.
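The five steps above translate directly into code. This is a minimal sketch using the approximate (large-population) formulas; the function names are hypothetical. Instead of a z table, it assumes Python's standard-library `statistics.NormalDist().inv_cdf` to find the two-sided z-score critical value for any confidence level.

```python
import math
from statistics import NormalDist
from typing import Optional, Tuple


def z_critical_value(confidence_level: float) -> float:
    """Step 3: two-sided z critical value for a confidence level such as 0.95."""
    alpha = 1 - confidence_level
    # The critical value leaves alpha / 2 in the upper tail of the standard normal.
    return NormalDist().inv_cdf(1 - alpha / 2)


def proportion_confidence_interval(
    p_hat: float, n: int, confidence_level: float = 0.95, P: Optional[float] = None
) -> Tuple[float, float]:
    """Steps 2-5 using the approximate (large-population) formulas."""
    # Step 2: use the SD if the population proportion P is known, otherwise the SE.
    prop = P if P is not None else p_hat
    spread = math.sqrt(prop * (1 - prop) / n)
    # Step 3: z-score critical value.
    cv = z_critical_value(confidence_level)
    # Step 4: margin of error.
    me = cv * spread
    # Step 5: the interval is the sample proportion plus or minus the margin of error.
    return p_hat - me, p_hat + me


for level in (0.90, 0.95, 0.99):
    print(level, round(z_critical_value(level), 3))  # 1.645, 1.96, 2.576
```

Step 1 (choosing the confidence level) remains a judgment call, so it appears only as the `confidence_level` argument.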

Later in this lesson, we work through a problem that shows how to use this approach to construct a confidence interval for a proportion.

Sample Size Calculator

As you may have noticed, the five steps required to specify a confidence interval for a proportion can involve many time-consuming computations. Stat Trek's Sample Size Calculator does this work for you: quickly, easily, and error-free. In addition to constructing a confidence interval, the calculator creates a summary report that lists key findings and documents analytical techniques. Whenever you need to construct a confidence interval, consider using the Sample Size Calculator. The calculator is free. It can be found in the Stat Trek main menu under the Stat Tools tab.


Test Your Understanding

Problem 1

The Daily Planet, a major metropolitan newspaper, selected a simple random sample of 1,600 readers from its list of 100,000 subscribers. It asked them whether the paper should increase its coverage of local news. Forty percent of the sample wanted more local news. What is the 99% confidence interval for the proportion of readers who would like more coverage of local news?

(A) 0.30 to 0.50
(B) 0.32 to 0.48
(C) 0.35 to 0.45
(D) 0.37 to 0.43
(E) 0.39 to 0.41

Solution

The answer is (D). The approach that we used to solve this problem is valid when the following conditions are met.

The sampling method must be simple random sampling. This condition is satisfied; the problem statement says that the newspaper used simple random sampling.
The sample should include at least 10 successes and 10 failures. Suppose we classify a "more local news" response as a success, and any other response as a failure. Then we have 0.40 * 1600 = 640 successes and 0.60 * 1600 = 960 failures, plenty of successes and failures.
The population size should be at least 20 times larger than the sample size, so that we can use an "approximate" formula for the standard deviation or the standard error. This condition is satisfied: the 100,000 subscribers are 62.5 times the sample size of 1,600, so we will use an "approximate" formula to compute the standard error.

Since the above requirements are satisfied, we can use the following five-step approach to construct a confidence interval.

Select a confidence level. In this analysis, the confidence level is defined for us in the problem. We are working with a 99% confidence level.
Compute the standard deviation or standard error. Since we do not know the population proportion, we cannot compute the standard deviation; therefore, we compute the standard error. And since the population is more than 20 times larger than the sample, we can use the following "approximate" formula to compute the standard error (SE) of the proportion:
SE = sqrt [ p(1 - p) / n ]

SE = sqrt [ (0.4)*(0.6) / 1600 ] = 0.012
Find critical value. The critical value (CV) is a factor used to compute the margin of error. We will express the critical value as a z-score.
Common z-score critical values are 1.645 for a 90% confidence level, 1.96 for a 95% confidence level, and 2.576 for a 99% confidence level. So, the z-score for this problem will be 2.576.
Find the margin of error (ME):
ME = CV * SE

ME = 2.576 * 0.012 ≈ 0.03
Specify the confidence interval. The range of the confidence interval is defined by the sample proportion ± margin of error. And the uncertainty is denoted by the confidence level.

Therefore, the 99% confidence interval is the range defined by 0.4 ± 0.03. That is, the 99% confidence interval is 0.37 to 0.43. Here is what that actually means. If we replicated the study many times (i.e., used the same sampling plan with different samples), the sampling plan we used should produce a confidence interval that includes the true population proportion 99% of the time.
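To double-check the arithmetic in this solution, here is a short sketch that repeats each step without rounding the standard error first and then picks the closest answer choice. The variable names are illustrative. The unrounded margin of error is about 0.0315 rather than 0.0309, but both round to 0.03.

```python
import math

p_hat, n, N = 0.40, 1600, 100_000
# Conditions: at least 10 successes and failures; population at least 20 times the sample.
assert n * p_hat >= 10 and n * (1 - p_hat) >= 10
assert N >= 20 * n

se = math.sqrt(p_hat * (1 - p_hat) / n)  # approximate standard error, about 0.0122
cv = 2.576                                # z critical value for a 99% confidence level
me = cv * se                              # margin of error, about 0.0315
lower, upper = p_hat - me, p_hat + me

choices = {"A": (0.30, 0.50), "B": (0.32, 0.48), "C": (0.35, 0.45),
           "D": (0.37, 0.43), "E": (0.39, 0.41)}
# Pick the answer choice whose endpoints are closest to the computed interval.
answer = min(choices, key=lambda k: abs(choices[k][0] - lower) + abs(choices[k][1] - upper))
print(f"SE = {se:.4f}, ME = {me:.4f}, CI = ({lower:.2f}, {upper:.2f}), answer = {answer}")
# SE = 0.0122, ME = 0.0315, CI = (0.37, 0.43), answer = D
```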

Note: You might also use shorthand notation to describe this confidence interval as (0.37, 0.43).
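The interpretation above ("if we replicated the study many times...") can be illustrated with a small simulation. This sketch rests on two assumptions that are not part of the problem: a hypothetical true population proportion of 0.40, and readers drawn independently with replacement, which approximates sampling from a population much larger than the sample. The function name `simulate_coverage` is also hypothetical.

```python
import math
import random


def simulate_coverage(true_p: float = 0.40, n: int = 1600, cv: float = 2.576,
                      reps: int = 1000, seed: int = 1) -> float:
    """Fraction of simulated samples whose 99% interval contains true_p.

    ASSUMPTION: true_p is a hypothetical population proportion, and each reader is
    drawn independently (with replacement) rather than from a finite subscriber list.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # Simulate one sample: count readers who want more local news.
        successes = sum(rng.random() < true_p for _ in range(n))
        p_hat = successes / n
        me = cv * math.sqrt(p_hat * (1 - p_hat) / n)  # critical value times approximate SE
        if p_hat - me <= true_p <= p_hat + me:
            hits += 1
    return hits / reps


print(simulate_coverage())  # typically close to 0.99
```

Each replication builds its own interval around its own sample proportion; it is the sampling plan, not any single interval, that captures the true proportion about 99% of the time.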
