Teach yourself statistics

Confidence Interval: Sample Mean

This lesson explains how to construct a confidence interval around a sample mean, x̄.

When to Use This Analysis

The approach described in this lesson is appropriate when the following conditions are met:

The sampling method is simple random sampling.
Population size is at least 20 times sample size.
The sampling distribution of the mean is normal or nearly normal.

Before beginning the analysis, ensure that the conditions listed above are met.

When is it normal?

Generally, it is safe to assume the sampling distribution of the mean will be approximately normal in shape when any of the following statements are true.

The population distribution is normal.
The population distribution is symmetric, unimodal, without outliers, and the sample size is 15 or less.
The population distribution is moderately skewed, unimodal, without outliers, and the sample size is between 16 and 29.
The sample size is 30 or more, without outliers.

The Variability of the Sample Mean

To construct a confidence interval for a sample mean, we need to know the variability of the sample mean. This means we need to know how to compute the standard deviation of the sampling distribution or the standard error.

Standard deviation: Suppose k possible samples of size n can be selected from a population of size N. The standard deviation of the sampling distribution is the "average" deviation between the k sample means and the true population mean, μ. The standard deviation (SD) of the sample mean is:
SD = σ * sqrt{ ( 1/n ) * ( 1 - n/N ) * [ N / ( N - 1 ) ] }
where σ is the standard deviation of the population, N is the population size, and n is the sample size. When the population size is much larger (at least 20 times larger) than the sample size, the standard deviation can be approximated by:
SD = σ / sqrt( n )
Standard error: When the standard deviation of the population σ is unknown, the standard deviation of the sampling distribution cannot be calculated. Under these circumstances, use the standard error. The standard error (SE) can be calculated from the equation below.
SE = s * sqrt{ ( 1/n ) * ( 1 - n/N ) * [ N / ( N - 1 ) ] }
where s is the standard deviation of the sample, N is the population size, and n is the sample size. When the population size is much larger (at least 20 times larger) than the sample size, the standard error can be approximated by:
SE = s / sqrt( n )
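The exact and approximate formulas above differ only by the finite population correction factor sqrt{ ( 1 - n/N ) * [ N / ( N - 1 ) ] }. The minimal Python sketch below (the function names are hypothetical, chosen for illustration) computes the SD or SE both ways: pass a population size N to use the exact formula, or omit it to use the approximation. The example numbers come from Problem 1 later in this lesson (s = 10, n = 150, N = 3000).

```python
import math
from typing import Optional


def _fpc(n: int, N: Optional[int]) -> float:
    """Finite population correction: sqrt((1 - n/N) * N/(N - 1)).

    Returns 1.0 when N is None, i.e. when the population is treated as
    much larger than the sample (the "approximate" formula).
    """
    if N is None:
        return 1.0
    if N < 2 or not 1 <= n <= N:
        raise ValueError("need N >= 2 and 1 <= n <= N")
    return math.sqrt((1 - n / N) * (N / (N - 1)))


def sd_of_mean(sigma: float, n: int, N: Optional[int] = None) -> float:
    """Standard deviation of the sample mean, given the population SD sigma."""
    return sigma / math.sqrt(n) * _fpc(n, N)


def se_of_mean(s: float, n: int, N: Optional[int] = None) -> float:
    """Standard error of the sample mean, given the sample SD s."""
    return s / math.sqrt(n) * _fpc(n, N)


# Problem 1 numbers: s = 10, n = 150, N = 3000
print(round(se_of_mean(10, 150), 4))        # approximate formula
print(round(se_of_mean(10, 150, 3000), 4))  # exact formula
```

With these inputs the correction factor is about 0.975, so the approximate formula gives a standard error (about 0.816) roughly 2 to 3 percent larger than the exact formula (about 0.796). The gap shrinks as the population grows relative to the sample.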

Note: In real-world analyses, the standard deviation of the population is seldom known. Therefore, the standard error is used more often than the standard deviation.

Alert

The Advanced Placement Statistics Examination only covers the "approximate" formulas for the standard deviation and standard error.

SD = σ / sqrt( n )

SE = s / sqrt( n )

However, students are expected to be aware of the limitations of these formulas; namely, the approximate formulas should only be used when the population size is at least 20 times larger than the sample size, and when the sampling method is simple random sampling.

The Critical Value

The critical value is a factor used to compute the margin of error around a statistic. When the statistic is a sample mean, the critical value can be expressed as a z-score or a t-score.

z-Score. When sample size is large (n ≥ 30) and the standard deviation of the population distribution is known, use a z-score.
t-Score. When the sample size is small (n < 30) or the standard deviation of the population is unknown, use a t-score.

Warning

If sample size is small (n < 30) and the population distribution is distinctly not normal (e.g., heavily skewed or contains outliers), do not express the critical value as a z-score or a t-score. (Such cases are not part of the AP Statistics curriculum and are beyond the scope of what we cover in this tutorial.)
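As a rough sketch, the rules above can be written as a small Python function. The function name and its boolean flags are hypothetical; the function encodes only the rules stated in this lesson (including the 30-observation cutoff) and does not check the other conditions listed at the top of the lesson.

```python
def critical_value_type(n: int, sigma_known: bool,
                        distinctly_non_normal: bool = False) -> str:
    """Return "z" or "t" following this lesson's rules for a sample mean.

    n                      -- sample size
    sigma_known            -- True if the population standard deviation is known
    distinctly_non_normal  -- True if the population is heavily skewed or has outliers
    """
    if n < 30 and distinctly_non_normal:
        # Warning case: neither a z-score nor a t-score is appropriate.
        raise ValueError("small sample from a distinctly non-normal population")
    if n >= 30 and sigma_known:
        return "z"
    return "t"  # n < 30, or sigma unknown


print(critical_value_type(150, sigma_known=False))  # sigma unknown
print(critical_value_type(40, sigma_known=True))    # large sample, sigma known
```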

How to Express Critical Value as t-Score

To express the critical value as a t-score, follow these steps.

Compute alpha (α): α = 1 - (confidence level / 100)
- When the confidence level is 99%, α is 1 - 99/100 or 0.01.
- When the confidence level is 95%, α is 1 - 95/100 or 0.05.
- When the confidence level is 90%, α is 1 - 90/100 or 0.1.
Find the critical probability (p*): p* = 1 - α/2
Find the degrees of freedom (df): df = n - 1 (for a mean score from a single sample)
Find the t-score having degrees of freedom equal to df and a cumulative probability equal to the critical probability (p*).

To find the critical t-score, use an online calculator (e.g., Stat Trek's t Distribution Calculator), a graphing calculator, or a t-distribution statistical table (found in the appendix of most introductory statistics texts).
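The four steps can also be sketched in Python using only the standard library. This is an illustrative stand-in for a calculator or table, not how Stat Trek's calculator works: it assumes the t cumulative probability is computed by numerically integrating the t density (Simpson's rule) and then finds the t-score by bisection. If SciPy is installed, `scipy.stats.t.ppf(p_star, df)` returns the same quantity directly.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """Cumulative probability P(T <= x), via Simpson's rule on [0, |x|]."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 <= T <= |x|); the density is symmetric
    return 0.5 + area if x >= 0 else 0.5 - area


def t_critical(confidence_level: float, df: int) -> float:
    """Critical t-score for a two-sided interval, following the steps above."""
    alpha = 1 - confidence_level / 100   # step 1: alpha
    p_star = 1 - alpha / 2               # step 2: critical probability
    # Step 4: solve t_cdf(t) = p_star by bisection. The bracket [0, 100]
    # assumes the critical value is below 100, which holds for common
    # confidence levels unless df is very small.
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p_star:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


n = 150                                  # step 3: df = n - 1
print(round(t_critical(99, n - 1), 3))
```

For a 99% confidence level and 149 degrees of freedom, the result should agree with the critical value of about 2.61 used in Problem 1 below.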

How to Express Critical Value as z-Score

Common z-score critical values are 1.645 for a 90% confidence level, 1.96 for a 95% confidence level, and 2.576 for a 99% confidence level. To express the critical value as a z-score when the confidence level is not 90%, 95%, or 99%, follow these steps.

Compute alpha (α): α = 1 - (confidence level / 100)
Find the critical probability (p*): p* = 1 - α/2
Find the z-score having a cumulative probability equal to the critical probability (p*).

To find the critical z-score, use an online calculator (e.g., Stat Trek's Normal Distribution Calculator), a graphing calculator, or a normal distribution statistical table (found in the appendix of most introductory statistics texts).
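In Python, the same three steps can be carried out with the standard library's `statistics.NormalDist`, whose `inv_cdf` method returns the z-score having a given cumulative probability. The function name `z_critical` is hypothetical.

```python
from statistics import NormalDist


def z_critical(confidence_level: float) -> float:
    """Critical z-score for a two-sided interval at the given confidence level (%)."""
    alpha = 1 - confidence_level / 100   # step 1: alpha
    p_star = 1 - alpha / 2               # step 2: critical probability
    return NormalDist().inv_cdf(p_star)  # step 3: z with cumulative probability p*


for level in (80, 90, 95, 99):
    print(level, round(z_critical(level), 3))
```

The 90%, 95%, and 99% results match the common values 1.645, 1.96, and 2.576 quoted above; the 80% level shows the procedure for a less common confidence level.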

A Judgment Call

Technically, when the population standard deviation is unknown, you should express the critical value as a t-score rather than a z-score, regardless of sample size.

As a practical matter, though, the z-score and the t-score are almost identical when sample size is large (n ≥ 100). The z-score is also easier to use, since z-score critical values (e.g., 1.96 for 95% confidence, 2.576 for 99% confidence) do not change with sample size.

Bottom line: With larger samples (n ≥ 100), the choice between a z-score critical value and a t-score critical value is a judgment call. Analysts often choose the z-score for its ease of use.
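To see how quickly the t-score approaches the z-score, the sketch below prints 95% critical values for a few sample sizes. It is illustrative only: it assumes the t cumulative probability is found by numerically integrating the t density (Simpson's rule) and inverted by bisection, as a stand-in for a calculator or table. The z-score comes from the standard library's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist


def t_cdf(x: float, df: int, steps: int = 1000) -> float:
    """P(T <= x) for Student's t, integrating the density with Simpson's rule."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(u: float) -> float:
        return math.exp(log_c) * (1 + u * u / df) ** (-(df + 1) / 2)

    a = abs(x)
    h = a / steps  # steps must be even
    total = pdf(0) + pdf(a)
    total += sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_quantile(p: float, df: int) -> float:
    """Inverse of t_cdf for 0.5 < p < 1, found by bisection on [0, 100]."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


z95 = NormalDist().inv_cdf(0.975)  # 95% confidence: p* = 0.975
for n in (10, 30, 100, 1000):
    t95 = t_quantile(0.975, n - 1)
    print(f"n = {n:4d}: t = {t95:.3f}, z = {z95:.3f}, difference = {t95 - z95:.3f}")
```

At n = 100 the 95% t-score (about 1.984) exceeds the z-score by only about 0.02, whereas at n = 10 the gap is about 0.3.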

How to Find the Confidence Interval for a Mean

Previously, we described how to construct confidence intervals. For convenience, we repeat the five steps below.

Choose the confidence level. The confidence level describes the uncertainty of a sampling plan. Often, researchers choose 90%, 95%, or 99% confidence levels; but any percentage can be used.
Compute the standard deviation or standard error. When the population size is at least 20 times bigger than the sample size, the standard deviation (SD) and the standard error (SE) of the sampling distribution of the mean can be computed from the following formulas:
SD = σ / sqrt( n )

SE = s / sqrt( n )
Find the critical value. Follow the instructions for finding z-score and t-score critical values provided above.
Find the margin of error. You can compute the margin of error, based on one of the following equations.
Margin of error = Critical value * Standard deviation of statistic

Margin of error = Critical value * Standard error of statistic
Specify the confidence interval. The uncertainty is denoted by the confidence level. And the range of the confidence interval is defined by the following equation.
Confidence interval = Sample mean ± Margin of error
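A minimal Python sketch of the five steps follows, assuming the approximate formulas apply (population at least 20 times the sample size). The function name `mean_confidence_interval` and the example inputs (a sample mean of 50, a known σ of 8, n = 64, 95% confidence) are hypothetical. The critical value is passed in rather than computed inside the function, because the choice between a z-score and a t-score depends on the rules described earlier.

```python
import math
from statistics import NormalDist


def mean_confidence_interval(sample_mean: float, spread: float, n: int,
                             critical_value: float) -> tuple[float, float, float]:
    """Return (lower, upper, margin_of_error) for a sample mean.

    spread is sigma (population SD) or s (sample SD). Uses the approximate
    formula spread / sqrt(n), so it assumes the population is at least
    20 times the sample size.
    """
    std_err = spread / math.sqrt(n)      # step 2: SD or SE of the mean
    margin = critical_value * std_err    # step 4: margin of error
    return sample_mean - margin, sample_mean + margin, margin  # step 5


# Hypothetical inputs: sigma known and n >= 30, so a z-score is used.
confidence = 95                                       # step 1
z = NormalDist().inv_cdf(1 - (1 - confidence / 100) / 2)  # step 3
lower, upper, me = mean_confidence_interval(50, 8, 64, z)
print(f"{lower:.2f} to {upper:.2f} (margin of error {me:.2f})")
```

With these inputs the SD is 8 / sqrt(64) = 1, so the margin of error equals the critical value (about 1.96) and the interval runs from about 48.04 to 51.96.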

In the next section, we work through a problem that shows how to use this approach to construct a confidence interval to estimate a population mean.

Sample Size Calculator

As you may have noticed, the five steps required to specify a confidence interval for a sample mean can involve many time-consuming computations. Stat Trek's Sample Size Calculator does this work for you, quickly, easily, and error-free. In addition to constructing a confidence interval, the calculator creates a summary report that lists key findings and documents analytical techniques. Whenever you need to construct a confidence interval, consider using the Sample Size Calculator. The calculator is free. It can be found in the Stat Trek main menu under the Stat Tools tab.

Test Your Understanding

Problem 1

Suppose a simple random sample of 150 students is drawn from a population of 3000 college students. Among sampled students, the average IQ score is 115 with a sample standard deviation of 10. What is the 99% confidence interval for the students' IQ score?

(A) 115 ± 0.01
(B) 115 ± 0.82
(C) 115 ± 2.1
(D) 115 ± 2.6
(E) None of the above

Solution

The correct answer is (C). The approach that we used to solve this problem is valid when the following conditions are met.

The sampling method must be simple random sampling. This condition is satisfied; the problem statement says that we used simple random sampling.
The sampling distribution should be approximately normally distributed. Because the sample size is large, we know from the central limit theorem that the sampling distribution of the mean will be normal or nearly normal; so this condition is satisfied.

Since the above requirements are satisfied, we can use the following five-step approach to construct a confidence interval.

Select a confidence level. In this analysis, the confidence level is defined for us in the problem. We are working with a 99% confidence level.
Compute the standard deviation or standard error. Since we do not know the standard deviation of the population, we cannot compute the standard deviation of the sample mean; instead, we compute the standard error (SE). Because the population size (3000) is 20 times the sample size (150), we can use the "approximate" formula for the standard error.
SE = s / sqrt( n ) = 10 / sqrt(150)

SE = 10 / 12.25 = 0.82
Find the critical value. The critical value is a factor used to compute the margin of error. Because we don't know the population standard deviation, we'll express the critical value as a t-score, following instructions for finding t-score critical values described earlier.
- Compute alpha (α):
  α = 1 - (confidence level / 100)
  
  α = 1 - 99/100 = 0.01
- Find the critical probability (p*):
  p* = 1 - α/2 = 1 - 0.01/2 = 0.995
- Find the degrees of freedom (df):
  df = n - 1 = 150 - 1 = 149
- The critical value is the t-score having 149 degrees of freedom and a cumulative probability equal to 0.995. From the t Distribution Calculator, we find that the critical value is about 2.61.
Find the margin of error (ME). We use the margin of error formula to find the margin of error.
ME = critical value * standard error

ME = 2.61 * 0.82 = 2.1
Specify the confidence interval (CI). The range of the confidence interval is defined by the sample mean ± margin of error.
CI = x̄ ± ME

CI = 115 ± 2.1
And the uncertainty is denoted by the confidence level (99%).

Therefore, the 99% confidence interval is 112.9 to 117.1. We say we are 99% confident that the true population mean is in the range defined by 115 ± 2.1. Here is what that actually means. If we replicated the study many times (i.e., used the same sampling plan with different samples), the sampling plan we used should produce a confidence interval that includes the true population mean 99% of the time.

Note: You might also use shorthand notation to describe this confidence interval as (112.9, 117.1).
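The arithmetic in this solution can be reproduced in a few lines of Python. The critical value 2.61 is taken from the solution above rather than computed. Carrying full precision gives SE ≈ 0.8165 and ME ≈ 2.13 (the solution rounds SE to 0.82 and reports ME as 2.1); the interval still rounds to (112.9, 117.1).

```python
import math

# Data from Problem 1
n, sample_mean, s = 150, 115, 10
t_star = 2.61                       # critical t-score for df = 149, 99% (from the solution)

se = s / math.sqrt(n)               # standard error, approximate formula
me = t_star * se                    # margin of error
lower, upper = sample_mean - me, sample_mean + me
print(f"SE = {se:.4f}, ME = {me:.2f}, 99% CI = ({lower:.1f}, {upper:.1f})")
```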
