How to Test for Normality: Three Simple Tests

Many statistical techniques (regression, ANOVA, t-tests, etc.) rely on the assumption that data is normally distributed. For these techniques, it is good practice to examine the data to confirm that the assumption of normality is tenable.

With that in mind, here are three simple ways to test interval-scale data or ratio-scale data for normality.

Each option is easy to implement with Excel, as shown below.

Descriptive Statistics

Perhaps the easiest way to test for normality is to examine several common descriptive statistics. Here's what to look for:

  • Central tendency. The mean and the median are summary measures used to describe central tendency - the most "typical" value in a set of values. With a normal distribution, the mean is equal to the median.
  • Skewness. Skewness is a measure of the asymmetry of a probability distribution. If observations are symmetrically distributed around the mean, the skewness value is zero; otherwise, the skewness value is positive or negative. As a rule of thumb, skewness between -2 and +2 is consistent with a normal distribution.
  • Kurtosis. Kurtosis is a measure of whether observations cluster around the mean of the distribution or in the tails of the distribution. Excel reports excess kurtosis, for which the normal distribution has a value of zero. As a rule of thumb, kurtosis between -2 and +2 is consistent with a normal distribution.

Together, these descriptive measures provide a useful basis for judging whether a dataset satisfies the assumption of normality.

Example 1

To see how to compute descriptive statistics in Excel, consider the following dataset:

Sample data
1.5
2.5
3.0
3.5
3.9
4.1
4.6
5.1
5.5
5.9
6.2
7.3
7.7
8.1
9.5

Begin by entering data in a column or row of an Excel spreadsheet:

Next, from the navigation menu in Excel, click Data / Data analysis. That displays the Data Analysis dialog box. From the Data Analysis dialog box, select Descriptive Statistics and click the OK button:

Then, in the Descriptive Statistics dialog box, enter the input range, and click the Summary Statistics check box. The dialog box, with entries, should look like this:

And finally, to display summary statistics, click the OK button on the Descriptive Statistics dialog box. Among other outputs, you should see the following:

The mean (5.23) is nearly equal to the median (5.1). And both skewness and kurtosis are between -2 and +2.

Conclusion: These descriptive statistics are consistent with a normal distribution.
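The same checks can be sketched outside Excel. The Python snippet below is a minimal illustration, not part of the Excel workflow: the function names are hypothetical, and the skewness and kurtosis formulas are the bias-adjusted sample versions that Excel documents for SKEW and KURT, so the results should be close to the Excel output but are not guaranteed to match it digit for digit.

```python
import statistics

# Sample data from Example 1
data = [1.5, 2.5, 3.0, 3.5, 3.9, 4.1, 4.6, 5.1, 5.5, 5.9,
        6.2, 7.3, 7.7, 8.1, 9.5]


def sample_skewness(values):
    """Bias-adjusted sample skewness (assumed to mirror Excel's SKEW)."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1 divisor)
    cubed = sum(((x - mean) / s) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * cubed


def sample_excess_kurtosis(values):
    """Bias-adjusted sample excess kurtosis (assumed to mirror Excel's KURT)."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    fourth = sum(((x - mean) / s) ** 4 for x in values)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * fourth - correction


mean = statistics.mean(data)
median = statistics.median(data)
skewness = sample_skewness(data)
kurtosis = sample_excess_kurtosis(data)

print(f"mean = {mean:.2f}, median = {median:.2f}")
print(f"skewness = {skewness:.2f}, kurtosis = {kurtosis:.2f}")

# Rule of thumb from the text: both values should fall between -2 and +2
print("within rule of thumb:", -2 < skewness < 2 and -2 < kurtosis < 2)
```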

Histogram

Another easy way to test for normality is to plot data in a histogram, and see if the histogram reveals the bell-shaped pattern characteristic of a normal distribution. With Excel, this is a four-step process:

  • Enter data. This means entering data values in an Excel spreadsheet. The column, row, or range of cells that holds data is the input range.
  • Define bins. In Excel, bins are category ranges. To define a bin, you enter the upper limit of each bin in a column, row, or range of cells. The block of cells that holds the upper-limit entries is called the bin range.
  • Plot the data in a histogram. In Excel, access the histogram function through: Data / Data analysis / Histogram.
  • In the Histogram dialog box, enter the input range and the bin range; and check the Chart Output box. Then, click OK.

If the resulting histogram looks like a bell-shaped curve, the data are consistent with a normal or nearly normal distribution. If the curve is not bell-shaped, the data may not be normal.

Example 2

To see how to plot data for normality with a histogram in Excel, we'll use the same dataset (shown below) that we used in Example 1.

Sample data
1.5
2.5
3.0
3.5
3.9
4.1
4.6
5.1
5.5
5.9
6.2
7.3
7.7
8.1
9.5

Begin by entering data to define an input range and a bin range. Here is what data entry looks like in an Excel spreadsheet:

Next, from the navigation menu in Excel, click Data / Data analysis. That displays the Data Analysis dialog box. From the Data Analysis dialog box, select Histogram and click the OK button:

Then, in the Histogram dialog box, enter the input range, enter the bin range, and click the Chart Output check box. The dialog box, with entries, should look like this:

And finally, to display the histogram, click the OK button on the Histogram dialog box. Here is what you should see:

The plot is fairly bell-shaped - an almost-symmetric pattern with one peak in the middle. Given this result, it would be reasonable to assume that the data were drawn from a normal distribution. On the other hand, if the plot were not bell-shaped, you might suspect the data were not from a normal distribution.
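For readers who want to see the binning logic itself, here is a rough Python sketch of how a histogram is tallied. The bin upper limits (2, 4, 6, 8, 10) are an assumption chosen for illustration; they are not taken from the Excel example. As in Excel's Histogram tool, each value is counted in the first bin whose upper limit it does not exceed, and anything above the last limit goes into a "More" bin.

```python
from bisect import bisect_left

# Sample data from Example 2
data = [1.5, 2.5, 3.0, 3.5, 3.9, 4.1, 4.6, 5.1, 5.5, 5.9,
        6.2, 7.3, 7.7, 8.1, 9.5]

# Hypothetical bin range: the upper limit of each bin (an assumption)
upper_limits = [2, 4, 6, 8, 10]

# One counter per bin, plus a final "More" bin for values above the last limit
counts = [0] * (len(upper_limits) + 1)
for x in data:
    # bisect_left returns the index of the first upper limit that is >= x
    counts[bisect_left(upper_limits, x)] += 1

# Print a simple text histogram, one row per bin
labels = [f"<= {u}" for u in upper_limits] + ["More"]
for label, count in zip(labels, counts):
    print(f"{label:>6} | {'*' * count}")
```

With these assumed bins, the counts rise to a single peak in the middle bin and fall away on both sides, which is the pattern the text describes.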

Chi-Square Test

The chi-square goodness of fit test is another good option for determining whether a set of data was sampled from a normal distribution.

Note: All chi-square tests assume that the data under investigation was sampled randomly.

Hypothesis Testing

The chi-square test is an actual hypothesis test, where we examine observed data to choose between two statistical hypotheses:

  • Null hypothesis: Data is sampled from a normal distribution.
  • Alternative hypothesis: Data is not sampled from a normal distribution.

Like many other techniques for testing hypotheses, the chi-square test for normality involves computing a test statistic and finding the P-value for that statistic, given the degrees of freedom. If the P-value is bigger than the significance level, we cannot reject the null hypothesis; if it is smaller, we reject the null hypothesis.

How to Conduct the Chi-Square Test

The steps required to conduct a chi-square test of normality are listed below:

  • Specify the significance level.
  • Find the mean, standard deviation, and sample size for the sample.
  • Define non-overlapping bins.
  • Count observations in each bin, based on actual dependent variable scores.
  • Find the cumulative probability for each bin endpoint.
  • Find the probability that an observation would land in each bin, assuming a normal distribution.
  • Find the expected number of observations in each bin, assuming a normal distribution.
  • Compute a chi-square statistic.
  • Find the degrees of freedom, based on sample size.
  • Find the P-value for the chi-square statistic, based on degrees of freedom.
  • Reject or fail to reject the null hypothesis, based on P-value and significance level.

So you will understand how to accomplish each step, let's work through an example, one step at a time.

Example 3

To demonstrate how to conduct a chi-square test for normality in Excel, we'll use the same dataset (shown below) that we've used for the previous two examples. Here it is again:

Sample data
1.5
2.5
3.0
3.5
3.9
4.1
4.6
5.1
5.5
5.9
6.2
7.3
7.7
8.1
9.5

Now, using this data, let's check for normality.

Specify Significance Level

The significance level is the probability of rejecting the null hypothesis when it is true. Researchers often choose 0.05 or 0.01 for a significance level. For the purpose of this exercise, let's choose 0.05.

Find the Mean, Standard Deviation, and Sample Size

To compute a chi-square test statistic, we need to know the mean, standard deviation, and sample size. Excel can provide this information. Here's how:

  • Step 1. Enter data in a column or row of an Excel spreadsheet, like this:
  • Step 2. From the navigation menu in Excel, click Data / Data analysis. That displays the Data Analysis dialog box. From the Data Analysis dialog box, select Descriptive Statistics and click the OK button:
  • Step 3. In the Descriptive Statistics dialog box, enter the input range, and click the Summary Statistics check box. The dialog box, with entries, should look like this:
  • Step 4. To display the mean, standard deviation, and sample size, click the OK button on the Descriptive Statistics dialog box. Among other outputs, you should see the following:
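The same three quantities can also be computed outside Excel. The short Python sketch below uses only the standard library and assumes the sample standard deviation (n - 1 divisor), which is what Excel's Descriptive Statistics tool reports.

```python
import statistics

# Sample data from Example 3
data = [1.5, 2.5, 3.0, 3.5, 3.9, 4.1, 4.6, 5.1, 5.5, 5.9,
        6.2, 7.3, 7.7, 8.1, 9.5]

n = len(data)                 # sample size
mean = statistics.mean(data)  # arithmetic mean
sd = statistics.stdev(data)   # sample standard deviation (n - 1 divisor)

print(f"n = {n}, mean = {mean:.2f}, standard deviation = {sd:.2f}")
# prints: n = 15, mean = 5.23, standard deviation = 2.25
```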

Define Bins

To conduct a chi-square analysis, we need to define bins, based on dependent variable scores. Each bin is defined by a non-overlapping range of values.

For the chi-square test to be valid, each bin should hold at least five observations. With that in mind, we'll define three bins for this example, as shown below:

Bin 1 will hold dependent variable scores that are less than 4; Bin 2, scores between 4 and 6; and Bin 3, scores greater than 6.

Note: The number of bins is an arbitrary decision made by the experimenter. For this example, we chose to define three bins. Given the small sample, if we used more bins, at least one bin would have fewer than five observations.

Count Observed Data Points in Each Bin

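The next step is to count the observed data points in each bin. The allocation below shows sample observations assigned to bins, with a frequency count for each bin.

  • Bin 1 (less than 4): 1.5, 2.5, 3.0, 3.5, 3.9. Count: 5.
  • Bin 2 (between 4 and 6): 4.1, 4.6, 5.1, 5.5, 5.9. Count: 5.
  • Bin 3 (greater than 6): 6.2, 7.3, 7.7, 8.1, 9.5. Count: 5.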

Note: We have five observed data points in each bin - the minimum required for a valid chi-square test of normality.
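The counting step can be sketched in a few lines of Python. The helper name bin_index is hypothetical, and the treatment of scores that fall exactly on 4 or 6 is an assumption (the text does not say which bin they belong to); no value in this dataset sits on a boundary, so the counts are unaffected.

```python
# Sample data from Example 3
data = [1.5, 2.5, 3.0, 3.5, 3.9, 4.1, 4.6, 5.1, 5.5, 5.9,
        6.2, 7.3, 7.7, 8.1, 9.5]


def bin_index(score):
    """Return 0, 1, or 2 for Bin 1, Bin 2, or Bin 3."""
    if score < 4:
        return 0      # Bin 1: scores less than 4
    if score <= 6:
        return 1      # Bin 2: scores from 4 to 6 (boundaries assumed inclusive)
    return 2          # Bin 3: scores greater than 6


observed = [0, 0, 0]
for x in data:
    observed[bin_index(x)] += 1

print("observed counts:", observed)
# prints: observed counts: [5, 5, 5]
```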

Find Cumulative Probability

A cumulative probability refers to the probability that a random variable is less than or equal to a specific value. In Excel, the NORMDIST function computes cumulative probabilities from a normal distribution.

Assuming our data follows a normal distribution, we can use the NORMDIST function to find cumulative probabilities for the upper endpoints in each bin. Here is the formula we use:

P_j = NORMDIST(MAX_j, X, s, TRUE)

where P_j is the cumulative probability for the upper endpoint in Bin j, MAX_j is the upper endpoint for Bin j, X is the mean of the dataset, and s is the standard deviation of the dataset.

When we execute the formula in Excel, we get the following results:

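P_1 = NORMDIST(4, 5.23, 2.25, TRUE) = 0.292

P_2 = NORMDIST(6, 5.23, 2.25, TRUE) = 0.634

P_3 = NORMDIST(∞, 5.23, 2.25, TRUE) = 1.000

Note: For Bin 3, the upper endpoint is positive infinity, which is represented by ∞. Excel cannot take ∞ as an argument; the cumulative probability at positive infinity is 1 by definition, so the third value is set to 1.000 directly.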
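As a cross-check on the three cumulative probabilities above, the following Python sketch uses NormalDist from the standard library (Python 3.8 or later) in place of Excel's NORMDIST. The mean and standard deviation are the rounded values used in the text; the variable names are illustrative only.

```python
from statistics import NormalDist

# Rounded sample mean and standard deviation from the text
mean, sd = 5.23, 2.25

# Normal distribution fitted to the sample
fitted = NormalDist(mu=mean, sigma=sd)

p1 = fitted.cdf(4)   # cumulative probability at the upper endpoint of Bin 1
p2 = fitted.cdf(6)   # cumulative probability at the upper endpoint of Bin 2
p3 = 1.0             # upper endpoint of Bin 3 is +infinity, so the value is 1

print(f"P1 = {p1:.3f}, P2 = {p2:.3f}, P3 = {p3:.3f}")
# prints: P1 = 0.292, P2 = 0.634, P3 = 1.000
```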

Find Bin Probability

Given the cumulative probabilities shown above, it is possible to find the probability that a randomly selected observation would fall in each bin, using the following formulas:

P( Bin = 1 ) = P_1 = 0.292

P( Bin = 2 ) = P_2 - P_1 = 0.634 - 0.292 = 0.342

P( Bin = 3 ) = P_3 - P_2 = 1.000 - 0.634 = 0.366

Find Expected Number of Observations

Assuming a normal distribution, the expected number of observations in each bin can be found by using the following formula:

Exp_j = P( Bin = j ) * n

where Exp_j is the expected number of observations in Bin j, P( Bin = j ) is the probability that a randomly selected observation would fall in Bin j, and n is the sample size.

Applying the above formula to each bin, we get the following:

Exp_1 = P( Bin = 1 ) * 15 = 0.292 * 15 = 4.38

Exp_2 = P( Bin = 2 ) * 15 = 0.342 * 15 = 5.13

Exp_3 = P( Bin = 3 ) * 15 = 0.366 * 15 = 5.49
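The last two steps (bin probabilities, then expected counts) amount to differencing the cumulative probabilities and scaling by the sample size. A minimal Python sketch, starting from the rounded cumulative probabilities given in the text:

```python
# Cumulative probabilities at the upper endpoint of each bin (from the text)
cumulative = [0.292, 0.634, 1.000]
n = 15  # sample size

# Probability of landing in each bin: difference between successive
# cumulative probabilities (the first bin starts from zero)
bin_probabilities = []
previous = 0.0
for p in cumulative:
    bin_probabilities.append(p - previous)
    previous = p

# Expected number of observations in each bin under normality
expected = [p * n for p in bin_probabilities]

for j, (p, e) in enumerate(zip(bin_probabilities, expected), start=1):
    print(f"Bin {j}: probability = {p:.3f}, expected = {e:.2f}")
```

The expected counts should sum to the sample size, which is a quick sanity check on the bin probabilities.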

Compute Chi-Square Statistic

Finally, we can compute the chi-square statistic ( χ² ), using the following formula:

$$\chi^2 = \sum_j \frac{(\text{Obs}_j - \text{Exp}_j)^2}{\text{Exp}_j}$$

where Obs_j is the observed number of observations in Bin j, and Exp_j is the expected number of observations in Bin j.

Therefore,

$$\chi^2 = \frac{(5 - 4.38)^2}{4.38} + \frac{(5 - 5.13)^2}{5.13} + \frac{(5 - 5.49)^2}{5.49} \approx 0.133$$
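The sum above is easy to reproduce in code. This Python sketch uses the observed counts and the rounded expected counts from the earlier steps; because the inputs are rounded to two decimals, the printed value may differ from the hand calculation in the third decimal place.

```python
# Observed and expected counts for the three bins (from the steps above)
observed = [5, 5, 5]
expected = [4.38, 5.13, 5.49]

# Chi-square statistic: squared difference between observed and expected,
# divided by expected, summed over bins
chi_square = sum((obs - exp) ** 2 / exp for obs, exp in zip(observed, expected))

print(f"chi-square = {chi_square:.3f}")
```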

Find Degrees of Freedom

Assuming a normal distribution, the degrees of freedom (df) for a chi-square goodness of fit test equals the sample size ( n ) minus three. Therefore,

df = n - 3 = 15 - 3 = 12

Find P-Value

The P-value is the probability of seeing a chi-square test statistic that is more extreme (bigger) than the observed chi-square statistic. For this problem, we found that the observed chi-square statistic was 0.133. Therefore, we want to know the probability of seeing a chi-square test statistic bigger than 0.133.

Use Stat Trek's Chi-Square Calculator to find that probability. Enter the degrees of freedom (12) and the observed chi-square statistic (0.133) into the calculator; then, click the Calculate button.

From the calculator, we see that P( χ² > 0.133 ) equals one. (Strictly speaking, the probability is slightly less than one; the calculator rounds it to 1.000.)
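If an online calculator is not at hand, the upper-tail probability can be approximated in Python. The sketch below is one possible approach, not the calculator's own method: it relies on a closed-form expression that holds only when the degrees of freedom are even, and the function name is hypothetical. The degrees of freedom (12) and the statistic (0.133) are copied from the example above; a different rule for degrees of freedom would give a different P-value.

```python
import math


def chi_square_upper_tail(x, df):
    """P(X > x) for a chi-square variable with even degrees of freedom.

    For even df, the upper tail equals exp(-x/2) times the sum of
    (x/2)^i / i! for i = 0 .. df/2 - 1.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers positive, even df")
    half = x / 2
    terms = sum(half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * terms


# Values taken from the worked example
p_value = chi_square_upper_tail(0.133, 12)

print(f"P-value = {p_value:.3f}")
# prints: P-value = 1.000
```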

Test Null Hypothesis

When the P-value is bigger than the significance level, we cannot reject the null hypothesis. When it is smaller, we reject the null hypothesis.

Here, the P-value (1.00) is bigger than the significance level (0.05), so we cannot reject the null hypothesis that the data follow a normal distribution.