ANOVA Diagnostics (4 of 8)

C. Check normality.

 Scroll the Results Viewer Window until you see this table, below the least squares means and above the equal variance diagnostic.


The Shapiro-Wilk W value should be above 0.90 for normality. In the above example, the value of 0.804 shows that the data are probably not normal. If you have a large data set (over 2000 observations), Shapiro-Wilk is not printed, and you should use the Kolmogorov-Smirnov D test instead. A reasonable rule of thumb is to treat D as roughly 1 minus the Shapiro-Wilk W, so D values above 0.10 (1 - 0.90) are suspicious, and values above 0.20 (1 - 0.80) are very worrisome.

     If the W is between 0.8 and 0.9, then check the Pr < W. If it is
      > 0.05, then you would not reject the null hypothesis that the
      data are normal. If it is < 0.05, then you should examine the
      normality plots.

     If the W is below 0.8, then a transformation is probably required.
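The rules of thumb above can be written out as a small decision procedure. The sketch below is in Python rather than SAS and is only an illustration: the function names, the returned labels, and the handling of values exactly at 0.80 and 0.90 are assumptions, not part of the SAS output.

```python
def interpret_shapiro_wilk(w, p_value, alpha=0.05):
    """Apply this page's rule-of-thumb thresholds to a Shapiro-Wilk W.

    w       -- the W statistic from the normality table
    p_value -- the "Pr < W" value printed next to it
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("W must be between 0 and 1")
    if w > 0.90:
        return "normality acceptable"
    if w >= 0.80:
        # Middle zone: let the p-value decide (boundary handling is an assumption).
        if p_value > alpha:
            return "normality not rejected"
        return "examine normality plots"
    return "transformation probably required"


def interpret_ks_d(d):
    """Apply the 1 - W rule of thumb to a Kolmogorov-Smirnov D (large data sets)."""
    if d > 0.20:
        return "very worrisome"
    if d > 0.10:
        return "suspicious"
    return "acceptable"
```

For the example on this page, W = 0.804 falls in the middle zone, so the result depends on Pr < W. With a small p-value (0.001 is a made-up number here, not the one in the table), `interpret_shapiro_wilk(0.804, 0.001)` points you to the normality plots, which is the next step below.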

Normality plots give a visual comparison of the data against the normal distribution. The easiest plot is shown here: the symmetric, bell-shaped normal curve in blue is compared to the black histogram of the actual data.

     If the plots are symmetric and bell-shaped, with a single peak, then normality can be accepted despite the low W value above. Here we see that the normality problem is most likely due to the outlier, which we know from Step A is observation 25 (circled). If that observation is corrected, normality will be very acceptable.



A plot that is slightly more difficult to interpret is the Probability Plot, shown below. If the residuals follow a normal distribution, the circles (observed data) will lie close to the straight reference line. The difficulty is deciding what is "close enough". Generally, deviations will be "obvious", such as the observation 25 outlier that is circled.
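To make the idea behind the probability plot concrete, here is a minimal sketch of how its points can be computed and how the observation farthest from the reference line can be picked out. It is written in Python, not SAS, and rests on several assumptions: the plotting positions use Blom's formula (rank - 0.375) / (n + 0.25), the reference line is drawn through the sample mean with slope equal to the sample standard deviation, and the residuals are made-up numbers, not the data from this example.

```python
from statistics import NormalDist, mean, stdev


def probability_plot_points(residuals):
    """Return (normal quantile, observed residual, original index) per point."""
    n = len(residuals)
    order = sorted(range(n), key=lambda i: residuals[i])
    std_normal = NormalDist()  # mean 0, standard deviation 1
    points = []
    for rank, idx in enumerate(order, start=1):
        p = (rank - 0.375) / (n + 0.25)  # Blom plotting position (assumption)
        points.append((std_normal.inv_cdf(p), residuals[idx], idx))
    return points


def farthest_from_line(residuals):
    """Index of the residual farthest from the line y = mean + sd * quantile."""
    m, s = mean(residuals), stdev(residuals)
    worst = max(
        probability_plot_points(residuals),
        key=lambda pt: abs(pt[1] - (m + s * pt[0])),
    )
    return worst[2]


# Made-up residuals: 24 well-behaved values plus one large outlier in position 25.
residuals = [(i - 11.5) * 0.1 for i in range(24)] + [8.0]
print("Farthest observation:", farthest_from_line(residuals) + 1)  # prints 25
```

A numeric distance like this does not replace looking at the plot; it only shows why one badly placed point stands out from a line that the other points follow reasonably well.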

In conclusion, this example shows a normality problem, but one caused by a single outlier, observation 25.





  HINTS:
  A Shapiro-Wilk of 1.0 means perfect mathematical normality.
  If you have more than 2000 observations, the Shapiro-Wilk is not calculated.
