ANOVA Diagnostics, 2 of 8

Check for outliers, equal variance, normality, and model validity


A. Check for outliers.

In the SAS Output Window, look at the plots below the Shapiro-Wilk test results and check whether there are any outliers.

On the boxplot, potential outliers are marked with an asterisk (*) outside the box. Do not confuse these with the two asterisks that form part of the box itself and indicate the median.

     The boxplot at right is an example where there are no outliers.

In the example below, one potential outlier can be seen on the boxplot. This observation corresponds to the extreme value on the Stem-Leaf plot, and can be seen in the Extreme Observations table at the top of the image. The Extreme Observations table identifies the potential outlier (121.433) as observation number 11.
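
The Shapiro-Wilk test, Stem-Leaf plot, boxplot, and Extreme Observations table described above are the kind of output that PROC UNIVARIATE gives when it is run on the residuals from the ANOVA. The following is a minimal sketch, not necessarily the code your analysis module uses. The dataset name mydata, the treatment variable trt, the response y, and the output names resids and resid are all hypothetical placeholders.

```sas
/* Assumed names: mydata, trt, y, resids, resid are placeholders. */
proc glm data=mydata;
  class trt;
  model y = trt;
  output out=resids r=resid;   /* save the residuals to a new dataset */
run;
quit;

/* NORMAL requests normality tests (including Shapiro-Wilk);      */
/* PLOT requests the stem-and-leaf plot, boxplot, and normal      */
/* probability plot. The Extreme Observations table is printed    */
/* by default.                                                    */
proc univariate data=resids normal plot;
  var resid;
run;
```

Depending on your SAS version and whether ODS Graphics is enabled, the plots may appear as graphics rather than as text plots drawn with asterisks.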


If you have an outlier, go to your dataset and count down to the observation indicated. Then make a scientific decision as to whether this point is believable. If the observation is invalid, delete or correct it.

Keep in mind that unequal variance and non-normality can create apparent outliers: legitimate values that you do not want to remove. Transformations in Steps B and C can make these asterisks go away. Do not delete valid data.
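
Instead of counting down by hand, you can have SAS number the data lines and print the one of interest. This is a sketch under assumptions: mydata and numbered are hypothetical dataset names, obs is a hypothetical variable name, and 11 is the observation number from the example above. It also assumes the rows are in the same order as when the analysis was run.

```sas
/* Assumed names: mydata, numbered, obs are placeholders. */
data numbered;
  set mydata;
  obs = _n_;      /* _N_ counts DATA step iterations, here the row number */
run;

/* Print only the flagged observation so it can be inspected. */
proc print data=numbered;
  where obs = 11;
run;
```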

If you apply a transformation and re-run the ANOVA, and the asterisk still remains in the boxplot, that is stronger evidence for an outlier. At this point, you must decide whether or not to delete the extreme value. If the value is scientifically invalid (a data entry error or an error in experimental procedure), it should be deleted from the analysis. Simply replace the value with a period and SAS will treat it as missing. This process is the same whether the data are in your SAS program or being imported from an external file.

If you have deleted one or more outliers, then eventually you will have to re-run your SAS analysis on the revised dataset.
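
As an illustration of replacing a value with a period, here is a minimal sketch for data entered directly in the SAS program. The dataset name, variable names, and data values are invented for illustration and do not come from the example above.

```sas
/* Hypothetical data: the second line's response has been replaced */
/* with a period, so SAS reads it as a missing value.              */
data mydata;
  input trt $ y;
  datalines;
A 101.2
A .
B 98.7
;
run;
```

The observation stays in the dataset, so the numbering of the remaining data lines does not change, but procedures that analyze y will leave it out.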

But first proceed to the next step and check for equal variance.







  HINTS:
  An apparent or potential outlier is a data value that has been flagged as extreme in the statistical analysis. We
      distinguish this from an "outlier" (a real or actual outlier), which is invalid data. Do not automatically
      discard apparent outliers.
  Mistakes in data entry, and errors and limitations in experimental procedure, can result in outliers in the dataset.
      Biological variation typically does not result in outliers. Extreme individuals are not outliers, although
      severely diseased individuals, for instance, may be.
  If you rerun your SAS analysis, always check all diagnostics. For example, removing some outliers may cause
      other outliers to be identified.
  For a more detailed explanation of ANOVA output, consult a statistics text or the Using SAS module.
  This ANOVA Diagnostics module serves all experimental and treatment design combinations. After finishing it,
      you may need to return to your analysis module and re-run SAS (maybe many times).
