ANOVA Diagnostics

A. Check for outliers.

 In the Results Viewer window, look at the boxplots near the bottom to
     see if there are outliers.

     Boxplots are produced for each factor in the experiment, as well as a single overall boxplot for all residuals. Each boxplot displays a box that contains the middle 50% of the residuals, with the median shown as a line inside the box and the mean as a diamond symbol. Whiskers extend out from the box, each ending in a terminal bar drawn at the last observation that is no more than 1.5 times the box height (the interquartile range) away from the box. Observations more than 1.5 times the box height away from the box are given the "o" symbol and are potential outliers, while severe cases (more than 3 times the box height away) are labeled with the observation number.
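The 1.5 and 3 box-height rule can be sketched outside SAS. The following Python is only an illustration, not what SAS runs internally: the function name and labels are hypothetical, and the quartile method is an assumption, so borderline points may be classified differently than in the SAS output.

```python
from statistics import quantiles


def classify_residuals(residuals):
    """Label each residual 'ok', 'potential', or 'severe' by its distance from the box.

    Assumption: quartiles use Python's "inclusive" method; SAS may compute
    quartiles differently, so results near the cutoffs can differ.
    """
    q1, _median, q3 = quantiles(residuals, n=4, method="inclusive")
    box_height = q3 - q1  # the interquartile range
    labels = []
    for r in residuals:
        # Distance outside the box; zero for values inside the box.
        distance = max(q1 - r, r - q3, 0.0)
        if distance > 3 * box_height:
            labels.append("severe")      # labeled with the observation number
        elif distance > 1.5 * box_height:
            labels.append("potential")   # drawn with the "o" symbol
        else:
            labels.append("ok")
    return labels
```

Labels are returned in the same order as the input, so position 1 in the list corresponds to observation 1.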

In the example below, boxplots are given for each of the Diet treatment levels. Fesc+Clo has no outliers, Fescue has one potential outlier, and Clover has a severe outlier, labeled as observation 25.

Additional information on extreme values is given in a table like this, which shows the 5 largest and 5 smallest residuals with their observation numbers. Observation 25 has a residual of 127.38, far larger than that of any other observation.
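As a rough illustration of what such an extreme-values table contains, the sketch below pairs each residual with its 1-based observation number and returns the smallest and largest few. The function name is hypothetical and the residuals are assumed to be already computed; this is not the SAS procedure itself.

```python
def extreme_residuals(residuals, k=5):
    """Return (smallest, largest) lists of (observation_number, residual) pairs.

    Observation numbers are 1-based positions in the input list.
    """
    numbered = list(enumerate(residuals, start=1))
    ordered = sorted(numbered, key=lambda pair: pair[1])
    smallest = ordered[:k]           # most negative first
    largest = ordered[-k:][::-1]     # most positive first
    return smallest, largest
```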

The %mmaov macro also examines residuals. Observations with studentized residuals more extreme than ±2.5, and influential observations with a Residual Likelihood Distance (RLD) greater than 2, are printed as seen here. These tables give the observation number and the original data, which is helpful when correcting the data.
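The two cutoffs can be expressed as a simple filter. This sketch assumes the studentized residuals and RLD values have already been computed elsewhere (it does not compute them), and the dictionary keys `obs`, `student_resid`, and `rld` are hypothetical names chosen for illustration, not names used by the macro.

```python
def flag_observations(rows, resid_cutoff=2.5, rld_cutoff=2.0):
    """Return (obs, reasons) for rows exceeding either cutoff.

    Each row is a dict with hypothetical keys 'obs', 'student_resid', 'rld'.
    """
    flagged = []
    for row in rows:
        reasons = []
        # "More extreme than 2.5" is read here as absolute value above 2.5.
        if abs(row["student_resid"]) > resid_cutoff:
            reasons.append("studentized residual")
        if row["rld"] > rld_cutoff:
            reasons.append("RLD")
        if reasons:
            flagged.append((row["obs"], reasons))
    return flagged
```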

Depending on your graphics settings, you may also find plots like those below in the output, which display all the studentized residuals and RLD values graphically. Observation 25 is circled in the graphs. Some of these graphs will also be used in the next step.

  If you have an outlier, go to your dataset and count down to the observation indicated. Then make a scientific decision as to whether this point is believable. If the observation is invalid, delete or correct it.
  Keep in mind that unequal variance and non-normality can create apparent outliers: data points that you do not want to remove because they are legitimate values. Transformations in Steps B and C can correct many potential outliers. Do not delete valid data.

If you do a transformation, re-run the ANOVA. If the outlier remains in the boxplot, that is stronger evidence that it is a true outlier. At this point, you must decide whether to delete the extreme value. If it is scientifically invalid (a data entry error or an error in experimental procedure), delete it from the analysis: simply replace the value with a period, and SAS will treat it as missing.
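The idea of keeping the data line but marking its value as missing can be illustrated outside SAS. In this Python sketch, `None` plays the role of the SAS period; the function names are hypothetical, and the code only shows the effect on a simple mean, not on the ANOVA itself.

```python
from statistics import mean


def set_missing(values, obs_number):
    """Return a copy with the 1-based observation replaced by None (missing)."""
    revised = list(values)          # leave the original data untouched
    revised[obs_number - 1] = None
    return revised


def mean_ignoring_missing(values):
    """Average only the values that are present."""
    present = [v for v in values if v is not None]
    return mean(present)
```

Working on a copy keeps the original values available in case the deletion later turns out to be unjustified.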

If you have deleted or corrected one or more outliers, then in Step E you will be instructed to re-run your SAS analysis on the revised data set.

But first, proceed to the next step (Step B: Check for Equal Variance).


  HINTS:
  To number observation data lines in SAS
  An apparent or potential outlier is a data value that has been flagged as extreme in the statistical analysis. We
      use this as distinct from an "outlier" (a real or actual outlier), which is invalid data. Do not automatically
      discard apparent outliers.
  Mistakes in data entry, and errors and limitations in experimental procedure, can result in outliers in the dataset. Biological variation typically does not result in outliers. Extreme individuals are not outliers, but severely diseased individuals, for instance, may be.
  If you rerun your SAS analysis, always check all diagnostics. For example, removing some outliers may cause
      other outliers to be identified.
  For a more detailed explanation of ANOVA output, consult a statistics text or the Using SAS module.
  This ANOVA Diagnostics module serves all experimental and treatment design combinations. After finishing it,
      you may need to return to your analysis module and re-run SAS (maybe many times).
