
Working with SAS datasets
Creating new variables

The DATA step in SAS provides a broad set of mathematical and logical instructions for creating new variables from your observed data. The typical workflow is to read the data into SAS (see Accessing data), then use additional DATA steps to rearrange the data. This flow is illustrated by:
DATA one;
INPUT xx yy;
DATALINES;
1 2
;
RUN;

DATA two; SET one;
zz = xx+yy;
RUN;

DATA One is created by reading in one observation for two variables. Then DATA Two is created by copying One with the SET statement, and creating a new variable ZZ which is the sum of the original two variables.

Some commonly used manipulations are listed here. Constants, variable names, and SAS functions can be combined in an expression on the right side of the equals sign; the result is assigned to the new variable on the left side.

y=LOG(x); y=EXP(x);
Natural log and its inverse, exponentiation.

y=LOG10(x); y = 10**x;
Log base 10 and its inverse, 10 to the power x.

y = (2*x-5)/(6+1); y = 2*x-5/6+1;
The usual addition, subtraction, multiplication and division. Use parentheses to control the order of calculation. The two examples give different answers because SAS performs multiplication and division before addition and subtraction, so the second is evaluated as (2*x) - (5/6) + 1.
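
As a quick check of the precedence rule, the following sketch evaluates both expressions for an arbitrary value, x = 4 (chosen only for illustration), and writes the results to the SAS log. The dataset name _NULL_ tells SAS to run the step without creating an output dataset.

```sas
DATA _NULL_;
  x = 4;                    /* arbitrary value, for illustration only */
  y1 = (2*x-5)/(6+1);       /* parentheses first: (8-5)/7 */
  y2 = 2*x-5/6+1;           /* no parentheses: 8 - (5/6) + 1 */
  PUT y1= y2=;              /* write both results to the log */
RUN;
```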

y = SIN(x); y=ARSIN(x);
Sine (with x in radians) and its inverse, the arcsine.

IF y=10 THEN x=SQRT(7);
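Comparison operators provide conditional processing. Besides = (EQ), they include ^= (NE), >= (GE), <= (LE), < (LT) and > (GT); either the symbol or the letter code can be used. The logical operators AND (&), OR (|) and NOT (^) are also available. Note that ! is an alternative symbol for OR, not for NOT.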
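
To illustrate that the symbols and the letter codes are interchangeable, here is a minimal sketch with an arbitrary value of y. Both IF statements test the same condition, so both messages should appear in the log.

```sas
DATA _NULL_;
  y = 10;                                   /* arbitrary value for the test */
  /* same condition written with symbols ... */
  IF y >= 10 AND y ^= 12 THEN PUT 'symbols: condition is true';
  /* ... and with letter codes */
  IF y GE 10 AND y NE 12 THEN PUT 'letter codes: condition is true';
RUN;
```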

IF y IN(10,11,12) THEN x=y/2;
If y equals 10 or 11 or 12, then x is set to half of y.
ELSE x=y;
Otherwise, x is equal to y.
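
The IF and ELSE statements above work as a pair inside a single DATA step. The sketch below puts them together; the dataset name three and the two data values are made up for illustration. The first observation (y=10) should get x=5, and the second (y=15) should get x=15.

```sas
DATA three;                         /* hypothetical dataset name */
  INPUT y;
  IF y IN(10,11,12) THEN x = y/2;   /* y is 10, 11 or 12: halve it */
  ELSE x = y;                       /* any other value: copy y */
  DATALINES;
10
15
;
RUN;

PROC PRINT DATA=three;              /* list the result */
RUN;
```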

y = MEAN (OF x1 x2 x3);
Calculate the mean of the listed variables. This differs from y = (x1+x2+x3)/3 if any of the x's have missing values. The latter will result in a missing value, but the MEAN function calculates the mean of only those variables without missing values.
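
The difference is easy to see with one missing value. In this sketch (variable names y_func and y_arith are chosen for illustration), x2 is set to missing, which SAS writes as a period.

```sas
DATA _NULL_;
  x1 = 2; x2 = .; x3 = 4;         /* x2 is missing */
  y_func  = MEAN(OF x1 x2 x3);    /* mean of nonmissing values: (2+4)/2 */
  y_arith = (x1+x2+x3)/3;         /* missing, because x2 is missing */
  PUT y_func= y_arith=;           /* write both results to the log */
RUN;
```

SAS also adds a note to the log saying that missing values were generated by the arithmetic expression.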

y = RANUNI(0);
y is set to a random number from the uniform distribution between 0 and 1, using a seed (starting value) of zero; with a seed of zero, the starting value is taken from the computer clock. Many other random number generators and statistical functions are available, but this particular example is often used to randomize experiments.
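
One common way to randomize with RANUNI, sketched here under assumed names (the dataset plots and the variables plot and r are hypothetical), is to attach a random number to each experimental unit and then sort by it. Treatments can then be assigned in the sorted order.

```sas
DATA plots;                 /* hypothetical dataset of experimental units */
  INPUT plot;
  r = RANUNI(0);            /* uniform random number; seed 0 uses the clock */
  DATALINES;
1
2
3
4
;
RUN;

PROC SORT DATA=plots;
  BY r;                     /* sorting by r puts the plots in random order */
RUN;
```

Because the seed is taken from the clock, each run gives a different order; a fixed positive seed would make the order repeatable.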

Generally, the SAS language follows the expected mathematical conventions, but consult the full documentation when in doubt.

