SAS - Sort Data Sets


Advertisements

Data sets in SAS can be sorted on any of the variables present in them. This helps both in data analysis and performing other options like merging etc. Sorting can happen on any single variable as well as multiple variables. The SAS procedure used to carry out the sorting in SAS data set is named PROC SORT. The result after sorting is stored in a new data set and the original data set remains unchanged.

Syntax

The basic syntax for sort operation in data set in SAS is −

PROC SORT DATA = original dataset OUT = Sorted dataset;
   BY variable name;

Following is the description of the parameters used −

  • variable name is the column name on which the sorting happens.

  • Original dataset is the dataset name to be sorted.

  • Sorted dataset is the dataset name after it is sorted.

Example

Let's consider the following SAS data set containing the employee details of an organization. We can sort the data set on salary by using the code given below.

DATA Employee; 
   INPUT empid name $ salary DEPT $ ; 
DATALINES; 
1 Rick 623.3	IT 		 
2 Dan 515.2 	OPS	
3 Mike 611.5 	IT 	
4 Ryan 729.1    HR 
5 Gary 843.25   FIN 
6 Tusar 578.6   IT 
7 Pranab 632.8  OPS
8 Rasmi 722.5   FIN 
;
RUN;

PROC SORT DATA = Employee OUT = Sorted_sal ;
   BY salary;
RUN ;
 
PROC PRINT DATA = Sorted_sal;
RUN ; 

When the above code is executed, we get the following output.

sort1

Reverse Sorting

The default sorting option is in ascending order, which means the observations are arranged as per the lower to higher value of the sorted variable. But we may also want the sort to happen in ascending order.

Example

In the below code reverse sorting is achieved by using DESCENDING statement.

DATA Employee; 
   INPUT empid name $ salary DEPT $ ; 
DATALINES; 
1 Rick 623.3	IT 		 
2 Dan 515.2 	OPS	
3 Mike 611.5 	IT 	
4 Ryan 729.1    HR 
5 Gary 843.25   FIN 
6 Tusar 578.6   IT 
7 Pranab 632.8  OPS
8 Rasmi 722.5   FIN 
;
RUN;

PROC SORT DATA = Employee OUT = Sorted_sal_reverse ;
   BY DESCENDING salary;
RUN ;
 
PROC PRINT DATA = Sorted_sal_reverse;
RUN ; 

When the above code is executed, we get the following output.

sort2

Sorting Multiple Variables

Sorting can be applied to multiple variables by using them with the BY statement. The variables get sorted with a priority from left to right.

Example

In the below code the data set is sorted first on the variable department name and next on the variable name salary.

DATA Employee; 
   INPUT empid name $ salary DEPT $ ; 
DATALINES; 
1 Rick 623.3	IT 		 
2 Dan 515.2 	OPS	
3 Mike 611.5 	IT 	
4 Ryan 729.1    HR 
5 Gary 843.25   FIN 
6 Tusar 578.6   IT 
7 Pranab 632.8  OPS
8 Rasmi 722.5   FIN 
;
RUN;

PROC SORT DATA = Employee OUT = Sorted_dept_sal ;
   BY salary DEPT;
RUN ;
 
PROC PRINT DATA = Sorted_dept_sal;
RUN ; 

When the above code is executed, we get the following output.

sort3
Advertisements