Wednesday, August 06, 2008

100 times faster engine behind SAS indexes

Sas datasets are kept in a very efficient dataformat. So how to make
it even faster?

One of the biggest potential performance increases of SAS base
programs hides behind the definition of indexes on your datasets. Here
is a little example.
First we select records from a pretty large table (about 20 millions
records) using filter on the CODSKL. This takes 1:50.00 min.

896 proc sql;
897 create table ctt_bto as
898 select *
899 from cma.cma_src_kcs
900 where CODSKL=238
901 ;
NOTE: Table WORK.CTT_BTO created, with 223 rows and 31 columns.

902 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 1:50.00
cpu time 57.39 seconds


Now let's define a simple index on the dataset and try again. Notice
that the definition of the index takes about the same time as the
previous step. This is however has to be done once. (unless you
desrtoy index with some brutal sort step.)

903 PROC DATASETS LIBRARY=cma nolist;
904 MODIFY CMA_SRC_KCS(alter=XXXXXX write=XXXXXX);
905 INDEX CREATE CODSKL;
NOTE: Simple index CODSKL has been defined.
906 QUIT;

NOTE: MODIFY was successful for CMA.CMA_SRC_KCS.DATA.
NOTE: PROCEDURE DATASETS used (Total process time):
real time 2:08.03
cpu time 1:28.96

SO now , lets try the first step again.


907
908
909 proc sql;
910 create table ctt_bto as
911 select *
912 from cma.cma_src_kcs
913 where CODSKL=238
914 ;
NOTE: Table WORK.CTT_BTO created, with 223 rows and 31 columns.

915 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 0.67 seconds
cpu time 0.01 seconds


Wow, 0.67 seconds. Not even a one secod. This is impressive increase
compared to almost 2 minitues. 1:50 min = 110 sec. SO 110/0.67=164
times faster. Impressive, is not it?
Well, now you do not have time anymore to drink a cup of coffee while
your programs are running...

4suc6

No comments: