Wednesday, August 06, 2008

100 times faster: the engine behind SAS indexes

SAS datasets are stored in a very efficient data format. So how can we make
access even faster?

One of the biggest potential performance gains in Base SAS
programs hides behind the definition of indexes on your datasets. Here
is a little example.
First we select records from a pretty large table (about 20 million
records) using a filter on CODSKL. This takes 1:50.00 min.

896 proc sql;
897 create table ctt_bto as
898 select *
899 from cma.cma_src_kcs
900 where CODSKL=238
901 ;
NOTE: Table WORK.CTT_BTO created, with 223 rows and 31 columns.

902 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 1:50.00
cpu time 57.39 seconds


Now let's define a simple index on the dataset and try again. Notice
that defining the index takes about as long as the previous step. This,
however, has to be done only once (unless you destroy the index with
some brutal sort step).

903 PROC DATASETS LIBRARY=cma nolist;
904 MODIFY CMA_SRC_KCS(alter=XXXXXX write=XXXXXX);
905 INDEX CREATE CODSKL;
NOTE: Simple index CODSKL has been defined.
906 QUIT;

NOTE: MODIFY was successful for CMA.CMA_SRC_KCS.DATA.
NOTE: PROCEDURE DATASETS used (Total process time):
real time 2:08.03
cpu time 1:28.96
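As a side note, if you rebuild the table anyway, the index can also be
created in the same step with the INDEX= dataset option, so you pay no
separate indexing pass. A sketch (cma.some_source is a hypothetical
source table, not one from this post):

data cma.cma_src_kcs(index=(CODSKL));
   /* CODSKL is indexed while the dataset is being written */
   set cma.some_source;
run;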

So now, let's try the first step again.


909 proc sql;
910 create table ctt_bto as
911 select *
912 from cma.cma_src_kcs
913 where CODSKL=238
914 ;
NOTE: Table WORK.CTT_BTO created, with 223 rows and 31 columns.

915 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 0.67 seconds
cpu time 0.01 seconds


Wow, 0.67 seconds. Not even a full second. This is an impressive
improvement compared to almost two minutes: 1:50 min = 110 sec, so
110/0.67 is about 164 times faster. Impressive, isn't it?
Well, now you no longer have time to drink a cup of coffee while
your programs are running...
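If you want to verify that the optimizer really picked the index, one
option (a sketch, using the query from this post) is the MSGLEVEL=I
system option, which makes SAS write extra INFO notes to the log,
including the index chosen for WHERE clause optimization:

options msglevel=i;  /* log INFO notes, including index selection */

proc sql;
   create table ctt_bto as
   select *
   from cma.cma_src_kcs
   where CODSKL=238
   ;
quit;

The log should then contain a note along the lines of
"INFO: Index CODSKL selected for WHERE clause optimization."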

4suc6

Monday, August 04, 2008

running production SAS code: error handling

While running various production SAS programs I developed an error
handling practice that I find practical, easy to implement and
extensible. So I would like to share it.

Base SAS does not have built-in job workflow management; that comes
with much more expensive extensions. So it is amazing how expensive
features can come for free with a little bit of coding.

The ease comes as explained below:

Before every significant transformation step in the SAS code I assign
the name of that step to the same macro variable. Just something
short but meaningful. After the step I call the error handling macro.
The name speaks for itself.

Example:

%LET STEP=JOIN WITH HANDELINGEN - TRANSACTIONS;
PROC SQL; CREATE table CM_SRC as
SELECT
a.DATCTTMMT
,a.ID_CTTMMT
,NUMTRA
....
whatever
;
quit;

%CHECK_ERROR;

Extensibility comes from the macro. The standard macro variable STEP
allows processing all steps in a similar manner. A simple version of
the CHECK_ERROR macro writes to the LOG the name of the step and the
system error code. If the step completes without an error the code is
0; in case of a warning SAS sets it to 4, and errors produce larger
values.
Let's take a look at the simple version of CHECK_ERROR.

%macro CHECK_ERROR(mstep=&STEP,merr=&SYSERR);
%put &MSTEP &MERR &SYSMSG;
%mend;

As you can see, the macro outputs the name of the step and two
standard SAS macro references, SYSERR and SYSMSG. In the log it is
going to look like this:


JOIN WITH HANDELINGEN - TRANSACTIONS 0

NOTE: There were 125142 observations read from the data set WORK.CMA_SRC_KCS.
NOTE: The data set WORK.CMA_SRC_KCS has 125142 observations and 22 variables.
NOTE: Compressing data set WORK.CMA_SRC_KCS decreased size by 25.06 percent.
Compressed is 1770 pages; un-compressed would require 2362 pages.
NOTE: PROCEDURE SORT used (Total process time):
real time 3.30 seconds
cpu time 0.73 seconds


As you can see, there was no error because the code is 0. Otherwise
it would be something greater than 0.

This version of the macro makes it easier to find steps in the log
and analyse them. You can also check error codes afterwards. As I
said, this is a basic version.
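For instance, one possible extension (a sketch, not a production
version) could react to the SYSERR value it receives: warn on 4,
and abort the job on anything larger, which is handy in batch runs:

%macro CHECK_ERROR(mstep=&STEP, merr=&SYSERR);
   %put &MSTEP &MERR;
   %if &MERR > 4 %then %do;
      /* an error occurred: report it and stop the whole job */
      %put ERROR: step "&MSTEP" failed with SYSERR=&MERR - aborting;
      %abort cancel;
   %end;
   %else %if &MERR = 4 %then %do;
      /* step completed, but with warnings */
      %put WARNING: step "&MSTEP" completed with warnings;
   %end;
%mend;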

I can imagine building a mini Access database and, instead of pumping
messages through the log, sending them as insert transactions into the
Access database and producing a report afterwards. If anyone wants to
continue working on that and share ideas - be my guest :)

Maxim