Tuesday, July 29, 2008

Computing YTD with SAS

YTD computation method with sas base (year to date)

In one of the weekly reports that I make , the client asked to add
year to date (YTD) sum for all performance metrics. This is the some
for all weeks starting from week 1 etc. So in week 5 the YTD for a
metric N is N in week1+ N in week 2+ ...+ N in week 5.
This is something that SAS does not do automatically and there are
several ways to add this information at the front end. Option 1 is to
add YTD metric for every metric that has to be YTDated. Option 2 is to
add a single variable SCOPE- a mini dimension with two values W, Y.

We will mark all existing records (observations) in the report with W
which means weekly. Additionally we will add a single YTD record per
report row with the value Y (=YTD). Notice that this record should be
added in the report data and i will show how this can be done. So
basically we extend the dataset with extra observations that carry YTD
information. This extended dataset is further used for reporting in
SAS or Excel. using the added dimension SCOPE we can split all metrics
in Weekly or YTD column groups.

Now I will explain the sas code.

proc sort data=OVK_SUM;
by ISOJAAR CODMRK KLANT_SOORT RICHTING WR_CAT ISOWEEK;
quit;

First we need to sort the report data OVK_SUM by variables that are
used to group our metrics. in this case this is Year , code of the
brand, sort of the client, direction, product category and week
(respectively). Notice that the week has to be the last in the list.
The order of other group variables is not important.

data OVK_SUM;
set OVK_SUM;
by ISOJAAR CODMRK KLANT_SOORT RICHTING WR_CAT ISOWEEK;
if first.WR_CAT then do; N_YTD=0; PREM_YTD=0; end;
N_YTD+N; PREM_YTD+PREMIE;
SCALE="W";
OUTPUT;
if LAST.ISOWEEK then do; PREMIE=PREM_YTD; N=N_YTD; SCALE="Y"; OUTPUT; end;
drop PREM_YTD N_YTD;
run;

In this datastep we compute YTD per metric. So by variables have to be
in the same order as by the proc sort. At the first value of the
variable before the week ( in this case this is WR_CAT) we set
temporary YTD metrics N_YTD and PREM_YTD to zero. Next row, we
increment them in a way that SAS knows that they have to be retained.
(N_YTD+N increments the value of N_YTD with the metric we accumulate
but also the incremented value of N_YTD is taken to the next
observation)
SCALE="W" - update the value for our minidimension and we are ready to
output the observation.
The next line if last.isoweek..... is the main trick of this
algorithm. If the observation is the last in the group we will make
additional output of our YTD metrics. So we make our metrics equal to
the computed YTD metrics but to distinguish them from each other we
set SCALE="Y". Because there is the single YTD records per group the
YTD will be properly summed up.

Limitations of this method:
YTD is relevant only for the group of variables we selected. If the
group is extended YTD computation has to be adjusted.

Advantages:
1. dataset can have additional dimensions or observations. THe records
can be properly grouped in the repot. They do not have to be grouped
in advance.
2. extra mini dimension allows to spilit weekly and accumulated YTD
metrics easily in the report heading. As I use excel pivot tables a
lot, this is quit handy.
3. You can slightly change the method if you prefer to add YTD as a
metric instead of mini dimension. In the LAST.ISOWEEK part just
assign value to this YTD metric without extra output. Ready.

So here it is , YTD. Mail me if some quetions occur.

With kind regards
Maxim Ivashkov

Wednesday, July 23, 2008

Treatment for overlaping timestamps in the dimensions with the history

/********************************************************
in datasets with historical records (update type 2 )
the validity of the entity has the beging and the end timestamps.
If the timestamps overlap with eachother for the same entity, this will
duplicate records when you join the dimension with a fact table.
This program will mark all overlaping time stamps.
Replace my names with yours.
Example of the overlaping timestamps:
NUMKLT DATING DATEND
234 01JAN2006 31DEC2008
234 12DEC2006 .
********************************************************/
data srclib.wgvbnd;
set srclib.wgvbnd; * data where the historical records are saved;
* datend - end datestamp;
* dating - begin date stamp;
* NUMKLT - the entity - client number;

by numklt dating;
retain prev_end;
if first.numklt then do;
prev_end=datend;
N=0;
end;
else if prev_end>dating then weird=1;else weird=.;
N+1;
drop prev_end;
run;

Thursday, July 17, 2008

Grouping records with proc sql

I came across a weird thing in SAS Base while using proc sql to group records.
When you group records without using statistical functions gives
different results when you do use a statistical function. It seems to
me very strange. Example.

proc sql;
create table updt_pro as
select codmrk, coddvg, codkpr, codprd, count(*) as n
from ovk_pro_stg_vw
group by codmrk, coddvg, codkpr, codprd
;
quit;

gives me 367 records, and as expected the combinations of group
variables are unique.

the same code without count function gives 983450 records.
proc sql;
create table updt_pro as
select codmrk, coddvg, codkpr, codprd
from ovk_pro_stg_vw
group by codmrk, coddvg, codkpr, codprd
;
quit;

it is obligatory to use at least one aggregate statistical
function, otherwise SAS simply transform group by into order by.

(WARNING: A GROUP BY clause has been transformed into an ORDER BY
clause because neither the SELECT clause nor the optional HAVING
clause of the associated table-expression referenced a
summary function.
)

Tip: when you want to find unique combination of the group variables use select distinct:
proc sql;
create table updt_pro as
select distinct codmrk, coddvg, codkpr, codprdfrom ovk_pro_stg_vw
;
quit;


Maxim

Monday, July 14, 2008

SAS PROC DATASET WITH APPEND

I use PROC DATASETS with APPEND statement as a way to add transactions
to the base dataset. My base dataset has a combined key to prevent
dupplicate transactions beeing added. So far it worked fine while no
dupplication occured.

When I tryed to add duplicated transactions because of slight date
overlap, proc dataset started to hangout with no response. I would
expect behaviour similar to proc sql with insert command where the
dupplicate records are rejected and the rest is inserted.

It looks like PROC DATASET has problems with adding records that
violate uniqueness of the index. At least it takes too long to wait
till the end.

The advantage of APPEND is that you do not neet explicitly list all columns,
but this turns out to be an issue with indexed datasets.

http://www.4suc6.com
Share your goals!


--
Met vriendelijke groet,
Maxim

http://www.4suc6.com
Share your goals!