Tuesday, July 29, 2008

Computing YTD with SAS

YTD computation method with sas base (year to date)

In one of the weekly reports that I make , the client asked to add
year to date (YTD) sum for all performance metrics. This is the some
for all weeks starting from week 1 etc. So in week 5 the YTD for a
metric N is N in week1+ N in week 2+ ...+ N in week 5.
This is something that SAS does not do automatically and there are
several ways to add this information at the front end. Option 1 is to
add YTD metric for every metric that has to be YTDated. Option 2 is to
add a single variable SCOPE- a mini dimension with two values W, Y.

We will mark all existing records (observations) in the report with W
which means weekly. Additionally we will add a single YTD record per
report row with the value Y (=YTD). Notice that this record should be
added in the report data and i will show how this can be done. So
basically we extend the dataset with extra observations that carry YTD
information. This extended dataset is further used for reporting in
SAS or Excel. using the added dimension SCOPE we can split all metrics
in Weekly or YTD column groups.

Now I will explain the sas code.

proc sort data=OVK_SUM;
by ISOJAAR CODMRK KLANT_SOORT RICHTING WR_CAT ISOWEEK;
quit;

First we need to sort the report data OVK_SUM by variables that are
used to group our metrics. in this case this is Year , code of the
brand, sort of the client, direction, product category and week
(respectively). Notice that the week has to be the last in the list.
The order of other group variables is not important.

data OVK_SUM;
set OVK_SUM;
by ISOJAAR CODMRK KLANT_SOORT RICHTING WR_CAT ISOWEEK;
if first.WR_CAT then do; N_YTD=0; PREM_YTD=0; end;
N_YTD+N; PREM_YTD+PREMIE;
SCALE="W";
OUTPUT;
if LAST.ISOWEEK then do; PREMIE=PREM_YTD; N=N_YTD; SCALE="Y"; OUTPUT; end;
drop PREM_YTD N_YTD;
run;

In this datastep we compute YTD per metric. So by variables have to be
in the same order as by the proc sort. At the first value of the
variable before the week ( in this case this is WR_CAT) we set
temporary YTD metrics N_YTD and PREM_YTD to zero. Next row, we
increment them in a way that SAS knows that they have to be retained.
(N_YTD+N increments the value of N_YTD with the metric we accumulate
but also the incremented value of N_YTD is taken to the next
observation)
SCALE="W" - update the value for our minidimension and we are ready to
output the observation.
The next line if last.isoweek..... is the main trick of this
algorithm. If the observation is the last in the group we will make
additional output of our YTD metrics. So we make our metrics equal to
the computed YTD metrics but to distinguish them from each other we
set SCALE="Y". Because there is the single YTD records per group the
YTD will be properly summed up.

Limitations of this method:
YTD is relevant only for the group of variables we selected. If the
group is extended YTD computation has to be adjusted.

Advantages:
1. dataset can have additional dimensions or observations. THe records
can be properly grouped in the repot. They do not have to be grouped
in advance.
2. extra mini dimension allows to spilit weekly and accumulated YTD
metrics easily in the report heading. As I use excel pivot tables a
lot, this is quit handy.
3. You can slightly change the method if you prefer to add YTD as a
metric instead of mini dimension. In the LAST.ISOWEEK part just
assign value to this YTD metric without extra output. Ready.

So here it is , YTD. Mail me if some quetions occur.

With kind regards
Maxim Ivashkov

1 comment:

M.Ivashkov said...

There is one issue with this algorithm. If there are missing week than YTD will not be outputed. See the corrected algorithm for this issue.



/* compute YTD */
data OVK_SUM;
set OVK_SUM;
RETAIN W_P_N W_P_PREMIE W_P_WEEK;
by ISOJAAR CODMRK KLANT_SOORT RICHTING WR_CAT ISOWEEK;
if first.WR_CAT then do; N_YTD=0;PREM_YTD=0; W_P_N=0; W_P_PREMIE=0; W_P_WEEK=0;end;

N_YTD+N; PREM_YTD+PREMIE;
SCALE="W";
OUTPUT;

if LAST.ISOWEEK then do;
/* output YTD in any case */
PREMIE=PREM_YTD; N=N_YTD; SCALE="Y";
OUTPUT;

/* check if there are missing weeks and additional output YTD for them */
WEEK_TO=ISOWEEK;
WEEK_FROM=W_P_WEEK+1;
do while (WEEK_FROM < WEEK_TO);
PREMIE=W_P_PREMIE; N=W_P_N; SCALE="Y";
ISOWEEK=WEEK_FROM;
OUTPUT;
WEEK_FROM=WEEK_FROM+1;
end;
W_P_N=N_YTD; W_P_PREMIE=PREM_YTD;
W_P_WEEK=WEEK_TO;
end;

/* If the last week for the group is less than the current week (LWK) we also output YTD
for the weeks between W_P_WEEK and LWK.
*/
IF LAST.WR_CAT and ISOWEEK<&LWK THEN DO;
WEEK_TO=&LWK;
WEEK_FROM=W_P_WEEK+1;
do while (WEEK_FROM <= WEEK_TO);
PREMIE=W_P_PREMIE; N=W_P_N; SCALE="Y";
ISOWEEK=WEEK_FROM;
OUTPUT;
WEEK_FROM=WEEK_FROM+1;
end;
END;
drop PREM_YTD N_YTD W_P_N W_P_PREMIE W_P_WEEK week_to week_from;
run;

/* Issue - if nothing is registered in a week than this week is not present in the dataset.
thereof the YTD will not be outputed for this week.
We need to duplicate the last outputed week for the missing weeks.
We will retain
W_P_WEEK - the last outputed week for YTD
W_P_N - the last accumulated group count
W_P_PREMIE - the last accumulated premie
Every time the last week is found, YTD is outputed,
and then the curren ISOWEEK is compared to the previous outputed YTD week W_P_WEEK. If the
difference is > 1 than we have missing weeks and we will output YTD in the loop for the
missing weeks between WEEK_FROM and WEEK_TO. */