Thursday, July 17, 2008

Grouping records with proc sql

I came across a weird thing in SAS Base while using proc sql to group records.
When you group records without using statistical functions gives
different results when you do use a statistical function. It seems to
me very strange. Example.

proc sql;
create table updt_pro as
select codmrk, coddvg, codkpr, codprd, count(*) as n
from ovk_pro_stg_vw
group by codmrk, coddvg, codkpr, codprd
;
quit;

gives me 367 records, and as expected the combinations of group
variables are unique.

the same code without count function gives 983450 records.
proc sql;
create table updt_pro as
select codmrk, coddvg, codkpr, codprd
from ovk_pro_stg_vw
group by codmrk, coddvg, codkpr, codprd
;
quit;

it is obligatory to use at least one aggregate statistical
function, otherwise SAS simply transform group by into order by.

(WARNING: A GROUP BY clause has been transformed into an ORDER BY
clause because neither the SELECT clause nor the optional HAVING
clause of the associated table-expression referenced a
summary function.
)

Tip: when you want to find unique combination of the group variables use select distinct:
proc sql;
create table updt_pro as
select distinct codmrk, coddvg, codkpr, codprdfrom ovk_pro_stg_vw
;
quit;


Maxim

No comments: