cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Choose Language Hide Translation Bar
modelFit
Level II

Summarize Function

I use the summarize function quite a bit in the course of my work.  Oftentimes, I want a frequency distribution of a variable. To do this I use Summarize( ex=By(dt:var), exCnt = Count(dt:var)). ex contains the levels of the variable and exCnt contains the frequency distribution. This works fine when the var is numeric, though not when the var is string. The work around I use is to create a dummy sequence(1, N Rows(dt), 1) numeric variable and run this syntax, Summarize( ex=By(dt:var), exCnt = Count(dt:dummy)). 

This works fine, but was wondering if there was a way to accomplish what I need without the extra dummy column?

 

Sincerely,

Matt

1 ACCEPTED SOLUTION

Accepted Solutions
jthi
Super User

Re: Summarize Function

I would most likely use private Summary table, get values and then close it.

Names Default To Here(1);

dt = Open("$SAMPLE_DATA/Big Class.jmp");

dt_summary = dt << Summary(Group(:sex), Freq("None"), Weight("None"), Link to original data table(0), private);

ex = Column(dt_summary, 1) << get values;
exCnt = Column(dt_summary, 2) << get values;

Close(dt_summary, no save);

Show(ex, exCnt);
-Jarmo

View solution in original post

4 REPLIES 4
jthi
Super User

Re: Summarize Function

I would most likely use private Summary table, get values and then close it.

Names Default To Here(1);

dt = Open("$SAMPLE_DATA/Big Class.jmp");

dt_summary = dt << Summary(Group(:sex), Freq("None"), Weight("None"), Link to original data table(0), private);

ex = Column(dt_summary, 1) << get values;
exCnt = Column(dt_summary, 2) << get values;

Close(dt_summary, no save);

Show(ex, exCnt);
-Jarmo
modelFit
Level II

Re: Summarize Function

@jthi, thanks for your quick reply. That was what I was doing prior to inserting the dummy column. Its not ideal, but also not bad.

jthi
Super User

Re: Summarize Function

You could also directly add count column to the table you have. I think getting the unique values with Associative array should work

 

Names Default To Here(1);

dt = Open("$SAMPLE_DATA/Big Class.jmp");

new_col = dt << New Column("Counts", Numeric, Continuous, << Set Each Value(
	Col Number(:name, :name)
));

aa = Associative Array(:name << get values, :Counts << get values);
//dt << Delete Columns(new_col);
show(aa);

but depending on the data, this could be slower than just using Summary table.

 

You could also make wish list item regarding this and provide a list of statistics which should be provided for character columns (first, last, mode, count...?) if this would be useful (and most likely it would be). There is one which has been now archived, but the request isn't that well structured Summerize/Summary aggregation function for character data. Concat Charcter Vector into a Scalar (by) but you could also comment your ideas to that.

 

When I did script Analyse Columns add-in I did use at least Summary table, Summarize function, Distribution platform and matrix operations to calculate the different statistics depending on which was the fastest (I think I didn't try Query()).

 

 

 

-Jarmo
hogi
Level XI

Re: Summarize Function

I added some ideas and posted a new wish:

Summary and Tabulate: add aggregation option for Character columns 
Let's cross the fingers that it gets accepted

with such an option, Tabulate will get extremely useful!
Big benefit compared to an additional column: the column is static. You have to define the aggregation level - and then it's fixed.

A Character aggregation which lives in a Summary Table - or in Tabulate is much more dynamic.
Depending on the additional structure, it can produce completely different results.

If there is one day per row, it will aggregate by day, if it's a month, it will aggregate by month.