Solved: Summarize Function

modelFit · Jul 14, 2023 10:31 AM

I use the summarize function quite a bit in the course of my work. Oftentimes, I want a frequency distribution of a variable. To do this I use Summarize( ex=By(dt:var), exCnt = Count(dt:var)). ex contains the levels of the variable and exCnt contains the frequency distribution. This works fine when the var is numeric, though not when the var is string. The work around I use is to create a dummy sequence(1, N Rows(dt), 1) numeric variable and run this syntax, Summarize( ex=By(dt:var), exCnt = Count(dt:dummy)).

This works fine, but was wondering if there was a way to accomplish what I need without the extra dummy column?

Sincerely,

Matt

jthi · Jul 14, 2023 10:43 AM

I would most likely use private Summary table, get values and then close it.

Names Default To Here(1);

dt = Open("$SAMPLE_DATA/Big Class.jmp");

dt_summary = dt << Summary(Group(:sex), Freq("None"), Weight("None"), Link to original data table(0), private);

ex = Column(dt_summary, 1) << get values;
exCnt = Column(dt_summary, 2) << get values;

Close(dt_summary, no save);

Show(ex, exCnt);

-Jarmo

View solution in original post

jthi · Jul 14, 2023 10:43 AM

I would most likely use private Summary table, get values and then close it.

Names Default To Here(1);

dt = Open("$SAMPLE_DATA/Big Class.jmp");

dt_summary = dt << Summary(Group(:sex), Freq("None"), Weight("None"), Link to original data table(0), private);

ex = Column(dt_summary, 1) << get values;
exCnt = Column(dt_summary, 2) << get values;

Close(dt_summary, no save);

Show(ex, exCnt);

-Jarmo

modelFit · Jul 14, 2023 11:59 AM

@jthi, thanks for your quick reply. That was what I was doing prior to inserting the dummy column. Its not ideal, but also not bad.

jthi · Jul 15, 2023 02:23 AM

You could also directly add count column to the table you have. I think getting the unique values with Associative array should work

Names Default To Here(1);

dt = Open("$SAMPLE_DATA/Big Class.jmp");

new_col = dt << New Column("Counts", Numeric, Continuous, << Set Each Value(
	Col Number(:name, :name)
));

aa = Associative Array(:name << get values, :Counts << get values);
//dt << Delete Columns(new_col);
show(aa);

but depending on the data, this could be slower than just using Summary table.

You could also make wish list item regarding this and provide a list of statistics which should be provided for character columns (first, last, mode, count...?) if this would be useful (and most likely it would be). There is one which has been now archived, but the request isn't that well structured Summerize/Summary aggregation function for character data. Concat Charcter Vector into a Scalar (by) but you could also comment your ideas to that.

When I did script Analyse Columns add-in I did use at least Summary table, Summarize function, Distribution platform and matrix operations to calculate the different statistics depending on which was the fastest (I think I didn't try Query()).

-Jarmo

hogi · Jul 17, 2023 01:36 PM

Interesting that summarize is not able to count non-numeric entries.

Maybe intentionally - to make it fast for numbers?

Besides "counting" -- you can do much more with character columns - and I hope Tables/Summary will get such super powers in a future release: Summary and Tabulate: add aggregation option for Character columns
Let's cross the fingers that it gets accepted ...

Summarize Function

Re: Summarize Function

Re: Summarize Function

Re: Summarize Function

Re: Summarize Function

Re: Summarize Function

Recommended Articles