cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Try the Materials Informatics Toolkit, which is designed to easily handle SMILES data. This and other helpful add-ins are available in the JMP® Marketplace
Choose Language Hide Translation Bar
modelFit
Level II

Summarize Function

I use the summarize function quite a bit in the course of my work.  Oftentimes, I want a frequency distribution of a variable. To do this I use Summarize( ex=By(dt:var), exCnt = Count(dt:var)). ex contains the levels of the variable and exCnt contains the frequency distribution. This works fine when the var is numeric, though not when the var is string. The work around I use is to create a dummy sequence(1, N Rows(dt), 1) numeric variable and run this syntax, Summarize( ex=By(dt:var), exCnt = Count(dt:dummy)). 

This works fine, but was wondering if there was a way to accomplish what I need without the extra dummy column?

 

Sincerely,

Matt

1 ACCEPTED SOLUTION

Accepted Solutions
jthi
Super User

Re: Summarize Function

I would most likely use private Summary table, get values and then close it.

Names Default To Here(1);

dt = Open("$SAMPLE_DATA/Big Class.jmp");

dt_summary = dt << Summary(Group(:sex), Freq("None"), Weight("None"), Link to original data table(0), private);

ex = Column(dt_summary, 1) << get values;
exCnt = Column(dt_summary, 2) << get values;

Close(dt_summary, no save);

Show(ex, exCnt);
-Jarmo

View solution in original post

4 REPLIES 4
jthi
Super User

Re: Summarize Function

I would most likely use private Summary table, get values and then close it.

Names Default To Here(1);

dt = Open("$SAMPLE_DATA/Big Class.jmp");

dt_summary = dt << Summary(Group(:sex), Freq("None"), Weight("None"), Link to original data table(0), private);

ex = Column(dt_summary, 1) << get values;
exCnt = Column(dt_summary, 2) << get values;

Close(dt_summary, no save);

Show(ex, exCnt);
-Jarmo
modelFit
Level II

Re: Summarize Function

@jthi, thanks for your quick reply. That was what I was doing prior to inserting the dummy column. Its not ideal, but also not bad.

jthi
Super User

Re: Summarize Function

You could also directly add count column to the table you have. I think getting the unique values with Associative array should work

 

Names Default To Here(1);

dt = Open("$SAMPLE_DATA/Big Class.jmp");

new_col = dt << New Column("Counts", Numeric, Continuous, << Set Each Value(
	Col Number(:name, :name)
));

aa = Associative Array(:name << get values, :Counts << get values);
//dt << Delete Columns(new_col);
show(aa);

but depending on the data, this could be slower than just using Summary table.

 

You could also make wish list item regarding this and provide a list of statistics which should be provided for character columns (first, last, mode, count...?) if this would be useful (and most likely it would be). There is one which has been now archived, but the request isn't that well structured Summerize/Summary aggregation function for character data. Concat Charcter Vector into a Scalar (by) but you could also comment your ideas to that.

 

When I did script Analyse Columns add-in I did use at least Summary table, Summarize function, Distribution platform and matrix operations to calculate the different statistics depending on which was the fastest (I think I didn't try Query()).

 

 

 

-Jarmo
hogi
Level XII

Re: Summarize Function

Interesting that summarize is not able to count non-numeric entries.

Maybe intentionally - to make it fast for numbers?

 

Besides "counting" -- you can do much more with character columns - and I hope Tables/Summary will get such super powers in a future release: Summary and Tabulate: add aggregation option for Character columns 
Let's cross the fingers that it gets accepted ...