cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Try the Materials Informatics Toolkit, which is designed to easily handle SMILES data. This and other helpful add-ins are available in the JMP® Marketplace
Choose Language Hide Translation Bar
lwx228
Level VIII

How to use JSL to classify and summarize the data in the middle of the matrix branch, and get the summary results in the form of matrix?

For "Big Class.jmp", the "height" and "weight" columns are written into the matrix.
Then the data in rows 6 to 35 of the matrix are classified and summarized: the average weight of each height is summarized.This result only needs to be stored in the in-memory matrix.

2021-04-30_10-53-12.png

 

Thanks Experts!

  • I wonder if this is possible?

dt=Open("$SAMPLE_DATA/Big Class.jmp");
ar=dt<<GetAsMatrix({4,5});
Summarize(ar[6::35,0],exg=By(?,?),exm=sum(?));

 

1 ACCEPTED SOLUTION

Accepted Solutions
ian_jmp
Level X

Re: How to use JSL to classify and summarize the data in the middle of the matrix branch, and get the summary results in the form of matrix?

I'm not sure it's the best way to get your end result, but if you wanted to do it using just matrices you can:

NamesDefaultToHere(1);
// Sample data
dt = Open("$SAMPLE_DATA/Big Class.jmp");
// Matrix with all 'height' and 'weight' values
ar = dt << GetAsMatrix({4,5});
// Discard rows
ar = ar[6::35, 0];
// Find out the distinct levels of 'height'
hl = Matrix(AssociativeArray(ar[0,1]) << getKeys);
// Build the average 'weight' for each level into a new matrix 'rm'
rm = J(NRow(hl), 2, .);
for(h=1, h<=NRow(hl), h++,
	rm[h, 1] = hl[h];
	rm[h, 2] = VMean(ar[Loc(ar[0, 1] == hl[h]), 2]);
	);
Print(rm);

View solution in original post

4 REPLIES 4
ian_jmp
Level X

Re: How to use JSL to classify and summarize the data in the middle of the matrix branch, and get the summary results in the form of matrix?

I'm not sure it's the best way to get your end result, but if you wanted to do it using just matrices you can:

NamesDefaultToHere(1);
// Sample data
dt = Open("$SAMPLE_DATA/Big Class.jmp");
// Matrix with all 'height' and 'weight' values
ar = dt << GetAsMatrix({4,5});
// Discard rows
ar = ar[6::35, 0];
// Find out the distinct levels of 'height'
hl = Matrix(AssociativeArray(ar[0,1]) << getKeys);
// Build the average 'weight' for each level into a new matrix 'rm'
rm = J(NRow(hl), 2, .);
for(h=1, h<=NRow(hl), h++,
	rm[h, 1] = hl[h];
	rm[h, 2] = VMean(ar[Loc(ar[0, 1] == hl[h]), 2]);
	);
Print(rm);
lwx228
Level VIII

Re: How to use JSL to classify and summarize the data in the middle of the matrix branch, and get the summary results in the form of matrix?

Thank ian! This is the way this cycle works.

 

ar=dt<<GetAsMatrix({4,5});ar=ar[6::35,0];
hl=Matrix(AssociativeArray(ar[0,1])<<getKeys);rm=J(NRow(hl),2,.);
for(h=2,h<=NRow(hl)-1,h++,
	rm[h,1]=hl[h];
	rm[h,2]=Vsum(ar[Loc(ar[0,1]==hl[h-1]),2])+Vsum(ar[Loc(ar[0,1]==hl[h]),2])+Vsum(ar[Loc(ar[0,1]==hl[h+1]),2]);
	);
r=rm[contains(rm[0,2],max(rm[0,2])),1];

Re: How to use JSL to classify and summarize the data in the middle of the matrix branch, and get the summary results in the form of matrix?

Here is an approach that avoids looping, although it creates invisible tables. If the performance is adequate for your needs, this approach is simple to use and to understand.

 

Cheers,

Brady

dt = open("$Sample_Data/Big Class.jmp");

dtSub = dt << subset(rows(6::35), invisible);

dtsum = dtsub << summary(group(:height), mean(weight), invisible);

mat = dtSum << get as Matrix;

close(dtSum, nosave);
close(dtSub, nosave);

show(mat);

Re: How to use JSL to classify and summarize the data in the middle of the matrix branch, and get the summary results in the form of matrix?

EDIT:

I noticed a typo in the original version of the code below, which I have corrected. In the last line, the typo was:

 

nrow (s)

 

Instead, use:

 

nrow (h)

 

Of course, nrow (n) or nrow(m) would also work since each of the vectors h, n and m have the same number of rows.

 

Sorry about that.

 

Cheers,

Brady

 

 

/////////////////////////////Original post:

 

In looking into the performance of the table-based approach, it is much slower than matrix-based approaches, which might offset its simplicity, depending on your table.

 

Here is another matrix-based approach; while it performs similarly to the one Ian posted, I include it for a few reasons: a) to introduce the idea of a "selected" column, which flags 1 if a row is selected and 0 otherwise, b) to show a "hack" of the summarize function and c) because this avoids looping and the potential indexing pitfalls associated therewith.

 

First: the "Selected" column. In the first line of the code below, we add a new column to the table, which sets itself to 1 in selected rows, and 0 elsewhere. This allows us to create an additional grouping column, with the goal of using table-based platforms and functions like Summary and Summarize, without having to create new tables.

 

Next, the Summarize () hack: note that we group by :selected and :height. While the h, n and m values are returned in matrices, the by variable(s) values are placed in lists of strings... even if they COULD be placed into matrices. So in this case, b is a list of two lists of strings. Thus, if we want any of these values in matrix form, we can either convert the desired list of strings into a matrix (which is expensive if the list is large) or we can compute a "cheap" statistic, like Min or Max, on the very same column, to get those values... which is what we do below for the height column.

 

Cheers,

Brady

 

 

dt << New Column( "Selected", formula( Selected( Row State() ) ) );

dt << Select Rows( 6 :: 35 );

Summarize( dt, b = by( :Selected, :height ), h = max( :height), n = Count( :weight ), m = Mean( :weight ) ); 

sumMat = (h || n || m)[contains(b[1], "1"):: nrow(h),0];  //only use the results where selected == 1