Hi all,
I'm writing a jsl script to find the repeated string entries (i.e. rows) in a column. The column is basically listing different tests that usually is repeated.
What I'm trying to find is that if I can find the number of times those rows have been repeated, select the top N repeated entries, and then insert those selected entries into a list to be analyzed by graph builder or tabulate.
Any help is appreciated.
Here is how I would approach this
names default to here(1);
dt=current data table();
// Get each different string value and the number of repeated strings
summarize(dt, byGroup = by(:sex), number = count(:height));
show( byGroup, number );
// To find which rows that have the same values
theRows = dt << get rows where( :sex == byGroup[1] );
show(theRows);
You can use Summarize to find count of items and then to find top 5 you can for example use Reverse(Rank()) on the count and use those found indices on your groups
Names Default To Here(1);
dt = Open("$SAMPLE_DATA/Big Class.jmp");
// Get each different string value and the number of repeated strings
Summarize(dt, uniq = by(:age), count = Count(:height)); // some column which is numeric to Count
r = Reverse(Rank(count));
top5 = Try(uniq[r][1::5], uniq[r]);
show(uniq, count, top5);
Here is how I would approach this
names default to here(1);
dt=current data table();
// Get each different string value and the number of repeated strings
summarize(dt, byGroup = by(:sex), number = count(:height));
show( byGroup, number );
// To find which rows that have the same values
theRows = dt << get rows where( :sex == byGroup[1] );
show(theRows);
thank you @txnelson for your solution. This works perfectly in find the duplicate columns and how many times they were duplicated. The issue I'm having now is that I'm trying to find the top N (ex: N =5) repeated strings and throw them in a list for further evaluation. I was only being able to find the top item in the list.
My question, is there are a way to find the top N items where N can be any integer? Thanks!
I have modified the code as seen below:
names default to here(1);
dt=current data table();
// Get each different string value and the number of repeated strings
summarize(dt, byGroup = by(:Temp_), number = count(:Site));
list = {};
list[1] = 0;
// To find which rows that have the same values
for(i=2, i <N Items(byGroup),i++,
list[i] = N Items ( theRows = dt << get rows where( :Temp_ == byGroup[i] ) ) ; // creating a list of how many times those items were repeated.
//numoffail = N items(list);
buf = max(list); // finding how many times the top repeated items were repeated.
highest =byGroup[loc(list,buf)]; // match the number of repetition to the repeated string to find the most repeated
);
Could you provide example (and solution to the example) so it is easier to understand what you are trying to achieve as at least three different cases comes to my mind.
Hi @jthi.
An example of that would be Sex column in big class jmp data. There is F and M in the column, and it is repeated multiple times. I would like to write a jsl script that counts how many times F or M was repeated. The code that @txnelson provided works perfectly in this case. Does this make more sense ? Thanks
You can use Summarize to find count of items and then to find top 5 you can for example use Reverse(Rank()) on the count and use those found indices on your groups
Names Default To Here(1);
dt = Open("$SAMPLE_DATA/Big Class.jmp");
// Get each different string value and the number of repeated strings
Summarize(dt, uniq = by(:age), count = Count(:height)); // some column which is numeric to Count
r = Reverse(Rank(count));
top5 = Try(uniq[r][1::5], uniq[r]);
show(uniq, count, top5);