cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Check out the JMP® Marketplace featured Capability Explorer add-in
Choose Language Hide Translation Bar
RA899
Level III

Selecting duplicated row entries, save their index, and then create a new list that doesn't contain the duplicated rows.

Hi. 

This is a follow up to the question in this link: Solved: get values from select where - JMP User Community. At the ends of this discussion( in the link provided), there a way a jsl that select certain rows  and print them using show() function. In another post( I lost the link to it), there was a method to select duplicated rows using the Select Duplicated Row() function and then delete them. 

 

My objective is to select duplicated rows, and then get their index for a selected column in the table. For ex, I want to find the duplicate entries for the age column in the big class table.

The second objective is to create a new column that has only the unique rows without deleting the duplicated row. 

 

Any guidance is appreciated. 

 

 dt_limit = Open( "$SAMPLE_DATA/Big Class.jmp" );
  dt_limit:age << Select Duplicate Rows( Match (:age, :age   ) );   
 Group1List = (dt_limit:age[dt_limit << get selected rows]);
 theCount = N Rows( dt << get selected rows);

Show( Group1List);
2 ACCEPTED SOLUTIONS

Accepted Solutions

Re: Selecting duplicated row entries, save their index, and then create a new list that doesn't contain the duplicated rows.

Is something like this what you're looking for? There may be more elegant ways to accomplish it, but here is one way to create a new column with just the first unique row.

dt_limit = Open( "$SAMPLE_DATA/Big Class.jmp" );
 
dt_limit << Select Duplicate Rows( Match (:age ) );   
Group1List = (dt_limit << get selected rows);
theCount = N Items (Group1List);

//create a new column that has only the unique rows without deleting the duplicated row
dt_limit << Invert Row Selection;
UniqueList = dt_limit << Get Selected Rows;
dt_limit << new column ("Unique");
for each ({i}, UniqueList,
	dt_limit:Unique[i] = dt_limit:age[i]
);

View solution in original post

Re: Selecting duplicated row entries, save their index, and then create a new list that doesn't contain the duplicated rows.

This line added at the end of your script should get that column into a list:

future_list = as list (dt_limit:Unique << Get Values);

View solution in original post

9 REPLIES 9

Re: Selecting duplicated row entries, save their index, and then create a new list that doesn't contain the duplicated rows.

Is something like this what you're looking for? There may be more elegant ways to accomplish it, but here is one way to create a new column with just the first unique row.

dt_limit = Open( "$SAMPLE_DATA/Big Class.jmp" );
 
dt_limit << Select Duplicate Rows( Match (:age ) );   
Group1List = (dt_limit << get selected rows);
theCount = N Items (Group1List);

//create a new column that has only the unique rows without deleting the duplicated row
dt_limit << Invert Row Selection;
UniqueList = dt_limit << Get Selected Rows;
dt_limit << new column ("Unique");
for each ({i}, UniqueList,
	dt_limit:Unique[i] = dt_limit:age[i]
);
RA899
Level III

Re: Selecting duplicated row entries, save their index, and then create a new list that doesn't contain the duplicated rows.

Thank you @Jed_Campbell . This is good thank you very much ! 

 

I'm trying to capture that column into a list for further actions. Would you be able to help me with that? thanks! 

Re: Selecting duplicated row entries, save their index, and then create a new list that doesn't contain the duplicated rows.

This line added at the end of your script should get that column into a list:

future_list = as list (dt_limit:Unique << Get Values);
RA899
Level III

Re: Selecting duplicated row entries, save their index, and then create a new list that doesn't contain the duplicated rows.

Thank you @Jed_Campbell . It works perfectly fine. 

 

Another issue that I'm running into at the moment is that what if instead of having one column i.e. age, I would have a list of columns i.e. [age, height, weight, etc]. How would the code change? I tried using a combination of columns and eval to fetch the list entry as a column. I know the code below does work, but how would I modify it to work for any list of columns. Would having something like As column (ListOfTargetCol[i])[i] ); OR :(ListOfTargetCol[i])[i] ); means that jmp will fetch the ith entry of the list and then it fetches that column( the column) ith entry ? I'm new to jsl and not sure how its structured. Please advising how to overcome this issue. 

 

For reference, I'm using a similar code to the one you provided above and tried adding something like this: I'm also curious if it will be any different if it is a list of names of a string like the ones below. Thanks a lot!  

ListOfTargetCol = { age, height, weight};
StringOfTargetCol = { "age", "height", "weight"}; 


... dt_limit:Unique[i] = dt_limit:( As coulmn(ListOfTargetCol[1])) [i]... 


 

jthi
Super User

Re: Selecting duplicated row entries, save their index, and then create a new list that doesn't contain the duplicated rows.

If you don't need the selections, you can also use formula / summary (or many other options)

Names Default To Here(1);

dt = Open("$SAMPLE_DATA/Big Class.jmp");

// Option1
new_col = dt << New Column("R1", Numeric, Continuous, Formula(
	If(Row() == Col Min(Row(), :age), :age, .)
));
dt << Run Formulas;
new_col << Delete Formula;


// Option2
dt << New Column("Row", Numeric, "Continuous", Formula(Row()));
dt_summary = dt << Summary(
	Group(:age),
	Min(:Row),
	Freq("None"),
	Weight("None"),
	statistics column name format("column"),
	Link to original data table(0)
);
dt_summary << Delete Columns("N Rows");
Column(dt_summary, 1) << Set Name("R2");

dt << Update(With(dt_summary), Match Columns(:Row = :Row));
Close(dt_summary, no save);

jthi_0-1707591070272.png

 

-Jarmo
RA899
Level III

Re: Selecting duplicated row entries, save their index, and then create a new list that doesn't contain the duplicated rows.

Thank you @jthi for your help! Both solution works well, but I'm still confused about the syntax that I should use. 

 

In the loop below, we have the dt:age[i] which iterates through the "age" column in the dt_limit loop. 
What if I want to have more than one column to iterate through.

The only solution I can find (and works with no error) is to copy past the loop for
all the needed columns i.e. age, height, ..etc. It is just not efficient and will not work for a large number of columns.

Any suggestions would be appreciated.
for each ({i}, UniqueList, dt_limit:Unique[i] = dt_limit:age[i] );
jthi
Super User

Re: Selecting duplicated row entries, save their index, and then create a new list that doesn't contain the duplicated rows.

I'm not exactly sure what you are trying to do. You can add more columns to Col Min as byvars

Names Default To Here(1);

dt = Open("$SAMPLE_DATA/Big Class.jmp");

// Option1
new_col = dt << New Column("R1", Numeric, Continuous, Formula(
	If(Row() == Col Min(Row(), :age, :sex), :age, .)
));
dt << Run Formulas;
new_col << Delete Formula;

jthi_0-1707761902077.png

Col Minimum(name, <By var, ...>) 

-Jarmo
RA899
Level III

Re: Selecting duplicated row entries, save their index, and then create a new list that doesn't contain the duplicated rows.

Thank @jthi . True. I can just add more columns as you suggested above. However, I'm trying to generalize the solution. What I'm trying to do is to take the user input for which columns to evaluate. Place them in a list. And then, evaluate all of them.  For example, the user can choose age, and height only. I need the code to be as flexible as possible. 

Names Default To Here(1);

dt = Open("$SAMPLE_DATA/Big Class.jmp");

// Option1

TargetList = {age, sex, height};
new_col = dt << New Column("R1", Numeric, Continuous, Formula(
	If(Row() == Col Min(Row(), :age, :sex), :age, .)
));
dt << Run Formulas;
new_col << Delete Formula;
jthi
Super User

Re: Selecting duplicated row entries, save their index, and then create a new list that doesn't contain the duplicated rows.

You can build the formula with expressions

Names Default To Here(1);

dt = Open("$SAMPLE_DATA/Big Class.jmp");

sel_cols = {"age", "sex", "height"};

colmin_expr = Expr(Col Min(Row()));
For Each({sel_col}, sel_cols,
	Insert Into(colmin_expr, Name Expr(AsColumn(dt, sel_col)))
);

Eval(Substitute(
	Expr(new_col = dt << New Column("R1", Numeric, Continuous, Formula(
		If(Row() == _colminformula_, :age, .)
	))),
	Expr(_colminformula_), Name Expr(colmin_expr)
));

dt << Run Formulas;
new_col << Delete Formula;

You should also be able to fairly easily modify the Summary version to work with inputs from a list

-Jarmo