cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Try the Materials Informatics Toolkit, which is designed to easily handle SMILES data. This and other helpful add-ins are available in the JMP® Marketplace
Choose Language Hide Translation Bar
UserID16644
Level V

How to determine duplicate values?

Hi all,

I have this piece of code but I'm quite not sure if there is a more appropriate approach to this. My loop gets an error because some of the data table have multiple NBO values.

 

Data out of range at row 10 in access or evaluation of 'For Each' , For Each/*###*/({table}, dt_Test, table << New Column( "Ref" ) ; Wait( 0 ) ; table
:Ref[table << get rows where( :Lot == "NBO" & :N Rows == Col Max( :N Rows, :Lot ) )]
= 1)

So what I am trying to do is create an If statement inside the For Each loop. The IF statement will determine if NBO is has no duplicates on the Lot Column, if it has no duplicates, then it will put 1 on Ref column.

For Each( {table}, dt_Test,
	table << New Column("Ref");
	wait(0);
		If (
		//NBO has duplicate, deletes the NBO with lowest N Row value
		dt_Test << Select duplicate rows(Match(:Lot)); //only selects the matching values, how can I delete the NBO with lowest N Rows count?
		dt_Test << Delete Rows;
		,
		//if NBO has no duplicates,
		table:Ref[table << get rows where( :Lot == "NBO" & :N Rows == col max(:N Rows, :Lot))]=1;	
		);
	
);

 

 

 

7 REPLIES 7
jthi
Super User

Re: How to determine duplicate values and delete the data with lowest N rows value?

Would make this much easier if we could see some of the data. 

-Jarmo
UserID16644
Level V

Re: How to determine duplicate values and delete the data with lowest N rows value?

Here is a sample data table after subset. dt has 2 NBOs and this works fine.

table:Ref[table << get rows where( :Lot == "NBO" & :N Rows == col max(:N Rows, :Lot))]=1;	

 However, in dt2, which I have 1 NBO, the script suddenly stops and has this error:

Data out of range at row 10 in access or evaluation of 'For Each' , For Each/*###*/({table}, dt_Test, table << New Column( "Ref" ) ; Wait( 0 ) ; table
:Ref[table << get rows where( :Lot == "NBO" & :N Rows == Col Max( :N Rows, :Lot ) )]
= 1)

jthi
Super User

Re: How to determine duplicate values and delete the data with lowest N rows value?

I'm a bit confused with your script and what you are trying to do. Are you trying to find duplicated lots and delete all but last row? And then set NBO as 1 (there should be only one left)?

-Jarmo
UserID16644
Level V

Re: How to determine duplicate values and delete the data with lowest N rows value?

Sorry, I also got confused writing the question. So for the data, first I'll subset it then go through each data table to add column Ref and its value as 1 where NBO has the highest count in N Rows. I initially got this working:

For Each( {table}, dt_Test,
	table << New Column("Ref");
	wait(0);
		table:Ref[table << get rows where( :Lot == "NBO" & :N Rows == col max(:N Rows, :Lot))]=1;	
);

But when it looped to a data table with only 1 NBO present in the column, I am having this error:

Data out of range at row 10 in access or evaluation of 'For Each' , For Each/*###*/({table}, dt_Test, table << New Column( "Ref" ) ; Wait( 0 ) ; table
:Ref[table << get rows where( :Lot == "NBO" & :N Rows == Col Max( :N Rows, :Lot ) )]
= 1)

jthi
Super User

Re: How to determine duplicate values and delete the data with lowest N rows value?

Maybe the issue is somewhere else? What does dt_test contain for example? And dt_test and table are used in a bit weird manner (also the if might not be doing what you think here)

jthi_0-1723008788020.png

 

I would modify your code a bit to make it easier to debug (don't place get rows where inside the [])

Names Default To Here(1);

dt = Open("$DOWNLOADS/dt.jmp");
dt2 = Open("$DOWNLOADS/dt2.jmp");

dts = Eval List({dt, dt2});

For Each({cur_dt}, dts,
	cur_dt << New Column("Ref");
	//cur_dt = dts[2];
	sel_rows = cur_dt << get rows where(:Lot == "NBO" & :N Rows == Col Max(:N Rows, :Lot));
	cur_dt:Ref[sel_rows] = 1;
);

This way you can print the sel_rows and see if it is empty / what it has

-Jarmo
Byron_JMP
Staff

Re: How to determine duplicate values?

I might suggest a completely different approach.

1. sort the table by date, or what ever tells you what the most current duplicate is.

2. join the table to its self (Tables>Join), and click the drop multiples option

-Only the first, by row order, of the duplicates ends up in the product table.

 

do this manually once with the workflow builder turned on, and then Bang - Bob's your uncle - you've got the script to do it again.    scripting... without scripting.

JMP Systems Engineer, Health and Life Sciences (Pharma)
hogi
Level XII

Re: How to determine duplicate values?

nice trick