lukasz
Level IV

Removing duplicates from data table in JMP 15

Hello Everybody,

I have a script in which I want to remove duplicate rows from a table based on a specified column. The following piece of code removes the duplicates, and it works well in JMP 14.

dt << Select duplicate rows( Match( :name ) );
selected = dt << get selected rows;
if (n rows(selected) > 0,
	dt << delete rows;	
);

In JMP 15, however, that does not seem to work. The rows are selected, but they are not removed afterwards. I tried using "current data table ()" in several places, but it did not help. I would appreciate any hints on how to get this part working. Am I missing something? Best regards.

9 REPLIES
txnelson
Super User

Re: Removing duplicates from data table in JMP 15

Here is an example of your code working without error and as expected:

names default to here(1);
dt=open("$SAMPLE_DATA/big class.jmp");

dt << Select duplicate rows( Match( :name ) );
selected = dt << get selected rows;
if (n rows(selected) > 0,
	dt << delete rows;	
);
Jim
lukasz
Level IV

Re: Removing duplicates from data table in JMP 15

Thank you for your reply!

Yes, that works well. However, I have noticed that the data table from which the duplicates need to be removed is created as a subset of a bigger table. When the subset is made, the additional option "linked" is used. It is helpful because the script creates several graphs, and selecting data on a graph associated with the bigger table also highlights the same data on the graph from the subset table. That works well in JMP 14.

With the "linked" option in JMP 15, the removal of duplicates does not work. Without it, removal works as expected, but the functionality of the application is noticeably limited. Are there any additional options or commands that need to be considered to get this working?

dt = dt_original << Subset(invisible, linked, Selected Rows( 0 ), selected columns( 1 ), Output Table Name( "Table"));

 

Byron_JMP
Staff

Re: Removing duplicates from data table in JMP 15

This looks a little ratchety, but it works well and it's pretty fast too.

 

Try joining a table to itself, and drop multiples for both tables.

Match the columns in both tables that create the imperfect unique identifier for duplicates.

 

Here's an example:

Data Table( "Big Class.jmp" ) << Join(
	With( Data Table( "Big Class.jmp" ) ),
	By Matching Columns( :name = :name ),
	Drop multiples( 1, 1 ),
	Include Nonmatches( 0, 0 ),
	Preserve main table order( 1 )
)

This approach is menu driven. Just get the source script from the deduped table after you run the join.

 

JMP Systems Engineer, Health and Life Sciences (Pharma)
Thomas1
Level V

Re: Removing duplicates from data table in JMP 15

This JSL code works fine for me:

Names Default To Here(1);
 
dt = Current Data Table();
 
// Clear Data Table
dt << Clear Column Selection();
dt << Clear Select();
 
//Update Formula Columns
dt << Rerun Formulas;
wait(0);
 
// Delete multiple rows
dt << Select Duplicate Rows();
dt << Delete Rows();
 
lukasz
Level IV

Re: Removing duplicates from data table in JMP 15

Thank you for your suggestions! I have tried all of the solutions, but in every case the removal of duplicates only works if the subset is created from the original table without the "linked" or "LinkToOriginalDataTable( 1 )" option. Joining the table with itself and removing duplicates worked well, but then no link to the original table is created. Maybe virtual joining will help; I will try it. I would appreciate any further hints and suggestions.

Thomas1
Level V

Re: Removing duplicates from data table in JMP 15

I see two options:
1. Unlink the data table, i.e. use LinkToOriginalDataTable( 0 ).
2. Delete the duplicate rows in the source data table.
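
A minimal sketch of option 1, assuming the Big Class sample table stands in for the original table and :name for the key column (both are placeholders, not taken from the original script). It only changes the subset so that it is created unlinked, then reuses the duplicate-removal code from the question. Selections will no longer be shared between the two tables.

```jsl
Names Default To Here( 1 );

// Placeholder for the original (bigger) table; an assumption for illustration.
dt_original = Open( "$SAMPLE_DATA/Big Class.jmp" );

// Same Subset call as in the question, but created unlinked.
dt = dt_original << Subset(
	Selected Rows( 0 ),
	Selected Columns( 0 ),
	Link to original data table( 0 ),
	Output Table Name( "Table" )
);

// Duplicate removal exactly as in the original post.
dt << Select Duplicate Rows( Match( :name ) );
selected = dt << Get Selected Rows;
If( N Rows( selected ) > 0,
	dt << Delete Rows
);
```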
lukasz
Level IV

Re: Removing duplicates from data table in JMP 15

Thank you for the options! However, I cannot remove duplicates from the original table, because it is used to create one kind of graph for which all data are required (even duplicates). A second graph also needs to be created, based on the subset of the original table, and duplicates are not wanted in that one. Because both graphs are displayed in the same window, the "linked" option used when creating the subset allows data points to be highlighted in both graphs when data are selected in either one, so it would be good to keep that behavior as well. I don't know why removing duplicates does not work in JMP 15 after the subset has been created with the "linked" option. It may be that I am still missing something. Anyway, I will try some other options, and I apologize if my previous descriptions were not clear enough.

Byron_JMP
Staff

Re: Removing duplicates from data table in JMP 15

Try creating your de-duped table and then, since you have unique rows in that table, do a virtual join back to the first table. Not a link. The tables will still have the same behavior.

Another benefit: the de-duped table only needs to contain the columns needed for the graph that points to it, not the entire table, so it will take less memory.

JMP Systems Engineer, Health and Life Sciences (Pharma)
lukasz
Level IV

Re: Removing duplicates from data table in JMP 15

Hello Byron_JMP,

Thank you for your reply. I think I have set up the virtual join between the tables, but I still cannot get rows selected in one table when data in the reference column of the other table are selected.

//dt_run - table with duplicates
//dt_run_exp - table without duplicates

dt_run_exp << Select duplicate rows( Match( dt_run_exp:DMC ) );
del_sel = dt_run_exp << get selected rows;
		
if(n rows(del_sel) > 0,
	dt_run_exp << delete rows;			
);
		
dt_run_exp:DMC << Set Property( "Link ID", 1 );
dt_run:DMC << Set Property( "Link Reference", Reference Table( dt_run_exp ) );

I will try to apply other options later. Any further hints or suggestions are welcome. If additional explanation is needed, feel free to ask.

Thank you all for your time and best regards.