Hi,
I have 256k rows in a parent table right now. While doing some analysis, I found that it contains a few duplicate rows. This table constantly gets new data appended by a script, and there is a chance the source data will contain duplicates again.
So I am trying to create a cleanup script that I can run before executing some of my analysis. I searched other discussions here and found a solution (https://community.jmp.com/t5/Discussions/Eliminating-Duplicate-Rows-keeping-first-duplicate/td-p/342...), which I customized for my application:
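// Parent table that contains the duplicates
dt3 = Current Data Table();
// Summary table with one row per unique combination of the key columns
dt2 = dt3 << Summary(
	Group( :CL1, :CL2, :CL3, :CL4 )
);
// Join back to the summary, keeping only the first match for each key.
// Note: Join returns a new table; it does not modify dt3.
dt3 << Join(
	With( dt2 ),
	Update,
	By Matching Columns(
		:CL1 = :CL1,
		:CL2 = :CL2,
		:CL3 = :CL3,
		:CL4 = :CL4
	),
	Drop Multiples( 1, 0 ),
	Name( "Include non-matches" )( 0, 0 ),
	Preserve Main Table Order( 1 )
);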
This solution did what I was looking for. However, it does not update the parent table. It creates a new table with the right rows, but my parent table is unchanged. How can I use this code to delete the duplicate rows from the parent table itself, which in this case is Current Data Table()?
Also (I think I already know, but wanted to confirm): I have 25 columns in my data set. Should I run the above script matching on all 25 columns to be accurate, or, as long as I have the time column, can I match on fewer columns (like the 4 in the example above)? I don't want to slow the script down, as I will have a massive amount of data in the JMP file.
Thanks in advance!