cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Choose Language Hide Translation Bar
UserID16644
Level V

Evaluate duplicate data on same column

Hi all,

I have these duplicate data where I need to evaluate them as one. Is it possible to evaluate data as one on the same column? I have this example. If column1 (same data value) has different column2 values, column3 result should be Outlier and be counted as one and not in duplicate values like column1. But if their column2 has the same "Good" values, column3 value will be "good". 

Is this evaluation possible? 

Column 1      Column 2     Column 3
5631823522     Good          Outlier
5631823522     Outlier        
563180572E    Good          Good
563180572E    Good          
5631826915    Outlier        Outlier
5631826915    Good          

2 ACCEPTED SOLUTIONS

Accepted Solutions
txnelson
Super User

Re: Evaluate duplicate data on same column

Here is one way of handling this.  The below is a formula  for Column 3

If( Row() == 1 | :Column 1 != Lag( :Column 1 ),
	theColumn = :Column 1;
	theRows = Current Data Table() << get rows where( :Column 1 == theColumn );
	x = "Good";
	For( i = 1, i <= N Rows( theRows ), i++,
		If( :Column 2[theRows[i]] == "Outlier",
			x = "Outlier";
			Break();
		)
	);
	x;
)
Jim

View solution in original post

jthi
Super User

Re: Evaluate duplicate data on same column

Many ways to handle this. Writing one formula like @txnelson suggested, or you could include a few more steps and have a bit simpler formula with extra step

1. Recode Column 2 in such a way that Good is 0 and Outlier 1 (can use JMPs Recode for this), make sure new column is numeric

jthi_0-1648617375012.png

2. Use formula which will calculate sum (or max value) of the data in the new column (this could be also done with Summary and join or many other ways)

If(ColSum(:"Column 22"n, :"Column 1"n) >= 1,
	"Outlier",
	"Good"
)

jthi_1-1648617408616.png

 

3. Then depending on what you want to do with your data, you could remove duplicates or just the values. Select columns Column 1 and Column 3 and then use JMPs select duplicate rows

jthi_2-1648617467404.png

4. From here you could delete the rows

jthi_3-1648617493906.png

5. Or if you want to keep the duplicated values but just clear the Column 3 values. While holding alt press on Column3 header so you have only Column3 selected

jthi_4-1648617616183.png

and press delete

jthi_5-1648617637796.png

 

-Jarmo

View solution in original post

3 REPLIES 3
txnelson
Super User

Re: Evaluate duplicate data on same column

Here is one way of handling this.  The below is a formula  for Column 3

If( Row() == 1 | :Column 1 != Lag( :Column 1 ),
	theColumn = :Column 1;
	theRows = Current Data Table() << get rows where( :Column 1 == theColumn );
	x = "Good";
	For( i = 1, i <= N Rows( theRows ), i++,
		If( :Column 2[theRows[i]] == "Outlier",
			x = "Outlier";
			Break();
		)
	);
	x;
)
Jim
jthi
Super User

Re: Evaluate duplicate data on same column

Many ways to handle this. Writing one formula like @txnelson suggested, or you could include a few more steps and have a bit simpler formula with extra step

1. Recode Column 2 in such a way that Good is 0 and Outlier 1 (can use JMPs Recode for this), make sure new column is numeric

jthi_0-1648617375012.png

2. Use formula which will calculate sum (or max value) of the data in the new column (this could be also done with Summary and join or many other ways)

If(ColSum(:"Column 22"n, :"Column 1"n) >= 1,
	"Outlier",
	"Good"
)

jthi_1-1648617408616.png

 

3. Then depending on what you want to do with your data, you could remove duplicates or just the values. Select columns Column 1 and Column 3 and then use JMPs select duplicate rows

jthi_2-1648617467404.png

4. From here you could delete the rows

jthi_3-1648617493906.png

5. Or if you want to keep the duplicated values but just clear the Column 3 values. While holding alt press on Column3 header so you have only Column3 selected

jthi_4-1648617616183.png

and press delete

jthi_5-1648617637796.png

 

-Jarmo
Georg
Level VII

Re: Evaluate duplicate data on same column

I would use summary for this,

put Column 1 into Group to ensure single occurrence,

and put Column 2 into Subgroup, to be able to see what has been in Column 2.

Than you can take e.g. "N(Outlier)" to see, whether there as been an outlier, or whatever you need.

Summarize Your Data (jmp.com)

Georg_0-1648622488648.png

 

Georg