Solved: Re: Evaluate duplicate data on same column

UserID16644 · Jun 10, 2023 4:46 PM

Hi all,

I have these duplicate data where I need to evaluate them as one. Is it possible to evaluate data as one on the same column? I have this example. If column1 (same data value) has different column2 values, column3 result should be Outlier and be counted as one and not in duplicate values like column1. But if their column2 has the same "Good" values, column3 value will be "good".

Is this evaluation possible?

Column 1 Column 2 Column 3
5631823522 Good Outlier
5631823522 Outlier
563180572E Good Good
563180572E Good
5631826915 Outlier Outlier
5631826915 Good

txnelson · Mar 30, 2022 12:28 AM

Here is one way of handling this. The below is a formula for Column 3

If( Row() == 1 | :Column 1 != Lag( :Column 1 ),
	theColumn = :Column 1;
	theRows = Current Data Table() << get rows where( :Column 1 == theColumn );
	x = "Good";
	For( i = 1, i <= N Rows( theRows ), i++,
		If( :Column 2[theRows[i]] == "Outlier",
			x = "Outlier";
			Break();
		)
	);
	x;
)

Jim

View solution in original post

jthi · Mar 30, 2022 01:22 AM

Many ways to handle this. Writing one formula like @txnelson suggested, or you could include a few more steps and have a bit simpler formula with extra step

1. Recode Column 2 in such a way that Good is 0 and Outlier 1 (can use JMPs Recode for this), make sure new column is numeric

2. Use formula which will calculate sum (or max value) of the data in the new column (this could be also done with Summary and join or many other ways)

If(ColSum(:"Column 22"n, :"Column 1"n) >= 1,
	"Outlier",
	"Good"
)

3. Then depending on what you want to do with your data, you could remove duplicates or just the values. Select columns Column 1 and Column 3 and then use JMPs select duplicate rows

4. From here you could delete the rows

5. Or if you want to keep the duplicated values but just clear the Column 3 values. While holding alt press on Column3 header so you have only Column3 selected

and press delete

-Jarmo

View solution in original post

txnelson · Mar 30, 2022 12:28 AM

Here is one way of handling this. The below is a formula for Column 3

If( Row() == 1 | :Column 1 != Lag( :Column 1 ),
	theColumn = :Column 1;
	theRows = Current Data Table() << get rows where( :Column 1 == theColumn );
	x = "Good";
	For( i = 1, i <= N Rows( theRows ), i++,
		If( :Column 2[theRows[i]] == "Outlier",
			x = "Outlier";
			Break();
		)
	);
	x;
)

Jim

jthi · Mar 30, 2022 01:22 AM

Many ways to handle this. Writing one formula like @txnelson suggested, or you could include a few more steps and have a bit simpler formula with extra step

1. Recode Column 2 in such a way that Good is 0 and Outlier 1 (can use JMPs Recode for this), make sure new column is numeric

2. Use formula which will calculate sum (or max value) of the data in the new column (this could be also done with Summary and join or many other ways)

If(ColSum(:"Column 22"n, :"Column 1"n) >= 1,
	"Outlier",
	"Good"
)

3. Then depending on what you want to do with your data, you could remove duplicates or just the values. Select columns Column 1 and Column 3 and then use JMPs select duplicate rows

4. From here you could delete the rows

5. Or if you want to keep the duplicated values but just clear the Column 3 values. While holding alt press on Column3 header so you have only Column3 selected

and press delete

-Jarmo

Georg · Mar 30, 2022 02:43 AM

I would use summary for this,

put Column 1 into Group to ensure single occurrence,

and put Column 2 into Subgroup, to be able to see what has been in Column 2.

Than you can take e.g. "N(Outlier)" to see, whether there as been an outlier, or whatever you need.

Summarize Your Data (jmp.com)

Georg