Re: Add Counter for Unique Cases in Groups

StumptownSlash · May 9, 2023 9:51 PM

Hi all,

How can I write an integer value counter to increment unique values in column B within group A, to do this?

Here is what I have tried so far:

dt_Table1 = Data Table("Table1");
dt_Table1 << New Column("Dataset", Numeric, "Nominal", Set Selected);

Sum1 = dt_Table1 << Summary(Group(:Group, :Date));
Sum2 = Sum1 << Summary(Group(:Group));
:N Rows << Set Name("N Datasets in Group");

dt_Table1 << Update(
	with(Data Table(Sum2)),
	Merge Same Name Columns
Match Flag(0),
	Suppress foruma evaluation(0),
	By Matching Columns(:Group = :Group),
	Drop multiples(1, 0),
	Include Nonmatches(0, 0),
	Preserve main table order(1)
);

dt = dt_Table1;
For(i = 1, i <= :N Rows(dt), i++,
	While(Group_Number = GetValue(:Group[i]),
		N == Get Value(dt, :N Datasets in Group[i]),
		For(j = 1, j <= N, j++,
			:Dataset << Set Value == (j),

		),
		Group_Number = Group_Number + 1,

	),

);

Outputs:

jthi · May 10, 2023 01:02 AM

You should be able to do this with some combination of Col Rank(), Col Cumulative Sum, Col Min() and so on functions. Based on your example, Col Rank() or Col Cumulative Sum(1, <byvars>) should be enough (there are no duplicate dates in any of the groups).

Column 3 is using Col Rank and Column 4 Col Cumulative Sum and Column 5 one option of a bit more complicated formula

Names Default To Here(1);

dt = New Table("Untitled",
	Add Rows(8),
	Compress File When Saved(1),
	New Column("g", Numeric, "Continuous", Format("Best", 12), Set Values([1, 1, 1, 1, 2, 2, 2, 2])),
	New Column("d", Numeric, "Continuous", Format("Best", 12), Set Values([1, 2, 3, 4, 1, 2, 2, 3])),
	New Column("r", Numeric, "Continuous", Format("Best", 12), Set Values([1, 2, 3, 4, 1, 2, 2, 3])),
	New Column("Column 3", Numeric, "Continuous", Format("Best", 12), Formula(Col Rank(:d, :g))),
	New Column("Column 4", Numeric, "Continuous", Format("Best", 12), Formula(Col Cumulative Sum(1, :g)))
);

dt << New Column("Column 5", Numeric, Continuous, Formula(
	Col Cumulative Sum(
		If(Row() == Col Min(Row(), :g, :d),
			1
		,
			.
		),
		:g
	)
));

-Jarmo

StumptownSlash · May 10, 2023 9:30 AM

Thanks Jarmo! Your way is definitely cleaner. I don't totally follow the logic, but I see it works!

Note: I found another way, that also used Col Cumulative Sum(). Slower than yours. Multiple sub-tables and a join...

Summary1 = Table1 << Summary(Group(:Group, :Date));
Summary2 = Summary1 << Summary(Group(:Group, :Date));
Summary2 << New Column( "Dataset",
Numeric,
"Nominal",
Set Selected
);
:Dataset << Formula(Col Cumulative Sum( :N Rows, :Group ) );

//Join to Table 1

Table1 << Join(
with ( Data Table( Summary2 ) ),
Select(all),
SelectWith(
:Dataset
),
By Matching Columns( :Run = :Run ),
Drop multiples( 1, 0 ),
Include Nonmatches( 0, 0 ),
Preserve main table order( 1 ),
Output Table Name( "Table2" )
);

Byron_JMP · May 15, 2023 02:49 PM

I make a new column with this formula.

If( Row() == 1,

1,

If( :ID == Lag( :ID, 1 ),

Lag( :Name( "Count of ID" ) ) + 1,1))

This assumes your "ID" column is sorted sequentially.

the Count of ID, column is the name of the new column with the above formula

JMP Systems Engineer, Health and Life Sciences (Pharma)

hogi · Oct 28, 2023 04:31 PM

Hm, seems that Formula editor is missing a Col N Categories function with a GroupBy option.

Wish is waiting for Kudos:

new JSL function: Col N Categories

txnelson · Oct 29, 2023 11:59 AM

Here is a little script that produces the number of unique values found in column B, within column A

Names Default To Here( 1 );
dt = new table("Example",
New Column("A", character),
	New Column("B", character)
);
charList = {"A","B","C","D","E","F","G","H","I","J"};
dt << add rows(25);
for each row(
	:A = char(random integer(1,5));
	:B = charList[random integer(1,10)]
);

dt << New Column( "count",
	set each value(
		As Constant( Summarize( dt, bygroup = by( :a,  ) ) );
		N Items( Loc( bygroup[1], :a ) );
	)
);

Jim

hogi · Oct 29, 2023 12:57 PM

nice workaround

(p -> b)

but the missing Col N Categories still hurts. Can one add a custom function Col N Categories (: col, <{...}>)

the optional list could be used to specify columns/subgroups used as by - like the :a in the previous example.

Who knows how in the summarize function the columns can be referenced?
Custom Function - how to reference the column

jthi · Oct 29, 2023 02:37 PM

Can one add a custom function Col N Categories (: col, <{...}>)

Not in clean and nice way Creating statistical custom (formula) function which has byVar ,Option or function to evaluate a custom column formula at once, or cache values between rows

-Jarmo

hogi · Oct 29, 2023 05:59 PM

@jthi wrote:
Not in clean and nice way

Ouch!

I wanted to add the custom Col N Categories function to have it available in the right click New Formula Column menu.

So, if it requires additional Expr() Exrepssions around the columns when calling the function, there is NO chance to get this right?

Argh ....

Really astonishing that it takes so long to add the missing Col N Categories

jthi · Oct 29, 2023 02:31 PM

Usually some weird combination of different Col something() functions can also be used (might not be the fastest due to the comparison done and if I remember correctly you shouldn't use Col functions this way even though it is one of the best ways of using them).

dt << New Column("C", Formula(
	Col Sum(Row() == Col Min(Row(), :A, :B), :A);	
));

-Jarmo