Re: Formula column to obtain unique values using grouping columns

Report Inappropriate Content

What would be a simple column formula to obtain unique values with large datasets (100k rows)? Is this possible?

For example, looking at Cars.jmp. The distribution platform gives already the number of levels (unique elements), but I couldn't find a formula column able to calculate these.

Make column --> 37 levels (different manufacturers).

This is often called cardinality, distinct, or number of unique values.

The formula should be able to use group columns so we will obtain, for example, the number of models per manufacturer.

Where we have Toyota, the value will be 11 models. Mitsubishi 6 models, etc.

This is also possible under JMP Query Builder, for example.

SELECT t1.Make, COUNT(DISTINCT t1.Model) AS "Count DISTINCT-Model"
FROM Cars t1
GROUP BY t1.Make;

Names Default To Here( 1 );
dt = Open( "$SAMPLE_DATA/Cars.jmp" );

Distribution(
	Nominal Distribution( Column( :Make ) ),
	Nominal Distribution( Column( :Model ) ),
	Local Data Filter(
		Add Filter(
			columns( :Make ),
			Where( :Make == "Toyota" ),
			Display( :Make, N Items( 15 ), Find( Set Text( "" ) ) )
		)
	),
	SendToReport(
		Dispatch( {"Make"}, "Frequencies", OutlineBox, {Close( 1 )} ),
		Dispatch( {"Model"}, "Distrib Nom Hist", FrameBox,
			{Frame Size( 110, 178 )}
		)
	)
);

txnelson · | Posted in reply to message from FN 03-12-2025

Here is my crazy solution

As Constant(
	t1 = Current Data Table();
	Query(
		t1,
		"
		SELECT t1.Make, COUNT(DISTINCT t1.Model) AS 'Count DISTINCT-Model'
FROM Cars t1
GROUP BY t1.Make;"
	);
	newdt = Current Data Table();
	makeList = newdt:Make << get values;
	countMat = Column( newdt, 2 ) << get values;
	Close( newdt, nosave );
);
countMat[Contains( makeList, t1:make )];

Jim

hogi · | Posted in reply to message from txnelson 03-12-2025

will come soon:
Col N Categories - and all the others ...

jthi · | Posted in reply to message from FN 03-12-2025

I think this might provide correct result

Col Sum(Row() == Col Min(Row(), :Model, :Make), :Make)

But currently I would most likely go with Summary -> update and not use a formula

Names Default To Here(1);

dt = Open("$SAMPLE_DATA/Cars.jmp");


// Formula option
newcol = dt << New Column("R", Numeric, Ordinal, Formula(
	Col Sum(Row() == Col Min(Row(), :Model, :Make), :Make)
));


// Summary + update
dt_summary = dt << Summary(
	Group(:Make),
	N Categories(:Model),
	Freq("None"),
	Weight("None"),
	statistics column name format("stat of column"),
	private
);


// Update data table
dt << Update(
	With(dt_summary),
	Match Columns(:Make = :Make),
	Add Columns from Update Table(:N Categories of Model),
	Replace Columns in Main Table(:N Categories of Model)
);

Close(dt_summary, no save);

-Jarmo