- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Formula column to obtain unique values using grouping columns
What would be a simple column formula to obtain unique values with large datasets (100k rows)? Is this possible?
For example, looking at Cars.jmp. The distribution platform gives already the number of levels (unique elements), but I couldn't find a formula column able to calculate these.
Make column --> 37 levels (different manufacturers).
This is often called cardinality, distinct, or number of unique values.
The formula should be able to use group columns so we will obtain, for example, the number of models per manufacturer.
Where we have Toyota, the value will be 11 models. Mitsubishi 6 models, etc.
This is also possible under JMP Query Builder, for example.
SELECT t1.Make, COUNT(DISTINCT t1.Model) AS "Count DISTINCT-Model"
FROM Cars t1
GROUP BY t1.Make;
Names Default To Here( 1 );
dt = Open( "$SAMPLE_DATA/Cars.jmp" );
Distribution(
Nominal Distribution( Column( :Make ) ),
Nominal Distribution( Column( :Model ) ),
Local Data Filter(
Add Filter(
columns( :Make ),
Where( :Make == "Toyota" ),
Display( :Make, N Items( 15 ), Find( Set Text( "" ) ) )
)
),
SendToReport(
Dispatch( {"Make"}, "Frequencies", OutlineBox, {Close( 1 )} ),
Dispatch( {"Model"}, "Distrib Nom Hist", FrameBox,
{Frame Size( 110, 178 )}
)
)
);
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Formula column to obtain unique values using grouping columns
Here is my crazy solution
As Constant(
t1 = Current Data Table();
Query(
t1,
"
SELECT t1.Make, COUNT(DISTINCT t1.Model) AS 'Count DISTINCT-Model'
FROM Cars t1
GROUP BY t1.Make;"
);
newdt = Current Data Table();
makeList = newdt:Make << get values;
countMat = Column( newdt, 2 ) << get values;
Close( newdt, nosave );
);
countMat[Contains( makeList, t1:make )];
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Formula column to obtain unique values using grouping columns
will come soon:
Col N Categories - and all the others ...
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Formula column to obtain unique values using grouping columns
I think this might provide correct result
Col Sum(Row() == Col Min(Row(), :Model, :Make), :Make)
But currently I would most likely go with Summary -> update and not use a formula
Names Default To Here(1);
dt = Open("$SAMPLE_DATA/Cars.jmp");
// Formula option
newcol = dt << New Column("R", Numeric, Ordinal, Formula(
Col Sum(Row() == Col Min(Row(), :Model, :Make), :Make)
));
// Summary + update
dt_summary = dt << Summary(
Group(:Make),
N Categories(:Model),
Freq("None"),
Weight("None"),
statistics column name format("stat of column"),
private
);
// Update data table
dt << Update(
With(dt_summary),
Match Columns(:Make = :Make),
Add Columns from Update Table(:N Categories of Model),
Replace Columns in Main Table(:N Categories of Model)
);
Close(dt_summary, no save);