- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Alphabetical Ranking of Nominal Data
If I have a column of nominal data where the inputs can occur more than once, is there a way to have a column whereby the output is the alphabetical ranking of the input? I can go the scripting route but I'd prefer a simple column formula.
Slán
SpannerHead
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Alphabetical Ranking of Nominal Data
It is late in the day, so my solution is probably not the best, however using the Big Class data table, Here is the formula I came up with
As Constant( Summarize( theLevels = by( :age ) ) );
Contains( theLevels, Char( :age ) );
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Alphabetical Ranking of Nominal Data
It is late in the day, so my solution is probably not the best, however using the Big Class data table, Here is the formula I came up with
As Constant( Summarize( theLevels = by( :age ) ) );
Contains( theLevels, Char( :age ) );
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Alphabetical Ranking of Nominal Data
Until we get Add dense ranking to Ranking Tie and Col Rank functions I would use something like txnelson did show in simple cases. There are all sorts of solutions using Col statistical functions but they tend to be very difficult to understand as you have to combine multiple of them and inside of each other (you can find some from the comments of that wish list item).
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Alphabetical Ranking of Nominal Data
On the other hand -
wich such a clever workaround I fear that it will take some while till dense ranking gets implemented.
Thanks Jim!!
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Alphabetical Ranking of Nominal Data
dt = Open( "$SAMPLE_DATA/Big Class Families.jmp" );
for(i=1, i<16, i++, dt<< Concatenate(dt,append to first table()));
wait(0);
show(nrows(dt));
Print("std rank");
t0 = hptime();
New Column( "std Rank",
Formula(
Col Rank( :height)
)
);
dt << run formulas();
show((hptime()-t0)/1000000);
Print("row_min rank median");
t0 = hptime();
New Column( "dense_Rank_ref1",
Formula(
Col Median(
Col Rank( If( Row() == Col Minimum( Row(), :height ), :height, . ) ),
:height
)
)
);
dt << run formulas();
show((hptime()-t0)/1000000);
print("summarize_contains");
t0 = hptime();
New Column( "dense rank_summarize_contains",
Formula(
As Constant( Summarize( theLevels = by( :height ) ) );
Contains( theLevels, Char( :height ) );
)
);
dt << run formulas();
show((hptime()-t0)/1000000);
print("row_min cumulative sum");
t0 = hptime();
print("- just works after sorting");
dt << Sort( By( :height ), Replace Table);
New Column( "dense_Rank_ref1",
Formula(
Col Cumulative Sum(If(Row() == Col Min(Row(), :height), 1,0))
)
);
dt << run formulas();
show((hptime()-t0)/1000000);
close(dt, nosave);
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Alphabetical Ranking of Nominal Data
factor 2 faster: a detour via summary table:
dt = Open( "$SAMPLE_DATA/Big Class Families.jmp" );
for(i=1, i<16, i++, dt<< Concatenate(dt,append to first table()));
t0 = hptime();
dtsum = dt << Summary( Group( :height ),private);
dtsum << New Formula Column(
Operation( Category( "Distributional" ), "Rank" ),
Columns( :height )
);
dtsum << run formulas();
dt << Update(
With(dtsum),
Match Columns( :height = :height ),
Add Columns from Update Table( :"Rank[height]"n ),
Replace Columns in Main Table( None )
);
show((hptime() - t0)/1000000)
This approach can be easily extended to calculate dense ranking also for subsets (groupBy).
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Alphabetical Ranking of Nominal Data
When you create a summary table instead of using Summarize, one can @txnelson approach to calculate dense ranks for subsets.
the disadvantage: quite slow
names default to here(1);
Try(close(dt,nosave));
dt = Open( "$SAMPLE_DATA/Big Class Families.jmp" );
for(i=1, i<16, i++, dt<< Concatenate(dt,append to first table()));
t0 = hptime();
//dense rank - with groupby
dt << New Column( "dense rank_ grouped by",
Character,
formula (
local({tmp,myAge},
as constant (
dt = Current Data Table();
dtsum = dt << summary( Group( :age, :height ) );
//dtsum:age << set name(age2);
nr = N Rows( dt );
);
rw = row();
myage = dt:age[row()];
Eval (Eval Expr(myrows = dtsum << get rows where( dtsum:age == Expr(myAge)))); // <- use dtsum:age !!!
tmp = contains(dtsum[myrows,2], dt:height[rw]); // <- use dt:height !!!!
if(row()==nr, close(dtsum, noSave));
tmp
)));
show((hptime() - t0)/1000000)
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Alphabetical Ranking of Nominal Data
Better put everything into As Constant and stop the "formula" before the main part begins.
... super fast.
... with the help of a small GUI: quite flexible
... and via the main command that is allowed to sleep forever: even dynamic : )
Names Default To Here( 1 );
Try(
dt = Current Data Table(),
Caption( "no dat table" );
Stop();
);
updateLBox = Function( {t},
(t << sib) << setitems( (t << topparent)[listboxbox( 1 )] << getselected )
);
nw = New Window( "rank unique entries",
<<Type( "Modal Dialog" ),
<<return result,
Border Box( Left( 10 ), Right( 10 ), bottom( 5 ),
V List Box(
Lineup Box( N Col( 1 ), spacing( 10 ),
// Text Box( "XG Boost Specification" ),
// colistbox only necessary if the script is standalone
H List Box(
Panel Box( "Select Columns", Col List Box( all, nlines( 15 ) ) ),
Panel Box( "Cast Selected Columns into Roles",
Lineup Box( N Col( 2 ),
Button Box( "Unique", <<setfunction( Function( {t}, updateLBox( t ) ) ) ),
Cunique = Col List Box( nlines( 2 ), minitems( 1 ), maxitems( 1 ) ),
Button Box( "By Group", <<setfunction( Function( {t}, updateLBox( t ) ) ) ),
CBy = Col List Box( nlines( 2 ), minitems( 0 ) )
)
)
)
),
Spacer Box( size( 1, 10 ) ),
Button Box( "OK",
<<setfunction(
Function( {t},
ColsUnique = Cunique << get items;
ColsBy = Cby << get items;
)
)
),
Spacer Box( size( 1, 10 ) ),
)
)
);
If( Not( nw["Button"] == 1 ),
Stop()
);
myCol = dt << New Column( "Rank_" || Char( ColsUnique ) || If( N Items( ColsBy ), "_by_ " || Char( ColsBy ), "" ) );
colName = mycol << get name();
groupExpr = Expr( Group() );
For Each( {col}, ColsUnique, Insert Into( groupExpr, Name Expr( As Column( col ) ) ) );
For Each( {col}, ColsBy, Insert Into( groupExpr, Name Expr( As Column( col ) ) ) );
mainCommand = Substitute( Name Expr( groupExpr ), Expr( Group() ), Expr( Concat Items() ) );
If( N Items( ColsUnique ) == 1,
columnsExpr = Eval Expr( Columns( Expr( Name Expr( As Column( ColsUnique[1] ) ) ) ) ),
Caption( "multiple columns not implemented yet" );
Stop();
);
GroupByExpr = Expr( Group By() );
For Each( {col}, ColsBy, Insert Into( groupByExpr, Name Expr( As Column( col ) ) ) );
Eval(
Eval Expr(
myCol << set formula(
As Constant(
dt0 = Current Data Table();
dtsum = dt0 << Summary( Expr( Name Expr( groupExpr ) ) );
newCol = dtsum << New Formula Column(
Operation( Category( "Distributional" ), "Rank" ),
Expr(
Name Expr( Columns Expr )
),
Expr(
Name Expr( GroupByExpr )
),
);
newcol[1] << set name( Expr( colName ) );
dtsum << run formulas();
dt0 << Update(
With( dtsum ),
Match Columns( Expr( Parse( "dt0:" || ColsUnique[1] || " = dtsum:" || ColsUnique[1] ) ) ),
Add Columns from Update Table( none ),
Replace Columns in Main Table( Expr( colName ) )
);
Close( dtsum, noSave );
Stop();
);
Expr(
Name Expr( mainCommand ) // will never execute - ut needed for the automatic update : )
)
;
)
)
);
dt << run formulas();
Workflows\BuildingBlocks\ColumnOperations\RankUnique_with_ByGroup.jmpflow
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Alphabetical Ranking of Nominal Data
hogi
This one's really clever. I can enter multiple values in the By Group if I hold down the Ctrl button but it only operates on the last entry for the number of inputs. Am I doing something wrong?
Slán
SpannerHead