cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Try the Materials Informatics Toolkit, which is designed to easily handle SMILES data. This and other helpful add-ins are available in the JMP® Marketplace
Choose Language Hide Translation Bar
SpannerHead
Level IV

Alphabetical Ranking of Nominal Data

If I have a column of nominal data where the inputs can occur more than once, is there a way to have a column whereby the output is the alphabetical ranking of the input?  I can go the scripting route but I'd prefer a simple column formula.


Slán



SpannerHead
1 ACCEPTED SOLUTION

Accepted Solutions
txnelson
Super User

Re: Alphabetical Ranking of Nominal Data

It is late in the day, so my solution is probably not the best, however using the Big Class data table, Here is the formula I came up with

As Constant( Summarize( theLevels = by( :age ) ) );
Contains( theLevels, Char( :age ) );

txnelson_0-1734132093242.png

 

Jim

View solution in original post

8 REPLIES 8
txnelson
Super User

Re: Alphabetical Ranking of Nominal Data

It is late in the day, so my solution is probably not the best, however using the Big Class data table, Here is the formula I came up with

As Constant( Summarize( theLevels = by( :age ) ) );
Contains( theLevels, Char( :age ) );

txnelson_0-1734132093242.png

 

Jim
jthi
Super User

Re: Alphabetical Ranking of Nominal Data

Until we get Add dense ranking to Ranking Tie and Col Rank functions I would use something like txnelson did show in simple cases. There are all sorts of solutions using Col statistical functions but they tend to be very difficult to understand as you have to combine multiple of them and inside of each other (you can find some from the comments of that wish list item).

 

-Jarmo
hogi
Level XII

Re: Alphabetical Ranking of Nominal Data

On the other hand - 
wich such a clever workaround I fear that it will take some while till dense ranking gets implemented.

Thanks Jim!!

hogi
Level XII

Re: Alphabetical Ranking of Nominal Data

hogi_0-1734206541018.png

 

dt =  Open( "$SAMPLE_DATA/Big Class Families.jmp" );

for(i=1, i<16, i++, dt<< Concatenate(dt,append to first table()));
wait(0);
show(nrows(dt));

Print("std rank");
t0 = hptime();
New Column( "std Rank",
	Formula(
	Col Rank( :height)
	)
);
dt << run formulas();
show((hptime()-t0)/1000000);


Print("row_min rank median");
t0 = hptime();
New Column( "dense_Rank_ref1",
	Formula(
		Col Median(
			Col Rank( If( Row() == Col Minimum( Row(), :height ), :height, . ) ),
			:height
		)
	)
);
dt << run formulas();
show((hptime()-t0)/1000000);



print("summarize_contains");
t0 = hptime();
New Column( "dense rank_summarize_contains",

	Formula(
		As Constant( Summarize( theLevels = by( :height ) ) );
		Contains( theLevels, Char( :height ) );
	)
);
dt << run formulas();
show((hptime()-t0)/1000000);


print("row_min cumulative sum");
t0 = hptime();
print("- just works after sorting");
dt << Sort(	By( :height ),	Replace Table);
New Column( "dense_Rank_ref1",

	Formula(

Col Cumulative Sum(If(Row() == Col Min(Row(), :height), 1,0))
	)
);
dt << run formulas();
show((hptime()-t0)/1000000);

close(dt, nosave);
hogi
Level XII

Re: Alphabetical Ranking of Nominal Data

factor 2 faster: a detour via summary table:

hogi_0-1734209642530.png

 

dt =  Open( "$SAMPLE_DATA/Big Class Families.jmp" );

for(i=1, i<16, i++, dt<< Concatenate(dt,append to first table()));

t0 = hptime();
dtsum = dt << Summary(	Group( :height ),private);
dtsum  << New Formula Column(
	Operation( Category( "Distributional" ), "Rank" ),
	Columns( :height )
);
dtsum << run formulas();

dt  << Update(
	With(dtsum),
	Match Columns( :height = :height ),
	Add Columns from Update Table( :"Rank[height]"n ),
	Replace Columns in Main Table( None )
);

show((hptime() - t0)/1000000)

 

This approach can be easily extended to calculate dense ranking also for subsets (groupBy).

hogi
Level XII

Re: Alphabetical Ranking of Nominal Data

When you create a summary table instead of using Summarize, one can @txnelson approach to calculate dense ranks for subsets.

the disadvantage: quite slow 

hogi_0-1734219947328.png

 

View more...
names default to here(1);
Try(close(dt,nosave));
dt =  Open( "$SAMPLE_DATA/Big Class Families.jmp" );

for(i=1, i<16, i++, dt<< Concatenate(dt,append to first table()));

t0 = hptime();

//dense rank - with groupby
dt << New Column( "dense rank_ grouped by",
Character, 
	formula (
	local({tmp,myAge},
	as constant (
			dt = Current Data Table();
			dtsum = dt << summary( Group( :age, :height ) );
			//dtsum:age << set name(age2);
			nr = N Rows( dt );
		);
		rw = row();
		myage = dt:age[row()];
		Eval (Eval Expr(myrows = dtsum << get rows where( dtsum:age == Expr(myAge)))); // <- use dtsum:age !!!
		tmp = contains(dtsum[myrows,2], dt:height[rw]); // <- use dt:height !!!!
		if(row()==nr, close(dtsum, noSave));
		tmp

)));

show((hptime() - t0)/1000000)

 

hogi
Level XII

Re: Alphabetical Ranking of Nominal Data

Better put everything  into As Constant and stop the "formula" before the main part begins.

... super fast.

 

... with the help of a small GUI: quite flexible

hogi_0-1734297391806.png

 

... and via the main command that is allowed to sleep forever: even dynamic : )

Names Default To Here( 1 );
Try(
	dt = Current Data Table(),
	Caption( "no dat table" );
	Stop();
);


updateLBox = Function( {t},
	(t << sib) << setitems( (t << topparent)[listboxbox( 1 )] << getselected )
);

nw = New Window( "rank unique entries",
	<<Type( "Modal Dialog" ),
	<<return result,
	Border Box( Left( 10 ), Right( 10 ), bottom( 5 ),
		V List Box(
			Lineup Box( N Col( 1 ), spacing( 10 ), 
				// Text Box( "XG Boost Specification" ), 
				// colistbox only necessary if the script is standalone
				H List Box(
					Panel Box( "Select Columns", Col List Box( all, nlines( 15 ) ) ),
					Panel Box( "Cast Selected Columns into Roles",
						Lineup Box( N Col( 2 ),
							Button Box( "Unique", <<setfunction( Function( {t}, updateLBox( t ) ) ) ),
							Cunique = Col List Box( nlines( 2 ), minitems( 1 ), maxitems( 1 ) ),
							Button Box( "By Group", <<setfunction( Function( {t}, updateLBox( t ) ) ) ),
							CBy = Col List Box( nlines( 2 ), minitems( 0 ) )
						)
					)
				)
			),
			Spacer Box( size( 1, 10 ) ),
			Button Box( "OK",
				<<setfunction(
					Function( {t},
						ColsUnique = Cunique << get items;
						ColsBy = Cby << get items;
					)
				)
			),
			Spacer Box( size( 1, 10 ) ),

		)
	)
);

If( Not( nw["Button"] == 1 ),
	Stop()
);


myCol = dt << New Column( "Rank_" || Char( ColsUnique ) || If( N Items( ColsBy ), "_by_ " || Char( ColsBy ), "" ) );
colName = mycol << get name();

groupExpr = Expr( Group() );
For Each( {col}, ColsUnique, Insert Into( groupExpr, Name Expr( As Column( col ) ) ) );
For Each( {col}, ColsBy, Insert Into( groupExpr, Name Expr( As Column( col ) ) ) );
mainCommand = Substitute( Name Expr( groupExpr ), Expr( Group() ), Expr( Concat Items() ) );

If( N Items( ColsUnique ) == 1,
	columnsExpr = Eval Expr( Columns( Expr( Name Expr( As Column( ColsUnique[1] ) ) ) ) ),
	Caption( "multiple columns not implemented yet" );
	Stop();
);
GroupByExpr = Expr( Group By() );
For Each( {col}, ColsBy, Insert Into( groupByExpr, Name Expr( As Column( col ) ) ) );

Eval(
	Eval Expr(
		myCol << set formula(

			As Constant(
				dt0 = Current Data Table();
				dtsum = dt0 << Summary( Expr( Name Expr( groupExpr ) ) );
				newCol = dtsum << New Formula Column(
					Operation( Category( "Distributional" ), "Rank" ),
					Expr(
						Name Expr( Columns Expr )
					),
					Expr(
						Name Expr( GroupByExpr )
					),
				);
				newcol[1] << set name( Expr( colName ) );
				dtsum << run formulas();

				dt0 << Update(
					With( dtsum ),
					Match Columns( Expr( Parse( "dt0:" || ColsUnique[1] || " = dtsum:" || ColsUnique[1] ) ) ),
					Add Columns from Update Table( none ),
					Replace Columns in Main Table( Expr( colName ) )
				);
				Close( dtsum, noSave );
				Stop();
			);
			Expr(
				Name Expr( mainCommand ) // will never execute - ut needed for the automatic update : )
			)

			;
		)
	)
);

dt << run formulas();

Workflows\BuildingBlocks\ColumnOperations\RankUnique_with_ByGroup.jmpflow

SpannerHead
Level IV

Re: Alphabetical Ranking of Nominal Data

hogi

 

This one's really clever.  I can enter multiple values in the By Group if I hold down the Ctrl button but it only operates on the last entry for the number of inputs.  Am I doing something wrong?


Slán



SpannerHead