cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Check out the JMP® Marketplace featured Capability Explorer add-in
Choose Language Hide Translation Bar
SpannerHead
Level IV

Alphabetical Ranking of Nominal Data

If I have a column of nominal data where the inputs can occur more than once, is there a way to have a column whereby the output is the alphabetical ranking of the input?  I can go the scripting route but I'd prefer a simple column formula.


Slán



SpannerHead
6 REPLIES 6
txnelson
Super User

Re: Alphabetical Ranking of Nominal Data

It is late in the day, so my solution is probably not the best, however using the Big Class data table, Here is the formula I came up with

As Constant( Summarize( theLevels = by( :age ) ) );
Contains( theLevels, Char( :age ) );

txnelson_0-1734132093242.png

 

Jim
jthi
Super User

Re: Alphabetical Ranking of Nominal Data

Until we get Add dense ranking to Ranking Tie and Col Rank functions I would use something like txnelson did show in simple cases. There are all sorts of solutions using Col statistical functions but they tend to be very difficult to understand as you have to combine multiple of them and inside of each other (you can find some from the comments of that wish list item).

 

-Jarmo
hogi
Level XII

Re: Alphabetical Ranking of Nominal Data

On the other hand - 
wich such a clever workaround I fear that it will take some while till dense ranking gets implemented.

Thanks Jim!!

hogi
Level XII

Re: Alphabetical Ranking of Nominal Data

hogi_0-1734206541018.png

 

dt =  Open( "$SAMPLE_DATA/Big Class Families.jmp" );

for(i=1, i<16, i++, dt<< Concatenate(dt,append to first table()));
wait(0);
show(nrows(dt));

Print("std rank");
t0 = hptime();
New Column( "std Rank",
	Formula(
	Col Rank( :height)
	)
);
dt << run formulas();
show((hptime()-t0)/1000000);


Print("row_min rank median");
t0 = hptime();
New Column( "dense_Rank_ref1",
	Formula(
		Col Median(
			Col Rank( If( Row() == Col Minimum( Row(), :height ), :height, . ) ),
			:height
		)
	)
);
dt << run formulas();
show((hptime()-t0)/1000000);



print("summarize_contains");
t0 = hptime();
New Column( "dense rank_summarize_contains",

	Formula(
		As Constant( Summarize( theLevels = by( :height ) ) );
		Contains( theLevels, Char( :height ) );
	)
);
dt << run formulas();
show((hptime()-t0)/1000000);


print("row_min cumulative sum");
t0 = hptime();
print("- just works after sorting");
dt << Sort(	By( :height ),	Replace Table);
New Column( "dense_Rank_ref1",

	Formula(

Col Cumulative Sum(If(Row() == Col Min(Row(), :height), 1,0))
	)
);
dt << run formulas();
show((hptime()-t0)/1000000);

close(dt, nosave);
hogi
Level XII

Re: Alphabetical Ranking of Nominal Data

factor 2 faster: a detour via summary table:

hogi_0-1734209642530.png

 

dt =  Open( "$SAMPLE_DATA/Big Class Families.jmp" );

for(i=1, i<16, i++, dt<< Concatenate(dt,append to first table()));

t0 = hptime();
dtsum = dt << Summary(	Group( :height ),private);
dtsum  << New Formula Column(
	Operation( Category( "Distributional" ), "Rank" ),
	Columns( :height )
);
dtsum << run formulas();

dt  << Update(
	With(dtsum),
	Match Columns( :height = :height ),
	Add Columns from Update Table( :"Rank[height]"n ),
	Replace Columns in Main Table( None )
);

show((hptime() - t0)/1000000)

 

This approach can be easily extended to calculate dense ranking also for subsets (groupBy).

hogi
Level XII

Re: Alphabetical Ranking of Nominal Data

When you create a summary table instead of using Summarize, one can @txnelson approach to calculate dense ranks for subsets.

the disadvantage: quite slow 

hogi_0-1734219947328.png

 

View more...
names default to here(1);
Try(close(dt,nosave));
dt =  Open( "$SAMPLE_DATA/Big Class Families.jmp" );

for(i=1, i<16, i++, dt<< Concatenate(dt,append to first table()));

t0 = hptime();

//dense rank - with groupby
dt << New Column( "dense rank_ grouped by",
Character, 
	formula (
	local({tmp,myAge},
	as constant (
			dt = Current Data Table();
			dtsum = dt << summary( Group( :age, :height ) );
			//dtsum:age << set name(age2);
			nr = N Rows( dt );
		);
		rw = row();
		myage = dt:age[row()];
		Eval (Eval Expr(myrows = dtsum << get rows where( dtsum:age == Expr(myAge)))); // <- use dtsum:age !!!
		tmp = contains(dtsum[myrows,2], dt:height[rw]); // <- use dt:height !!!!
		if(row()==nr, close(dtsum, noSave));
		tmp

)));

show((hptime() - t0)/1000000)