Re: Alphabetical Ranking of Nominal Data

SpannerHead

If I have a column of nominal data where the inputs can occur more than once, is there a way to have a column whereby the output is the alphabetical ranking of the input? I can go the scripting route but I'd prefer a simple column formula.

Slán

SpannerHead

txnelson · | Posted in reply to message from SpannerHead 12-13-2024

It is late in the day, so my solution is probably not the best, however using the Big Class data table, Here is the formula I came up with

As Constant( Summarize( theLevels = by( :age ) ) );
Contains( theLevels, Char( :age ) );

Jim

jthi · | Posted in reply to message from SpannerHead 12-13-2024

Until we get Add dense ranking to Ranking Tie and Col Rank functions I would use something like txnelson did show in simple cases. There are all sorts of solutions using Col statistical functions but they tend to be very difficult to understand as you have to combine multiple of them and inside of each other (you can find some from the comments of that wish list item).

-Jarmo

hogi · | Posted in reply to message from jthi 12-14-2024

On the other hand -
wich such a clever workaround I fear that it will take some while till dense ranking gets implemented.

Thanks Jim!!

hogi · | Posted in reply to message from hogi 12-14-2024

dt =  Open( "$SAMPLE_DATA/Big Class Families.jmp" );

for(i=1, i<16, i++, dt<< Concatenate(dt,append to first table()));
wait(0);
show(nrows(dt));

Print("std rank");
t0 = hptime();
New Column( "std Rank",
	Formula(
	Col Rank( :height)
	)
);
dt << run formulas();
show((hptime()-t0)/1000000);


Print("row_min rank median");
t0 = hptime();
New Column( "dense_Rank_ref1",
	Formula(
		Col Median(
			Col Rank( If( Row() == Col Minimum( Row(), :height ), :height, . ) ),
			:height
		)
	)
);
dt << run formulas();
show((hptime()-t0)/1000000);



print("summarize_contains");
t0 = hptime();
New Column( "dense rank_summarize_contains",

	Formula(
		As Constant( Summarize( theLevels = by( :height ) ) );
		Contains( theLevels, Char( :height ) );
	)
);
dt << run formulas();
show((hptime()-t0)/1000000);


print("row_min cumulative sum");
t0 = hptime();
print("- just works after sorting");
dt << Sort(	By( :height ),	Replace Table);
New Column( "dense_Rank_ref1",

	Formula(

Col Cumulative Sum(If(Row() == Col Min(Row(), :height), 1,0))
	)
);
dt << run formulas();
show((hptime()-t0)/1000000);

close(dt, nosave);

hogi · Dec 14, 2024 3:38 PM

factor 2 faster: a detour via summary table:

dt =  Open( "$SAMPLE_DATA/Big Class Families.jmp" );

for(i=1, i<16, i++, dt<< Concatenate(dt,append to first table()));

t0 = hptime();
dtsum = dt << Summary(	Group( :height ),private);
dtsum  << New Formula Column(
	Operation( Category( "Distributional" ), "Rank" ),
	Columns( :height )
);
dtsum << run formulas();

dt  << Update(
	With(dtsum),
	Match Columns( :height = :height ),
	Add Columns from Update Table( :"Rank[height]"n ),
	Replace Columns in Main Table( None )
);

show((hptime() - t0)/1000000)

This approach can be easily extended to calculate dense ranking also for subsets (groupBy).

hogi · Dec 14, 2024 4:18 PM

When you create a summary table instead of using Summarize, one can @txnelson approach to calculate dense ranks for subsets.

the disadvantage: quite slow

names default to here(1);
Try(close(dt,nosave));
dt =  Open( "$SAMPLE_DATA/Big Class Families.jmp" );

for(i=1, i<16, i++, dt<< Concatenate(dt,append to first table()));

t0 = hptime();

//dense rank - with groupby
dt << New Column( "dense rank_ grouped by",
Character, 
	formula (
	local({tmp,myAge},
	as constant (
			dt = Current Data Table();
			dtsum = dt << summary( Group( :age, :height ) );
			//dtsum:age << set name(age2);
			nr = N Rows( dt );
		);
		rw = row();
		myage = dt:age[row()];
		Eval (Eval Expr(myrows = dtsum << get rows where( dtsum:age == Expr(myAge)))); // <- use dtsum:age !!!
		tmp = contains(dtsum[myrows,2], dt:height[rw]); // <- use dt:height !!!!
		if(row()==nr, close(dtsum, noSave));
		tmp

)));

show((hptime() - t0)/1000000)

names default to here(1); Try(close(dt,nosave)); dt = Open( "$SAMPLE_DATA/Big Class Families.jmp" ); for(i=1, i<16, i++, dt<< Concatenate(dt,append to first table())); t0 = hptime(); //dense rank - with groupby dt << New Column( "dense rank_ grouped by", Character, formula ( local({tmp,myAge}, as constant ( dt = Current Data Table(); dtsum = dt << summary( Group( :age, :height ) ); //dtsum:age << set name(age2); nr = N Rows( dt ); ); rw = row(); myage = dt:age[row()]; Eval (Eval Expr(myrows = dtsum << get rows where( dtsum:age == Expr(myAge)))); // <- use dtsum:age !!! tmp = contains(dtsum[myrows,2], dt:height[rw]); // <- use dt:height !!!! if(row()==nr, close(dtsum, noSave)); tmp ))); show((hptime() - t0)/1000000)