Solved: Col Min() & Col Max() for Character columns

john_madden · Jan 22, 2019 09:28 AM

I'm interested in a function that would behave analogously to how Col Min()/Col Max() does, including allowing grouping variables; but instead of operating on numeric columns, it would operate on Character columns. For example:

"Col Min Str( :Data, :Grouping)" would return the alphabetically first entry from :Data, grouped by :Grouping.
"Col Max Str(:Data, :Grouping)" would return the alphabetically last entry from :Data, grouped by :Grouping.

Ideas?

txnelson · Jan 22, 2019 11:28 AM

I think this provides what you want

names default to here(1);
dt=open("$SAMPLE_DATA/big class.jmp");

dt << New Column( "Max Name by Sex",
	Character,
	Nominal,
	Formula(
		Col Max Str = Function( {data, by},
			dt = Current Data Table();
			testBy = Column( dt, by )[Row()];
			Eval(
				Parse(
					"rr = dt << get rows where( :" || by || " == \!"" || testBy ||
					"\!" );"
				)
			);
			myList = As List( Column( dt, data )[rr] );
			myList = Sort List( myList );
			myList[N Items( myList )];
		);
		result = Col Max Str( "Name", "Sex" );
	)
);

dt << New Column( "Min Name by Sex",
	Character,
	"Nominal",
	Formula(
		Col Min Str = Function( {data, by},
			dt = Current Data Table();
			testBy = Column( dt, by )[Row()];
			Eval(
				Parse(
					"rr = dt << get rows where( :" || by || " == \!"" || testBy ||
					"\!" );"
				)
			);
			myList = As List( Column( dt, data )[rr] );
			myList = Sort List( myList );
			myList[1];
		);
		result = Col Min Str( "Name", "Sex" );
	)
);

Jim

View solution in original post

txnelson · Jan 22, 2019 11:28 AM

I think this provides what you want

names default to here(1);
dt=open("$SAMPLE_DATA/big class.jmp");

dt << New Column( "Max Name by Sex",
	Character,
	Nominal,
	Formula(
		Col Max Str = Function( {data, by},
			dt = Current Data Table();
			testBy = Column( dt, by )[Row()];
			Eval(
				Parse(
					"rr = dt << get rows where( :" || by || " == \!"" || testBy ||
					"\!" );"
				)
			);
			myList = As List( Column( dt, data )[rr] );
			myList = Sort List( myList );
			myList[N Items( myList )];
		);
		result = Col Max Str( "Name", "Sex" );
	)
);

dt << New Column( "Min Name by Sex",
	Character,
	"Nominal",
	Formula(
		Col Min Str = Function( {data, by},
			dt = Current Data Table();
			testBy = Column( dt, by )[Row()];
			Eval(
				Parse(
					"rr = dt << get rows where( :" || by || " == \!"" || testBy ||
					"\!" );"
				)
			);
			myList = As List( Column( dt, data )[rr] );
			myList = Sort List( myList );
			myList[1];
		);
		result = Col Min Str( "Name", "Sex" );
	)
);

Jim

john_madden · Jan 22, 2019 11:56 AM

Jim,

Thanks, this seems to work just right. It does solve my issue.

But can you say a bit about how it works? It looks interesting, and I'm not familiar with the style, in particular what the Eval ( Parse … ) clause is doing.

Cheers, John

txnelson · Jan 22, 2019 12:19 PM

The use of Eval(Parse()) methodology, just simply allows one to build a JSL statement(s) into a literal string, and then execute them. I use this on occasion when JMP does not interpret a statement properly. I initially tried the statement in question in this form

rr = dt << get rows where( Column( dt, by ) == testBy );

JMP did not like this.....or rather, JMP did not interpret this as I thought it would. The statement did work if it was in an explicit form

rr = dt << get rows where( :sex == "F" );

Therefore, by generating that specific line as a literal string and using the Eval(Parse()) combined function, it generates the exact explicit form of the code I want and then executes it

Eval(Parse(
	"rr = dt << get rows where( :" || by || " == \!"" || testBy ||"\!" );"
	)
);

Jim

txnelson · Jan 22, 2019 01:40 PM

@john_madden, your reply got me started about my use of the Eval(Parse()) method. It is a powerful but confusing methodology. So I took a look at the code once again, and by using an As Column() function the whole Eval(Parse()) thing can be removed

names default to here(1);
dt=open("$SAMPLE_DATA/big class.jmp");

dt << New Column( "Max Name by Sex",
	Character,
	Nominal,
	Formula(
		Col Max Str = Function( {data, by},
			dt = Current Data Table();
			testBy = Column( dt, by )[Row()];
			rr = dt << get rows where( as column(dt, by) == testBy  );
			myList = As List( Column( dt, data )[rr] );
			myList = Sort List( myList );
			myList[N Items( myList )];
		);
		result = Col Max Str( "Name", "Sex" );
	)
);

Jim

Isak_JMP · Jul 11, 2019 03:36 AM

Jim,
Can be this jsl more generic?
for example for the table below

Try Why
A 1A
A 2C
A 2D
B 3B
B 2F
C 1S
C 1W
C 1R

I want to get the max of "Why" per Try i.e.
A 2D
B 3B
C 1W

txnelson · Jul 11, 2019 06:26 AM

The Col Max Str() function is generic. It works on whatever character data is passed to it. Below is it's usage on your sample data table

names default to here(1);
dt=New Table( "Untitled 2",
	Add Rows( 8 ),
	New Column( "Try",
		Character,
		"Nominal",
		Set Values( {"A", "A", "A", "B", "B", "C", "C", "C"} )
	),
	New Column( "Why",
		Character,
		"Nominal",
		Set Values( {"1A", "2C", "2D", "3B", "2F", "1S", "1W", "1R"} )
	)
);

dt << New Column( "Max Why by Try",
	Character,
	Nominal,
	Formula(
		Col Max Str = Function( {data, by},
			dt = Current Data Table();
			testBy = Column( dt, by )[Row()];
			rr = dt << get rows where( as column(dt, by) == testBy  );
			myList = As List( Column( dt, data )[rr] );
			myList = Sort List( myList );
			myList[N Items( myList )];
		);
		result = Col Max Str( "Why", "try" );
	)
);

Jim

vince_faller · Jul 11, 2019 10:57 AM

Here's another solution that uses one of my favorite properties of math.

Names default to here(1);
dt = open("$SAMPLE_DATA\Big Class.jmp");
l = :name << Get Values;

r = rank(l);

rr = rank(r);
dt << new Column("Order", <<Set Values(rr));

dt << new Column("ColMin", Formula(ColMin(:Order, :age)));
dt << new Column("ColMax", Formula(ColMax(:Order, :age)));

dt << New Column("Col Character Min", character, <<Set Each Value(l[r][:ColMin]));
dt << New Column("Col Character Max", character, <<Set Each Value(l[r][:ColMax]));

The way it works is all based on rank.

Rank of a list will give the order required to subset the original list to get an ordered list.

l = {"B", "D", "C", "A"};
r = rank(l);
// r = [4, 1, 3, 2]
// l[r] = {"A", "B", "C", "D"}

The cool think is if you tank a rank() of a rank() it will give you the order required to bring it back to the original list when used to subscript the sorted list.

rr = rank(r);
// rr = [2, 4, 3, 1]
// l[r][rr] = {"B", "D", "C", "A"}

Hope this works.

Vince Faller - Predictum

Isak_JMP · Jul 11, 2019 11:21 AM

Both repliers- thanks - I'll validate this solution and reply you

The Question is if it can work on ~70k rows and how long?

BTW, I'm using very, very strong (RAM, SSD) machine on AWS instance ...

Col Min() & Col Max() for Character columns

Re: Col Min() & Col Max() for Character columns

Re: Col Min() & Col Max() for Character columns

Re: Col Min() & Col Max() for Character columns

Re: Col Min() & Col Max() for Character columns

Re: Col Min() & Col Max() for Character columns

Re: Col Min() & Col Max() for Character columns

Re: Col Min() & Col Max() for Character columns

Re: Col Min() & Col Max() for Character columns

Re: Col Min() & Col Max() for Character columns