I'm interested in a function that would behave analogously to how Col Min()/Col Max() does, including allowing grouping variables; but instead of operating on numeric columns, it would operate on Character columns. For example:
Ideas?
I think this provides what you want
names default to here(1);
dt=open("$SAMPLE_DATA/big class.jmp");
dt << New Column( "Max Name by Sex",
Character,
Nominal,
Formula(
Col Max Str = Function( {data, by},
dt = Current Data Table();
testBy = Column( dt, by )[Row()];
Eval(
Parse(
"rr = dt << get rows where( :" || by || " == \!"" || testBy ||
"\!" );"
)
);
myList = As List( Column( dt, data )[rr] );
myList = Sort List( myList );
myList[N Items( myList )];
);
result = Col Max Str( "Name", "Sex" );
)
);
dt << New Column( "Min Name by Sex",
Character,
"Nominal",
Formula(
Col Min Str = Function( {data, by},
dt = Current Data Table();
testBy = Column( dt, by )[Row()];
Eval(
Parse(
"rr = dt << get rows where( :" || by || " == \!"" || testBy ||
"\!" );"
)
);
myList = As List( Column( dt, data )[rr] );
myList = Sort List( myList );
myList[1];
);
result = Col Min Str( "Name", "Sex" );
)
);
I think this provides what you want
names default to here(1);
dt=open("$SAMPLE_DATA/big class.jmp");
dt << New Column( "Max Name by Sex",
Character,
Nominal,
Formula(
Col Max Str = Function( {data, by},
dt = Current Data Table();
testBy = Column( dt, by )[Row()];
Eval(
Parse(
"rr = dt << get rows where( :" || by || " == \!"" || testBy ||
"\!" );"
)
);
myList = As List( Column( dt, data )[rr] );
myList = Sort List( myList );
myList[N Items( myList )];
);
result = Col Max Str( "Name", "Sex" );
)
);
dt << New Column( "Min Name by Sex",
Character,
"Nominal",
Formula(
Col Min Str = Function( {data, by},
dt = Current Data Table();
testBy = Column( dt, by )[Row()];
Eval(
Parse(
"rr = dt << get rows where( :" || by || " == \!"" || testBy ||
"\!" );"
)
);
myList = As List( Column( dt, data )[rr] );
myList = Sort List( myList );
myList[1];
);
result = Col Min Str( "Name", "Sex" );
)
);
Jim,
Thanks, this seems to work just right. It does solve my issue.
But can you say a bit about how it works? It looks interesting, and I'm not familiar with the style, in particular what the Eval ( Parse … ) clause is doing.
Cheers, John
The use of Eval(Parse()) methodology, just simply allows one to build a JSL statement(s) into a literal string, and then execute them. I use this on occasion when JMP does not interpret a statement properly. I initially tried the statement in question in this form
rr = dt << get rows where( Column( dt, by ) == testBy );
JMP did not like this.....or rather, JMP did not interpret this as I thought it would. The statement did work if it was in an explicit form
rr = dt << get rows where( :sex == "F" );
Therefore, by generating that specific line as a literal string and using the Eval(Parse()) combined function, it generates the exact explicit form of the code I want and then executes it
Eval(Parse(
"rr = dt << get rows where( :" || by || " == \!"" || testBy ||"\!" );"
)
);
@john_madden, your reply got me started about my use of the Eval(Parse()) method. It is a powerful but confusing methodology. So I took a look at the code once again, and by using an As Column() function the whole Eval(Parse()) thing can be removed
names default to here(1);
dt=open("$SAMPLE_DATA/big class.jmp");
dt << New Column( "Max Name by Sex",
Character,
Nominal,
Formula(
Col Max Str = Function( {data, by},
dt = Current Data Table();
testBy = Column( dt, by )[Row()];
rr = dt << get rows where( as column(dt, by) == testBy );
myList = As List( Column( dt, data )[rr] );
myList = Sort List( myList );
myList[N Items( myList )];
);
result = Col Max Str( "Name", "Sex" );
)
);
The Col Max Str() function is generic. It works on whatever character data is passed to it. Below is it's usage on your sample data table
names default to here(1);
dt=New Table( "Untitled 2",
Add Rows( 8 ),
New Column( "Try",
Character,
"Nominal",
Set Values( {"A", "A", "A", "B", "B", "C", "C", "C"} )
),
New Column( "Why",
Character,
"Nominal",
Set Values( {"1A", "2C", "2D", "3B", "2F", "1S", "1W", "1R"} )
)
);
dt << New Column( "Max Why by Try",
Character,
Nominal,
Formula(
Col Max Str = Function( {data, by},
dt = Current Data Table();
testBy = Column( dt, by )[Row()];
rr = dt << get rows where( as column(dt, by) == testBy );
myList = As List( Column( dt, data )[rr] );
myList = Sort List( myList );
myList[N Items( myList )];
);
result = Col Max Str( "Why", "try" );
)
);
Here's another solution that uses one of my favorite properties of math.
Names default to here(1);
dt = open("$SAMPLE_DATA\Big Class.jmp");
l = :name << Get Values;
r = rank(l);
rr = rank(r);
dt << new Column("Order", <<Set Values(rr));
dt << new Column("ColMin", Formula(ColMin(:Order, :age)));
dt << new Column("ColMax", Formula(ColMax(:Order, :age)));
dt << New Column("Col Character Min", character, <<Set Each Value(l[r][:ColMin]));
dt << New Column("Col Character Max", character, <<Set Each Value(l[r][:ColMax]));
The way it works is all based on rank.
Rank of a list will give the order required to subset the original list to get an ordered list.
l = {"B", "D", "C", "A"};
r = rank(l);
// r = [4, 1, 3, 2]
// l[r] = {"A", "B", "C", "D"}
The cool think is if you tank a rank() of a rank() it will give you the order required to bring it back to the original list when used to subscript the sorted list.
rr = rank(r);
// rr = [2, 4, 3, 1]
// l[r][rr] = {"B", "D", "C", "A"}
Hope this works.
Both repliers- thanks - I'll validate this solution and reply you
The Question is if it can work on ~70k rows and how long?
BTW, I'm using very, very strong (RAM, SSD) machine on AWS instance ...