jthi
Super User

Creating statistical custom (formula) function which has byVar

This is related to a wish list item I created: Add normalization and robust statistical functions (and matrix functions). I thought I could write these as custom functions for now (even though I know it will be impossible to share them and keep them updated for everyone), but I hit my first problem almost immediately: how to reference columns. When using custom functions in formulas, am I stuck with row-level calculations, or with referencing columns as strings instead of :colname, which would prevent me from using them directly from the Formula Editor?

 

Names Default To Here(1);
dt = Open("$SAMPLE_DATA/Big Class.jmp");

my_min = New Custom Function(
	"custom",
	"My Min1",
	Function({x, y},
		As Constant(
			vals = Column(x) << get values;
			groups = Column(y) << get values;
		);
		cur_group = Loc(groups, As Column(y));
		vals[cur_group[Loc Min(vals[cur_group])]];
	),
	<< Formula Category("Statistical")
);
Add Custom Functions(my_min);

my_min = New Custom Function(
	"custom",
	"My Min2",
	Function({x, y},
		Col Minimum(Column(x), Column(y));
	),
	<< Formula Category("Statistical")
);
Add Custom Functions(my_min);

my_min = New Custom Function(
	"custom",
	"My Min3",
	Function({x, y},
		Col Minimum(x, y);
	),
	<< Formula Category("Statistical")
);
Add Custom Functions(my_min);

dt << New Column("ColMin", Numeric, Continuous, Formula(Col Minimum(:height, :sex)));
dt << New Column("MyMin1_str", Numeric, Continuous, Formula(custom:My Min1("height", "sex")));
dt << New Column("MyMin2_str", Numeric, Continuous, Formula(custom:My Min2("height", "sex")));
dt << New Column("MyMin3_str", Numeric, Continuous, Formula(custom:My Min3("height", "sex")));
dt << New Column("MyMin1_ref", Numeric, Continuous, Formula(custom:My Min1(:height, :sex)));
dt << New Column("MyMin2_ref", Numeric, Continuous, Formula(custom:My Min2(:height, :sex)));
dt << New Column("MyMin3_ref", Numeric, Continuous, Formula(custom:My Min3(:height, :sex)));
-Jarmo
5 REPLIES
Jasean
Staff

Re: Creating statistical custom (formula) function which has byVar

Does this do what you want? It looks like you need to explicitly evaluate the function parameters if you expect them to be column references.

my_min = New Custom Function(
	"custom",
	"My Min4",
	Function({x, y},
		EvalExpr(Col Minimum(Expr(x), Expr(y)));
	),
	<< Formula Category("Statistical")
);
Add Custom Functions(my_min);
jthi
Super User

Re: Creating statistical custom (formula) function which has byVar

That doesn't seem to give the correct answer; it is most likely evaluating them "row by row" rather than as whole columns.
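
For comparison, the expected per-group minimums can be computed once, outside any column formula, and checked against whatever a custom function returns. This is only a minimal sketch, assuming the Big Class sample table is available and that Summarize accepts the data table as its first argument; the names groups and mins are chosen here for illustration.

```jsl
Names Default To Here(1);
dt = Open("$SAMPLE_DATA/Big Class.jmp");

// Compute the minimum of :height once per level of :sex,
// outside of any column formula (assumes Summarize takes dt first)
Summarize(dt, groups = By(:sex), mins = Min(:height));

// groups holds the level names, mins the matching per-group minimums
Show(groups, mins);
```

Each row of a correct by-group formula column should then hold the minimum that belongs to that row's :sex level.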

jthi_0-1655279502111.png

 

-Jarmo
Jasean
Staff

Re: Creating statistical custom (formula) function which has byVar

Good point!  I was so focused on getting the column references to evaluate, I didn't even notice that the values were nonsensical.  I'll think about it more.

ih
Super User (Alumni)

Re: Creating statistical custom (formula) function which has byVar

Two comments:

  1. I can't find the reference, but I remember being told definitively at one point that there is no way to write a custom column formula that caches its value between rows the way internal formulas do. Thus you would need to write a function that evaluates in the context of each row, meaning it returns a single value at a time. (I would love for someone to correct me here if that is possible.)
  2. JMP evaluates the column reference before passing it to the function (eager evaluation), which means you get the value at that row instead of the whole column. You need to tell JMP to pass a reference to the column, and you can do that with Expr.

I believe this fixes both issues, but I'm not sure it is very intuitive or user friendly.

Update: the custom function can access the row context, so you can skip passing the value for the current row into the function, as in Test2.

Names Default To Here(1);
dt = Open("$SAMPLE_DATA/Big Class.jmp");

my_min = New Custom Function(
	"custom",
	"Test1",
	Function({x,y,yval},
		show(yval);
		show(Name Expr(x)<< get values);
		show(Loc(Name Expr(y) << Get Values, yval));
		Min((Name Expr(x)<< get values)[Loc(Name Expr(y) << Get Values, yval)]);
	),
	<< Formula Category("Statistical")
);
Add Custom Functions(my_min);

custom:Test1(Expr(:age),Expr(:sex),"F");

dt << New Column("Test1", Numeric, Continuous, Formula(custom:Test1(Expr(:height), Expr(:sex), :sex)));

my_min = New Custom Function(
	"custom",
	"Test2",
	Function({x,y},
		xvals = Name Expr(x)<< get values;
		yvals = Name Expr(y) << Get Values;
		Min(xvals[Loc(yvals, yvals[row()])]);
	),
	<< Formula Category("Statistical")
);
Add Custom Functions(my_min);

dt << New Column("Test2", Numeric, Continuous, Formula(custom:Test2(Expr(:height), Expr(:sex))));
ih
Super User (Alumni)

Re: Creating statistical custom (formula) function which has byVar

Wish list item that I believe would make this a lot easier:

Option or function to evaluate a custom column formula at once, or cache values between rows