
Extract prediction formula without having to save formula to data table

cschwarz (Community Trekker, joined Dec 26, 2015)

I'm writing a script to fit a set of generalized linear models to a data set and then extract the AIC, etc., to do model averaging. Because of the way that JMP codes categorical variables in the design matrix for generalized linear models (JMP uses a sum-to-zero coding rather than a corner-point coding), it is difficult to model-average the predictions using the estimated coefficients reported in the fit (one of the categories is never shown, and I would need to do extensive programming to recreate it). It looks like it would be easiest to simply use the prediction formula from the platform and then apply the model weights to these predictions in one large formula.

However, as far as I can see, I need to save the prediction formula to the data table for EACH model; only then can I extract the formula and save it for use elsewhere, using something along the lines of

  modelfit << prediction formula;

  modelpredformula = char(Column(data, "P(PA) Formula") << Get formula());

where "P(PA) Formula" is the name of the saved prediction formula column. I can then save the model prediction formula (in character form) in my summary AIC table and eventually create a super formula that is a weighted sum of the individual formulae.
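One way to build such a "super formula" without going through strings is to keep each saved formula as a JSL expression and combine them with `Substitute()`. This is a sketch under stated assumptions: it uses Akaike weights (the thread does not say which weighting is used), and the two logistic formulas and the AIC values are made-up placeholders standing in for formulas captured with `<< Get Formula`. `:height` and `:age` come from the Big Class sample table; `preds`, `aic`, `w`, and `avg` are illustrative names.

```jsl
// Placeholder formulas (kept unevaluated inside the list) and made-up AICs.
preds = {
    1 / (1 + Exp( -(-5 + 0.08 * :height) )),
    1 / (1 + Exp( -(-3 + 0.2 * :age) ))
};
aic = [102.3, 104.1];

// Akaike weights: w_i = exp(-delta_i / 2) / sum_j exp(-delta_j / 2),
// where delta_i = AIC_i - min(AIC).
delta = aic - Min( aic );
w = Exp( -delta / 2 );
w = w / Sum( w );

// Fold the models into one expression: 0 + w1 * f1 + w2 * f2 + ...
avg = Expr( 0 );
For( i = 1, i <= N Items( preds ), i++,
    avg = Substitute( Expr( acc + wt * f ),
        Expr( acc ), Name Expr( avg ),
        Expr( wt ), w[i],
        Expr( f ), preds[i]
    )
);
Show( Name Expr( avg ) );

// Write the combined expression into a formula column on a sample table.
dt = Open( "$SAMPLE_DATA/Big Class.jmp", Invisible );
Eval( Eval Expr(
    dt << New Column( "P Avg", Numeric, Continuous,
        Formula( Expr( Name Expr( avg ) ) )
    )
) );
```

Keeping the formulas as expressions avoids a `Parse()` round trip, and the combined formula can be inspected with `Show()` before it is written to a column.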

However, this slows the script down considerably because the predictions are computed for the entire data table and I have to add and remove the column for each model fit.

So, is there a way to extract the actual prediction formula from a model fit WITHOUT having to first save the column to the data table and have JMP actually do the computations?

Thanks


Carl Schwarz

Accepted Solution

I'm not sure there is a way to do what you want.  The approach I take is to suppress formula evaluation prior to saving the prediction formulae:

dt << Suppress Formula Eval(1);

-Dave
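A minimal sketch of the suppress, save, extract, and delete cycle, assuming the Big Class sample table and a normal/identity generalized linear model as a stand-in for the real fits. The saved column is located by position (the last column) rather than by a hard-coded name, because the name depends on the response. `predCol`, `predExpr`, and `predText` are illustrative names.

```jsl
dt = Open( "$SAMPLE_DATA/Big Class.jmp", Invisible );
dt << Suppress Formula Eval( 1 );   // saved formula columns are not computed

fit = dt << Fit Model(
    Y( :weight ),
    Effects( :height, :sex ),
    Personality( "Generalized Linear Model" ),
    GLM Distribution( "Normal" ),
    Link Function( "Identity" ),
    Run,
    Invisible
);
fit << Prediction Formula( 1 );     // saves the prediction formula column

// The newly saved column is appended as the last column of the table.
predCol = Column( dt, N Cols( dt ) );
predExpr = predCol << Get Formula;          // formula as an expression
predText = Char( Name Expr( predExpr ) );   // same formula as a string

dt << Delete Columns( predCol << Get Name );
fit << Close Window;
dt << Suppress Formula Eval( 0 );   // restore normal formula evaluation
Show( predText );
```

Reading the formula back with `Name Expr( predExpr )` matters: referring to `predExpr` directly would evaluate the stored formula instead of returning it.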
ian_jmp (Staff, joined Jun 23, 2011)

FWIW, I think Dave may be correct. There are '<< Get Estimates' and '<< Get Variance Components' messages, but they are not always available (it depends on the nature of the fit). It might be worth using 'Show Properties()' in your specific case to check.

cschwarz

Thanks Ian and Dave. I could not find the prediction formula anywhere in the ShowProperties tree, so I guess it isn't computed until requested by a save column action.

I tried my script with and without suppressing formula evaluation. Without suppression it took 45 seconds; with suppression it took 44 seconds. I guess I was wrong in assuming that formula evaluation took a large amount of time compared with the actual model fitting, extracting results from the report, and saving information into a new (summary) data table.

It didn't occur to me to profile the script. I've just done so, and here are the results:

11% of the time was spent fitting the models

13% creating the reports

47% deleting the column saved with each model's predictions, using: data << delete columns("P(PA) Formula");

23% closing the report window after extracting the information from it, using: modelreport << close window;
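To check a single step like the column deletion in isolation, a rough wall-clock timing can be taken with `Tick Seconds()`. This is a sketch on the Big Class sample table; the `tmp` formula column is a hypothetical stand-in for a saved prediction column, and `elapsed` is an illustrative name.

```jsl
dt = Open( "$SAMPLE_DATA/Big Class.jmp" );   // visible table, as in the original script
dt << New Column( "tmp", Numeric, Continuous, Formula( :height * 2 ) );

t0 = Tick Seconds();
dt << Delete Columns( "tmp" );
elapsed = Tick Seconds() - t0;   // wall-clock seconds spent on the delete
Show( elapsed );
```

Running the same lines with the table opened as `Open( ..., Invisible )` is a quick way to see how much of the cost is display-related.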


I found this surprising!


Carl

ian_jmp

Regarding the timings, did you try making the report window 'invisible'? Just curious whether this affects your 23% figure above.

cschwarz

The reports from the fit never appear on the screen, but I never explicitly made them invisible. I see that there is an 'Invisible' option for launching fits....

When I specify the invisible option on the platform launch, it reduces the time from 45 seconds to 30 seconds.

The original data table (into which the prediction formulae are successively saved) is visible, and so is the final summary data table (to which the results from each model fit are successively added). I tried making the original data table invisible, and that really speeds things up (now down to 15 seconds), even with adding and removing the prediction columns.


The final results table was built up row by row, so I could watch the rows being added. I made that table invisible too, which reduced the time again, to 8 seconds!


So the bottleneck appears to be screen operations rather than the underlying CPU time.
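Putting these findings together, here is a sketch in which the source table, the model report, and the summary table all stay off screen. Big Class, the single least squares fit, and the `Model` column are illustrative stand-ins for the real GLM fits and the AIC summary table.

```jsl
dt = Open( "$SAMPLE_DATA/Big Class.jmp", Invisible );   // source table off screen

summary = New Table( "Model Summary", Invisible,        // summary table off screen
    New Column( "Model", Character, Nominal )
);

fit = dt << Fit Model(
    Y( :weight ),
    Effects( :height ),
    Personality( "Standard Least Squares" ),
    Run,
    Invisible                                             // report off screen
);
// ... extract the needed values from the hidden report here ...
fit << Close Window;

summary << Add Rows( 1 );
Column( summary, "Model" )[1] = "weight ~ height";
```

Dropping `Invisible` from any one of these calls brings that window back, which can help while debugging.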

Now the revised timings (for the final, sped-up script) are:

56% model fitting

13% generating model reports

12% closing the model report (even though it is invisible?)

Looking at my log, I noticed that my line

    modelfit << prediction formula;

triggered a warning message about a missing boolean operator. If I switch it to

    modelfit << prediction formula(1);

the script still runs fine without a warning message. However, I can't find any documentation in the JSL Syntax Reference or Scripting Guide on what

    modelfit << prediction formula(0);

would do. I'll try writing a small script to see whether the latter also avoids evaluating the prediction formula.
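A small experiment along those lines, as a sketch: it counts the table's columns before and after each message. Big Class and the normal/identity GLM are stand-ins for the real data and model, and `nBefore` is an illustrative name. What `Prediction Formula( 0 )` actually does is what the script is meant to reveal, so no result is assumed here.

```jsl
dt = Open( "$SAMPLE_DATA/Big Class.jmp", Invisible );
fit = dt << Fit Model(
    Y( :weight ),
    Effects( :height ),
    Personality( "Generalized Linear Model" ),
    GLM Distribution( "Normal" ),
    Link Function( "Identity" ),
    Run,
    Invisible
);

nBefore = N Cols( dt );
fit << Prediction Formula( 0 );
Show( N Cols( dt ) - nBefore );   // 0 here would mean no column was added
fit << Prediction Formula( 1 );
Show( N Cols( dt ) - nBefore );   // compare with the (1) form
```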

I'm now trying to modify the script so that the "invisible" option can be set on the fly. I want to do something like

runsilent=1;

data << open("blah", if(run silent, "Invisible"));

but JMP always halts at that point. I always have problems making messages dynamic when "hard-wired" keywords are needed. Back to the books....

Carl.

David_Burnham (Super User, joined Jul 13, 2011)

Invisible is a keyword, so it's not so easy to set as a parameter. I usually just use If statements:

My Distribution = Function({dt,col,doInvisible},{Default Local},
     If (doInvisible,
         dt << Distribution( Invisible, Column( Eval(col) ) )
     ,
         dt << Distribution( Column( Eval(col) ) )
     )
);

dt = Open("$SAMPLE_DATA/Big Class.jmp");
My Distribution(dt,"height",1);


If you want to set it as a parameter, you can build the command as a string, then parse and execute it:

My Distribution = Function({dt,col,doInvisible},{Default Local},
    // The keyword carries its own trailing comma, so no empty argument is left when it is ""
    If (doInvisible,
        keyword = "Invisible,"
    ,
        keyword = ""
    );
    Eval(Parse(Eval Insert("\[
        dt << Distribution( ^keyword^ Column( "^col^" ) )
    ]\")));
);

dt = Open("$SAMPLE_DATA/Big Class.jmp");
My Distribution(dt,"height",1);

-Dave
cschwarz

Thanks for the suggestions. I also came up with:

// The runsilent flag below is useful for debugging. Set it to 0 to make everything visible, or 1 to run silently.

runsilent = 0;


eval(parse("data = Open( \!"SPPI_ Data_JMP_Revised.jmp\!"" || if(runsilent,",\!"invisible\!""," ") || ");"));


Note that I had to include the comma in the ",invisible" string so that the expression still works when runsilent=0, without leaving an "empty" argument in the call.


Carl.

Byron_JMP (Staff, joined Apr 26, 2012)

Slightly different, and no parsing:


runsilent = 1;
invis = Expr( invisible );  //this isn't a string or a list, and it isn't evaluated
dt = Expr( Open( "$SAMPLE_DATA/Big Class.jmp" ));
If( runsilent == 1, Insert Into( dt, Name Expr( invis ) ));  //use Name Expr to return the expression's symbol without evaluating it.

cschwarz

Neat... but I don't really understand how 'Insert Into' knows where to insert the "invis" expression into the "dt" expression. I guess that Expr "parses" the expression into some sort of tree, and then 'Insert Into' simply adds the "invis" argument as another leaf in the tree?

I don't have a good enough understanding of how arguments are passed to functions in JMP. I come from an R background, where arguments are either positional or argument=value, and I find JMP's use of things like Invisible in the Open() function a bit weird (because it isn't a string). Any good references for this way of calling functions?

I haven't tried this yet, but I assume you need to evaluate the expression 'dt' to open the actual data table.
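A sketch touching both questions, using the Big Class sample table: `Expr()` stores the `Open()` call as a tree with a head and an ordered argument list, `Insert Into()` appends to the end of that list unless a position is given as a third argument, and `Eval()` is what finally runs the call. The variable name `openCall` is illustrative.

```jsl
runsilent = 1;

// Expr() returns the call unevaluated: head Open(), one argument so far.
openCall = Expr( Open( "$SAMPLE_DATA/Big Class.jmp" ) );
Show( Head( Name Expr( openCall ) ), N Arg( Name Expr( openCall ) ) );

// With no position argument, Insert Into() appends Invisible as the last argument.
If( runsilent == 1,
    Insert Into( openCall, Expr( Invisible ) )
);
Show( Name Expr( openCall ) );

// Nothing has been opened yet; evaluating the stored call opens the table.
dt = Eval( openCall );
```

Inspecting the expression with `Head()`, `N Arg()`, and `Show( Name Expr( ... ) )` before evaluating it is a convenient way to confirm where an argument landed.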