
Non-linear regression with parameters expanded by categories

binoyjosep

Community Trekker

Joined:

May 23, 2016

Hi

    I have a data set which I need to fit using a non-linear model, with four fit parameters, A, B, C and D. The data set (Y, column) needs to be separated using 'Categories' which is a column. So, Y for each unique 'Category' is a unique set of data.

I need to fit all the data and find the parameters A, B, C, D, with the condition that I need a unique C and D for all data sets across all 'Categories' while A, B may vary across 'Categories'.

For this, I create a column for the model and create parameters C and D. For A and B, I create parameters with 'Expand into categories, selecting column' checked, select 'Categories' as the column, and run a non-linear regression fit. I get the optimum C and D across all Categories and a different A and B for each Category.

My question is this: how does it find a UNIQUE C and D across all Categories? Does it fit the data set for each Category separately, find C1, D1, C2, D2, etc. for each set, and later take the median, the mean, or something else?

Or does it find the range of C and D values, create a fine grid of points within that range, fit all the data sets again, and go finer and finer until the error is within limits?

I need to know the method, as I want to be sure that I am doing what I need to do.

Thanks

Binoy

M_Anderson

Staff

Joined:

Nov 21, 2014

It sounds like you want to use a grouping variable in the non-linear platform. Put your X and Y variables into the appropriate fields of the platform and then put the "categories" column into the "group" field. When you fit models after that, it will calculate the result for each group. So, if you had 4 categories, you would get 4 models with 4 sets of coefficients.

[Missing image]

binoyjosep

Community Trekker

Joined:

May 23, 2016

Hi,

  Thanks for the reply.

To be clearer: I know a method to do what I want. I want to know how it is being done.

In the example shown below B and beta correspond to C and D asked in the question, while P is a parameter like A or B that can vary across categories.

Below is a screenshot of the fitting that's being done, with B and beta as parameters that remain CONSTANT for all categories while P is a parameter that can vary across categories.

The question is: how does it get the best B and beta over all categories? Does it initially find a different B and beta for each category and then compute some statistic to get the B and beta that are good enough for all categories? If so, what is the statistic used?

Thanks

Binoy

12092_pastedImage_0.png

billw_jmp

Staff

Joined:

Jul 2, 2014

Binoy,

A good place to get a sense for what your model fit is and the statistics behind the fit(s) is in the Help section under Books > Specialized Models.  Nonlinear fits start on page 105.  Michael Anderson's suggestion for grouping your fits will also show how the individual categories are fit and you will also see a graph of the combined fits.  An example from the help is shown below.  You can use your own parameter set as well and this is shown in Chapter 7.  The statistics section starts on page 152 of chapter 7.

12093_pastedImage_0.png

binoyjosep

Community Trekker

Joined:

May 23, 2016

Hi,

  Thanks for the reply. I did go through the manual. Still could not get what I was looking for.

I am now trying to script this non-linear fit and am facing some issues, so I need some help. I'll reiterate the problem here:

I have data which needs to be fit by a model which has 4 fit parameters. The fits for each category are NOT INDEPENDENT of each other. 2 of the fit parameters, B and beta as shown in above post need to be unique, i.e., the regression fit extracts a unique B and beta for all categories. While the other two parameters P and R0 can vary across categories.

The JMP interface looks something like shown below:

12317_pastedImage_0.png

Now, I need to code this up in JSL.

I know we can do the below to get all four unique parameters:

dt << New Column("Model", Numeric, Continuous, Formula( Parameter( {B = 5, beta = 0.25, P = 10, R0 = 10}, R0 / (1 + B * (:Name( "tRelax(s)" ) / :Name( "tStress(s)" )) ^ beta) + P )));

However, this is not what I want, and I don't want to use the "grouping" feature, as it would only help if the extractions were independent.

The code generated while I copy the formula is shown below:

Parameter(

  {B = 2, beta = 0,

  Name( "R0_Category_D03_01_-1.8_100" ) = 2,

  Name( "R0_Category_D03_01_-1.8_1000" ) = 3,

  Name( "R0_Category_D03_01_-1.8_10000" ) = 3,

  Name( "R0_Category_D03_01_-1.8_158" ) = 3,

  Name( "R0_Category_D03_01_-1.8_1580" ) = 3,

  Name( "P_Category_D03_01_-1.8_100" ) = 0,

  Name( "P_Category_D03_01_-1.8_1000" ) = 1,

  Name( "P_Category_D03_01_-1.8_10000" ) = 1,

  Name( "P_Category_D03_01_-1.8_158" ) = 0,

  Name( "P_Category_D03_01_-1.8_251" ) = 0},

  Exp( -B * (:Name( "tR" ) / :Name( "tS" )) ^ beta ) *

  Match( :Category,

  "D03_01_-1.8_100", Name( "R0_Category_D03_01_-1.8_100" ),

  "D03_01_-1.8_1000", Name( "R0_Category_D03_01_-1.8_1000" ),

  "D03_01_-1.8_10000", Name( "R0_Category_D03_01_-1.8_10000" ),

  "D03_01_-1.8_158", Name( "R0_Category_D03_01_-1.8_158" ),

  "D10_05_-2.6_6310", Name( "R0_Category_D10_05_-2.6_6310" )

  ) + Match( :Category,

  "D03_01_-1.8_100", Name( "P_Category_D03_01_-1.8_100" ),

  "D03_01_-1.8_1000", Name( "P_Category_D03_01_-1.8_1000" ),

  "D03_01_-1.8_10000", Name( "P_Category_D03_01_-1.8_10000" ),

  "D03_01_-1.8_158", Name( "P_Category_D03_01_-1.8_158" ),

  "D10_05_-2.6_6310", Name( "P_Category_D10_05_-2.6_6310" )

  )

)

I need to automate this and so cannot use this as is.

I can create arrays for P_XXX and R0_XXX from the column 'Category'.

I was thinking along the lines of creating associative arrays for each of these P and R0 parameters and using them in the formula, but that did not work out.

Of course, after I create this formula, I need to run the non-linear regression fit on the column and data.

Can someone please help me code this?

Thanks

Binoy

Solution

I used three principles to solve this problem: I reverse engineered the model formula, I divided the problem into smaller problems, and I built the solution from the inside out. It looks easy now, but there were some messy false starts before I got it right. The script assumes that three data columns are present with the given names; you would have to add a custom dialog if the column names change between runs. It does accommodate any number of categories, though I don't know if Nonlinear will be able to solve the model when you have a large number of categories. Also, you might want to prepare a list of starting values and adapt my example to use a unique one for each parameter.

I don't have your data table, but my mock-up of what I think you have (Category, tRelax, and tStress columns) worked.

/* assume that these columns exist or make custom dialog to assign them to script variables:
       Category
       tRelax
       tStress
*/

// find the unique categories (Get Values would return one value per row, with repeats)
Summarize( category# = By( :Category ) );

// build formula from inside out with expressions
match expr p# = match expr r0# = Expr( Match( :Category ) );
parameter list# = List( B=2, beta=0 );

// iterate over all of the categories
For( c = 1, c <= N Items( category# ), c++,
       // make category-specific R0 parameter (starting value 2)
       Eval(
              Parse(
                     Substitute(
                            "Insert Into( parameter list#, Expr( Name( \!"R0_ccc\!" ) = 2 ) )",
                            "ccc",
                            category#[c]
                     )
              )
       );
       // add the Match clause for R0: the category value, then its parameter
       Insert Into( match expr r0#, category#[c] );
       Eval(
              Parse(
                     Substitute(
                            "Insert Into( match expr r0#, Expr( Name( \!"R0_ccc\!" ) ) )",
                            "ccc",
                            category#[c]
                     )
              )
       );
       // make category-specific P parameter (starting value 2)
       Eval(
              Parse(
                     Substitute(
                            "Insert Into( parameter list#, Expr( Name( \!"P_ccc\!" ) = 2 ) )",
                            "ccc",
                            category#[c]
                     )
              )
       );
       // add the Match clause for P: the category value, then its parameter
       Insert Into( match expr p#, category#[c] );
       Eval(
              Parse(
                     Substitute(
                            "Insert Into( match expr p#, Expr( Name( \!"P_ccc\!" ) ) )",
                            "ccc",
                            category#[c]
                     )
              )
       );
);

// substitute into general model formula from the question:
// R0 / (1 + B * (tRelax / tStress) ^ beta) + P, with P outside the denominator
formula expr# = Substitute(
       Expr( Parameter( lll, rrr / (1 + B*(:tRelax / :tStress)^beta) + ppp ) ),
       Expr( lll ),
       parameter list#,
       Expr( rrr ),
       Name Expr( match expr r0# ),
       Expr( ppp ),
       Name Expr( match expr p# )
);

// set Model column formula
Eval(
       Substitute(
              Expr( Current Data Table() << New Column( "Model", Formula( fff ) ) ),
              Expr( fff ),
              Name Expr( formula expr# )
       )
);
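
Once the Model column exists, the remaining step mentioned in the question is to run the fit itself from JSL. The following is only a minimal sketch: the response column name Y is hypothetical (the thread never names it), and the Newton and Finish options follow the pattern of the saved scripts in the Nonlinear Examples sample tables, so check them against the Scripting Index for your JMP version.

```jsl
// Sketch only: :Y is a hypothetical name for the response column.
dt = Current Data Table();

// Launch Nonlinear with the generated formula column as the predictor.
// Finish asks the platform to iterate until it converges or hits its limits.
nl = dt << Nonlinear(
       Y( :Y ),
       X( :Model ),
       Newton,
       Finish
);

// Retrieve the fitted values (assumed message; verify in the Scripting Index).
estimates = nl << Get Estimates;
Show( estimates );
```

Because B and beta sit in the same Parameter list as every R0 and P, this single launch estimates all of them together from all rows.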

Learn it once, use it forever!
binoyjosep

Community Trekker

Joined:

May 23, 2016

Excellent!!! Worked like a charm. That's just about the answer for my earlier post (slightly different from the original post.)

Thank you for your time and script.

Now for the bonus question (actually my original question):

In this non-linear regression fit we have 2 parameters B and beta which remain the same for all categories, while P and R0 vary across them. Now we are not specifying through any equation how B and beta vary across categories, so how does JMP find the best B and beta that works for all categories? What is the algorithm used?

Thanks

Binoy

binoyjosep

Community Trekker

Joined:

May 23, 2016

Hi,

  Is there any update on this question that I have asked?

Thanks

Binoy

markbailey

Staff

Joined:

Jun 23, 2011

I want to add to my previous reply after Jeff Perkinson pointed out an example that is similar to what you want to do. It is not clear to us if you really need a script to make the model formula that you need. Select Help > Sample Data Library, open the Nonlinear Examples folder, and finally open the Algae Mitscherlich data table. Open the Formula Editor with the Mitscherlich column. It should look like this:

12350_Capture.JPG

In this case, like your case, the 2 parameters alpha and beta are conditional on the Treatment.

I open the mock up of your data that I created and select Cols > New Column... I will call it Model and add a Formula column property. I am going to create the parameters before I use them to make the model formula. I click Table Columns in the upper left and select Parameters. I click New Parameter... and enter B for the name and give it a starting value of 2, then click OK. I create beta = 0 in the same way. The B and beta parameters are unconditional and their value is common to all categories.

The other two parameters are different. I click New Parameter, enter R0 for the name, 2 for the starting value, and select Expand into categories, selecting column. I click OK and then select the Category column. Now I have a unique parameter for R0 for each category. I do the same thing for the P parameter.

I build the formula now switching between the list of parameters and the list of table columns to make this formula:

12352_Capture.JPG

The editor automatically inserts the Match(...) expression when I click on the grouped parameter in the parameter list. So now there is one B, one beta, and one R0 and P for each category.
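
Written out, a formula of this shape defines one model over all rows rather than one model per category. The following is a sketch, assuming the model form given in the question and the default least-squares loss; the notation c(i) for the category of row i, K for the number of categories, and N for the number of rows is introduced here only for illustration.

```latex
\hat{y}_i = \frac{R0_{c(i)}}{1 + B \left( t_{\mathrm{Relax},i} / t_{\mathrm{Stress},i} \right)^{\beta}} + P_{c(i)}

\mathrm{SSE}(B, \beta, R0_1, \dots, R0_K, P_1, \dots, P_K)
  = \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2
```

B and beta appear in the prediction for every row, while each R0 and P appears only in the rows of its own category. Under these assumptions, all 2 + 2K parameters are adjusted together to reduce this one sum of squares, so the common B and beta are not obtained by fitting each category separately and then averaging. The iterative method that Nonlinear uses for the minimization is described in the Nonlinear chapter of the documentation.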

Using this approach with the first example Algae leads to this formula:

12351_Capture.JPG

This model produces exactly the same estimates with Nonlinear as the original parameterization.

Please understand that we might have misinterpreted your question. Scripts are powerful but not always necessary.

Learn it once, use it forever!
binoyjosep

Community Trekker

Joined:

May 23, 2016

Hi,

I do know this method of doing it manually, and I understand it's pretty easy. However, I need to automate it in a script, as it's part of a bigger system.

Thanks for your time and effort.

Binoy