Subscribe Bookmark RSS Feed

For Loop does not evaluate formula correctly.

datadisk

Community Trekker

Joined:

Nov 7, 2013

I am trying to create new columns in a data table, each new column is the Z-score of an existing column.  Here is what I have

Colnames_1st = dt_zscore3 << Get Column Names();

nlist = N Items(Colnames_1st);

for (i=5, i <=10, i++,

    var1 = Column(Colnames_1st);

    mean = Col Mean(var1);

    stdev = Col Std Dev(var1);

    namevar = char(Colnames_1st);

   

   

    New Column("Zscore " || namevar,

    Numeric, Continuous,

    Format("Best", 12),

    Formula((((var1[]) - mean )/stdev )));

);

The code creates the 6 new columns and names them correctly.

But when it goes to fill in the values in all the columns, it does not do it correctly.

All the columns end up with the same value in each row.  It looks to me like it

is creating the columns, but then waits til the for loop is over to calculate the

values that go in each row in all the columns, so it uses the last value of all

the variables for all the columns, instead of calculating the formula for each

column using the values that should be used by that column.

I am really stuck.  I have tried all sorts of things but can't get over this hurdle.

Any help would be much appreciated.

2 REPLIES
datadisk

Community Trekker

Joined:

Nov 7, 2013

I think I finally found a solution.

I have a feeling that my problem lies somehow in defining variables as local vs global, but no matter what I tried I could not figure out how to fix it that way.

Instead I added the line

    dt_zscore3 << Run Formulas

  

and now it does it correctly.

Colnames_1st = dt_zscore3 << Get Column Names();

nlist = N Items(Colnames_1st);

for (i=5, i <=nlist, i++,

    Names Default To Here(1);

    var1 = Column(Colnames_1st);

  

    //var1 = Column(Colnames_1st);

    mean = Col Mean(var1);

    stdev = Col Std Dev(var1);

    namevar = char(Colnames_1st);

  

    New Column("Zscore " || namevar,

    Numeric, Continuous,

    Format("Best", 12),

    Formula((((var1[]) - mean )/stdev )));

    dt_zscore3 << Run Formulas

  

);

This is possibly not elegant or the most correct way to do it, but I post it in case anyone finds it useful.

ms

Super User

Joined:

Jun 23, 2011

I think there may be a problem with your solution. It may work when the columns are created, but not in another session or if data is changed or added in the source columns.

If you look at the formulas in the new columns I suspect the will all look like "(var1[] - mean) / stdev". These variable names only make sense in the current script context (or even within a specific for-loop iteration. If the variables are redefined or cleared the formulas will not work properly.

Formulas should be entered literally because the variables in the Formula() argument are apparently not evaluated but taken literally. If variables have to be used, e.g. in for loop, this can be achieved by manipulating strings or expressions.

Example using evalexpr() to evaluate expresions within an expressions before evaluating it:

dt = Open( "$SAMPLE_DATA/Big Class.jmp" );

n = N Col( dt );

For( i = 4, i <= n, i++,

  var1 = Column( i );

  Eval(

  Eval Expr(

  dt << New Column( "Zscore " || (var1 << get name),

  Numeric,

  Continuous,

  Formula( Col Standardize( Expr( var1 ) ) )

  )

  )

  );

);