cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Browse apps to extend the software in the new JMP Marketplace
Choose Language Hide Translation Bar
bernie426
Level II

Update Column at Specified Rows

Hello,

I try to use the "For each row" and "Begin Data Update" script commands to update the calculation on specific row related to a column.

According to the table below, I intend to have two addition columns, Defective Rate (%) and Yield, and they both use the same equation of cell qty/daily total.

However, Defective Rate (%) column will only calculate the rows where Category = Reject & MRB, and Yield column will only calculate the rows where Category=Binned.

I can get this work out using IF command like

dt=Current data table();

dt<< New Column("Defective Rate(%)", Numeric, "Continuous", Formula(Name Expr(If(:Category == "Reject", "MRB" ,:Cell Qty/:Daily Total))));

However, I try to learn how to use the For each row combined with Begin Data Update to reach the same result.

dt<< New Column("Yield", Numeric, "Continuous");

       << Begin data update;

       For each Row( :Yield_ = Selected(Row State())=Contains(:Category,"Binned"),  :Cell Qty/:Daily Total)

Can anyone help out?

Thanks,

Category

WorkWeekDate_PROD.Cell QtyDaily TotalDefective Rate(%)Yield
Reject252016-06-2075115553
MRB252016-06-2021155530.00135022
Binned252016-06-2014781155530.950363270.95036327
Reject252016-06-21125131586
MRB252016-06-2183315860.00262775
Binned252016-06-2130252315860.95776610.9577661
Reject252016-06-22127627964
MRB252016-06-2242279640.00150193
Binned252016-06-2226646279640.952867970.95286797
Reject252016-06-231261786
MRB252016-06-23517860.00279955
Binned252016-06-23165517860.926651740.92665174
1 ACCEPTED SOLUTION

Accepted Solutions
ValarieSimmons
Level III

Re: Update Column  at Specified Rows

Hi bernie426,

According to my personal experience, IF command complete its operations really quickly and will return you with instant results because they only deal with a yes or no situation. On the other hand, you will see some noticable delay before getting your results fully generated through looping since looping commands have to deal with iterations and they will keep operating until they reach the last iteration defined by the programmer.

These statements only apply if you are dealing with super huge data set that involves significantly large amount of rows and columns, which makes IF command a more superior option than looping commands in terms of time consumption. However for smaller sets of data, it is safe to use either one of the commands since the time delay for results to be populated is insignificant.

Best Regards,

Val

View solution in original post

3 REPLIES 3
ValarieSimmons
Level III

Re: Update Column at Specified Rows

Hi,

Based on the expected output in your question, I think there is a slight mistake in your formula for "Defective Rate(%)" column if you only wanted "Rejected" rows from "Category" column to be ignored.


I never tried using For each row or Begin data update before, so I am not sure how to fix your current codes using those commands. However I have 2 alternative approaches on how you can achieve the same output.

1) Using IF command (faster approach):

dt=Current data table();

a = (dt<< New Column("DR", "Continuous", Formula(If(:Category == "Reject", "", Char(:Cell Qty / :Daily Total)))))<<get values;

dt << New Column("Defective Rate(%)", "Continuous", set values(a));

b = (dt<< New Column("Y", "Continuous", Formula(If(:Category == "Binned", Char(:Cell Qty / :Daily Total), ""))))<<get values;

dt << New Column("Yield", "Continuous", set values(b));

dt<<delete columns({"DR","Y"});

Since some columns are meant to be returned as empty columns (""), the created columns will automatically set themselves to character columns even though you have already defined them as numeric when you create the columns. Hence I've created extra dummy columns to serve the calculation purposes and set these values into new columns ("Defective Rate(%)"  and " Yield") that you intend to create. In this way, you will be getting numeric columns.

2) Using Get rows where() and FOR loops:

dt=Current data table();

dt << New Column("Defective Rate(%)", "Continuous", numeric);

r1 = dt << get rows where(:Category!="Reject");

for (i=1, i<=nrows(r1), i++,

  Column(dt, "Defective Rate(%)")[r1] = :Cell Qty[r1] / :Daily Total[r1]

);

dt << New Column("Yield", "Continuous", numeric);

r2 = dt << get rows where(:Category=="Binned");

for (i=1, i<=nrows(r2), i++,

  Column(dt, "Yield")[r2] = :Cell Qty[r2] / :Daily Total[r2]

);

I believe this approach will be slower if your data set is huge since looping takes longer processing time.

Final Output:

11997_pastedImage_21.png

Hope this helps!

Best Regards,

Val

bernie426
Level II

Re: Update Column  at Specified Rows

Hi Val,

Thanks for the suggested solutions, they both work out well.

Actually, I have came up with another formula approach using IF command like

dt=Current data table();

dt<< New Column("Defective Rate(%)", Numeric, "Continuous", Formula(Name Expr(If((contains(:Category, "MRB")| contains(:Category, "Reject")),:Cell Qty/:Daily Total))));

dt<< New Column("Yield", Numeric, "Continuous", Formula(Name Expr(If(:Category == "Binned",:Cell Qty/:Daily Total))));

Do you know what is the speed difference b/w using IF command and looping command?

Many Thanks,

ValarieSimmons
Level III

Re: Update Column  at Specified Rows

Hi bernie426,

According to my personal experience, IF command complete its operations really quickly and will return you with instant results because they only deal with a yes or no situation. On the other hand, you will see some noticable delay before getting your results fully generated through looping since looping commands have to deal with iterations and they will keep operating until they reach the last iteration defined by the programmer.

These statements only apply if you are dealing with super huge data set that involves significantly large amount of rows and columns, which makes IF command a more superior option than looping commands in terms of time consumption. However for smaller sets of data, it is safe to use either one of the commands since the time delay for results to be populated is insignificant.

Best Regards,

Val