Discussions

Elise_B · Jul 1, 2021 07:37 AM

Bonjour,

J'ai une table de donnée de très grande taille (environ 6 million de lignes). Elle a notamment comme colonnes une colonne Date, une colonne Référence Produit et une colonne Résultat (qui a deux valeurs possibles : OK soit NOK). Chaque ligne correspond à un test de composant.

Je voudrais pouvoir laisser l'utilisateur sélectionner une période ou une date. A partir de cette période je voudrais pouvoir trouver et extraire les 5 références produit qui ont le plus de Résultats == NOK en pourcentage par rapport au nombre de tests effectués sur la période sélectionnée.

Je ne sais pas comment scripter cela de manière optimale.

Merci d'avance si vous avez des idées à me proposer

Elise B.

brady_brady · Jul 2, 2021 1:31 PM

Hi, There are many ways to do this. One particularly nice way involves JMP Query Builder with a post-query script, which will prompt the user for the begin and end dates before proceeding with the analysis. I highly recommend learning about the JMP Query Builder for work like this. It is a bit too much to type here, unfortunately.

I'll give a simple alternative that still works well.

First, I always recommend storing 1/0 data numerically, as 1s and 0s, using value labels to show things like "OK" and "NOK". In your case, we could use 1 => "OK" and 0 -> "NOK". (See attached data table.)

I recommend this because summarizing a mean by groups immediately provides the proportion of "OK" in each group. If the 1s and 0s had been reversed, the mean gives the proportion of "NOK".

So, with that change made, we do one more thing: we add table variables. Right-click on the red-triangle near the table name, and select New Table Variable. This shows a dialog where you can name a new table variable and give it an initial value.

Doing this for "Begin" and "End" allows us to define two values (strings, in this case, but numbers work, too) that can be used just like columns in formulas and scripts. To change a value later, just double-click on it to edit.

Using the same red triangle menu, we can add a new script to the data table. I've added a script called "Worst 5" and pasted the text of the script below.

Names Default To Here(1);

dt = Current Data Table();

//change table variable strings to times. the format used for the string must be one that JMP recognizes.
beg = informat(dt:Begin);
end = informat(dt:End);

//subset the table to get only those rows in the time interval of interest
dtSub = dt << subset(selected Columns (0), rows(dt << get rows where (beg < :date < end)));

//summarize this new table to find averages of the OK column, grouping by product. This will be the proportion of "good"
dtSubSum = dtSub << summary(group(:product), mean(:OK));

//sort this new table, ascending
dtSubSum << sort(replace table, by(:"Mean(OK)"n), order(ascending));

//the first 5 rows now have the lowest proportion of "good", hence highest proportion of "bad"
dtSubSum << subset(selected Columns(0), rows(1::5));

//close the first subset table, and the summary table linked to it.
close(dtSub, nosave);

Now it is easy to get the report you want:

1) Change the Begin and End values as desired.

2) Press the green button on the "Worst 5" script to get the results.

Again, there are slicker ways to do this, but they require knowledge of either JMP Query Builder, or interactive window (Display Tree) scripting. I highly recommend learning both of these, but in the meantime there is much that can be done with table variables and basic table operations.

Another option, involving no scripting:

1) Use the Tabulate platform to compute the means, also using a local data filter to limit the date range considered.

2) Right-click on the finished Tabulate report table, select Make into Data Table, then sort the resulting data table so the 5 "worst" rows (i.e., the 5 lowest OK averages) are at the top of the table.

3) If desired, select these 5 rows and subset.

Hopefully one of these ideas will get you started.

Cheers,

Brady

View solution in original post

brady_brady · Jul 2, 2021 1:31 PM

Hi, There are many ways to do this. One particularly nice way involves JMP Query Builder with a post-query script, which will prompt the user for the begin and end dates before proceeding with the analysis. I highly recommend learning about the JMP Query Builder for work like this. It is a bit too much to type here, unfortunately.

I'll give a simple alternative that still works well.

First, I always recommend storing 1/0 data numerically, as 1s and 0s, using value labels to show things like "OK" and "NOK". In your case, we could use 1 => "OK" and 0 -> "NOK". (See attached data table.)

I recommend this because summarizing a mean by groups immediately provides the proportion of "OK" in each group. If the 1s and 0s had been reversed, the mean gives the proportion of "NOK".

So, with that change made, we do one more thing: we add table variables. Right-click on the red-triangle near the table name, and select New Table Variable. This shows a dialog where you can name a new table variable and give it an initial value.

Doing this for "Begin" and "End" allows us to define two values (strings, in this case, but numbers work, too) that can be used just like columns in formulas and scripts. To change a value later, just double-click on it to edit.

Using the same red triangle menu, we can add a new script to the data table. I've added a script called "Worst 5" and pasted the text of the script below.

Names Default To Here(1);

dt = Current Data Table();

//change table variable strings to times. the format used for the string must be one that JMP recognizes.
beg = informat(dt:Begin);
end = informat(dt:End);

//subset the table to get only those rows in the time interval of interest
dtSub = dt << subset(selected Columns (0), rows(dt << get rows where (beg < :date < end)));

//summarize this new table to find averages of the OK column, grouping by product. This will be the proportion of "good"
dtSubSum = dtSub << summary(group(:product), mean(:OK));

//sort this new table, ascending
dtSubSum << sort(replace table, by(:"Mean(OK)"n), order(ascending));

//the first 5 rows now have the lowest proportion of "good", hence highest proportion of "bad"
dtSubSum << subset(selected Columns(0), rows(1::5));

//close the first subset table, and the summary table linked to it.
close(dtSub, nosave);

Now it is easy to get the report you want:

1) Change the Begin and End values as desired.

2) Press the green button on the "Worst 5" script to get the results.

Again, there are slicker ways to do this, but they require knowledge of either JMP Query Builder, or interactive window (Display Tree) scripting. I highly recommend learning both of these, but in the meantime there is much that can be done with table variables and basic table operations.

Another option, involving no scripting:

1) Use the Tabulate platform to compute the means, also using a local data filter to limit the date range considered.

2) Right-click on the finished Tabulate report table, select Make into Data Table, then sort the resulting data table so the 5 "worst" rows (i.e., the 5 lowest OK averages) are at the top of the table.

3) If desired, select these 5 rows and subset.

Hopefully one of these ideas will get you started.

Cheers,

Brady

Elise_B · Jul 5, 2021 03:20 AM

Merci beaucoup pour votre réponse ! Cela va beaucoup m'aider pour commencer !!

Elise B.

Discussions

Classer les valeurs uniques d'une colonne selon un calcul de pourcentage

Re: Classer les valeurs uniques d'une colonne selon un calcul de pourcentage

Re: Classer les valeurs uniques d'une colonne selon un calcul de pourcentage

Re: Classer les valeurs uniques d'une colonne selon un calcul de pourcentage

Recommended Articles