Looking for a sample size calculator for defect proportions that can help me optimize the sample size with a minimum proportion to detect and the CIs for computed proportions.
Created: Feb 25, 2021 04:31 PM | Last Modified: Jun 10, 2023 1:43 PM (3324 views)
Use case:
A machine detects 35,000 defects on one wafer. An operator samples 1500 images and sorts them into 10 categories. The percentages of the defect categories in the sample are meant to reflect the defects in the population of 35,000. In this case 1500 images were used, but what happens when 5000 images are used?
What does that do to the confidence intervals on the proportions? As the sample size gets smaller, some of the low proportions will not be detectable. How does one determine the lowest detectable proportion for a given sample size?
Any ideas on whether there are calculators or tools that can help with this?
Re: Looking for a sample size calculator for defect proportions that can help me optimize the sample size with a minimum proportion to detect and the CIs for computed proportions.
You might be able to replicate this experiment within JMP. If so, and if you have JMP Pro, you could run a Monte Carlo simulation to evaluate different sample sizes and the rate of detection for each. This would be an alternative to purely statistical methods, which I'm sure exist.
Consider a 35,000 row table (one row per defect) with:
A column of random data representing the actual category for each defect, weighted so that some categories are more common than others, to represent your actual data. This part requires a lot of attention and thought: the column needs to represent the distribution(s) your categories might take. It might need to randomly choose from a variety of distributions.
A sample size column, where the first row indicates how many rows to sample
A column which 'selects' the prescribed number of rows
A column with the 'detected' categories for each of the sampled rows
A column to check whether the most common actual category matches the most common detected category (only use the first row of the column)
Use the Distribution platform and include both the first column (actual defect category) and the last column (whether the correct defect category was identified). Under the frequencies for the correct category column, you should see a single value. Now for the JMP Pro magic: right-click on a value in the table and select Simulate. Swap out the Defect Actual Category column for itself. When it runs, a new table will show up where each row represents one recalculation of your original table, each time giving you a fresh set of defects to be analyzed. Record the fraction of 'yes' values, and then repeat with different sample sizes. Then you can chart the fraction of correct diagnoses against the sample size.
Remember, all of this depends on that first bullet, representing your actual results using a simulation.
Here is an example of what your simulation table might look like with a sample size of 20, assuming that one defect category is twice the size of all others:
Notice how clean the actual defect categories are here! I suspect this will not be so clear in your actual data, so you would need to use a different function in that column.
Here is the simulation results table, here using only 100 rows (I recommend using 1000s):
You could record this in a table, and then repeat with other values to find your acceptable level of risk:
If you have the computing power, you could skip a step and make the first 35k rows of your table use one sample size, the next 35k rows use another, and so on. Then you could simulate all values in a single run to populate the table in the last screenshot.
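For readers without JMP Pro, the same idea can be sketched in Python. This is only an illustration of the simulation described above, not the JMP workflow itself: the function name `simulate_detection_rate`, the weights (one category twice as common as the other nine, as in the example table), and the run counts are all assumptions chosen for the sketch.

```python
import random
from collections import Counter


def simulate_detection_rate(weights, population_size, sample_size, n_runs, seed=0):
    """Fraction of runs in which the sample's most common category
    matches the population's most common category.

    `weights` are assumed relative category frequencies (hypothetical values).
    """
    rng = random.Random(seed)
    categories = list(range(len(weights)))
    hits = 0
    for _ in range(n_runs):
        # Fresh simulated wafer: one category label per defect.
        population = rng.choices(categories, weights=weights, k=population_size)
        actual_top = Counter(population).most_common(1)[0][0]
        # The operator reviews a random subset, drawn without replacement.
        sample = rng.sample(population, sample_size)
        sampled_top = Counter(sample).most_common(1)[0][0]
        hits += actual_top == sampled_top
    return hits / n_runs


if __name__ == "__main__":
    # Assumed scenario: 10 categories, the first twice as common as the rest.
    weights = [2] + [1] * 9
    for n in (20, 100, 1500):
        rate = simulate_detection_rate(weights, 35_000, n, n_runs=100)
        print(f"sample size {n:>5}: correct top category in {rate:.0%} of runs")
```

As in the JMP version, everything hinges on how well the assumed weights represent the real category mix. The success criterion (matching the most common category) can be swapped for another, such as whether the rarest category appears in the sample at all.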
Re: Looking for a sample size calculator for defect proportions that can help me optimize the sample size with a minimum proportion to detect and the CIs for computed proportions.
To boil it down a bit, consider the defect category with the smallest proportion; every other category is easier to estimate. If all ten categories are equal, this proportion would be 0.1 (one tenth). If we assume that the proportion of the smallest category is, e.g., 0.01, you could use the Sample Size and Power calculator (under DOE --> Design Diagnostics).
With 1500 samples and a power of 0.9 (a reasonable value), you could distinguish a proportion larger than 2 % from 1 %. With 5000 samples, this threshold drops to 1.5 %. Smaller proportions need larger sample sizes.
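The figures above can be approximated outside JMP. The following is a minimal sketch that assumes a two-sided one-sample test of H0: p = p0 at alpha = 0.05 using the normal approximation to the binomial; the JMP calculator may use a different method, so small differences are expected. The function name `min_detectable_proportion` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def min_detectable_proportion(n, p0=0.01, alpha=0.05, power=0.9):
    """Smallest p1 > p0 that a two-sided one-sample test of H0: p = p0
    detects with the requested power (normal approximation, an assumption)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)

    def shortfall(p1):
        # Non-negative once p1 is far enough from p0 to reach the power target.
        return (
            (p1 - p0)
            - z_alpha * sqrt(p0 * (1 - p0) / n)
            - z_beta * sqrt(p1 * (1 - p1) / n)
        )

    lo, hi = p0, 1.0
    for _ in range(60):  # bisection on the power condition
        mid = (lo + hi) / 2
        if shortfall(mid) < 0:
            lo = mid
        else:
            hi = mid
    return hi


if __name__ == "__main__":
    for n in (1500, 5000):
        print(n, round(min_detectable_proportion(n), 4))
```

Under these assumptions the result comes out close to 2 % for 1500 samples and close to 1.5 % for 5000 samples, in line with the values quoted above.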
Re: Looking for a sample size calculator for defect proportions that can help me optimize the sample size with a minimum proportion to detect and the CIs for computed proportions.
As you may be aware, there is more than one way to calculate confidence intervals for proportions.
I believe the JMP Sample Size And Power calculators use a Normal approximation to the binomial. There exist exact methods and several other approximate methods as well. I think JMP uses Wilson score methods in their other platforms.
Attached, please find a reference from Agresti and Coull entitled "Approximate is Better than 'Exact' for Interval Estimation of Binomial Proportions" from The American Statistician in 1998. It makes the case that the guaranteed coverage of the intervals from the Exact method is overly conservative, especially as you approach 0 or 1, and that some approximate methods perform better than both the Exact method and other approximate methods.
Personally, I use the Wilson score method. But the Sample Size And Power calculator uses a different method, so it will be close but no cigar.
So, to answer your question, it's necessary for you to define which form of confidence intervals you desire. The simulation idea from @ih is a great one.
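To make the difference between interval methods concrete, here is a small Python sketch comparing the normal-approximation (Wald) interval with the Wilson score interval. The counts used (15 of 1500 and 50 of 5000, i.e. an observed 1 % category) are illustrative assumptions, not data from the original post.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(x, n, conf=0.95):
    """Normal-approximation (Wald) interval for a binomial proportion."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    p = x / n
    half = z * sqrt(p * (1 - p) / n)
    return p - half, p + half


def wilson_interval(x, n, conf=0.95):
    """Wilson score interval for a binomial proportion."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    p = x / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


if __name__ == "__main__":
    # Hypothetical counts: a category observed at 1 % in each sample size.
    for x, n in ((15, 1500), (50, 5000)):
        print(n, "Wald:", wald_interval(x, n), "Wilson:", wilson_interval(x, n))
```

The two methods diverge most for small counts. With zero observed defects in a category, for example, the Wald interval collapses to a single point at 0, while the Wilson interval still gives a non-trivial upper bound.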
Re: Looking for a sample size calculator for defect proportions that can help me optimize the sample size with a minimum proportion to detect and the CIs for computed proportions.
...and I wrote about the Sample Size and Power calculators based on knowledge from previous versions of JMP. I now note that JMP 15.2.1 allows one to choose from two Exact methods.