Solved: Re: How to find every test to screen a unit

Jun 10, 2023 4:42 PM

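I'm new to JMP and JSL and am currently working on screening a unit within a very large dataset. My goal is an output table containing the name of every test for which the unit (identified by its ECID) has a z-score greater than 3 (or another value that I can choose). These z-scores need to be based on the distribution of the passing units for each test. Is there a JMP feature for this, or a script that already does something along these lines? Below is my very loose pseudocode for the script.

Pseudocode:

Require a dataset large enough to give a normal distribution for each test (at least 30 good units)

Subset the base data table to bin 1 units only

Calculate the mean and standard deviation of each test using only the passing units

Calculate the z-scores relative to each test's mean and standard deviation

Create an output table containing all tests with a z-score above 3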
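For a single test, the z-score step in the pseudocode above might look like the following minimal JSL sketch. The numbers and the names passValues, unitValue, and zLimit are made up for illustration; they are not from any real data table.

```jsl
Names Default To Here( 1 );

// Hypothetical readings of one test for the passing (bin 1) units
passValues = [10.1, 9.8, 10.3, 9.9, 10.0, 10.2];

// Hypothetical reading of the same test for the unit being screened
unitValue = 11.5;

// User-controlled z-score limit
zLimit = 3;

// Mean and standard deviation come from the passing units only
passMean = Mean( passValues );
passStdDev = Std Dev( passValues );

// Absolute z-score of the screened unit against the passing distribution
z = Abs( unitValue - passMean ) / passStdDev;

Show( passMean, passStdDev, z, z > zLimit );
```

Repeating this for every test column and keeping the names of the tests where z exceeds zLimit gives the output table described above.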


txnelson · Jan 19, 2022 11:43 PM

Here is a sample script that creates Z scores and identifies the values greater than 3. Is this the direction you want to go?

Names Default To Here( 1 );

// Open the sample data table: Process Measurements.jmp
dt = Open( "$SAMPLE_DATA/Process Measurements.jmp" );

// Create a Bin column
dt << New Column( "Bin",
 modeling type( nominal ),
 set each value( If( Random Uniform( 0, 1 ) <= .8, 1, Random Integer( 2, 7 ) ) )
);

// Get column names for all continuous columns
colNames = dt << get column names( string, continuous );
For( i = 1, i <= N Items( colNames ), i++,
 mean = .;
 stddev = .;
// Create the Z scores for each test 
 dt << New Column( colNames[i] || " Z Score",
  formula(
   If( Row() == 1,
    Mean = Col Mean( If( :Bin == 1, As Column( dt, colNames[i] ), . ) );
    Stddev = Col Std Dev( If( :Bin == 1, As Column( dt, colNames[i] ), . ) );
   );
   // Use the current test column, not a hard-coded :Process 1
   Abs( As Column( dt, colNames[i] ) - Mean ) / Stddev;
  )
 );
 Column( dt, N Cols( dt ) ) << delete formula;

 targetRows = dt << get rows where( As Column( dt, N Cols( dt ) ) >= 3 );
 Try( As Column( dt, N Cols( dt ) ) << color cells( "red", As List( targetRows ) ) );
);

MichaelO · Jan 21, 2022 05:33 PM

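This is a very good starting point. Thank you for your help. I would like to adapt this script so that I can run it only on units with specific ECIDs in a large dataset. If I run this script on every data point/die, I will unfortunately crash JMP. How would you suggest adapting the script to do this?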

txnelson · Jan 21, 2022 10:03 PM

I would simply create a subset of the original data table and analyze that.

dt = current data table();
ECIDsToTest = { "413257", "56739",...........};

dt << select where( contains( ECIDsToTest, :ECID ) );

dtToTest = dt << subset( selected columns( 0 ), selected rows( 1 ) );

// Run the Z Score analysis on the new subset

MichaelO · Jan 25, 2022 11:30 AM

Hi Jim,

I wasn't clear about my previous issue. I would like to find a single unit's z-score for every test based on the entire dataset's distribution. Because of this, I can't just subset the ECIDs of note. Secondly, I'm not sure the original script is working correctly. Upon further review, I've noticed that many of the z-scores are implausible. Some of them are nearing triple digits.

txnelson · Jan 25, 2022 03:38 PM

Please provide a sample input data table, along with an example of what you expect the output to be.

MichaelO · Jan 25, 2022 04:58 PM

I've modified the Probe sample dataset and attached it as my input data table. From there I would like to create a script that outputs a table listing the tests (column names) in the original dataset for which the parametric data of an individual unit (denoted by its ECID) has a Z-score above 3 (or some other user-controlled value). I've attached a sample output table below. It is not based on the parametric data; it is only an example of what I would like. In this example, I am screening for all tests in which the first unit in the Probe dataset (ECID: Z1J4H_24_2_1) has parametric values more than 3 sigma from the respective test's mean. Please let me know if I can clarify anything!

txnelson · Jan 25, 2022 06:44 PM

Here is an example of one way of handling your issue. I have set it up so that you select the ECID rows you want to examine in your input data table and then run the script. Check it out and see if it points you in a direction where you can make the finishing changes that you need.

Names Default To Here( 1 );

// Input data table: the modified Probe sample table,
// which must already be open in JMP
//dt = Open( "$SAMPLE_DATA/Process Measurements.jmp" );
dt = Data Table( "Modified_Probe_Input" );

// Only run if at least one ECID row has been selected
If( N Rows( dt << get selected rows() ) > 0, 

// Create a new data table to place the results in
	dtOutput = dt << subset( selected rows( 1 ), selected columns( 0 ) );
	selRows = dt << get selected rows;
	
// Get the test column names (assumes the first 9 columns are not tests)
	colNames = dt << get column names( string );
	Remove From( colNames, 1, 9 );	
	
	dtSum = dt << Summary(
		invisible,
		Group( :Bin ),
		Mean( colNames ),
		Std Dev( colNames ),
		Freq( "None" ),
		Weight( "None" ),
		link to original data table(0)
	);
	dtSum << select where( :Bin != 1 );
	Try( dtSum << delete rows );
		
	For( i = 1, i <= N Items( colNames ), i++,
		testMean = Column( dtSum, "Mean(" || colNames[i] || ")" )[1];
		testStdDev = Column( dtSum, "Std Dev(" || colNames[i] || ")" )[1];
		// Keep a column reference so it stays valid after the rename below
		col = Column( dtOutput, colNames[i] );
		targetRows = {};
		For( k = 1, k <= N Rows( dtOutput ), k++,
			z = Abs( col[k] - testMean ) / testStdDev;
			col[k] = z;
			If( z >= 3, Insert Into( targetRows, k ) );
		);
		col << set name( colNames[i] || " Z Score" );
		// Color the cells whose Z score is 3 or more
		If( N Items( targetRows ) > 0, col << color cells( "red", targetRows ) );
	);
	// Close the summary table inside the If, so this only runs when the table exists
	close( dtSum, nosave );
);

Jim
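The script above marks the out-of-limit cells by color. For the requested output table that lists only the flagged tests, one possible extension is sketched below. It is a sketch under stated assumptions, not a tested solution for the real data: the demo table, the column names Test1 and Test2, and the variables targetECID and zLimit are all hypothetical.

```jsl
Names Default To Here( 1 );

// Hypothetical demo data; in practice dt would be the probe data table
dt = New Table( "Demo Probe",
	New Column( "ECID", Character, Values( {"A", "B", "C", "D", "E", "F"} ) ),
	New Column( "Bin", Numeric, Nominal, Values( [1, 1, 1, 1, 1, 2] ) ),
	New Column( "Test1", Numeric, Continuous, Values( [1.0, 1.1, 0.9, 1.0, 1.05, 5.0] ) ),
	New Column( "Test2", Numeric, Continuous, Values( [2.0, 2.1, 1.9, 2.0, 2.05, 2.0] ) )
);

targetECID = "F";                 // hypothetical unit to screen
zLimit = 3;                       // user-controlled z-score limit
testNames = {"Test1", "Test2"};   // hypothetical test columns

// Rows of the passing units and of the unit being screened
passRows = dt << Get Rows Where( :Bin == 1 );
unitRows = dt << Get Rows Where( :ECID == targetECID );

flaggedTests = {};
flaggedZ = {};
For( i = 1, i <= N Items( testNames ), i++,
	col = Column( dt, testNames[i] );
	// Statistics from the passing units only
	passVals = col[passRows];
	testMean = Mean( passVals );
	testStdDev = Std Dev( passVals );
	// Z-score of the screened unit (first matching row)
	z = Abs( col[unitRows[1]] - testMean ) / testStdDev;
	If( z > zLimit,
		Insert Into( flaggedTests, testNames[i] );
		Insert Into( flaggedZ, z );
	);
);

// Output table: one row per test that exceeds the limit
dtOut = New Table( "Tests beyond limit for " || targetECID,
	New Column( "Test", Character, Values( flaggedTests ) ),
	New Column( "Z Score", Numeric, Continuous, Values( flaggedZ ) )
);
```

Because the statistics are taken from all Bin 1 rows of the full table while only one unit's values are scored, the z-scores stay based on the entire passing distribution without building a Z Score column for every row.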

MichaelO · Jan 28, 2022 04:56 PM

Hi Jim,

I've looked at this script and it seems to work on the sample dataset I sent you. For whatever reason, when I modify just a few lines to fit my use case, I get the following error: "Scoped data table access requires a data table column or variable{1}". I've changed the following lines to fit my needs: 6, 17, 21, and 28. Other than that, this is the same script that you've provided above. Any ideas?

txnelson · Feb 2, 2022 04:35 PM

Can you supply a sample data table that is resulting in an error?

Jim
