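I'm new to JMP and JSL, and I'm currently trying to screen a single unit within a very large dataset. My goal is an output table that lists the name of every test for which the unit (identified by its ECID) has a z-score greater than 3 (or another value I choose). The z-scores need to be based on the distribution of passing units for each test. Is there a JMP feature for this, or a script that already does something along these lines? My very loose pseudocode is below.
Pseudocode:
Require a dataset large enough to give a normal distribution for each test (at least 30 good units)
Subset the base data table to bin 1 units only
Calculate the mean and standard deviation of each test using only passing units
Calculate each z-score relative to that test's mean and standard deviation
Create an output table containing all tests with scores above 3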
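The pseudocode above can be sketched for a single test column as follows. This is only a minimal illustration under stated assumptions: the demo table, the column names "Bin" and "Test A", and the variables minPass and zLimit are placeholders that do not come from the post, and the minimum of 30 passing units is lowered to 5 so the tiny demo table runs.

```jsl
Names Default To Here( 1 );

// Hypothetical demo table; the column names and values are placeholders.
dt = New Table( "Z Score Demo",
    New Column( "Bin", Numeric, Nominal, Set Values( [1, 1, 1, 1, 1, 2] ) ),
    New Column( "Test A", Numeric, Continuous, Set Values( [10, 11, 9, 10, 10, 25] ) )
);
minPass = 5;   // the post suggests at least 30 passing units for real data
zLimit = 3;    // user-selectable threshold

// Row numbers of the passing (Bin 1) units
passRows = dt << Get Rows Where( :Bin == 1 );

If( N Rows( passRows ) >= minPass,
    vals = Column( dt, "Test A" ) << Get Values;   // all units, as a column vector
    passVals = vals[passRows];                     // passing units only
    mu = Mean( passVals );
    sigma = Std Dev( passVals );
    z = (vals - mu) / sigma;                       // z-score of every unit
    flagged = Loc( Abs( z ) > zLimit );            // row numbers beyond the limit
    Show( mu, sigma, flagged )
,
    Write( "Not enough passing units to estimate the distribution\!N" )
);
```

Working on the column values as a matrix keeps the mean and standard deviation tied to the passing rows only, while the z-score is still computed for every unit, passing or not.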
Here is a sample script that creates Z scores and identifies values greater than 3. Is this the direction you want to go?
Names Default To Here( 1 );
// Open the sample data table: Process Measurements.jmp
dt = Open( "$SAMPLE_DATA/Process Measurements.jmp" );
// Create a Bin column (roughly 80% of rows get Bin 1, the rest Bin 2 to 7)
dt << New Column( "Bin",
    modeling type( nominal ),
    set each value( If( Random Uniform( 0, 1 ) <= .8, 1, Random Integer( 2, 7 ) ) )
);
// Get column names for all continuous columns
colNames = dt << get column names( string, continuous );
For( i = 1, i <= N Items( colNames ), i++,
    mean = .;
    stddev = .;
    // Create the Z scores for each test, using Bin 1 rows only for the mean and std dev
    dt << New Column( colNames[i] || " Z Score",
        formula(
            If( Row() == 1,
                Mean = Col Mean( If( :Bin == 1, As Column( dt, colNames[i] ), . ) );
                Stddev = Col Std Dev( If( :Bin == 1, As Column( dt, colNames[i] ), . ) );
            );
            // Use the current test column rather than a hard-coded :Process 1
            Abs( As Column( dt, colNames[i] ) - Mean ) / Stddev;
        )
    );
    // Make sure the formula has been evaluated before converting it to static values
    dt << Run Formulas;
    Column( dt, N Cols( dt ) ) << delete formula;
    // Color the cells whose z-score is 3 or more
    targetRows = dt << get rows where( As Column( dt, N Cols( dt ) ) >= 3 );
    Try( As Column( dt, N Cols( dt ) ) << color cells( "red", As List( targetRows ) ) );
);
Hi Jim,
I wasn't clear in my previous post. I would like to find a single unit's z-score for every test, based on the entire dataset's distribution. Because of this, I can't just subset the ECIDs of note. Secondly, I'm not sure the original script is working correctly. Upon further review, I've noticed that many of the z-scores are implausible; some are nearing triple digits.
I've modified the Probe sample dataset and attached it as my input data table. From there, I would like to create a script that outputs a table listing the tests (column names) in the original dataset for which the parametric data of an individual unit (denoted by its ECID) has a z-score above 3 (or some other user-controlled value). I've attached a sample output table below. The attached output table is not based on the parametric data; it is only an example of what I would like. In this example, I am screening for all tests in which the first unit in the Probe dataset (ECID: Z1J4H_24_2_1) has parametric values more than 3 sigma from the respective test's mean. Please let me know if I can clarify anything!
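The output described above, a table of tests for one unit whose z-score exceeds a limit, could be sketched roughly as follows. This is a hedged illustration, not the script from the thread: the demo table, the output column names ("Test", "Value", "Z Score"), and the variables targetECID, zLimit and testNames are all placeholders, and the statistics are taken from Bin 1 rows as in the original pseudocode.

```jsl
Names Default To Here( 1 );

// Hypothetical demo input; in practice this could be
// dt = Data Table( "Modified_Probe_Input" );
dt = New Table( "Demo Input",
    New Column( "ECID", Character, Nominal,
        Set Values( {"U1", "U2", "U3", "U4", "U5", "U6"} ) ),
    New Column( "Bin", Numeric, Nominal, Set Values( [1, 1, 1, 1, 1, 2] ) ),
    New Column( "Test A", Numeric, Continuous, Set Values( [10, 11, 9, 10, 10, 25] ) ),
    New Column( "Test B", Numeric, Continuous, Set Values( [5, 5.1, 4.9, 5, 5, 5] ) )
);
targetECID = "U6";                  // unit to screen (placeholder)
zLimit = 3;                         // user-controlled threshold
testNames = {"Test A", "Test B"};   // assumed list of test columns

// Look up the rows before creating the output table
passRows = dt << Get Rows Where( :Bin == 1 );
unitRows = dt << Get Rows Where( :ECID == targetECID );

// Output table: one row per test where the unit is beyond the limit
dtOut = New Table( "Tests Beyond Z Limit",
    New Column( "ECID", Character, Nominal ),
    New Column( "Test", Character, Nominal ),
    New Column( "Value", Numeric, Continuous ),
    New Column( "Z Score", Numeric, Continuous )
);

For( i = 1, i <= N Items( testNames ), i++,
    vals = Column( dt, testNames[i] ) << Get Values;
    mu = Mean( vals[passRows] );        // passing units only
    sigma = Std Dev( vals[passRows] );
    // Skip tests whose standard deviation is missing or zero
    If( !Is Missing( sigma ) & sigma > 0,
        For( k = 1, k <= N Rows( unitRows ), k++,
            r = unitRows[k];
            z = (vals[r] - mu) / sigma;
            If( Abs( z ) > zLimit,
                dtOut << Add Rows( 1 );
                n = N Rows( dtOut );
                Column( dtOut, "ECID" )[n] = targetECID;
                Column( dtOut, "Test" )[n] = testNames[i];
                Column( dtOut, "Value" )[n] = vals[r];
                Column( dtOut, "Z Score" )[n] = z;
            );
        );
    );
);
```

With the demo values, only Test A should be listed for unit U6, since its Test B value equals the passing mean. Adding rows one at a time is slow for very wide tables, but it keeps the sketch easy to follow.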
Accepted Solution
Here is an example of one way to handle your issue. I have set it up so that you select the ECID rows you want to examine in your input data table and then run the script. Check it out and see whether it points you in a direction where you can make the finishing changes you need.
Names Default To Here( 1 );
// Input table: the modified Probe table attached to the question
dt = Data Table( "Modified_Probe_Input" );
// Only run if at least one ECID row has been selected
If( N Rows( dt << get selected rows ) > 0,
    // Create a new data table to place the results in
    dtOutput = dt << subset( selected rows( 1 ), selected columns( 0 ) );
    // Get the test column names: everything after the first 9 (non-test) columns
    colNames = dt << get column names( string );
    Remove From( colNames, 1, 9 );
    // Mean and standard deviation of every test, grouped by Bin
    dtSum = dt << Summary(
        invisible,
        Group( :Bin ),
        Mean( colNames ),
        Std Dev( colNames ),
        Freq( "None" ),
        Weight( "None" ),
        link to original data table( 0 )
    );
    // Keep only the Bin 1 (passing) statistics
    dtSum << select where( :Bin != 1 );
    Try( dtSum << delete rows );
    For( i = 1, i <= N Items( colNames ), i++,
        mu = Column( dtSum, "Mean(" || colNames[i] || ")" )[1];
        sigma = Column( dtSum, "Std Dev(" || colNames[i] || ")" )[1];
        // Hold a column reference so it stays valid after the rename below
        col = Column( dtOutput, colNames[i] );
        // Replace each raw value with its absolute z-score
        For( k = 1, k <= N Rows( dtOutput ), k++,
            col[k] = Abs( col[k] - mu ) / sigma
        );
        // Color the cells whose z-score is 3 or more
        targetRows = dtOutput << get rows where( As Column( dtOutput, colNames[i] ) >= 3 );
        Try( col << color cells( "red", As List( targetRows ) ) );
        // Rename last, so the lookups by original name above still resolve
        col << set name( colNames[i] || " Z Score" );
    );
    // Close the summary table only if it was created
    Close( dtSum, nosave );
);
Hi Jim,
I've looked at this script and it seems to work on the sample dataset I sent you. For some reason, when I modify just a few lines to fit my use case, I get the following error: "Scoped data table access requires a data table column or variable{1}". I've changed the following lines to fit my needs: 6, 17, 21, and 28. Other than that, this is the same script that you provided above. Any ideas?
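The thread does not show which line raises this error. One possible cause, and this is only an assumption, is that a name in the column list no longer resolves to a numeric column in the table being accessed, for example if the real table has a different number of leading non-test columns than the 9 removed in the script. A small diagnostic along these lines may help narrow it down; the demo table and the name list are placeholders.

```jsl
Names Default To Here( 1 );

// Hypothetical demo table and name list; substitute the real table and the
// list that the script builds with get column names and Remove From.
dt = New Table( "Demo Input",
    New Column( "ECID", Character, Nominal, Set Values( {"U1", "U2"} ) ),
    New Column( "Test A", Numeric, Continuous, Set Values( [10, 11] ) )
);
colNames = {"ECID", "Test A", "Test C"};

// Report every name that is missing from the table or is not numeric
allNames = dt << Get Column Names( String );
For( i = 1, i <= N Items( colNames ), i++,
    If(
        Contains( allNames, colNames[i] ) == 0,
            Write( "Missing column: " || colNames[i] || "\!N" ),
        (Column( dt, colNames[i] ) << Get Data Type) != "Numeric",
            Write( "Not numeric: " || colNames[i] || "\!N" ),
        Write( "OK: " || colNames[i] || "\!N" )
    )
);
```

Running the same check against the summary table, using the "Mean(...)" and "Std Dev(...)" names the script builds, would show whether those lookups are the ones failing.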