Solved: Re: How to subset a data table based on column value

KKlear · Jun 10, 2023 4:28 PM

Hello! I have a large JMP file which contains the stability testing results for many (>75 product lots). I want to create a script to subset the rows of testing data based on the column value (i.e. the lot number) while only including certain test result columns from the file. I am fairly new to JMP and JMP scripting, but I have managed to put together the following script:

Data Table( "Stability Trending" ) <<
Subset(
	Output Table( "Data Subset - Lot 12345" ),
	Copy formula( 0 ),
	Selected Rows( 0 ),
	Rows(
		[1314, 1315, 1316, 1317, 1318, 1319, 1320, 1321, 1322, 1323, 1324, 1325,
		1326, 1327, 1328, 1329]
	),
	columns(
		:Protocol No.,
		:Lot No.,
		:Final Fill Configuration,
		:Name( "Storage Temperature (°C)" ),
		:Time Point,
		:Time Interval,
		:Sterility Result,
		:Name( "Test 1 Result" ),
		:Name( "Test 2 Result" ),
		:pH Result,
		:Name( "Test 3 Result" ),
		:Name( "Test 4 Result" )
	)
)

Right now, the script is selecting the rows based on the row numbers where the data for a particular lot is found. This is not optimal as rows are sometimes inserted within the table as stability studies are modified. How would I modify the script above to subset rows of "Lot 12345" based on the "Lot No." column rather than the row number? I am using JMP 15 (in case that is important).

KKlear · Mar 31, 2021 06:41 PM

Thanks for the suggestions everyone. Here is what I ended up doing:

dt = Data Table( "Stability Trending" );

LN={"Lot 12345"};

dt<<select where(contains(LN, Lot No.));

dt<<Subset(
	Output Table( "Data Subset - Lot 12345" ),
	Copy formula( 0 ),
	Selected Rows( 1 ),
	Columns(
		:Protocol No.,
		:Lot No.,
		:Final Fill Configuration,
		:Name( "Storage Temperature (°C)" ),
		:Time Point,
		:Time Interval,
		:Sterility Result,
		:Name( "Test 1 Result)" ),
		:Name( "Test 2 Result" ),
		:pH Result,
		:Name( "Test 3 Result" ),
		:Name( "Test 4 Result)" )
	)
)

The resulting data table is giving me the output I was looking for - only the rows containing the lot number of interest and the test results from the columns I specified.

View solution in original post

Georg · Mar 31, 2021 04:51 PM

Perhaps that example will work for you, BR

Names Default To Here( 1 );
dt = Open( "$SAMPLE_DATA/Semiconductor Capability.jmp" );

dt_lst = dt << Subset(
	By( :lot_id ),
	All rows,
	columns( :wafer, :Wafer ID in lot ID, :SITE, :NPN1, :PNP1, :PNP2, :NPN2, :PNP3 )
);

Wait( 1 );

For( i = 1, i <= N Items( dt_lst ), i++,
	Close( dt_lst[i], nosave );
	wait(0.2);
);

Georg

cweisbart · Mar 31, 2021 3:36 PM

I would recommend using Select Where function (found under Rows->Rows Selection). Something similar to this should work:

Data Table( "Stability Trending" ) << Row Selection(
	Select where( :Lot No. == "Lot 12345" ),
	Dialog( Edit( Equal( Source Column( :Lot No. ) ) ), Keep dialog open( 0 ) )
);
Subset( (Selected Rows), Output Table Name( "Subset-Lot 12345" ) );

Clovis Weisbart

cweisbart · Mar 31, 2021 3:27 PM

Made a typo (forgot to replace one of my column names with the column name "Lot No." in your example):

Data Table( "Stability Trending" ) << Row Selection(
	Select where( :Lot No. == "Lot 12345" ),
	Dialog( Edit( Equal( Source Column( :Lot No. ) ) ), Keep dialog open( 0 ) )
);
Subset( (Selected Rows), Output Table Name( "Subset-Lot 12345" ) );

Clovis Weisbart

KKlear · Mar 31, 2021 06:41 PM

Thanks for the suggestions everyone. Here is what I ended up doing:

dt = Data Table( "Stability Trending" );

LN={"Lot 12345"};

dt<<select where(contains(LN, Lot No.));

dt<<Subset(
	Output Table( "Data Subset - Lot 12345" ),
	Copy formula( 0 ),
	Selected Rows( 1 ),
	Columns(
		:Protocol No.,
		:Lot No.,
		:Final Fill Configuration,
		:Name( "Storage Temperature (°C)" ),
		:Time Point,
		:Time Interval,
		:Sterility Result,
		:Name( "Test 1 Result)" ),
		:Name( "Test 2 Result" ),
		:pH Result,
		:Name( "Test 3 Result" ),
		:Name( "Test 4 Result)" )
	)
)

The resulting data table is giving me the output I was looking for - only the rows containing the lot number of interest and the test results from the columns I specified.