Solved: Re: Rookie question on data table generation (extraction) from raw probe data

HeroTom · Apr 3, 2024 02:23 PM

Hi everyone, this is my first post in this community. I recently changed jobs and will be doing a lot of prober bench testing and data analysis. Our group uses JMP. Whenever I needed to figure something out, Google always directed me here, where the warmhearted experts give professional answers. That's why I am here :).

Right now, my probing tests typically generate a big CSV file in which the measurement data are buried among information I don't need (test setup, conditions, etc.). I would like some help pulling the useful information out with a JMP script and, from there, to start learning more advanced scripting.

I explain my task in the attached screenshot. The top left box is an example of a single I-V sweep data point (to measure R), to give you an idea of what my gigantic CSV looks like (it contains hundreds of such structures). "Multi-row uninterested info" represents the useless rows that I don't want to import. Besides copying the I/V/R data portion, I would also like to extract the wafer lot ID, wafer ID, reticle (or stepper field) index, and subsite number from some of the strings. The top right box shows my desired JMP output.

As optional reading, there is some extra detail in the bottom two boxes. The bottom left has my general comments on the data structure, and the bottom right explains the meaning of the strings from which I need to extract information.

I hope the screenshot provides enough information for my question. I used to do this kind of work with Excel VBA macros: search for signature strings, selectively import certain characters/strings/rows near each location, and then continue searching until the end of the raw data file. Can a JMP script do the same?

txnelson · Apr 4, 2024 04:28 PM

I overlooked the subsite, sorry. I have added it to the script.
The purpose of the Discussion forum is to help JMP users with issues they are having with running JMP. It is not a free script writing service. The reason I write scripts for Community members is to help teach them how to use JSL. It is my hope that the recipients of the scripts I provide take the time to learn from them, so they will be able to solve their whole problem themselves.
I have changed the dynamic columns (the data portion of the table) to be numeric.
The script now appends each created table to a single final table.
Concerning compressing the code.....why? If the compression made it run faster, I could go along with that. But if compressing a piece of code makes it harder to read for the next human who needs to look at it, then the code is not good code. So, to answer your question of whether the code could be compressed into a more concise format: I am sure it can.
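
For readers new to JSL, the appending step described above can be tried in isolation. What follows is only a minimal sketch: the table names and values are made up for illustration and are not taken from the thread's data. It also creates the per-section table with Invisible, which is one way to avoid a window popping up for every section.

```jsl
Names Default To Here( 1 );

// Hypothetical stand-ins for the accumulated table and one parsed section
dtFinal = New Table( "Final demo",
	New Column( "R", Numeric, Continuous, Set Values( [100, 101] ) )
);
dtOut = New Table( "Section demo", Invisible,
	New Column( "R", Numeric, Continuous, Set Values( [99] ) )
);

// Append the rows of dtOut to dtFinal in place instead of creating a third table
dtFinal << Concatenate( dtOut, Append to first table );

// The section table is no longer needed once its rows are copied
Close( dtOut, NoSave );

Show( N Rows( dtFinal ) ); // 3 rows: two original plus one appended
```

Appending in place keeps a single result table open, which matters when the raw file holds hundreds of sections.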

names default to here(1);

clear symbols( dtOut );

file = pick file();

dt = Open(
	File,
	Import Settings(
		End Of Line( CRLF, CR, LF ),
		End Of Field( Comma, CSV( 0 ) ),
		Treat Leading Zeros as Character( 1 ),
		Strip Quotes( 1 ),
		Use Apostrophe as Quotation Mark( 0 ),
		Use Regional Settings( 0 ),
		Scan Whole File( 1 ),
		Treat empty columns as numeric( 0 ),
		CompressNumericColumns( 0 ),
		CompressCharacterColumns( 0 ),
		CompressAllowListCheck( 0 ),
		Labels( 0 ),
		Column Names Start( 1 ),
		First Named Column( 1 ),
		Data Starts( 1 ),
		Lines To Read( "All" ),
		Year Rule( "20xx" )
	)
);

// Change column names to simplify coding
for( i=1,i<=ncols(dt), i++, column(i)<< set name( "C" || char(i)));

// Create a base output table
dtFinal = New Table( "Final" );

// Loop through the data 
for(theRow = 1, theRow <= NRows(dt), theRow++,
	If( dt:C1[theRow] == "SetupTitle",
		memoryProjectName = "";
		memoryLotName = "";
		memoryWaferID = "";
		memoryT0 = "";
		memory1strun = "";
		memoryIndexX = .;
		memoryIndexY = .;
		memorySubsite = .;
		setupFlag = 1;
		projFlag = 0;
		
		// If this is not the first read in table, then append the table
		// to the Final table, and close the current dtOut table
		If( nrows(dtFinal) != 0 | isEmpty( dtOut ) == 0,
			dtFinal << concatenate( dtOut, append to first table(1));
			close( dtOut, nosave )
			)
		,
		dt:C1[theRow] == "" & setupFlag == 1,
			projFlag = 1;
			setupFlag = 0;
		,
		projFlag == 1 & dt:C1[theRow] != "",
			dt:C1[theRow] = trim(dt:C1[theRow]);
			memoryProjectName = word(1, dt:C1[theRow], "_");
			memoryLotName = word(2, dt:C1[theRow], "_W");
			memoryWaferID = word(3, dt:C1[theRow], "_W");
			memoryT0 = word(4, dt:C1[theRow], "_W");
			memory1strun = word(5, dt:C1[theRow], "_W:");
			memoryIndexX = num(word(2, dt:C1[theRow], ": "));
			memoryIndexY = num(word(3, dt:C1[theRow], ": "));
			memorySubsite = num(word(2, trim(dt:C1[theRow + 2 ]),"<>"));
			projFlag = 0;
		,
		dt:C1[theRow] == "DataName",
			colList = {};
			col=2;
			while( column(dt,col)[theRow] != "",
				insert into(colList, column(dt,col)[theRow]);
				col++
			);
			dtOut = New Table( "Lot " || memoryLotName || " WaferID " || memoryWaferID,
				new column( "Project Name", character),
				new column( "Lot Name", character),
				new column( "WaferID", character),
				new column( "Time Readout", character),
				new column( "Personal Note", character),
				new column( "SubSite", numeric, ordinal),
				new column( "Index X", numeric, ordinal),
				new column( "Index Y", numeric, ordinal)
				);
			for each( {colName}, colList,
				dtOut << new column( eval(colName), numeric, continuous);
			);
		,
		dt:C1[theRow] == "DataValue",
			dtOut << Add Rows(1);
			dtOut:Project Name[nrows(dtOut)] = memoryProjectName;
			dtOut:Lot Name[nrows(dtOut)] = memoryLotName;
			dtOut:WaferID[nrows(dtOut)] = memoryWaferID;
			dtOut:Time Readout[nrows(dtOut)] = memoryT0;
			dtOut:Personal Note[nrows(dtOut)] = memory1strun;
			dtOut:SubSite[nrows(dtOut)] = memorysubsite;
			dtOut:Index X[nrows(dtOut)] = memoryIndexX;
			dtOut:Index Y[nrows(dtOut)] = memoryIndexY;
			for each( {colName, index}, colList,
				column( dtOut, colName )[nrows(dtOut)] = num(column(dt,index+1)[theRow]) 
				)
	)
);
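// The last section is never followed by a "SetupTitle" row, so append
// its table here before closing the last work file
If( !Is Empty( dtOut ),
	dtFinal << concatenate( dtOut, append to first table(1));
	close( dtOut, nosave )
);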

// Uncomment the below line if you want the raw table deleted
// close( dt, nosave );

Jim


jthi · Apr 5, 2024 02:46 AM

Continuing with my earlier idea: separate the text file into sections and then parse each section (this recording partially covers the topic: Scripters Club 2024: Session 2 - Preparing Unstructured Incoming Data for Analysis).

Names Default To Here(1);
// Test sections seem to always start with a line starting with "SetupTitle"
// and end just before the next such line, UNLESS it is the last section,
// which ends with a line of ",,,,,,,,,,,,,,,,,,,,,,,"
TEST_SECTION_START = "SetupTitle, ";
EMPTY_LINE = ",,,,,,,,,,,,,,,,,,,,,,,";

// Section patterns for regex
PROJECT_PATTERN = "^(.+)_(\d+\.\d*)(W\d{1,2})_(T\d+)_(.+): (\d+ \d+)\,+?";
SUBSITE_PATTERN = "<Subsite>(\d+)</Subsite>";
DATASTART_PATTERN = "^DataName";


find_test_section_lines = function({lines}, {Default Local},

	start_idx = 0;
	end_idx = 0;
	
	TEST_SECTION_START = TEST_SECTION_START; // JMP "feature", 00075244
	
	For Each({line, idx}, lines,
		If(Starts With(line, TEST_SECTION_START),
			If(start_idx == 0,
				start_idx = idx;
			,
				end_idx = idx;
				break();
			);
		);
	);
	
	If(start_idx != 0 & end_idx == 0,
		end_idx = N Items(lines);
	);
	
	return(Eval List({start_idx, end_idx}));
);

parse_test_section = function({lines}, {Default Local},
	data_start = 0;
	
	PROJECT_PATTERN = PROJECT_PATTERN; //JMP "feature", 00075244
	SUBSITE_PATTERN = SUBSITE_PATTERN; //JMP "feature", 00075244
	DATASTART_PATTERN = DATASTART_PATTERN; //JMP "feature", 00075244
	
	For Each({line, idx}, lines,
		If(!IsMissing(Regex(line, PROJECT_PATTERN)),
			matches = Regex Match(line, PROJECT_PATTERN);
			projname = matches[2];
			lotid = matches[3];
			waferid = matches[4];
			time = matches[5];
			notes = matches[6];
			reticles = Words(Trim Whitespace(matches[7]), " ");
		, !IsMissing(Regex(line, SUBSITE_PATTERN)),
			subsite = Regex(line, SUBSITE_PATTERN, "\1");
		, !IsMissing(Regex(line, DATASTART_PATTERN)),
			data_start = idx;
			break();
		);
	);

	data_str = Concat Items(lines[data_start::N Items(lines)], "\!N");
	dt = Open(Char To Blob(data_str), "text", invisible);
	
	For Each({colname}, Reverse(dt << Get Column Names("String")), // drop empty columns
		If(Col Number(Column(dt, colname)) == 0,
			dt << Delete Column(colname);
		, 
			break(); // break on first "ok" column
		);
	);
	
	
	// define order and names here
	dt << New Column("project", Character, Nominal, Set Each Value(projname));
	dt << New Column("lot", Character, Nominal, Set Each Value(lotid));
	dt << New Column("waferid", Character, Nominal, Set Each Value(waferid));
	dt << New Column("time", Character, Nominal, Set Each Value(time));
	dt << New Column("notes", Character, Nominal, Set Each Value(notes));
	dt << New Column("x", Character, Nominal, Set Each Value(reticles[1]));
	dt << New Column("y", Character, Nominal, Set Each Value(reticles[2]));
	dt << New Column("subsite", Character, Nominal, Set Each Value(subsite));

	dt << Move Selected Columns({:project, :lot, :waferid, :time, :notes, :x, :y, :subsite}, To First); 
	dt << Delete Columns("DataName");
	
	return(dt);
);


// Start parsing
filepath = "$DOWNLOADS/Raw data examples_3DP.csv";
txt = Load Text File(filepath);

lines = Words(txt, "\!N");
lines = Filter Each({line}, lines, line != EMPTY_LINE); // drop empty lines

{start, end} = find_test_section_lines(lines);
dt_result = Empty();
While(All(start, end),
	cur_testset = Remove From(lines, start, end - 1);
	dt = parse_test_section(cur_testset);
	
	If(Is Empty(dt_result),
		dt_result = dt;
	,
		dt_result << Concatenate(
			dt,
			"Append to first table"
		);
		Close(dt, no save);
	);
	{start, end} = find_test_section_lines(lines);
);

dt_result << Show Window(1);
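
The section-splitting idea in the script above can also be looked at on its own. The following is a small sketch on a made-up list of lines (the contents are hypothetical, only the "SetupTitle" marker comes from the thread). Unlike the script, it does not remove items from the list; it records where each section starts and slices by index, which is just one alternative way to express the same idea.

```jsl
Names Default To Here( 1 );

// Hypothetical miniature of a raw file, already split into lines
lines = {"SetupTitle, A", "junk", "DataValue,1", "SetupTitle, B", "DataValue,2"};

// Record the index of every line that opens a section
starts = {};
For Each( {line, idx}, lines,
	If( Starts With( line, "SetupTitle" ),
		Insert Into( starts, idx )
	)
);
Show( starts ); // {1, 4}

// A section runs from its start to the line before the next start,
// or to the end of the list for the last section
For( i = 1, i <= N Items( starts ), i++,
	stop_idx = If( i < N Items( starts ), starts[i + 1] - 1, N Items( lines ) );
	section = lines[starts[i] :: stop_idx];
	Show( i, N Items( section ) ); // section 1 has 3 lines, section 2 has 2
);
```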

-Jarmo


pmroz · Apr 3, 2024 02:27 PM

No screenshot. Would it be possible to attach a small, anonymized dataset or .csv file?

HeroTom · Apr 3, 2024 03:03 PM

Hi pmroz, thanks for the prompt reply. I did forget the attachment when I first submitted, but within a minute I edited the post and attached the screenshot. You can check again.

Also, per your request, I have uploaded the Excel file here. I deleted most of the useless info to keep the file clean, but you can copy and paste the top left box as the "raw data" example.

txnelson · Apr 3, 2024 05:38 PM

The format you are showing should not be a huge issue to read in. The uninteresting information rows concern me because what is there can interfere with reading in the data. My other concern is how to have JMP recognize the row that contains the project name. Does it always start with Proj?

Providing an actual file would be very helpful.

Jim

HeroTom · Apr 4, 2024 09:07 AM

Deleted. Content merged into a later post

jthi · Apr 4, 2024 01:02 AM

Like others have said, this can be highly dependent on your real data format. Below is an example using the data copied from Excel and pasted into a text file.

Names Default To Here(1);

filepath = "$DOWNLOADS/744027.txt";
txt = Load Text File(filepath);
lines = Words(txt, "\!N");

project_pattern = "^(.+)_(\d+\.\d*)(W\d{1,2})_(T\d+)_(.+):(.+)?";
subsite_pattern = "<Subsite>(\d+)</Subsite>";
datastart_pattern = "^DataName";

data_start = 0;
For Each({line, idx}, lines,
	If(!IsMissing(Regex(line, project_pattern)),
		matches = Regex Match(line, project_pattern);
		projname = matches[2];
		lotid = matches[3];
		waferid = matches[4];
		time = matches[5];
		notes = matches[6];
		reticles = Words(Trim Whitespace(matches[7]), " ");
	, !IsMissing(Regex(line, subsite_pattern)),
		subsite = Regex(line, subsite_pattern, "\1");
	, !IsMissing(Regex(line, datastart_pattern)),
		data_start = idx;
		break();
	);
);

data_str = Concat Items(lines[data_start::N Items(lines)], "\!N");
dt = Open(Char To Blob(data_str), "text");
dt << New Column("lot", Character, Nominal, Set Each Value(lotid));
dt << New Column("waferid", Character, Nominal, Set Each Value(waferid));
dt << New Column("x", Character, Nominal, Set Each Value(reticles[1]));
dt << New Column("y", Character, Nominal, Set Each Value(reticles[2]));
dt << New Column("subsite", Character, Nominal, Set Each Value(subsite));

dt << Move Selected Columns({:lot, :waferid, :x, :y, :subsite}, To First);
dt << Delete Columns("DataName");

It uses regex to find specific lines and then parses them. For data it finds the line index where it starts and then parses it by letting JMP just open it (depending on the real data, this might require additional step(s)).
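
To make the regex step concrete, here is a short sketch that applies the same two patterns to single strings instead of a file. The job ID and subsite values are the ones quoted elsewhere in this thread; real lines from the CSV may carry trailing commas or other text, so treat the outputs in the comments as what this toy input gives, not as a guarantee for the real file.

```jsl
Names Default To Here( 1 );

project_pattern = "^(.+)_(\d+\.\d*)(W\d{1,2})_(T\d+)_(.+):(.+)?";
subsite_pattern = "<Subsite>(\d+)</Subsite>";

line = "ProjTiger_753367.2W11_T168_2ndrun: 3 6";

// Regex Match returns a list: item 1 is the whole match,
// items 2 onward are the captured groups in order
matches = Regex Match( line, project_pattern );
Show( matches[2] ); // "ProjTiger"
Show( matches[3] ); // "753367.2"
Show( matches[4] ); // "W11"
Show( matches[5] ); // "T168"
Show( matches[6] ); // "2ndrun"

// The last group holds " 3 6"; trim it and split on spaces
reticles = Words( Trim Whitespace( matches[7] ), " " );
Show( reticles ); // {"3", "6"}

// Regex with a format string returns only the first captured group
subsite = Regex( "<Subsite>9</Subsite>", subsite_pattern, "\1" );
Show( subsite ); // "9"
```

Note that the captured values are character strings; wrap them in Num() if numeric columns are wanted.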

-Jarmo

HeroTom · Apr 4, 2024 09:18 AM

Hi txnelson and jthi, thank you so much for helping out!

To txnelson:

1. uninteresting information row interference.

//Agree the script is highly content-dependent. Will check when new format data is used.

2. JMP recognize the row that contains the project name

//very valid. The job ID is typed by myself. Will make sure format style unique and consistent.

3. Actual data file

//Attached "Raw data examples_3DP.csv". Note it contains 7 columns instead of simplified 3 columns (I/V/R) in the early files.

To jthi,

Wow, this is the first time I have seen "patterns" used! This thread is already worth it because of that! Your script makes good sense. It did run into an error when I challenged it by adding a 2nd DP to the attached txt ("744027_modified.txt"). I created the 2nd DP in the following way to make it different from the 1st one:

1. Duplicated the first DP block and retained those junk lines.

2. Changed the Job ID as follows, and the subsite from "8" to "9"

1st DP: ProjTiger_753367.1W1_T0_1strun: 3 5
2nd DP: ProjTiger_753367.2W11_T168_2ndrun: 3 6

3. Multiplied the "I" column by 10.

The result is pasted below. Note that in the future, this script will see different column patterns (say, more than the three in the early example) and varying data lengths (say, from using a different measurement step size).

Please feel free to use either the new raw data CSV or the modified txt, which is simpler. Either way, I appreciate the help!

txnelson · Apr 4, 2024 12:22 PM

Here is a script that reads in all 3 sets of project data found in your sample CSV file and creates 3 separate data tables.

names default to here(1);

file = pick file();

dt = Open(
	File,
	Import Settings(
		End Of Line( CRLF, CR, LF ),
		End Of Field( Comma, CSV( 0 ) ),
		Treat Leading Zeros as Character( 1 ),
		Strip Quotes( 1 ),
		Use Apostrophe as Quotation Mark( 0 ),
		Use Regional Settings( 0 ),
		Scan Whole File( 1 ),
		Treat empty columns as numeric( 0 ),
		CompressNumericColumns( 0 ),
		CompressCharacterColumns( 0 ),
		CompressAllowListCheck( 0 ),
		Labels( 0 ),
		Column Names Start( 1 ),
		First Named Column( 1 ),
		Data Starts( 1 ),
		Lines To Read( "All" ),
		Year Rule( "20xx" )
	)
);

// Change column names to simplify coding
for( i=1,i<=ncols(dt), i++, column(i)<< set name( "C" || char(i)));

// Loop through the data 
for(theRow = 1, theRow <= NRows(dt), theRow++,
	If( dt:C1[theRow] == "SetupTitle",
		memoryProjectName = "";
		memoryLotName = "";
		memoryWaferID = "";
		memoryT0 = "";
		memory1strun = "";
		memoryIndexX = .;
		memoryIndexY = .;
		setupFlag = 1;
		projFlag = 0;
		,
		dt:C1[theRow] == "" & setupFlag == 1,
			projFlag = 1;
			setupFlag = 0;
		,
		projFlag == 1 & dt:C1[theRow] != "",
			dt:C1[theRow] = trim(dt:C1[theRow]);
			memoryProjectName = word(1, dt:C1[theRow], "_");
			memoryLotName = word(2, dt:C1[theRow], "_W");
			memoryWaferID = word(3, dt:C1[theRow], "_W");
			memoryT0 = word(4, dt:C1[theRow], "_W");
			memory1strun = word(5, dt:C1[theRow], "_W:");
			memoryIndexX = num(word(2, dt:C1[theRow], ": "));
			memoryIndexY = num(word(3, dt:C1[theRow], ": "));
			projFlag = 0;
		,
		dt:C1[theRow] == "DataName",
			colList = {};
			col=2;
			while( column(dt,col)[theRow] != "",
				insert into(colList, column(dt,col)[theRow]);
				col++
			);
			dtOut = New Table( "Lot " || memoryLotName || " WaferID " || memoryWaferID,
				new column( "Project Name", character),
				new column( "Lot Name", character),
				new column( "WaferID", character),
				new column( "Time Readout", character),
				new column( "Personal Note", character),
				new column( "Index X", numeric, ordinal),
				new column( "Index Y", numeric, ordinal)
				);
			for each( {colName}, colList,
				dtOut << new column( eval(colName), numeric, continuous);
			);
		,
		dt:C1[theRow] == "DataValue",
			dtOut << Add Rows(1);
			dtOut:Project Name[nrows(dtOut)] = memoryProjectName;
			dtOut:Lot Name[nrows(dtOut)] = memoryLotName;
			dtOut:WaferID[nrows(dtOut)] = memoryWaferID;
			dtOut:Time Readout[nrows(dtOut)] = memoryT0;
			dtOut:Personal Note[nrows(dtOut)] = memory1strun;
			dtOut:Index X[nrows(dtOut)] = memoryIndexX;
			dtOut:Index Y[nrows(dtOut)] = memoryIndexY;
			for each( {colName, index}, colList,
				column( dtOut, colName )[nrows(dtOut)] = column(dt,index+1)[theRow] 
				)
	)
);
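
The Word() calls in the script above depend on one detail that is easy to miss: every character in the delimiter argument is treated as a separate delimiter, and runs of adjacent delimiters count as one. The sketch below applies the same calls to the sample job ID quoted in this thread. The subsite string is an assumed form of the row the script reads two rows below the job ID.

```jsl
Names Default To Here( 1 );

jobID = "ProjTiger_753367.1W1_T0_1strun: 3 5"; // sample job ID from this thread

Show( Word( 1, jobID, "_" ) );   // "ProjTiger"
Show( Word( 2, jobID, "_W" ) );  // "753367.1"  (split on "_" or "W")
Show( Word( 3, jobID, "_W" ) );  // "1"         (the wafer number after "W")
Show( Word( 4, jobID, "_W" ) );  // "T0"
Show( Word( 5, jobID, "_W:" ) ); // "1strun"    (":" added as a delimiter)

// Splitting on ":" or " " leaves the two reticle indices as words 2 and 3
Show( Num( Word( 2, jobID, ": " ) ) ); // 3
Show( Num( Word( 3, jobID, ": " ) ) ); // 5

// Assumed form of the subsite row: "<" and ">" act as delimiters
Show( Num( Word( 2, "<Subsite>8</Subsite>", "<>" ) ) ); // 8
```

One consequence of this design is that a project name or note containing an uppercase "W" would shift the word positions, so the job ID format needs to stay consistent for this approach to work.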

Jim

HeroTom · Apr 4, 2024 12:48 PM

Hi txnelson, it works! I will literally need this script for my report next week. Thank you!!!~~~

Meanwhile, could you make a few improvements, if they are not too complicated?

1. The subsite info (column) is missing.

//To help future viewers: one wafer has multiple reticles (aka stepper fields or dies); each reticle has its own unique (index X, index Y); each reticle has numerous devices called "subsites" (aka subdies). Each measurement is taken per subsite, per reticle, per wafer.

2. Output all DPs into a single JMP table

//Without thinking too much, I applied your script to my full raw file (hundreds of DPs...). You can imagine what happened: I fought with a never-ending stream of JMP pop-up windows and eventually surrendered by turning to Task Manager...

3. Change the data table portion column properties from character to numeric.

4. Is it possible to compress the code into a more concise format?

//The current one is very intuitive, which is great for me as a beginner. But down the road, shorter code could be more user-friendly. Please ignore this if it is complicated or would no longer be friendly to first-time viewers.
