Solved: Re: Rookie question on data table generation (extraction) from raw probe data - Page 2

HeroTom · Apr 3, 2024 02:23 PM

Hi everyone, this is my first post in this community. I recently changed jobs and will be doing a lot of prober bench testing and data analysis. Our group uses JMP. Whenever I needed to figure something out, Google always directed me here, and the warmhearted experts always give professional answers. That's why I am here :).

Right now, my probing tests typically generate a big CSV file where the measurement data is buried together with info I don't care about (such as test setup, conditions, etc.). I would like some help pulling the useful info out with a JMP script and, from there, start learning advanced scripting.

I explain my task in the attached screenshot. The top right is a single I-V sweep data point example (to measure R) to give you an idea of what my gigantic CSV looks like (it contains hundreds of such structures). "Multi-row uninterested info" represents the useless rows that I don't want to import. Besides copying and pasting the I/V/R data portion, I would also like to extract the wafer lot ID, wafer ID, reticle (i.e. stepper field) index, and subsite number from some strings. The top right box shows my desired JMP output.

As optional reading, there is some trivia in the bottom two boxes. The bottom left has my general comments on the data structure, and the bottom right explains the meaning of the strings from which I need to extract info.

I hope the screenshot gives you enough info for my question. I used to do this kind of work with an Excel VBA macro: scan for signature strings, selectively import certain characters/strings/rows near that location, then continue searching until the end of the raw data file. Can a JMP script do the same?

HeroTom · Apr 4, 2024 07:33 PM

Hi txnelson, well said. I apologize for my laziness. I know I could figure out the answer myself if I spent the time (and it would do me good in the long run), but I felt pressured to get the correct answer ASAP.

I will come back and help newcomers, as you have been doing, once I become experienced. I think this is a real user community where everyone helps each other, and each answer is customized to individual needs. JMP really should incentivize the active users here. I hope your answer in this thread will also benefit other engineers.

jthi · Apr 5, 2024 02:46 AM

Continuing with my earlier idea of separating the text file into sections and then parsing those sections (this recording partially covers the topic: Scripters Club 2024: Session 2 - Preparing Unstructured Incoming Data for Analysis).

Names Default To Here(1);
// Test sections seem to always start with a line starting with "SetupTitle"
// and end at the next such line, UNLESS it is the last section, which ends
// with a line of ",,,,,,,,,,,,,,,,,,,,,,,"
TEST_SECTION_START = "SetupTitle, ";
EMPTY_LINE = ",,,,,,,,,,,,,,,,,,,,,,,";

// Section patterns for regex
PROJECT_PATTERN = "^(.+)_(\d+\.\d*)(W\d{1,2})_(T\d+)_(.+): (\d+ \d+)\,+?";
SUBSITE_PATTERN = "<Subsite>(\d+)</Subsite>";
DATASTART_PATTERN = "^DataName";


find_test_section_lines = function({lines}, {Default Local},

	start_idx = 0;
	end_idx = 0;
	
	// Copy the outer value into local scope; JMP "feature", 00075244.
	// (TEST_SECTION_END is never defined or used in this script, so it is not copied.)
	TEST_SECTION_START = TEST_SECTION_START;
	
	For Each({line, idx}, lines,
		If(Starts With(line, TEST_SECTION_START),
			If(start_idx == 0,
				start_idx = idx;
			,
				end_idx = idx;
				break();
			);
		);
	);
	
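	// Last section: there is no following "SetupTitle" line, so it runs to the end.
	// N Items + 1 keeps end_idx exclusive, as it is for all other sections,
	// so the final line of the file is not left behind.
	If(start_idx != 0 & end_idx == 0,
		end_idx = N Items(lines) + 1;
	);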
	
	return(Eval List({start_idx, end_idx}));
);

parse_test_section = function({lines}, {Default Local},
	data_start = 0;
	
	PROJECT_PATTERN = PROJECT_PATTERN; //JMP "feature", 00075244
	SUBSITE_PATTERN = SUBSITE_PATTERN; //JMP "feature", 00075244
	DATASTART_PATTERN = DATASTART_PATTERN; //JMP "feature", 00075244
	
	For Each({line, idx}, lines,
		If(!IsMissing(Regex(line, PROJECT_PATTERN)),
			matches = Regex Match(line, PROJECT_PATTERN);
			projname = matches[2];
			lotid = matches[3];
			waferid = matches[4];
			time = matches[5];
			notes = matches[6];
			reticles = Words(Trim Whitespace(matches[7]), " ");
		, !IsMissing(Regex(line, SUBSITE_PATTERN)),
			subsite = Regex(line, SUBSITE_PATTERN, "\1");
		, !IsMissing(Regex(line, DATASTART_PATTERN)),
			data_start = idx;
			break();
		);
	);

	data_str = Concat Items(lines[data_start::N Items(lines)], "\!N");
	dt = Open(Char To Blob(data_str), "text", invisible);
	
	For Each({colname}, Reverse(dt << Get Column Names("String")), // drop empty columns
		If(Col Number(Column(dt, colname)) == 0,
			dt << Delete Column(colname);
		, 
			break(); // break on first "ok" column
		);
	);
	
	
	// define order and names here
	dt << New Column("project", Character, Nominal, Set Each Value(projname));
	dt << New Column("lot", Character, Nominal, Set Each Value(lotid));
	dt << New Column("waferid", Character, Nominal, Set Each Value(waferid));
	dt << New Column("time", Character, Nominal, Set Each Value(time));
	dt << New Column("notes", Character, Nominal, Set Each Value(notes));
	dt << New Column("x", Character, Nominal, Set Each Value(reticles[1]));
	dt << New Column("y", Character, Nominal, Set Each Value(reticles[2]));
	dt << New Column("subsite", Character, Nominal, Set Each Value(subsite));

	dt << Move Selected Columns({:project, :lot, :waferid, :time, :notes, :x, :y, :subsite}, To First); 
	dt << Delete Columns("DataName");
	
	return(dt);
);


// Start parsing
filepath = "$DOWNLOADS/Raw data examples_3DP.csv";
txt = Load Text File(filepath);

lines = Words(txt, "\!N");
lines = Filter Each({line}, lines, line != EMPTY_LINE); // drop empty lines

{start, end} = find_test_section_lines(lines);
dt_result = Empty();
While(All(start, end),
	// Third argument is an item count: take lines start..end-1
	cur_testset = Remove From(lines, start, end - start);
	dt = parse_test_section(cur_testset);
	
	If(Is Empty(dt_result),
		dt_result = dt;
	,
		dt_result << Concatenate(
			dt,
			"Append to first table"
		);
		Close(dt, no save);
	);
	{start, end} = find_test_section_lines(lines);
);

dt_result << Show Window(1);

-Jarmo
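
To make the capture groups in PROJECT_PATTERN easier to follow, here is a small sketch that runs the same pattern against a single job ID line. The line itself is made up for illustration (the real file content is only visible in the screenshot), so treat the field values as assumptions.

```jsl
Names Default To Here(1);

// Hypothetical job ID line, made up for illustration only
line = "ProjA_753367.1W11_T25_1st: 3 4,,,,";

// Same pattern as in the script above
PROJECT_PATTERN = "^(.+)_(\d+\.\d*)(W\d{1,2})_(T\d+)_(.+): (\d+ \d+)\,+?";

// Regex Match() returns a list: item 1 is the whole match,
// items 2 to 7 are the six capture groups, in order.
// An empty list means the line did not match.
matches = Regex Match(line, PROJECT_PATTERN);

If(N Items(matches) == 0,
	Write("no match\!N"),
	// project, lot, wafer, "T" field, notes, reticle indices
	Show(matches[2], matches[3], matches[4], matches[5], matches[6], matches[7])
);
```

With this made-up line the groups should come out as "ProjA", "753367.1", "W11", "T25", "1st" and "3 4", which is why the script reads its fields from matches[2] onward rather than matches[1].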

HeroTom · Apr 5, 2024 06:12 AM

Hi jthi, you have my thanks! I will definitely spend time later learning about this Regex function.

I hope this thread not only gives me the answers I needed (really appreciated) but also serves as a starting template for similar scan-and-extract JMP tasks. I will save all the scripts for my future jobs!

Just to contribute further to this topic: the most likely issue when I use such scripts is the consistency of the job ID format. Pay close attention to how the job ID is constructed and don't change it arbitrarily, or the parsing will break. My standard job ID format is: Projname_Lot id_wafer id_personal note: index X index Y

But since this is a bench test and I named all the tests manually, there were some cases where I violated the rule myself and did NOT follow the format above exactly. Two examples:

1. I missed the "_" between lot ID and wafer ID: instead of "753367.1_W11", I typed "753367.1W11" in some cases;

2. I added two personal notes instead of one: e.g. when retesting after the first run, instead of "_2nd_" I used "_1st_redo_". Since I use "_" as the divider between info sections, this causes trouble for that section and everything after it, because the section count is thrown off.
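
As a rough sketch of how both slips could be tolerated on the regex side: jthi's PROJECT_PATTERN as posted expects no "_" between lot ID and wafer ID, so adding "_?" there accepts either spelling, and the notes group "(.+)" already takes in extra underscores. The pattern name TOLERANT_PATTERN and the three test lines are made up for illustration, and the "T" field simply follows jthi's pattern.

```jsl
Names Default To Here(1);

// "_?" makes the underscore between lot ID and wafer ID optional
TOLERANT_PATTERN = "^(.+)_(\d+\.\d*)_?(W\d{1,2})_(T\d+)_(.+): (\d+ \d+)\,+?";

// Hypothetical job ID lines, made up for illustration only
test_lines = {
	"ProjA_753367.1_W11_T25_2nd: 3 4,,,",
	"ProjA_753367.1W11_T25_2nd: 3 4,,,",
	"ProjA_753367.1W11_T25_1st_redo: 3 4,,,"
};

For Each({line}, test_lines,
	m = Regex Match(line, TOLERANT_PATTERN);
	If(N Items(m) == 0,
		Write("NO MATCH: " || line || "\!N"),
		Write("lot=" || m[3] || " wafer=" || m[4] || " notes=" || m[6] || "\!N")
	);
);
```

All three lines should report the same lot and wafer, with the notes coming out as "2nd", "2nd" and "1st_redo". This only covers the two slips listed above; other deviations would still need their own handling.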

Thanks again to jthi and txnelson for helping me take my first step in advanced JMP scripting!

jthi · Apr 5, 2024 09:40 AM

Depending on your application, you might be able to make the parser more robust by looking for repeating patterns instead of relying on regex patterns matching specific lines. For example, it looks like your project name sits between MetaData lines.

Depending on the assumptions you can make, you could extract it based on those MetaData and "empty" lines instead of relying on "line starts with Proj" or "line matches this regex pattern".

-Jarmo
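
A minimal sketch of that position-based idea, assuming a section in which the job ID line sits between two lines that start with "MetaData". The section content below is entirely made up; the real layout is only shown in the screenshot and may differ.

```jsl
Names Default To Here(1);

// Hypothetical section lines; the real layout may differ
lines = {
	"SetupTitle, IV sweep",
	"MetaData, TestRecord",
	"ProjA_753367.1W11_T25_1st: 3 4,,,",
	"MetaData, Other",
	"DataName, I, V, R"
};

// Collect the index of every line that starts with "MetaData"
meta_idx = {};
For Each({line, idx}, lines,
	If(Starts With(line, "MetaData"), Insert Into(meta_idx, idx))
);

// Lines strictly between the first two MetaData lines are job ID candidates
candidates = {};
If(N Items(meta_idx) >= 2 & meta_idx[2] - meta_idx[1] > 1,
	candidates = lines[(meta_idx[1] + 1) :: (meta_idx[2] - 1)]
);
Show(candidates);
```

The job ID is located purely by position, so it is still found when its text deviates from the expected naming format; splitting it into fields remains a separate step.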

HeroTom · Apr 5, 2024 05:23 PM

Interesting point, jthi. Will think about it.

Note that the info is embedded in the job ID line; the main trouble is how to extract the info, not how to locate it. My lesson is to keep the "line starts with Proj" format consistent. Very, very important. To double-check, we could add a pattern-verification script and make sure the extracted info is within expectations. Regex is an intuitive tool for that purpose.
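
A pattern-verification step along those lines could look roughly like this. The field names, sample values, and expected formats are all assumptions for illustration and would need adjusting to the real naming rules.

```jsl
Names Default To Here(1);

// Hypothetical extracted values; "A4" is deliberately malformed
names = {"lot", "wafer", "x", "y"};
values = {"753367.1", "W11", "3", "A4"};

// Assumed format for each field, in the same order as names
formats = {"^\d+\.\d+$", "^W\d{1,2}$", "^\d+$", "^\d+$"};

// Regex() returns missing when a value does not match its format
bad = {};
For Each({name, idx}, names,
	If(Is Missing(Regex(values[idx], formats[idx])),
		Insert Into(bad, name || "=" || values[idx])
	)
);

If(N Items(bad) > 0,
	Write("Unexpected format: " || Concat Items(bad, ", ") || "\!N"),
	Write("All fields look as expected\!N")
);
```

With the sample values above, only y should be flagged. Running a check like this on each section before concatenating would surface a mistyped job ID early instead of letting it end up silently in the result table.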