Hi, my name's Steve Maxwell.
I am an engineer.
I work in the semiconductor industry.
Today, I'm going to give a presentation on how I use JMP
in building an image analysis pipeline.
A little bit of background on the problem.
Two of our module groups at our facility
ran into an issue with a feature
where there was erosion on the sidewall of the feature.
The SEM system that we used to measure
the size of this feature wasn't detecting this issue,
and it didn't have the capability to characterize it.
It's an older piece of equipment, but it's also paid for so we liked it.
The approach that I employed to help them out
was to build a separate analysis pipeline to analyze the images,
to classify images as having damage or not,
and to produce a metric that could be sent to the SPC system for tracking purposes.
I used JMP primarily to analyze the data,
to build a model for deployment in manufacturing,
as well as, more recently,
integrating some of the actual image analysis into JSL.
I'll show some examples of that.
This is a sample of what it is that we're looking at.
This is a good image on the left-hand side,
and this is a bad image on the right-hand side.
We can see the variation and erosion here.
That's what we were looking for in these images.
We can also see that the measurement,
which is done on the inside of this feature and lands at the same spot every time,
doesn't really respond to these features.
We're not able to tell by just using our normal methods.
This is a little bit more detail about what we were seeing.
This is an example of what some of that data looked like.
You can see from here that though the process was having issues,
we were not detecting them with the measurements that we were using.
We launched into an approach to address this issue.
I use the standard ETL format for most data science problems.
Extract, transform, load.
I break it into four categories:
configure, collect, analyze, and report.
Configure and collect use standard methods.
Analysis is the key to this.
That's where I'm processing the images
and then extracting metrics from those images that we can use to build a model
to determine whether or not we could detect the issue at hand.
The approach for that analysis is to take the images in RGB,
which is how they come off the measurement system,
and convert them to grayscale.
Then the images are cropped to remove any measurement details
and other annotations.
The next step is to apply a median filter to these images.
The reason we do that is to blur the images
and remove some of the noise.
This is a key step in image analysis and image processing.
The approach here was experimental, trying different kernel sizes
and different kernel shapes, but in the end I settled on
using a disk-shaped kernel with pixel radii of 5, 7, 10, and 15 pixels.
For those that are unfamiliar with that terminology,
a kernel is basically just an array
that's moved across the image when you're processing it.
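As a rough illustration of that preprocessing step, here is a minimal sketch using scikit-image (assuming a recent version); the file name, crop coordinates, and variable names are placeholders, not the production code.

```python
from skimage import io, color
from skimage.util import img_as_ubyte
from skimage.filters import median
from skimage.morphology import disk

# Load an RGB image off the measurement system and convert it to 8-bit grayscale
img = io.imread("sample_image.png")           # hypothetical file name
gray = img_as_ubyte(color.rgb2gray(img))

# Crop out the measurement annotations (coordinates are placeholders)
gray = gray[50:450, 100:500]

# Median-filter with disk-shaped kernels of increasing radius to denoise/blur
blurred = {r: median(gray, footprint=disk(r)) for r in (5, 7, 10, 15)}
```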
After that was done,
I developed the extraction of metrics from the processed images.
I used several different standard metrics:
structural similarity index (SSIM), mean squared error, and Shannon entropy,
plus a few others like a Gaussian-weighted version of SSIM,
as well as the means and standard deviations of the
SSIM image gradients from the Gaussian-weighted images.
From there, I used dimensionality reduction techniques
to determine which of those metrics were most relevant,
to build a predictive model,
and then to convert that into a metric
that could be easily interpreted on the manufacturing floor
while running production.
One follow-up note on this: when doing the metric extraction,
the reference image is the image
filtered with the kernel with a pixel radius of five,
and the 7-, 10-, and 15-pixel-radius images are then compared to that image.
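To make that concrete, here is a minimal sketch of how those metrics could be computed with scikit-image, reusing the `blurred` dictionary from the preprocessing sketch above; the metric names are placeholders rather than the exact ones used in production.

```python
from skimage.metrics import structural_similarity, mean_squared_error
from skimage.measure import shannon_entropy

ref = blurred[5]                     # the radius-5 image serves as the reference
metrics = {"entropy_ref": shannon_entropy(ref)}

for r in (7, 10, 15):
    test = blurred[r]
    # Standard SSIM, plus the full SSIM image so we can summarize it
    ssim_val, ssim_img = structural_similarity(ref, test, full=True)
    # Gaussian-weighted variant of SSIM
    gssim_val = structural_similarity(ref, test, gaussian_weights=True, sigma=1.5)
    metrics[f"ssim_{r}_vs_5"] = ssim_val
    metrics[f"gaussian_ssim_{r}_vs_5"] = gssim_val
    metrics[f"ssim_img_mean_{r}"] = ssim_img.mean()
    metrics[f"ssim_img_std_{r}"] = ssim_img.std()
    metrics[f"mse_{r}_vs_5"] = mean_squared_error(ref, test)
```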
This shows a little bit more detail about
what the analysis pipeline approach looks like.
The offline approach, we start with the image acquisition.
Obviously, this is done during the manufacturing process.
After that: initially, this work was done in Python,
which was to convert to grayscale, crop, and denoise the images,
and then do the feature generation and metric extraction, the SSIM,
mean squared error, entropy calculations, etc.
JMP's JSL has a Python wrapper that you can use for integration.
What's great is you can actually move a lot of this code
into the JMP environment.
I'll go into a little bit more detail about what that is
and how I used it to help speed this process along.
The final component of this was to develop a quality metric.
This is a very JMP-intensive step as well.
Then once a metric is determined,
you move that over into an online process
in which the manufacturing data is analyzed,
and then that information is pushed to an SPC system.
A little bit more about the Python/JSL integration:
I integrated the Python analysis code into JSL using the Python/JSL wrapper.
The data transfer occurs by converting the JMP table into a Pandas data frame.
The data frames are passed into the analysis code,
and the results are returned from Python to JMP as a data table.
What's great about this is it enables using
common image analysis libraries such as skimage, scipy, PIL, OpenCV, etc.,
to perform the work while keeping the data within the JMP framework.
This, in my opinion, was key because I'm not moving data
from one system or platform to another and back and forth.
I was able to keep it all in one spot.
This is a little bit more detail about
what the analysis pipeline is doing when it's processing the images.
This is what I was able to move from running slowly in standalone Python
to being integrated into JSL while still using the Python libraries.
You can see, here's a bad image, and here's a good one.
We cropped the image to basically only take this component of it,
converted it to grayscale,
and then applied median filters to blur the images with different kernel sizes.
As you can see, as we increase the kernel sizes,
they become a little more blurred.
The takeaway I want to show here is that for a good image, you can see
the kernels with disk sizes 5, 7, 10, and 15
look pretty similar to one another.
Whereas for disk 5, disk 7, disk 10, and disk 15 on a bad image,
you can see that by the time you get to disk 15,
or kernel size 15,
this image is starting to look similar to this one.
Removing the noise from the image,
and denoising the images incrementally
using different kernel sizes,
is what enables us to successfully extract metrics
that we can use to differentiate between the good and the bad.
Sorry, hold on here. Let me go back up.
At this point, I'm going to launch a demo that shows
how I'm moving this data into JMP
and what it looks like as it processes everything.
The first thing I need to do is build a data table.
There's a great feature in JMP:
you go to File, Import Multiple Files.
What I've got is basically a sample set of just five images
that I'm going to combine all together into one data table.
What it does is actually import the images as well as the image file names.
You go here, click that file column name, click Import,
and what we get is this.
We've got an image of the picture that we want to analyze,
we've got a file name tag next to it,
and we've got the imported image data.
Now, the next step is to run our analysis code
against each of these images and start extracting metrics.
I've got a...
modified version of that code here.
The way this code works is
there's a Python initialization that occurs in JSL.
Once that's done,
the data table that you want to analyze
is passed to Python as a data frame.
At that point, under the Python Submit,
you basically convert over to Python code.
Then you're using standard approaches,
standard coding procedures for Python,
to analyze the image. As you can see here,
we've got our library imports, etc.
The way I wrote this code was basically to define a bunch of functions.
Those functions return an output
that basically gets added back into the data frame,
which is pushed back out to JMP as a data table.
That's done by using this lambda function;
the lambda function applies the function to all rows
of the specified column in the data frame, or the JMP table.
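As a minimal sketch of that pattern (with hypothetical function and column names, not the actual production code), the Python side of the wrapper looks roughly like this:

```python
import pandas as pd

# df is the data frame handed over from the JMP data table by the JSL wrapper
def extract_quality_metric(file_name):
    """Preprocess one image and return a summary metric (placeholder logic)."""
    # ... grayscale conversion, crop, median filters, SSIM/MSE/entropy ...
    return 0.0

# Apply the function to every row of the specified column; the updated data
# frame is then returned to JMP as a data table by the wrapper
df["quality_metric"] = df["file_name"].apply(lambda f: extract_quality_metric(f))
```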
When we run this,
what we're going to do is take that table I just built,
reprocess the images, and extract the metrics.
What we see here are the images shown earlier.
We've taken our picture and converted it to gray.
Then we started blurring it:
we've got the 5-pixel-radius kernel, then 7, 10, and 15.
Then from there, we're extracting all of our different metrics:
SSIM for 7 compared to 5.
SSIM is structural similarity index,
7 compared to 5, 10 compared to 5, 15 compared to 5.
That's basically what we're doing, is calculating structural similarity index
between this image and this image,
this image and this image
and this image and this image.
If we go back to our presentation,
from here, I used several different approaches for interrogating the data.
First was the multivariate platform;
there are lots of correlations to work with here.
The red indicates bad images, the blue indicates good images.
Then there's an outlier analysis, just looking at the Mahalanobis distances
and jackknife distances.
That's just to see whether or not we're heavily skewed,
you can see that we've got more outliers
in the bad population compared to the good.
But I think if you separate these two out,
this comes down a little bit.
I'm showing this just to show you can go in here,
and you can start trimming some of these out if you want to,
if you're not seeing good fits.
But for the purpose of this exercise,
I didn't have to remove or hide any data in the analysis.
I was able to use it all as is.
But here is how I did that.
This is the main data table.
This is all the data that I looked at; here are all the images.
I would just go to Analyze, Multivariate Methods, and run the Multivariate platform.
What we want to do is just go with that...
actually, sorry, one second here.
When classifying,
I did use a conversion of the classification from character-based to numeric.
There's a great feature in here:
we can go to Cols, Utilities, and then Make Indicator Columns.
You can go here and click Append Column Name.
What it does is say, "Okay, that is one, zero."
For this row, this is bad, not good.
Then down here it goes vice versa.
You can use that information
to feed into some of the different modeling platforms
that we're going to look at here in a few minutes.
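For reference, the same one-hot recoding that Make Indicator Columns performs can be sketched in pandas like this (the column name and levels here are placeholders, just to show the pattern):

```python
import pandas as pd

df = pd.DataFrame({"classification": ["good", "bad", "good", "bad"]})

# One 1/0 column per level, mirroring JMP's Make Indicator Columns utility
indicators = pd.get_dummies(df["classification"], prefix="classification", dtype=int)
df = pd.concat([df, indicators], axis=1)
print(df)
```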
Back to the multivariate.
Going in here, I'm going to run all of these.
Let's run this platform.
That's interesting.
Let's go here, my bad.
We can see our correlations.
Then to get the outlier analysis,
go here, click...
sorry, go into the Multivariate menu and go down to Outlier Analysis.
You run Mahalanobis distances, and you run jackknife distances.
There are different approaches you can base this on;
this is the one I've had success with.
This is the first pass.
Looks like we've got something to work with here.
The next step is to break these down a little bit more
to look at how each of the different metrics is responding
to different kernel sizes.
What we're really looking for here is,
as we start to see larger changes in the kernel sizes,
are the lines still overlapping with one another,
or are we starting to see separation?
Looking at 15 on bad versus 15 on good
for structural similarity and MSE,
we can start to see that we're getting separation here.
I think that is a good indicator
that we should be able to extract something from this.
That was done using Graph Builder.
For the demo, I've got a data table built specifically for this.
Graph Builder is a great feature here.
I'm putting in kernel size,
I'm looking at the value of the metric,
breaking these into groups based on the good-versus-bad classification codes,
and then my image metric goes into the Page role.
Here I just create a linear fit,
look at the advanced option for the prediction R²,
and it will just pop the equation up there.
You can see the goodness of the fit.
I do like adjusting the jitter on the points.
For this, I used the centered grid jitter and moved this along;
it's just easier on the eyes for presentation purposes.
You can scroll through here,
and for each of these it's got really nice-looking graphs to preview.
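For anyone working outside JMP, a rough equivalent of those per-group linear fits of metric versus kernel size could look like this; the metric values below are made-up placeholders purely to show the pattern:

```python
import numpy as np
from scipy.stats import linregress

kernel_radii = np.array([7, 10, 15])
# Placeholder SSIM-vs-reference values for one good and one bad image
ssim_good = np.array([0.97, 0.95, 0.93])
ssim_bad = np.array([0.94, 0.87, 0.78])

for label, y in (("good", ssim_good), ("bad", ssim_bad)):
    fit = linregress(kernel_radii, y)
    print(f"{label}: slope={fit.slope:.4f}, R^2={fit.rvalue**2:.3f}")
```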
Based on the multivariate and this linear fit,
I'm feeling pretty good about the data
that we extracted from these images.
The next step is, we've got a lot of metrics that we've extracted from these images.
Do all of them matter? Do some of them matter?
The next couple of slides go over dimensionality reduction.
Specifically, I used PCA and partial least squares (PLS).
For the PCA, the classification is bad versus good.
This goes back to what I was demoing earlier
in the data table.
When you're building a PCA, you obviously want to have
your inputs and your outputs cast into that analysis.
The reason you want that is because
you're looking for non-orthogonality between your inputs and your outputs.
The more orthogonal they are to one another,
the less likely they're playing a role in what's going on.
Recasting
a character-based classification into a number makes that easy to work with.
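A minimal sketch of that idea outside JMP, using scikit-learn, might look like this; it assumes `df` is a data frame holding the indicator-coded classification plus a few of the image metrics, and the column names are hypothetical:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Hypothetical columns: the 1/0 classification indicator plus image metrics
cols = ["classification_bad", "ssim_7_vs_5", "ssim_10_vs_5",
        "ssim_15_vs_5", "mse_15_vs_5", "entropy_ref"]
X = StandardScaler().fit_transform(df[cols])

pca = PCA(n_components=3).fit(X)

# Loadings show how each variable projects onto the components; metrics whose
# loadings line up with the classification indicator (i.e., are not orthogonal
# to it) are the interesting candidates
loadings = pd.DataFrame(pca.components_.T, index=cols,
                        columns=[f"PC{i + 1}" for i in range(3)])
print(loadings.round(2))
```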
We could see here, we've got some here that look like they could be interesting.
Some of these are interesting.
I highlighted these. To run this analysis,
I simply took the data table that I built,
go here, Analyze, Predictive Modeling...
sorry, Multivariate Methods, Principal Components.
What we're going to do is cast the good/bad classification
and all of our different metrics into the analysis.
See here, these look interesting;
these were particularly interesting.
What's nice is when you highlight these,
and you go back and you look at your data table,
it'll highlight which columns you selected.
Oftentimes these plots can get really busy,
so it's handy to be able to go back to the references and say,
"Okay, these are the ones that I'm actually interested in."
Let's get back to here.
After this is the PLS. This is more for
targeting how many factors we think are playing a role
in the data that we've got,
as well as helping us identify additional metrics
that might be interesting.
For this particular data set,
it looks like it settles in at about
a minimum of five factors that we would expect.
That comes from the Prob > van der Voet T² metric;
the JMP manual references academic research indicating that
anything above 0.1 is typically where you want to start.
From there, we run the PLS factor identification.
This is your VIP versus Coefficient plots,
and from there, you can highlight the different
metrics that seem to be driving this, the ones that are most relevant.
The way this works, looking at these charts, is you want to go out to the extremes.
Those are interesting.
As you get closer and closer to a zero coefficient,
it's less interesting.
I'm not going to run the PLS platform
because this will actually take a while to run.
I don't think anyone wants to just watch my computer run
while doing this.
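Since I'm not running the platform live, here's a minimal sketch of the same idea outside JMP, fitting a PLS model with scikit-learn and computing VIP scores by hand. It assumes X is the matrix of image metrics, y is the 0/1 classification, and metric_names lists the column names; the five-component choice mirrors the factor count mentioned above:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

def vip_scores(pls):
    """Variable Importance in Projection for a fitted PLSRegression model."""
    T = pls.x_scores_    # component scores, shape (n_samples, n_components)
    W = pls.x_weights_   # X weights, shape (n_features, n_components)
    Q = pls.y_loadings_  # Y loadings, shape (n_targets, n_components)
    p = W.shape[0]
    ss = np.sum(T ** 2, axis=0) * np.sum(Q ** 2, axis=0)  # SS explained per component
    w_norm_sq = (W / np.linalg.norm(W, axis=0)) ** 2
    return np.sqrt(p * (w_norm_sq @ ss) / ss.sum())

pls = PLSRegression(n_components=5).fit(X, y)
vip = vip_scores(pls)

# Pair each metric's VIP with its coefficient, analogous to the VIP-vs-coefficient plot
for name, v, c in zip(metric_names, vip, pls.coef_.ravel()):
    print(f"{name}: VIP={v:.2f}, coef={c:+.3f}")
```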
Based on our dimensionality reduction,
I ended up with these factors being the most relevant
with regards to what we want to feed into the neural network.
With some experimentation, I found that it was mainly the PLS results;
then, when the standard 10 metric from the PCA
was added in as well,
it really helped pull the model together.
Training versus Validation.
Obviously, I'm using a neural network to build the model for the classifier.
The reason why I'm using that platform specifically for this
is that it has K-fold validation built in as standard in the JMP platform,
which is great.
I've found that K-folds helps to compensate
when you're teetering on class imbalance.
That's a really common problem when you're
studying manufacturing processes:
you're not going to have a lot of defective data to work with.
You've really got to make the best of what you've got,
and I find that sometimes K-folds helps with that.
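As a rough Python analogue of that setup (not the JMP platform itself), a small neural network with three hidden nodes evaluated under five-fold cross-validation might be sketched like this, again assuming X and y hold the selected metrics and the 0/1 labels:

```python
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Three hidden nodes, mirroring the JMP model; scaling helps the optimizer converge
clf = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(3,), max_iter=5000, random_state=0),
)

# Stratified five-fold cross-validation, which also respects class imbalance
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")
print(f"mean accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```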
The output from the neural network was pretty good, actually.
The way I do this is I'll run this model about five times,
and then I'll compare all five of them to one another.
I'll take basically the median misclassification rate,
and then go down here and look at our false positive and false negative rates.
With regards to the false positive rates,
I look at the ROC curve.
These are receiver operating characteristic curves.
You want to look at these;
you want them to be, as I'd describe it, high and tight.
The closer these lines get to this middle line,
the closer you're getting to a coin toss in your prediction.
Obviously, that is not what we're seeing here.
Then the prediction profiler has some great features,
because you can basically reorder these based on
which are playing the biggest role in the prediction,
as well as visualize what each individual metric is doing
with regards to playing a role in the prediction itself.
In this case,
we're looking at transitions in the dominant dimensions here.
This one here is a little bit weird;
I'm not sure what's going on with that.
As you get back here, to the less dominant dimensions,
you can start to see they become flatter.
I'll run the demo for the neural network.
We go here.
Analyze.
Actually, I've got this one built already.
Let me...
We launch this analysis.
We've included our factors; our response is the classification.
The neural network doesn't care
if you're using a numeric or character-based response;
it will work with either one.
Here, for the validation method,
we've got K-fold, using five folds.
The gold standard for most AI-type analyses is ten,
but I found that five was fine,
so there was no reason to push it.
I used three hidden nodes.
Hit Go, and here's the output.
Like I said, I'll run this probably
five times and take the median value, which you can see here.
Our classification is pretty good.
We'll see a little bit of variation on the validation set,
but it's all within reason for what I'm looking for
with regards to the profiler and the ROC curve.
When you get down here to the ROC curves, you can see,
as I described earlier, that high, tight characteristic on the curves.
The validation set may be a little bit less, but we can work with it.
Then with regards to the profiler:
the profiler output looks like this.
It doesn't arrange them by which one is most relevant.
You go here,
and you go to Assess Variable Importance, Independent Uniform Inputs.
Go back up here, and we can see a summary report.
Then you go down to the Variable Importance menu,
and from there, you can reorder the factors by main effect importance.
For this model, it reordered this way.
Then Colorize Profiler:
you can see here that the darker color is most relevant,
and then it goes down as it moves to the right.
Let's see.
Converting the model to an SPC metric: to make the data easier to interpret,
we need to take the result from the model and convert it into a metric
that can be posted to the SPC system and plotted on an SPC chart.
The activation component of the neural network model is a softmax function
that's standard in JMP.
This calculates the probability of an image being good or bad,
with the higher of the two determining how the image is classified.
What we're going to do here is introduce a third category, unknown.
I'll show a little bit about how that is done.
One of the nice things about the neural network platform
is that the SAS DATA Step output file is actually pretty close to a Pythonic model script.
Typically for the metric system, you want this in Python;
it would be really cool to figure out
how to do it all in JMP, though.
This code from the neural network here,
the actual model itself, can be output in this format.
You just go to the model menu and Make SAS DATA Step.
Here we go.
I think in JMP Pro, you can just output it as Python directly,
but not everyone has access to JMP Pro.
This gets you maybe 80% of the way there,
not 100%.
It just requires a few modifications to the actual code.
On the left-hand side, we've got the SAS DATA Step format.
On the right-hand side is the conversion of it to Python.
It's mostly just a matter of importing the math library,
and then applying that so we can take the hyperbolic tangent of things.
Then you get to this here.
What we want to do is threshold
our softmax probability calculation.
Rather than just saying whichever of these
is highest is your actual result
(if it's 0.51 bad and 0.49 good, then it's bad),
we go in and say, "Okay, if the softmax output is greater than 0.75,
then it's that classification," because the two probabilities are complementary;
they add up to one.
If it's 0.75 bad, then it's bad,
but if it's a number less than that, it won't classify it as good or bad.
It'll just classify it as unknown.
We'll pass that into our system,
and what that tells us to do is have someone look at those images,
because the classifier can't decide whether it's good or bad.
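As a minimal sketch of that thresholding logic, the Python side could look roughly like this. The weights below are placeholders standing in for the real values copied out of the SAS DATA Step export; only the structure (tanh hidden nodes feeding a two-class softmax, thresholded at 0.75) reflects what's described above.

```python
import math

# Placeholder coefficients standing in for the SAS DATA Step export values
W_HIDDEN = [[0.8, -1.2, 0.3], [-0.4, 0.9, 1.1], [1.5, -0.2, -0.7]]
B_HIDDEN = [0.1, -0.3, 0.2]
W_OUT = {"bad": [1.0, -0.6, 0.4], "good": [-1.0, 0.6, -0.4]}
B_OUT = {"bad": 0.05, "good": -0.05}

def classify(x, threshold=0.75):
    """Return 'good', 'bad', or 'unknown' plus the softmax probability of 'bad'."""
    # Hidden layer: tanh of a linear combination of the input metrics
    h = [math.tanh(b + sum(w * xi for w, xi in zip(ws, x)))
         for ws, b in zip(W_HIDDEN, B_HIDDEN)]
    # Output layer: softmax over the two class scores
    scores = {k: B_OUT[k] + sum(w * hi for w, hi in zip(W_OUT[k], h)) for k in W_OUT}
    denom = sum(math.exp(s) for s in scores.values())
    probs = {k: math.exp(s) / denom for k, s in scores.items()}
    label = max(probs, key=probs.get)
    if probs[label] < threshold:   # neither class is confident enough
        label = "unknown"
    return label, probs["bad"]

print(classify([0.95, 0.02, 4.1]))   # made-up metric values
```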
The output, and this is the finished product,
is to convert, like I said,
the final metric into something that could be reported in our SPC system.
Basically, what we're doing here is reporting
what our softmax probability is,
taking the inverse of that, and then applying the threshold
to determine whether or not things are known or unknown.
Based on that, this is our initial data
that I showed at the beginning of the presentation,
showing what the chart looked like for the measurement of the defect.
Then this is what we get when we go in,
reanalyze that data, and apply this new metric,
which tells us whether or not it thinks there's sidewall erosion on the feature.
You can see, looking at the same data set,
that yes, it very much does identify each point as either good or bad.
The green indicates a good classification.
The red indicates those that were classified as bad.
Then you can see here on the SPC data
that they would instantly get flagged as bad as well.
These blue ones hanging out here
are the ones that are unknown, where it's essentially telling us,
"Go look at this."
That is my presentation.
I've posted the information to the user community site;
if anyone wants access to the code
that's used for the image analysis, it is there.
Thank you very much.