
Pythonless Python Integration for JMP® (2023-EU-30MP-1265)

Jarmo Hirvonen, Data Integration and Data Science Specialist, Murata Electronics Oy
Philip O'Leary, Data Integration Manager, Murata Finland

 

Challenges with a JMP® and Python integration resulted in a search for an alternative solution that would allow for the evaluation and testing of various Python libraries and powerful algorithms with JMP. This would enable JMP users to work with Python from a familiar JMP environment. After a few different iterations, a REST API service was developed; when JMP calls this service, it dynamically creates a user interface based on the options the service currently provides. The JMP user can then utilize this user interface to employ different algorithms such as HDBSCAN, OPTICS, and UMAP by sending data directly from JMP in one click. After the algorithm has finished its operations on the server side, it returns data to JMP for further analysis and visualization.

 

 

Welcome to the Pythonless Python Integration for JMP presented by Murata Finland. My name is Philip O'Leary. Shortly about Murata: we are a global leader in the design, manufacture, and supply of advanced electronic materials, leading-edge electronic components, and multifunctional high-density modules. Murata innovations can be found in a wide range of applications, from mobile phones to home appliances, as well as from automotive applications to energy management systems and healthcare devices.

We are a global company, and as of March 2022, there were approximately 77,500 employees worldwide, just under 1,000 of them in Finland, where we are located. Our product lineup here in Finland includes accelerometers, inclinometers, gyroscopes, and acceleration and pressure sensors. Our main markets are automotive, industrial, healthcare, and medical.

Today, we have two presenters: myself, Philip O'Leary, and my colleague, Jarmo Hirvonen. I've been working in the ASIC and MEMS industry for over 40 years, 32 of which have been here at Murata. I've had several roles here and have come to appreciate the importance of data within manufacturing. Most recent years have been devoted to helping the organization benefit from the vast amount of data found within manufacturing. I currently lead Murata's data integration team.

Jarmo, perhaps you'd like to give a few words on your background.

Yes, sure.

Hi, I'm Jarmo Hirvonen and I work in Philip's team as a data integration and data science specialist. I have been using JMP for four and a half years, approximately the same time that I have been working at Murata. I'm a self-taught programmer; I haven't really studied programming besides a couple of basic courses at university.

In my position, I do a lot of JSL scripting. I write add-in scripts, reports, automation, basically almost everything you can script with JSL, as long as it stays mostly inside JMP. I'm an active JMP Community member, and I'm also a Super User there. Because of my background with JSL scripting, I'm also a steering committee member in the Community's Scripters Club. I have also written, I think, nine add-ins at the moment that have been published to the JMP Community. Feel free to try them out if you are interested. Thank you.

Thank you, Jarmo. This is the outline for the presentation that we have for you today. As this session has been recorded, I will not read through the outline, as you can do so yourselves afterwards. Why do we have the need for a JMP Python integration? Well, basically, we are very happy with the performance and the usage we have of JMP. It doesn't require any programming for basic usage, and we see this as a big advantage. JMP's visualization and interactive capabilities are excellent. The majority of people performing analysis at Murata in Finland are already using JMP. We have a large group of people throughout the organization using JMP, and we want to maintain that.

However, on the Python side, we see that Python has powerful algorithms that are not yet available in JMP. We already have people working with Python in various different applications, and we have models within Python. We want to support these people and also help others understand and take advantage of the Python world. Basically, we want to take advantage of the wide use of JMP here at MFI and offer JMP users access to some common Python capabilities without the need to program themselves.

I'll continue here. Share. JMP already has Python integration, but why are we not using that? Basically, there are two groups of reasons: JMP, and us, or our team. My experiences regarding JMP are from JMP 15 in this case. A JMP update at least once broke this integration, and it caused quite a few issues for us because we couldn't use the Python JMP scripts anymore unless we modified them quite heavily. Getting JMP to recognize different Python installations and libraries has been quite difficult, especially if you are trying to work on multiple different installations or computers.

Also, JMP didn't, at least back then, support virtual environments, which are basically necessary for us. Then on our team's side, we don't have full control of the Python versions that JMP users are running, or the libraries and packages they are using, because not everyone is using JMP as their main tool. They might be using Python with versions that don't work with JMP, and we don't want to mess with those installations. Also, in some cases, we might be running Python or library versions that JMP doesn't support yet, or maybe doesn't support anymore because they are old.

What is our current solution for this Python JMP, or JMP Python, integration? We are basically hosting a Python server using a web framework. We can create endpoints on that server, behind which there are different algorithms. We communicate with a REST API between JMP and the server. This is the biggest benefit: this way we can use JMP with the server, but we also have a couple of additional benefits. We can have centralized computing power for intensive models; for example, we don't have to rely on a laptop to perform heavy model calculations. The server is not just limited to JMP; we can also call the endpoints from Python or, for example, R. We are not dependent on the JMP-supported Python and library versions anymore. We can basically use whatever we want to.
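To make the idea concrete, here is a minimal sketch of such a service using only Python's standard library. The real service uses a full web framework and real algorithms (HDBSCAN, t-SNE, and so on); the endpoint paths, payload keys, and the toy "analysis" below are hypothetical stand-ins, not the actual API.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

class ApiHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Health-check endpoint: the client pings this before sending data.
        if self.path == "/ping":
            self._reply(200, {"status": "ok"})
        else:
            self._reply(404, {"error": "unknown endpoint"})

    def do_POST(self):
        # Analysis endpoint: receives column-oriented JSON, returns results.
        if self.path == "/api/demo/threshold":
            size = int(self.headers["Content-Length"])
            payload = json.loads(self.rfile.read(size))
            values = payload["columns"]["y"]
            cutoff = payload["parameters"]["cutoff"]
            # Toy "algorithm": label each value by whether it exceeds cutoff.
            labels = [1 if v > cutoff else 0 for v in values]
            self._reply(200, {"columns": {"label": labels}})
        else:
            self._reply(404, {"error": "unknown endpoint"})

    def _reply(self, code, body):
        data = json.dumps(body).encode()
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, *args):
        pass  # keep the demo quiet

def start_server():
    # Port 0 lets the OS pick a free port; a background thread serves requests.
    server = ThreadingHTTPServer(("127.0.0.1", 0), ApiHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Any HTTP client, including JMP's JSL `HTTP Request`, Python, or R, can then call such endpoints, which is the language independence described above.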

Next, I will go a little bit away from the PowerPoint to JMP and show a little bit of the user interface. First, I will explain some terminology which might appear here and there in this presentation. We have endpoints; basically, this path here is an endpoint. These come directly from the server. Then we have methods; a method is the last part of the endpoint, t-SNE and XGBoost in these two here.

Then we have parameters, this column, and these are basically the inputs that we will send to the server. Then we have what I call the stack, or we call the stack. It's a collection of stack items; one row is a stack item that we can send one after another to the server. Quickly jumping here: what features do we have? It's easy to add new endpoints. Basically, we write the endpoint on the Python server, we rebuild the server, we run the JMP add-in, and this list gets updated. The add-in supports a dynamic data table list; if I change the table here, it will update here. Also, if a new table is opened (it's on the other screen, but it doesn't really matter), you can see it here: Untitled 3 was opened.

Then we can send data directly from here to the server, and there are multiple different options for sending. I can send the selections that I had here basically immediately. I will show the results here. After getting the data back, we join it; these columns are from the server. We join the data to the original data table we had, and then we have some metadata we can get from the server from the communication: notes and column properties telling what method and parameters were used to get these two columns. Then we group them; if I have run multiple models or methods, it's easier to see which columns are from which runs.

Then we have table scripts, which are also grouped. This is a different screen; let's move them around. We have the stack: what was sent, and the HTTP response that comes back from the server. In this case, we also receive an image from the endpoint; here it's a scatter plot of the t-SNE components. As I said earlier, we can send multiple items from the stack one after another. You can build, let's say, HDBSCAN with different input parameters, say 20 here and then 20 to 40, add them to the stack, send them there, come back when they're done, and start comparing whether there are differences between those.

Then endpoints have instructions on how to use them: a documentation link if we have one, a short description (in this case, a very short description) of the endpoint, and then what each of the parameters does, with minimum values, maximum values, default values, and descriptions of those.

Then we also have user management. In this case, I'm logged in as a super user, so I can see these two experimental endpoints here that a basic user would not even be able to see. Then back to PowerPoint. This is maybe a partial explanation of how the add-in works. When the user runs the add-in, JMP will ping the server, and if the server is up and running, JMP will send a new request for the JSON that we will use to build the interface. The JSON is parsed, and then the interface is built using JMP type classes that I will show a bit later. A custom class is created in JMP.

At this point, users can start using the user interface. The user fills in the selections, parameters, data tables, and such, and then sends an item from the stack. We get the columns based on the inputs, get the data that we need, and convert that data to JSON. In this case, I call it column JSON; as this demonstration shows, normal JSON would always have the column names duplicated, with each row carrying all the column names. In column JSON, we have each column name only once, followed by a list of values. This makes the object we send much smaller.
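The row-to-column conversion described above can be sketched in a few lines. The key names ("columns") and the sample column names are illustrative, not the add-in's actual wire format.

```python
import json

def rows_to_column_json(rows):
    """Convert row-oriented records to a column-oriented payload:
    each column name appears once, mapped to its list of values."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return {"columns": columns}

# Row-oriented JSON repeats every column name on every row...
rows = [
    {"WAFER_ID": "W1", "ORBOT1": 1.2},
    {"WAFER_ID": "W2", "ORBOT1": 0.9},
    {"WAFER_ID": "W3", "ORBOT1": 1.5},
]
row_json = json.dumps(rows)

# ...while the column form names each column once, so it serializes smaller.
col_json = json.dumps(rows_to_column_json(rows))
```

Even for this three-row example the column form is shorter, and the savings grow with the row count since column names are fixed overhead per row in the normal form.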

Before we send the data, we ping the server again. This is done because we have different timeouts for the ping and the request; otherwise, JMP would lock up for a long time if the server is not running and we are using a two-minute timeout, for example. Then, when the server gets the data, it runs the analysis and returns the analysis results, and we join them back to the table, add the metadata, table scripts, and so on. At this point, the user can continue using JMP, send more items from the stack, or maybe even jump to Graph Builder and start analyzing the data that he or she got back from the server.

These are the JMP type classes. We have different classes for the different types of data we get from the server. We have booleans, which in JMP become check boxes; columns; enumerators, which would be a combo box; type number; type string; and not implemented, which is basically used to check that the server is correctly configured. This is a quick demonstration of one of those, Type Column.

On the server side, it has been configured like this. When we request the JSON, it looks more like this. Then this Type Column class converts it into an object that looks like this in the user interface. From here you can see that, for example, minimum items is one; it's the same as the minimum here. Max items, same thing. Then modelling types have also been defined here. We can limit minimum and maximum values and so on based on the schema we receive from the server. All of these are made by the custom JMP classes. This is an enumerator with some options, then number boxes, and here is the boolean. Now Phil will continue with a couple of demonstrations of the PyAPI interface.
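The mapping from a server-side schema entry to a UI element can be sketched like this. The schema field names (`type`, `enum`) are assumptions modelled on common JSON Schema conventions, not the actual format the add-in consumes; the widget names echo the JMP display elements mentioned above.

```python
def widget_for(schema_entry):
    """Pick the UI element to build for one parameter's schema entry."""
    if "enum" in schema_entry:
        return "Combo Box"        # enumerators become combo boxes
    kind = schema_entry.get("type")
    if kind == "boolean":
        return "Check Box"        # booleans become check boxes
    if kind in ("number", "integer"):
        return "Number Edit Box"  # numeric parameters, with min/max limits
    if kind == "string":
        return "Text Edit Box"    # free-text parameters
    # Anything else means the server schema is misconfigured for this client,
    # mirroring the "not implemented" class described in the talk.
    raise NotImplementedError("Unsupported parameter type: %r" % kind)
```

Raising on unknown types gives the same safety net as the "not implemented" class: a misconfigured endpoint fails loudly when the interface is built rather than silently rendering a broken control.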

Thanks, Jarmo. All demonstrations done today will be performed using standard JMP 16. There are three demonstrations I'd like to go through, each having a different task in mind. For the first one, I'll just open the data set. This is a data set which contains probe, or test, data from five different products. It's a rather small data table, just to ensure that we don't get caught for time.

We have 29 probe parameters for five products within the same product family. The task at hand is to try to determine quickly whether we have anomalies or opportunities for improvement, looking simultaneously at these five different products and 29 different parameters, such that we could identify something that could help reduce risk, or something that perhaps could reduce cost and improve yield.

One possible way to do this, of course, would be the one-factor-at-a-time approach, whereby we would just manually march through all the different data, all the different parameters, and look for patterns. Very inefficient: for 29 parameters it's okay, but some of our products have thousands of parameters, so it's not the best way to approach the task at hand.

Another possibility would be to take all of these parameters and put them through some clustering algorithm to see whether we could find groups naturally in the data that we have. I want to use the JMP-PyAPI interface that we have here. Jarmo already explained briefly how these work, but I will demonstrate it.

The intention that I have now is to run an HDBSCAN. I'm going to run the scan on all the probe parameters. I'm going to use the default settings; the default settings are typically already quite good. And I'm going to send this... I'm not going to make a big stack. I'm going to send this setting straight for analysis. We can see rather quickly that the algorithm came back and suggested clusters. There are actually three clusters and one grouping of wafers which do not, in fact, belong to any of the clusters. Knowing that I have five products, I'm going to go with this for the sake of demonstration. I can see from here a histogram of the number of wafers in each cluster, but it doesn't really give me a good visualization of what's going on.

I'm also going to do a dimension-reduction procedure. I go back into the same interface, and now I'm going to do a t-SNE dimension reduction on the same parameters and send it immediately. We wait for the dimension-reduction algorithm to do its job, and it returns two components, t-SNE 1 and t-SNE 2, against which I can then actually visualize the clusters that the HDBSCAN gave me, such that I now plot t-SNE 1 against t-SNE 2 and colour-code them in accordance with the clusters that have already been identified.

As I said, we have three clusters and one grouping of wafers which don't necessarily belong to a cluster. Maybe somewhat disappointing, knowing that I have five different products. Thankfully, I have an indicator of the product; it's here. At first I said this is actually frustrating, because now I have two different products being clustered as the same. In actual fact, this is the medical application of the same automotive part. The parts are identical, so them being in the same cluster is not a problem.

This part is rather unique. It's different to the other products in the same family, such that it got its own cluster with a few exceptions, so that's quite good. Then the B2 and the B4 versions basically have the same design. What I'm concerned about is that the B4 has been allocated cluster 1 and also a lot of minus ones for wafers in the same product type. I'd like to further investigate what this might be due to, so I have scripted to the table: I want to make a subset of this SENSORTYPE NR SA AB4, and then I'm going to plot the differences for every parameter by cluster minus one and cluster one.

Here we see the parameters in question, and the biggest differences are observed for Orbot 1 and Orbot 2. I'm not going to get into the parameters themselves; suffice to say that some parameter differences are bigger than others. Now that I know that these exist, I'd like to check across all the wafers in this subset how Orbot 1 and Orbot 2 actually look. Here we see, in fact, that the ones which have been allocated minus 1, the ones not belonging to the cluster itself, have a much higher value of Orbot 1. In fact, this anomaly is a positive thing, because the higher the Orbot value, the better. We see that there's quite a large group of wafers having exceedingly larger values of Orbot than what we would typically see.

The next step, of course, would then be to do a commonality study to figure out how this has happened, where the wafers have been, what the process has been like, and look for an explanation. We can see that a multi-product, multi-parameter evaluation of outliers or anomalies can be performed very quickly using this method. I will now move on to the second demonstration.

I just need to open up another file. This application is very different. It is a collection of functional data; in fact, these are bond curves, curves which occur in our anodic bonding process when we apply temperature, pressure, and voltage across a wafer stack to have the wafer, the glass and the silicon, bond together. If we look at individual wafer curves, we can see that each wafer has a similar but still unique curve associated with it. We can see the bonding process time and the associated current.

The goal I would have, if I just remove the filter, is that I would like to know, without having to look through, in this case, 352 curves (we would have thousands of these every week), how many different types of curves I actually have in my process. Then, tying that in with the final test data: can this curve be used to indicate a quality level at the end of the line?

In order to do this, I'm going to split the data set. Now I put the time axis across the top and the current through each column. The first thing that I do after this splitting is to again go back to our PyAPI interface, and I'm going to look at the split data. What I want to do is a dimension reduction, because you can see that I have many, many columns, and it would be much better if I could reduce the dimensionality here.

Again, I'm going to do a t-SNE analysis. I'm going to send it straight to the server, and we can see that the algorithm has come back with two components. I can show very quickly what they look like. The 352 wafers which were represented by functional, curve-type data a few minutes ago are now represented using a single point for each wafer.

Now, having reduced the dimension of the data, I'd like to perform a cluster analysis next. Again, I'll go back to my PyAPI. I'm now going to do an HDBSCAN on the t-SNE components. I just need to check on this analysis what would be a suitable level. If I send it immediately and colour-code by the cluster, you can see that.

Now clusters have been allocated to the t-SNE components. This is the first-level analysis using the HDBSCAN defaults; I could, of course, try another setting. I could perhaps run, maybe, if we think out loud, 25 wafers; a batch of wafers and half-wafer batches are things that would be of interest to me. Then we look to see what this clustering would now look like. Now, all of a sudden, I have many more clusters. Of course, it does take some subject matter expertise.

You need to know what clusters you would expect. In this case, I said, okay, a natural rational group for us within manufacturing would be a lot of wafers; wafers are processed in 25-wafer batches. Sometimes we have half-wafer batches, which we do experimental runs on, and so on and so forth. Now we can see that we have clusters associated with the different types of curves. I'm going to shorten this demonstration rather than have you watch me do joins and so on. What I'm going to do is take this cluster and put it into the original data. It's, of course, opening on another screen.

If I do cluster overlays, we can see... This is the original data where at first I showed you each individual wafer bond curve. Now we can see that we were able to identify the distinct differences between seven clusters and one group of wafers which don't belong to any particular cluster. We can see that very quickly, we've been able to go through large numbers of wafers, determine similarities between them, and come up with clusters.

To bring this even one step further, we can take a look at the actual t-SNE components, the coloured clusters, and have a quick look at the actual contents. We can see this is cluster minus one; they seemingly have something with a very high bond current at the very beginning. Cluster zero: very high bond current at the end. You can see that if we were to spend enough time on this, you would see lots of similarity between bond curves within each cluster. This was a short demonstration of how to take functional data from hundreds of wafers, cluster them, and, with the various visualization techniques within JMP, clearly identify and present the different groupings that exist within the data sets so that people understand them.

This concludes my demonstration number two. I have one more demonstration. This is maybe, in some respects, for some, a fun demonstration. Again, it's not a real wafer, but I'm playing with the idea that I have a silicon wafer and there is some noise. This is a defect layout from an automated inspection tool; this data has been simulated.

The purpose of having this simulation is to look for scratches or patterns found in the defect data layout. This is rather easy and straightforward if I don't have noise, but I can see that there's noise associated with this data set. What I want to determine is whether I can find a way to identify these three spirals, assuming that they simulate some scratch. In fact, they're not very similar to a scratch, except that they are patterns having high-density defects in a small area. That's the main purpose of using it, rather than showing you actual wafer automated visual inspection data.

The task at hand is to try to identify the spirals from this data set. I'm going to use, again, a clustering method. The table I will use is the spiral data with noise. As Jarmo pointed out, we can run... Obviously, putting the number of wafers in here, 25 and 12, won't help me, because I'm looking at a single wafer. The numbers I put in should be somehow representative of how many defects are typically seen within a scratch, and what the smaller sample sizes associated with clusters are, and so on: minimum samples.

Being a complete novice, I don't know, so I'm going to put in some numbers to play with. Twenty-five would be the minimum cluster size, with a minimum sample size of zero. Add to stack, and then I say, okay, well, this is rather inexpensive to do, so I'm going to add...

You're  missing  the  columns.

Oh, sorry. Thank you. This will help. Let me clear the stack; in my enthusiasm to move forward, I did not include what I should have. Let me start again. Thank you.

Twenty-five minimum cluster size, minimum sample size, add to stack. Fifty minimum cluster, add to stack. Seventy-five. I'm allowing the scratches to be bigger and bigger. Add to stack, 100. They're not necessarily bigger and bigger, but they would have more and more defects associated with them. Add to stack. And then I'm going to add another combination of 75 and two, add to stack. I could just take one of these and run it; I could select one and run, but I'm not. I'm going to be greedy. I'm going to run the whole stack at the same time.

I'm going to run one, two, three, four, five cluster analyses against the data that I've taken from this wafer. I send the whole stack, and... something has gone wrong. All my clusters are showing minus ones. Let me try this again. To make a long story short, and also given that this is being recorded and we don't want to start again from the beginning...

I know how this ends, so if I take this... I'm not sure why this has disappeared, but let me try it one more time. The table I need is the noise table. I'm taking HDBSCAN, X, Y features, X, Y, 20. I'm going to make a shortcut: 75 and two, send immediately. Now, thankfully... I don't know whether I had selected the table or something else incorrectly last time. Now that we're here, send it immediately, and so on.

As I said, we could have run quite many. The idea then is to look at the layout and try to determine: is this particular setup finding good clusters? The minus one says, no, you're not finding anything there. Then, if I colour-code by the other clusters, it has in fact found, quite well, lots of points that don't belong to any cluster, and then three individual spirals which are very well identified. You might think, what's the benefit of this? Well, now that I know what typical scratch content looks like, I could in fact open up another wafer.

If I open up data from another wafer and make the plot of the layout, we can see that there are no scratches on this wafer; it's only noise. What would happen then if I run the same setup? My wafer is another wafer. I'm doing it on X, Y. I'm looking to determine, based on my best settings for finding scratches, 75 and two, send immediately, and plot with clusters. We only have minus ones, so nothing has been detected as a scratch.

Having the possibility to run this algorithm against wafers in the database, I could then make a collection of wafers that have scratches, or don't have scratches, or spirals in this case, and then use that data as an input to a commonality study to try to determine which machines in the production line are resulting in the scratches on the wafers. This concludes the third demonstration. Now I'll hand it back to Jarmo.

I'll take that. We have a couple more slides left. Here are a couple of ideas we have for possible future development: using a DoE approach for the stack building, basically what Philip did by hand, but using DoE, so we'd set min and max values and so on, and then send that whole stack. Then a metadata viewer, so you can compare the results; trying JMP 17's new multiple HTTP requests; a local server, so we don't rely on the central server being up; and trying the new, hopefully updated, native JMP Python integration. This would allow us to have faster data transfer, possibly more. We could start testing with this application, then try, for example, running from Graph Builder, where we could trigger the functions, and combining different endpoints.
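The DoE-style stack building mentioned above, generating all parameter combinations instead of adding items by hand, can be sketched as a full-factorial cross. The stack-item shape and the parameter names (modelled on the HDBSCAN options used in the demos) are illustrative assumptions.

```python
import itertools

def build_stack(method, grid):
    """Cross all parameter levels into a list of stack items,
    one complete parameter set per item, ready to send in sequence."""
    names = list(grid)
    stack = []
    for levels in itertools.product(*(grid[n] for n in names)):
        stack.append({"method": method, "parameters": dict(zip(names, levels))})
    return stack

# What Philip built by hand (25/50/75/100 crossed with min samples 0/2)
# becomes a one-call full-factorial grid of 4 x 2 = 8 stack items.
stack = build_stack(
    "hdbscan",
    {"min_cluster_size": [25, 50, 75, 100], "min_samples": [0, 2]},
)
```

Sending the generated stack one item after another, then comparing the runs via the stored metadata, is exactly the workflow the add-in already supports; the DoE step only automates filling the stack.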

For example, we could first run t-SNE on the input data and then automatically cluster the t-SNE results. Then, of course, we're always adding new endpoints if we find out what we want to have. The last slide is that we will be sharing a small sample of the code. There will be a JMP file with the JMP script, a Python script, and installation instructions. You can try a quite simple user interface, which will send data to a local server, and you will get the data back. It also has some ideas in the instructions sheet that you can try to implement if you're interested in trying this approach for the JMP Python integration. That's it from us. Thank you.

Thank you also from me. If you need to contact us, you can do so via the community.
