Data Mining for Solutions in Root Cause Investigations

Attention all Manufacturing Engineers!

This is a companion activity to the Mastering JMP session Getting a Handle on Optimizing Day-to-Day Process Manufacturing Operations.

The presentation describes a method for using data mining to find the root cause of process excursions. From personal experience, I can tell you this method works very well: it saves time and resources by helping the team focus on the most likely prospects instead of chasing every possible cause. As a follow-on to the discussion and a DIY exercise, here is another example you can try.

So here is the scenario. During a routine process review, an unusual pattern was discovered between Process Monitor 2 and Process Control 5: a diagonal cluster of points that stands apart from the rest of the data. The team wants to find the root cause of this pattern. The individual distributions for Monitor 2 and Control 5 look completely normal, so the root cause remains a mystery. Your job is to use data mining to find it. The steps are similar to those in the Mastering JMP discussion, with a couple of twists to work through.

Please feel free to reply to this post, and I will accept the first correct solution. However, as a spoiler alert, please don't read the comments until you have tried to solve it yourself. Also, if you have used this technique to solve real-world problems, it would be great to hear about that. Keep in mind that the technique may not uncover an unambiguous answer; it may only narrow the list to the most promising prospects, which is still a win in my book.

Good luck, have fun and happy sleuthing!

Hyde

Visualize the Issue

Individual Distributions Look Normal

jthi · 02-01-2025

This post isn't really meant as an "answer" but rather as a starting point for discussion.

When I worked in a manufacturing quality role, I followed a similar process: visualize the situation, identify some sort of groups, mark them, and then try to predict or explain them. Luckily, it is pretty easy to try different methods in JMP, such as clustering or partition analysis, once you have managed to create the groups.

I also like the Sankey Plot for visualizing situations like this if there aren't too many columns. By default it is a bit difficult to change columns in JMP, but you might already have a good idea of the potential cause before you even start with the Sankey Plot.

I ended up creating the Enhanced Sankey Plot add-in to make the Sankey Plot in Graph Builder easier to use for EDA (exploratory data analysis) in cases like this. It also "supports" using Partition or Predictor Screening (bootstrap forest) to assist in the analysis by auto-ordering the columns.

After auto-ordering, you would most likely end up with something like this.
It is usually a good idea to set some sort of ordering so you can immediately see whether the issue is time related. Here it isn't; the only ordering column in this case is Batch.

In this Hands-On Activity, picking the points with just the lasso tool can be a bit difficult. Luckily, the unusual points show up as a line, so we can use a quick formula to create a new column holding the ratio between Monitor2 and Control5, which makes the values easier to select.

We can immediately see a high spike in the column header graph.

At this point we can also take a quick look at the other column header graphs to see if anything looks alarming.

Maybe Factory, but we still have values in multiple categories, so it won't explain all of our results.

If we want a more accurate selection, we can use JMP's interactivity, for example with Distribution + Graph Builder.
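For anyone who wants to try the same ratio idea outside JMP, here is a minimal Python sketch. It is only an illustration, not how JMP does it: the toy rows and their values, the `add_ratio_flag` name, and the `bin_width` setting are all assumptions made for this example. It computes the Monitor2/Control5 ratio, finds the most populated ratio bin (the "spike"), and labels the rows in that bin.

```python
from collections import Counter

# Hypothetical toy rows; the values are made up for illustration only.
rows = [
    {"Monitor2": 20.2, "Control5": 10.0},   # on the diagonal line
    {"Monitor2": 24.12, "Control5": 12.0},  # on the diagonal line
    {"Monitor2": 30.45, "Control5": 15.0},  # on the diagonal line
    {"Monitor2": 8.2, "Control5": 10.0},
    {"Monitor2": 13.1, "Control5": 10.0},
    {"Monitor2": 9.7, "Control5": 10.0},
    {"Monitor2": 11.3, "Control5": 10.0},
    {"Monitor2": 16.4, "Control5": 10.0},
]

def add_ratio_flag(rows, bin_width=0.05):
    """Add 'Ratio' and 'Flag' keys; Flag marks rows in the fullest ratio bin."""
    for r in rows:
        # Assumes Control5 is never zero in the data.
        r["Ratio"] = r["Monitor2"] / r["Control5"]
    # Histogram of ratios: bin index -> number of rows in that bin.
    bins = Counter(int(r["Ratio"] // bin_width) for r in rows)
    spike_bin, _ = bins.most_common(1)[0]
    for r in rows:
        in_spike = int(r["Ratio"] // bin_width) == spike_bin
        r["Flag"] = "OOC" if in_spike else "good"
    return rows

add_ratio_flag(rows)
flagged = [r for r in rows if r["Flag"] == "OOC"]
print(len(flagged), "of", len(rows), "rows flagged")
```

On real data the bin width needs tuning, and the fullest bin is only a useful flag if the spike really is taller than the bulk of the ordinary ratios, so check it against the scatter plot before trusting it.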

We can select those points and create a new column with Name Selection in Column, or just use the new ratio column as the response in Partition or Predictor Screening. The result won't be as obvious if we use the ratio column directly, but it is a very quick way of checking whether we have something. I think Predictor Screening and Partition are good starting points, but if there aren't too many columns we can also just go with the visual option, using Column Switcher in Graph Builder.

These results point toward taking a closer look at Factory, and especially at Internal.

Response Screening can also be a pretty nice platform from time to time (even better if you use a script to run Fit Y by X automatically based on selections in the table, although my script doesn't work well in this case).

chrsmth · 02-04-2025

I didn't do anything beyond what was taught in the webinar. I highlighted the Control 5 / Monitor 2 "strange line" data in the Graph Builder chart using the Lasso tool, then used that selection to make a new column, naming the highlighted rows OOC and the other rows good. Then I ran a Partition analysis on all of the remaining factors and found that almost all of my OOC data points came from the Internal Factory. I went back to Graph Builder, added a Local Data Filter for "Factory", and verified that the Control 5 / Monitor 2 "strange line" came from the Internal Factory when the Control 1 Measurement was >57.3.
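The last step, finding a cut on a continuous factor within one factory and then checking it, can be sketched in Python as follows. This is a simplified stand-in for a single Partition split, not JMP's algorithm. The toy rows, the `Control1` key, and the `best_cut` helper are assumptions for illustration, and the cut it finds on this made-up data is not meant to reproduce the 57.3 value above.

```python
def best_cut(rows, column, target="Flag", bad="OOC"):
    """Return the cut on `column` that misclassifies the fewest rows
    when 'value > cut' is read as a prediction of `bad`."""
    values = sorted({r[column] for r in rows})
    # Candidate cuts are midpoints between neighbouring distinct values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(cut):
        return sum((r[column] > cut) != (r[target] == bad) for r in rows)

    return min(candidates, key=errors)

# Hypothetical rows from the Internal factory only; values are made up.
internal = [
    {"Control1": 52.0, "Flag": "good"},
    {"Control1": 55.0, "Flag": "good"},
    {"Control1": 56.0, "Flag": "good"},
    {"Control1": 59.0, "Flag": "OOC"},
    {"Control1": 61.0, "Flag": "OOC"},
    {"Control1": 64.0, "Flag": "OOC"},
]

cut = best_cut(internal, "Control1")

# Verification, similar in spirit to applying a data filter:
# of the rows above the cut, what share is actually OOC?
matched = [r for r in internal if r["Control1"] > cut]
purity = sum(r["Flag"] == "OOC" for r in matched) / len(matched)
print(cut, purity)
```

A purity well below 1.0, or many OOC rows left below the cut, would suggest the rule is incomplete and another factor is involved.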

gail_massari · 02-04-2025

@chrsmth Thanks for trying this so fast!
