Data Mining for Solutions in Root Cause Investigations

Attention all Manufacturing Engineers!

 

This is a companion activity to the Mastering JMP session Getting a Handle on Optimizing Day-to-Day Process Manufacturing Operations.  

 

During the presentation, I show a method for using data mining to find the root cause of process excursions.  From personal experience, I can tell you this method works very well: it saves time and resources when investigating possible root causes and helps the team focus on the most likely prospects.  As a follow-on to the discussion and a DIY exercise, here is another example you can try.

 

So here is the scenario.  During a routine process review, an unusual pattern was discovered between Process Monitor 2 and Process Control 5: a diagonal cluster of points that the team wants to explain.  The individual distributions for Monitor 2 and Control 5 look completely normal, so the root cause remains a mystery.  Your job is to use data mining to find the root cause of this issue.  The steps are similar to those in the Mastering JMP discussion, with a couple of twists to work through.

 

Please feel free to reply to this post, and I will accept the first correct solution.  However, as a spoiler alert, please don't read the comments until you have tried to solve it.  Also, if you have used this technique to solve real-world problems, it would be great to hear about that.  Keep in mind that the technique may not uncover an unambiguous answer; it may only narrow the list to the most promising prospects, which is still a win in my book.

 

Good luck, have fun and happy sleuthing!

Hyde

 

Visualize the Issue

HydeMiller_0-1738347487102.png

 

 

Individual Distributions Look Normal

HydeMiller_1-1738347487763.png
Comments
jthi

This post isn't really meant as an "answer", but rather as a post for discussion.

 

When I worked in a manufacturing quality role, I followed a similar process: visualize the situation, identify some sort of groups, mark them, and try to predict or explain them. Luckily, it is pretty easy to try different methods in JMP, such as clustering or partition analysis, once you have managed to create the groups.

 

I also like Sankey plots for visualizing situations like this, as long as there aren't too many columns. By default it is a bit difficult to change columns in JMP, but you might already have a good idea of the potential cause before you even start with the Sankey plot.

jthi_13-1738414543542.png

jthi_0-1738414890285.png
I ended up creating the Enhanced Sankey Plot add-in to make Sankey plots in Graph Builder easier to use for EDA (exploratory data analysis) in cases like this. It also "supports" using Partition or Predictor Screening/Bootstrap Forest to assist in the analysis by auto-ordering the columns.
jthi_0-1738408722325.png

After auto-ordering, you would most likely end up with something like this:
jthi_1-1738408830688.png
It is usually a good idea to set some sort of ordering (just Batch in this case) so you can immediately see that it isn't anything time related.

In this hands-on activity, picking the points with just the lasso tool can be a bit difficult. Luckily, the unusual points fall along a line, so we can create a new column holding the ratio between Monitor 2 and Control 5 (using a quick formula) to make those values easier to select.

jthi_2-1738408959025.png

And we can immediately see a high spike in the column header graph.

jthi_3-1738408976489.png
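
For anyone who wants to see the ratio idea outside JMP, here is a minimal Python sketch. It is not the quick formula from the screenshots. The rows, the band limits, and the helper names `add_ratio` and `flag_band` are all made up for illustration. It also assumes the diagonal points have a roughly constant Monitor 2 / Control 5 ratio, which only holds if the line passes near the origin.

```python
# Hypothetical data: the first three rows lie on a line through the origin,
# the last three are ordinary scatter. None of these values come from the
# activity's data table.
rows = [
    {"Monitor 2": 2.0, "Control 5": 1.0},
    {"Monitor 2": 4.0, "Control 5": 2.0},
    {"Monitor 2": 6.0, "Control 5": 3.0},
    {"Monitor 2": 5.0, "Control 5": 1.0},
    {"Monitor 2": 3.0, "Control 5": 4.0},
    {"Monitor 2": 7.0, "Control 5": 2.0},
]


def add_ratio(rows, num="Monitor 2", den="Control 5"):
    """Return copies of the rows with an extra 'Ratio' column (num / den)."""
    out = []
    for r in rows:
        # Guard against division by zero; NaN never falls inside a band.
        ratio = r[num] / r[den] if r[den] != 0 else float("nan")
        out.append({**r, "Ratio": ratio})
    return out


def flag_band(rows, low, high, label_col="Selection"):
    """Label rows whose Ratio lies in [low, high] as 'OOC', all others 'good'."""
    return [
        {**r, label_col: "OOC" if low <= r["Ratio"] <= high else "good"}
        for r in rows
    ]


# The band [1.9, 2.1] stands in for the spike seen in the ratio histogram.
labeled = flag_band(add_ratio(rows), 1.9, 2.1)
ooc_count = sum(1 for r in labeled if r["Selection"] == "OOC")
```

Rows on the line pile up at one ratio value, which is why the spike shows up in the column header graph and why a narrow band picks them out more cleanly than a lasso.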

At this point we can also take a quick look over our other column header graphs to see if anything looks alarming

jthi_4-1738409140641.png

Maybe Factory, but we still have values in multiple categories, so it won't explain all of our results.

 

If we want a more accurate selection, we can use JMP's interactivity, for example with Distribution + Graph Builder.

jthi_5-1738409654150.png

We can select those rows and create a new column with Name Selection in Column, or just use the new ratio column as our response in Partition analysis / Predictor Screening (it won't be as obvious if we just use the ratio column, but it is a very quick way of checking whether we have something). I think Predictor Screening and Partition are good starting points, but if there aren't too many columns we can also just go with the visual option, using Column Switcher in Graph Builder.


jthi_7-1738409932485.png
jthi_6-1738409900263.png
jthi_8-1738409969894.png
jthi_9-1738410420171.png
These do suggest taking a closer look at Factory, and especially at Internal.
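
As a rough illustration of what "screening" the columns means, here is a small Python sketch that ranks categorical columns by how strongly they are associated with the OOC/good label. To be clear, this is not what JMP's Predictor Screening does (that platform is based on Bootstrap Forest). I have swapped in a plain Pearson chi-square statistic because it is easy to compute by hand. The rows, the `Shift` column, and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical labeled rows: every OOC row is from the Internal factory,
# but one good row is Internal too, so Factory alone is not a full answer.
rows = [
    {"Factory": "Internal", "Shift": "A", "Selection": "OOC"},
    {"Factory": "Internal", "Shift": "B", "Selection": "OOC"},
    {"Factory": "Internal", "Shift": "A", "Selection": "OOC"},
    {"Factory": "Internal", "Shift": "B", "Selection": "OOC"},
    {"Factory": "Internal", "Shift": "A", "Selection": "good"},
    {"Factory": "External", "Shift": "B", "Selection": "good"},
    {"Factory": "External", "Shift": "A", "Selection": "good"},
    {"Factory": "External", "Shift": "B", "Selection": "good"},
]


def chi_square_score(rows, column, label_col="Selection"):
    """Pearson chi-square statistic of the column-by-label contingency table."""
    n = len(rows)
    level_counts = Counter(r[column] for r in rows)
    label_counts = Counter(r[label_col] for r in rows)
    cell_counts = Counter((r[column], r[label_col]) for r in rows)
    stat = 0.0
    for level, n_level in level_counts.items():
        for label, n_label in label_counts.items():
            expected = n_level * n_label / n
            observed = cell_counts.get((level, label), 0)
            stat += (observed - expected) ** 2 / expected
    return stat


def rank_columns(rows, columns, label_col="Selection"):
    """Return (column, score) pairs sorted from strongest to weakest association."""
    scores = {c: chi_square_score(rows, c, label_col) for c in columns}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


ranking = rank_columns(rows, ["Shift", "Factory"])
```

With this toy data, Factory ranks above Shift, which matches the kind of hint the platforms give: a column worth a closer look, not yet a complete explanation.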

Response Screening can also be a pretty nice platform from time to time (even better if you use a script to run Fit Y by X automatically based on selections in the table; my script doesn't work well in this case).

jthi_11-1738411847648.png

 

chrsmth

I didn't do anything beyond what was taught in the webinar. I highlighted the Control 5 / Monitor 2 "strange line" data in the Graph Builder chart using the Lasso tool, then used that selection to make a new column, naming the highlighted rows OOC and the other rows good.  Then I ran a Partition analysis on all of the remaining factors and found that almost all of my OOC data points came from the Internal Factory.  I went back to Graph Builder, added a Local Data Filter for "Factory", and verified that the Control 5 / Monitor 2 "strange line" comes from the Internal Factory when the Control 1 measurement is >57.3.
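
For readers curious about how a partition step arrives at a cut like "Control 1 > some value", here is a minimal Python sketch of a single split search. It is only an illustration under assumptions: JMP's Partition platform uses its own split criterion, while this sketch uses weighted Gini impurity as a stand-in. The rows and the function names are made up, and the toy cut point it finds is not the 57.3 from the real table.

```python
# Hypothetical rows: within the Internal factory, high Control 1 values are OOC.
rows = [
    {"Factory": "Internal", "Control 1": 50.0, "Selection": "good"},
    {"Factory": "Internal", "Control 1": 54.0, "Selection": "good"},
    {"Factory": "Internal", "Control 1": 56.0, "Selection": "good"},
    {"Factory": "Internal", "Control 1": 58.0, "Selection": "OOC"},
    {"Factory": "Internal", "Control 1": 60.0, "Selection": "OOC"},
    {"Factory": "Internal", "Control 1": 62.0, "Selection": "OOC"},
    {"Factory": "External", "Control 1": 59.0, "Selection": "good"},
    {"Factory": "External", "Control 1": 61.0, "Selection": "good"},
]


def gini(labels):
    """Gini impurity of a list of 'OOC'/'good' labels (0.0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(1 for y in labels if y == "OOC") / n
    return 2 * p * (1 - p)


def best_threshold(rows, column, label_col="Selection"):
    """Return (cut, weighted_gini) for the best 'column > cut' split."""
    values = sorted({r[column] for r in rows})
    n = len(rows)
    best = (None, float("inf"))
    # Candidate cuts are midpoints between consecutive distinct values.
    for lo, hi in zip(values, values[1:]):
        cut = (lo + hi) / 2
        left = [r[label_col] for r in rows if r[column] <= cut]
        right = [r[label_col] for r in rows if r[column] > cut]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (cut, score)
    return best


# Filter to the Internal factory first (like a Local Data Filter), then split.
internal = [r for r in rows if r["Factory"] == "Internal"]
cut, score = best_threshold(internal, "Control 1")
```

Searching for the split only within the Internal rows mirrors the two-level answer above: first the factory, then the Control 1 threshold inside that factory.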

@chrsmth Thanks for trying this so fast!
