charles_chen
Level II

Developing a JMP-Based Six Sigma DMAIC Training Curriculum for Data Scientists (2021-US-30MP-872)

Level: Intermediate

 

Charles Chen, Master Black Belt Q&R, Applied Materials
Mason Chen, Black Belt Student, Stanford OHS

 

Traditional Six Sigma curricula that include DMAIC, DFSS and Lean were not developed specifically and effectively for today’s AI data scientists. This paper demonstrates an innovative Six Sigma training curriculum for data scientists, using several objectives:

  1. Adopt modern JMP Text Explorer and data mining techniques for root cause analysis and problem solving.
  2. Integrate various JMP platforms holistically to analyze pattern recognition or discover the insights.
  3. Enhance predictive modeling capability through neural, partition, and principal component analysis.
  4. Utilize modern quality and process platforms such as the goal plot and model-driven multivariate SPC.
  5. Map these modern JMP platforms into the Six Sigma DMAIC/DFSS/Lean framework for Six Sigma Project execution.  

This innovative Six Sigma curriculum is applicable to industrial professionals, both for managers and individual contributors who make critical decisions in big data AI business. This curriculum is not just for data scientists; it is also a powerful tool for design and process engineers, quality and reliability engineers, supply chain engineers, business analysts, statisticians, and marketers who want to become reliable decision makers and true project leaders.  

 

 

Auto-generated transcript...

 


Speaker

Transcript

Jason Wiggins, JMP Okay, go ahead when you're ready.
  Okay.
  Hello everybody.
  My name is Charles Chen. Today we are going to talk about an interesting topic: a STEAMS and DMAIC curriculum for data scientists using JMP 16.
  We are three authors. Mason Chen is the first, I'm Charles, the second, and Patrick is the third. We are from the Stanford Online High School STEAMS Club. Mason Chen is a high school junior and the club leader, and Patrick and I are the advisors for the club.
  For the project overview, the opportunity statement is that the traditional Six Sigma DMAIC process, combined with the interdisciplinary STEAMS methodology,
  can help data scientists make a greater contribution in the field of big data. Our project objective is to develop a Six Sigma data science training curriculum for everyone from high schoolers
  all the way to industry professionals by mapping the JMP 16 platforms onto the DMAIC phases.
  Before we start, the audience may be curious: Six Sigma is for professionals, and data scientists are also professionals.
  So why do we also mention high schoolers? Here we want to use our first author, Mason Chen, as a case study.
  His experience started when he was 10 years old. He began data science training and worked toward IBM SPSS and Minitab certification.
  When he was 11 years old, he was certified in IBM Modeler data mining. Then he moved to JMP, starting at about 12 or 13 years old.
  After that came JMP STIPS, the DOE certification exam, and JMP 16 text mining. This year, he also finished college courses in linear algebra
  and data science through our program. So today, based on his experience and that of the advisors, we will introduce how we can use JMP 16 to
  create a Six Sigma data science program. Mason is also moving to R in the future. He is trying to learn R, and he can also learn the JSL language.
  And that's Mason's picture from when he attended the ASA conference two years ago.
  Our project tries to connect STEAMS and DMAIC. STEAMS is the program Mason founded around 2017, building on the
  traditional STEM program. So what's the difference? Science, technology, engineering, and mathematics are still the same.
  But we added artificial intelligence and statistics, [so it] becomes STEAMS. Data science is a popular combination of artificial intelligence, mathematics, and statistics.
  The traditional Six Sigma DMAIC maps fairly well to the STEAMS methodology. You can see a lot of close thinking. (??) Some other problem-solving tools
  from DMAIC also map fairly well to problem solving in science, or even engineering. With data science, they
  may promote even more from STEAMS to DMAIC. That is how we think we can develop the Six Sigma data science program for high schoolers based on the STEAMS methodology.
  As the first step, once JMP 16 was released earlier this spring, we looked at all the JMP 16 platforms and identified
  which platforms could be good for data scientists. We based this on the three Vs of big data: volume, variety, and velocity.
  Based on the three Vs, we asked which tools or platforms fit well with a data science program, such as Graph Builder,
  Text Explorer, Tables, predictive modeling, screening, multivariate methods, clustering, quality methods, and problem solving. We are going to map all these tools to the DMAIC phases.
  Through learning the JMP 16 platforms, we also try to understand the [inaudible] behind these JMP tools.
  We found that data science statistics may cover the following modules. The first is the traditional DMAIC for quality and reliability engineering, such as MSA or SPC.
  The next is Design for Six Sigma, mostly for modeling: DOE modeling, Monte Carlo simulation, and robust tolerance.
  Then we found that linear algebra is very important for data science, such as principal component analysis or singular value decomposition. The next is data mining, including classification, neural networks, partition, or random forest.
  Time series or forecasting is also very important for data science to handle time (?) data, for example with the ARIMA model.
  Text mining is good for unstructured data. Survey and consumer research is also very important for understanding the voice of the customer
  and for doing market segmentation. We found that all of these statistics are critical to know in order to utilize the JMP platforms.
  Then we try to map
  all these JMP 16 platforms to the traditional Lean Six Sigma Black Belt (BB) modules. We identified the traditional BB modules and developed three
  training programs. Program A is the more traditional DMAIC Black Belt training program, with the JMP 16 platforms associated with it. The second is the more modern data mining
  [inaudible] categorical data. So we try to speed (?) the JMP 16 platforms to each training program. Because the students or trainees may have a particular interest, we can customize the program to
  their particular interest and field.
  From now on, we'll map the tools and concepts to the DMAIC phases. The first phase is the Define phase. The main concept of the Define phase is to define the problem statement, including the voice of the customer and the voice of the business.
  We also need to define the project goal and objective, especially the Critical to Quality (CTQ) characteristics, in Six Sigma language.
  We also need to define the success criteria and any spec limits.
  Also in the Define phase, team building is very important: forming, storming, norming, and performing.
  The associated JMP 16 platforms include building the database, where Query Builder is very powerful. Data visualization is important.
  Data mining tries to cluster the customers. Market research, like consumer research, tries to do market segmentation, which is very important. On the right-hand side is a simple example of how we do market segmentation using clustering
  and affinity analysis. We can group the customers and then set up a strategy for the marketing priorities.
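
As a rough illustration of the customer-grouping idea above, here is a minimal k-means sketch in plain Python. It is not JMP's clustering implementation. The customer data (annual spend, visits per year), the starting centroids, and the function name `kmeans` are all made up for the example.

```python
from math import dist

def kmeans(points, centroids, iterations=20):
    """Plain k-means: assign each point to its nearest centroid,
    then move each centroid to the mean of its members."""
    labels = []
    for _ in range(iterations):
        labels = [
            min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            for p in points
        ]
        new_centroids = []
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster where it was
        if new_centroids == centroids:  # assignments have stopped changing
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical customers: (annual spend, visits per year)
customers = [(200, 2), (220, 3), (250, 2), (900, 12), (950, 15), (1000, 14)]
labels, centers = kmeans(customers, centroids=[customers[0], customers[3]])
print(labels)   # segment index for each customer
print(centers)  # segment centers, usable for setting marketing priorities
```

In practice the columns would usually be standardized first, since here the spend column dominates the distance.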
  The second phase is the Measure phase. The main focus is on process capability and process stability, especially for larger-scale manufacturing production.
  We found three
  powerful tools in JMP. The first is the goal plot. The goal plot can plot the lot-to-lot process capability in two dimensions.
  Through the different colored zones, we know that [inaudible] and it's not suitable.
  The process performance plot in the middle is also very powerful because it combines process capability and stability. If you pass both criteria, you'll be in the upper left, which means green. If you fail both criteria, you'll be in the lower left or lower right.
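
To make the two criteria concrete, here is a small sketch that computes a capability index (Ppk) and a stability ratio for one process. It assumes individual measurements, a within sigma estimated from the average moving range, and a stability ratio defined as overall sigma divided by within sigma. The data and spec limits are invented, and the exact definitions JMP uses should be checked in its documentation.

```python
from statistics import mean, stdev

def capability_summary(values, lsl, usl):
    """Return (Ppk, stability ratio) for a list of individual measurements."""
    center = mean(values)
    overall_sigma = stdev(values)  # long-term (overall) variation
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    within_sigma = mean(moving_ranges) / 1.128  # 1.128 is d2 for a moving range of 2
    ppk = min(usl - center, center - lsl) / (3 * overall_sigma)
    stability_ratio = overall_sigma / within_sigma  # near 1 suggests a stable process
    return ppk, stability_ratio

# Hypothetical measurements with assumed spec limits of 9 and 11
measurements = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 10.1, 9.9]
ppk, stability = capability_summary(measurements, lsl=9.0, usl=11.0)
print(f"Ppk = {ppk:.2f}, stability ratio = {stability:.2f}")
```

Plotting the stability ratio on one axis and Ppk on the other gives the same kind of two-criteria view the process performance plot provides.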
  On the right-hand side is the process history explorer, which lists all the historical
  performance. Through this kind of list, you may identify what kinds of factors are associated with poor yield.
  The next phase is the Analyze phase. In the Analyze phase, we do root cause analysis, summarize complex data sets, visualize and discover patterns and insights, and isolate and screen for important factors.
  Based on these Analyze subjects, we identified the fishbone diagram, Tabulate, Text Explorer, multivariate-based methods, clustering, and categorical response analysis.
  For this slide, we want to introduce the Pareto plot. The JMP 16 Pareto plot can be two-dimensional, so it can find more combinations (??) or recognize patterns among different kinds of factors.
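
The two-dimensional Pareto idea can be sketched as counting defects by a pair of factors and sorting by frequency with a cumulative percentage. The defect types, line names, and the helper `pareto_table` below are hypothetical and only illustrate the calculation, not the JMP platform itself.

```python
from collections import Counter

def pareto_table(records):
    """Rows of (category, count, cumulative percent), largest count first."""
    counts = Counter(records)
    total = sum(counts.values())
    rows, running = [], 0
    for key, n in counts.most_common():
        running += n
        rows.append((key, n, round(100 * running / total, 1)))
    return rows

# Hypothetical defect log: (defect type, production line)
defects = (
    [("scratch", "Line A")] * 5
    + [("scratch", "Line B")] * 1
    + [("void", "Line B")] * 3
    + [("stain", "Line A")] * 1
)
rows = pareto_table(defects)
for (defect, line), count, cum_pct in rows:
    print(f"{defect:8s} {line:7s} {count:3d} {cum_pct:6.1f}%")
```

Counting on the pair rather than on each factor separately is what lets a combination such as one defect type on one line stand out.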
  The fishbone diagram has always been very powerful and useful for real-time working sessions. You can
  design or customize the fishbone diagram.
  For data summarization, Tabulate is the one we highly recommend. It is like
  the Excel pivot table, [inaudible] convenient and powerful
  because it can also add descriptive statistics.
  In the middle, Text Explorer is very powerful for analyzing a text database. It can search for keywords and phrases, and then we can even convert the text mining into data mining.
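
One way to picture "converting text mining into data mining" is a document-term matrix: each comment becomes a row of term counts that ordinary modeling tools can use. The sketch below is a simplified stand-in, not Text Explorer's algorithm. The stop-word list, the sample comments, and the function names are assumptions for the example.

```python
import re
from collections import Counter

# A tiny, assumed stop-word list; a real analysis would use a fuller one
STOP_WORDS = {"the", "a", "is", "on", "and", "of", "to"}

def term_counts(text):
    """Lower-case the text, split it into words, and drop stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)

def document_term_matrix(documents):
    """Return (vocabulary, matrix) with one row of term counts per document."""
    counts = [term_counts(doc) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, matrix

# Hypothetical maintenance comments
comments = ["The pump seal is leaking", "Leaking seal on the valve", "Valve stuck"]
vocabulary, matrix = document_term_matrix(comments)
print(vocabulary)
for row in matrix:
    print(row)
```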
  On the right-hand side is a platform for categorical data, to find associations between different kinds of factors.
  The different colors represent different kinds of variables. Any pair of points that are close to each other and far away from the origin indicates a high association, and that may give you some insight about these variables.
  The next phase is the Improve phase.
  For the Improve phase, we try to build predictive models, design new experiments as needed, and improve the production quality.
  Regarding the JMP 16 platforms for predictive modeling, we highly recommend the Prediction Profiler or the Custom Profiler. For DOE, we can
  have different kinds of designs: custom DOE, mixture DOE, or even DOE augmentation. For specialized models, like machine learning, we have the JMP neural network, the partition model, and different kinds of screening.
  For survey and consumer research,
  JMP has the choice model and also the maximum difference (MaxDiff) design model.
  For design optimization,
  on the left-hand side is the Prediction Profiler. We can do sensitivity analysis, and we can run a Monte Carlo simulation to simulate the non-conforming [inaudible]
  percentage.
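
The Monte Carlo idea can be sketched as follows: draw random inputs, push them through a fitted model, and count how often the response falls outside the spec limits. The `thickness` model, the input distributions, and the limits are invented for illustration and do not come from the talk.

```python
import random

def simulate_defect_rate(model, input_specs, lsl, usl, runs=20000, seed=1):
    """Estimate the fraction of simulated responses outside [lsl, usl].

    input_specs is a list of (mean, standard deviation) pairs, one per
    model input, each sampled from a normal distribution.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    out_of_spec = 0
    for _ in range(runs):
        inputs = [rng.gauss(mu, sigma) for mu, sigma in input_specs]
        y = model(*inputs)
        if y < lsl or y > usl:
            out_of_spec += 1
    return out_of_spec / runs

def thickness(temperature, pressure):
    """Hypothetical fitted model for a response, for example film thickness."""
    return 50 + 0.2 * (temperature - 300) + 1.5 * (pressure - 10)

rate = simulate_defect_rate(thickness, [(300, 5), (10, 1)], lsl=45, usl=55)
print(f"Estimated non-conforming rate: {rate:.2%}")
```

Tightening an input's standard deviation and rerunning shows which factor's variation drives the defect rate, which is the sensitivity question the Profiler addresses.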
  In the middle, if we have many factors in the model, the Custom Profiler is a good choice for finding the optimal model among so many factors.
  On the right-hand side is the group orthogonal supersaturated DOE. This is a very powerful DOE to analyze for an option to the downstream. It uses blocking factors in order to minimize the number of DOE runs and reduce the cost.
  For the predictive models, we recommend the neural network, with very powerful
  transformations, to try to find the best model and
  get a higher training and validation
  fit. The partition model is a binary split. It can split the data set into different categories, so it can
  highlight and also conclude (??) the major contributions.
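
A single binary split can be sketched like this: try every cut point on one predictor and keep the one that leaves the smallest total squared error in the two groups. This uses the sum of squared errors, a common regression-tree criterion, which may differ from the criterion JMP's Partition platform uses. The data and function names are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def best_split(x, y):
    """Return (cut point, total SSE) for the best binary split on x."""
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # cannot cut between identical x values
        left = [p[1] for p in pairs[:i]]
        right = [p[1] for p in pairs[i:]]
        cost = sse(left) + sse(right)
        cut = (pairs[i - 1][0] + pairs[i][0]) / 2
        if best is None or cost < best[1]:
            best = (cut, cost)
    return best

# Hypothetical predictor (x) and response (y)
x = [1, 2, 3, 10, 11, 12]
y = [5, 6, 5, 20, 21, 19]
cut, cost = best_split(x, y)
print(f"Split at x < {cut}, remaining SSE = {cost:.2f}")
```

A full tree repeats the same search inside each resulting group and across all predictors.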
  For design optimization, we also recommend some of the screening platforms available in JMP 16. The first one, on the left-hand side, is response screening.
  JMP uses the
  FDR, the false discovery rate, to determine how good your prediction modeling is. If your curve stays low,
  that means you have good prediction modeling. Where it goes up, that means for this portion you don't have good prediction capability.
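
For readers unfamiliar with the false discovery rate, the sketch below applies the Benjamini-Hochberg procedure, the standard way to control FDR across many tests. It is shown here only to illustrate the concept and is not claimed to reproduce the response screening output. The p-values are made up.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return a list of booleans: True where the test is declared significant
    while controlling the false discovery rate at the given level."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        # Keep the largest rank whose p-value is under its stepped threshold
        if p_values[i] <= rank * fdr / m:
            cutoff_rank = rank
    significant = [False] * m
    for i in order[:cutoff_rank]:
        significant[i] = True
    return significant

# Hypothetical p-values from screening five responses
p_values = [0.6, 0.001, 0.041, 0.008, 0.039]
flags = benjamini_hochberg(p_values, fdr=0.05)
print(flags)
```

Note that 0.039 and 0.041 would pass an unadjusted 0.05 cutoff but are not flagged once the number of tests is taken into account.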
  The middle one is process screening. You can see all the process parameters. They could be input parameters or output deliverables.
  JMP uses green and red to identify an upward or downward shift. If you see more green or red, that means your process is not very stable.
  The right-hand side is also very powerful. It ranks the contributions of all the predictors, so it gives you the top few first, what we call the vital few predictors. You can then find root causes or find solutions.
  For consumer research, JMP has the choice design
  to help understand how consumers pick their preferred product or choice.
  The maximum difference (MaxDiff) design is the other survey design. In the survey, you pick the most and the least preferred items, and JMP can run the model to rank your preferences.
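
A simple way to see how most/least choices turn into a ranking is a best-minus-worst count per item. This counting score is a simplification and is not the statistical model JMP fits. The survey responses and the function name `best_worst_scores` are hypothetical.

```python
from collections import Counter

def best_worst_scores(responses):
    """Rank items by (times chosen best - times chosen worst) / times shown."""
    best = Counter(r["best"] for r in responses)
    worst = Counter(r["worst"] for r in responses)
    shown = Counter(item for r in responses for item in r["shown"])
    scores = {item: (best[item] - worst[item]) / shown[item] for item in shown}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical survey: each question shows three phone features
responses = [
    {"shown": ["price", "battery", "camera"], "best": "battery", "worst": "camera"},
    {"shown": ["price", "battery", "screen"], "best": "battery", "worst": "price"},
    {"shown": ["camera", "screen", "price"], "best": "screen", "worst": "camera"},
]
ranking = best_worst_scores(responses)
for item, score in ranking:
    print(f"{item:8s} {score:+.2f}")
```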
  The last phase is the Control phase: [inaudible] scale up process control, sustain improvement over a long period, upstream to downstream, and multivariate process control.
  The JMP platforms are the classical control chart, the time sensitive control chart, and the multivariate control chart. For consumer research, there is multiple factor analysis. There is also time series analysis, like decomposition, smoothing, and the ARIMA model and forecast.
  For the multivariate control charts, on the left-hand side we can find change point detection. This is the point that gives you the biggest contrast before and after. So these are [inaudible] the root cause analysis about what happened, or when the
  biggest change point occurred. In the middle is the T-square
  model-driven control chart. This gives you the failure mode
  decomposition, showing which parameters contribute most to the out-of-control (OOC) point. On the right-hand side is multiple factor analysis,
  based on eigenvalue and eigenvector analysis. This is like an affinity diagram. It can group similar factors together based on the eigenvalues and eigenvectors.
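
For the T-square chart mentioned above, the underlying statistic can be sketched with the classical Hotelling T-square for two variables. This is the textbook formula on raw variables, while the model-driven chart works from a fitted model, so treat it as an illustration only. The reference data and test points are invented.

```python
from statistics import mean

def hotelling_t2(reference, point):
    """T-square distance of a two-variable point from a reference sample."""
    xs, ys = zip(*reference)
    mx, my = mean(xs), mean(ys)
    n = len(reference)
    # Sample variances and covariance of the reference data
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in reference) / (n - 1)
    det = sxx * syy - sxy ** 2
    dx, dy = point[0] - mx, point[1] - my
    # d' S^-1 d written out for the 2x2 case
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det

# Hypothetical in-control history of two positively correlated parameters
reference = [(1, 2), (2, 3), (3, 5), (4, 4), (5, 6)]
with_pattern = hotelling_t2(reference, (5, 6))     # follows the correlation
against_pattern = hotelling_t2(reference, (5, 2))  # breaks the correlation
print(with_pattern, against_pattern)
```

Both points are within the range seen for each variable on its own, but the one that breaks the correlation gets a much larger T-square, which is what a univariate chart would miss.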
  For the time series analysis, on the left-hand side are the model diagnostics. They can identify the trend, seasonal, and cyclical components.
  In the middle, you can use ARIMA models to fit the data using seasonal or non-seasonal models. For forecasting, you can find the optimal model to predict future points.
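
As a small illustration of picking the "optimal model" for forecasting, the sketch below uses simple exponential smoothing (one of the smoothing models, not ARIMA) and chooses the smoothing weight with the lowest one-step-ahead squared error. The series and the candidate weights are assumptions for the example.

```python
def one_step_sse(series, alpha):
    """Run simple exponential smoothing over the series.

    Returns (sum of squared one-step-ahead errors, forecast for the next point).
    """
    level, sse = series[0], 0.0
    for value in series[1:]:
        sse += (value - level) ** 2          # error of the forecast made before seeing value
        level = alpha * value + (1 - alpha) * level
    return sse, level

def best_alpha(series, candidates=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """Pick the candidate weight with the smallest one-step-ahead error."""
    return min(candidates, key=lambda a: one_step_sse(series, a)[0])

# Hypothetical monthly values with a steady upward trend
series = [1, 2, 3, 4, 5, 6]
alpha = best_alpha(series)
_, forecast = one_step_sse(series, alpha)
print(f"alpha = {alpha}, next-point forecast = {forecast:.2f}")
```

On a trending series like this one the search favors a heavy weight on recent points, and the forecast still lags the trend, which is one reason to compare against models with explicit trend or seasonal terms.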
  So the takeaways from this talk: the traditional Six Sigma DMAIC and the interdisciplinary STEAMS method can help develop data scientists in leadership and team building.
  Modern JMP 16 platforms are mapped to the DMAIC phases to help deploy Six Sigma projects in data science fields. Database management, applied engineering, statistics, data mining, and text mining are all critical to today's data science analytics.
  This concludes our presentation today. Thank you very much for your time.
Comments

Presentation PDF file uploaded.  Welcome any feedback from JMP Professionals.