This presentation highlights modeling approaches used to understand and predict product quality based on product measurements. The model development process involves four steps: data collection, model development, testing, and implementation. During data collection, parts are collected from the production line and product measurements are completed. These parts continue through the process and are subjected to a quality test. The product measurements and quality test results are combined into a dataset and used for modeling; the quality test result is the response and the product measurements are the predictors. In model development, the data is examined thoroughly to ensure it is as clean as possible. Variable clustering and stepwise regression are applied to remove highly correlated input variables and select the important ones. The final step is to fit a generalized regression model using a log-normal distribution and the adaptive lasso estimation method. The model must exceed a defined accuracy threshold to be considered acceptable. If the model meets this criterion, it is moved into the testing phase, which involves using the model under engineering control to determine how well it predicts product quality. Once satisfied, the team implements the model in production. The operations team receives instant feedback on how each part will perform and can adjust and tune the process in real time. With this information, we can also deem the product acceptable or not; if rejected, the product is disposed of and does not continue through the process. These predictive models identify unacceptable parts and process upsets in upstream processes.

Welcome, and thank you for joining my poster presentation

at this year's JMP Conference.

My name is Kaitlin Shorkey,

and I'm a senior statistical engineer at Corning Incorporated.

How do you get a glimpse of product quality

before it completes the production process?

We chose to build a model that will predict the product quality outcome

before it has completed the entire process.

There are two major benefits of this approach.

One is the operations team

receives instant feedback on how the parts will perform

and can adjust the process in real time.

The second is that we can deem the product acceptable or not.

Like I just mentioned, the main objective of this work

is to build a predictive model using a few modeling approaches

to understand and predict product quality

based on certain product measurements.

Our major steps in building this model

are data collection, model development, testing and implementation.

First off, for the data collection phase, parts are collected at the end

of the production line and appropriate product measurements are completed.

The parts are then subjected to the quality test.

The product measurements and quality measurement results

are combined into a data set and used for building the model.

In this case, the quality measurement is the response

and all the product measurements are the predictors.

The dataset consists of 767 predictors and 990 observations, or parts.

This step can take a long time to execute.

Since we're building a model, it's important to get as large of a range

of product measurements and quality measurement results as we can.

If we achieve this, the accuracy and model predictions

are more consistent across the range.

Essentially, this allows the model to accurately predict at all levels

of the product quality results.

Once the dataset is compiled,

it is thoroughly examined to ensure it is as clean as possible.

After the data collection and cleaning,

the second phase of model development is started.

For this, we begin with variable clustering

and stepwise regression to remove highly correlated variables

and select the most important ones.

With so many predictors we first apply variable clustering.

This method allows for the reduction in the number of variables.

Variable clustering groups the predictors,

or variables, into clusters that share common characteristics.

Each cluster can be represented by a single component or variable.

A snippet of the cluster summary from JMP is shown,

which indicates that 85% of the variation

is explained by clustering.

Cluster 12 has 49 members,

and V232 is the most representative of that cluster.
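
For anyone who wants to experiment outside JMP, here is a minimal Python sketch of the same idea: predictors are grouped by correlation and one representative variable is kept per cluster. JMP's Variable Clustering platform is based on principal components, so this correlation-based approximation, along with the column names and cluster count in the sketch, should be treated as illustrative assumptions rather than the exact method.

```python
# Illustrative sketch only: JMP's Variable Clustering is based on principal
# components, while this approximation clusters predictors by correlation.
# The data frame, column names, and cluster count are hypothetical.
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def cluster_and_reduce(X: pd.DataFrame, n_clusters: int = 99) -> list:
    """Group correlated predictors and keep one representative per cluster."""
    corr = X.corr().to_numpy()
    dist = 1.0 - np.abs(corr)      # highly correlated variables -> small distance
    np.fill_diagonal(dist, 0.0)
    Z = linkage(squareform(dist, checks=False), method="average")
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")

    representatives = []
    for k in np.unique(labels):
        members = X.columns[labels == k]
        # Stand-in for JMP's "most representative" member: the variable with
        # the highest average absolute correlation to its cluster mates.
        avg_corr = np.abs(X[members].corr()).mean()
        representatives.append(avg_corr.idxmax())
    return representatives

# Usage (hypothetical 767-column predictor table):
# reduced_cols = cluster_and_reduce(df[predictor_cols], n_clusters=99)
```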

The variables that are identified as the most representative ones

are then used in the next method of stepwise regression.

Stepwise regression is used

on the identified clustered variables to select the most important ones

to use in the model,

and further reduces the number of variables.

For this, the forward direction

and the minimum corrected AIC stopping rule are used.

The direction controls how variables enter and leave the model.

The forward direction means that terms are entered into the model

that have the smallest p-value.

The stopping rule is used to determine which model is selected.

The corrected AIC is based on negative two times the log-likelihood,

and the model with the smallest corrected AIC is the preferred model.
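
As a rough illustration of that selection and stopping rule, the sketch below runs forward selection in Python and stops once the corrected AIC stops improving; the simplifications and placeholder names are noted in the comments.

```python
# Minimal sketch of forward selection with a corrected-AIC (AICc) stopping
# rule. Simplification vs. JMP's Stepwise platform: the term that most lowers
# AICc is entered, rather than the smallest-p-value term. Names are placeholders.
import numpy as np
import pandas as pd
import statsmodels.api as sm

def aicc(fit, n: int, k: int) -> float:
    # AICc = AIC + 2k(k + 1) / (n - k - 1), with k = number of fitted parameters
    return fit.aic + (2 * k * (k + 1)) / (n - k - 1)

def forward_stepwise(X: pd.DataFrame, y: pd.Series) -> list:
    selected, remaining = [], list(X.columns)
    n, best = len(y), np.inf
    while remaining:
        scores = []
        for col in remaining:
            cols = selected + [col]
            fit = sm.OLS(y, sm.add_constant(X[cols])).fit()
            scores.append((aicc(fit, n, len(cols) + 1), col))
        score, col = min(scores)
        if score >= best:               # stop once AICc no longer improves
            break
        best = score
        selected.append(col)
        remaining.remove(col)
    return selected

# Usage (hypothetical): terms = forward_stepwise(df[reduced_cols], df["quality"])
```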

From this, 51 variables are entered into the model

of the 99 available variables

from the variable clustering step.

At this point, we have reduced the number

of variables from 767 to 51

using variable clustering and stepwise regression.

The final method is to fit a generalized regression model.

For this, the log-normal distribution

is used with an adaptive lasso estimation method.

The log-normal distribution is the best

fit for the response, so it is chosen for use in the regression model.

The adaptive lasso estimation method

is a penalized regression technique

which shrinks the size of the regression coefficients

and reduces the variance in the estimates.

This helps to improve the predictive ability of the model.

Also the data set was split into a training and validation set.

The training set has 744 observations, and the validation set has 246.

From this, the resulting model produces a 0.81 generalized R-square

for the training set and 0.80 for the validation set.
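
A hedged sketch of this final step is shown below. It handles the log-normal response by modeling the log of the response and approximates the adaptive lasso by reweighting standardized predictors with an initial ridge fit before an ordinary lasso; JMP's Generalized Regression platform does this work internally, and apart from the 744/246 split the details in the sketch are assumptions, so it will not reproduce JMP's generalized R-square values.

```python
# Hedged sketch: the log-normal response is modeled as log(y) with Gaussian
# errors, and the adaptive lasso is approximated by reweighting standardized
# predictors with initial ridge coefficients before an ordinary lasso fit.
# Only the 744/246 split comes from the poster; everything else is assumed.
import numpy as np
from sklearn.linear_model import LassoCV, RidgeCV
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

def fit_adaptive_lasso(X, y, random_state=0):
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, train_size=744, random_state=random_state)
    log_y = np.log(y_train)                # assumes a strictly positive response

    scaler = StandardScaler().fit(X_train)
    Xs = scaler.transform(X_train)

    # Step 1: an initial ridge fit supplies the adaptive weights.
    ridge = RidgeCV(alphas=np.logspace(-3, 3, 13)).fit(Xs, log_y)
    w = np.abs(ridge.coef_) + 1e-8         # avoid division by zero

    # Step 2: lasso on reweighted predictors is equivalent to the adaptive
    # lasso penalty lambda * sum(|beta_j| / w_j).
    lasso = LassoCV(cv=5).fit(Xs * w, log_y)
    coef = lasso.coef_ * w                 # coefficients on standardized predictors

    # Validation predictions, back-transformed from the log scale.
    val_pred = np.exp(lasso.predict(scaler.transform(X_val) * w))
    return coef, val_pred, y_val
```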

These R-squares are acceptable for our process, so we will now evaluate

the accuracy and predictability of the resulting model.

Now that we have a model, we need to review its accuracy

and predictability to see if it would be suitable to use in production.

In doing this,

a graph is produced that compares a predicted quality measurement

for a specific part to the actual quality measurement.

In the graph, the x-axis shows the predicted value,

and the y-axis shows the actual.

Also, the quality measurement is bucketed into three groups

based on its value, which is shown

by the three colors on the graph.

In general, the model predicts the quality measurement well.

It does appear that the model may fit better

in the lower product quality range

than the upper, which may be due to more observations in the lower range.

As mentioned, the quality measurement

was bucketed into three different categories based on its value.

This was also done for the predicted quality measurement.

For each observation, if the quality measurement category

is the same as the predicted measurement category,

it is assigned a one.

If not, it is assigned a zero.

For both the training and validation sets, the average of these ones and zeros

is calculated and used as the accuracy measure.
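
A small sketch of that accuracy calculation, assuming hypothetical category cut points, is shown below.

```python
# Sketch of the category-agreement accuracy: bucket actual and predicted
# values into three groups, score 1 when the buckets match, 0 otherwise,
# and average. The cut points below are hypothetical placeholders.
import numpy as np

def bucket(values, cuts=(10.0, 20.0)):
    """Map a continuous quality measurement to category 1, 2, or 3."""
    return np.digitize(values, bins=cuts) + 1

def category_accuracy(actual, predicted, cuts=(10.0, 20.0)):
    return np.mean(bucket(actual, cuts) == bucket(predicted, cuts))

# Usage (hypothetical arrays):
# category_accuracy(y_train, train_pred)   # e.g. 0.875 for the training set
# category_accuracy(y_val, val_pred)       # e.g. 0.84 for the validation set
```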

We see that the training set has an accuracy of 87.5%

and the validation set has an accuracy of 84%.

For the model to be moved to the testing phase,

accuracy must be above a certain limit,

and both of these accuracy values are.

This will allow us to move to the testing phase of the project.

In addition, we look at the confusion matrix

to visualize the accuracy of the model

by comparing the actual to the predicted categories.

Ideally, the off diagonals of each matrix should be all zeros,

with the diagonal from top left to bottom right containing all the counts.
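
The sketch below builds that three-by-three matrix from the same hypothetical buckets, so any count off the main diagonal corresponds to one of the discrepancies discussed next.

```python
# Sketch of the confusion-matrix check using the same hypothetical bucketing:
# rows are actual categories, columns are predicted categories, so any count
# off the main diagonal is a disagreement between actual and predicted buckets.
import numpy as np
import pandas as pd
from sklearn.metrics import confusion_matrix

def quality_confusion(actual, predicted, cuts=(10.0, 20.0)):
    a = np.digitize(actual, bins=cuts) + 1       # actual category 1-3
    p = np.digitize(predicted, bins=cuts) + 1    # predicted category 1-3
    cm = confusion_matrix(a, p, labels=[1, 2, 3])
    return pd.DataFrame(cm,
                        index=["actual 1", "actual 2", "actual 3"],
                        columns=["pred 1", "pred 2", "pred 3"])

# Usage (hypothetical): print(quality_confusion(y_val, val_pred))
```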

The matrices shown on the poster indicate that the higher counts are along

that diagonal, with lower numbers in the off-diagonal cells,

but discrepancies still exist among the three categories.

For example, in the training set, there are 29 instances where the actual

quality measurement of three was predicted as a two.

In the same case for the validation set, there are 12.

The confusion matrix helps to understand where these discrepancies are

so further investigations can be done and improvements made.

Overall, though, the model has an accuracy above the required limit,

so our next steps would be the testing and implementation phases.

Now that our model is through the development phase,

it's time to test it in live situations.

For this, the model is used under engineering control

to determine how well it predicts the quality measurement

in small, controlled experiments.

This is done by the engineering team

with support from the project team when necessary.

Once the engineering team is satisfied with this testing,

the model is fully implemented into production and monitored over time.

In conclusion, this model development process

has allowed us to build

predictive models for the production process.

The methods of variable clustering,

stepwise variable selection, and generalized regression

were the most appropriate and best suited to use

for this application.

With further research and investigation,

other methods could be potentially applied

to improve model performance even more.

From a production standpoint, the benefit of this model

is that the operations team will receive instant feedback on how a part

or group of parts will perform, and can adjust

and tune the process in real time.

We can also deem the product acceptable or not.

If rejected, the product is disposed of and will not continue through the process,

which over time reduces production costs.

Lastly, I'd like to give a huge special thank you

to Zvouno and Chova and the entire project team

at Corning Incorporated.

Thank you for joining and listening to my poster presentation.
