Advanced Statistical Methods Applied to Glass Viscosity Prediction With JMP® (2021-EU-30MP-743)


Level: Intermediate

Damien Perret, PhD, R&D Scientist, CEA
François Bergeret, PhD, Ippon
Carole Soual, MS, Ippon

Muriel Neyret, PhD, R&D Scientist, CEA

JMP software was implemented at CEA in 2010 by R&D teams who develop nuclear glass formulations. A first communication took place at Discovery Summit 2011 in Denver, where we explained how we use JMP statistical analysis platforms to compare highly complex glass composition domains.

Then, many improvements were made by developers to provide JMP with powerful methods for generating mixture DOEs, in order to investigate highly constrained experimental domains. During Discovery Summit 2014 in Cary, we showed how all these efforts enabled us to build even more accurate property-to-composition predictive models.

A very innovative methodology was recently developed by glass formulation scientists at CEA, in collaboration with Ippon statisticians, to predict glass viscosity. Our approach is based on automatic and intelligent subsampling of the data, and it combines optimal design techniques with several predictive methods in JMP and JMP Pro. Predictions appear to be very accurate compared with those obtained from other statistical models published in the literature.

Auto-generated transcript...

Speaker	Transcript
Damien PERRET	Hello, welcome, and thank you for watching this presentation for the online Europe Discovery Summit conference.
	My name is Damien Perret. I am an R&D scientist at CEA in France, and I am here with my colleague and friend François Bergeret, statistician and founder of Ippon Innovation in France.
	François and I are very happy to be here today, and we would like to thank the Steering Committee for giving us the opportunity to present this talk, which is about advanced statistical methods applied to glass viscosity prediction with JMP.
	So let's start with a few words about the French Alternative Energies and Atomic Energy Commission. CEA is a French government organization for research, development and innovation in four areas: defense and security, low-carbon energies, technological research, and fundamental research. CEA employs about 20,000 people at nine sites.
	We have strong relationships with universities through various joint research units, a large number of patents and start-up creations, and a budget of around 5 billion euros.
François Bergeret	...statistics and data science, including studies, consulting and training.
	We are very proud to have been general partners for several years. I'm also very happy to present with my friend Damien today.
	Ippon also offers advanced solutions for zero defect and process control. I have personally been a JMP user since 1995, starting with JMP 3.
Damien PERRET	So our main objective in this work is to create statistical models to predict the glass properties, and for this talk today, we focus on the glass viscosity.
	To do that, the experimental data come both from a commercial database and from our own database at CEA. We wanted the algorithms to be coded in JSL and implemented in JMP Pro 15.
	The response of the model is the glass property of interest, viscosity in this example, and the factors are the contents of the different glass components.
	So, here is some background information. Glass is a non-crystalline solid. It is obtained by rapidly quenching a glass melt, and from a materials point of view, a glass is a blend, a mixture of different oxides.
	The number of oxides varies from two or three in a very simple glass to about 30, or even more, in the most complex compositions.
	There is a long tradition of calculating glass properties, and we think the first models were created in Germany at the end of the 19th century.
	Since then, the amount of published literature on glass property prediction has increased tremendously, so that today a huge amount of glass data is available in commercial databases, which are also used to predict glass properties.
	But despite all the efforts made in the past to predict glass properties, challenges remain for the prediction of glass viscosity.
	This is because glass viscosity is a difficult property to predict. First, viscosity depends strongly on physical mechanisms that can occur in the glass melt depending on the glass composition, such as phase separation or crystallization. Also, viscosity is the only property with such a huge range of variation, spanning many orders of magnitude.
	Here is an example that shows this difficulty. We selected three compositions of SBN glass, which is a very simple glass with only three oxides.
	We applied the best-known models from the literature to calculate the viscosity,
	and then we compared the predicted values with the experimental values we measured with our own device. You can see that even for a very simple glass, it is not easy to obtain a reliable predicted value for the viscosity.
	So here is a picture we like to use to give a view of the database, where each dot is one glass in a multidimensional view of the composition domain. The data may come from isolated studies, from studies using experimental designs, or from studies varying one component at a time.
	In the past, we spent a lot of time applying different machine learning techniques to data from the entire database.
	A classical approach with a validation set was used, but in the end, no statistical model with acceptable predictive capability was found for the viscosity.
	So we decided to use a different approach.
	Instead of using all the data, we think it's better to create a model using data close to the composition where we want to predict the viscosity. For example, if we want to predict the viscosity at the red dot here, one model will be created from the data we have in this area, and a different model will be created if we want to predict the property at another composition. That's why we say this technique is dynamic: the model depends on the composition, because it is fitted where we want to predict.
	And we say it's automatic because we don't have to do this manually. Every step is done by algorithms implemented in the tool.
	One of the most important points is certainly determining the optimal subset of data for creating the model. For that, we implemented two subsampling methods.
	In the first method, a theoretical, or virtual, design of experiments is generated around the composition of interest. Then each run of the design is replaced by the most similar experimental data point in the database, leading to the final training set.
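The first subsampling method can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the CEA/Ippon JSL code: the virtual design is simply a centre point plus a -delta and a +delta shift of each oxide along the Cox direction (the other oxides rescale proportionally, so each run still sums to 1), similarity is plain Euclidean distance on composition fractions, and the names `cox_shift`, `virtual_design` and `replace_with_nearest` are hypothetical.

```python
import numpy as np

def cox_shift(x0, i, d):
    """Shift oxide i by d along the Cox direction: the other oxides change in
    proportion to their own amounts, so the composition still sums to 1.
    Assumes x0[i] < 1."""
    x = np.asarray(x0, dtype=float).copy()
    others = np.arange(len(x)) != i
    x[others] -= d * x[others] / (1.0 - x[i])  # uses the original x[i]
    x[i] += d
    return x

def virtual_design(x0, delta=0.02):
    """Hypothetical virtual design: the centre point plus a -delta and a +delta
    Cox-direction run for each oxide; runs with negative contents are dropped."""
    runs = [np.asarray(x0, dtype=float)]
    for i in range(len(x0)):
        for d in (-delta, delta):
            x = cox_shift(x0, i, d)
            if np.all(x >= 0.0):
                runs.append(x)
    return np.array(runs)

def replace_with_nearest(design, database):
    """Greedily replace each virtual run by the closest database glass
    (Euclidean distance on fractions), using each glass at most once."""
    design = np.asarray(design, dtype=float)
    database = np.asarray(database, dtype=float)
    if len(design) > len(database):
        raise ValueError("database has fewer glasses than design runs")
    used = []
    for run in design:
        dist = np.linalg.norm(database - run, axis=1)
        dist[np.array(used, dtype=int)] = np.inf  # exclude already-selected glasses
        used.append(int(np.argmin(dist)))
    return used

# Usage on synthetic three-oxide compositions (not real glass data).
rng = np.random.default_rng(0)
database = rng.dirichlet([5, 3, 2], size=200)
x0 = np.array([0.60, 0.25, 0.15])
training_rows = replace_with_nearest(virtual_design(x0), database)
```

The greedy replacement is the simplest choice; a global assignment (matching all runs to glasses at once) would avoid early runs "stealing" a glass that a later run needs more.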
	The second method implemented in the tool is based on data sets of different sizes created around the composition of interest.
	A small data set is generated by the tool, and models are fitted on this small subset to predict the viscosity. Then bigger and bigger data sets are generated, and the optimal size is evaluated using statistical criteria associated with each subset.
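A sketch of the growing-subsets idea, assuming that the statistical criterion is a leave-one-out RMSE derived from PRESS and that the model is a first-order Scheffé mixture model (no intercept) fitted by ordinary least squares. Both choices, and the function names `press_rmse` and `best_subset_size`, are illustrative assumptions rather than the criteria actually coded in the tool.

```python
import numpy as np

def press_rmse(X, y):
    """Leave-one-out RMSE sqrt(PRESS / n) of a no-intercept least-squares fit
    (first-order Scheffe mixture model), with PRESS = sum((e_i / (1 - h_ii))^2)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    h = np.einsum("ij,ji->i", X, np.linalg.pinv(X))  # diagonal of the hat matrix
    press = np.sum((resid / (1.0 - h)) ** 2)
    return float(np.sqrt(press / len(y)))

def best_subset_size(X, y, x0, sizes):
    """Fit on the k glasses closest to x0 for each k in sizes (growing subsets)
    and return the k with the smallest PRESS-based RMSE, plus all scores."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    order = np.argsort(np.linalg.norm(X - x0, axis=1))  # nearest glasses first
    scores = {k: press_rmse(X[order[:k]], y[order[:k]]) for k in sizes}
    best = min(scores, key=scores.get)
    return best, scores
```

Dividing PRESS by n before comparing keeps subsets of different sizes on the same footing; the raw PRESS sum would otherwise grow simply because larger subsets contain more points.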
François Bergeret	Glass viscosity is not easy to predict, so we decided to use different statistical and machine learning methods.
	First, polynomial ??? models with transformation; second, generalized regression using a lognormal distribution. This method is very powerful in JMP Pro and can give better results than the ??? models with transformation.
	We also use neural networks, which are very powerful in terms of prediction. As we have two data sets, as mentioned earlier, we have six predictions for each response.
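For a regression with a lognormal distribution, one useful fact: if log(viscosity) is modeled as normal with mean X·β and constant variance σ², the maximum-likelihood β is ordinary least squares on log(viscosity). The sketch below is an unpenalized fit with hypothetical names, so it is not JMP Pro's Generalized Regression platform itself; it also shows why a back-transformed prediction has two natural versions, the median exp(μ) and the mean exp(μ + σ²/2).

```python
import numpy as np

def fit_lognormal(X, y):
    """Maximum-likelihood lognormal regression: log(y) ~ Normal(X @ beta, sigma2).
    The ML estimate of beta equals least squares on log(y)."""
    z = np.log(np.asarray(y, dtype=float))
    X = np.asarray(X, dtype=float)
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    sigma2 = float(np.mean((z - X @ beta) ** 2))  # ML variance (divides by n)
    return beta, sigma2

def predict_lognormal(X, beta, sigma2):
    """Back-transformed predictions on the original viscosity scale."""
    mu = np.asarray(X, dtype=float) @ beta
    return {"median": np.exp(mu), "mean": np.exp(mu + sigma2 / 2.0)}
```

Because viscosity spans many orders of magnitude, working on the log scale keeps errors roughly proportional to the viscosity level instead of being dominated by the most viscous melts.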
	The next slide is a schematic view of the tool.
	The inputs are the glass composition and the temperature at which the viscosity has to be predicted.
	As you can see here, the code and algorithms have been implemented for the two methods we described just before.
	The strength of the tool is that, instead of getting only one prediction, six values are calculated, each with associated statistical criteria that the user can evaluate.
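The six values (three models times two subsampling methods) can be reduced to a single summary such as their median. The sketch below assumes predictions are stored per (model, method) pair; the labels in `MODELS` and `METHODS` and the function `summarize` are hypothetical, and taking the median on the log10 viscosity scale is an assumption. With an even number of predictions the median averages the two middle values, so the scale on which it is taken matters slightly.

```python
from statistics import median

# Hypothetical labels for the three models and the two subsampling methods.
MODELS = ("polynomial", "genreg_lognormal", "neural_network")
METHODS = ("virtual_doe", "growing_subsets")

def summarize(predictions):
    """predictions maps (model, method) -> predicted log10 viscosity.
    Returns the count, median, and range of the six predictions."""
    keys = [(m, s) for m in MODELS for s in METHODS]
    missing = [k for k in keys if k not in predictions]
    if missing:
        raise ValueError(f"missing predictions: {missing}")
    values = [predictions[k] for k in keys]
    return {"n": len(values), "median": median(values),
            "min": min(values), "max": max(values)}
```

Reporting the spread (min and max) next to the median gives a quick signal of how much the six models disagree for a given composition.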
Damien PERRET	So, here are some of the key parameters. It is very important to take into account as many inputs from the glass experts as possible.
	For example, we had to create specific algorithms to account for the nature and the role of oxides in viscosity.
	Another point of major importance is the origin, and therefore the reliability, of the data. A significant amount of time in this project was spent building a reliable database. We had to implement weights, and we had to study different ways of calculating the distances between glass compositions.
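Weights and distances of this kind might look like the sketch below: a weighted Euclidean distance, where hypothetical per-oxide weights let experts emphasize oxides with a strong effect on viscosity, and weighted least squares, where hypothetical per-glass reliability weights down-weight data from less trusted sources. Neither the weighting scheme nor the distance is taken from the tool; `weighted_distance` and `weighted_ls` are illustrative names.

```python
import numpy as np

def weighted_distance(x, database, oxide_weights):
    """Weighted Euclidean distance from composition x to every database row."""
    w = np.asarray(oxide_weights, dtype=float)
    diff = np.asarray(database, dtype=float) - np.asarray(x, dtype=float)
    return np.sqrt((w * diff ** 2).sum(axis=1))

def weighted_ls(X, y, row_weights):
    """Weighted least squares: minimizes sum(w_i * (y_i - X_i @ beta)^2)
    by rescaling each row by sqrt(w_i) before an ordinary least-squares fit."""
    sw = np.sqrt(np.asarray(row_weights, dtype=float))
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta
```

A weight of zero removes a glass from the fit entirely, while intermediate weights let doubtful data still contribute a little information.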
	So now it's time...
François Bergeret	Okay, I'm going to share my screen now
	to show you a demo
	of the code. You should see my screen now.
	I'm just executing the code; it's a complex JSL program that we have been developing for several months with CEA.
	So I have just executed the code, and now I'm going to show it to you.
	Discovery...so here I'm opening files for the code.
	And it's running, okay. The code is executing, so I will comment a little bit.
	We have several loops in this code. Of course, the first step is to identify the data and the functions. After that, we have a loop: first of all, what we call the adaptive iteration over the database.
	Because it is adaptive, as Damien mentioned, we are looking for the best subset of data. We also have the design of experiments approach here, where, by optimizing a design, we get the right data.
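One common way to "optimize a design" when the runs must come from existing data is D-optimal selection from a candidate set. The sketch below is a simple greedy heuristic, not JMP's design algorithm and not necessarily what the tool does: it adds, one at a time, the candidate composition that most increases log det(XᵀX). The name `greedy_d_optimal` and the small `ridge` term, which keeps the first steps well defined while XᵀX is still singular, are assumptions.

```python
import numpy as np

def greedy_d_optimal(candidates, k, ridge=1e-9):
    """Greedy forward selection of k candidate rows that approximately
    maximizes log det(X'X), a heuristic for D-optimality."""
    C = np.asarray(candidates, dtype=float)
    if k > len(C):
        raise ValueError("k exceeds the number of candidates")
    p = C.shape[1]
    chosen = []
    for _ in range(k):
        best_j, best_val = None, -np.inf
        for j in range(len(C)):
            if j in chosen:
                continue
            X = C[chosen + [j]]
            # ridge keeps X'X invertible while fewer than p rows are chosen
            _, logdet = np.linalg.slogdet(X.T @ X + ridge * np.eye(p))
            if logdet > best_val:
                best_j, best_val = j, logdet
        chosen.append(best_j)
    return chosen
```

Greedy selection is fast but can miss the true optimum; exchange algorithms, which swap selected and unselected candidates until no swap improves the determinant, usually do better.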
	After that, and it is actually running now, we predict the glass transition temperature. As I mentioned, we have three models, and for each model we have two data sets, so we have a total of six predictions.
	As you see, it takes a little while to execute, about one minute, and when it is done, we will have all the program outputs for the glass transition temperature and, of course, the viscosity.
	Using JSL has been very, very useful, and in terms of users, as you will see with Damien, it's very easy for the experts to use. So Damien, you can take over, and I'll stop sharing.
Damien PERRET	Okay, just one moment. Can you see my screen now? Yes, okay.
	Okay, so this is a general statistical report created by the tool. First, we have the glass composition of interest.
	Then, on this graph, we have the predicted values. On the Y axis, we have the predicted viscosity values calculated by the three algorithms for the two methods, and on the X axis, we have the enlargement step number for the second method.
	In red, we have the median of the predictions, which can be sufficient for a non-statistician user, but if we want to investigate the statistical details, this report contains a lot of information for studying the quality of each model. For example, we can check the PRESS values for the different models. Here they are for the multilinear regression BIC F model.
	Here, the PRESS values tell us that the prediction with method number one is a little bit better than with the second method, and we also see how the model evolves as the training set is enlarged.
	We also have different statistical values, for example, the R-squared values for the different algorithms and models.
	And here we have even more details on the model.
	For example, for the first method, we can compare the theoretical and the actual designs of experiments. We have the prediction formulas for the different models, some information on the estimates, and a lot of information for the second method as well. So in the end, we have many statistical details that are very, very useful to the user. And here, at the end, we have the compositions of the most similar glasses in the database for which we have an experimental viscosity value, which is also very, very useful.
	Okay, so let's go back to the PowerPoint.
	Okay, so these are the results we obtained. The tool's predictive capability was evaluated by extracting 230 rows from the global database. In this table, we have the relative error of the viscosity prediction for different types of glass and for the global subset of data. Three quantiles are given: the median, meaning that 50% of the predicted values have a relative error below the value indicated in the table, and the 75% and 90% quantiles.
	When we talk about glass viscosity, a prediction error of around 70% is traditionally considered very good. So we can see that for the majority of the data, the model's predictive capability is good, and we were very happy with the results we obtained.
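The quantiles in such a table can be computed as below, assuming relative error means |predicted - observed| / observed on the raw viscosity scale; the scale is an assumption, since a relative error computed on log viscosity would give much smaller numbers. The function name `relative_error_quantiles` is hypothetical, and `np.quantile` uses linear interpolation between order statistics by default.

```python
import numpy as np

def relative_error_quantiles(predicted, observed, qs=(0.5, 0.75, 0.9)):
    """Quantiles of |predicted - observed| / |observed| over a test set."""
    pred = np.asarray(predicted, dtype=float)
    obs = np.asarray(observed, dtype=float)
    rel = np.abs(pred - obs) / np.abs(obs)
    return {q: float(np.quantile(rel, q)) for q in qs}

# Usage with made-up numbers: relative errors 0.1, 0.2, 0.3, 0.4, 0.5.
q = relative_error_quantiles([110, 120, 130, 140, 150], [100] * 5)
```

Here `q[0.5]` is the median relative error, the value below which half of the predictions fall.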
	As a comparison, here again are the results we obtained for the very simple SBN glass, with only three oxides, when we applied the models available in the literature. We can see that the relative prediction errors were much higher and could vary a lot from one model to another. And again, this was for a very simple glass with only three oxides.
	In some cases, we have larger errors, but if we look at these data in detail on this graph, with the predicted values on the x axis and the experimental values on the y axis, we see that the biggest prediction errors are obtained for these two glasses, which come from the commercial database SciGlass and from the same reference, a patent for which the experimental error of the equipment is not mentioned.
	Also, for all these compositions with a high aluminum content, we think crystallization is very likely to occur, so we cannot be totally sure that the experimental values are correct.
	And finally, we applied our methodology to predict another glass property, the glass transition temperature, which is an important property in glass technology.
	Here are the results we obtained, which are even better than for the viscosity. The overall relative prediction error is below 5%, which is really good, because we know that this property can vary a lot depending on the thermal history of the glass and on the experimental device. So here the tool's predictive capability is very close to the experimental error, which is very nice.
François Bergeret	Okay, as a conclusion, one important feature of our approach is the dynamic subsampling of the global database: we target the right information around the composition of interest.
	In addition, using JSL and JMP Pro, we have automated the machine learning models; generalized regression and neural networks perform very well.
	According to the CEA expert, the accuracy is good, and the approach has revealed some unexpected issues. We now plan to extend the models to a bigger database, and also to work with Bradley Jones and perhaps write a joint publication. Thank you.