Driving Product Development Through Modelling Historic Data in JMP® (2021-EU-30M...

Stuart Little, Lead Research Scientist, Croda

This presentation will show how some of the tools available in JMP have been successfully used to visualize and model historic data within an energy technology application. The outputs from the resulting model were then used to inform the generation of a DOE-led synthesis plan. The result of this plan was a series of new materials that have all performed in line with the expectations of the model. Through this approach, a functional model of product performance has been successfully developed. This model, alongside the visualization capabilities of JMP, has allowed for the business to begin to embrace a more structured approach to experimentation.

Auto-generated transcript...

Speaker	Transcript
Stuart Little	Hi everyone, and welcome to this talk around how JMP is being used at Croda to help drive new product development.
	So what we're going to cover today: firstly, some context on Croda, how we are using JMP and where we are on that journey, and a summary of the problem we're trying to solve.
	Once we've covered the problem, we are then going to move to JMP and look at how the tools and platforms in JMP have allowed easy data exploration and easy development
	of a structure-performance model.
	And then finally we'll wrap this up by discussing the outcomes of this work and how
	by doing this kind of research, we've been able to increase buy-in to the use of data and DOE techniques in the research side of the business.
	So firstly, who we are as Croda. It's a question that does come up quite a lot,
	because we're a business-to-business entity. But as a business, Croda are the name behind
	a lot of high-performance ingredients and technologies, and behind a lot of the biggest and most successful brands across the world, across a range of markets. We create, make, and sell speciality chemicals. From the beginning of Croda, these have been
	predominantly sustainable materials. So we started by making lanolin, which is from sheep's wool, and we continually build on that sustainability. Last year we
	made a public pledge to be climate, land, and people positive by 2030 and have signed up to the UN Sustainable Development Goals as part of our push to achieve this and become the most sustainable supplier of innovative ingredients across the world.
	So, in terms of the markets we serve, we have a kind of very big personal care business where we deal with skincare and sun care and
	sort of hair care, color cosmetics, and those kind of traditional personal care products. In our life sciences business, our
	products and expertise help customers optimize their formulations and their active ingredient use. Most recently, we entered into an agreement with Pfizer
	to provide materials that are going into their COVID-19 vaccine. Our industrial chemicals business,
	that part of the business is responsible for supplying technically differentiated, predominantly sustainable materials to a huge range of markets. A lot of those markets don't quite fit into anything else on this slide.
	And then finally we've got our performance technologies business. This covers, again, a lot of similar areas,
	providing high performance answers across all of these. And then today, in particular, we're talking about our energy technologies business, and specifically, kind of, battery technology in high performance electronics.
	So where we are in Croda and JMP is we've been using JMP for about two years and we've had a lot of interest internally. It's been harder to build confidence that these techniques have real value
	to research. And so to prove this, we've gone away, we've created a number of case studies that have been
	pretty successful on the whole. We've demonstrated the potential and some of the pitfalls within that. And then all of that has then led to a slightly bigger set of projects, one of which is the one we're going to talk to you about today.
	How do we improve the efficiency of electrical cooling systems? The primary driver for this project is
	sort of transport electrification, so that's battery vehicles. How do you maintain the battery properly? How do you make, make sure the motors are working at their optimum level?
	And how do you do that without electrocuting anyone?
	So currently there's a set of cooling methods for these things, but our customers are certainly looking at how they can be improved, because the better the control of your battery cooling, for instance, the better battery capacity you have, and the more consistent the range will be.
	And because, you know, this is critical and there's lots of different applications that are broadly similar, the really useful thing for us would be to build
	an understanding of these fluids, by having some sort of data led model and that's where JMP came in. So how can we do that?
	Well, the first thing we looked at was what are the current cooling methods? So for batteries, predominantly they're air cooled or cold-plate cooled in the previous generation.
	And with the electronics in the car, you have the opposite problem to the battery, in that things tend to get too hot, so then we have heatsinks to try and take that energy away.
	And in electric motors, where we're trying to minimize the resistance, they tend to be jacketed with fluid.
	In all three of these cases, the incoming alternative method of cooling
	relies on fluid, so that's direct immersion for batteries and electronics, and then in terms of the electric motors, that tends to be more of a flow.
	So what does that fluid look like? Obviously, we're dealing with high voltages so we have to have something that's not electrically conductive.
	It also needs to have a really high conductivity of heat, so that it can pull heat out of the electronics.
	And because these fluids need to be moved around the system, the viscosity has to be low. So we have kind of practical physical constraints that have been introduced by the application itself.
	If you look at it in a bit more depth,
	the ability of the fluids to transfer heat is based predominantly on this equation. And what this tells us is there is a part that we can control
	by the fluid, which is the heat transfer coefficient, and then there is a part that is controlled by the engineering solution in the application to that. What's the area for cooling, and what are the temperatures of the surfaces that you're trying to cool?
	But in all cases, to get an efficient heat transfer, we have to have a high heat transfer coefficient, and as that's the thing we can affect, that's where we looked.
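The equation on the slide is not reproduced in this transcript. Assuming it is the standard convective cooling law, Q = h × A × (T_surface − T_fluid), the split between the fluid-controlled part (h) and the engineering-controlled part (area and temperatures) can be sketched as below. The function name and all numbers are illustrative assumptions, not values from the talk.

```python
def heat_transfer_rate(h, area, t_surface, t_fluid):
    """Convective heat flow in watts, assuming Q = h * A * (T_surface - T_fluid).

    h         -- heat transfer coefficient, W/(m^2 K): the part set by the fluid
    area      -- cooled surface area, m^2: set by the engineering design
    t_surface -- surface temperature, degrees C: set by the application
    t_fluid   -- bulk fluid temperature, degrees C
    """
    return h * area * (t_surface - t_fluid)


# Illustrative numbers only: same hardware, two hypothetical fluids.
q_low = heat_transfer_rate(h=100.0, area=0.5, t_surface=60.0, t_fluid=30.0)
q_high = heat_transfer_rate(h=200.0, area=0.5, t_surface=60.0, t_fluid=30.0)
print(q_low, q_high)
```

With the geometry and temperatures held fixed, doubling h doubles the heat removed, which is why the coefficient is the natural target for the fluid supplier.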
	That heat transfer coefficient is defined
	by this equation in a simplistic way, there are other terms in there, but predominantly,
	it's a function of density, thermal conductivity, the heat capacity, and then having an effect of the viscosity of the system.
	So, if we look specifically at the applications we're interested in,
	if we want to optimize our dielectric fluid, we need to increase the density, increase the thermal conductivity, and increase the heat capacity, but alongside that, we need to reduce the viscosity
	of the fluid.
	And these match up pretty well with the engineering challenges that we have, which is helpful.
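The second slide equation is also missing from the transcript. One stand-in that has the same four ingredients, assumed here purely for illustration, comes from the Colburn correlation for turbulent forced convection, Nu = 0.023 Re^0.8 Pr^(1/3). Collecting the fluid properties gives h proportional to ρ^0.8 · k^(2/3) · c_p^(1/3) / μ^(0.8 − 1/3). The function name, the exponents as defaults, and the property values are assumptions and may differ from what was presented.

```python
def figure_of_merit(density, conductivity, heat_capacity, viscosity,
                    a=0.8, b=2 / 3, c=1 / 3, d=0.8 - 1 / 3):
    """Relative heat transfer capability of a fluid.

    Default exponents follow from the Colburn correlation (an assumption,
    not the talk's equation). Higher is better; only ratios between fluids
    are meaningful because geometry and flow velocity are left out.
    """
    return (density ** a * conductivity ** b * heat_capacity ** c) / viscosity ** d


# Hypothetical baseline fluid: kg/m^3, W/(m K), J/(kg K), Pa s.
base = figure_of_merit(900.0, 0.13, 2000.0, 0.005)

# Each change below is one of the levers named in the talk.
denser = figure_of_merit(950.0, 0.13, 2000.0, 0.005)
more_conductive = figure_of_merit(900.0, 0.15, 2000.0, 0.005)
higher_cp = figure_of_merit(900.0, 0.13, 2200.0, 0.005)
thinner = figure_of_merit(900.0, 0.13, 2000.0, 0.004)
print(denser / base, more_conductive / base, higher_cp / base, thinner / base)
```

Under these assumed exponents, each of the four changes raises the figure of merit, matching the direction of the targets described above.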
	So
	from that, what we really wanted to do was, we knew what the target was.
	And we really wanted to understand what the relationship was between structure and product performance as a dielectric fluid.
	So initially we proceeded in kind of a fairly traditional way and we started conducting a large-scale study measuring the physical properties of a lot of esters and a lot of other materials.
	And then, when we saw that and looked at that, we thought, well actually, this data exists so why don't we use these data sets to try and build some models, and say, can we really understand that physical property to structure to performance relationship.
	So that's where...we're just going to pop into JMP so just bear with me one second.
	Okay.
	So the first thing that we did
	was we collated that mix of historic data and data that was being obtained through targeted testing by the applications teams.
	And once we'd got that into one place, we examined it in JMP to understand whether there is a relationship, at a really simple level, between the physical properties we're measuring.
	So, if we look at that data set, the first port of call for me, as ever, is the distribution platform in JMP.
	And it's a really easy way just to see if something that you want has any kind of vague pattern anywhere else. So, in this case, if you say, oh, we want everything that's got a high thermal conductivity, what we see is that those materials are pretty stretched out across
	the other properties we've measured. So it doesn't really say, oh, there's a brilliant relationship, what you need is this, which is kind of what we expected. But it's nice to have a check.
	Similarly, if we then plot everything as
	scatterplots, what we see is a lot of noise. I mean, these lines of fit are just there for reference to show there isn't really any fit.
	In no way am I claiming a correlation on these.
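Outside JMP, the same quick check for pairwise relationships can be sketched with a plain Pearson correlation coefficient. The property values below are made up for illustration; they are not the Croda data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical measurements for six fluids.
thermal_conductivity = [0.131, 0.145, 0.128, 0.152, 0.139, 0.147]
viscosity = [12.0, 5.5, 9.1, 14.2, 4.8, 10.3]

r = pearson_r(thermal_conductivity, viscosity)
print(round(r, 3))
```

A value of r near zero says only that there is no straight-line relationship between the two properties; it does not rule out a relationship that runs through molecular structure.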
	And while all of that was disappointing, the fact that there isn't an obvious answer
	was expected.
	Where it got interesting to us is, we said, well, we were expecting that there isn't a clear...
	a clear relationship between any of these factors, because if there was, it would have been obvious to the experienced scientists doing the work, and we would have known that.
	So, then, we said, well, what we do know is these properties all have
	a relation to their structure.
	What happens if we calculate some physical parameters for these things and combine that with a number of, sort of, structural identifiers and ways of looking at these molecules? What happens if we take that and add that data to the test data?
	You know, can we then build some kind of model
	that starts being able to estimate structure and performance? So that's exactly what we did.
	You know, in this case,
	what we see
	is that,
	again, if we use the multivariate platform, just as a quick look to see if there's any correlation
	on some of these factors,
	there's
	clear differentiation in some cases, between up and down, and maybe a little hints of correlation, but nothing clear that says, this is the one thing that you need. Again, this is what we expected.
	So then, what we did
	was we used the regression platforms in JMP
	to try and understand
	whether we could build a model, and what that
	relationship looks like. So to do this, we randomly selected a number of rows with the row selection
	tool in JMP, generally pulling out five samples at random, which weren't going to participate in the model. Then we iteratively built up these models
	and refined them that way, so we always had a validation set from the initial data, just to check that what we were doing had any chance of success.
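The hold-out idea described here can be sketched as follows: set five rows aside at random, fit on the rest, then check the held-out rows. This is a minimal sketch on synthetic data with a single hypothetical descriptor; the real model used several structural terms and was fitted in JMP.

```python
import random


def split_holdout(rows, n_holdout=5, seed=0):
    """Randomly set aside n_holdout rows that take no part in fitting."""
    rng = random.Random(seed)
    held = set(rng.sample(range(len(rows)), n_holdout))
    train = [row for i, row in enumerate(rows) if i not in held]
    holdout = [row for i, row in enumerate(rows) if i in held]
    return train, holdout


def fit_line(points):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic data: (descriptor, thermal conductivity) with a little noise.
noise = random.Random(1)
rows = [(x, 0.10 + 0.002 * x + noise.gauss(0, 0.001)) for x in range(20, 50)]

train, holdout = split_holdout(rows)
slope, intercept = fit_line(train)

# Absolute prediction error on rows the fit never saw.
holdout_errors = [abs((slope * x + intercept) - y) for x, y in holdout]
print(len(train), len(holdout), max(holdout_errors))
```

Repeating the split with a different seed gives a different validation set each time, which is one way to read "built up these models and refined them that way".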
	So then, if we just look at the 80 degree models, the first model that we came to
	was this one.
	Clearly, as we can see, there are a number of factors that were included in this model that make no sense from a statistical point of view, because they're just overfitting and they are just non significant.
	However, these are fairly important in terms of describing the molecules that are in there, so as chemists, we created this model. So this is a model that allows, you know, molecules, if you like, to be designed for this application,
	even though we know it's overfitted. And we know that
	it's not...it's not really a valid model because these terms are just driving the R squared up and up and up.
	We also
	built the model without those terms.
	This is a far better model in terms of estimating the performance of these things, the R squared is a touch lower,
	but all the terms that are in there have a significant impact
	on the performance.
	The downside of this model is it doesn't really help us design
	any new chemistry.
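The trade-off between the two models can be illustrated with adjusted R squared, which penalizes every extra term. The formula is standard; the R squared values, row count, and term counts below are hypothetical, since the transcript does not give the actual figures.

```python
def adjusted_r2(r2, n_rows, n_terms):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1).

    n_rows  -- number of observations used in the fit
    n_terms -- number of model terms, excluding the intercept
    """
    return 1 - (1 - r2) * (n_rows - 1) / (n_rows - n_terms - 1)


# Hypothetical: a 12-term descriptive model versus a 4-term reduced model.
full = adjusted_r2(0.95, n_rows=30, n_terms=12)
reduced = adjusted_r2(0.93, n_rows=30, n_terms=4)
print(round(full, 4), round(reduced, 4))
```

In this made-up case the reduced model has the lower R squared but the higher adjusted R squared, which is the pattern described above: extra non-significant terms push R squared up without improving the model.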
	But, in both cases, when we look at
	the predicted values against the actual measured values, we see a reasonable correlation between them. Certainly when we expect things to be high, they are.
	So that gave us some confidence that this model might actually perform for us.
	And then in terms of when we looked at how good this might be, we just simply looked at what's the percentage difference between
	the predicted value and the actual measured value. And what we see is they are almost universally within 10%, predominantly within 5% for either model.
	Again across a range of different types of material, this gave us confidence that what we were...what we were seeing might be a real effect.
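The percentage-difference check is simple enough to sketch directly. The predicted and measured pairs below are hypothetical values, not results from the talk.

```python
def percent_difference(predicted, measured):
    """Signed difference of the prediction from the measurement, in percent."""
    return 100.0 * (predicted - measured) / measured


# Hypothetical (predicted, measured) thermal conductivities.
pairs = [(0.152, 0.150), (0.139, 0.145), (0.171, 0.160)]

diffs = [percent_difference(p, m) for p, m in pairs]
within_10 = [abs(d) <= 10.0 for d in diffs]
within_5 = [abs(d) <= 5.0 for d in diffs]
print([round(d, 2) for d in diffs], within_10, within_5)
```

Keeping the sign shows whether the model tends to over- or under-predict, which a plain absolute error would hide.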
	All of which is very nice, but
	is this just an effect of the data we've measured?
	So what we did was we used the profiler platform in JMP, produced a shareable model that we could send around the project team, and essentially set up a competition and said look, whoever can find the highest...
	the highest thermal conductivity in this model from a molecule that could actually be made,
	wins.
	You know, from that we had a list of about 14 materials back that looked promising. We had to cut a few out because it was impossible to source the raw materials, so we ended up with about nine new materials that were synthesized and tested. Now these materials were almost exclusively
	made up of materials the model hadn't really seen before, you know. In some cases, part of the molecule would be the same, but they were quite distinct from the original materials.
	So once we'd made them, and once they had been tested, we put them back into the model
	just to see what the predictive power of this model was like.
	So if we have a look at that data, you know, I think, given the differences of these materials, I was fully expecting
	this...
	this to break the model.
	However, when we...
	if we look at the predictions again,
	what we see is the highlighted blue ones are the new materials that were made. You know, we deliberately picked a couple that were lower just out of curiosity, just to check. And all the ones that we picked that we thought would be high were high.
	So in the overfitted model, which had value from a structure design point of view, what we see is one outlier. In the model that was statistically reasonable, we actually see a much better fit overall.
	And that was
	edifying, that we can start to be able to not, sort of, design a single molecule and say, oh, here you go, off you...off you pop; here's the one thing you need to make. But certainly be able to direct synthetic chemists to the right, sort of, types of materials to really drive projects forward.
	So then, if we just look again at these residuals, what we see is, for the
	statistically good model with no overfitting, all the new materials were within 10%,
	which, for what we were trying to achieve, was good enough. There's a few in the overfitted model that were a little bit under 10% but, again, this is kind of what I would expect to see.
	And it was nice that they were all in the right range, because it shows that this approach was having value, but it was also quite reassuring to find that they weren't all exactly right, because I think, had we produced nine materials and they'd all been within 1%,
	I'm not sure that people would have believed that either.
	So
	the fact is, we were getting a similar...
	a similar level of difference to the predictions from the materials we started with and the new materials that we made. So we started having some real confidence in this model.
	And then, if we just go back to
	the slides a second.
	So what we can say then
	is that the structure-performance relationship of these materials
	has been created in JMP using the regression platforms.
	We've used the visualization tools in JMP to be able to see that there's real benefit to doing this, and the model itself is being used to direct the synthesis of new materials in this project. It's also being used to screen likely materials to test from things we already make.
	And, you know, there's an acceptable correlation in the results between the model and the new molecules we're making,
	all of which has given real confidence to this approach, and it's really allowed us to, kind of, push this project further and sort of split it out into specific target materials.
	So, in terms of new molecules,
	we've directed the synthesis of molecules with higher thermal conductivity. So as you can see in this plot, you know, all the new molecules are sort of medium to high on that range of thermal conductivity, which is kind of what we wanted to achieve from them.
	We demonstrated that we could target an improvement, using data and then verify that in the lab and make it.
	Where this project then becomes harder still is, we're now trying to build similar models for all the other factors
	that influence the performance of these dielectric fluids, and then we will be trying to balance those models against each other to find the best outcomes.
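The talk does not say how the separate property models will be balanced. One common approach, assumed here, is a desirability function: map each predicted property onto a 0 to 1 scale and combine the scores with a geometric mean. The function names, limits, and candidate values are all hypothetical.

```python
import math


def desirability_maximize(value, low, high):
    """0 at or below low, 1 at or above high, linear in between."""
    if value <= low:
        return 0.0
    if value >= high:
        return 1.0
    return (value - low) / (high - low)


def desirability_minimize(value, low, high):
    """1 at or below low, 0 at or above high, linear in between."""
    return 1.0 - desirability_maximize(value, low, high)


def overall_desirability(scores):
    """Geometric mean, so any single unacceptable property gives zero."""
    if any(s == 0.0 for s in scores):
        return 0.0
    return math.exp(sum(math.log(s) for s in scores) / len(scores))


# Hypothetical candidate: predicted conductivity 0.15 W/(m K), viscosity 8 mPa s.
d_conductivity = desirability_maximize(0.15, low=0.12, high=0.16)
d_viscosity = desirability_minimize(8.0, low=2.0, high=20.0)
score = overall_desirability([d_conductivity, d_viscosity])
print(round(d_conductivity, 3), round(d_viscosity, 3), round(score, 3))
```

The geometric mean is chosen over an arithmetic mean so that a fluid that fails badly on one property cannot be rescued by excelling on another.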
	So all of that further development is ongoing, but that momentum has come purely by the ease of use of JMP and the platforms in it to take a data set
	and with a bit of kind of domain knowledge, really push that forward and say, yep here's a model that will help direct the synthesis for this project and subsequent projects in this area for Croda.
	So then, just in conclusion, data that we've obtained from testing has been used to successfully model the performance of these materials. It's not absolutely perfect, but it's good enough for what we want.
	The model...
	the model demonstrates
	that there is a structure-performance relationship for esters (sorry, not sure why my taskbar is jumping around). The model has been used to predict materials of high thermal conductivity.
	Those predictions were then verified, initially just by exclusion and then latterly by making new materials, really showing that this model holds for that type of chemistry.
	It's also demonstrated the possibility of tailoring properties of,
	in this case, dielectrics, but also other materials if you build similar models, so that you can start being able to create specific materials for specific applications.
	And I think most importantly for me,
	the real success of this work has built internal momentum to sort of demonstrate that JMP is not a nice to have, it's a...it's a real platform to develop research, to very quickly look at data sets and say, is there something there?
	And with that,
	I'd just like to say thank you for watching. Obviously I can't answer any questions on a recording, but if you want to get in touch, feel free to comment in the Community. Yeah, thank you very much.

Presented At Discovery Summit Europe 2021

Presenter

Stuart Little

Files

2021-EU-30MP-780 - Driving Product Development.pdf
