Speaker

Transcript

David.WongPascua 
Okay, so. 

Sequential and Steady Wins the Race. So, hello everyone, I'm David and I'm a senior scientist at CatSci. So what is CatSci? 

CatSci is an innovation partner for end-to-end chemical process research and development. We collaborate with organizations like JMP UK to ensure that all of our decisions are driven by data. 

For us it's about getting it right the first time and getting it right every time; formulating bespoke, perfect-for-purpose solutions for every project; and our absolute focus is delivering to the expectations of our clients. 

We work in a number of areas, including early- and late-stage development. We possess process expertise and a long history of catalysis experience, which has always been a core feature of our identity, and we have excellent capabilities in analytical chemistry and material science. 

So at CatSci, one of the challenges that we face is the selection of categorical variables upstream of process optimization. 

This difficulty is most prominent in our catalyst screening work. 

In the images on the slide, we can see some of the apparatus and instruments that we rely upon to perform catalyst screening. 

On the left, we have premade catalysis kits that contain 24 predosed combinations of metal precursor and ligand. 

In the center is a hydrogenator that houses a 48-well plate, which can be seen on the right. 

This allows 48 individual experiments to be conducted simultaneously. 

These instruments are designed for high throughput and small scale, and it isn't uncommon to run 10 or more of these plates for our clients, which totals up to almost 500 experiments. 

So why do we need to run a relatively large number of experiments and what are the benefits for improving efficiency? 

So, first of all, the chemical space that needs to be explored is extremely large. 

For any reaction, any chemical reaction, there are many factors that can have an effect, such as temperature, concentration and stoichiometry. 

However, what makes exploration particularly challenging for a catalytic reaction is that there are many important categorical factors; these include metal precursor, ligand, reagent, solvent, and additive. 

Another characteristic of catalytic reactions is that there are frequently high order interactions, so what this means is that, for a catalyst system (a particular combination of metal precursor and ligand), 

there needs to be extensive solvent and additive screens to gauge performance. 

It isn't correct to rule out, say, a certain combination of metal precursor and ligand just because it performs poorly in a certain solvent. It might be the case that you don't have the right solvent or the right additive for it. 

Lastly, one of the easiest ways to explain the benefit of a more efficient screening method is to look at the price of materials. 

Many of our catalysis projects involve precious metals, such as platinum, palladium, ruthenium, iridium, and rhodium. Some of these metals are among the rarest elements in the earth's crust and that's reflected in their price. 

From just a quick Google search of their prices, you can see here that rhodium is more than an order of magnitude more expensive than gold. 

Another large expense of catalysis screening is the ligands. While there are cheap ligands that exist, many of the top performing ligands for asymmetric synthesis are proprietary ligands, such as Josiphos from Solvias, 

and one of the most commonly available and cheapest, J001, is still twice the price of gold. 

So, reducing the number of experiments can therefore decrease the cost of consumables, but the biggest impact is created when we find the optimal solution for a client with a multi-ton manufacturing process. 

In these cases, a tenfold reduction in catalyst loading or the discovery of an alternative, cheaper catalyst (say, for example, exchanging a rhodium catalyst for a ruthenium one) can save millions of dollars per year. 

And now I'm going to hand over to Ben, who will discuss how the JMP UK team has tackled this problem. 
Ben Francis 
So thanks very much, David. 

So it's clear that 

CatSci and many other companies similar to CatSci face big challenges with large formulation spaces, 

which can have very high order interactions. So the science is very, very complex and what we do to experiment to unravel that complexity is important. 

Unfortunately, the option to just test every single combination isn't really a sensible or viable one, due to the costs that David outlined there. 

And ultimately, we need some form of approach that involves data science to understand the relationships numerically. 

So, in order to proceed in applying the chosen formulation to a manufacturing...a multi-ton manufacturing scenario, we need to know it will work each time, as David said. 

So we searched literature, because at JMP we're very interested in application of DOE, and we felt this was a particular situation of DOE. 

And as we aptly named it, it's a big DOE. So we searched for a case study to apply a big DOE approach to and we found a paper by Perera et al, 

published in 2018, and they looked at a platform for this reaction screening. And what was interesting in this scenario is they had five categorical factors. 

And this may not sound like too much. They had to look at Reactant 1 and 2, ligand, reagents and solvent, but the complexity presented in this paper was in the levels of each factor. 

Four, three, twelve, eight and four levels, respectively, all combined together come to 4,608 distinct combinations. 
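
The arithmetic behind that figure can be checked with a minimal sketch. The level counts are the ones quoted in the talk; the factor names used as dictionary keys and the integer level indices are placeholders, not the actual reagent names from the paper.

```python
from itertools import product
from math import prod

# Number of levels per categorical factor, as quoted in the talk.
# The key names are illustrative labels only.
levels = {
    "reactant_1": 4,
    "reactant_2": 3,
    "ligand": 12,
    "reagent": 8,
    "solvent": 4,
}

# Size of the full factorial: the product of the level counts.
n_combinations = prod(levels.values())

# Enumerate every combination explicitly (each level is just an index here).
full_factorial = list(product(*(range(n) for n in levels.values())))

print(n_combinations)       # 4608
print(len(full_factorial))  # 4608
```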

Now, in the Perera et al. paper, they tested every single one, which obviously is not a viable solution for every company in the world, just testing every combination, as David outlined before. 

You can see here on the heat map that they were able to get an idea that there are some high yielding regions 

and some less high yielding, and some very low as well. So there was a lot of testing to ultimately end up at not really much process understanding 

and a very high experimental cost. 

So this ultimately ended up in a situation where they found a combination, they got to a combination, which they were then able to process in the paper and say look, this is, these are the general...

It was right the first time, but was it right every time from that point onwards, if they put that into a multi-ton manufacturing scenario? Would it continue to provide the specification that was required? 

Does the big DOE solution reduce resource requirements? 

Does it get it right first time, and does it get it right every time? And the way we are able to measure these questions and how we performed 

was firstly in terms of reducing resource requirements: we only allowed ourselves to design an experiment with 480 runs in total. This mimicked ten 48-well plates, and it is around 10% of the runs performed in the paper. 

Does it get it right first time was measured by looking at how many of the high yield combinations identified in the paper, the 80, were either tested or predicted 

by the model generated. And, ultimately with 10% of runs, we were expecting to get at least eight high yield combinations out of our...out of our DOEs 

(and high yield is indicated here by a value of more than 95%). 
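
The "at least eight" benchmark follows from simple proportions, sketched below. It assumes, as a baseline only, that the 480 runs are drawn without any model guidance, so the expected number of high-yield hits is just the high-yield count scaled by the fraction of the space tested. The variable names are illustrative.

```python
# Figures quoted in the talk.
total_combinations = 4608   # full factorial tested in the paper
high_yield_count = 80       # combinations with yield above 95%
run_budget = 480            # ten 48-well plates

# Fraction of the full factorial covered by the budget.
fraction_tested = run_budget / total_combinations

# Expected number of high-yield combinations found by unguided sampling.
expected_hits = high_yield_count * fraction_tested

print(f"fraction tested: {fraction_tested:.3f}")
print(f"expected high-yield hits: {expected_hits:.2f}")
```

Any strategy that finds clearly more than this baseline is doing better than chance coverage of the space.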

Finally, and crucially, does it get it right every time was measured by looking at the general diagnostics of the model, the R squareds. And, ultimately, can we look at this and see if we have process insight? Do relationships make sense? Do combinations and interactions make sense? 

So what we did to give ourselves a par for the course here as well was to use the full data set of 4,608; we found that the R squared of this was .89, and this was with a boosted tree method. 

So now I'm going to hand over to Phil, who will outline the first big DOE approach that we took. 
Phil Kay 
Thanks, Ben; thanks, David, for setting the scene so nicely. So we looked at three different approaches 

to tackling this, to seeing if we can take a reduced number of those runs that were actually tested in that paper and still gain the same useful information, as Ben has outlined. 

So I'm going to talk through two of those three approaches, and these two are kind of the extreme approaches. There's a conservative approach and a greedy approach and then Ryan will talk through the more hybrid balanced approach that we thought was probably going to work best ultimately. 

So by the conservative approach, this is a standard DOE approach. We design a 480-run custom design with 10 blocks, so that's representing 10 plates in the catalyst screening equipment that David showed us. 

And the headline results were, from that 480-run experiment, we were able to correctly predict that 13 of the highest yielding combinations would be high yielding ones. 

And we were able to build a model from that 480 run data set that, when we tested it against data that hadn't been used in that experiment, 

the R squared was about .73, so pretty good actually. We get a fairly good model, fairly good understanding of the system as a whole from that smaller experiment on...out of the big experiment that was published. 

Some really interesting insights we got from this were that we needed to use machine learning to get useful models from this data, and that was a bit of a surprise to us. I'll talk through that in a bit more detail. 

And we got some good understanding from this data that we collected using this conservative approach. It gave us a good understanding of the entire factor space, why certain combinations work well. 

It enabled us to identify alternative regions, so if we find a high yielding combination, but it's not practical because it's too expensive, 

the catalysts are too expensive, or maybe they're dangerous or toxic and not amenable to an industrial process, then we've got lots of alternatives because we've got an understanding of the full space. 

So, how did we do that? 

Custom design in JMP. We 

designed a 480-run, I-optimal design, in this case, in 10 blocks. So again, the 10 blocks representing the ten 48-well plates that David showed us the equipment for. 

Of note here, we specify a model in custom design that is for the main effects, 

for the second order interactions and the third order interactions. And with the third order interactions and all those levels of these categorical factors, 

there's just a huge number of parameters that need to be estimated, so we couldn't actually design an experiment to estimate all of those explicitly. We need to put the third order interactions in as If Possible, so this is a Bayesian I-optimal design in 480 runs. 
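
To see why the full third order model cannot be estimated explicitly in 480 runs, here is a rough parameter count. It assumes the usual coding in which a categorical factor with k levels contributes k - 1 parameters and an interaction contributes the product of its factors' parameters; the block effects are left out of the count, and the function name is hypothetical.

```python
from itertools import combinations
from math import prod

# Level counts quoted earlier in the talk: 4, 3, 12, 8 and 4.
levels = [4, 3, 12, 8, 4]

# A k-level categorical factor needs k - 1 parameters.
params_per_factor = [k - 1 for k in levels]

def n_interaction_params(order):
    """Total parameters over all interactions of the given order."""
    return sum(prod(combo) for combo in combinations(params_per_factor, order))

main_effects = n_interaction_params(1)   # 26
two_factor = n_interaction_params(2)     # 242
three_factor = n_interaction_params(3)   # 1012

# Intercept plus all effects up to third order (blocks not counted).
total_params = 1 + main_effects + two_factor + three_factor

print(main_effects, two_factor, three_factor, total_params)  # 26 242 1012 1281
```

With well over a thousand parameters and only 480 runs, the highest order terms have to be treated as If Possible rather than estimated outright.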

You can see from this visual, we're just looking through the different rounds of the experiments, of the different...10 different plates. 

And you can see how, with an addition of the next plate in the experiment, we're adding to our exploration of the factor space as a whole. 

We're not focusing on any specific regions; we're just filling in the spaces in our knowledge about that whole factor space. 

So that's 480 runs that we used in our experiment, and we have the remaining 4,128 rows from that published data set. We can use these as test data to see how well our models perform on a held-back data set. 

Now we expected from all our sort of DOE background and our experience that 

when we design an experiment for a certain model, then standard regression techniques are going to provide us with the most useful models. 

So we were using JMP Pro here and we use generalized regression to fit a full third order model. 

Now we actually use a beta distribution here, which bounds predictions between zero and one, so we convert percent yield to the scale of zero to one. It's just a little trick: when you're modeling yield data like this, you might want to use that when you've got JMP Pro. 
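
A minimal sketch of that rescaling step follows. Dividing by 100 is what the talk describes. The additional "squeeze" that keeps exact 0% and 100% yields strictly inside the interval is an assumption added here, because a beta density is only defined on the open interval (0, 1); how JMP Pro treats boundary values is not covered in the talk. The function name is hypothetical.

```python
def yield_to_unit_interval(percent_yield, n_obs):
    """Map a percent yield (0 to 100) into the open interval (0, 1)."""
    y = percent_yield / 100.0
    # Assumed boundary handling: shrink slightly toward 0.5 so that
    # 0% and 100% do not land exactly on 0 or 1.
    return (y * (n_obs - 1) + 0.5) / n_obs

# Example with the 480 runs of the final design.
for pct in (0.0, 50.0, 100.0):
    print(pct, yield_to_unit_interval(pct, 480))
```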

And we use forward selection, with AICc as the validation method, so that's going to select the important effects. 

And we repeat this after each round, so with the first round of 48 runs, then the second round of 96, then the third with 144, all the way up to the 10th round where we've got the full 480 runs. 

Here you can see the filter set on gen reg there, just to look at the first three rounds. 

So for those first three rounds, this is the model that we get using gen reg, forward selection, AICc validation. 

And it's not a great model, I think, just intuitively from our understanding of what we'd expect from the chemistry. We're only seeing two of the factors involved in the model so it's only selected two main effects as being significant. 

So we can save that model to the data table, and then we can see how well that model predicts the heldback data, our test set, 

the remaining 4,128 rows that were not in our experiment. And we can do that for each round. And we did that for each round, so we've got 10 models that we've built using gen reg. 

Now, as an alternative, we thought it'd be worth looking at some machine learning methods. And here, we're just looking at the first three rounds again. 

And as a really quick way of looking at lots of different modeling techniques, we're using Model Screening in JMP Pro 16. 

We just fill in the yield and the factors in the Y and the X's there. We've got a holdback validation column set up to hold back 25% of the data randomly for validation. 

Click OK, and it runs through all these different modeling techniques, so we get a report now that tells us 

how well all these different modeling techniques perform, including some gen reg models, neural nets and different tree methods. 

And we've selected the two best there. And in fact across all the rounds of the experiment, we found that neural...boosted neural net models and bootstrap forest were the best performing models. 

So again, we can save the prediction formulas back and use those on the test data set that wasn't involved in the experiment to compare them and see how well our different models are doing against that test data set. 

And we use Model Comparison in JMP Pro here, so we compare all our gen reg models, our neural net models, and our bootstrap forest models. 

And now you're looking at a visual of the R squared against that test data set: how well those models fit against the held-back test data set. 
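
For reference, a test-set R squared of this kind can be computed as below. This sketch assumes the usual definition, one minus the ratio of residual to total sum of squares on the held-back rows; the exact formula used by the JMP report is not stated in the talk. The numbers in the example are made up for illustration.

```python
def r_squared(actual, predicted):
    """R squared of predictions against actual values on a held-back set."""
    mean_actual = sum(actual) / len(actual)
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_residual / ss_total

# Made-up yields (percent) for four held-back combinations.
actual_yield = [10.0, 50.0, 90.0, 95.0]
predicted_yield = [20.0, 45.0, 80.0, 99.0]
print(round(r_squared(actual_yield, predicted_yield), 3))
```

On data the model has not seen, this quantity can go negative when the predictions are worse than simply predicting the mean, which is one way an overfitted model shows up.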

And you can see the gen reg, the standard regression models using forward selection in red there, starting off 

in the first round, that's actually the best model, but not really improving, and then suddenly going very bad around five, before becoming really good again in the next round. So very unstable and towards the end getting worse, probably overfitting. 

The machine learning methods, on the other hand, are, 

after round three, they're performing the best, performing better than our standard regression techniques and they're improving all the way. And in fact, 

you can see that neurals start off better in green there, and towards the end, the bootstrap forest is outperforming the boosted neural net models. 

So the take-home message here is even with this, what you might consider a relatively simple situation, you really need the machine learning methods in JMP Pro to get the best models, the most reliable models for these kind of systems or processes. 

Now we're just looking at plots of the actual yield results for all those 4,608 runs that were published versus the predictions of our best model at the end of round 10, after the full 480 runs there. So that was a bootstrap forest on the right. 

And it's not bad, you know you can see, the important behaviors it's picking up. It's identifying the low yielding regions and it's identifying the highest yielding regions. So there's, 

particularly up in the top right, they're using an iodine(?) reactant and the boronic acid and boronic ester 

reactants there. 

So, then, we compare this with the other extreme approach. So the first round of 48 runs was exactly the same; we start with the same 48 runs. Then we build a model and we use that model to find out which, out of all the remaining runs, 

out of the full remaining 4,560 runs there are, which of those does the model predict would have the highest yield, and then we keep doing that in all the subsequent rounds up to 10. 

And in this case we correctly predict more of the high-yielding combinations, but when we get the final model after 10 rounds and compare that on the data 

that we haven't used in the experiment, the R squared is terrible. It's .33. So really the model we get is not making good predictions about the system as a whole. 

So we're getting good understanding of one high yielding region and it's finding combinations, finding settings of the factors that work well, but it's not really enabling us to understand the system as a whole. We're not... 

we're not reducing our variability in our understanding of the system as a whole. 

So just to show you how that approach works, this is our full 4,608 runs, all the combinations that are possible. And, as I mentioned, Round 1 is the same as Round 1 using the conservative approach, so that's an I-optimal 

set of points there, and you can see they're evenly distributed across the factor space. 

We fit a model to the data from those 48 runs. 

And we use that model to predict the yield of all of the other possible combinations and then we select out of those the 48 highest yielding predictions, 

which is those ones. So you can see, they are focused on the higher yielding region of the...we're not...we're no longer exploring the full factor space. 

And if we go from Round 1, 2, 3 and up to Round 10, you can see, each time we're adding...we're adding runs to the experiment that are just in that high yielding zone up in the top left. So we're only really focusing on one region of the factor space. 
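
The selection step of this greedy approach can be sketched in a few lines. The model fitting itself is not reproduced: `predicted_yield` stands in for whatever model was refitted after the previous round, and the function name, the combination labels and the toy numbers are all hypothetical.

```python
def pick_next_plate_greedy(predicted_yield, already_run, plate_size=48):
    """Return the untested combinations with the highest predicted yield.

    predicted_yield: dict mapping a combination to its predicted yield.
    already_run: set of combinations tested in earlier rounds.
    """
    untested = [combo for combo in predicted_yield if combo not in already_run]
    # Rank purely on predicted yield; no attempt is made to spread the
    # points across the factor space.
    untested.sort(key=lambda combo: predicted_yield[combo], reverse=True)
    return untested[:plate_size]

# Toy example with five combinations and a "plate" of two wells.
toy_predictions = {"A": 91.0, "B": 40.0, "C": 97.5, "D": 12.0, "E": 88.0}
next_plate = pick_next_plate_greedy(toy_predictions, already_run={"C"}, plate_size=2)
print(next_plate)  # ['A', 'E']
```

Because nothing in the ranking rewards coverage, successive plates pile up wherever the current model already predicts well, which matches the clustering seen on the slide.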

And if we compare the actual versus predicted here, you can see that it's doing a pretty good job in that region in the top left there. 

But elsewhere it's making fairly awful predictions, as we might expect; that's probably not that surprising. We haven't experimented across the whole system. We really have just focused on that high yielding region. 

And if you compare it against our conservative approach on the left hand side (so you've already seen the plot on the left hand side there) if you look at how the model 

improved through Rounds one to 10, and then each round we were using the model screening platform again to find the best model. 

And it turned out that at each stage actually, the boosted neural net was the best model. Well, you can see again it's kind of unstable like the gen reg from the conservative approach. 

It's, at times, it's okay, but it's never...it's never competitive with our best models from the conservative experiment. 

So those are the two extremes, and I'm going to pass over to Ryan, who's going to talk about a more balanced or hybrid approach to this problem. 
Ryan Lekivetz 
So thanks for that Phil. And 

just actually in the middle of a presentation, gotta say thank you to Phil, Ben, and David for allowing me to take part in this. As a developer it's actually kind of fun to play around a bit. So I'm a developer on the DOE platforms in JMP. 

So as Phil kind of mentioned here, the approach that I'm looking at here is a hybrid approach. 

Alright, so the idea here is we're going to be looking at this idea of optimization versus exploration. 

So, if you think of the greedy approach that Phil talked about, that's really that optimization. I'm going strictly for the best, that's all I'm going to focus on. Whereas that conservative approach, all that that cared about was exploration. 

And so the idea here is that, can we get somewhere in the middle to 

perhaps optimize, while still exploring the space a little bit? 

So I'll say what I had done here...so I actually started with kind of three of the well plates set ahead of time, right. So I did those three blocks of size 48 and I just did main effects with If Possible 2FIs (two-factor interactions). 

The reason for that is that sometimes, I didn't really trust the models with just 48 runs. I said, well, let's give it a little bit of burn in. Let's try the 144, or three well plates, so then maybe I can trust the model a little bit more. 

Because in some sense, even if you think of that greedy approach, if your model was bad, you're going to be going to the terrible regions. It's not going to help you an awful lot. 

And again so...so as before, I'm still going to be doing 10 total well plates or 10 total blocks in the end. 

And I'll talk about this a little bit in the future slides, but so I was actually using XGBoost throughout. So the way I was predicting was using XGBoost and there's a reason I'll get to 

for doing that. 

But, so the idea here was to essentially take what the predictive value was. So instead of going strictly after the best 48, I said, I'm going to set a threshold. 

Right, so after the first three, I started at 50. So I said, give me the predictive values that are greater than 50 and as I keep going on further well plates, I'm going to increase it by five each round. 

I mean that was pretty arbitrary; part of that was knowing that we only had 10 rounds in total, and that's also because I could kind of see based on the prediction how big was that set. 

Right, so depending on the data, if it looked like that set was still huge when the predicted value was above 50, you could modify that. 

And so the idea here is really just to slowly start exploring what should hopefully be the higher yield regions. 

And so, were we right? Well, so in this, we actually got 27 out of the 80 highest yielding combinations, those above 95, which wasn't that far off the greedy approach. 

And if you think of the R squared, again, with all that holdout data, the final test R squared there was .69, which really wasn't even that far off from the conservative approach. So just the... 

I mean, you may be...you may want to be cautious, right now, just looking at those two numbers, but based on that, this hybrid really actually does seem to be kind of the best of both worlds. I mean, you're trading off on it, but, but you are getting something good. 

So just some insights on this approach. Really, I would say, in the end, that we did get a better understanding of those factor spaces associated with high yield. 

I think the greedy, if you can think back to where those points were, they were really concentrated in that upper left so it didn't really do much more exploration of that space. 

And you can imagine if... 

if something in there suddenly quadruples in price overnight, now you don't know anything else about the rest of the region. 

Right, and so with this hybrid approach, we get some more of that how and why, some of the combinations, and it gives us better alternatives 

again that, maybe if you have other things to factor in, you're willing to sacrifice a tiny bit of the yield because there's better alternatives, based on other measurements. 

And so, really, the idea here...yep, so let me just get into how this was actually done in the custom design, right. So the approach I took was 

using covariates in the custom designer, which I think is kind of underappreciated at times. 

But, so the idea is I just took that full data set, so the whole 4,608 runs, 

and for each round...so you can imagine, at first, I took those...I took the 144 run design, and then I said, well, I'm going to take a subset of that full data set. If I've already chosen it in a previous design, well, I need to make sure that that's included. 

And then also, give me the predicted yields above that certain value, so again, for the fourth design, that started at 50, and then 55, 60, etc. 
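
The subsetting logic described here can be sketched as follows. The threshold schedule (50 for the fourth plate, rising by five each round) comes from the talk. The function names and toy data are hypothetical, and the final step, in which the custom designer chooses the next 48 points from this candidate set while keeping the earlier ones, is not reproduced.

```python
def threshold_for_round(round_number, first_adaptive_round=4, start=50, step=5):
    """Predicted-yield cutoff for a given plate: 50 for plate 4, then +5 per plate."""
    return start + step * (round_number - first_adaptive_round)

def candidate_set(predicted_yield, already_run, round_number):
    """Combinations the designer may choose from in this round.

    Previously run combinations are always kept; untested combinations
    qualify only if their predicted yield exceeds the round's cutoff.
    """
    cutoff = threshold_for_round(round_number)
    promising = {combo for combo, y in predicted_yield.items() if y > cutoff}
    return set(already_run) | promising

# Toy example: "D" was run earlier and stays in regardless of its prediction.
toy_predictions = {"A": 91.0, "B": 40.0, "C": 97.5, "D": 12.0, "E": 62.0}
print(sorted(candidate_set(toy_predictions, {"D"}, round_number=4)))
print(sorted(candidate_set(toy_predictions, {"D"}, round_number=10)))
```

As the cutoff rises, the candidate set shrinks toward the most promising region while still leaving the designer room to balance the factor levels within it.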

So in the data table 

now, what I do is I select the design points that were used in previous experiments. So I make sure that those are highlighted. And then custom design, when you go under add factor, there's an option there called covariate. And again, I think that it's... 

A lot of people have never really seen covariates before and, I mentioned here, because I think I could probably spend an entire talk just talking about covariates, 

but there's going to actually be a couple of blog entries (if they're not up by the time you see this video, probably within a few days by the time you're watching this) 

where I'm going to talk a little bit more about how to really use covariates. But the idea is, you may have also heard something called a candidate set before, so the idea is... 

here's this data set that was the subset of the previously chosen and everything like that, and it's telling the custom designer you can only pick these design points. 

You can't do the standard way of, we're going to keep trying to switch the coordinates and everything like that. It's telling the JMP custom designer, you have to use these runs in your design and, in particular, 

one of the options using covariates is that you can force in previously chosen design points, so in this way it's actually like doing design augmentation 

using a candidate set, using covariates. 

And, of course, because in this...in this case, I can't really throw out those design points that I've previously experimented on. Really, all we're wanting to do is to pick the next 48. 

And I should just mention as well here, though, if you think of the the custom design... 

So in each of these, what I was doing was...so the two factor interactions and then three factor interactions if possible. 

But there was...that was really more of a mechanism to explore the design space, right. The model that I was fitting 

was using XGBoost after the fact, so I mean that design wasn't optimized for an XGBoost model or anything like that. 

This was more as a surrogate to say, explore that design space where the predicted yield looked good and, especially because this was dealing with the categorical factors, it's going to make sure that it tries to get a nice balance. One other thing to mention, 

if you wanted to...again, because it's not using that underlying model, the XGBoost or anything like that, you could use the response. So in this case, the predicted yield as another covariate, and what that would do is to try and get a better balance of both high and low yield 

responses when it's trying to make the design. 

So we didn't explore that here, but I think that would be a valid thing to do in the future. 

So, to the predicted yield. So, as I mentioned in...so I think when Phil was talking about some of the other machine learning techniques he had tried 

For my ???, I actually focused mainly on XGBoost. 

And if you look on the left side, one of the reasons I had done that was to try 

using K folds cross validation, but where I use the design number, which would essentially be the well plate as each additional fold. So instead of randomly selecting the different K folds, 

it would say, I'm going to look at the well plate and I'm going to then try to hold that out for each one. So if you imagine, for the first one, when I only had three well plates, 

it was going to take each of those well plates, hold it out, and then see how well it could predict, based on the remaining two. And so by the time I would get to the tenth...the tenth well plate, it would actually be kind of like a tenfold. 

So part of my reasoning for that was to think about trying to make that model robust to that next well plate that you haven't seen. 

And, in reality, this would actually also perhaps protect you if something happened with a particular well plate, 

then, maybe get a little bit of protection from that as well. 
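
The fold construction described here, where each well plate in turn is held out, can be sketched in plain Python. This only builds the train and validation index lists; fitting XGBoost on each fold is not shown, and the function name is hypothetical.

```python
def leave_one_plate_out(plate_ids):
    """Yield (held_out_plate, train_indices, validation_indices) per plate.

    plate_ids: one entry per run, giving the plate that run was on.
    """
    for plate in sorted(set(plate_ids)):
        train = [i for i, p in enumerate(plate_ids) if p != plate]
        validation = [i for i, p in enumerate(plate_ids) if p == plate]
        yield plate, train, validation

# After the first three plates: 144 runs, three folds of 48.
plate_ids = [1] * 48 + [2] * 48 + [3] * 48
folds = list(leave_one_plate_out(plate_ids))
for plate, train, validation in folds:
    print(plate, len(train), len(validation))
```

With ten plates this becomes a tenfold scheme in which every validation fold is a complete plate the model has not seen, which mirrors the situation of predicting the next plate.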

And another thing I had done with XGBoost is when you launch the XGBoost platform, like through the add-in, you'll see there's a little checkbox up at the top for what we call a tuning design. 

And so, with that tuning design, 

it's actually fitting a space filling design behind the covers and running all of that for the different hyper parameters. 

So what I had done each time I was doing this, 

I would pick a tuning design of 50 runs, just to kind of explore that hyper parameter space. And so the model I would choose each round 

was the one that had the best validation R squared. And so again, the validation R squared in this case is actually using only the design points that I have, right. So this was doing that K fold with the... 

with those different well plates. So in this case, the validation R squared is not...I'm not using that data that hasn't been seen yet, because again, in reality, I don't have that data yet. 

But this was just saying using that validation technique, where I was using the well plates. So again, that's kind of the reason that I was using XGBoost for most of this, I was just kind of trying something different, 

using that K folds technique, and as well, I really just like using the space filling as well, when it comes to that XGBoost. 
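
To give a feel for what a 50-run space-filling tuning design looks like, here is a sketch that uses a Latin hypercube as a stand-in. The talk does not say which space-filling method the add-in uses, so the Latin hypercube is an assumption, as are the choice of hyperparameters and their ranges (`learning_rate`, `subsample` and `colsample_bytree` are genuine XGBoost parameters, but the bounds below are made up). Fitting a model at each setting and keeping the one with the best validation R squared is not shown.

```python
import random

def latin_hypercube(n_runs, ranges, seed=0):
    """Return n_runs hyperparameter settings spread across the given ranges.

    ranges: dict mapping a parameter name to a (low, high) tuple.
    Each parameter's range is cut into n_runs equal strata and every
    stratum is used exactly once, in a random order.
    """
    rng = random.Random(seed)
    columns = {}
    for name, (low, high) in ranges.items():
        # One random point inside each stratum, on the unit scale.
        unit_points = [(i + rng.random()) / n_runs for i in range(n_runs)]
        rng.shuffle(unit_points)
        columns[name] = [low + u * (high - low) for u in unit_points]
    return [{name: columns[name][i] for name in ranges} for i in range(n_runs)]

# Illustrative ranges only.
tuning_ranges = {
    "learning_rate": (0.01, 0.3),
    "subsample": (0.5, 1.0),
    "colsample_bytree": (0.5, 1.0),
}
tuning_design = latin_hypercube(50, tuning_ranges)
print(len(tuning_design), tuning_design[0])
```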

And so let's take a look here. So we see here, these are the points that get picked throughout the consecutive rounds. Now, 

if you think of what was happening in the greedy approach, I mean, it was really focusing up in that top left. 

Right, but now, if you see with this hybrid approach, 

as it starts to move forward into the rounds, you see it fill in those points a little bit more, but we also get a better... a better sampling of points, especially on that left side. 

Right, so if you think of it from that perspective, now you have a lot better alternatives for those high yield regions. 

I think so...if we take a look now at the actual versus predicted, 

I actually think this was kind of telling as well, I think, because we did better exploration. 

In those high yield regions that maybe 

didn't get quite as high, I think, if you look to the lower left, I think we've done a much better job in the hybrid model of picking up some of those which may be viable alternatives. 

One of the nice things with this method as well was that I can focus on more than just the yield. 

Right, so if you think in... 

I could have a secondary response, whether it be some kind of toxicity, it could be cost, anything like that, so 

I can focus on more than just the yield when I'm looking at those predicted values. And so, if you think of something like the greedy approach, 

if you start adding in secondar...secondary criteria that you want to consider, it's a lot harder to start doing that balancing. Whereas with this subset approach, it's really not difficult at all to add in a secondary response in doing that. 

I think I'll hand it over now to Ben for conclusions. 
Ben Francis 
So, 

thank you very much to David, Phil and Ryan for giving a background of why we're approaching this, and then ultimately, giving us two, three very good solutions and ways to 

view how we tackle this problem. And in all of these approaches, it's clear that JMP is providing value to scientists. So we've got straightforward, upfront, we've got tenfold reduction in resources for experimentation. We only used 10% of the runs. And 

ultimately, what we were enabling in each situation was the ability to meet specifications and provide a high yielding solution from the experiments. 

But in addition to that, we also showed how we could have different levels of deep process understanding, depending on the goal and strategy of the company employing this approach. 

Now, you may notice that, we three from JMP, we're very interested in how CatSci approach this problem. We weren't necessarily taking 

the problem away from CatSci and solving it ourselves; we're providing the tools in order to tackle this in terms of DOE. And this was a fantastic collaboration between David as a customer, 

myself and Phil on the side of technical sales, and Ryan within product development, and it's key here that we're all looking at this from a different perspective 

to enable that kind of solution, which, as I said before, we can look at kind of deep process understanding in different ways, depending on the objectives of what needs to be achieved. 

We found out some things along the way, which is really exciting to us at JMP and will lead to product improvements, which we then hand back to yourselves 

as customers. We learned that machine learning approaches are applicable to this big DOE situation and, as you know, with JMP they are straightforward to apply. And we also learned caution in terms of the validation approaches, and we can look into that further. 

So, ultimately, we presented here a good volume of work in terms of big DOE challenges, and we're sure there are many companies out there, similar to CatSci, taking on this sort of problem. 

So we all invite you to explore this work further that we've done, and we have two links here in terms of the Community where we'll be posting resources and the video of this, 

as well as a JMP Public post, which enables you to get hands on with the data that we utilized within this. So from everyone in this presentation, we want to say thank you for watching. 