
Dynamic DOE: A Novel Methodology for Time-Dependent Design of Experiments in Chemical Development

In this presentation, we introduce a novel design of experiments (DOE) methodology developed in-house. Named "Dynamic DOE," it is specifically tailored for time-dependent DOEs in chemical development using kinetic reaction data. The development of this innovative approach addresses the challenges faced in traditional DOE methods when dealing with time-sensitive chemical processes.

We present benchmark data comparing different DOE designs and their performance in combination with various regression techniques. This comprehensive analysis demonstrates the advantages of the Dynamic DOE methodology in terms of accuracy, efficiency, and adaptability.

Furthermore, we showcase real-life application examples from late-stage chemical development at Boehringer Ingelheim. These case studies illustrate the successful implementation of the Dynamic DOE technique in combination with high-throughput automated lab reactors, highlighting its practical benefits and potential for widespread adoption in the industry.

Join us to learn more about chemical development advancements through the Dynamic DOE methodology, an innovative technique that seeks to change the way we utilize time-dependent experiments in the field.

 

Welcome, everybody, to our JMP Discovery Summit talk with the title, Time Dependent Design of Experiments in Chemical Development. Before I start, some words about the company we're working for, Boehringer Ingelheim. We are a global pharma company, founded quite some time ago, in 1885, by the Boehringer family, and we are still family-owned today, so not listed on the stock market.

Our business focuses on three areas: first the human pharma business, that's our biggest business, then animal health, and we have a small biopharma business. We have more than 50,000 employees distributed over 16 sites worldwide. In 2022, we made a revenue of nearly €25 billion. With that, to what Jonas and I are doing: we both work in chemical development, and we develop chemical processes for supplying clinical studies and, later on, the market.

In more detail, that means we develop new chemical routes to synthesize our active pharmaceutical ingredients. We develop robust, scalable, and sustainable processes, first in the lab, and later on, we scale them up to pilot plant scale and then to plant scale. Doing that, we supply drug substance for all clinical phases from one to three, and we generate all data necessary for later market submission.

As we always try to accelerate timelines to meet patient needs faster, we actively develop new technical and digital solutions to speed up the development process. With that, let's come to our main topic, chemistry. Chemistry is more or less the science, or art, of combining or changing molecular fragments to build up bigger and more complex molecules, which finally become our active ingredient.

To do so, we have chemical processes which consist of a large number of parameters we can change, for example, reagents, solvents, the stoichiometry of all those things, and then physical parameters like temperature, time, et cetera. So you can see it's a multifactorial problem, with quite a large range of target variables we want to influence.

Most prominent is always process yield: we want to get as much as possible out of our processes. At the same time, we want to have a good purity, so we want to decrease any impurities that might be formed. Furthermore, we are highly interested in reducing the environmental impact of our processes, so we want to use less solvent, less reagents, we want to utilize green solvents and reagents, et cetera.

And then last but not least, we want, or we have to develop robust processes, meaning we have to make sure that we know in which range we can vary our parameters without affecting product quality, and we want to be able to set up a design space.

Statistically speaking, this can be considered as a Multifactorial Multi-Response Black Box optimization problem, which we want to optimize as efficiently as possible while making sure to find our global optimum. Obviously, a perfect use case for DOE. We're actively using DOE in multiple settings.

In early phases, when it comes to screening, we use screening DOEs to efficiently find, in a large range of parameters, our influencing parameters and a possible sweet spot. Later on, we go to optimization DOEs, Response Surface Designs, to really hit the sweet spot and the global optimum. Then, when it comes towards market submission, we characterize our processes using, again, quite efficient DOEs to screen all parameters with respect to a possible influence on drug substance quality.

DOE is a great methodology. We really like it, but it has some challenges we came across. First of all, design robustness, especially when it comes to screening DOEs. What we quite often see is that the high and low settings for influencing parameters are not set correctly by our experts in the labs, meaning low was set too low, or high was set too high. At those settings, our reactions fail completely, or we are operating in an unstable region where we don't get a stable response.

Secondly, DOEs are highly efficient. However, if you want to optimize a large range of parameters, you still tend to need quite a high number of experiments, and that's especially a problem for late development, as there we're working on quite a big scale in the labs, meaning we can't perform many reactions at once, and furthermore, reactions are quite costly on that scale. That's sometimes a problem.

We wanted to tackle those problems, and we addressed them with multiple things. First of all, the problem of design robustness. If the problem is that you perform all experiments either at the low or the high setting, the obvious solution to avoid 50% of your reactions failing is just to distribute your experiments more evenly across the complete parameter range, so not only at low and high, but also in the middle of your design space.

For maximizing efficiency, we got inspiration from the literature. Chemical processes are governed by chemical rate laws, so physical differential equations. That means there is a lot of information in the time course of a reaction, and the idea was to incorporate this time information better into our DOEs by sampling multiple time points for one reaction, thereby utilizing the full information of each and every reaction.

Last but not least, Automation. DOEs are highly repetitive and therefore the perfect candidate for a high degree of automation. We wanted to optimize the conduction of DOEs in the lab as much as possible.

With that, let's go to our Project Scope. In this project, we wanted to investigate: can we use different DOE types to overcome this problem of design robustness? Furthermore, are different DOE types more efficient when it comes to using this new kinetic data?

Second, regression methodology. Obviously, we wanted to incorporate a kind of data that is new for us, and we wanted to see whether some regression methodologies work better than others.

Third, how much data is needed? Can we reduce the number of experiments needed for conducting one DOE? Last but not least, obviously, how to bring this approach to our labs in an automated fashion.

When we wanted to evaluate all those possible combinations, we quite early came across the problem of benchmark data. Obviously, we need data to try different combinations. If you want to consider nine different design types, 18 reactions each, with 10 time points per reaction, you get quite a high number of DOEs to perform, with an even higher number of reactions. We came to roughly 25,000 reactions.

That's obviously not feasible for us; we can't conduct that many experiments. Therefore, we had to come up with a different approach. We went back to the fact that chemical reactions are governed by an underlying rate law, a differential equation with which you can describe the chemical process more or less exactly. And we used that.

We took a chemical reaction, measured its kinetics, and solved this rate law. With that, we had a more or less exact in silico representation of one specific reaction. Using this rate law, we were then able to simulate the outcome of different DOEs. We just set up many different DOEs, screened different DOE types, and used the rate law to compute the outcome of each reaction in a time-dependent fashion.
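To make that concrete: a rate law is just a small system of ordinary differential equations that can be integrated numerically. The following is a purely illustrative sketch, not the rate law or rate constants used in the talk, showing a generic coupling with a parallel side reaction simulated in Python with SciPy.

```python
# Purely illustrative rate law (not the authors' actual system): a generic
# coupling A + B -> P with a parallel side reaction A -> I, integrated with
# SciPy. The rate constants and their temperature dependence are assumptions.
import numpy as np
from scipy.integrate import solve_ivp

def rate_law(t, c, k1, k2):
    a, b, p, i = c
    r1 = k1 * a * b          # coupling to product
    r2 = k2 * a              # side reaction to impurity
    return [-r1 - r2, -r1, r1, r2]

def simulate_reaction(temp_C, conc_a0, equiv_b, t_points):
    """Return product yield (%) at the requested time points (minutes)."""
    k1 = 0.05 * np.exp(0.04 * (temp_C - 25.0))     # assumed T dependence
    k2 = 0.004 * np.exp(0.06 * (temp_C - 25.0))
    c0 = [conc_a0, conc_a0 * equiv_b, 0.0, 0.0]
    sol = solve_ivp(rate_law, (0.0, max(t_points)), c0,
                    args=(k1, k2), t_eval=sorted(t_points), rtol=1e-8)
    return 100.0 * sol.y[2] / conc_a0              # yield relative to A

print(simulate_reaction(temp_C=40, conc_a0=0.5, equiv_b=1.2,
                        t_points=[30, 90, 180]))
```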

We calculated responses for multiple time points for each reaction, and you can see the outcome here. We get nice kinetic profiles, and they vary quite significantly, from reactions that perform well to reactions where nothing happens. This data was then transferred to a normal JMP table, and to give you a feeling for how those tables look, we just stacked all the experiments.

Experiment one consists of three time points. Our influencing parameters are the same, only time changes, and our response, in this example yield, varies. With this, we were now able to run the whole screening in silico. Obviously, we had to use some experimental reactions as a basis, and we selected the two different reaction types depicted here.
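To picture the stacked table just described, here is a hedged sketch of that long-format layout; all column names and values are made up for illustration.

```python
# Sketch of the stacked ("long") table layout: every sampled time point of an
# experiment gets its own row, the factor settings repeat within an experiment,
# and only time and the response change. Column names and values are made up.
import pandas as pd

stacked = pd.DataFrame({
    "Experiment":  [1, 1, 1, 2, 2, 2],
    "Temperature": [40, 40, 40, 60, 60, 60],     # constant within an experiment
    "Equiv_B":     [1.2, 1.2, 1.2, 1.5, 1.5, 1.5],
    "Time_min":    [30, 90, 180, 30, 90, 180],   # only time varies
    "Yield_pct":   [12.3, 41.8, 67.5, 25.1, 70.2, 88.4],
})
print(stacked)
```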

The first reaction is a reagent-mediated coupling. What does that mean? We have two substrates, shown in green. Substrate A is activated by the reagent, and an activated complex is formed. This complex then reacts with our second starting material, forming the product. We also have the formation of an impurity, depicted here in gray.

In this reaction, we investigated five different parameters: temperature, concentration, and stoichiometry, plus time. The second example is an enzymatic reaction, where, again, we couple two substrates. Substrate A is activated by the enzyme, forming an activated complex. Substrate two is activated by a chemical reagent, and we form product and impurity.

All in all, we generated four different data sets from these two reactions, two following the product formation and two following the impurity formation.

Then we set up a workflow in JSL. The workflow consisted of two parts. The first part was the selection of different DOEs, compiling those DOEs with different numbers of reactions; there we screened everything between six and 24 reactions per DOE. We also investigated different numbers of time points sampled for each individual reaction; there we tried everything from three to 12 samples per reaction.

Those DOEs were then set up, we simulated the response using our pre-fitted kinetic rate laws, and then we took all that data and regressed it with the different regression techniques available in JMP Pro. There we not only tried different regression techniques, but furthermore added different levels of normal noise to our response to check whether noise has an influence on the combination of regression methodology with DOE type.

The obtained models were then validated using an external validation set consisting of 10,000 samples, and we calculated the RMSE for each model. To be able to compare the RMSEs between our four different data sets, we normalized each RMSE to the RMSE of a mean prediction model. Then we calculated the mean normalized RMSE across all four data sets. That's what we show on the next slides.
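The actual screening was driven by a JSL workflow with JMP Pro's regression platforms. As a language-neutral illustration of the loop structure and of the RMSE normalization just described, here is a minimal Python sketch; the design list, simulated responses, and regressor factories are placeholders.

```python
# Hedged Python analogue of the JSL benchmark loop (design generation, response
# simulation, noise, regression, validation). The design/regressor inputs are
# placeholders; only the RMSE normalization follows the description above:
# RMSE of the fitted model divided by the RMSE of a mean-only prediction.
import numpy as np

def normalized_rmse(y_true, y_pred):
    rmse_model = np.sqrt(np.mean((y_true - y_pred) ** 2))
    rmse_mean = np.sqrt(np.mean((y_true - y_true.mean()) ** 2))
    return rmse_model / rmse_mean      # < 1 means better than a mean model

def benchmark(designs, regressors, noise_levels, validation_set, rng=None):
    """designs: iterable of (name, X_train, y_train); regressors: name -> factory."""
    rng = np.random.default_rng(rng)
    X_val, y_val = validation_set
    results = {}
    for d_name, X_train, y_train in designs:
        for noise in noise_levels:
            y_noisy = y_train + rng.normal(0.0, noise, size=y_train.shape)
            for r_name, make_model in regressors.items():
                model = make_model().fit(X_train, y_noisy)
                results[(d_name, r_name, noise)] = normalized_rmse(
                    y_val, model.predict(X_val))
    return results
```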

First, we started investigating which regression methodology works best with which type of DOE with respect to prediction performance. For this exercise, we always consider 18 reactions per DOE with 12 samples per reaction and no noise. The results can be seen in this heat map. What can you see in this heat map?

Here, the mean normalized RMSE is plotted, so the mean of the normalized RMSE over our four different data sets. On the Y axis, you can see the different DOE types we have been investigating, and on the X axis, you can see the regression techniques.

Let's first come to the DOE designs. On the bottom part, the more classical design types are shown, so D-optimal and I-optimal. There you can see a lot of gray or dark orange, meaning we don't get good performance from the final models.

However, coming to Space Filling Designs, where you have a more even distribution of your points over the complete parameter space, we see much better and much more consistent performance. Especially Fast Flexible, Uniform, and Latin Hypercube designs show a lot of green or light yellow.

Coming to the regression techniques, we see more or less three different techniques that work quite nicely. First of all, Functional Data Analysis, and that's not really surprising, as this type of analysis has been specifically made for analyzing dynamic data. That was something we expected.

Second, Boosted Neural Networks perform quite nicely in combination with IMSE Optimal designs. However, the consistent top performer across all four data sets is Gaussian Process regression. We investigated two different types, the fast version and the normal version, and they perform significantly better than all other regression techniques. That was quite surprising to us, but on the other hand quite nice, as GPs are fitted really easily in JMP Pro.

You can see they work best with Fast Flexible, Latin Hypercube, or Uniform designs. So we identified a combination of regression technique and design type that seems to work best, and we then wanted to analyze that in more detail.
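The Gaussian Process fits in the study were done in JMP Pro. Purely to illustrate the modeling idea, with time treated as just another input column of the stacked table, here is a small scikit-learn sketch using made-up data and an assumed kernel.

```python
# Illustrative stand-in for the Gaussian Process step (the study used JMP Pro's
# Gaussian Process platform): time is simply one more input column of the
# stacked table. Data, kernel, and length scales are assumptions for the sketch.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel, WhiteKernel

# X columns: temperature (°C), equivalents of B, time (min); y: yield (%)
X = np.array([[40, 1.2,  30], [40, 1.2,  90], [40, 1.2, 180],
              [60, 1.5,  30], [60, 1.5,  90], [60, 1.5, 180]], dtype=float)
y = np.array([12.3, 41.8, 67.5, 25.1, 70.2, 88.4])

kernel = ConstantKernel() * RBF(length_scale=[10.0, 0.5, 60.0]) + WhiteKernel()
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)
print(gp.predict([[50.0, 1.3, 120.0]], return_std=True))
```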

First, we wanted to know how many reactions are really needed to get predictive models, and what about the influence of sampling multiple time points for each reaction. Let's first cover the part of how many reactions are needed.

In this plot, you can again see the mean normalized RMSE versus, in this case, the number of experiments per DOE. What can be seen is not really surprising: the more reactions you perform, or the larger the DOEs are, the better the model performance will be. However, for GPs, shown in yellow, we see a plateau starting at around 20 reactions for those four data sets, and that's something we observe in reality as well.

Normally, four times the number of parameters being investigated is a good indicator for the number of experiments you should perform to get good performing models. For Functional Data Analysis, shown in gray, and Boosted Neural Networks, shown in green, we don't see any plateau, so here we should perform more experiments than plotted here.

The next question was, does time point sampling really improve our model performance? If you look at the bold yellow line for Gaussian Processes, it shows the case with 12 samples per reaction, versus the dotted line where only three samples per reaction have been taken, and you can't see any difference, meaning time point sampling doesn't improve the performance of our model, which was quite disappointing for us.

However, going to Functional Data Analysis and Neural Networks, there you do see a difference; there, 12 samples increased model performance. In the second step, we wanted to see if noise influences this analysis, as up to now we had always looked at noiseless data, which is not realistic for our lab use case, and we wanted to see, does that influence the results? And yes, it does.

Now we see a clear difference between 12 samples per reaction, shown in bold, versus only three samples per reaction, shown as a dotted line. Meaning by using more time point samples per reaction, we can boost the performance of our models, or dampen the influence of noise on the final performance, which is really great news for us, as taking more samples per reaction doesn't cost anything.

We get that more or less for free. That's an easy way to boost model performance without having to pay anything. The same thing holds for Boosted Neural Networks and Functional Data Analysis. However, in all analyses, GPs were always much better than any other combination. Okay, so we could show in this first analysis that GPs in combination with Space Filling Designs seem to work well.

However, the question was, do they really work better compared to conventional approaches? That's why we simulated those approaches as well. We made up two different approaches. First, No Sampling but Time Prediction, meaning a classic DOE approach considering time as a normal influencing factor, so we only take one time point sample per reaction.

The second approach would be No Sampling and Fixed Time Prediction, meaning setting up a DOE without considering time at all; we set up a DOE model for one specific time point during our reaction. Again, we simulated those approaches for all combinations of regression methodologies with design types, in this case again plotted for 18 reactions with 12 samples and no noise.

What can you see? On the very left, we show our new Dynamic DOE approach, and yes, GPs in combination with Space Filling Designs work best. The second approach, time as a normal factor, so only one sample per reaction, works comparably well; interestingly, GPs and Space Filling Designs work best here as well. Last but not least, the approach with no time prediction at all, a DOE model for one specific time point, works best. That's not really surprising.

What is surprising is that here as well, GPs in combination with Space Filling Designs seem to work best, even though this is a standard DOE approach. In the second step, we again wanted to investigate the influence of noise, as that's what we face in the lab. What we observed here was quite nice: now our Dynamic DOE approach works best compared to all other approaches.

Time as a normal factor doesn't perform at all; we don't get any predictive models that outperform a mean model. Even a DOE model specifically made for one time point is worse than our Dynamic DOE approach, clearly showing that using multiple time points per reaction really improves model performance, especially when it comes to noisy data. With that, I'm at the end of my part, and I want to hand over to Jonas.

Great. Thanks, Robert. Now, I would like to illustrate the implementation of this Dynamic DOE approach within chemical development at BI from a technical and practical perspective, as well as show the performance of this approach on one example reaction from BI's development portfolio.

First of all, the experimental realization started in a conventional wet lab in our laboratories, which is the most conventional setup for an organic lab: reactions are conducted in a typical glass reactor or glass flask, and one reaction at a time can be conducted. Every step is operated manually, including the sampling of the reaction.

But this approach not only limits the throughput of reactions, it also creates deviations in the generated data due to varying operators and equipment. Therefore, we opted for a semi-automated system that automates single operations like dosing or sampling of the reaction.

Although some of the steps are still done manually and only one reaction at a time can be conducted, this setup still accelerates the overall throughput and increases the reproducibility of the generated data due to harmonization of some operations.

To further increase turnover and reproducibility, and to decrease development timelines so we can meet patient needs quicker, we opted for a fully autonomous system that can conduct chemical reactions completely autonomously, without human or manual interaction, and that is accelerated even more due to parallelization. It also ensures reproducibility due to harmonized procedures and equipment. How this system works is shown on the next slides, with a bit of technical insight into this platform.

This is a fully autonomous system that can conduct parallel experiments at a 100 milliliter scale, with varying reaction conditions across the parallel experiments. This system allows us to record reaction kinetics in the way the specific reaction requires. Therefore, the whole system enables reliable Dynamic DOE conduction and reliable data that is highly reproducible.

The system features six 100 milliliter stainless steel reactors, here in the front, that are heatable, coolable, and can be stirred under an inert atmosphere to conduct the reactions in. To set up the reactions and operate those reactors, we have liquid and solid handling tools to dispense reactants, solvents, and reagents directly into the reactors.

The liquid handling system can also be used to take the samples, prepare the samples, and inject the samples into analytical instruments. Both tools directly dispense all necessary substances into the reactors, where the reactions then take place.

That is the operative setup; now to the realization of the whole Dynamic DOE workflow with an example from BI's development portfolio. First of all, how is the Dynamic DOE set up? The design of the factors of interest is created based on the results Robert just showed you. We opt for a Latin Hypercube design that allows us to cover the parameter space sufficiently within a reasonable number of single runs.

Like Robert said, we usually opt for three to four experiments per parameter. In the example reaction, those were 10 factors with these limits, so we started with 30 runs and set up a Latin Hypercube design. Next, a separate design for the sampling times is created. The sampling times are distributed pseudo-randomly over the whole reaction duration to avoid sampling every reaction at the same time points, which would lead to overrepresentation of those time points and, later on, to overfitting.

We rather opt for an evenly distributed data landscape over the whole reaction duration. In this example case, the reaction duration was 185 minutes, which was split into 10 sampling windows of 18.5 minutes each. Additionally, we defined constraints to avoid sampling times being too close together and set this up in a Fast Flexible Filling design.
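Both designs are created in JMP (a Latin Hypercube design for the factors and a Fast Flexible Filling design with constraints for the sampling times). As a rough outside illustration of the two steps, here is a Python sketch; the factor names, limits, and the minimum-gap value are assumptions, not the actual study settings.

```python
# Rough illustration only (the authors build both designs in JMP): a Latin
# Hypercube for the factors plus a windowed, pseudo-random sampling-time design.
# Factor names, limits, and the 5-minute minimum gap are placeholder assumptions.
import numpy as np
import pandas as pd
from scipy.stats import qmc

factor_limits = {                                # hypothetical low/high settings
    "Temperature_C": (20, 80), "Equiv_B": (1.0, 2.0),
    "Catalyst_mol_pct": (0.5, 5.0), "Conc_mol_L": (0.1, 1.0),
    # ... the remaining factors of the 10-factor study would follow here
}
lows, highs = zip(*factor_limits.values())

sampler = qmc.LatinHypercube(d=len(factor_limits), seed=1)
runs = pd.DataFrame(qmc.scale(sampler.random(n=30), lows, highs),
                    columns=list(factor_limits))          # 30 runs

def sampling_times(duration=185.0, n_samples=10, min_gap=5.0, seed=None):
    """One pseudo-random sampling time per window, with a minimum gap."""
    rng = np.random.default_rng(seed)
    window = duration / n_samples                 # 18.5 minutes per window
    times = []
    for i in range(n_samples):
        lo = i * window
        if times:                                 # spacing constraint
            lo = max(lo, times[-1] + min_gap)
        times.append(rng.uniform(lo, (i + 1) * window))
    return np.round(times, 1)

print(runs.head())
print(sampling_times(seed=7))
```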

Afterwards, both designs are combined, and the single runs are stacked by means of their sampling times, like Robert already showed. Because this is very tedious and click-intensive work, we created a JMP app that takes over all these steps. It allows you to enter factors like temperature, add further factors, set the low and high factor limits, specify the design for the parameters of interest, and specify the number of time points, the number of runs, and the minimum and maximum reaction time, and all these steps are then done in the background.

We end up with two tables, one which includes the runs and their conditions, and one that spreads out every run along its sampling times. Both can now be used to conduct this DOE on the robotic platform that I just showed you. The results are then filled into the tables, and the time points suggested or defined previously are replaced with the actual time points, which can vary slightly for technical reasons during conduction of the reactions.

The same is true for the reaction temperature. In the end, those tables are filled completely with all the results, including conversion of starting materials to product and formation of side products.
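As a hedged sketch of what that combination step amounts to, the following crosses each run of a factor design with its own sampling times to produce the stacked table that is later filled with results; it reuses the hypothetical helpers from the sketch above and is not the JMP app itself.

```python
# Hedged sketch of what the JMP app does in the background: cross each run of
# the factor design with its own sampling times to obtain the stacked table
# that the platform executes and later fills with results (the response column
# starts empty). `runs` and `sampling_times` refer to the placeholder sketch above.
import pandas as pd

def stack_design(runs: pd.DataFrame, times_per_run: list) -> pd.DataFrame:
    rows = []
    for i, (_, factors) in enumerate(runs.iterrows()):
        for t in times_per_run[i]:
            rows.append({"Run": i + 1, **factors.to_dict(),
                         "Time_min": float(t), "Yield_pct": None})
    return pd.DataFrame(rows)

# e.g.: stacked = stack_design(runs, [sampling_times(seed=i) for i in range(len(runs))])
```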

We can now go to the regression of this whole process to obtain models that describe the underlying process sufficiently, and prediction formulas which allow us to do a Design Space Analysis. I would like to show this on the example of a Suzuki reaction, which is part of one of BI's current development projects and is a very common reaction type in pharmaceutical chemistry processes.

This reaction involves the catalytic transformation of two starting materials with a palladium catalyst to product. In this case, one side product was formed from one starting material, and under those catalytic conditions, two further side products were formed from both starting materials.

The Dynamic DOE approach was applied to screen for optimized reaction conditions and to investigate the process robustness. As shown previously, 10 factors were investigated within 30 runs, and 10 samples per reaction were taken, which were analyzed via HPLC, and the respective area percents were used for the analysis. These are the factors and their limits, like I already showed in the JMP window.

After setting up this whole Dynamic DOE, conducting the reactions, and analyzing the experiments, we end up with a data table that looks like this.

We have the factors, we have all the time points, and we also have the final responses and the analytical data. With these, we can do a Gaussian Process regression, which allows us to investigate the model quality as well as study factor significance, but very importantly, study the influence of the single factors on the responses as well as their interactions.

We can see that, of course, time has a very significant impact, as well as temperature, both on conversion and on impurity formation. By defining desirabilities, we want to maximize product formation and obviously minimize side product and starting material occurrence, and we can optimize the reaction conditions. This leads to optimized conditions, where we can see that we have a clearly favored time point and reaction temperature.
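In JMP this desirability optimization happens interactively in the Prediction Profiler. Purely to illustrate the underlying idea, here is a sketch that combines per-response desirabilities into one score and searches for its maximum; the prediction formulas and limits are placeholders, not the project's models.

```python
# Illustration of the desirability idea only (in practice this happens in JMP's
# Prediction Profiler): per-response desirabilities are combined into one score
# and the factor space is searched for its maximum. The predict_* functions and
# all numbers are placeholders, not the project's prediction formulas.
import numpy as np
from scipy.optimize import differential_evolution

def d_maximize(y, lo, hi):        # 0 at/below lo, 1 at/above hi
    return float(np.clip((y - lo) / (hi - lo), 0.0, 1.0))

def d_minimize(y, lo, hi):        # 1 at/below lo, 0 at/above hi
    return 1.0 - d_maximize(y, lo, hi)

def predict_product(x):           # placeholder prediction formulas
    temp, time = x
    return 100.0 * (1.0 - np.exp(-0.00035 * temp * time))

def predict_impurity(x):
    temp, time = x
    return 0.00003 * temp ** 1.5 * time

def neg_overall_desirability(x):
    d1 = d_maximize(predict_product(x), 80.0, 100.0)   # maximize product
    d2 = d_minimize(predict_impurity(x), 0.5, 3.0)     # minimize impurity
    return -np.sqrt(d1 * d2)      # negative because the optimizer minimizes

best = differential_evolution(neg_overall_desirability,
                              bounds=[(20, 80), (0, 185)], seed=3)
print(best.x, -best.fun)
```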

From the prediction formulas we obtained from the regression, we can then create a second profiler, which allows us to do a Design Space Analysis. That helps us to identify factor limits in which the underlying process would furnish in-spec results. For this, we first need to specify specification limits for all responses. After creating a random table connected to this Design Space Profiler, we get a plot that shows us in which factor ranges the process furnishes in-spec results.

In this case, 80% of the results would be in spec, and we can see that there are factors that can be varied over a very broad range, like potassium carbonate, isopropanol volume, the temperature ramp in this case, or the ligand. On the other hand, there are factors that need to be kept in a very specific range to obtain in-spec results.

Of course, one of these is time, because the conversion only becomes high enough after a specific reaction time. Also, temperature seems to be very sensitive in this case. This allows us to reliably optimize the reaction conditions and investigate process robustness within a very short time frame, conducting only 30 experiments to get sufficient information about 10 parameters, which then allows us to make reliable statements about the underlying process.
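The random-table step behind the Design Space Profiler can be pictured as a simple Monte Carlo check. Here is a minimal sketch with the same placeholder formulas and assumed specification limits; none of the numbers come from the project.

```python
# Minimal Monte Carlo picture of the Design Space Profiler's random-table idea:
# sample random factor settings, evaluate the prediction formulas, and check
# the specification limits. Formulas, ranges, and limits are the same
# placeholders as above, not numbers from the project.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
temp = rng.uniform(20, 80, n)        # °C
time = rng.uniform(0, 185, n)        # minutes

product  = 100.0 * (1.0 - np.exp(-0.00035 * temp * time))
impurity = 0.00003 * temp ** 1.5 * time

in_spec = (product >= 80.0) & (impurity <= 3.0)   # assumed spec limits
print(f"{in_spec.mean():.0%} of simulated settings are in spec")
```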

With that, I will summarize. We showed you the method screening based on in silico data generated from chemical rate laws. This in silico data was used to evaluate combinations of different design types and regression methods, which ultimately led to the result that the best performing combination is Gaussian Processes with Space Filling Designs.

This approach was implemented as the standard method for DOE based optimization of chemical reactions in chemical development at BI.

For the experimental conduction of this statistical approach, autonomous robotic systems are used, and the approach was demonstrated on one reaction from one of BI's current development projects to be an efficient and very accurate method for reaction optimization and process robustness screening.

With that, we'd like to thank you for your attention and for the possibility to present these results here.