Hello there. My name is Benjamin Deadman. I am the Facility Manager for the Center for Rapid Online Analysis of Reactions at Imperial College London. I'm going to be talking to you today about, well, on the abstract we say the DoE Role Playing Game, but it's actually a more general talk about our journey into teaching design of experiments at Imperial College London, and in chemistry in particular.
I'd like to take a moment here to say there is a co-author on this presentation, Dr. Volker Kraft of the JMP Academic Program. However, I've been a bit naughty and Volker hasn't actually seen these slides in their completed form yet. While I want to give due credit to Volker for taking me along on this journey and teaching me all I know about design of experiments, any missteps or wrong information today are entirely on me, so please excuse me for that. But I do want to give Volker credit for being central to the story I'm telling you about today.
An overview of the talk. If you'll give me the time, I'll tell you a little bit about why we're interested in design of experiments, starting with the state of chemical synthesis, experimentation, and data today.
After setting that scene, I'll then give my view on why chemists really should be learning about design of experiments and why it's such a shame they don't. We'll then talk about our journey into teaching design of experiments, how we go about it, how it's evolved over the last few years, and then the final part of the talk will be around this DoE Role Playing Game. It's the latest iteration of our teaching here and some of the lessons we've learned running this over a couple of years.
The State of Chemical Synthesis Experimentation and Data. If you look at chemical synthesis in the 21st century, we have some amazing instruments: fancy nuclear magnetic resonance spectroscopy tools, fully automated, generating amazing data; benchtop LC systems, with so much automation on the analysis side. We're also now getting online databases of basically everything that everyone has done and reported previously. There's so much automation, so much data out there, but underneath it all, the reactions are still handmade. In the chemistry lab we still have students and postdoctoral researchers working away in this organized chaos, a combination of low tech and high tech, doing chemical experiments.
That comes with problems. Even though we've got these amazing analytical tools now, our experimental processes have been very slow to keep up. I've got five exhibits to show you about some of the problems in the kind of data and the kind of experiments that chemists are doing.
The first one is what I like to call sparse data sets. This is a type of experiment that a chemist will do. If they're developing a new chemical reaction, we will say, "Here's our generic reaction." We take partner A and we react with partner B to make a new material, a new chemical.
If you've developed a reaction to do this general method, to demonstrate its utility you're going to go and do something called a substrate scope. To do your substrate scope, you fix one of those partners. In this case, let's say we're fixing one of the boronic acids, and we go and react that with a range of the other partner, exploring in a single dimension. Because you want to publish it, you need to have a lot of things that work, and you can have some things that kind of work.
Once you've done that, you explore in the other direction: you change the other partner while keeping the first one fixed. It's pretty much a one factor at a time approach. Based on this, we get our substrate scope. By exploring these two lines through the substrate scope space, it's implied that anything else in this space also works, because we've tested the individual combinations. It's implied, but not necessarily true all the time.
The other major problem with the data, the sin of the synthetic chemist, and I can say that because I've done this myself: because we're trying to publish the data, you have to have stuff that works. You can't publish a reaction if you can't actually make much with it. What tends to happen is we only talk about the positive results, the things that have worked, and we don't publish what doesn't work. A lot of the negative data, the things that didn't work, just gets left out of the papers. That has really biased the data we've got to look at historically.
The other problem, I'd say, is that chemistry is very observer-dependent. I talk about this slightly differently for different audiences. For this audience, what I'd say is that if we run a chemical experiment, a reaction, there are typically certain times associated with it. I like to say we've got three types of reaction times: five minutes, one hour, and then 15 or 16 hours, something like that. This really corresponds to what the chemist is doing in their life.
There's the five-minute reaction: that's enough time for them to go make a cup of tea. If it's one hour, they've gone and had lunch. If it's 16 hours, what that really means is the chemist has set it up, gone home, and come back the next day to do something with it. There's no acknowledgement of the fact that these are dynamic processes, and just because you observe it at 16 hours doesn't mean that's how long it takes.
This might seem inconsequential, but the problem is that the data out there in the literature is defined by these problems. That is quite a limitation. It can be problematic because we don't treat reaction time seriously as a factor; it's not been something that's recorded well. Really, I like to say that these reported reaction times have more to do with the life of the chemist than the actual reaction.
Our next exhibit is reproducibility. Chemistry is not as bad as some other subjects, but we do have problems where someone develops some new chemistry, a new reaction, they publish it, and then someone else tries to reproduce it and it just doesn't work. When you're in the field, you can see why. Our procedures, and this is an example procedure, there's nothing wrong with it, it's just a fair example of how it's done.
There are a lot of little details in there, but there's a lot of fuzziness as well. Things like this flash column chromatography, a purification method, which is just defined as 20-60% of a solvent mix with some other added solvent. There's no mention of the flow rates used, the time this is done over, or the volumes used. You have to use your best judgment as a chemist to reproduce it. That is a problem, and it contributes to the reproducibility problem with chemical data.
Then the final exhibit, which I think is the one I'll be talking about more today, is one factor at a time experiments. Chemists love them. I say chemists love them; really, chemists don't realize there's anything different. This is how synthetic chemistry is done. Once you know otherwise, you look back on it and go, "Why are we still doing that?" Here's an example paper from the Journal of the American Chemical Society. It's actually a very nice paper, there's nothing wrong with the paper itself, and it is a prime example of just how these optimization studies are done. This is just the done thing.
They've got a chemical reaction, and you can see over and underneath our reaction arrow here, there's a whole lot of different things which are going into that reaction. There's a lot of different factors at play here. There's quite a lot of work that goes into optimizing something like this, like months and months of student or researcher time that will go into optimizing this particular system.
They use a one factor at a time approach. I guess everyone here today is probably well aware of what that means. If we look at these two charts here, they're optimizing two factors. One of them is the equivalents of this reagent here, this base. The other factor they're optimizing is the temperature. They use 15 experiments to optimize these two, progressively going through testing different quantities of this reagent, and they find an optimum here around 0.8 equivalents. That's their first factor tested, and then they go and test the second factor, doing another series of experiments to find the optimum temperature.
This approach, everyone here probably realizes, completely ignores the presence of factor interactions, which is a problem. We'll come back to the slide later on and talk more about that. But this approach here, it's the way it's done in chemistry, but there are limitations associated with these OFAT approaches. If anyone is quite new to this and you're wondering what I'm talking about with OFAT, you might also hear about it called one variable at a time.
With that scene set: over the last 10 years or so, there's been a lot of activity in the UK around trying to solve some of these problems. Something quite important to my career has been the Dial-a-Molecule network. This was formed in 2010 and comprises over 600 members from both academia and industry, with a lot of industrial involvement. The real drive of this network was to transform synthesis into a data-driven discipline. The big ambition was, "Can we get to a future in 10 or 20 years where you can predict a way to make a molecule, send those instructions to a robot, and have that do all the synthesis?" Really future thinking.
It's done some amazing stuff. One of the things that came out of the Dial-a-Molecule network was the facility I manage, the Center for Rapid Online Analysis of Reactions at Imperial College London. We started in 2018, when we first got our foot in the door and started putting together the equipment. The idea is we provide in one location the combination of advanced equipment and a supporting team of experts to help chemists, not just from Imperial College but from around the UK, other universities, small companies, large companies, and elsewhere in the world, to come into our lab, gain access to the equipment, and do what I like to call data-led studies of chemical processes.
My team are there. We've done all the hard work, learning how to run the machines, so we can support you on what interests you. We don't have our own research agenda, we're here to enable you to do really good science.
In addition to that, we also provide, obviously, with all this data, you need good software to analyze it. We're here today talking about JMP, and we use JMP routinely for all our design of experiments but also our analysis. We also do a lot of training, which is what I'm also talking about here today.
The general things we cover in the facility: we do something called high throughput experimentation. It's quite an enabling technology in chemical synthesis at the moment, quite an up-and-coming field. The idea is that instead of a chemist traditionally doing two or three experiments a day in glass flasks and things, quite a manual process, we change that up: we scale everything down to a really small scale, do hundreds of experiments in parallel, and use robots to set up every part of that process. This allows us to cover a much broader design space, parameter space, which is really cool.
If we're just doing one experiment, then I'd say we do that properly: we recognize that chemical processes are dynamic, things are changing. We use in situ analysis tools to monitor how the chemicals in the system are changing over time, how the reactions are progressing. We can build detailed models of these dynamic processes. If you're in chemical manufacturing, you can take these models and use them to scale up your process from something small in the lab to industrial-scale manufacturing reactors.
We do a lot of automation. I talked about the problem with a lot of our methods being non-reproducible. We're actively supporting users using automated reactors. The benefit of that is that the recipe, the way you run it, is written in computer code, so it becomes very reproducible. You basically create a recipe, and if someone else is running a similar reactor, it will be done in the same way. It's much better at recording the information about how it's actually done. Then finally, we support all of this with our design of experiments expertise and our analysis expertise, to really help students do the right experiment first time and learn lots from their data.
Of course, it would also be remiss of me not to mention at this point the REACT CDT. This is an initiative at Imperial College London, a training program for doctoral students, and they come through and learn not just chemistry but also engineering and data science. This is the new way of teaching synthetic chemists, and it's really cool what they're doing. Since 2019, 47 students have come through that program. Why is it relevant here today? This CDT has been core to adopting design of experiments within chemistry at Imperial College London. Every year we deliver a workshop for this group, and they've come along with us on the journey.
Why do I think chemists should learn about design of experiments? We go back to this slide here; I talked about the problems with it. One factor at a time is not great. We're ignoring our factor interactions; we're ignoring the fact that you can't study multiple factors independently if there are interactions at play. You just end up going round and round in circles. If anyone's been caught in these loops before in a one factor at a time approach, it's really frustrating.
You optimize something, you think you've found the right setting. You go and optimize something else, you find another setting, and then you realize that you have to go back and re-optimize the first factor. You just go round in circles. That's why it's problematic. I like to show the chemists an example of reaction temperature and reaction time, because as a chemist, this is something that fundamentally makes sense: these two factors are intrinsically linked. We have a process, and if you optimize in a one factor at a time approach, the idea is, first you optimize temperature. You test a bunch of different temperatures with the time fixed and you find a maximum, and then you go and do the same with time.
What you're effectively doing is traversing your space in two lines to find an optimum. What you completely ignore is everything that happens out here. You've got a very pigeonholed view of what's happening in your space. These two factors are intrinsically linked. In complex chemical processes, if you heat your reaction up to too high a temperature, you start getting decompositions and things happening, so you need to shorten your reaction time to find the optimum.
Likewise, if you decrease your temperature, the whole process slows down, so you need a longer reaction time to find your optimum. As a chemist, you can intrinsically understand that there is a factor interaction here. How is this different if we're doing design of experiments? We might do something like a three by three full factorial, in which case you do nine experiments. It's a very similar number of experiments, but in this case it's allowed us to build a response surface model covering the whole design space. There are no longer these gaps; we can predict what's happening in these spaces because of the way we've built the design. That's very powerful for understanding your system.
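To make the contrast concrete, here's a small sketch in Python rather than JMP, using an invented quadratic yield surface with a temperature-time interaction (none of these coefficients come from a real reaction): the OFAT search along two lines gets stuck below the true optimum, while the nine-run three by three factorial supports a quadratic response surface fit, interaction term included, that can predict anywhere in the space.

```python
import itertools
import numpy as np

# Hypothetical "true" response (not the talk's data): a yield surface with a
# temperature x time interaction and fall-off at the extremes.
def yield_pct(temp, time):
    return (80 - 0.02 * (temp - 100) ** 2 - 2 * (time - 8) ** 2
            - 0.2 * (temp - 100) * (time - 8))

# One factor at a time: optimize temperature at a fixed time of 4 h, then
# optimize time at that "best" temperature. Two lines through the space.
temps = [60, 80, 100, 120, 140]
best_temp = max(temps, key=lambda T: yield_pct(T, 4))
times = [2, 5, 8, 11, 14]
best_time = max(times, key=lambda t: yield_pct(best_temp, t))
ofat_best = yield_pct(best_temp, best_time)   # stuck at 72%, below the true 80%

# 3x3 full factorial (9 runs) plus a quadratic response surface fit, with the
# interaction term included, covering the whole design space.
runs = list(itertools.product([60, 100, 140], [2, 8, 14]))
X = np.array([[1, T, t, T * t, T ** 2, t ** 2] for T, t in runs], float)
y = np.array([yield_pct(T, t) for T, t in runs])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

def predict(T, t):
    """Fitted model: predicts at any point, not just on the two OFAT lines."""
    return np.array([1, T, t, T ** 2, t ** 2, T * t][:0] if False else
                    [1, T, t, T * t, T ** 2, t ** 2]) @ coef
```

Because the interaction shifts where the best time sits at each temperature, the OFAT path ends at (120 °C, 8 h) with 72% yield, while the fitted surface correctly places the optimum at (100 °C, 8 h) and 80%.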
I'd like to talk about design of experiments in the context of the experimentation cycle, and how that differs between DoE and OFAT. In the standard cycle, you have an idea, design an experiment, do the experiment, measure, analyze, and hey, you've learned something. You can go back, make a new hypothesis, and go round the circle again. When we're doing an OFAT approach, what you're effectively doing is that for one factor, then for the next factor, then the next factor, and so on.
You go round and round and round. DoE has some similarities in that it's still an iterative process. But in design of experiments, our early experiments are really something called a screening experiment: we're trying to test a lot of factors at once to work out which ones really matter. A screening experiment is not going to tell you what the optimum time is, but it is going to tell you whether time is something you should actually focus effort on. Then you do your follow-up round. Here we go: in this particular example I've crossed out, grayed out, concentration and stoichiometry, having determined that these weren't actually that important.
In your follow-up experiment, you're going to focus more experimental effort on those factors which actually had an impact on your response. You might also start to investigate those factor interactions we talked about earlier. The benefit of taking a DoE approach is that we're able to screen more factors earlier in the process, and we can do it more efficiently. I can't really show it here today, but we do this more efficiently than if you screened lots and lots of factors using an OFAT approach.
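By way of illustration only (our workshops do this in JMP, and the factor assignments here are invented), the screening step can be sketched like this: an eight-run fractional factorial estimates four main effects at once and flags the two active factors worth carrying into the follow-up round.

```python
import itertools
import numpy as np

# Eight-run 2^(4-1) fractional factorial in coded -1/+1 units: three base
# columns (say temperature, time, concentration) and a fourth, stoichiometry,
# aliased as D = ABC.
base = np.array(list(itertools.product([-1, 1], repeat=3)))
design = np.hstack([base, (base[:, 0] * base[:, 1] * base[:, 2])[:, None]])

# Hypothetical hidden truth: only the first two factors actually matter.
rng = np.random.default_rng(1)
true_effects = np.array([6.0, 4.0, 0.0, 0.0])
y = 50 + design @ (true_effects / 2) + rng.normal(0, 0.5, len(design))

# Main effect of each factor: mean response at +1 minus mean at -1.
effects = [y[design[:, j] == 1].mean() - y[design[:, j] == -1].mean()
           for j in range(4)]
ranked = sorted(range(4), key=lambda j: -abs(effects[j]))
# ranked[:2] recovers the two active factors; the other two estimated
# effects are just noise, so they get dropped before the follow-up design.
```

Eight runs screen four factors; screening the same four OFAT-style, one at a time, would spend far more runs and still miss the aliasing and interaction structure.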
Importantly, you can really start to focus in on those factors which are most important, and focus your experimental effort on optimizing those. My seven reasons why... If the only thing you take away from this talk as a chemist is these seven reasons, that would be great.
Why chemists need DoE. Factor interactions are present; they do exist in chemistry and we should be looking out for them. If you're doing anything with continuous factors, and that's a bit of a tricky one in chemistry sometimes, but if you're optimizing continuous factors, then once you've seen what you can do using a design of experiments approach, you're going to be so horrified at how much time you wasted doing it the old-fashioned [inaudible 00:20:10] way. It can be very efficient to optimize these experiments in a DoE-led way.
You get to focus your experiments on those factors which are most important, which actually affect your system. I like the fact as well, it also helps protect you against outliers and nuisance factors. There's a lot of things we don't think about using those traditional methods which are built into design of experiments. As you learn about DoE with me on our courses, you become aware of these things and you can start to build them into your designs.
Again, similar thing for uncertainties. Chemists are rubbish. Well, synthetic chemists are rubbish at uncertainties. They're not really that numerical about it. DoE handles that for them. It forces you to have a measure of the uncertainty in your response. Whereas a lot of chemists just ignore that and just... Yeah, it's not great.
Maybe a really positive selling point for chemists is that the models you get out of it, they look really cool in your thesis. They really boost your publications. If you can show you've got a model that allows you to predict and understand the system, that really elevates your work. That could be a real positive reason for taking this approach.
Then the final one, and this is a bit more of an advanced technique, but when the students get it, it's really cool to see: I like to use design of experiments as a crystal ball to look into my experiment. Before I actually do the experiment, I can look into it and understand what it's going to be able to tell me. I can run thought experiments: if my RMSE, my error level, is about this, what do I expect to see? Do I have a realistic chance of seeing this factor or not? I can then modify the experiment to ensure that I've got sufficient power, so that I don't have to go back and do more tests just because it turned out that something's not active. You can make an educated guess going into the experiment about whether it's going to tell you what you need to know or whether you need a different design. Once you get a handle on that, it's really useful.
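That crystal-ball reasoning can be sketched as a little Monte Carlo, assuming a hypothetical two-level factor in an eight-run design, an assumed RMSE, and a rough normal cut-off in place of a proper t or F test (JMP's design evaluation tools do this properly; this is just the idea):

```python
import numpy as np

# Monte Carlo power estimate: given an assumed RMSE and a candidate effect
# size, how often would this comparison actually detect the effect?
def power(effect, rmse, n_runs=8, n_sim=20_000, seed=0):
    rng = np.random.default_rng(seed)
    half = n_runs // 2
    hits = 0
    for _ in range(n_sim):
        lo = rng.normal(0.0, rmse, half)      # runs with the factor at -1
        hi = rng.normal(effect, rmse, half)   # runs with the factor at +1
        se = np.sqrt(lo.var(ddof=1) / half + hi.var(ddof=1) / half)
        # Rough normal cut-off at ~5%; good enough for a what-if sketch.
        hits += abs(hi.mean() - lo.mean()) > 1.96 * se
    return hits / n_sim
```

With an RMSE of 5, an effect of 6 units is a coin-flip detection at best, while an effect of 15 is detected almost every time, which is exactly the kind of "do I have a realistic chance of seeing this factor" question you want answered before spending reagents.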
Teaching design of experiments at Imperial College London. I guess when we started this design of experiments training, it was not covered in undergraduate courses. I think it's probably still the case. If you go over to statistics or mathematics, it's something that might be done there. But in chemistry, no, we don't teach our undergraduate students design of experiments. I guess the last couple of years, we've had one lecture as a bit of a guest lecture from industry. Here's a technique you might like to know about, but we're not quite there yet. We'll keep pushing on that one. Hopefully in the future, that'll be something which is built into more of these courses because this is something they should learn about.
But DoE is considered an essential skill by industrial chemists, and there's been a growing movement in the UK towards training up our postgraduate research students with DoE skills. An example would be the Dial-a-Molecule network, which pre-COVID was running summer schools every year at Loughborough University.
Part of that summer school was using design of experiments to do an actual experiment. It's very cool, the work that we're doing. These dots here correspond to universities and chemistry departments around the UK which are really coming along with us on this journey towards data-led chemistry, with design of experiments becoming part of their courses as well.
But here at ROAR, where did we start? Back in 2017, my boss, Professor Mimi [inaudible 00:24:13], was putting together the case to build the REACT CDT. As part of that, she went out and consulted a number of different UK chemical companies to find out what taught components should be included in a future-proof postgraduate training course for chemists.
Interestingly, 72% of the respondents said that design of experiments was an essential training component for chemists, and that was the highest rating they could give. That's a really strong signal coming from our employers that design of experiments is something we should be teaching.
On the back of that, JMP were very early supporters of the REACT CDT and the ROAR facility. We've been very grateful for their involvement, and that's really helped us get to where we are today. In 2018, we ran our first workshops with JMP. These were two one-day in-person workshops, both for chemistry graduate students, so we offered them to our department first, but we also opened them up to other Imperial College departments. We had visitors coming over from aeronautics and a range of different places, actually. It was quite surprising to see who else was interested in DoE.
A big thank you to Volker, who has been with us from the start of this journey and has really helped us along the way. Building on from that early start, in 2019 we did a couple more workshops, me, Volker, and the rest of the JMP team, and we opened them out further. We did one for our local students, and we also did another which we opened out to the Dial-a-Molecule network, and that was quite well received.
Ever since we did that, we've had more and more requests for when the next one is going to be. Unfortunately, that was 2019, and by the time we got organized to run another one, the world had changed, as we all know. But anyway, in early 2020, just before the world changed, we did modify our workshops. We changed from a one-day workshop to a two-day workshop, and that was the start of this doctoral training program.
We realized at that point that we were having trouble fitting the level of content we thought the students deserved into one day. We expanded to two days to give us a bit more time to include a bit of theory alongside the more hands-on part of the workshop. That was great. We had a great little two-day session, and the students did some wonderful stuff.
Unfortunately, I forgot to include some photos here, but... In 2020, the world changed, COVID hit, and everything went online. We turned our two-day workshop into a virtual session, three three-hour sessions, for the following years. Where we had people coming to us asking when the next open workshop would be, we've been encouraging them to use the Statistical Thinking for Industrial Problem Solving online course offered by JMP. I'd say it's a very good course; I went and did it myself over COVID to boost my own skill levels. It is a really good introduction to the subject. If anyone is looking for how to get started, that's probably where I'd send you.
Okay, moving on. In 2022, this is when I feel like I found my own feet in teaching design of experiments. We developed a postgraduate taught module in design of experiments for our new digital chemistry MSc degree here at Imperial College London. This one is taught independently by me, with a few guest speakers. That's been great. We're in the second year of that now, and we're really improving the content of our teaching materials as we go.
Then most recently, three or four weeks ago, we had Volker back over at Imperial College London for our first in-person workshop since COVID. That was for the CDT, and it was great to get the students back in a room and have them working in small groups again, because there's a level of interaction that is just really hard to replicate in those online sessions.
I would say that since 2018, we've managed to introduce design of experiments to about 200 scientists and engineers. Most of them are PhD students, with a fair number of postdocs, but also some industrial scientists who came along to our externally available workshops. We hope to double or triple those numbers in the coming couple of years as we look to open up again now that COVID's passed.
What do our workshops look like? Here's an example from our recent two-day workshop. We always tweak these a little every year, trying to find the optimum; this is the most recent one we ran. On day one, we do a bit of an introduction about why they should know about design of experiments, and we teach them about analysis of variance. The idea is that as they come in, we don't expect any existing knowledge of statistics; we really teach them how to use it and try to give them enough background that they can see where it comes from. Same with linear regression. We do a bit of factorial experimentation and screening experiments, with a lot of hands-on exercises as well.
Again, we also spend quite a bit of time that first day talking about evaluating designs. This is probably the most advanced concept we handle, and it's probably the trickiest one for them to get, but when they get it, it's really quite powerful. If they can understand the power of their experiments before they do them, that's a very useful tool. Then we end the day with definitive screening designs; there are some great exercises JMP provided us with for that.
On day two, we get more into response surface modeling using custom design in JMP, or optimal design if we want to be a bit more vendor-neutral. We teach them about design augmentation and building robust designs. That's the morning session, and then we leave the afternoon a bit more flexible, depending on how time is going. We're either catching up on things we've not quite covered yet, or, if there's time, we like to do a bit of signposting of more advanced techniques.
It's simply not possible to cover all of these well in a two-day workshop. We make sure they're aware these techniques are out there, so that when they go into their individual research projects, if they see an opportunity to use something, they know the terminology and can go and read a bit more about it, or come and talk to me in our facility and we'll help them out.
It's really about signposting and making sure they've got the right vocabulary so they know what they're looking for. We also include quite a good discussion about tips and tricks, where to get started, and extra reading. Throughout, this is all delivered as a combination of presentations, demos of the software, and, most importantly, the thing that really cements it: hands-on exercises, doing problems with the JMP software.
JMP has been fundamental in helping us build up our workshops and our teaching modules over the years. In the early days, they developed a lot of the initial discipline-relevant exercises, and then I've been able to learn from them and build my own exercises as I go on. Same with the slide decks. A lot of my slides now are fully customized, but you'll still see the occasional slide that Volker provided me with; it's still there in the slide deck, still getting reused all these years later. But we're continually developing these things.
It was really helpful to us that JMP provided trainers to support the workshop delivery. In the early days, they really did take a train-the-trainer approach. Right at the start, Volker, Ian Cox, and Robert Anderson would come along and they would run the workshop. I was just there hovering around in the background, making sure the students had access to the software, learning what I could at the same time.
As we've gone on through the years, I've been able to gain confidence by co-leading workshops with them, and now we're at the stage where I'm still very grateful to have Volker come in and co-lead the workshop with me, but I'm also quite happy to go on and do my own thing as well. That's been great, being able to go on that journey. I really do need to thank not just Volker but Ian Cox as well; I think Ian was there for our very first workshop, as was Robert Anderson. Hadley Myers has also stepped in at times to help us. A big thank you to all of these JMP people for their support over the years.
The DoE Role Playing Game, I guess, is what you're here for today. Where did this come from? In our early workshops, we had a Heck reaction as an interactive exercise. As far as I know, it was developed by Ian Cox, and it's all written in JMP script. The idea is you've got this reaction here, and you send it to the students and say, "Right, these are your starting settings. Here are your responses. Here are your factors. Here's the range you're allowed to test them over. Design an experiment, and this exercise will generate the data for you for the design you've chosen."
It's a really useful tool with a guided workflow. The students go through and select which factors they want to test, and they can specify limits. One of the limitations is that I think it only works with custom design. They then build the design, and it uses a model under the hood; it's just running a mathematical model. I've [inaudible 00:35:14] a lot of these values here because in the past I was using the same model for some of my teaching and didn't want the students to know what the actual factor parameters were. Also, in case anyone else has got access to this, I'm not giving away the secret.
But this is what's happening under the hood: it's basically using a mathematical model, plugging in your factor settings, and calculating the responses for you. It generates a table with the calculated responses. As a tool, that's really useful for teaching chemists, because they can play with it. They can do different designs, see different data sets, and do the analysis. It's very hands-on.
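For anyone curious, that mechanism can be sketched in a few lines of Python rather than JSL, and every coefficient below is made up, so no secrets about the actual exercise are given away:

```python
import numpy as np

# A minimal stand-in for the exercise's hidden response generator.
rng = np.random.default_rng(42)

def simulate_yield(temp_c, time_h, cat_molpct):
    """Students never see this: a quadratic model plus experimental noise."""
    t = (temp_c - 100) / 20            # coded factor values
    h = (time_h - 8) / 4
    c = (cat_molpct - 2.5) / 1.5
    mean = 65 + 8 * t + 5 * h + 3 * c - 4 * t ** 2 - 2 * t * h
    return float(np.clip(mean + rng.normal(0, 1.0), 0, 100))

# The student submits a design table; the simulator fills in the response
# column, and they go off and analyze it as if it were real lab data.
design = [(80, 4, 1.0), (120, 4, 4.0), (80, 12, 4.0), (120, 12, 1.0)]
table = [(T, t, c, simulate_yield(T, t, c)) for T, t, c in design]
```

The point of hiding the model is that different designs expose different parts of it, so students genuinely experience the consequences of their design choices rather than reverse-engineering the answer.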
When I saw this, it was amazing. It was like, right, this is a really useful tool for teaching design of experiments. This is the way it should be done. Not just, "Here's the data set, go and analyze it." But going through that whole process from design to get the data to analyze. It's a very nice platform. But there were some limitations.
Running it over a few years, we generally found that the setup of the exercise could be prone to installation errors. I guess installing software on 30 individual student-owned computers is always going to be problematic. You've got different operating systems, different levels of following instructions, and there are always problems. But the most problems we had were trying to get this particular exercise going. It was an amazing piece of work, but it's not fundamentally part of the JMP software. It's something that's been hacked together in the scripting language, and so it's very difficult to support in a short workshop session.
The danger was as well, if it didn't work for a particular student, they became a bit disengaged from the process and you lost them. That was one of the problems. The other problem, and really two problems came together here, is that as we went on, I wanted to start tweaking the model. We had to be very careful because we've got multiple responses in this model. I guess I didn't spend a lot of time showing it to you, but we talk about a product, a by-product, and remaining [inaudible 00:37:37] material.
It was possible to get results that didn't make sense. Basically, we could break the mass conservation laws and end up with a situation where the amounts of material added up to more than was physically possible. This is just down to the way the responses are calculated: each has its own little model. Whereas in a chemical system, we've got one thing going down as something else comes up. It couldn't quite handle that as reliably as I'd like for a real chemistry scenario.
I guess the other one as well is I wanted to be able to start modifying this. I needed to make more examples for student assessment. I couldn't just give 15 students all the same exercise, or they'd all give me the same answer back. You want to give them something slightly different and be able to customize things a bit.
I did also think one of the other limitations was that by fixing the factors at the start of the exercise, it limited the students' creativity. We almost gave them the solution and said, "Right, here are the five factors you're allowed to play with," and left it at that. I wanted to try and get away from that. This is where my version of the Heck reaction came in. It's not quite as elegant as the JMP scripting version, but it has a bit more freedom in it. What we do is we set the problem in free text form. I give them a couple of paragraphs of text, which I'll show you a bit later, and a single data point as a starting point.
The idea is the students work in groups, and we ask them to define the problem. They've got a bit of freedom to define the problem as they want. We give them a bit of a lead and say, "The fictional client is interested in this," but we really give them a lot of freedom to find a problem that they want to investigate.
We encourage them to ask questions and then as a group, build their designs for their experiments. A lot of freedom is given. They can define their problem as they want. They can define which factors they want to test. How does this work? How can you do an exercise like that?
I guess the secret, the way you make it work, is you build the model dynamically. You start off with just an intercept: that's the single data point they're given at the start.
Then as the student group comes to you and says, "We want to do an experiment on these factors here," you quickly go away and build up your own little model: a little bit of this, a little bit of that. Make the equations, plug their design into that, and generate the data. Of course, we add uncertainty there.
That's the process. The idea is to get the students to go around this cycle a few times. Because they can't control exactly what you're giving them, there's a bit of uncertainty there. They need to treat it as though it's a real experiment and see what they can learn about the system upfront and then design sensible experiments instead of just trying to run and get to the solution as quickly as possible.
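As an illustration of that cycle, here is a minimal Python sketch of an instructor-side dynamic model: it starts as a bare intercept (the single seed data point), gains an invented effect term each time a group proposes a factor, and then returns noisy responses for whatever design they hand over. The class name, factor names, and numbers are all hypothetical; the real workshop does this ad hoc, not with this code.

```python
import random

class DynamicModel:
    """Instructor-side model built up as student groups propose factors.

    Starts as a flat intercept (the single seed data point); each time a
    group asks to study a new factor, the instructor invents an effect
    size for it. All names and numbers here are illustrative.
    """
    def __init__(self, intercept, noise_sd=2.0, seed=42):
        self.intercept = intercept
        self.effects = {}            # factor name -> linear coefficient
        self.noise_sd = noise_sd
        self.rng = random.Random(seed)

    def add_factor(self, name, coeff):
        """Register an effect for a factor the students have asked about."""
        self.effects[name] = coeff

    def run_design(self, design):
        """Generate noisy responses for a list of {factor: setting} runs."""
        results = []
        for run in design:
            y = self.intercept
            for name, setting in run.items():
                y += self.effects.get(name, 0.0) * setting
            results.append(y + self.rng.gauss(0, self.noise_sd))
        return results

model = DynamicModel(intercept=35.0)   # the single starting data point
model.add_factor("temperature", 0.2)   # invented when the group proposes it
model.add_factor("equiv_base", 8.0)
data = model.run_design([{"temperature": 60, "equiv_base": 1.5},
                         {"temperature": 90, "equiv_base": 2.5}])
```

Because the effects are only created once a group asks about a factor, the instructor keeps full control of the difficulty while the students still experience genuine uncertainty.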
How do we set the scene? This is an example from the course. We talk about our fictional company, our client, ACME Pharmaceuticals, a major manufacturer with their own R&D division. It's all just a bit of flavour text to give the student something to bounce off.
We say that they're interested in Design of Experiments. They want to test out DoE in a range of different projects, but importantly, they have no expertise in this area themselves. They're great chemists, but they don't know about DoE. You need to treat them not like idiots, but you need to take them along on the journey and explain concepts to them really well.
I guess the other thing I go to great lengths to tell the students is that the client is being a bit awkward and is saying, "Look, we're not going to give you the chemical structures." This might sound like a bonkers thing to do. Why would you build this into the problem?
The reason is I don't want the students to try and solve the problem through the literature. My fear when I set this up initially was if I give them a substrate, they're going to spend hours and hours reading the literature, trying to find previous examples and solve it that way. What I want to put the emphasis on is the design process, designing experiments, analyzing the data.
I want them to focus more on that than going out and just reading lots of papers. I took a deliberate decision to hide the structures from them. You'll see how that works in a couple of slides. We also tell them they're consultants and they can trust the lab technicians, so don't worry about the quality of the experiments. If you send over an experimental design, it's going to be run in a high-quality way. Take away those fears.
Again, we say they're enthusiastic about DoE, but they've got no prior knowledge, so they need to have things explained to them. What does the project look like? Here's an example. We say ACME Pharmaceuticals is developing anti-parasitic agents for the treatment of face huggers. If you're old enough to remember Theme Hospital, that's where that comes from.
But they've got a candidate molecule going into pre-clinical testing and need to optimize the synthesis of the final step. It's very, very wordy. I'm trying to give them something to sink their teeth into and turn into a problem, rather than dumbing it down for them.
We also give them some information about the equipment they've got. This was a bit of a hint: last year when I did it, the idea was I was trying to lead them on to thinking about blocking of some of the positions in the reaction, things like that.
They missed the hint last year, so this year I've made it a bit more explicit and said that the chemists have anecdotal evidence that the reaction results can vary between positions. I'm really trying to give them a hint that think about blocking, think about some of these slightly more advanced techniques.
We also include multiple responses. It's quite handy to have multiple responses: it allows you a bit of freedom to make the problem a bit more difficult to optimize. If you make it too easy, they just charge towards the final answer and then lose interest. You need to make it a little bit difficult to keep them trying to optimize it more.
I deliberately pick slightly complex reactions. There are a lot of factors involved and a lot of uncertainty around them. Then I also obscure the chemical structures, the idea being we're telling them these are complex structures with unusual reactivities.
Go and read the paper for some inspiration about what factors you should be looking at. But don't trust that you can look through what's been done previously and decide that these particular settings are the right ones, because these are unusual substrates you're working on. They're going to behave a bit differently. That's the way we set the scene.
We give some general tips to the students. We say you should read the literature for inspiration about the factors that could be affecting your response. Also, lean into the exercise as a bit of a game. If they go away and read about something and think, "Oh, I'm expecting that when I change this factor, it's going to have this effect on the response,"
I encourage them to tell me about that. Then it's up to me as the instructor whether I go with that, like, "Yeah, we'll build that into the model," or go the other way and do something else entirely. We bounce off each other a bit. I do go to great lengths to say, "Don't expect to find the answer in the literature. It's not a literature review exercise, it's a Design of Experiments exercise."
I really encourage them to do the empirical experimentation without the hard part of running the actual reactions, and to ask questions. We talk about the fact that the data is generated with noise, which is something I have to make explicit so that they accept a bit of fuzziness in it.
I also give them a bit of a hint: if they try to make their factor ranges too wide, it's going to break the system, and it'll just come back with "the reaction didn't run." I'll find some reason why particular data points didn't work.
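A tiny sketch of that guard, again in Python with entirely invented bounds and an invented in-range yield formula, just to show how "ranges too wide" can translate into failed runs:

```python
# Illustrative only: the safe operating windows and the toy yield formula
# are invented. Settings outside a window simply "fail," mimicking the
# "reaction didn't run" result the students get back.
SAFE_RANGES = {"temperature": (20, 150), "equiv_base": (0.5, 5.0)}

def run_reaction(settings):
    """Return a simulated yield, or None if the reaction 'didn't run'."""
    for name, value in settings.items():
        lo, hi = SAFE_RANGES.get(name, (float("-inf"), float("inf")))
        if not lo <= value <= hi:
            return None                              # reaction didn't run
    return 30.0 + 0.3 * settings["temperature"]      # toy in-range response

print(run_reaction({"temperature": 200, "equiv_base": 1.0}))  # None
print(run_reaction({"temperature": 100, "equiv_base": 1.0}))
```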
What lessons have we learned? More generally first, barriers to teaching Design of Experiments in chemistry. The first one I'd say is buy-in from supervisors. You're in a situation where your senior staff members are all used to changing one factor at a time, and you need to recognize that.
I guess you need to be very aware as well that all it takes is one bad experience with Design of Experiments and that can put them off forever. We're very aware of that and we're very careful to manage that.
Our solution, what we've been doing, is to encourage the supervisors to come on the workshops. Even in our two-day workshop, we try to make the first day almost an introduction so they can come along on the first day and learn a little bit about how it works. If they know a bit about it, then suddenly it's not so scary, and they understand a bit more about where we're coming from.
Make sure we include relevant case studies. Importantly as well, we have a lot of discussion. We make sure the students know about the limitations of Design of Experiments. I'm very quick to stress, it's not a magic bullet.
The danger is always we talk about the amazing things you can do with DoE and students go away thinking, Oh yeah, it's going to solve all my problems. Well, hang on, hang on. We stress where it works well and where it doesn't work so well.
We also have to deal with the statistical vocabulary of the students. They're chemists. They don't do statistics as part of their undergraduate degree; it's not in a standard chemistry undergraduate course. Our solution there is to develop our teaching materials in such a way that they work for people coming in at a very basic level.
We focus on a few key statistical measures, things like analysis of variance and linear regression modelling. We try to keep to relatively simple tests first and then build on from that if they choose to go further with it.
Provide lay explanations where we can. These lay explanations might not always be the best way to a statistician, but for a learner they can be quite a useful way of grasping a concept as a starting point. We highlight a lot of additional resources. It's about giving them enough vocabulary that they can then go on their own learning journey, like I've done over the last few years.
Then the final major barrier we encounter is the "How do I use this in my research?" type of question. Our new users can sometimes struggle to see how DoE can relate to their research problems. Often this comes down to...
We put a lot of emphasis on continuous factors in our workshops because they are the ones which work quite well. But we also talk about categorical factors. In chemistry, we still optimize a lot of categorical factors. We have to handle that and make sure the students know. We have that conversation with them so they see the strengths and weaknesses of the approach.
It really is about having those discussions, providing context, and also continuous support. My facility is here to support students, not just with the initial training, but we encourage them as they're doing their research projects to come back and talk to us about their Design of Experiments so we can support them with it.
If you want to think more specifically about the DoE role-playing game, what lessons have we learned? We're only in the second year of running this, so it's still a bit premature to decide if it's working or not. But my initial take is that the flexibility of building the model dynamically does bring an extra element to the course.
Basically, I can encourage the students to just explore factors and be a bit creative. That flexibility creates a bit of uncertainty for them, which encourages them to think about it a bit more and discuss it a bit more, rather than just charging on and doing a very simple design.
I get real- time feedback on what the students are picking up and learning. This is one of the things I found last year, why I've modified my course a little bit. When I ran it last year, I got a lot of classical designs.
The first designs were fractional factorials and things like that. I guess that was what I'd just taught them, so it was what they jumped in and did initially, whereas I was hoping I'd see some definitive screening designs and custom designs and something a bit more modern. I'm actively modifying the course to make sure they're learning the relative strengths and weaknesses of different approaches.
As an instructor, I like the fact that I can increase or decrease the difficulty dynamically to suit the group. When I first proposed this way of teaching the course, I had a lot of senior colleagues telling me, "Why don't we actually do the experiments? Why don't we go and use all our fancy instruments in ROAR and run actual experiments?" I pushed back against that a lot.
My concern with that is either we have to dumb things down so much to guarantee it's going to work, or we end up doing really expensive, complicated experiments and then having to hold back the data and not publish it so that we can use it in a workshop. Neither situation quite works.
This approach gives me the benefit of both. I can make things more complex as I need to, to give the students more opportunities to grow their skills. But if we're not quite getting them to where they need to be, we can rein it back a little and ensure they get data they can analyze and be assessed on. Yeah, that control over the level of uncertainty and the unknowns is really useful.
The challenge is that it's time-consuming. I'm not sure I'd have done it this way if I'd known from the start how challenging it was going to be. It requires an instructor who knows both the subject and Design of Experiments. You need to be able to build intelligent models on the fly.
It is time-consuming, although once I've got a model already, it's not too bad. You can lean on your past experience and build models quite quickly. But when you're coming to a new situation, it takes a while to figure out how to do it.
I think for the students, one of the things we found is it can be a bit of an unfamiliar format, so you do have to spend a bit of extra time, not necessarily calming them down, but reminding them that the point is not to get a good answer. The point is demonstrating your knowledge of the subject and using it as a tool to explore different options around Design of Experiments.
Wrapping up, hopefully we're okay for time. If you want to learn more about JMP, these are the further info slides we show all our students at the end of the workshop. We really encourage people to go and look at the JMP Learning Library. The one-page guides are really useful for getting to grips with new topics. The Statistical Thinking online course is really good; if you want an intro to DoE, go and do that.
Again, the Statistics Knowledge Portal: I found that really useful as well for brushing up my vocabulary on some of these basic techniques. Then the blogs, the forums, the online documentation, it's all a very useful resource for learning about the software and Design of Experiments in general. It's really helpful, really good. I'd encourage you to go and have a look.
That brings me to my own acknowledgements. If you want to learn more about ROAR, I'd encourage you to go to our website: search for "Imperial ROAR." Don't search for "ROAR" by itself or you'll get Katy Perry; add "Imperial" and you'll find us. Join our mailing list or send me an e-mail at roar@imperial.ac.uk.
Thank you to our sponsors: the UK Funding Councils; JMP, of course, who have been with us along the journey; and a bunch of other chemical companies and instrument vendors that have supported us. Then particularly at JMP, I'd like to acknowledge Volker Kraft, Ian Cox, Robert Anderson, and Hadley Myers for their support through this journey. Thank you for your time.