Design of Experiments for Complex Biochemical Systems
Cell-free expression (CFE) systems are a suite of methods that reconstitute complex cellular functions like transcription, translation, and metabolism outside the confines of a living cell. CFE systems have numerous biotechnological uses in sensing, biomanufacturing, medicine, basic research, and education. Most CFE systems are made by combining cellular lysates with a complex blend of excipients that improve activity. While the number of excipients makes exploring the combinatorial space challenging, high-throughput experimentation with acoustic liquid handling makes it feasible to optimize formulations when paired with an appropriate statistical framework.
Here we describe our use of design of experiments (DOE) to optimize excipient combinations for specific use cases of CFE. We pair our DOE with functional data analysis (FDA) to collapse activity-over-time measurements into metrics readily used for analysis. Initial formulation DOE examples range from five to 14 components. We further describe our efforts to push to higher scales, attempting mixture-process DOE designs with as many as 42 components using an experimental set-up that allows 1,536 formulations to be tested at once.
Hi, I'm Matt Lux from the US Army DEVCOM Chemical Biological Center. I'm going to tell you about some of the work we've been doing using JMP software for design of experiments on complex biological systems.
For some background context, we work on something called cell-free expression systems. In much of biotechnology, you use cells, which are great, amazing little machines, but they can be very difficult to manipulate and work with. In cell-free systems, instead of manipulating the cells, you grow up a bunch of cells, break them open, and harvest their guts. You take that molecular machinery, combine it with DNA that encodes the function you're interested in, and then use that to perform the function you want.
For sensing applications, you can put these things onto paper-based substrates and freeze-dry them, and then rehydrate them at the point of need to perform the sensing reaction you want. We've done this for a number of different applications. You can see on the right: drinking water, chem-bio threats, human performance.
We've taken this technology pretty far. We've packaged these reactions in paper-fluidic devices with plastic housings and used them in operational events with soldiers to get feedback on usability and so on. We're now starting to manufacture these with an industry partner.
There are a number of advantages to doing this, but the one I want to highlight and talk about most today is rapid experimentation. High-throughput experimentation usually uses robotics, and with typical robotics we can scale up production of these things quite well. As an aside, we used JMP software to do a DOE on the robotics parameters for the instrument shown in the video above. But when you're trying to change the different components for research and development of these reactions, that kind of system is not so good.
For some context, these reactions are extremely complex. There are three main families of recipes in use, with about 25 components each. The lysate component is the cell guts, which contain thousands of proteins and metabolites. We just don't know what many of them do, how they interact, and so on. On top of that, there are dozens of additional components that can be added. For example, looking at different cryoprotectants at just a few different concentrations, you're looking at six billion different combinations.
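To make that combinatorial explosion concrete, here is a back-of-the-envelope sketch; the component and level counts below are illustrative assumptions, not our actual recipe:

```python
# Back-of-the-envelope only: the number of candidate formulations grows
# as levels ** components. These counts are illustrative assumptions,
# not the actual recipe.
components = 16   # hypothetical number of excipients varied
levels = 4        # hypothetical concentrations tested per excipient
print(f"{levels ** components:,} combinations")   # 4,294,967,296
```

Even with generous simplifying assumptions, the count lands in the billions, which is why brute-force screening is off the table.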
People have worked on doing these experiments in liquid reactions with no paper, in plastic plates with small wells that let you add different components, and have made some progress, including us. But no one had really worked out how to do this in the presence of paper, which can change the performance of the reactions in ways we don't totally understand.
We set up a framework here to do high-throughput experimentation using JMP software, plus some experiments to reduce the search space to something more manageable. We make 384-spot paper tickets that let us run a bunch of reactions at once, use acoustic liquid handling to dispense the different reactions, and then use custom instrumentation to analyze the color change over time and extract that information.
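As a rough illustration of how a designed formulation table becomes dispensing instructions, here is a minimal sketch of writing a transfer picklist for an acoustic liquid handler; the column names, wells, and volumes are all assumptions, not our actual worklist:

```python
# Minimal sketch: acoustic liquid handlers are typically driven by a
# transfer picklist (source well, destination spot, volume). Everything
# below, including column names, wells, and volumes, is assumed.
import csv

transfers = [
    {"source_well": "A1", "dest_spot": "P001", "volume_nl": 25},
    {"source_well": "A2", "dest_spot": "P001", "volume_nl": 50},
    {"source_well": "A1", "dest_spot": "P002", "volume_nl": 75},
]

with open("picklist.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["source_well", "dest_spot", "volume_nl"])
    writer.writeheader()
    writer.writerows(transfers)
```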
What do we want to use this for? The main thing I'll talk about is where we've taken a low-cost recipe that didn't work at all on paper and then used this framework to reoptimize it to get it to work on paper without adding any significant cost.
There are two main recipes here. One is called PANOx; this is a workhorse recipe for us and many others in the field. The second, called Cytomim-Cai, is about 100-fold less expensive, a little simpler, and works in certain applications. But we found that it didn't work at all on paper. On the right, you can see trajectories of these in liquid reactions. Both Cai and PANOx work; PANOx works a little better, but Cai still works. In the inset, though, you see paper-based reactions where PANOx gives a nice, clear purple color, but the Cai reactions stay yellow; you get no signal at all.
What's going on here, and can we fix it? Our approach was to use design of experiments. In JMP, we built a mixture-amount design using a quadratic model and the KVC cross. For the first round, we looked at 16 components from the low-cost mix, just to see if adding more of some combination of them would recover activity.
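For intuition about what a mixture-amount design constrains: component proportions are non-negative and sum to a fixed total, and that mixture is crossed with a total-amount factor. The sketch below is a random space-filling stand-in under those constraints, not JMP's optimal design algorithm; run counts, bounds, and the amount range are assumptions:

```python
# Conceptual sketch of the constraint behind a mixture-amount design:
# proportions are non-negative and sum to a fixed total, and the mixture
# is crossed with a total-amount factor. Random space-filling stand-in,
# not JMP's optimal design; budget and bounds are assumed.
import numpy as np

rng = np.random.default_rng(0)
n_components = 16            # components varied in round 1 (per the talk)
n_runs = 96                  # assumed run budget

# Each row is one candidate formulation; proportions sum to 1.
proportions = rng.dirichlet(np.ones(n_components), size=n_runs)
assert np.allclose(proportions.sum(axis=1), 1.0)

# Cross the mixture with an assumed total-amount factor.
amounts = rng.uniform(0.5, 2.0, size=(n_runs, 1))
design = proportions * amounts   # actual amount of each component per run

print(design.shape)              # (96, 16) candidate formulations
```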
We found that some combinations do provide more than zero activity, at least. If you look at this heat map, the yellow color indicates increased activity, but only in a few cases. It wasn't sufficient to converge the model in JMP and predict optimal formulations. We then moved to a second round on the hypothesis that something in the PANOx mix, one or more components, was enabling it to work well on paper.
We added some of those components on top of the low-cost mix. Here we see a much better response, and we were able to converge a model and predict some optimal formulations. We tested some of these. In the bottom left, you're seeing the different trajectories; the upper-left panel is PANOx, the workhorse one we use most.
The other panels are the low-cost formulations with different combinations from PANOx added in. You can see that they work roughly as well; one or two may actually work a bit better. That's exciting, but why? Well, we don't know. The heat map at the top shows how much of each component was added: navy is nothing, and yellow is the maximum we allowed.
For each component, the optimal formulations either added nothing or the maximum; no clear indications there. The table in the middle is intentionally small because it's complicated. When we look at the two-way interactions, some stand out as significant according to the model. But even with an expert looking at these and trying to derive insight, nothing clear stands out that would explain why these formulations work, and the significant interactions aren't consistent across the different optimal formulations either.
We don't really know why. That points to higher-order interactions, beyond two-way, which is completely to be expected, because these complex systems have a lot of complex interactions in them. We may not know why, but clearly it helps performance.
We took one of the formulations, number 4, which added an insignificant amount to the cost, and put it in some of the devices I mentioned earlier. Excitingly, not only did it work, it worked even better than our default workhorse, PANOx. The blue and yellow curves are the optimized low-cost mix, and the purple and red are the standard mix. We see a pretty substantial improvement, but at 100-fold lower cost.
I don't have much time in this presentation, but I wanted to at least touch on a couple of things JMP has enabled us to do here and look under the hood a little bit. One is the ability to easily analyze residuals to check for systematic errors. Our liquid handling instrument is notoriously finicky: when it works, it's great, but when it doesn't, you don't always know it's not working, which is a big problem.
Here, in the second DOE, we saw a systematic set of failures: in the first few reactions, there was no activity. The residual plots show this, whereas the rest are nice and randomly distributed. We assumed that was an instrumentation error, excluded those runs, and proceeded, and we were able to get the nice optima I talked about.
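The check itself is simple: fit the model, then look at the residuals against run order; dispensing failures show up as a systematic block rather than random scatter. Here is a minimal sketch on synthetic data standing in for real plate-reader measurements (the model, noise scale, and cutoff are assumptions):

```python
# Sketch of the residual check: fit the model, then flag runs whose
# residuals sit far outside the random scatter. Synthetic data stand in
# for real measurements; model, scale, and cutoff are all assumptions.
import numpy as np

rng = np.random.default_rng(1)
n = 96
X = np.column_stack([np.ones(n), rng.uniform(size=(n, 3))])   # intercept + 3 factors
y = X @ np.array([3.0, 2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=n)
y[:8] = 0.0   # mimic the first few reactions failing to dispense

coef, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ coef

# Robust cut: anything far beyond the median absolute deviation is suspect.
med = np.median(residuals)
mad = np.median(np.abs(residuals - med))
suspect = np.nonzero(np.abs(residuals - med) > 5 * 1.4826 * mad)[0]
print("suspect runs:", suspect)   # the failed early dispenses stand out
```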
I also didn't really talk about how we take these trajectories and then optimize over a single parameter. There are lots of ways to do this, but we used Functional Principal Component Analysis in JMP. Very briefly, you take all these different curves and reduce each one down to a single parameter that can be used to reconstitute that curve with pretty high fidelity. We keep most of the information but can still do a single-parameter optimization.
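For intuition, here is a minimal sketch of the functional-PCA idea on synthetic kinetic curves: center the curves, take the leading principal component, and use each curve's score on that component as its single summary parameter. This mirrors conceptually what JMP's Functional Data Explorer does; the curve shapes, time grid, and sample sizes are assumptions:

```python
# Minimal sketch of the functional PCA idea on synthetic kinetic curves:
# reduce each color-change trajectory to one score on the leading
# principal component, then reconstruct the curve from that score alone.
# Curve shapes, time grid, and sample sizes are assumed for illustration.
import numpy as np

rng = np.random.default_rng(2)
t = np.linspace(0, 4, 50)                       # assumed time grid (hours)
amps = rng.uniform(0.2, 1.0, size=40)           # per-reaction amplitude
rates = rng.uniform(1.0, 3.0, size=40)          # per-reaction rate
curves = amps[:, None] / (1 + np.exp(-rates[:, None] * (t - 2)))

mean_curve = curves.mean(axis=0)
U, S, Vt = np.linalg.svd(curves - mean_curve, full_matrices=False)

scores = U[:, 0] * S[0]                         # one number per trajectory
recon = mean_curve + np.outer(scores, Vt[0])    # rebuild curves from PC1 alone

print(f"PC1 explains {S[0]**2 / (S**2).sum():.1%} of curve-to-curve variation")
```

The per-curve score is then the single response the DOE model optimizes over.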
We've also used this same framework for other things; I'll give one example here. This is another DOE, with 14 components, aimed at shelf stability. We wanted to ask, "Can you use these things if they sit on the back of a Humvee for a day, or will they die?" It turns out they do lose activity, but if you add these additives, you can recover most of that activity after that sort of treatment.
The other thing we've been working on more recently is expanding this. I mentioned the three recipes; there are 42 components in the union of those recipes. We tried to do a DOE of all 42 components to look for some other optimal region that no one has explored. It crashed the JMP software; our statistician has been working with the JMP engineers to see if they can resolve that. But with 36 components, we were able to build the design.
Unfortunately, when we went to execute that design, we ran into more of those dispensing errors we've been working through: you can see here that this is not nice and randomly distributed; there are regions with just no activity. This is a work in progress, but it's exciting to be working at the limits of both the JMP software and the instrumentation.
With that, I'll just do acknowledgments. David Garcia, a postdoc in our group, has done fantastic work and performed all the experimental work described here. Jay Davies, a statistician, did all the work with JMP and all the statistical analysis.