Modeling Antibiotic Tolerance in Chronic Lung Infection
The airway environment in individuals with muco-obstructive airway diseases (MADs) is characterized by dehydrated mucus due to hyperabsorption of airway surface liquid and impaired mucociliary clearance. As MADs progress, pathological mucus becomes increasingly viscous due to mucin overproduction and host-derived extracellular DNA (eDNA) accumulation. Pseudomonas aeruginosa, a major pathogen in MADs, colonizes this mucus niche persistently. Despite inhaled antibiotic therapies and the absence of antibiotic resistance, antipseudomonal treatment failure remains a clinical challenge.
We used JMP's data visualization and statistical modeling to investigate how mucin and eDNA concentrations – dominant polymers in respiratory mucus – affect P. aeruginosa’s antibiotic tolerance to understand antibiotic recalcitrance. Our findings reveal that polymer concentration and molecular weight impact P. aeruginosa survival after antibiotic exposure. Surprisingly, polymer-driven tolerance is not solely linked to reduced antibiotic diffusion. Additionally, we established an in vitro model that mirrors ex vivo antibiotic tolerance observed in expectorated sputum across different MAD etiologies, ages, and disease severities, highlighting the intrinsic variability in host-evolved P. aeruginosa populations.
Hello. My name is Jonathan Schisler. I'm an assistant professor at the University of North Carolina at Chapel Hill. I'm the PI of the Schisler Lab. My lab uses systems biology to study both healthy and unhealthy aging, primarily in the brain, as well as in the heart. We also are involved in a lot of space biology-related research to identify countermeasures that might be helpful for prolonged space flight.
Our use of systems biology has opened a lot of doors to collaboration. Today, I'm actually going to be talking about modeling antibiotic tolerance in chronic lung infection. This was done in collaboration with Matt Wolfgang and his laboratory, also at UNC Chapel Hill. I'm going to be talking about some data analytics that we did. All the data was generated by Matthew Greenwald, who is an exceptionally talented graduate student in Matt Wolfgang's Lab.
The disease I'm going to talk about today that we're focusing on is muco-obstructive airway diseases. This is a family of diseases. Some of them might sound familiar, such as cystic fibrosis, chronic obstructive pulmonary disease, or COPD, as well as primary ciliary dyskinesia, as well as non-cystic fibrosis bronchiectasis. This is a huge problem across the world. Almost half a billion people worldwide suffer from chronic respiratory diseases.
In the United States, COPD alone accounts for an economic burden of almost $40 billion annually. On top of that, people living with chronic bacterial infections have almost a 50% increase in mortality. The biggest problem we face is with treatments. That is because antibiotic failure ultimately occurs in almost 50% of all cases. We need better models to find better drugs.
From a pathophysiological perspective, when we're talking about these muco-obstructive airway diseases, on the left, we have a healthy airway, and on the right, we have a cartoon of what an obstructive airway looks like. It's essentially the deposition of this thick, sticky mucus that lines the airway, which makes it harder to oxygenate the blood and causes breathing difficulty. This is a mucus environment where bacteria can thrive and can generate these individual little niches that ultimately can lead to antibiotic resistance. If left uncontrolled, it can lead to sepsis and multi-organ failure.
As I mentioned, antibiotic treatment failure over time is very common. If you look at what happens with someone with cystic fibrosis, they're normally diagnosed at a young age. On the Y-axis, we have the amount of these bacteria, Pseudomonas aeruginosa, and we're just going to call it PAO for short.
Over time, the amount of bacteria will build up in the airway. They're treated with antibiotics. This can then lower the amount of bacteria, and then it's this constant cycle of a buildup of bacteria, antibiotic treatment, regression, and we go back and forth. What happens over time is that with all these antibiotic treatments, we start going through this clonal selection process, which essentially means we are starting to select for bacteria that have adapted to the environment and to the antibiotic. This adaptation eventually leads to what we call antibiotic tolerance, where the antibiotics no longer have a very significant effect on clearing the bacteria. Now we're in this chronic disease phase.
We know through studies over the last 10 years that chronic infection with this PAO is associated with failure to clear the infection in cystic fibrosis. Treatment can reduce disease burden, but it never seems to completely eliminate the bacteria.
The other interesting component of this research is that susceptibility to the antibiotic doesn't predict treatment success. You can remove the bacteria from the patient and treat them with an antibiotic, and it will actually kill the bacteria despite not working in the patient. There's this big question of what's the difference between what's happening in the body versus what's happening in the lab. Can we come up with models and methods to better understand this problem?
To graphically show what we mean by antibiotic tolerance versus resistance, you can imagine treating a bacterial culture with increasing doses of antibiotic. The cultures that are susceptible to antibiotics normally will start to die after a low dose, which is represented by this green line. Tolerant species will have some decrease in growth, but somewhat persist. Then, of course, the worst-case scenario is where we have complete resistance to antibiotics, where growth is not affected by the addition of the drug.
Here's Matt hanging out at a poster, who came up with some really, really interesting questions regarding what actually drives antibiotic response in people compared to what is done in the lab. This is done through a really remarkable clinical experiment where patient samples were obtained, and we call them sputum samples. Literally, when you're coughing up a phlegm ball, that's what we're talking about with sputum.
Patients with cystic fibrosis, or COPD, we can capture these samples. The bacteria within these samples can then be isolated and grown. Then we can actually compare what is it like treating those bacteria with the drug outside of the patient versus the patients themselves, which are treated with the same antibiotics. This is what they're normally treated with. We have a really remarkable experiment. We have patients treated with this antibiotic. At the same time, we are analyzing the phenotype of the bacteria to the same antibiotic outside of the body.
What became incredibly clear was that there was a disconnect between the survival of the bacteria outside the lung versus inside the lung. This MIC level here is telling you the dose of antibiotic that was needed to treat the patient compared to what we see when that culture is grown outside. We have some cultures that were nicely affected by the antibiotic, but obviously, we had plenty of cultures that were incredibly resistant. What makes this so different? What's so different about what's happening in the airway versus what's happening in the lab?
There are a lot of different contributors that could factor into resistance. We already talked a little bit about antibiotic resistance itself, but if we plot the antibiotic resistance to patient treatment versus in vitro survival, we see that there's really no correlation. There's something missing. There's something that we're not capturing in what we call in vitro cultures that would let us tease out what's contributing to antibiotic resistance.
Matt decided to focus on the environment. What's unique in the environment within these clogged airways that could be contributing to this resistance and might be important for us to consider in our models? There are actually two things that became low-hanging fruit in terms of what might be contributing to resistance. Those are mucin, which is the predominant protein that makes up this mucus, and DNA.
In those obstructed airways, we have a lot of mucus, and we also have a lot of DNA that is just floating around, likely from dying immune cells and other dying airway cells. What we know is that in patients, as they get older and the disease gets worse, there are higher levels of both that mucous protein and this DNA.
Matt did this really smart experiment where he took the PAO bugs, and he treated them with antibiotics. When he treated with an antibiotic in the absence of this mucous protein, you can see it does a pretty good job of preventing any growth. However, as you start to increase the amount of mucous protein, you can start to see more and more bacteria can grow. This is the first clue that, okay, maybe this environment does play a big role in how bacteria can evade this antibiotic treatment. Higher mucous protein, and we have more disease severity.
That covers protein. What about DNA? In that same experiment, now we're holding the mucous protein constant, and we can titrate in the amount of DNA. As we add in more DNA, we start to see higher levels of survival. Both the DNA and this mucin protein can increase the survival or increase the resistance of the bacteria to the antibiotic treatment. Interestingly, once you go up to these higher levels of mucin protein, the effect of this DNA tends to wane a little bit. There seems to be maybe some interaction between how DNA and this mucin protein are working to drive resistance.
The last variable I need to introduce before we get into the data modeling is this variable that's called rheology, or another way to think about it is viscosity. The viscosity of the media that is within the airway, or in this case, within the lab test tube has some relationship to both that amount of mucin protein and DNA. Viscosity is a function in part of both the protein and the DNA content as well. I wanted to introduce those four main variables that we are most interested in modeling.
Now we can leverage JMP, and specifically, JMP Pro. One of the biggest questions was, Matt came to me and said, "Hey, we have this data, and we're really interested in how all of these variables interact." One of the things I like probably the most about JMP is the data visualization. It does such a clean job of getting quick answers through beautiful graphic presentations. We leverage the multivariate platform just to do a quick visualization of the relationships between our four main variables.
We have our two dependent variables. The first, again, is the rheology, that viscosity I was talking about. The second is the survival, which is the ability to get past the antibiotic treatment. Then we have our two independent variables, because we're controlling the amount of protein, this mucin protein, and we're also controlling the amount of DNA. This is a great graphic. It gives you so much information in a snapshot.
We do the multivariate analysis. We can use this to give us the histogram of each of the variables. We can easily look at the scatter plot across variables to lock in on those strongest relationships. Then we can also graphically represent the correlation coefficient with a heat map. We can also represent the significance of these findings by using the size feature. Within one snapshot, you get so much information. That's one of the reasons why we are big fans of JMP Pro.
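For readers without JMP, the correlation part of that multivariate view can be roughly sketched in Python. This is only an illustration under stated assumptions: the data are synthetic placeholders (not the study's measurements), the concentration levels and effect sizes are invented, and the column names `mucin`, `edna`, `viscosity`, and `survival` are hypothetical labels for the four variables described in the talk.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Pairwise Pearson r for a dict mapping column name -> list of values."""
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names} for a in names}


# Synthetic placeholder data, NOT the study's measurements.
rng = random.Random(0)
n_obs = 78  # same size as the data set described in the talk
mucin = [rng.choice([0.5, 1.0, 2.0, 4.0]) for _ in range(n_obs)]  # hypothetical levels
edna = [rng.choice([0.0, 0.5, 1.0, 2.0]) for _ in range(n_obs)]   # hypothetical levels
data = {
    "mucin": mucin,
    "edna": edna,
    # Invented relationships, used only so the matrix has some structure.
    "viscosity": [1.0 * m + 3.0 * d + rng.gauss(0, 0.5) for m, d in zip(mucin, edna)],
    "survival": [8.0 * m + 2.0 * d + rng.gauss(0, 2.0) for m, d in zip(mucin, edna)],
}

corr = correlation_matrix(data)
for name, row in corr.items():
    print(name, [round(row[other], 2) for other in corr])
```

The printed rows are the numeric equivalent of the heat map; the histograms, scatter plots, and significance sizing from the JMP graphic are not reproduced here.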
For this first objective, one thing I do want to point out, in my world from systems biology, where we're normally dealing with thousands and tens of thousands of data points and samples, this is a rather small data set. It's only a total of 78 independent observations. That's going to be important for talking about limitations.
Now, as should be expected, this graphic quickly tells us that our two independent variables are not correlated, which they shouldn't be, because we set those concentrations ourselves. That was a good sanity check. However, our two dependent variables were strongly correlated. Those being the viscosity, this rheology variable, as well as survival. You can see the strong positive correlation both by the scatter plot as well as through the correlation coefficient score.
Then we also observe that each independent variable had one really predominant dependent counterpart. If we look at the amount of this mucous protein, we can see a very strong positive correlation with survival, which is represented by the scatter plot here and this upper quadrant for the correlation score. Then for the DNA, we saw the strongest positive correlation with this viscosity factor, which is represented in this box and in this correlation coefficient score here.
That was great. But one of the questions from Matt and Matt was, "Well, how can we model all of this together?" They're interested in trying to essentially express these factors in a mathematical model that we could then interrogate and test, and push the buttons, and pull some levers to see what happens.
I at least want to make a big plug for the model comparison feature. It's something that I've used quite often in the software. However, I cut to the chase a little bit, and I was a little picky, and I ended up going with partial least squares regression, partly based on experience and partly on the data at hand.
Given that we have two Y's, that we have some X variables that also have some association with each other, and that I want to account for an interaction term, the PLS regression seemed to be a really good fit for this. We can already tell from that initial data that there's likely going to be some interesting interaction between our two X variables of the mucin protein as well as the DNA content.
Again, this is relatively sparse data, so our cross-validation is somewhat limited. At this point in time, when we published this paper, we did not have an independent validation data set, which is something that's in progress. You do have to keep in mind the risk of generating models that are overfit. Still, it does allow us to explore what we think is happening with the relationship between these four variables.
We use the Fit Model feature within JMP Pro. Again, we use partial least squares with our two Y's and our two X's, with three factors and including the interaction term. Both independent factors and the interaction term came back with VIP scores greater than 0.8. We kept them all within the model. I kept the correlations here just as a frame of reference.
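To make the VIP screening step more concrete, here is a minimal pure-Python sketch of a two-response PLS fit using the NIPALS algorithm, followed by the common textbook definition of VIP (variable importance in projection). It is an assumption that this matches what JMP Pro computes internally; the data are synthetic placeholders, and the function name `pls_vip`, the concentration levels, and the effect sizes are all hypothetical.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def xt_vec(matrix, vec):
    """Compute matrix^T @ vec for a list-of-rows matrix."""
    return [dot(col, vec) for col in zip(*matrix)]


def standardize(matrix):
    """Center each column and scale it to unit sample standard deviation."""
    n = len(matrix)
    scaled_cols = []
    for col in zip(*matrix):
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        scaled_cols.append([(v - mean) / sd for v in col])
    return [list(row) for row in zip(*scaled_cols)]


def pls_vip(X, Y, n_components, n_iter=200):
    """NIPALS PLS on standardized X and Y; returns one VIP score per X column."""
    X, Y = standardize(X), standardize(Y)
    weights, explained = [], []
    for _ in range(n_components):
        u = [row[0] for row in Y]              # start from the first response
        for _ in range(n_iter):                # fixed-length power iteration
            w = xt_vec(X, u)
            norm = math.sqrt(dot(w, w))
            w = [v / norm for v in w]          # unit-length X weights
            t = [dot(row, w) for row in X]     # X scores
            tt = dot(t, t)
            c = [v / tt for v in xt_vec(Y, t)] # Y loadings
            cc = dot(c, c)
            u = [dot(row, c) / cc for row in Y]
        p = [v / tt for v in xt_vec(X, t)]     # X loadings
        # Deflate X and Y by the part explained by this component.
        X = [[x - ti * pj for x, pj in zip(row, p)] for row, ti in zip(X, t)]
        Y = [[y - ti * cj for y, cj in zip(row, c)] for row, ti in zip(Y, t)]
        weights.append(w)
        explained.append(tt * cc)              # Y sum of squares explained
    k = len(weights[0])
    total = sum(explained)
    return [
        math.sqrt(k * sum(s * w[j] ** 2 for s, w in zip(explained, weights)) / total)
        for j in range(k)
    ]


# Synthetic placeholder data, NOT the study's measurements.
rng = random.Random(1)
X, Y = [], []
for _ in range(78):
    m = rng.choice([0.5, 1.0, 2.0, 4.0])   # hypothetical mucin levels
    d = rng.choice([0.0, 0.5, 1.0, 2.0])   # hypothetical eDNA levels
    X.append([m, d, m * d])                # two factors plus their interaction
    viscosity = 1.0 * m + 3.0 * d - 0.5 * m * d + rng.gauss(0, 0.5)
    survival = 8.0 * m + 2.0 * d - 1.0 * m * d + rng.gauss(0, 2.0)
    Y.append([viscosity, survival])

vip = pls_vip(X, Y, n_components=3)
for name, score in zip(["mucin", "edna", "mucin*edna"], vip):
    print(f"{name}: VIP = {score:.2f}, keep = {score > 0.8}")
```

With this definition the squared VIP scores sum to the number of predictors, so a score near 1 means roughly average importance, which is why 0.8 is a commonly used cutoff for keeping a term.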
If we look at the model coefficients from the PLS regression, we can see that when controlling for mucin, this is the protein, as well as the protein-DNA interaction, the beta coefficient for viscosity is still largest for the DNA. For survival, the largest coefficient was with the mucin protein. We were able to take this information and start to learn a little bit more about how all of these variables are related to each other.
We do the usual quality checks of looking at the actual versus predicted. On the left scatter plot, we have our viscosity or rheology plot of actual versus predicted, and then the same actual versus predicted for survival. Of course, I always like to have that sanity check. What does the equation look like? It makes my brain happy. This is the output of both our viscosity as well as percent survival as a function of the mucin protein, the DNA, as well as their interaction term.
Now, another great tool with JMP is the profiler. This was something that I also really like to play around with to get some ideas of how the model might be working. Of course, our collaborators always find this really intriguing. One of the things I always struggled with was how best to share workspaces. I am happy to say at least I ventured a little bit into JMP Live and JMP Public, which has been incredibly helpful to share our data. I also want to at least make a plug that this is really important, especially for academics, that we have the means to make our data accessible and share in an unbiased manner.
I've been really excited to learn more about JMP Live and how we can incorporate it in our data reporting. When we published a paper, we included a link to this workspace here, so people could actually go in and play with the profiler and actually download all the raw data as well. This is a really important component of rigor and reproducibility in our research.
The other nice thing about the profiler was trying to see how this model might function over a broad range of inputs. The other nice feature is that we can use it to run Monte Carlo simulations. This is, again, another really strong feature of JMP that we use quite often.
Just to show you, in this case, we have our protein component and we have our DNA component. It's great that we can use experimentally derived or clinically derived values to adjust the distribution of where we want to pull samples from. In this case, we actually use the truncated normal distribution because we have experimental lows and highs that we wanted to constrain this to. Then we can use this data to run a couple thousand simulations and try to see how the model might perform across a range of inputs.
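The simulation idea can be sketched outside JMP as follows. This is a minimal illustration, assuming a linear model with an interaction term as described in the talk; the coefficients, means, standard deviations, and bounds below are hypothetical placeholders rather than the fitted or experimental values, and the truncated normal draw is done by simple rejection sampling.

```python
import random


def truncated_normal(rng, mean, sd, low, high):
    """Draw from a normal distribution, rejecting values outside [low, high]."""
    while True:
        value = rng.gauss(mean, sd)
        if low <= value <= high:
            return value


def predict(mucin, edna, coef):
    """Linear model with interaction: b0 + b1*mucin + b2*edna + b3*mucin*edna."""
    b0, b_mucin, b_edna, b_interaction = coef
    return b0 + b_mucin * mucin + b_edna * edna + b_interaction * mucin * edna


# Hypothetical placeholder coefficients, NOT the published model.
SURVIVAL_COEF = (1.0, 8.0, 3.0, -1.0)

rng = random.Random(42)
draws = []
for _ in range(2000):
    # Bounds stand in for the experimental lows and highs (assumed values).
    m = truncated_normal(rng, mean=2.0, sd=1.0, low=0.5, high=4.0)
    d = truncated_normal(rng, mean=1.0, sd=0.5, low=0.0, high=2.0)
    draws.append((m, d, predict(m, d, SURVIVAL_COEF)))

predicted = sorted(s for _, _, s in draws)
print("median predicted survival:", round(predicted[len(predicted) // 2], 2))
```

Rejection sampling is only efficient when the bounds cover most of the distribution's mass, which holds for the placeholder settings used here.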
Again, this is another great plug for JMP and using Graph Builder, where it's really simple to just look at the one by one plots across all 1,000 simulations, and you can start to see where we might have some interesting relationships across the model. Again, being able to summarize a four-variable model in an elegant pattern, where we can capture the DNA concentration, the protein concentration, the viscosity, as well as survival within a reasonable landscape, is a great feature. It gives you nice and elegant interpretability.
The other important part I wanted to bring up too is that interaction profiler as well. When we're building models, obviously you don't want to add interaction terms that don't make a lot of sense and add noise to the model. The interaction profiler is a really great tool to quickly identify where an interaction may be really worthwhile to include in your model. That's another great feature of JMP.
For this objective, what did we learn? By building this PLS model, what did we learn from the data from Matt's big experiment? First, we learned that the antibiotic resistance, or in this case, we quantify this as survival, was positively associated with mucin across all DNA concentrations. We have these positive slopes of the mucin protein with survival whether we are at low or high DNA concentrations. Again, if you look on this plot up here, you can also see that same relationship.
We also saw a positive association between mucin and the viscosity and how that weakened as the DNA concentration increased. This is most obvious with that interaction plot. We can see that the viscosity increased here with mucin concentration in the absence of DNA. But at higher levels of DNA, we see the slope completely flips.
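The slope flip described here follows directly from a model with an interaction term: the effect of mucin on viscosity is not a single number but depends on the DNA level. A small sketch, using hypothetical coefficients chosen only to reproduce the qualitative pattern (they are not the fitted values):

```python
def mucin_slope(edna, b_mucin, b_interaction):
    """Slope of the response with respect to mucin at a fixed eDNA level.

    For y = b0 + b_mucin*M + b_edna*D + b_interaction*M*D,
    the slope in M is b_mucin + b_interaction*D.
    """
    return b_mucin + b_interaction * edna


def crossover_edna(b_mucin, b_interaction):
    """eDNA level at which the mucin slope changes sign."""
    return -b_mucin / b_interaction


# Hypothetical placeholder coefficients, NOT the published model.
B_MUCIN, B_INTERACTION = 2.0, -1.5

for edna_level in (0.0, 1.0, 2.0):
    print(edna_level, mucin_slope(edna_level, B_MUCIN, B_INTERACTION))
print("slope changes sign at eDNA =", crossover_edna(B_MUCIN, B_INTERACTION))
```

A positive main effect combined with a negative interaction coefficient gives a positive slope in the absence of DNA and a negative slope once DNA exceeds the crossover level.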
This elegantly demonstrates that we see a differential effect, especially when we get to these moderate or high levels of DNA, on the positive correlation between the viscosity and survival. I think that was a really nice way to link these four variables together in a way that also scientifically made a lot of sense. This mucin protein concentration really does seem to be the dominant driver of the bacterial resistance, at least in our cell culture model. But it can certainly be modified by the amount of extracellular DNA that's present.
The last example of how we leveraged JMP in this study was we asked the question… Well, maybe we just need better models. We want to use models in the lab that more closely reflect what's happening, in this case, in the airway of patients. We already demonstrated how, by starting to factor in this mucin protein and DNA, we appear to be recapitulating this antibiotic resistant phenotype. What can we learn about building better models?
In the last experiment, Matt isolated the bacteria from all of these patients. Then he cultured those bacteria in increasing amounts of mucin protein. The question was, was there a certain condition that we have in the lab that best represents what we actually see in the airways of patients? Remarkably, what we found was that using that mucin protein at a 2% concentration effectively gave us about… I'm going to say 90% of the cultures were most similar to the patient data compared to all the other ranges of mucin. We can also elegantly plot that here, where we're just plotting the difference between what we see in the lab in vitro versus what we see in the patients ex vivo.
Obviously, the closer we are to zero, the better we're recapitulating what we see in the patients. You can see at this 2% mucin level, nearly all the patient samples fell within this sweet spot. Now we have these two outliers. These outliers are really interesting to see what's different about these. This was a really great way for us to investigate through this approach. Can we identify better models for how we study this disease?
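The comparison behind that plot can be expressed as a simple rule: for each patient isolate, find the mucin concentration whose in vitro survival is closest to the ex vivo survival measured in sputum. The sketch below uses invented numbers for three hypothetical isolates, purely to show the calculation; none of the values come from the study.

```python
def closest_condition(in_vitro_by_conc, ex_vivo):
    """Return the mucin concentration whose in vitro survival is nearest ex vivo."""
    return min(in_vitro_by_conc, key=lambda conc: abs(in_vitro_by_conc[conc] - ex_vivo))


# Invented placeholder values (percent survival), NOT patient data.
# Keys of the inner dicts are mucin concentrations in percent.
in_vitro = {
    "isolate_A": {0.5: 2.0, 2.0: 35.0, 4.0: 70.0},
    "isolate_B": {0.5: 5.0, 2.0: 20.0, 4.0: 60.0},
    "isolate_C": {0.5: 10.0, 2.0: 50.0, 4.0: 80.0},
}
ex_vivo = {"isolate_A": 30.0, "isolate_B": 25.0, "isolate_C": 75.0}

best = {iso: closest_condition(in_vitro[iso], ex_vivo[iso]) for iso in in_vitro}
share_at_2 = sum(1 for conc in best.values() if conc == 2.0) / len(best)

for iso, conc in best.items():
    diff = in_vitro[iso][conc] - ex_vivo[iso]
    print(f"{iso}: best match at {conc}% mucin (in vitro minus ex vivo = {diff:+.1f})")
print(f"share of isolates best matched at 2% mucin: {share_at_2:.2f}")
```

Isolates whose best match falls at a different concentration, like the hypothetical isolate_C here, play the role of the outliers mentioned in the talk.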
Just for a quick summary, we found that the mucous polymer concentration that consists of both the mucin protein itself as well as this extracellular DNA drives the antibiotic tolerance of PAO. We can use our mathematical models to identify better in vitro models to study antibiotic resistance.
JMP Pro is useful at each stage of data processing, model development, visualization, and sharing and reporting of data. I know I haven't had a chance to go through all these different components, but we leverage JMP across the entire spectrum of our data pipeline, and it's been incredibly helpful. I'd like to thank you all for listening.