Quantifying Sources of Variation in the Laboratory Mouse Gut Microbiome (2025-US-PO-2516)
Growing evidence shows that microbiomes influence diverse aspects of host physiology. As a consequence, taking a holobiont perspective toward animal models represents a potentially powerful approach to improve understanding of how environment, microbiome, and host genetics combine to impact disease onset and development.
However, a major challenge facing animal models is that, beyond exclusion of specific pathogens, most microbiomes are uncontrolled and largely unmonitored across different animal facilities. There is therefore an urgent need to understand 1) sources of microbiome variation in animal models, and 2) the impact of such variation on host phenotype.
Here we sought to address the first of these two challenges. Using mouse chromosome substitution strains (CSSs) combined with different multivariate variance partitioning approaches, we quantified the relative impact of environmental and genetic sources of variation on taxonomic composition of the fecal microbiome.
To do so, we applied Uniform Manifold Approximation and Projection (UMAP) for dimensionality reduction, followed by variance component analysis on the resulting ordination axes (UMAP-VC). All analyses and visualizations were performed using JMP, which enabled interactive exploration of clustering patterns and efficient estimation of variance contributions from factors such as facility, strain, sex, and diet.
Our conclusion is that animal facility exerts an effect on microbiome composition that eclipses the influence of host genetics, as well as other environmental factors such as diet.

Thanks for joining today. I'm really excited about this project with my colleague, Farzaneh Rastegari. As you know, the gut microbiome is a hot topic these days in terms of human health, and there's been a lot of very interesting research going on in different scenarios. Today, we wanted to share some really exciting results from a mouse study that Farzaneh did as part of her PhD research and is now continuing as a developer in the JMP group. She's also developing software that facilitates new routines in JMP itself, including the new normalization and distance matrix platforms. But for this project today, we're going to focus on some real-life data that Farzaneh worked on in depth from the very beginning all the way through. It's a very interesting design and setup. Farzaneh, if you would please take it away and tell us all about it.
Hello, everyone. My name is Farzaneh, and this is an experiment that I worked on in my PhD research, and I'm presenting part of it in this poster. There's a very active relationship between the microbiome and the host, especially in the gut. For example, an adult human has about 5 pounds of microbes in the gut. It's huge. To find out about these interactions, we mostly use animal experiments. One advantage of using animals for the experiment is improved reproducibility. Also, we are introducing an approach that was somewhat novel for us. That is UMAP-VC, which is short for Uniform Manifold Approximation and Projection, Variance Components, for quantifying the relative impact of environmental and genetic variation on mouse gut microbiome composition.
For the experimental design, we are using mice from different strains, locations, and diets, and we are also using two different DNA extraction methods. But mainly we have three different experiments, and we vary these factors differently in each one. You can see, for example, that for experiment one, we have 220 fecal samples from mice from 22 different strains, with 10 replicates for each strain. They are all male, in the same location, with the same diet, but again we have two different DNA preparations for this experiment. Each number here shows the number of categories for each factor in that experiment.
To summarize the workflow, we take samples from the different mice and do PCR amplification of the V1-V2 region of the 16S ribosomal RNA gene, which is a common approach in microbiome studies. After amplifying those regions from the extracted DNA, we sequence them with a high-throughput sequencing method. In this experiment, we used the MiSeq machine from Illumina. Then, after getting the sequencing results from the machine, we cluster the sequences into operational taxonomic units, called OTUs, which are groups of sequences that are at least 97% similar to each other. After that, we have all these OTUs, and they are barcoded by sample. With all this information, we make tables that have, for example, OTUs in the rows and samples in the columns.
Each cell is the count of that OTU in that sample. Then, after having all those tables, we can use JMP to run the UMAP ordination and the variance component analysis to see the results. Here I'm showing some of the results for each of the three experiments that I introduced in the beginning. Maybe it's better for the slide.
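The OTU table described here can be sketched in a few lines of Python. This is only an illustration, not the pipeline used in the study: the function name `build_otu_table`, the sample IDs, and the OTU IDs are all hypothetical, and it assumes each read has already been assigned to a sample (by barcode) and to an OTU (by 97% clustering).

```python
from collections import Counter

def build_otu_table(assignments):
    """Build an OTU-by-sample count table.

    `assignments` is a list of (sample_id, otu_id) pairs, one per read.
    Returns (otus, samples, table), where table[i][j] is the number of
    reads from samples[j] that were assigned to otus[i].
    """
    counts = Counter(assignments)
    samples = sorted({sample for sample, _ in assignments})
    otus = sorted({otu for _, otu in assignments})
    table = [[counts[(sample, otu)] for sample in samples] for otu in otus]
    return otus, samples, table

# Hypothetical reads: each tuple is one sequenced read.
reads = [
    ("mouse_A", "OTU_1"), ("mouse_A", "OTU_1"), ("mouse_A", "OTU_2"),
    ("mouse_B", "OTU_1"), ("mouse_B", "OTU_3"), ("mouse_B", "OTU_3"),
]

otus, samples, table = build_otu_table(reads)
for otu, row in zip(otus, table):
    print(otu, row)
```

Rows are OTUs and columns are samples, matching the layout described in the talk.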
In the first experiment, as you saw, we had 22 different strains, and the other factor that changed was the DNA preparation. To explain those strains a bit more: we have two inbred strains, which are homozygous, Black 6 and A/J, as the background strains. We made the other strains from those by substituting one chromosome from A/J into the Black 6 background. For example, CSS-01 is the strain that has the first chromosome from A/J in the Black 6 background. The rest of the chromosomes are from Black 6.
Here you can see the result of a hierarchical clustering of those samples, based on the microbiome composition for each of the strains. This hierarchical clustering is made by CLR-transforming the OTU tables, then using the Bray-Curtis dissimilarity distance to build the hierarchical clustering graph. I can maybe zoom in a little bit. As you can see, some of those strains are a little bit mixed together. But most of the time, each strain has its own sub-tree in the hierarchical clustering. That shows that the microbiome compositions of mice from the same strain are really close to each other.
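As a rough illustration of the two ingredients named here, the sketch below implements a centered log-ratio (CLR) transform and the Bray-Curtis dissimilarity in plain Python. The function names, the pseudocount of 0.5, and the toy counts are assumptions for illustration only. Bray-Curtis is conventionally defined on non-negative abundances, so the sketch applies it to counts; how the two steps were combined in JMP is not specified in the talk.

```python
import math

def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's OTU counts.

    A pseudocount (an assumed value here) is added so zeros can be logged.
    The transformed values sum to zero by construction.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]

def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative abundance vectors.

    0 means identical composition; 1 means no shared OTUs.
    """
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator if denominator else 0.0

# Hypothetical OTU counts for three samples (same OTU order in each).
sample_1 = [10, 0, 5]
sample_2 = [10, 0, 5]
sample_3 = [0, 7, 0]

print(clr(sample_1))
print(bray_curtis(sample_1, sample_2))  # identical samples
print(bray_curtis(sample_1, sample_3))  # no OTUs in common
```

A pairwise matrix of such dissimilarities is what a hierarchical clustering routine would take as input.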
In the UMAP variance components table, you can see that for the two components, UMAP1 and UMAP2, 99% and 98% of the variation, respectively, is due to strain. That is a huge number. Here, the DNA preparation didn't have much impact. That was zero. But from the strain-by-DNA-preparation interaction, we have a little bit of variation. We did the same analysis for the other groups.
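To make the idea of "percent of variation due to strain" concrete, here is a minimal sketch that estimates variance components for a single factor on one ordination axis. It swaps in the classical method-of-moments (ANOVA) estimator for a balanced one-way random-effects design, rather than the mixed-model fit used in JMP, and it handles only one factor. The function name and the UMAP1 scores are hypothetical.

```python
def one_way_variance_components(groups):
    """Percent of variance attributable to a grouping factor vs. residual.

    `groups` maps each factor level (e.g. a strain) to a list of scores on
    one ordination axis. The design must be balanced (equal group sizes).
    Uses the ANOVA method-of-moments estimator, truncated at zero.
    """
    k = len(groups)
    n = len(next(iter(groups.values())))
    if k < 2 or n < 2 or any(len(v) != n for v in groups.values()):
        raise ValueError("need a balanced design with >= 2 groups of >= 2 values")

    grand_mean = sum(sum(v) for v in groups.values()) / (k * n)
    means = {g: sum(v) / n for g, v in groups.items()}

    ms_between = n * sum((m - grand_mean) ** 2 for m in means.values()) / (k - 1)
    ms_within = sum(
        (x - means[g]) ** 2 for g, v in groups.items() for x in v
    ) / (k * (n - 1))

    var_group = max((ms_between - ms_within) / n, 0.0)
    total = var_group + ms_within
    if total == 0:
        return {"group": 0.0, "residual": 0.0}
    return {"group": 100 * var_group / total, "residual": 100 * ms_within / total}

# Hypothetical UMAP1 scores for three strains, three mice each.
umap1_by_strain = {
    "B6": [1.0, 1.2, 0.8],
    "AJ": [5.0, 5.2, 4.8],
    "CSS-01": [3.0, 3.2, 2.8],
}

percent = one_way_variance_components(umap1_by_strain)
print(percent)
```

When groups are tightly clustered and well separated on the axis, as in this toy input, nearly all of the variance is assigned to the grouping factor.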
Here we have the other two experiments. In the second experiment, location, strain, sex, and DNA preparation are the factors that change, and we have different categories for each of them. In the third experiment, the mice are all in the same location, but we have different strains, different diets, and cages, which are somewhat like location but at a much finer scale. Let me go here for the results.
This is the second experiment, which contains location, strain, sex, and DNA prep. These two graphs, A and B, are the same plot of UMAP1 against UMAP2, but colored by different factors. As you can see, we colored this graph by strain and this graph by sex. The shapes show the locations. Here, I just added these dashed lines to show the location clusters. You can see that samples from the same location cluster nicely together. Within each location cluster, you can see clusters by strain here in this graph.
Also, here in the graph colored by sex, you can see, again, that within each strain the different sexes cluster together. But generally, we can see visually that location has the biggest effect on the microbiome composition. Then, inside each location, strain is the most powerful factor. Within each strain, the sex effect is visible. This is what we can see visually, and in the actual variance components analysis, you can see here that location accounts for 85% of the variability in UMAP1 and 83% in UMAP2, which is huge, and matches what we saw before.
You can see that for UMAP1, almost 12% of the variation is due to strain in total. We have some variability from the strain-by-location interaction, which is the well-known G-by-E (genotype-by-environment) effect. We also have strain by sex by location, and other terms. These numbers show how much each factor contributes to the variability between the samples.
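One way to write down a variance components model of this kind is shown below. This is a sketch under assumptions, not the exact model fitted in the study: the symbols are introduced here for illustration, only the terms named in the talk are included, and every effect is treated as random.

```latex
% y_{ijkl}: score on one UMAP axis for mouse l of strain j and sex k at location i
% L = location, S = strain, X = sex; each term is a random effect with its own variance
y_{ijkl} = \mu + L_i + S_j + X_k + (SL)_{ij} + (SXL)_{ijk} + \varepsilon_{ijkl}

% Percent of variation reported for a term f, taken over all terms g (including the residual)
\text{percent}_f = 100 \times \frac{\hat{\sigma}_f^2}{\sum_{g} \hat{\sigma}_g^2}
```

Under this reading, the strain-by-location term $(SL)_{ij}$ is the G-by-E interaction, and the percentages for all terms sum to 100 on each axis.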
In the third experiment, the mice are all in one location, but we have different strains and different diets. For this experiment, we also have a metabolite factor, triglyceride level, which adds another level to the experiment.
In this experiment, you can see again in the graph of UMAP1 and UMAP2 that strain has a really big effect in separating the samples from each other. In this graph, the samples are colored by strain. The shapes show the diet and age combination. Inside each strain cluster, you can see that the shapes cluster together again. It seems that the diet effect is much smaller than the strain effect.
Then this is the same graph, but here we colored the samples by triglyceride level. We can see visually, again, that in some strains or diets the triglyceride level is higher or lower. For example, for this one, all of the points tend toward blue. That means they are low. Here, or here, or this one, they are higher. Visually, we can see that factors like strain or diet can change the triglyceride levels of different samples.
Again, in the variance component analysis for UMAP1 and UMAP2, you can see the numbers showing those effects. Here, strain has by far the largest effect. It's like the first experiment, where all the mice were also in the same location. Then we have the cage ID. I didn't color the graph by cage ID because that was a little bit messy. We had too many cages, but we can see that cage does have an effect. We also have an effect of strain by diet, which you can see here in the circles: a higher triglyceride level in one strain on one diet than, for example, this one, which is the same strain again but on another diet.
Based on all these results, we can say that UMAP-VC is a novel approach that shows you, both visually and numerically, the variation due to different factors after dimensionality reduction. Then, as a scientific conclusion, you can see that location has the largest effect on the gut microbiome. After that, the genetic background is the second most influential factor. Then after that comes age. All of these can affect metabolite factors, like triglyceride level. This is interesting. This is an ongoing project in the UK by our collaborator, Jethro Johnson, at Oxford University. He's mostly continuing to work on the variability across different locations. That's it. These are our references.
Very nice, Farzaneh. This research to me is just so fascinating. We're finally starting to tease apart these interesting effects of the microbiome on health and seeing them map to the things we want to relate them to, genetics and environment. I think these techniques are really a nice way forward. I wanted to mention, on a technical note, that this analysis is now fairly easy to set up and run in JMP with the new work that Farzaneh has done. The UMAP projections we could already do using the multivariate embedding platform, which, by the way, was done by Meichen Dong in our group a couple of years ago. You could alternatively do principal components, but we like UMAP. It seems to be a little bit better fit to the data here. Then the variance components are estimated using mixed models. It's a nice combination of two powerful platforms in JMP. But because the data are compositional, you need to do some normalization. Farzaneh put in the work to enable that with the normalization platform. It's definitely a few steps you've got to go through in JMP, but fairly doable now. These data sets aren't huge.
I think they're really nice to analyze directly in JMP, and we can really start to tease out nice insights like this. I also wanted to mention, on a personal note, that we wanted to dedicate all this research to the memory of George Weinstock, who was just a brilliant researcher who passed away a few years ago, right in the middle of Farzaneh's research. Kudos to you, Farzaneh, for working through that difficult time. I know you faced several other adversities as you were putting all this research together. But man, I think it's really come together beautifully. We're looking to publish several different papers on these different experiments that you see here, as well as collaborate with a lot of our customers who now have this kind of data. Very promising and super interesting to see. Nice work, Farzaneh.
Thank you.