Advances in Using JMP and JMP Pro for Analysis of High Spatial Resolution Mass Spectrometry Images
At the ill-fated 2020 Summit in Munich, which demonstrated the incredible pivoting ability of JMP Summit organizers, we showed our initial applications of JMP to the analysis of high spatial resolution (NanoSIMS) mass spectrometry images.
In this talk, we give an update on our advances since that presentation. After quickly reviewing our most basic analytical procedures that make use of dynamic linking and Graph Builder, we move on to more advanced analyses that feature cluster analyses, and finally, to applications of JMP Pro Functional Data Explorer, demonstrating how it helps us with interpreting mass spectra.
And now we can do a lot of these analyses much more quickly, thanks to the Workflow Builder recently introduced in JMP 17. We draw upon example data sets ranging from cancer research (tumor tissue) to the greening of soil fertilizers using bacteria. For each of these studies, we set the stage with a brief background on their importance, and then the majority of the talk consists of a live demonstration of the steps that we take to arrive at the end result.
Good morning, everyone. My name is Greg McMahon, and I am a principal scientist working at the National Physical Laboratory in Teddington, UK. I'm going to be talking to you today about advances in using JMP and JMP Pro for analysis of High Spatial Resolution Mass Spectrometry Images.
This talk is a carryover of what I presented about 4 years ago at that ill-fated Munich meeting, where COVID got in the way and turned it very quickly into an online meeting. The ability of the folks at JMP to pivot and make that online meeting a very successful event is still incredible. I'm glad to be with you all here today in person and online, and look forward to presenting my talk and talking with you throughout the week.
To begin, I'm going to be speaking about NanoSIMS as our means of obtaining high spatial resolution mass spectrometry imaging. There's an image of the instrument here on the left-hand side. There's only about 55 of them or so worldwide. They're used for various applications. Ours is probably about 80% biological and then maybe 20% physical sciences, metallurgy, material science, semiconductor, if you will.
But in a nutshell, very briefly, how does the instrument work? We generate a beam of ions, and we accelerate them and focus them towards our sample of interest. When this beam of ions strikes the sample, the energy that is imparted into the sample causes atoms and molecules from the sample to be ejected from the sample surface. In that process, a small fraction of those ejected species are actually ionized.
Once they're ionized and carry a charge, through application of high electrostatic fields we can collect them and focus them down what we call the secondary column, and then into a magnet. We use a magnetic sector mass spectrometer. What it does is it separates those ions according to their mass-to-charge ratio in a manner very similar to how a prism separates light into its various wavelengths.
Once that separation occurs, we have what we call a multicollection chamber, which houses seven detectors. These detectors, six of them are on motorized trolleys, so we can move them to any position that we like. The seventh detector is fixed at the highest radius in the magnet. Typically, our highest mass would go on that fixed detector, and then the other six are positioned accordingly.
In front of each detector, we can acquire what we call a high mass resolution mass spectrum, which we see here. It typically contains multiple peaks that are indicative of what we call mass interferences. I'm going to be talking a bit more about this towards the end of my talk. But really, the main advantages of this method are noted down at the bottom of the slide here. We get exceedingly high spatial resolution. We can push it down to about 35 nanometers. 100 nanometers is pretty typical, doesn't require a whole lot of effort to achieve.
We also get high mass resolution, m over delta m of about 15,000-20,000. This enables us to separate peaks, such as the ones that you see in the mass spectra here. Because it's a mass spec technique, a SIMS technique (secondary ion mass spectrometry), we get very high sensitivity, down to ppm and ppb levels.
I'm going to talk to you today about a couple of case studies of how we've pushed forward. Last time, I talked about how we apply it to cancer research. I won't be talking about that today. My first one is going to be a pretty simple example of how we have implemented the Workflow Builder into our workflow. This has been really a Godsend almost.
The first case study I'm going to talk about is basically about nitrogen fertilizers that we use to assist plant growth and their effect on climate change. Around the world, it really is the availability of fixed nitrogen that limits agricultural production. Unfortunately, a lot of the time use is made of synthetic nitrogen fertilizers whose production can cause atmospheric pollution by ammonia and nitrous oxide. Their production can also give off smog and fine particulate pollution. Once they're applied, they can actually cause ecosystem acidification and nitrification of the waterways and just generally are bad for climate change.
What was discovered, not too long ago, maybe 30, 40 years ago, was that certain bacteria have this amazing ability to take inert nitrogen from the atmosphere, that N₂, and they can actually fix it and convert it into ammonia, which can then be released into the plant for amino acid, chlorophyll, and protein synthesis. Effectively, it can basically replace those nasty synthetic nitrogen fertilizers we were talking about before.
However, no one has ever actually shown this on a very micro to nano scale in plants. There have been all kinds of bulk analyses done that show it, but we wanted to have a look at this in a lot more detail, at a much higher spatial resolution.
This particular demonstration is going to show how we have taken the leaves from a plant, in this case corn, that has been grown in an atmosphere of ¹⁵N₂, and no synthetic fertilizer was given. There is no other possible source of ¹⁵N. I should maybe backtrack a bit. ¹⁵N is a heavy isotope of nitrogen. The natural ¹⁵N to ¹⁴N ratio is about 0.0037, that is, roughly 0.37%. What we're doing is we're growing it in almost 100% ¹⁵N₂ instead of that natural 0.37%. Our idea is we're going to use the mass spec to hunt for ¹⁵N in the plant.
I think at this point, I'll summarize what our first level of analysis is and how we've used JMP Workflow Builder to get there. Our raw data files are actually quite simple. That comes off as a CSV file that has two columns that define pixel locations, and then seven columns, or more or less, that contain the counts from the various masses that we measure.
Typically, the first thing that we would do is, if we're looking at isotope ratios, we take the two masses that construct the ratio and insert a new column for the isotope ratio value. What we can do is we can go into Graph Builder and basically reconstruct the ion images which we see here. This image over here, this is the carbon plus nitrogen image of a section of the plant. You see these various organelles. This here is a plant nucleus. These here are what we call chloroplasts.
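As a rough sketch of that first step outside JMP, the snippet below adds a ratio column to a pixel table and rearranges it into an image grid. The column names (`X`, `Y`, `12C14N`, `12C15N`) and the toy counts are assumptions made for illustration; the talk does not give the actual CSV headers. Pixels with a zero denominator are left as missing.

```python
# Toy pixel table; column names and counts are hypothetical, not from the talk.
rows = [
    {"X": 0, "Y": 0, "12C14N": 200, "12C15N": 1},
    {"X": 1, "Y": 0, "12C14N": 150, "12C15N": 3},
    {"X": 0, "Y": 1, "12C14N": 0, "12C15N": 0},
    {"X": 1, "Y": 1, "12C14N": 100, "12C15N": 2},
]


def add_ratio(rows, numerator, denominator, name):
    """Add a ratio column; a zero denominator gives None (missing)."""
    for row in rows:
        den = row[denominator]
        row[name] = row[numerator] / den if den > 0 else None
    return rows


def to_image(rows, column):
    """Arrange one column as a 2-D list indexed [y][x], ready for plotting."""
    width = max(r["X"] for r in rows) + 1
    height = max(r["Y"] for r in rows) + 1
    image = [[None] * width for _ in range(height)]
    for r in rows:
        image[r["Y"]][r["X"]] = r[column]
    return image


add_ratio(rows, "12C15N", "12C14N", "N ratio")
ratio_image = to_image(rows, "N ratio")
```

In JMP itself this corresponds to a formula column plus a Graph Builder plot of X against Y colored by the ratio.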
This image over here is the nitrogen isotope ratio image, where we've got a color scale. The darkest blue represents the natural ratio, what we would expect regular nitrogen to show. Everything above that is ratios elevated in ¹⁵N that, in this case, could only possibly come from the ¹⁵N₂ gas in which these plants were raised.
Clearly, we now have evidence that the bacteria (the seeds were inoculated with these bacteria) have done their job: they fix the N₂ and distribute it throughout the plant. We can also generate distributions. It's very useful to look at the distributions of the various counts, especially looking at where the outliers are.
I timed myself doing it how I used to do it, which is basically doing all these steps manually. It took about 3 minutes to get to where I have shown here on the screen. But now, with Workflow Builder, it takes roughly 5 seconds. If we're looking at 20-odd of these files per day, that can be a serious saving in time, up to about 45 minutes to an hour per day. Over the course of a month and a year, that all adds up.
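The arithmetic behind that estimate can be checked in a couple of lines. The function name is made up for illustration; the inputs (20 files, 3 minutes manually, 5 seconds with the workflow) are the figures quoted in the talk.

```python
def daily_savings_minutes(files_per_day, manual_seconds, workflow_seconds):
    """Minutes saved per day when each file takes workflow_seconds instead of manual_seconds."""
    return files_per_day * (manual_seconds - workflow_seconds) / 60


# 20 files per day, 3 minutes by hand versus about 5 seconds with the workflow.
saved = daily_savings_minutes(20, 3 * 60, 5)
print(round(saved, 1))
```

With these inputs the saving comes out just under an hour per day, consistent with the upper end of the range quoted.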
I'll just run a quick demo here to show you what this looks like. If we just go open a recent file, and I'll pick this one. As I said, this is what the raw file looks like. Two columns, X, Y. These are the pixel locations and the counts in each of our masses.
Now I'll go File, Open recent, and Part one, Workflow Builder. There's our Workflow Builder. I'll maybe just shrink this a little bit. Let's just run it step by step. A new column is inserted. It's given a title, Nitrogen Isotope Ratio. A formula is given, that is, the ratio is calculated. That is our raw CN image. That was the first image I showed you, except there it's colored on the slide. It was actually grayscale.
There's the nitrogen isotope ratio image. We can finish off with the distributions and then just the simple multivariate statistics with the scatter plot matrix and the heat map. We saw that the ratios were higher in these particular organelles here. These are what we call the chloroplasts. The chloroplasts are an organelle that is responsible for photosynthesis.
That the nitrogen seems to be going there preferentially is almost a double win, because now we have the nitrogen preferentially going to support chloroplast growth and function. But the chloroplasts, those are the organelles that will take CO₂ from the atmosphere and convert it into energy that drives plant growth. From a climate change perspective, this is actually a win-win. We're lessening the need for synthetic nitrogen on our soils, and we're helping, maybe taking up CO₂ from the atmosphere.
But while I'm here, I'm just going to do one other thing. What was interesting was these high outliers in the sulfur signal. We're just going to select them. As you see, the selected ones have turned pink. These are mainly occurring only in the chloroplasts, some more than others. As for the reason for that, I'm not quite sure; it might be a temporal effect.
But this is significant, because in order for that nitrogen fixation reaction to occur, it needs the assistance of an enzyme that happens to be rich in sulfur. This was an added bonus to the analysis of this particular sample.
That's just a quick first-level analysis using Workflow Builder. It just saves an immense deal of time. What I'm going to do next is close out everything here, including that, and then head back to the PowerPoint and go on to the next slide. This is the part I just showed you, with the sulfur going preferentially into the chloroplasts, assisting that nitrogen fixation reaction.
That was actually a pretty straightforward analysis, but sometimes we get others that require a lot more complexity. The second example I'm going to use is a study that we've carried out looking at what are called selenoproteins. If you go to a health store, you've probably seen a certain section where they have these selenium supplements.
It's a very interesting field, in that it's still in its infancy. A lot is known, but a lot is still unknown about how selenoproteins work in the body. Certain claims have been made, with evidence to support them, that they can be protective in the brain and help your immune system. They may have antioxidant properties, help reproduction, assist your thyroid gland and its function, and also have anti-mutagenic effects, which can help prevent damage occurring to your DNA.
The one question we specifically wanted to address in this study was, do selenoproteins really associate with DNA to help protect the integrity of the DNA? Recall that we're looking at very high spatial resolution mass spectrometry imaging; this is actually one of the few techniques that can do this and be reasonably effective at it.
The issue with selenoproteins is that they're already in very, very low abundance in your body. To make it easier, people might say, "We'll just add a little bit more," but you really don't want to do that because you can also get selenium poisoning if you have too much in your body. Studying samples in a Petri dish that have had artificially too much selenium added is not terribly insightful.
What we did is we had some of these selenium precursors, and these would be typical ingredients that you would find on those bottles of supplements. We had culture media with the addition of 300 nanomolar of these isotope-labeled precursors for selenoproteins: 76 selenium-enriched methylselenocysteine, 77 selenium-enriched selenomethionine, or 82 selenium-enriched sodium selenite.
I have to stress that, when I talk about isotopes here, these are all stable isotopes. These aren't radioactive or anything like that. These are all perfectly safe.
We added these to the culture media, but we also added ¹⁵N thymidine. This is our friend, ¹⁵N, the heavy isotope of nitrogen. This nucleoside will selectively label the DNA. What we're hoping to see in this double-label experiment is a correlation between where there are high nitrogen ratios and high selenium ratios, or vice versa, or maybe no correlation at all.
Let's look at the initial images just coming off. This is what we would have if we used our Workflow Builder from before. What we find is that there are very low selenium counts. This is probably not too unexpected because, remember, selenium is low in abundance, even when counting for as long as we did. This is the selenium ratio image on the left-hand side there.
This was probably a day and a half to two days of image acquisition, just collecting counts from the sample as it's being rastered by the primary ion beam. When we do the ratio image, we see all these gray areas here, which are missing data. There's probably tons more in here that we can't see quite as easily.
We have basically two cells in this image. This is the carbon plus nitrogen image, basically looks at the biomass of the cells. Everything else, all this dark matter in between is the embedding resin. These samples are embedded in a polymer resin and then sliced at very thin thicknesses with a diamond knife. We see we've got low counts.
Even before we go any further, we thought, "We might be better off trading spatial resolution for more counts by binning the image." I know now I'm starting to turn JMP into an image analysis software, which was never meant to be. But it's having all this dynamic linking. The marriage between the two is superb.
I'm going to actually zip back and close this back. I have to thank Julian Paris for this. This last year, we were discussing during lunchtime about how to bin images. I'm just going to go over that briefly. We've incorporated this into the Workflow Builder now. Let me grab this image here.
This is the raw data. The trick to binning it, it's a two-step process. I won't derive the isotope ratios here. But what we're going to do is we're going to select the X and Y columns. We're going to go down to Utilities, and then we're going to make a binning formula. I'm going to go to Cut points, and I'm going to bin this into 4 by 4 equal bin widths.
I'm going to go equal bins. Offset zero is fine. Switch that to four, I'll go OK, and make all X, and then make formula columns. There we have two more columns, X binned and Y binned. We go to Tables, go to a summary table, and I want to bin all the counts columns as well, so I'm going to select all those.
Under Statistics, I want to sum everything in there. Now I'm going to sum all the counts in the individual pixels into that one 4 by 4 bin. The part that wasn't quite so obvious is that we're going to group by the binned X and Y columns. We go OK. Now we have our binned image, and we can just show that in Graph Builder. Graph, and X, Y. Let's just color for now with our CN, so that'll be… Because it's binned, let's just make our marker size a little bit bigger there.
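The two-step binning just described (make binned X and Y columns, then sum the counts grouped by those columns) can be sketched as follows. This is a minimal illustration, not the JMP implementation: it assumes non-negative integer pixel coordinates starting at zero, and the column names are hypothetical.

```python
from collections import defaultdict


def bin_and_sum(rows, bin_size, count_columns):
    """Sum counts over bin_size x bin_size blocks of pixels.

    Step 1: integer-divide X and Y to get the bin indices.
    Step 2: group by (X binned, Y binned) and sum every count column.
    """
    sums = defaultdict(lambda: dict.fromkeys(count_columns, 0))
    for row in rows:
        key = (row["X"] // bin_size, row["Y"] // bin_size)
        for col in count_columns:
            sums[key][col] += row[col]
    return [
        {"X binned": bx, "Y binned": by, **counts}
        for (bx, by), counts in sorted(sums.items())
    ]


# Toy 8 x 8 image with one count per pixel (made-up data).
pixels = [{"X": x, "Y": y, "CN": 1} for y in range(8) for x in range(8)]
binned = bin_and_sum(pixels, 4, ["CN"])
```

Each 4 by 4 bin collects the counts of 16 original pixels, which is the trade of spatial resolution for counts. Isotope ratios would then be derived from the summed counts rather than averaged from per-pixel ratios.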
Now we have far more counts. It's a little busy with the axis scales here. But anyhow, I thought that was worth showing people in case anybody else is doing this sort of thing with their images, as it wasn't really quite that obvious when we started to work with it. Let me just close out of here and then move back to PowerPoint.
Once we had the image binned, what we did was we derived the nitrogen isotope ratio and the selenium isotope ratio. We got all our distributions and our heat map. When we were looking at the heat map, something seemed just a little bit fishy. The selenium isotope ratio seemed to have a negative correlation with both the 77 selenium and the 80 selenium. We've got a negative correlation with both the numerator and the denominator, which just didn't seem right. You'd think it would be one and then the other, but no.
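To make that expectation concrete, here is a small sketch of the Pearson correlation that a correlation heat map displays, applied to made-up counts. The numbers are purely illustrative and are not data from the study; they only show the well-behaved case in which a ratio correlates positively with its numerator and negatively with its denominator.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Made-up counts: numerator rising while the denominator falls.
se77 = [2, 3, 5, 8, 9]
se80 = [40, 35, 30, 22, 20]
ratio = [a / b for a, b in zip(se77, se80)]

r_numerator = pearson(ratio, se77)    # expected to be positive
r_denominator = pearson(ratio, se80)  # expected to be negative
```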
We started wondering. What we were really curious about was all of these outliers on the selenium isotope ratio and the distribution. We also see a lot on the nitrogen isotope ratio, too. But this doesn't bother us as much because we know the count rate is higher. Basically, just through experience, we know we can get that level of isotope ratio through a ¹⁵N thymidine administration and especially at the micromolar level. But this, we weren't sure about. This is uncharted territory.
What we then did was that we decided, "Let's have a look at these outliers a little bit more closely." We did a robust fit outliers examination, and we found out that most of these were actually identified as being outliers through the robust fit algorithm.
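As a stand-in for that step, the sketch below flags outliers with a median and MAD (median absolute deviation) rule. This is a different and simpler robust method than JMP's robust fit outliers algorithm, swapped in only to illustrate the idea of judging points against a center and spread that the outliers themselves cannot distort. The threshold `k=4.0` and the toy ratios are assumptions.

```python
import statistics


def mad_outliers(values, k=4.0):
    """Flag values more than k robust sigmas from the median.

    The robust sigma is 1.4826 * MAD, which matches the standard
    deviation for normally distributed data.
    """
    center = statistics.median(values)
    mad = statistics.median([abs(v - center) for v in values])
    spread = 1.4826 * mad
    if spread == 0:
        return [False] * len(values)
    return [abs(v - center) > k * spread for v in values]


# Made-up isotope ratios with one obviously extreme pixel.
ratios = [0.08, 0.09, 0.10, 0.10, 0.11, 0.12, 0.95]
flags = mad_outliers(ratios)
kept = [r for r, is_outlier in zip(ratios, flags) if not is_outlier]
```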
Once we removed those, and we redid the heat map and the scatter plot matrix, things seemed to make a little bit more sense. The 77 selenium had a very small positive correlation with the ratio value, and the 80 a slightly negative one. That made more sense.
At that point, we thought, "Let's try doing a simple cluster analysis now." We did that. We did a simple K-Means, just looking at three variables. Very simple. Phosphorus, the selenium isotope ratio, and the nitrogen isotope ratio.
What we're hoping to find, or at least according to our hypothesis, there should be a cluster where we have both high selenium and high nitrogen in the instance where the selenoproteins are indeed associating with the DNA to help protect it against some DNA damage.
But we look at the parallel plots, and we never see that. It's always that one is higher than the other or vice versa, or they're roughly the same, but at a low level. This was our first indication that maybe it's not working, or our hypothesis is incorrect.
When we looked at these outliers on the selenium isotope ratio, we found that they were all associated with extremely low counts of 80 selenium. Actually, it should be down here, 77 selenium. In fact, they were all embedded basically in the matrix or the embedding resin.
This led us to think, "We have to take this another step further because all this embedding resin is biasing our analysis. We've got to get rid of it." That's easily done. We can just do another quick cluster analysis, basically just using CN and a couple of others, with two clusters, and it'll be obvious through the high CN counts which cluster is the cells. We can segment the resin out, leaving just the cells behind. We can do a table subset from that and have ourselves a new data table to work with.
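The segmentation idea can be sketched with a two-cluster K-Means on a single variable. This is a simplified illustration rather than the JMP K Means Cluster platform: it uses only the CN counts (the talk mentions CN plus a couple of other variables), starts the two centers at the minimum and maximum, and runs on made-up numbers.

```python
def two_means_1d(values, iterations=50):
    """Two-cluster K-Means in one dimension (Lloyd's algorithm).

    Returns (labels, (low_center, high_center)); label 1 marks the
    high-valued cluster.
    """
    low, high = min(values), max(values)
    labels = [0] * len(values)
    for _ in range(iterations):
        labels = [1 if abs(v - high) < abs(v - low) else 0 for v in values]
        group0 = [v for v, lab in zip(values, labels) if lab == 0]
        group1 = [v for v, lab in zip(values, labels) if lab == 1]
        new_low = sum(group0) / len(group0) if group0 else low
        new_high = sum(group1) / len(group1) if group1 else high
        if (new_low, new_high) == (low, high):
            break  # centers stopped moving
        low, high = new_low, new_high
    return labels, (low, high)


# Made-up CN counts: low values stand for resin, high values for cells.
cn_counts = [5, 8, 3, 900, 1100, 950, 6, 1020]
labels, centers = two_means_1d(cn_counts)
cell_pixels = [v for v, lab in zip(cn_counts, labels) if lab == 1]
```

Keeping only the rows labeled 1 plays the role of the table subset that leaves the cells behind.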
That is what gives us this scatter plot matrix and heat map, where now we look at the selenium isotope ratio. We see that with 80 it's a negative correlation, and with 77 it's a much stronger positive correlation. Now that's all making sense. But the thing we're really interested in is the selenium isotope ratio and the nitrogen isotope ratio, which now show a completely negative correlation. That means that what we thought was happening wasn't. To look at other areas, we would look at hundreds of these cells.
We've now got our full Workflow Builder, which incorporates the 4 by 4 binning of the image; Graph Builder three times, for the CN, the selenium isotope ratio, and the nitrogen isotope ratio; the distributions; the multivariate statistics the first time around; an outlier analysis; the K-Means Clustering and the parallel coordinate plots; and then K-Means to segment out the cells.
The new data table is created, then Graph Builder again, and we can go on with multivariate statistics. Originally we finished there, but we've since added another K-Means Clustering and parallel plot at that point as well.
Might as well go show you how this works because it is quite fun. We'll go File, Open recent, and this is this file. Now we're going to go Open recent with the Workflow Builder. That's part two, that's that one. We'll just launch it and watch it do its thing in real time.
Here comes the first set of images, second set, third, distributions, scatter plot matrix, outlier analysis. Next, scatter plot, K-Means Clustering, data subset image. Finally, the last scatter plot matrix. We can run that now right upon opening the file. That saves an immense amount of time as well.
This is nice because I can develop these things, send it to my colleague, and then when she looks at the data, she doesn't have to do all this. She just presses go, and it's all there for her. Let's close up everything and close that.
One last thing. We're in January today, but by March… Actually, this is the second last thing. Yes, I forgot. After we finished all that, we did more K-Means Clustering with that subset of data. Again, what we see with the parallel plots is there's absolutely no sign of high selenium and high nitrogen when we look at all the different clusters.
I would do a minimum of three and a maximum of 10 clusters and then just use the one that was identified by the CCC criterion as being the best. But in every case, there's no sign of high selenium together with high nitrogen.
What our conclusion is, is that, at least directly, there is no evidence that those selenoproteins protect the DNA just by an association in spatial terms, if you will. But there still could be some indirect mechanism that they do so, but that's well beyond the scope of what we were trying to do here.
Lastly, as I said, we're just at the end of January here. Hopefully by March, I'll have a lot more to say about this. But this is how I think we can probably use Functional Data Explorer to help us with another type of analysis that we need.
I showed you at the very beginning these high mass resolution spectra that are acquired at each detector. We acquire these because when we set the detector, we need to know which mass we're actually analyzing because we have, in many cases, a lot of mass interferences.
What happens is, I've just blown up this region here at the detector end, the separated ion beam, it passes through a set of deflection plates that scan the focused secondary ion beam through a narrow slit, and then into the detector. The diameter of this beam is narrower than the width of that slit. You can see what happens when the deflection is quite large. It's not really hitting the detector; it's hitting more of the edge of the slit and not getting to the detector.
The deflecting path deviates it more, and it starts entering into the slit and on its way to the detector, so the counts start to rise. At a certain point, you're fully within the slit, and then you can basically have a nice flat top peak, and then it starts going down. There might be a second peak from another mass that you might start seeing. That's what we have here. It goes down again and finally the third.
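The flat-topped peak shape can be reproduced with a very simple geometric model. This is an idealized sketch, not the instrument's actual ion optics: it assumes a one-dimensional beam of uniform intensity, a hard-edged slit, and arbitrary units for the widths and the deflection.

```python
def transmitted_fraction(deflection, beam_width, slit_width):
    """Fraction of a uniform beam that passes a slit centered at zero.

    The beam covers [deflection - beam_width/2, deflection + beam_width/2];
    the transmitted part is its overlap with the slit opening.
    """
    left = max(deflection - beam_width / 2, -slit_width / 2)
    right = min(deflection + beam_width / 2, slit_width / 2)
    return max(0.0, right - left) / beam_width


# Scan a beam of width 1 across a slit of width 3 in steps of 0.5.
scan = [transmitted_fraction(step / 2, 1.0, 3.0) for step in range(-6, 7)]
```

In this model the signal rises while the beam crosses one slit edge, stays flat while the beam is fully inside the slit, and falls as it crosses the other edge. The flat top exists only because the beam is narrower than the slit.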
During an analysis, what we would do is pick which peak we want to be on and set the deflection voltage such that only that peak gets through to the detector. If this isn't set right, all the data you acquire is basically useless.
For a lot of these, like for the ¹⁵N, and we use ¹³C a lot, we know what these spectra look like. But for the selenium, that was really quite a challenge to sort it out to make sure we were actually on selenium, especially since the count rate was so low.
But when I looked at this, and I started looking at Functional Data Explorer, the first thing I saw when we were discussing the use of wavelets is that these spectra look almost exactly like Haar wavelets. Another thing is that there's basically no background, because we set the thresholds and gains of our detectors to make sure that there is essentially no electronic background and that the response is the same.
We have to do this because we are measuring isotope ratios all the time, but we're measuring them using two different detectors. If those two detectors aren't calibrated against each other precisely, our ability to measure a proper isotope ratio is diminished.
Basically, all we need to know is the position of one of these and what it is assigned to. In this case, I know this broad peak is ¹²C and ¹⁴N. This one is part of the ¹²C, ¹³C, ¹H. These are both adding up to 26.
If we know what one is, we can figure out what the other is just by difference in the masses. This scale doesn't matter because we're always looking at the mass differences. We can calibrate this to be precise all the time, but that takes an awful lot of effort and awful lot of work.
If we know what one is, we can calculate what these other peaks are from the differences, and then just associate them back using a program that we have. Essentially we can put in all the elements that we think are in the sample, and it will fire out all the possible combinations that add up to that mass and then the associated difference from that particular peak.
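A program of that kind might look roughly like the sketch below. It is not the speaker's actual tool: the function name, the isotope list, and the `max_atoms` limit are illustrative assumptions, the isotope masses are standard values rounded to six decimals, and the electron mass and charge state are ignored.

```python
from itertools import combinations_with_replacement

# Isotope: (mass number, exact mass in u), rounded standard values.
ISOTOPES = {
    "1H": (1, 1.007825),
    "12C": (12, 12.000000),
    "13C": (13, 13.003355),
    "14N": (14, 14.003074),
    "15N": (15, 15.000109),
    "16O": (16, 15.994915),
}


def interference_candidates(nominal_mass, reference, max_atoms=4):
    """List isotope combinations at a nominal mass with their offset
    (in u) from the exact mass of a known reference peak."""
    ref_mass = sum(ISOTOPES[iso][1] for iso in reference)
    found = []
    for n_atoms in range(1, max_atoms + 1):
        for combo in combinations_with_replacement(sorted(ISOTOPES), n_atoms):
            if sum(ISOTOPES[iso][0] for iso in combo) == nominal_mass:
                exact = sum(ISOTOPES[iso][1] for iso in combo)
                found.append((combo, exact, exact - ref_mass))
    return sorted(found, key=lambda item: item[1])


# Nominal mass 26, with the 12C14N peak as the known reference.
candidates = interference_candidates(26, ("12C", "14N"))
for combo, exact, delta in candidates:
    print("".join(combo), round(exact, 6), round(delta, 6))
```

Under these assumptions the ¹²C¹³C¹H combination sits about 0.0081 u above ¹²C¹⁴N, which corresponds to an m over delta m of roughly 3,200, comfortably within the mass resolution quoted earlier in the talk.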
If we could use FDE to separate all these out like a deconvolution, this would be a massive boon for us, particularly in some of these heavier masses where we get more interferences, and they're much more difficult to sort out. That's pretty much all I have to say about this at this point. But by the time the meeting rolls around, I'll have hopefully a lot more and maybe some examples as well.
But for now, I would just like to acknowledge some colleagues. Dr. David Dent, who's at the Sustainable Nitrogen Foundation, and Professor Erik Murchie and Professor Ted Cocking at the University of Nottingham. They were important in the first case study on the nitrogen fixation by the GD bacteria. Dr. Diane Handy and Professor Claude Lechene at Harvard Medical School. They were essential for the selenoprotein study as well, and all my colleagues at NiCE-MSI at NPL.
On a sad note, I'd like to just say that Ted and Claude have since passed away. It's a huge loss for the scientific community, but they're remembered with this talk and many others to come. Thank you very much, and if you have any questions, I'd be pleased to answer them for you.